Hyperparameter Optimization for Randomized Algorithms: A Case Study for Random Features
Randomized algorithms exploit stochasticity to reduce computational complexity. One important example is random feature regression (RFR), which accelerates Gaussian process regression (GPR). RFR approximates an unknown function with a random neural network whose hidden weights and biases are sampled from a probability distribution. Only the final output layer is fit to data. In randomized algorithms like RFR, the hyperparameters that characterize the sampling distribution greatly impact performance, yet are not directly accessible from samples. This makes optimization of hyperparameters via standard (gradient-based) optimization tools inapplicable. Inspired by Bayesian ideas from GPR, this paper introduces a random objective function that is tailored for hyperparameter tuning of vector-valued random features. The objective is minimized with ensemble Kalman inversion (EKI). EKI is a gradient-free, particle-based optimizer that is scalable to high dimensions and robust to randomness in objective functions. A numerical study showcases the new black-box methodology to learn hyperparameter distributions in several problems that are sensitive to hyperparameter selection: two global sensitivity analyses, integrating a chaotic dynamical system, and solving a Bayesian inverse problem from atmospheric dynamics. The success of the proposed EKI-based algorithm for RFR suggests its potential for automated optimization of hyperparameters arising in other randomized algorithms.
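To make the setup concrete, here is a minimal scalar sketch of random feature regression; the cosine feature map, the Gaussian sampling distribution, and every name below are illustrative assumptions rather than the paper's exact construction (the paper treats vector-valued features and tunes the sampling hyperparameters with EKI instead of fixing them by hand).

```python
import numpy as np

def rfr_fit_predict(X, y, X_test, n_features=512, scale=1.0, reg=1e-6, seed=0):
    """Random feature regression: hidden weights/biases are sampled once and
    never trained; only the linear output layer is fit to data."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # `scale` is a sampling-distribution hyperparameter of the kind EKI would tune.
    W = rng.normal(0.0, scale, size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    Phi = np.cos(X @ W + b)  # random features of the training inputs
    # Ridge-regularized least squares for the output layer (the only trained part).
    coef = np.linalg.solve(Phi.T @ Phi + reg * np.eye(n_features), Phi.T @ y)
    return np.cos(X_test @ W + b) @ coef
```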
Updated: 2024-07-19 23:38:10
Categories: cs.LG,stat.CO,stat.ML
Composer's Assistant 2: Interactive Multi-Track MIDI Infilling with Fine-Grained User Control
We introduce Composer's Assistant 2, a system for interactive human-computer composition in the REAPER digital audio workstation. Our work upgrades the Composer's Assistant system (which performs multi-track infilling of symbolic music at the track-measure level) with a wide range of new controls to give users fine-grained control over the system's outputs. Controls introduced in this work include two types of rhythmic conditioning controls, horizontal and vertical note onset density controls, several types of pitch controls, and a rhythmic interest control. We train a T5-like transformer model to implement these controls and to serve as the backbone of our system. With these controls, we achieve a dramatic improvement in objective metrics over the original system. We also study how well our model understands the meaning of our controls, and we conduct a listening study that does not find a significant difference between real music and music composed in a co-creative fashion with our system. We release our complete system, consisting of source code, pretrained models, and REAPER scripts.
Updated: 2024-07-19 23:28:09
Categories: cs.SD,cs.LG,eess.AS
A Comprehensive Guide to Combining R and Python code for Data Science, Machine Learning and Reinforcement Learning
Python has gained widespread popularity in the fields of machine learning, artificial intelligence, and data engineering due to its effectiveness and extensive libraries. R, for its part, remains a dominant language for statistical analysis and visualization. However, some of its libraries have become outdated, limiting their functionality and performance. By combining the two programming languages, users can pair Python's advanced machine learning and AI capabilities with R's robust statistical packages. This paper explores using R's reticulate package to call Python from R, providing practical examples and highlighting scenarios where this integration enhances productivity and analytical capabilities. With a few hello-world code snippets, we demonstrate how to run Python's scikit-learn, pytorch and OpenAI gym libraries for easily building Machine Learning, Deep Learning, and Reinforcement Learning projects.
Updated: 2024-07-19 23:01:48
Categories: cs.LG,cs.PL
Guiding and Diversifying LLM-Based Story Generation via Answer Set Programming
Instruction-tuned large language models (LLMs) are capable of generating stories in response to open-ended user requests, but the resulting stories tend to be limited in their diversity. Older, symbolic approaches to story generation (such as planning) can generate substantially more diverse plot outlines, but are limited to producing stories that recombine a fixed set of hand-engineered character action templates. Can we combine the strengths of these approaches while mitigating their weaknesses? We propose to do so by using a higher-level and more abstract symbolic specification of high-level story structure -- implemented via answer set programming (ASP) -- to guide and diversify LLM-based story generation. Via semantic similarity analysis, we demonstrate that our approach produces more diverse stories than an unguided LLM, and via code excerpts, we demonstrate the improved compactness and flexibility of ASP-based outline generation over full-fledged narrative planning.
Updated: 2024-07-19 22:50:46
Categories: cs.CL,cs.AI
Data Poisoning: An Overlooked Threat to Power Grid Resilience
As the complexities of Dynamic Data Driven Applications Systems increase, preserving their resilience becomes more challenging. For instance, maintaining power grid resilience is becoming increasingly complicated due to the growing number of stochastic variables (such as renewable outputs) and extreme weather events that add uncertainty to the grid. Current optimization methods have struggled to accommodate this rise in complexity. This has fueled growing interest in data-driven methods for operating the grid, which in turn increases vulnerability to cyberattacks. One commonly discussed disruption is the adversarial disruption, where the intruder attempts to add a small perturbation to input data in order to "manipulate" the system operation. Over the last few years, work on adversarial training and disruptions of power systems has gained popularity. In this paper, we first review these applications, focusing on the two most common types of adversarial disruptions: evasion and poisoning. Through this review, we highlight the gap between poisoning and evasion research as applied to the power grid, which stems from the underlying assumption that model training is secure, leaving evasion disruptions as the primary type studied. Finally, we examine the impacts of data poisoning interventions and showcase how they can endanger power grid resilience.
Updated: 2024-07-19 22:00:52
Categories: cs.LG,cs.CR
Value Internalization: Learning and Generalizing from Social Reward
Social rewards shape human behavior. During development, a caregiver guides a learner's behavior towards culturally aligned goals and values. How do these behaviors persist and generalize when the caregiver is no longer present, and the learner must continue autonomously? Here, we propose a model of value internalization where social feedback trains an internal social reward (ISR) model that generates internal rewards when social rewards are unavailable. Through empirical simulations, we show that an ISR model prevents agents from unlearning socialized behaviors and enables generalization in out-of-distribution tasks. We characterize the implications of incomplete internalization, akin to "reward hacking" on the ISR. Additionally, we show that our model internalizes prosocial behavior in a multi-agent environment. Our work provides a foundation for understanding how humans acquire and generalize values and offers insights for aligning AI with human values.
Updated: 2024-07-19 21:53:33
Categories: cs.LG,cs.AI,cs.MA
Compact Language Models via Pruning and Knowledge Distillation
Large language models (LLMs) targeting different deployment scales and sizes are currently produced by training each variant from scratch; this is extremely compute-intensive. In this paper, we investigate if pruning an existing LLM and then re-training it with a fraction (<3%) of the original training data can be a suitable alternative to repeated, full retraining. To this end, we develop a set of practical and effective compression best practices for LLMs that combine depth, width, attention and MLP pruning with knowledge distillation-based retraining; we arrive at these best practices through a detailed empirical exploration of pruning strategies for each axis, methods to combine axes, distillation strategies, and search techniques for arriving at optimal compressed architectures. We use this guide to compress the Nemotron-4 family of LLMs by a factor of 2-4x, and compare their performance to similarly-sized models on a variety of language modeling tasks. Deriving 8B and 4B models from an already pretrained 15B model using our approach requires up to 40x fewer training tokens per model compared to training from scratch; this results in compute cost savings of 1.8x for training the full model family (15B, 8B, and 4B). Minitron models exhibit up to a 16% improvement in MMLU scores compared to training from scratch, perform comparably to other community models such as Mistral 7B, Gemma 7B and Llama-3 8B, and outperform state-of-the-art compression techniques from the literature. We have open-sourced Minitron model weights on Huggingface, with corresponding supplementary material including example code available on GitHub.
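As a rough illustration of the two ingredients, the sketch below shows one simple width-pruning criterion and a standard distillation loss; the importance score, temperature, and function names are assumptions, not the paper's specific recipe.

```python
import torch
import torch.nn.functional as F

def width_prune_linear(layer: torch.nn.Linear, keep: int) -> torch.nn.Linear:
    """Width pruning: keep the `keep` output neurons with the largest weight norm."""
    idx = layer.weight.norm(dim=1).topk(keep).indices.sort().values
    pruned = torch.nn.Linear(layer.in_features, keep, bias=layer.bias is not None)
    pruned.weight.data = layer.weight.data[idx].clone()
    if layer.bias is not None:
        pruned.bias.data = layer.bias.data[idx].clone()
    return pruned

def distill_loss(student_logits, teacher_logits, T: float = 2.0):
    """Temperature-scaled KL divergence for retraining the pruned student
    against the original (teacher) model's output distribution."""
    return T * T * F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    )
```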
Updated: 2024-07-19 21:47:57
Categories: cs.CL,cs.AI,cs.LG
DefTesPY: Cyber defense model with enhanced data modeling and analysis for Tesla company via Python Language
Cyber-attacks on automobiles and business firms continue to rise even as new technologies and defense models are developed to counter cybercrime. Cyber defense (also called counterintelligence) is a computer network defense mechanism that involves responding to activities, protecting critical infrastructure, and assuring information for corporations, government bodies, and other networks. Cyber defense focuses on preventing, detecting, and responding to attacks or threats in a timely manner so that no infrastructure or information is compromised. With the increasing volume and complexity of cyber threats, most companies need cyber defense to protect sensitive information and assets. Attacker actions can be controlled with firewalls at different levels, an intrusion detection system (IDS), and an intrusion prevention system (IPS), which can be installed independently or in combination with other protection approaches. Tesla is an American clean energy and automotive company headquartered in Austin, Texas, USA. The recent data breach at Tesla affected over 75,000 individuals; the company identified two former employees as the offenders who leaked more than 23,000 internal files spanning 2015 to 2022. In this work, we emphasize data modeling and data analysis using a cyber defense model and Python, together with a survey of the Tesla company. We propose a defense model, DefTesPY, with enhanced data modeling and data analysis based on the cyber-attacks and cybercrimes Tesla has encountered to date.
Updated: 2024-07-19 21:22:40
Categories: cs.CR
Towards a "universal translator" for neural dynamics at single-cell, single-spike resolution
Neuroscience research has made immense progress over the last decade, but our understanding of the brain remains fragmented and piecemeal: the dream of probing an arbitrary brain region and automatically reading out the information encoded in its neural activity remains out of reach. In this work, we build towards a first foundation model for neural spiking data that can solve a diverse set of tasks across multiple brain areas. We introduce a novel self-supervised modeling approach for population activity in which the model alternates between masking out and reconstructing neural activity across different time steps, neurons, and brain regions. To evaluate our approach, we design unsupervised and supervised prediction tasks using the International Brain Laboratory repeated site dataset, which comprises Neuropixels recordings targeting the same brain locations across 48 animals and experimental sessions. The prediction tasks include single-neuron and region-level activity prediction, forward prediction, and behavior decoding. We demonstrate that our multi-task-masking (MtM) approach significantly improves the performance of current state-of-the-art population models and enables multi-task learning. We also show that by training on multiple animals, we can improve the generalization ability of the model to unseen animals, paving the way for a foundation model of the brain at single-cell, single-spike resolution.
Updated: 2024-07-19 21:05:28
Categories: q-bio.NC,cs.LG,cs.NE
CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving
As large language models (LLMs) take on complex tasks, their inputs are supplemented with longer contexts that incorporate domain knowledge. Yet using long contexts is challenging, as nothing can be generated until the whole context is processed by the LLM. While the context-processing delay can be reduced by reusing the KV cache of a context across different inputs, fetching the KV cache, which contains large tensors, over the network can cause high extra network delays. CacheGen is a fast context-loading module for LLM systems. First, CacheGen uses a custom tensor encoder, leveraging KV cache's distributional properties to encode a KV cache into more compact bitstream representations with negligible decoding overhead, to save bandwidth usage. Second, CacheGen adapts the compression level of different parts of a KV cache to cope with changes in available bandwidth, in order to maintain low context-loading delay and high generation quality. When available bandwidth drops, CacheGen may raise the compression level for a part of the context or recompute its KV cache on the fly. We test CacheGen on popular LLMs and datasets. Compared to the recent systems that reuse the KV cache, CacheGen reduces the KV cache size by 3.5-4.3x and the total delay in fetching and processing contexts by 3.2-3.7x with negligible impact on the LLM response quality. Our code is at: https://github.com/UChi-JCL/CacheGen.
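CacheGen's actual codec is a custom tensor encoder with a compact bitstream output; the toy uniform quantizer below only illustrates the underlying trade (fewer bits per KV element means a smaller stream to fetch, at some quality cost), and every name in it is an assumption.

```python
import numpy as np

def quantize_kv(kv: np.ndarray, bits: int):
    """Uniformly quantize a KV-cache tensor to `bits` bits per element."""
    lo, hi = float(kv.min()), float(kv.max())
    levels = 2 ** bits - 1
    q = np.round((kv - lo) / (hi - lo + 1e-12) * levels).astype(np.uint16)
    return q, (lo, hi, levels)

def dequantize_kv(q: np.ndarray, meta) -> np.ndarray:
    lo, hi, levels = meta
    return q.astype(np.float32) / levels * (hi - lo) + lo

# Bandwidth adaptation in spirit: spend more bits where quality is sensitive,
# e.g. bits_per_layer = [8 if sensitive else 4 for sensitive in layer_flags].
```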
Updated: 2024-07-19 21:04:14
Categories: cs.NI,cs.LG
Is $F_1$ Score Suboptimal for Cybersecurity Models? Introducing $C_{score}$, a Cost-Aware Alternative for Model Assessment
The costs of errors related to machine learning classifiers, namely false positives and false negatives, are not equal and are application dependent. For example, in cybersecurity applications, the cost of not detecting an attack is very different from the cost of marking a benign activity as an attack. Various design choices during machine learning model building, such as hyperparameter tuning and model selection, allow a data scientist to trade off between these two errors. However, most of the commonly used metrics to evaluate model quality, such as the $F_1$ score, which is defined in terms of model precision and recall, treat both of these errors equally, making it difficult for users to optimize for the actual cost of these errors. In this paper, we propose a new cost-aware metric, $C_{score}$, based on precision and recall that can replace the $F_1$ score for model evaluation and selection. It includes a cost ratio that takes into account the differing costs of handling false positives and false negatives. We derive and characterize the new cost metric, and compare it to the $F_1$ score. Further, we use this metric for model thresholding on five cybersecurity-related datasets for multiple cost ratios. The results show an average cost savings of 49%.
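The paper's exact $C_{score}$ definition is not reproduced in this abstract, so the function below is only an assumed stand-in: an $F_\beta$-style combination of precision and recall in which the cost ratio sets how much more a false negative matters than a false positive.

```python
def cost_aware_score(precision: float, recall: float, cost_ratio: float) -> float:
    """Cost-weighted harmonic mean of precision and recall (assumed form,
    not the paper's derivation). With cost_ratio == 1 this reduces to F1;
    cost_ratio > 1 penalizes false negatives more heavily via recall."""
    b2 = cost_ratio
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)

# Threshold selection sketch: pick the decision threshold maximizing the score,
# e.g. best_t = max(thresholds, key=lambda t: cost_aware_score(*pr_at(t), 10.0))
```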
Updated: 2024-07-19 21:01:19
Categories: cs.LG,cs.AI
Relational Composition in Neural Networks: A Survey and Call to Action
Many neural nets appear to represent data as linear combinations of "feature vectors." Algorithms for discovering these vectors have seen impressive recent success. However, we argue that this success is incomplete without an understanding of relational composition: how (or whether) neural nets combine feature vectors to represent more complicated relationships. To facilitate research in this area, this paper offers a guided tour of various relational mechanisms that have been proposed, along with preliminary analysis of how such mechanisms might affect the search for interpretable features. We end with a series of promising areas for empirical research, which may help determine how neural networks represent structured data.
Updated: 2024-07-19 20:50:57
Categories: cs.AI,cs.LG
Augment then Smooth: Reconciling Differential Privacy with Certified Robustness
Machine learning models are susceptible to a variety of attacks that can erode trust, including attacks against the privacy of training data, and adversarial examples that jeopardize model accuracy. Differential privacy and certified robustness are effective frameworks for combating these two threats respectively, as they each provide future-proof guarantees. However, we show that standard differentially private model training is insufficient for providing strong certified robustness guarantees. Indeed, combining differential privacy and certified robustness in a single system is non-trivial, leading previous works to introduce complex training schemes that lack flexibility. In this work, we present DP-CERT, a simple and effective method that achieves both privacy and robustness guarantees simultaneously by integrating randomized smoothing into standard differentially private model training. Compared to the leading prior work, DP-CERT gives up to a 2.5% increase in certified accuracy for the same differential privacy guarantee on CIFAR10. Through in-depth per-sample metric analysis, we find that larger certifiable radii correlate with smaller local Lipschitz constants, and show that DP-CERT effectively reduces Lipschitz constants compared to other differentially private training methods. The code is available at github.com/layer6ailabs/dp-cert.
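For orientation, the standard randomized-smoothing prediction step that DP-CERT builds on looks roughly like this (a minimal majority-vote sketch in the style of Cohen et al.; the DP-SGD training side is omitted and all parameter choices are assumptions):

```python
import torch

@torch.no_grad()
def smoothed_predict(model, x, sigma=0.25, n_samples=100, num_classes=10):
    """Classify many Gaussian-noised copies of x and take the majority vote;
    the margin of the vote is what certification turns into a robust radius."""
    counts = torch.zeros(num_classes)
    for _ in range(n_samples):
        logits = model((x + sigma * torch.randn_like(x)).unsqueeze(0))
        counts[logits.argmax(dim=-1).item()] += 1
    return int(counts.argmax()), counts
```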
Updated: 2024-07-19 20:42:41
Categories: cs.LG,cs.CR
Data Generation Using Large Language Models for Text Classification: An Empirical Case Study
Using Large Language Models (LLMs) to generate synthetic data for model training has become increasingly popular in recent years. While LLMs are capable of producing realistic training data, the effectiveness of data generation is influenced by various factors, including the choice of prompt, task complexity, and the quality, quantity, and diversity of the generated data. In this work, we focus exclusively on using synthetic data for text classification tasks. Specifically, we use natural language understanding (NLU) models trained on synthetic data to assess the quality of synthetic data from different generation approaches. This work provides an empirical analysis of the impact of these factors and offers recommendations for better data generation practices.
Updated: 2024-07-19 20:37:17
Categories: cs.CL,cs.AI
A New Lightweight Hybrid Graph Convolutional Neural Network -- CNN Scheme for Scene Classification using Object Detection Inference
Scene understanding plays an important role in several high-level computer vision applications, such as autonomous vehicles, intelligent video surveillance, or robotics. However, too few solutions have been proposed for indoor/outdoor scene classification to ensure scene context adaptability for computer vision frameworks. We propose the first Lightweight Hybrid Graph Convolutional Neural Network (LH-GCNN)-CNN framework as an add-on to object detection models. The proposed approach uses the output of the CNN object detection model to predict the observed scene type by generating a coherent GCNN representing the semantic and geometric content of the observed scene. This new method, applied to natural scenes, achieves an efficiency of over 90% for scene classification in a COCO-derived dataset containing a large number of different scenes, while requiring fewer parameters than traditional CNN methods. For the benefit of the scientific community, we will make the source code publicly available: https://github.com/Aymanbegh/Hybrid-GCNN-CNN.
Updated: 2024-07-19 20:34:40
Categories: cs.CV,cs.AI
Injectivity of ReLU-layers: Perspectives from Frame Theory
Injectivity is the defining property of a mapping that ensures no information is lost and any input can be perfectly reconstructed from its output. By performing hard thresholding, the ReLU function naturally interferes with this property, making the injectivity analysis of ReLU-layers in neural networks a challenging yet intriguing task that has not yet been fully solved. This article establishes a frame theoretic perspective to approach this problem. The main objective is to develop the most general characterization of the injectivity behavior of ReLU-layers in terms of all three involved ingredients: (i) the weights, (ii) the bias, and (iii) the domain where the data is drawn from. Maintaining a focus on practical applications, we limit our attention to bounded domains and present two methods for numerically approximating a maximal bias for given weights and data domains. These methods provide sufficient conditions for the injectivity of a ReLU-layer on those domains and yield a novel practical methodology for studying the information loss in ReLU layers. Finally, we derive explicit reconstruction formulas based on the duality concept from frame theory.
Updated: 2024-07-19 20:25:58
Categories: cs.LG
OASIS: Conditional Distribution Shaping for Offline Safe Reinforcement Learning
Offline safe reinforcement learning (RL) aims to train a policy that satisfies constraints using a pre-collected dataset. Most current methods struggle with the mismatch between imperfect demonstrations and the desired safe and rewarding performance. In this paper, we introduce OASIS (cOnditionAl diStributIon Shaping), a new paradigm in offline safe RL designed to overcome these critical limitations. OASIS utilizes a conditional diffusion model to synthesize offline datasets, thus shaping the data distribution toward a beneficial target domain. Our approach ensures compliance with safety constraints through effective data utilization and regularization techniques, to the benefit of offline safe RL training. Comprehensive evaluations on public benchmarks and varying datasets showcase OASIS's superiority in benefiting offline safe RL agents to achieve high-reward behavior while satisfying the safety constraints, outperforming established baselines. Furthermore, OASIS exhibits high data efficiency and robustness, making it suitable for real-world applications, particularly in tasks where safety is imperative and high-quality demonstrations are scarce.
Updated: 2024-07-19 20:15:00
Categories: cs.LG
Improving Representation of High-frequency Components for Medical Foundation Models
Foundation models have recently attracted significant attention for their impressive generalizability across diverse downstream tasks. However, these models have been shown to exhibit significant limitations in representing high-frequency components and fine-grained details. In many medical imaging tasks, the precise representation of such information is crucial due to the inherently intricate anatomical structures, sub-visual features, and complex boundaries involved. Consequently, the limited representation of prevalent foundation models can result in significant performance degradation or even failure in these tasks. To address these challenges, we propose a novel pretraining strategy, named Frequency-advanced Representation Autoencoder (Frepa). Through high-frequency masking and low-frequency perturbation combined with adversarial learning, Frepa encourages the encoder to effectively represent and preserve high-frequency components in the image embeddings. Additionally, we introduce an innovative histogram-equalized image masking strategy, extending the Masked Autoencoder approach beyond ViT to other architectures such as Swin Transformer and convolutional networks. We develop Frepa across nine medical modalities and validate it on 32 downstream tasks for both 2D images and 3D volume data. Without fine-tuning, Frepa can outperform other self-supervised pretraining methods and, in some cases, even surpasses task-specific trained models. This improvement is particularly significant for tasks involving fine-grained details, such as achieving up to a +15% increase in DSC for retina vessel segmentation and a +7% increase in IoU for lung nodule detection. Further experiments quantitatively reveal that Frepa enables superior high-frequency representations and preservation in the embeddings, underscoring its potential for developing more generalized and universal medical image foundation models.
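One simple way to realize high-frequency masking in the Fourier domain is sketched below; the radial cutoff and all names are illustrative assumptions, not Frepa's exact corruption.

```python
import numpy as np

def mask_high_frequencies(img: np.ndarray, cutoff: float = 0.1) -> np.ndarray:
    """Zero out spatial frequencies outside a low-frequency disk and invert the
    FFT; a frequency-aware encoder must learn to restore what was removed."""
    spec = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    radius = np.sqrt((yy - h / 2.0) ** 2 + (xx - w / 2.0) ** 2)
    spec[radius > cutoff * min(h, w) / 2.0] = 0.0  # keep only the low-frequency disk
    return np.real(np.fft.ifft2(np.fft.ifftshift(spec)))
```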
Updated: 2024-07-19 20:05:10
Categories: eess.IV,cs.AI,cs.CV
Vision Controlled Sensorized Prosthetic Hand
This paper presents a sensorized vision-enabled prosthetic hand aimed at replicating a natural hand's performance, functionality, appearance, and comfort. The design goal was to create an accessible substitution with a user-friendly interface requiring little to no training. Our mechanical hand uses a camera and embedded processors to perform most of these tasks. The interfaced pressure sensor is used to get pressure feedback and ensure a safe grasp of the object; an accelerometer is used to detect gestures and release the object. Unlike current EMG-based designs, the prototyped hand does not require personalized training. The details of the design, trade-offs, results, and informing the next iteration are presented in this paper.
Updated: 2024-07-19 19:52:02
Categories: cs.HC,cs.AI
There is a Singularity in the Loss Landscape
Despite the widespread adoption of neural networks, their training dynamics remain poorly understood. We show experimentally that as the size of the dataset increases, a point forms where the magnitude of the gradient of the loss becomes unbounded. Gradient descent rapidly brings the network close to this singularity in parameter space, and further training takes place near it. This singularity explains a variety of phenomena recently observed in the Hessian of neural network loss functions, such as training on the edge of stability and the concentration of the gradient in a top subspace. Once the network approaches the singularity, the top subspace contributes little to learning, even though it constitutes the majority of the gradient.
Updated: 2024-07-19 19:49:23
Categories: cs.LG,cs.AI
Performance Modeling and Workload Analysis of Distributed Large Language Model Training and Inference
Aligning future system design with the ever-increasing compute needs of large language models (LLMs) is undoubtedly an important problem in today's world. Here, we propose a general performance modeling methodology and workload analysis of distributed LLM training and inference through an analytical framework that accurately considers compute, memory sub-system, network, and various parallelization strategies (model parallel, data parallel, pipeline parallel, and sequence parallel). We validate our performance predictions with published data from literature and relevant industry vendors (e.g., NVIDIA). For distributed training, we investigate the memory footprint of LLMs for different activation re-computation methods, dissect the key factors behind the massive performance gain from A100 to B200 ($\sim$ 35x speed-up closely following NVIDIA's scaling trend), and further run a design space exploration at different technology nodes (12 nm to 1 nm) to study the impact of logic, memory, and network scaling on the performance. For inference, we analyze the compute versus memory boundedness of different operations at a matrix-multiply level for different GPU systems and further explore the impact of DRAM memory technology scaling on inference latency. Utilizing our modeling framework, we reveal the evolution of performance bottlenecks for both LLM training and inference with technology scaling, thus, providing insights to design future systems for LLM training and inference.
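The paper's analytical model is far more detailed, but the familiar back-of-envelope rules below give the flavor of such performance modeling; the constants (6 FLOPs per parameter per token, bf16 weights plus fp32 Adam state) are standard approximations, not the paper's numbers.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Rule of thumb: ~6 FLOPs per parameter per token (forward + backward)."""
    return 6.0 * n_params * n_tokens

def training_memory_gb(n_params: float, bytes_per_weight: int = 2,
                       bytes_optimizer_state: int = 12) -> float:
    """Rough per-replica footprint: bf16 weights plus fp32 Adam moments and
    gradients, before activation memory and parallelism sharding."""
    return n_params * (bytes_per_weight + bytes_optimizer_state) / 1e9

# Example: a 70B-parameter model trained on 2T tokens needs ~8.4e23 FLOPs.
print(f"{training_flops(70e9, 2e12):.1e} FLOPs, {training_memory_gb(70e9):.0f} GB states")
```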
Updated: 2024-07-19 19:49:05
Categories: cs.AR,cs.DC,cs.LG
Dynamic Pricing in Securities Lending Market: Application in Revenue Optimization for an Agent Lender Portfolio
Securities lending is an important part of the financial market structure, where agent lenders help long-term institutional investors to lend out their securities to short sellers in exchange for a lending fee. Agent lenders within the market seek to optimize revenue by lending out securities at the highest rate possible. Typically, this rate is set by hard-coded business rules or standard supervised machine learning models. These approaches are often difficult to scale and are not adaptive to changing market conditions. Unlike a traditional stock exchange with a centralized limit order book, the securities lending market is organized similarly to an e-commerce marketplace, where agent lenders and borrowers can transact at any agreed price in a bilateral fashion. This similarity suggests that the use of typical methods for addressing dynamic pricing problems in e-commerce could be effective in the securities lending market. We show that existing contextual bandit frameworks can be successfully utilized in the securities lending market. Using offline evaluation on real historical data, we show that the contextual bandit approach can consistently outperform typical approaches by at least 15% in terms of total revenue generated.
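A minimal epsilon-greedy contextual bandit over a discretized rate grid conveys the idea; the linear per-arm reward model, features, and names are illustrative assumptions, not the system described in the paper.

```python
import numpy as np

class EpsGreedyRatePricer:
    """Epsilon-greedy contextual bandit: one ridge-regression reward model per
    candidate lending rate; explore with probability eps, otherwise quote the
    rate with the highest estimated revenue for the current context."""

    def __init__(self, rates, n_features, eps=0.1, seed=0):
        self.rates, self.eps = rates, eps
        self.rng = np.random.default_rng(seed)
        self.A = [np.eye(n_features) for _ in rates]   # per-arm X^T X + I
        self.b = [np.zeros(n_features) for _ in rates] # per-arm X^T revenue

    def choose(self, x):
        if self.rng.random() < self.eps:
            return int(self.rng.integers(len(self.rates)))
        estimates = [np.linalg.solve(A, b) @ x for A, b in zip(self.A, self.b)]
        return int(np.argmax(estimates))

    def update(self, arm, x, revenue):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += revenue * x
```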
Updated: 2024-07-19 19:46:54
Categories: q-fin.TR,cs.LG
BiomedGPT: A Unified and Generalist Biomedical Generative Pre-trained Transformer for Vision, Language, and Multimodal Tasks
Traditional biomedical artificial intelligence (AI) models, designed for specific tasks or modalities, often exhibit limited flexibility in real-world deployment and struggle to utilize holistic information. Generalist AI holds the potential to address these limitations due to its versatility in interpreting different data types and generating tailored outputs for diverse needs. However, existing biomedical generalist AI solutions are typically heavyweight and closed source to researchers, practitioners, and patients. Here, we propose BiomedGPT, the first open-source and lightweight vision-language foundation model, designed as a generalist capable of performing various biomedical tasks. BiomedGPT achieved state-of-the-art results in 16 out of 25 experiments while maintaining a computing-friendly model scale. We also conducted human evaluations to assess the capabilities of BiomedGPT in radiology visual question answering, report generation, and summarization. BiomedGPT exhibits robust prediction ability with a low error rate of 3.8% in question answering, satisfactory performance with an error rate of 8.3% in writing complex radiology reports, and competitive summarization ability with a nearly equivalent preference score to human experts. Our method demonstrates that effective training with diverse data can lead to more practical biomedical AI for improving diagnosis and workflow efficiency.
Updated: 2024-07-19 19:42:36
Categories: cs.CL,cs.AI
ImplicitAVE: An Open-Source Dataset and Multimodal LLMs Benchmark for Implicit Attribute Value Extraction
Existing datasets for attribute value extraction (AVE) predominantly focus on explicit attribute values while neglecting the implicit ones, lack product images, are often not publicly available, and lack an in-depth human inspection across diverse domains. To address these limitations, we present ImplicitAVE, the first, publicly available multimodal dataset for implicit attribute value extraction. ImplicitAVE, sourced from the MAVE dataset, is carefully curated and expanded to include implicit AVE and multimodality, resulting in a refined dataset of 68k training and 1.6k testing data across five domains. We also explore the application of multimodal large language models (MLLMs) to implicit AVE, establishing a comprehensive benchmark for MLLMs on the ImplicitAVE dataset. Six recent MLLMs with eleven variants are evaluated across diverse settings, revealing that implicit value extraction remains a challenging task for MLLMs. The contributions of this work include the development and release of ImplicitAVE, and the exploration and benchmarking of various MLLMs for implicit AVE, providing valuable insights and potential future research directions. Dataset and code are available at https://github.com/HenryPengZou/ImplicitAVE
Updated: 2024-07-19 19:36:18
Categories: cs.CV,cs.AI,cs.CL,cs.IR,cs.LG
Differential Privacy with Multiple Selections
We consider the setting where a user with sensitive features wishes to obtain a recommendation from a server in a differentially private fashion. We propose a "multi-selection" architecture where the server can send back multiple recommendations and the user chooses one from these that matches best with their private features. When the user feature is one-dimensional -- on an infinite line -- and the accuracy measure is defined w.r.t. some increasing function $\mathfrak{h}(.)$ of the distance on the line, we precisely characterize the optimal mechanism that satisfies differential privacy. The specification of the optimal mechanism includes both the distribution of the noise that the user adds to its private value and the algorithm used by the server to determine the set of results to send back as a response; we further show that Laplace is an optimal noise distribution. We also show that this optimal mechanism results in an error that is inversely proportional to the number of results returned when the function $\mathfrak{h}(.)$ is the identity function.
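In code, the one-dimensional mechanism reads roughly as follows; the nearest-k response rule is a simplification of the server algorithm the paper actually characterizes, and the names are assumptions.

```python
import numpy as np

def multi_selection_round(user_value, catalog, k, noise_scale, rng):
    """Multi-selection sketch: the user releases a Laplace-noised version of a
    1-D private feature (the DP step); the server answers with k candidates
    near the noisy value; the user privately keeps the best of the k."""
    noisy = user_value + rng.laplace(scale=noise_scale)
    candidates = sorted(catalog, key=lambda v: abs(v - noisy))[:k]
    return min(candidates, key=lambda v: abs(v - user_value))

rng = np.random.default_rng(0)
print(multi_selection_round(0.37, np.linspace(0, 1, 101), k=5,
                            noise_scale=0.2, rng=rng))
```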
Updated: 2024-07-19 19:34:51
Categories: cs.DS,cs.CR
CVE-LLM : Automatic vulnerability evaluation in medical device industry using large language models
The healthcare industry is currently experiencing an unprecedented wave of cybersecurity attacks, impacting millions of individuals. With the discovery of thousands of vulnerabilities each month, there is a pressing need to drive the automation of vulnerability assessment processes for medical devices, facilitating rapid mitigation efforts. Generative AI systems have revolutionized various industries, offering unparalleled opportunities for automation and increased efficiency. This paper presents a solution leveraging Large Language Models (LLMs) to learn from historical evaluations of vulnerabilities for the automatic assessment of vulnerabilities in the medical devices industry. This approach is applied within the portfolio of a single manufacturer, taking into account device characteristics, including existing security posture and controls. The primary contributions of this paper are threefold. Firstly, it provides a detailed examination of the best practices for training a vulnerability Language Model (LM) in an industrial context. Secondly, it presents a comprehensive comparison and insightful analysis of the effectiveness of Language Models in vulnerability assessment. Finally, it proposes a new human-in-the-loop framework to expedite vulnerability evaluation processes.
Updated: 2024-07-19 19:34:17
Categories: cs.CL,cs.AI,cs.CR
Correcting the Mythos of KL-Regularization: Direct Alignment without Overoptimization via Chi-Squared Preference Optimization
Language model alignment methods, such as reinforcement learning from human feedback (RLHF), have led to impressive advances in language model capabilities, but existing techniques are limited by a widely observed phenomenon known as overoptimization, where the quality of the language model plateaus or degrades over the course of the alignment process. Overoptimization is often attributed to overfitting to an inaccurate reward model, and while it can be mitigated through online data collection, this is infeasible in many settings. This raises a fundamental question: Do existing offline alignment algorithms make the most of the data they have, or can their sample-efficiency be improved further? We address this question with a new algorithm for offline alignment, $\chi^2$-Preference Optimization ($\chi$PO). $\chi$PO is a one-line change to Direct Preference Optimization (DPO; Rafailov et al., 2023), which only involves modifying the logarithmic link function in the DPO objective. Despite this minimal change, $\chi$PO implicitly implements the principle of pessimism in the face of uncertainty via regularization with the $\chi^2$-divergence -- which quantifies uncertainty more effectively than KL-regularization -- and provably alleviates overoptimization, achieving sample-complexity guarantees based on single-policy concentrability -- the gold standard in offline reinforcement learning. $\chi$PO's simplicity and strong guarantees make it the first practical and general-purpose offline alignment algorithm that is provably robust to overoptimization.
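To make the "one-line change" concrete, here is the standard DPO objective with a switch for the modified link; we read the paper's change as $\phi(z) = z + \log z$ (mixing a $\chi^2$ term into the KL link), and that reading, like every name below, should be treated as an assumption rather than a verified reproduction.

```python
import torch
import torch.nn.functional as F

def preference_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1, chi_po=False):
    """DPO loss on (winner, loser) sequence log-probs under policy and reference.
    chi_po=True swaps the log link for phi(z) = z + log z (assumed chiPO form)."""
    log_ratio_w = logp_w - ref_logp_w   # log of z_w = pi(y_w) / pi_ref(y_w)
    log_ratio_l = logp_l - ref_logp_l
    if chi_po:
        link_w = torch.exp(log_ratio_w) + log_ratio_w  # phi(z) = z + log z
        link_l = torch.exp(log_ratio_l) + log_ratio_l
    else:
        link_w, link_l = log_ratio_w, log_ratio_l      # phi(z) = log z (DPO)
    return -F.logsigmoid(beta * (link_w - link_l)).mean()
```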
Updated: 2024-07-19 19:29:49
Categories: cs.AI,cs.CL,cs.LG
Two new feature selection methods based on learn-heuristic techniques for breast cancer prediction: A comprehensive analysis
Breast cancer is not preventable because its causes are unknown. However, early diagnosis increases patients' recovery chances. Machine learning (ML) can be utilized to improve treatment outcomes in healthcare operations while diminishing costs and time. In this research, we suggest two novel feature selection (FS) methods based upon an imperialist competitive algorithm (ICA) and a bat algorithm (BA), and their combination with ML algorithms. This study aims to enhance diagnostic models' efficiency and present a comprehensive analysis to help clinical physicians make much more precise and reliable decisions than before. K-nearest neighbors, support vector machine, decision tree, Naive Bayes, AdaBoost, linear discriminant analysis, random forest, logistic regression, and artificial neural network are some of the methods employed. This paper applies a distinctive integration of evaluation measures and ML algorithms, using wrapper feature selection based on the ICA (WFSIC) and the BA (WFSB) separately. We compared the two proposed approaches in terms of classifier performance, and we compared our best diagnostic model with previous works reported in the literature survey. Experiments were performed on the Wisconsin diagnostic breast cancer dataset. Results reveal that the proposed framework using the BA, with an accuracy of 99.12%, surpasses the framework using the ICA as well as most previous works. Additionally, the RF classifier within the BA-based FS approach emerges as the best model, outperforming the others on the evaluation criteria. The results also illustrate the role of our techniques in reducing the dataset dimensionality by up to 90% and raising the performance of diagnostic models to over 99%. Moreover, the results indicate that, beyond the optimal subsets obtained by the proposed FS approaches, additional features selected by most ML models remain critical.
Updated: 2024-07-19 19:07:53
Categories: cs.LG,cs.AI,cs.NE,stat.ML
Advancing Melanoma Diagnosis with Self-Supervised Neural Networks: Evaluating the Effectiveness of Different Techniques
We investigate the potential of self-supervision in improving the accuracy of deep learning models trained to classify melanoma patches. Various self-supervision techniques such as rotation prediction, missing patch prediction, and corruption removal were implemented and assessed for their impact on the convolutional neural network's performance. Preliminary results suggest a positive influence of self-supervision methods on the model's accuracy. The study notably demonstrates the efficacy of the corruption removal method in enhancing model performance. Despite observable improvements, we conclude that the self-supervised models have considerable potential for further enhancement, achievable through training over more epochs or expanding the dataset. We suggest exploring other self-supervision methods like Bootstrap Your Own Latent (BYOL) and contrastive learning in future research, emphasizing the cost-benefit trade-off due to their resource-intensive nature. The findings underline the promise of self-supervision in augmenting melanoma detection capabilities of deep learning models.
Updated: 2024-07-19 18:57:26
Categories: cs.CV,cs.LG
Reliable Reasoning Beyond Natural Language
Despite their linguistic competence, large language models (LLMs) often exhibit limitations in their ability to reason reliably and flexibly. To address this, we propose a neurosymbolic approach that prompts LLMs to extract and encode all relevant information from a problem statement as logical code statements, and then use a logic programming language (Prolog) to conduct the iterative computations of explicit deductive reasoning. Our approach significantly enhances the performance of LLMs on the standard mathematical reasoning benchmark, GSM8k, and the Navigate dataset from the BIG-bench dataset. Additionally, we introduce a novel dataset, the Non-Linear Reasoning (NLR) dataset, consisting of 55 unique word problems that target the shortcomings of the next token prediction paradigm of LLMs and require complex non-linear reasoning but only basic arithmetic skills to solve. Our findings demonstrate that the integration of Prolog enables LLMs to achieve high performance on the NLR dataset, which even the most advanced language models (including GPT4) fail to solve using text only.
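The abstract does not name a specific Prolog bridge, but a tiny sketch with pyswip (our assumed choice) shows the division of labor: clauses of this shape would come from the LLM, and the deduction is done by the solver.

```python
from pyswip import Prolog  # assumes SWI-Prolog and pyswip are installed

prolog = Prolog()
# In the paper's pipeline, facts/rules like these would be emitted by the LLM
# from the problem statement rather than written by hand.
prolog.assertz("apples(alice, 5)")
prolog.assertz("apples(bob, 3)")
prolog.assertz("total(T) :- apples(alice, A), apples(bob, B), T is A + B")

for solution in prolog.query("total(T)"):
    print(solution["T"])  # -> 8, via explicit deduction, not token prediction
```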
Updated: 2024-07-19 18:54:02
Categories: cs.CL,cs.AI
BOND: Aligning LLMs with Best-of-N Distillation
Reinforcement learning from human feedback (RLHF) is a key driver of quality and safety in state-of-the-art large language models. Yet, a surprisingly simple and strong inference-time strategy is Best-of-N sampling that selects the best generation among N candidates. In this paper, we propose Best-of-N Distillation (BOND), a novel RLHF algorithm that seeks to emulate Best-of-N but without its significant computational overhead at inference time. Specifically, BOND is a distribution matching algorithm that forces the distribution of generations from the policy to get closer to the Best-of-N distribution. We use the Jeffreys divergence (a linear combination of forward and backward KL) to balance between mode-covering and mode-seeking behavior, and derive an iterative formulation that utilizes a moving anchor for efficiency. We demonstrate the effectiveness of our approach and several design choices through experiments on abstractive summarization and Gemma models. Aligning Gemma policies with BOND outperforms other RLHF algorithms by improving results on several benchmarks.
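For reference, a Jeffreys divergence interpolating the two KL directions can be written as $J_\beta(\pi \,\|\, \pi_{\mathrm{BoN}}) = (1-\beta)\,\mathrm{KL}(\pi_{\mathrm{BoN}} \,\|\, \pi) + \beta\,\mathrm{KL}(\pi \,\|\, \pi_{\mathrm{BoN}})$ with $\beta \in [0,1]$; the weighting convention and the choice of arguments here are our assumptions, since the abstract only states that a linear combination of forward and backward KL is used.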
Updated: 2024-07-19 18:38:25
Categories: cs.LG,cs.AI,cs.CL
What Matters in Transformers? Not All Attention is Needed
Scaling Transformer-based large language models (LLMs) has demonstrated promising performance across various tasks. However, it also introduces redundant structures, posing challenges for real-world deployment. Despite some recognition of redundancy in LLMs, the variability of redundancy across different modules, such as MLP and Attention layers, is under-explored. In this work, we investigate the varying redundancy across different modules within Transformers, including Blocks, MLP, and Attention layers, using a similarity-based metric. This metric operates on the premise that redundant structures produce outputs highly similar to their inputs. Surprisingly, while attention layers are essential for transformers and distinguish them from other mainstream architectures, we found that a large proportion of attention layers exhibit excessively high similarity and can be safely pruned without degrading performance, leading to reduced memory and computation costs. Additionally, we further propose a method that jointly drops Attention and MLP layers, achieving improved performance and dropping ratios. Extensive experiments demonstrate the effectiveness of our methods, e.g., Llama-3-70B maintains comparable performance even after pruning half of the attention layers. Our findings provide valuable insights for future network architecture design. The code is released at: \url{https://github.com/Shwai-He/LLM-Drop}.
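A minimal version of the similarity metric is easy to state in code; the threshold and names are illustrative assumptions, not the paper's calibrated values.

```python
import torch

@torch.no_grad()
def layer_similarity(hidden_in: torch.Tensor, hidden_out: torch.Tensor) -> float:
    """Mean cosine similarity between a layer's input and output hidden states
    ([batch, seq, dim]); a value near 1 means the layer barely transforms its
    input, marking it as redundant."""
    sims = torch.nn.functional.cosine_similarity(
        hidden_in.flatten(0, 1), hidden_out.flatten(0, 1), dim=-1
    )
    return sims.mean().item()

# Pruning sketch: drop the attention layers whose similarity exceeds a threshold,
# e.g. drop = [i for i, s in enumerate(per_layer_scores) if s > 0.98]
```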
Updated: 2024-07-19 18:31:44
Domains: cs.LG,cs.AI,cs.CL
SOREL: A Stochastic Algorithm for Spectral Risks Minimization
The spectral risk has wide applications in machine learning, especially in real-world decision-making, where people are concerned with more than models' average performance. By assigning different weights to the losses of different sample points, rather than the uniform weights used in the empirical risk, it allows the model's performance to lie between the average performance and the worst-case performance. In this paper, we propose SOREL, the first stochastic gradient-based algorithm with convergence guarantees for spectral risk minimization. Previous algorithms often add a strongly concave function to smooth the spectral risk, and thus lack convergence guarantees for the original spectral risk. We theoretically prove that our algorithm achieves a near-optimal rate of $\widetilde{O}(1/\sqrt{\epsilon})$ in terms of $\epsilon$. Experiments on real datasets show that our algorithm outperforms existing algorithms in most cases, both in terms of runtime and sample complexity.
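For reference, the spectral risk weights the order statistics of the losses; a standard form (notation assumed, not necessarily the paper's):

$$
\mathcal{R}_\sigma(\theta) \;=\; \sum_{i=1}^{n} \sigma_i\, \ell_{(i)}(\theta), \qquad \ell_{(1)} \le \cdots \le \ell_{(n)}, \quad 0 \le \sigma_1 \le \cdots \le \sigma_n, \quad \sum_{i=1}^{n} \sigma_i = 1.
$$

Uniform weights $\sigma_i = 1/n$ recover the empirical risk, while $\sigma_n = 1$ recovers the worst-case loss; intermediate spectra interpolate between the two, which is the regime SOREL targets.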
Updated: 2024-07-19 18:20:53
Domains: math.OC,cs.LG
Evaluating language models as risk scores
Current question-answering benchmarks predominantly focus on accuracy in realizable prediction tasks. Conditioned on a question and answer-key, does the most likely token match the ground truth? Such benchmarks necessarily fail to evaluate language models' ability to quantify outcome uncertainty. In this work, we focus on the use of language models as risk scores for unrealizable prediction tasks. We introduce folktexts, a software package to systematically generate risk scores using large language models, and evaluate them against benchmark prediction tasks. Specifically, the package derives natural language tasks from US Census data products, inspired by popular tabular data benchmarks. A flexible API allows for any task to be constructed out of 28 census features whose values are mapped to prompt-completion pairs. We demonstrate the utility of folktexts through a sweep of empirical insights on 16 recent large language models, inspecting risk scores, calibration curves, and diverse evaluation metrics. We find that zero-shot risk scores have high predictive signal while being widely miscalibrated: base models overestimate outcome uncertainty, while instruction-tuned models underestimate uncertainty and generate over-confident risk scores.
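Schematically, using an LM as a risk score amounts to reading outcome probabilities off the next-token distribution; a minimal, hedged sketch (the actual folktexts API differs; `next_token_probs` is a hypothetical callable):

```python
def lm_risk_score(next_token_probs, prompt: str,
                  yes_token: str = " Yes", no_token: str = " No") -> float:
    """Risk score = normalized probability of the positive completion.

    next_token_probs(prompt) is assumed to return a dict mapping candidate
    next tokens to their probabilities under the language model."""
    p = next_token_probs(prompt)
    p_yes, p_no = p.get(yes_token, 0.0), p.get(no_token, 0.0)
    return p_yes / (p_yes + p_no) if (p_yes + p_no) > 0 else 0.5

# Calibration is then checked by binning scores and comparing each bin's
# mean score against the empirical outcome frequency (a reliability curve).
```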
Updated: 2024-07-19 18:13:37
Domains: cs.LG,cs.CL
ESCAPE: Energy-based Selective Adaptive Correction for Out-of-distribution 3D Human Pose Estimation
Despite recent advances in human pose estimation (HPE), poor generalization to out-of-distribution (OOD) data remains a difficult problem. While previous works have proposed Test-Time Adaptation (TTA) to bridge the train-test domain gap by refining network parameters at inference, the absence of ground-truth annotations makes it highly challenging, and existing methods typically increase inference times by one or more orders of magnitude. We observe that 1) not every test-time sample is OOD, and 2) HPE errors are significantly larger on distal keypoints (wrist, ankle). To this end, we propose ESCAPE: a lightweight correction and selective adaptation framework which applies a fast, forward-pass correction on most data while reserving costly TTA for OOD data. The free energy function is introduced to separate OOD samples from incoming data, and a correction network is trained to estimate the errors of pretrained backbone HPE predictions on the distal keypoints. For OOD samples, we propose a novel self-consistency adaptation loss to update the correction network by leveraging the constraining relationship between distal keypoints and proximal keypoints (shoulders, hips), via a second "reverse" network. ESCAPE improves the distal MPJPE of five popular HPE models by up to 7% on unseen data, achieves state-of-the-art results on two popular HPE benchmarks, and is significantly faster than existing adaptation methods.
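The gating logic can be pictured as follows; this is a schematic using the standard logsumexp energy score, with the threshold, correction network, and TTA routine as assumed callables rather than the authors' implementation:

```python
import torch

def free_energy(logits: torch.Tensor, T: float = 1.0) -> torch.Tensor:
    """E(x) = -T * logsumexp(logits / T); higher energy suggests an OOD input."""
    return -T * torch.logsumexp(logits / T, dim=-1)

def escape_route(logits, sample, threshold, fast_correct, slow_adapt):
    """Selective adaptation: cheap forward-pass correction for in-distribution
    samples, costly self-consistency TTA reserved for OOD samples."""
    if free_energy(logits).item() > threshold:
        return slow_adapt(sample)    # OOD path: update the correction network
    return fast_correct(sample)      # ID path: one forward pass
```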
Updated: 2024-07-19 18:01:26
Domains: cs.CV,cs.AI,I.2.6; I.2.10
DEPICT: Diffusion-Enabled Permutation Importance for Image Classification Tasks
We propose a permutation-based explanation method for image classifiers. Current image-model explanations like activation maps are limited to instance-based explanations in the pixel space, making it difficult to understand global model behavior. In contrast, permutation based explanations for tabular data classifiers measure feature importance by comparing model performance on data before and after permuting a feature. We propose an explanation method for image-based models that permutes interpretable concepts across dataset images. Given a dataset of images labeled with specific concepts like captions, we permute a concept across examples in the text space and then generate images via a text-conditioned diffusion model. Feature importance is then reflected by the change in model performance relative to unpermuted data. When applied to a set of concepts, the method generates a ranking of feature importance. We show this approach recovers underlying model feature importance on synthetic and real-world image classification tasks.
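A minimal sketch of the permutation step under stated assumptions (`generate_image` stands in for a text-conditioned diffusion model and `score` for the classifier metric; neither is the authors' code):

```python
import random

def concept_importance(captions, values, generate_image, score):
    """Permutation importance of one concept: shuffle its values across
    captions, regenerate images, and measure the performance drop."""
    baseline = score([generate_image(c) for c in captions])
    shuffled = values[:]
    random.shuffle(shuffled)                      # permute the concept
    permuted = [c.replace(v, w) for c, v, w in zip(captions, values, shuffled)]
    return baseline - score([generate_image(c) for c in permuted])

# Repeating this over a set of concepts yields the importance ranking
# described above; a large drop marks a concept the model relies on.
```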
Updated: 2024-07-19 17:59:38
Domains: cs.CV,cs.AI
On Pre-training of Multimodal Language Models Customized for Chart Understanding
Recent studies customizing Multimodal Large Language Models (MLLMs) for domain-specific tasks have yielded promising results, especially in the field of scientific chart comprehension. These studies generally utilize visual instruction tuning with specialized datasets to enhance question and answer (QA) accuracy within the chart domain. However, they often neglect the fundamental discrepancy between natural image-caption pre-training data and digital chart image-QA data, particularly in the models' capacity to extract underlying numeric values from charts. This paper tackles this oversight by exploring the training processes necessary to improve MLLMs' comprehension of charts. We present three key findings: (1) Incorporating raw data values in alignment pre-training markedly improves comprehension of chart data. (2) Randomly replacing images with their textual representations during end-to-end fine-tuning transfers the language reasoning capability to chart interpretation skills. (3) Requiring the model to first extract the underlying chart data and then answer the question during fine-tuning can further improve accuracy. Consequently, we introduce CHOPINLLM, an MLLM tailored for in-depth chart comprehension. CHOPINLLM effectively interprets various types of charts, including unannotated ones, while maintaining robust reasoning abilities. Furthermore, we establish a new benchmark to evaluate MLLMs' understanding of different chart types across various comprehension levels. Experimental results show that CHOPINLLM exhibits strong performance in understanding both annotated and unannotated charts across a wide range of types.
Updated: 2024-07-19 17:58:36
Domains: cs.CV,cs.AI,cs.CL
Nonlinear Schrödinger Network
Deep neural networks (DNNs) have achieved exceptional performance across various fields by learning complex nonlinear mappings from large-scale datasets. However, they encounter challenges such as high computational costs and limited interpretability. To address these issues, hybrid approaches that integrate physics with AI are gaining interest. This paper introduces a novel physics-based AI model called the "Nonlinear Schrödinger Network", which treats the Nonlinear Schrödinger Equation (NLSE) as a general-purpose trainable model for learning complex patterns including nonlinear mappings and memory effects from data. Existing physics-informed machine learning methods use neural networks to approximate the solutions of partial differential equations (PDEs). In contrast, our approach directly treats the PDE as a trainable model to obtain general nonlinear mappings that would otherwise require neural networks. As a physics-inspired approach, it offers a more interpretable and parameter-efficient alternative to traditional black-box neural networks, achieving comparable or better accuracy in time series classification tasks while significantly reducing the number of required parameters. Notably, the trained Nonlinear Schrödinger Network is interpretable, with all parameters having physical meanings as properties of a virtual physical system that transforms the data to a more separable space. This interpretability allows for insight into the underlying dynamics of the data transformation process. Applications to time series forecasting have also been explored. While our current implementation utilizes the NLSE, the proposed method of using physics equations as trainable models to learn nonlinear mappings from data is not limited to the NLSE and may be extended to other master equations of physics.
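For orientation, one standard form of the NLSE that such a network can treat as a trainable layer, with the dispersion and nonlinearity coefficients $\beta_2$ and $\gamma$ as the learnable physical parameters (notation assumed; the paper's parameterization may differ):

$$
\frac{\partial A(z,t)}{\partial z} \;=\; -\,\frac{i\,\beta_2}{2}\,\frac{\partial^2 A}{\partial t^2} \;+\; i\,\gamma\,\lvert A\rvert^2 A.
$$

Propagating the data through several such segments (e.g., by split-step integration) and fitting $\beta_2$ and $\gamma$ by backpropagation yields a nonlinear feature map whose parameters retain physical meaning.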
Updated: 2024-07-19 17:58:00
Domains: cs.LG
Catastrophic Goodhart: regularizing RLHF with KL divergence does not mitigate heavy-tailed reward misspecification
When applying reinforcement learning from human feedback (RLHF), the reward is learned from data and, therefore, always has some error. It is common to mitigate this by regularizing the policy with KL divergence from a base model, with the hope that balancing reward with regularization will achieve desirable outcomes despite this reward misspecification. We show that when the reward function has light-tailed error, optimal policies under less restrictive KL penalties achieve arbitrarily high utility. However, if error is heavy-tailed, some policies obtain arbitrarily high reward despite achieving no more utility than the base model--a phenomenon we call catastrophic Goodhart. We adapt a discrete optimization method to measure the tails of reward models, finding that they are consistent with light-tailed error. However, the pervasiveness of heavy-tailed distributions in many real-world applications indicates that future sources of RL reward could have heavy-tailed error, increasing the likelihood of reward hacking even with KL regularization.
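The setting can be summarized by the standard KL-regularized objective, with the learned reward decomposed into the true reward plus an error term (notation assumed):

$$
\pi^\star \;=\; \arg\max_{\pi}\; \mathbb{E}_{x \sim \pi}\big[\, r(x) + \varepsilon(x) \,\big] \;-\; \beta\,\mathrm{KL}\big(\pi \,\|\, \pi_0\big).
$$

When $\varepsilon$ is light-tailed, weakening the penalty $\beta$ buys genuine utility; when $\varepsilon$ is heavy-tailed, $\pi^\star$ can harvest arbitrarily large $\varepsilon(x)$ at bounded KL cost while the true $\mathbb{E}[r]$ stays at the base model's level; this is the catastrophic Goodhart regime.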
Updated: 2024-07-19 17:57:59
Domains: cs.LG
Indoor Air Quality Dataset with Activities of Daily Living in Low to Middle-income Communities
In recent years, indoor air pollution has posed a significant threat to our society, claiming over 3.2 million lives annually. Developing nations, such as India, are most affected, since lack of knowledge, inadequate regulation, and outdoor air pollution lead to severe daily exposure to pollutants. However, only a limited number of studies have attempted to understand how indoor air pollution affects developing countries like India. To address this gap, we present spatiotemporal measurements of air quality from 30 indoor sites over six months during summer and winter seasons. The sites are spread across four regions spanning rural, suburban, and urban settings, covering the typical low to middle-income population in India. The dataset contains various types of indoor environments (e.g., studio apartments, classrooms, research laboratories, food canteens, and residential households), and can provide the basis for data-driven learning model research aimed at coping with unique pollution patterns in developing countries. This unique dataset demands advanced data cleaning and imputation techniques for handling missing data due to power failure or network outages during data collection. Furthermore, through a simple speech-to-text application, we provide real-time indoor activity labels annotated by occupants. Therefore, environmentalists and ML enthusiasts can utilize this dataset to understand the complex patterns of the pollutants under different indoor activities, identify recurring sources of pollution, forecast exposure, improve floor plans and room structures of modern indoor designs, develop pollution-aware recommender systems, etc.
Updated: 2024-07-19 17:53:21
Domains: cs.LG
Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery
Concept Bottleneck Models (CBMs) have recently been proposed to address the 'black-box' problem of deep neural networks, by first mapping images to a human-understandable concept space and then linearly combining concepts for classification. Such models typically require first coming up with a set of concepts relevant to the task and then aligning the representations of a feature extractor to map to these concepts. However, even with powerful foundational feature extractors like CLIP, there are no guarantees that the specified concepts are detectable. In this work, we leverage recent advances in mechanistic interpretability and propose a novel CBM approach -- called Discover-then-Name-CBM (DN-CBM) -- that inverts the typical paradigm: instead of pre-selecting concepts based on the downstream classification task, we use sparse autoencoders to first discover concepts learnt by the model, and then name them and train linear probes for classification. Our concept extraction strategy is efficient, since it is agnostic to the downstream task, and uses concepts already known to the model. We perform a comprehensive evaluation across multiple datasets and CLIP architectures and show that our method yields semantically meaningful concepts, assigns appropriate names to them that make them easy to interpret, and yields performant and interpretable CBMs. Code available at https://github.com/neuroexplicit-saar/discover-then-name.
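A compact sketch of the discover-then-name pipeline under stated assumptions (toy dimensions and a plain L1-sparse autoencoder; the released code may differ):

```python
import torch
import torch.nn as nn

class SAE(nn.Module):
    """Sparse autoencoder over frozen feature-extractor activations."""
    def __init__(self, d: int, k: int):
        super().__init__()
        self.enc, self.dec = nn.Linear(d, k), nn.Linear(k, d)

    def forward(self, x):
        z = torch.relu(self.enc(x))          # sparse concept activations
        return self.dec(z), z

feats = torch.randn(256, 512)                # stand-in for CLIP image features
sae = SAE(512, 2048)
recon, z = sae(feats)
loss = ((recon - feats) ** 2).mean() + 1e-3 * z.abs().mean()  # recon + sparsity
loss.backward()
# After training: name each learned concept (e.g., by its nearest text
# embedding), then fit a linear probe on z for the downstream classes,
# giving a CBM without pre-selecting the concept set.
```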
Updated: 2024-07-19 17:50:11
Domains: cs.CV,cs.AI,cs.LG
Learning Collective Variables with Synthetic Data Augmentation through Physics-Inspired Geodesic Interpolation
In molecular dynamics simulations, rare events, such as protein folding, are typically studied using enhanced sampling techniques, most of which are based on the definition of a collective variable (CV) along which acceleration occurs. Obtaining an expressive CV is crucial, but often hindered by the lack of information about the particular event, e.g., the transition from unfolded to folded conformation. We propose a simulation-free data augmentation strategy using physics-inspired metrics to generate geodesic interpolations resembling protein folding transitions, thereby improving sampling efficiency without true transition state samples. This new data can be used to improve the accuracy of classifier-based methods. Alternatively, a regression-based learning scheme for CV models can be adopted by leveraging the interpolation progress parameter.
Updated: 2024-07-19 17:48:10
Domains: physics.chem-ph,cs.LG,q-bio.BM
Superposition Prompting: Improving and Accelerating Retrieval-Augmented Generation
Despite the successes of large language models (LLMs), they exhibit significant drawbacks, particularly when processing long contexts. Their inference cost scales quadratically with respect to sequence length, making it expensive for deployment in some real-world text processing applications, such as retrieval-augmented generation (RAG). LLMs also exhibit the "distraction phenomenon", where irrelevant context in the prompt degrades output quality. To address these drawbacks, we propose a novel RAG prompting methodology, *superposition prompting*, which can be directly applied to pre-trained transformer-based LLMs *without the need for fine-tuning*. At a high level, superposition prompting allows the LLM to process input documents in parallel *prompt paths*, discarding paths once they are deemed irrelevant. We demonstrate the capability of our method to simultaneously enhance time efficiency across a variety of question-answering benchmarks using multiple pre-trained LLMs. Furthermore, our technique significantly improves accuracy when the retrieved context is large relative to the context the model was trained on. For example, our approach facilitates a 93x reduction in compute time while *improving* accuracy by 43% on the NaturalQuestions-Open dataset with the MPT-7B instruction-tuned model over naive RAG.
Updated: 2024-07-19 17:47:42
Domains: cs.CL,cs.AI,cs.LG
Conformal Thresholded Intervals for Efficient Regression
This paper introduces Conformal Thresholded Intervals (CTI), a novel conformal regression method that aims to produce the smallest possible prediction set with guaranteed coverage. Unlike existing methods that rely on nested conformal framework and full conditional distribution estimation, CTI estimates the conditional probability density for a new response to fall into each interquantile interval using off-the-shelf multi-output quantile regression. CTI constructs prediction sets by thresholding the estimated conditional interquantile intervals based on their length, which is inversely proportional to the estimated probability density. The threshold is determined using a calibration set to ensure marginal coverage. Experimental results demonstrate that CTI achieves optimal performance across various datasets.
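The thresholding step can be sketched as follows, assuming the quantile regressor outputs K+1 quantile levels per example (schematic, not the authors' implementation):

```python
import numpy as np

def cti(q_cal, y_cal, q_test, alpha=0.1):
    """Keep the interquantile intervals whose length (an inverse-density
    proxy) falls below a threshold calibrated for marginal coverage.

    q_cal, q_test: (n, K+1) arrays of estimated quantiles per example."""
    len_cal = np.diff(q_cal, axis=1)                         # (n_cal, K)
    # conformity score: length of the interval containing each label
    idx = np.clip([np.searchsorted(q, y) - 1 for q, y in zip(q_cal, y_cal)],
                  0, len_cal.shape[1] - 1)
    scores = len_cal[np.arange(len(y_cal)), idx]
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)     # finite-sample correction
    tau = np.quantile(scores, level)
    # prediction set = union of test intervals no longer than tau
    return [np.where(np.diff(q) <= tau)[0] for q in q_test]  # kept interval indices
```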
Updated: 2024-07-19 17:47:08
Domains: cs.LG,stat.ML
InterpBench: Semi-Synthetic Transformers for Evaluating Mechanistic Interpretability Techniques
Mechanistic interpretability methods aim to identify the algorithm a neural network implements, but it is difficult to validate such methods when the true algorithm is unknown. This work presents InterpBench, a collection of semi-synthetic yet realistic transformers with known circuits for evaluating these techniques. We train these neural networks using a stricter version of Interchange Intervention Training (IIT) which we call Strict IIT (SIIT). Like the original, SIIT trains neural networks by aligning their internal computation with a desired high-level causal model, but it also prevents non-circuit nodes from affecting the model's output. We evaluate SIIT on sparse transformers produced by the Tracr tool and find that SIIT models maintain Tracr's original circuit while being more realistic. SIIT can also train transformers with larger circuits, like Indirect Object Identification (IOI). Finally, we use our benchmark to evaluate existing circuit discovery techniques.
Updated: 2024-07-19 17:46:51
Domains: cs.LG
Explainable Post hoc Portfolio Management Financial Policy of a Deep Reinforcement Learning agent
Financial portfolio management investment policies computed quantitatively by modern portfolio theory techniques like the Markowitz model rely on a set of assumptions that are not supported by data in high-volatility markets. Hence, quantitative researchers are looking for alternative models to tackle this problem. Concretely, portfolio management is a problem that has recently been successfully addressed by Deep Reinforcement Learning (DRL) approaches. In particular, DRL algorithms train an agent by estimating the distribution of the expected reward of every action performed by an agent given any financial state in a simulator. However, these methods rely on Deep Neural Network models to represent such a distribution; although these are universal approximators, they cannot explain their own behaviour, which is governed by a set of parameters that are not interpretable. Critically, financial investors' policies require predictions to be interpretable, so DRL agents are not suited to follow a particular policy or explain their actions. In this work, we developed a novel Explainable Deep Reinforcement Learning (XDRL) approach for portfolio management, integrating Proximal Policy Optimization (PPO) with the model-agnostic explainability techniques of feature importance, SHAP, and LIME to enhance transparency at prediction time. By executing our methodology, we can interpret at prediction time the actions of the agent to assess whether they follow the requisites of an investment policy or to assess the risk of following the agent's suggestions. To the best of our knowledge, our proposed approach is the first explainable post hoc financial policy for DRL-based portfolio management. We empirically illustrate our methodology by successfully identifying key features influencing investment decisions, which demonstrates the ability to explain the agent's actions at prediction time.
Updated: 2024-07-19 17:40:39
Domains: cs.CE,cs.AI,q-fin.PM
Modeling Long Sequences in Bladder Cancer Recurrence: A Comparative Evaluation of LSTM, Transformer, and Mamba
Traditional survival analysis methods often struggle with complex time-dependent data, failing to capture and interpret dynamic characteristics adequately. This study aims to evaluate the performance of three long-sequence models, LSTM, Transformer, and Mamba, in analyzing recurrence event data and integrating them with the Cox proportional hazards model. This study integrates the advantages of deep learning models for handling long-sequence data with the Cox proportional hazards model to enhance performance in analyzing recurrent events with dynamic time information. Additionally, this study compares the ability of different models to extract and utilize features from time-dependent clinical recurrence data. The LSTM-Cox model outperformed both the Transformer-Cox and Mamba-Cox models in prediction accuracy and model fit, achieving a concordance index of up to 0.90 on the test set. Significant predictors of bladder cancer recurrence, such as treatment stop time, maximum tumor size at recurrence, and recurrence frequency, were identified. The LSTM-Cox model aligned well with clinical outcomes, effectively distinguishing between high-risk and low-risk patient groups. This study demonstrates that the LSTM-Cox model is a robust and efficient method for recurrent data analysis and feature extraction, surpassing newer models like Transformer and Mamba. It offers a practical approach for integrating deep learning technologies into clinical risk prediction systems, thereby improving patient management and treatment outcomes.
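The integration point is the Cox partial likelihood evaluated on sequence-model features; schematically, with $h_i$ the embedding the LSTM (or Transformer/Mamba) produces for patient $i$, $\delta_i$ the event indicator, and $R(t_i)$ the risk set at event time $t_i$ (notation assumed):

$$
\mathcal{L}(\beta) \;=\; \prod_{i:\,\delta_i = 1} \frac{\exp\!\big(\beta^\top h_i\big)}{\sum_{j \in R(t_i)} \exp\!\big(\beta^\top h_j\big)}.
$$

Maximizing $\log \mathcal{L}$ jointly trains the sequence encoder and the Cox head, and the concordance index reported above measures how well the resulting risk scores $\beta^\top h_i$ order the observed event times.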
Updated: 2024-07-19 17:38:12
Domains: cs.LG,stat.ME,stat.ML
ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities
In this work, we introduce ChatQA 2, a Llama3-based model designed to bridge the gap between open-access LLMs and leading proprietary models (e.g., GPT-4-Turbo) in long-context understanding and retrieval-augmented generation (RAG) capabilities. These two capabilities are essential for LLMs to process large volumes of information that cannot fit into a single prompt and are complementary to each other, depending on the downstream tasks and computational budgets. We present a detailed continued training recipe to extend the context window of Llama3-70B-base from 8K to 128K tokens, along with a three-stage instruction tuning process to enhance the model's instruction-following, RAG performance, and long-context understanding capabilities. Our results demonstrate that the Llama3-ChatQA-2-70B model achieves accuracy comparable to GPT-4-Turbo-2024-0409 on many long-context understanding tasks and surpasses it on the RAG benchmark. Interestingly, we find that the state-of-the-art long-context retriever can alleviate the top-k context fragmentation issue in RAG, further improving RAG-based results for long-context understanding tasks. We also provide extensive comparisons between RAG and long-context solutions using state-of-the-art long-context LLMs.
Updated: 2024-07-19 17:35:47
Domains: cs.CL,cs.AI,cs.IR,cs.LG
Democratizing LLMs for Low-Resource Languages by Leveraging their English Dominant Abilities with Linguistically-Diverse Prompts
Large language models (LLMs) are known to effectively perform tasks by simply observing few exemplars. However, in low-resource languages, obtaining such hand-picked exemplars can still be challenging, and unsupervised techniques may be necessary. Moreover, competent generative capabilities of LLMs are observed only in high-resource languages, while their performance on under-represented languages falls behind due to pre-training data imbalance. To elicit LLMs' abilities in low-resource languages without any supervised data, we propose to assemble synthetic exemplars from a diverse set of high-resource languages to prompt the LLMs to translate from any language into English. These prompts are then used to create intra-lingual exemplars to perform tasks in the target languages. Our unsupervised prompting method performs on par with supervised few-shot learning in LLMs of different sizes for translations between English and 13 Indic and 21 African low-resource languages. We also show that fine-tuning a 7B model on data generated from our method helps it perform competitively with a 175B model. In non-English translation tasks, our method even outperforms supervised prompting by up to 3 chrF++ in many low-resource languages. When evaluated on zero-shot multilingual summarization, our method surpasses other English-pivoting baselines by up to 4 ROUGE-L and is also favored by GPT-4.
Updated: 2024-07-19 17:31:58
Domains: cs.CL,cs.AI
Data-Centric Human Preference Optimization with Rationales
Reinforcement learning from human feedback plays a crucial role in aligning language models towards human preferences, traditionally represented through comparisons between pairs or sets of responses within a given context. While many studies have enhanced algorithmic techniques to optimize learning from such data, this work shifts focus to improving preference learning through a data-centric approach. Specifically, we propose enriching existing preference datasets with machine-generated rationales that explain the reasons behind choices. We develop a simple and principled framework to augment current preference learning methods with rationale information. Our comprehensive analysis highlights how rationales enhance learning efficiency. Extensive experiments reveal that rationale-enriched preference learning offers multiple advantages: it improves data efficiency, accelerates convergence to higher-performing models, and reduces verbosity bias and hallucination. Furthermore, this framework is versatile enough to integrate with various preference optimization algorithms. Overall, our findings highlight the potential of re-imagining data design for preference learning, demonstrating that even freely available machine-generated rationales can significantly boost performance across multiple dimensions. The code repository is available at https://github.com/reds-lab/preference-learning-with-rationales
Updated: 2024-07-19 17:27:52
Domains: cs.LG
Check-Eval: A Checklist-based Approach for Evaluating Text Quality
Evaluating the quality of text generated by large language models (LLMs) remains a significant challenge. Traditional metrics often fail to align well with human judgments, particularly in tasks requiring creativity and nuance. In this paper, we propose Check-Eval, a novel evaluation framework leveraging LLMs to assess the quality of generated text through a checklist-based approach. Check-Eval can be employed as both a reference-free and reference-dependent evaluation method, providing a structured and interpretable assessment of text quality. The framework consists of two main stages: checklist generation and checklist evaluation. We validate Check-Eval on two benchmark datasets: Portuguese Legal Semantic Textual Similarity and SummEval. Our results demonstrate that Check-Eval achieves higher correlations with human judgments compared to existing metrics, such as G-Eval and GPTScore, underscoring its potential as a more reliable and effective evaluation framework for natural language generation tasks. The code for our experiments is available at https://anonymous.4open.science/r/check-eval-0DB4.
Updated: 2024-07-19 17:14:16
Domains: cs.CL,cs.AI
Discovering environments with XRM
Environment annotations are essential for the success of many out-of-distribution (OOD) generalization methods. Unfortunately, these are costly to obtain and often limited by human annotators' biases. To achieve robust generalization, it is essential to develop algorithms for automatic environment discovery within datasets. Current proposals, which divide examples based on their training error, suffer from one fundamental problem. These methods introduce hyper-parameters and early-stopping criteria, which require a validation set with human-annotated environments, the very information subject to discovery. In this paper, we propose Cross-Risk-Minimization (XRM) to address this issue. XRM trains twin networks, each learning from one random half of the training data, while imitating confident held-out mistakes made by its sibling. XRM provides a recipe for hyper-parameter tuning, does not require early-stopping, and can discover environments for all training and validation data. Algorithms built on top of XRM environments achieve oracle worst-group-accuracy, addressing a long-standing challenge in OOD generalization. Code available at \url{https://github.com/facebookresearch/XRM}.
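A hedged sketch of the twin-network flipping rule (details such as the confidence gating are assumptions, not the released implementation):

```python
import torch

def xrm_flip(logits_a, logits_b, labels, in_half_a):
    """Each twin trains on one half of the data; an example's label is flipped
    to its sibling's *confident* held-out prediction when it disagrees, so
    shortcut-driven mistakes surface and define inferred environments."""
    held_out = torch.where(in_half_a.unsqueeze(-1), logits_b, logits_a)
    conf, pred = held_out.softmax(-1).max(-1)
    flip = (pred != labels) & (torch.rand_like(conf) < conf)  # confidence-gated
    return torch.where(flip, pred, labels), flip

# Toy shapes: logits (n, C), labels (n,), in_half_a boolean mask (n,).
# Flipped examples (flip == True) mark one environment and the rest mark
# the other, for all training and validation points, with no human labels.
```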
Updated: 2024-07-19 17:08:00
Domains: cs.LG,cs.AI,stat.ML
SurvReLU: Inherently Interpretable Survival Analysis via Deep ReLU Networks
Survival analysis models time-to-event distributions with censoring. Recently, deep survival models using neural networks have dominated due to their representational power and state-of-the-art performance. However, their "black-box" nature hinders interpretability, which is crucial in real-world applications. In contrast, "white-box" tree-based survival models offer better interpretability but struggle to converge to global optima due to greedy expansion. In this paper, we bridge the gap between previous deep survival models and traditional tree-based survival models through deep rectified linear unit (ReLU) networks. We show that a deliberately constructed deep ReLU network (SurvReLU) can harness the interpretability of tree-based structures with the representational power of deep survival models. Empirical studies on both simulated and real survival benchmark datasets show the effectiveness of the proposed SurvReLU in terms of performance and interpretability. The code is available at https://github.com/xs018/SurvReLU.
Updated: 2024-07-19 17:06:03
Domains: cs.LG
PolyFormer: Scalable Node-wise Filters via Polynomial Graph Transformer
Spectral Graph Neural Networks have demonstrated superior performance in graph representation learning. However, many current methods focus on employing shared polynomial coefficients for all nodes, i.e., learning node-unified filters, which limits the filters' flexibility for node-level tasks. The recent DSF attempts to overcome this limitation by learning node-wise coefficients based on positional encoding. However, the initialization and updating process of the positional encoding are burdensome, hindering scalability on large-scale graphs. In this work, we propose a scalable node-wise filter, PolyAttn. Leveraging the attention mechanism, PolyAttn can directly learn node-wise filters in an efficient manner, offering powerful representation capabilities. Building on PolyAttn, we introduce the whole model, named PolyFormer. Through the lens of Graph Transformer models, PolyFormer, which calculates attention scores within nodes, shows great scalability. Moreover, the model captures spectral information, enhancing expressiveness while maintaining efficiency. With these advantages, PolyFormer offers a desirable balance between scalability and expressiveness for node-level tasks. Extensive experiments demonstrate that our proposed methods excel at learning arbitrary node-wise filters, showing superior performance on both homophilic and heterophilic graphs, and handling graphs containing up to 100 million nodes. The code is available at https://github.com/air029/PolyFormer.
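The contrast with shared-coefficient filters can be written compactly; with a polynomial basis $T_k$ of the normalized graph Laplacian $\hat{\mathbf{L}}$, node-unified filtering versus the node-wise filtering PolyAttn targets (notation assumed, not necessarily the paper's):

$$
\mathbf{Z} \;=\; \sum_{k=0}^{K} \gamma_k\, T_k(\hat{\mathbf{L}})\,\mathbf{X} \qquad \text{vs.} \qquad \mathbf{z}_v \;=\; \sum_{k=0}^{K} \gamma_{v,k}\, \big(T_k(\hat{\mathbf{L}})\,\mathbf{X}\big)_v,
$$

where the node-wise coefficients $\gamma_{v,k}$ are produced by attention over each node's own sequence of polynomial terms, avoiding the positional-encoding bookkeeping that limits DSF's scalability.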
Updated: 2024-07-19 17:01:41
Domains: cs.LG
Regression prediction algorithm for energy consumption regression in cloud computing based on horned lizard algorithm optimised convolutional neural network-bidirectional gated recurrent unit
This paper presents a prediction study of cloud computing energy consumption, conducted by optimising a Convolutional Neural Network-Bidirectional Gated Recurrent Unit regression model with the horned lizard optimisation algorithm. First, through Spearman correlation analysis of CPU usage, memory usage, network traffic, power consumption, number of instructions executed, execution time, and energy efficiency, we found that power consumption has the strongest positive correlation with energy efficiency, while CPU usage has the strongest negative correlation with energy efficiency. In our experiments, we compared a random forest model against the model optimised with the horned lizard algorithm, and the results show that the optimised model produces better predictions. Specifically, the mean square error (MSE) of the optimised model is 0.01 smaller than that of the random forest model, and its mean absolute error (MAE) is likewise 0.01 smaller. Taken together, these metrics show that the optimised model predicts energy efficiency more accurately and reliably. These results provide new ideas and methods for improving the energy efficiency of cloud computing systems. This research not only expands the scope of application in the field of cloud computing, but also provides strong support for improving the energy-use efficiency of such systems.
Updated: 2024-07-19 16:19:14
Domains: cs.DC,cs.AI,cs.LG
SlowPerception: Physical-World Latency Attack against Visual Perception in Autonomous Driving
Autonomous Driving (AD) systems critically depend on visual perception for real-time object detection and multiple object tracking (MOT) to ensure safe driving. However, high latency in these visual perception components can lead to significant safety risks, such as vehicle collisions. While previous research has extensively explored latency attacks within the digital realm, translating these methods effectively to the physical world presents challenges. For instance, existing attacks rely on perturbations that are unrealistic or impractical for AD, such as adversarial perturbations affecting areas like the sky, or requiring large patches that obscure most of a camera's view, making them impossible to conduct effectively in the real world. In this paper, we introduce SlowPerception, the first physical-world latency attack against AD perception, via generating projector-based universal perturbations. SlowPerception strategically creates numerous phantom objects on various surfaces in the environment, significantly increasing the computational load of Non-Maximum Suppression (NMS) and MOT, thereby inducing substantial latency. Our SlowPerception achieves second-level latency in physical-world settings, with an average latency of 2.5 seconds across different AD perception systems, scenarios, and hardware configurations. This performance significantly outperforms existing state-of-the-art latency attacks. Additionally, we conduct AD system-level impact assessments, such as vehicle collisions, using industry-grade AD systems and production-grade AD simulators, observing a 97% average rate. We hope that our analyses can inspire further research in this critical domain, enhancing the robustness of AD systems against emerging vulnerabilities.
Updated: 2024-07-19 16:16:50
Domains: cs.CV,cs.CR
MaSS: Multi-attribute Selective Suppression for Utility-preserving Data Transformation from an Information-theoretic Perspective
The growing richness of large-scale datasets has been crucial in driving the rapid advancement and wide adoption of machine learning technologies. The massive collection and usage of data, however, pose an increasing risk for people's private and sensitive information due to either inadvertent mishandling or malicious exploitation. Besides legislative solutions, many technical approaches have been proposed towards data privacy protection. However, they bear various limitations such as leading to degraded data availability and utility, or relying on heuristics and lacking solid theoretical bases. To overcome these limitations, we propose a formal information-theoretic definition for this utility-preserving privacy protection problem, and design a data-driven learnable data transformation framework that is capable of selectively suppressing sensitive attributes from target datasets while preserving the other useful attributes, regardless of whether or not they are known in advance or explicitly annotated for preservation. We provide rigorous theoretical analyses on the operational bounds for our framework, and carry out comprehensive experimental evaluations using datasets of a variety of modalities, including facial images, voice audio clips, and human activity motion sensor signals. Results demonstrate the effectiveness and generalizability of our method under various configurations on a multitude of tasks. Our code is available at https://github.com/jpmorganchase/MaSS.
Updated: 2024-07-19 16:10:00
Domains: cs.LG
Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders
Sparse autoencoders (SAEs) are a promising unsupervised approach for identifying causally relevant and interpretable linear features in a language model's (LM) activations. To be useful for downstream tasks, SAEs need to decompose LM activations faithfully; yet to be interpretable the decomposition must be sparse -- two objectives that are in tension. In this paper, we introduce JumpReLU SAEs, which achieve state-of-the-art reconstruction fidelity at a given sparsity level on Gemma 2 9B activations, compared to other recent advances such as Gated and TopK SAEs. We also show that this improvement does not come at the cost of interpretability through manual and automated interpretability studies. JumpReLU SAEs are a simple modification of vanilla (ReLU) SAEs -- where we replace the ReLU with a discontinuous JumpReLU activation function -- and are similarly efficient to train and run. By utilising straight-through-estimators (STEs) in a principled manner, we show how it is possible to train JumpReLU SAEs effectively despite the discontinuous JumpReLU function introduced in the SAE's forward pass. Similarly, we use STEs to directly train L0 to be sparse, instead of training on proxies such as L1, avoiding problems like shrinkage.
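A minimal PyTorch sketch of the activation and its straight-through gradient (the bandwidth-based pseudo-derivative is an assumption, the paper's exact STE may differ, and the threshold is a scalar here for simplicity):

```python
import torch

class JumpReLUFn(torch.autograd.Function):
    """JumpReLU(z) = z * 1[z > theta]: identity above a learned threshold,
    hard zero below it. The jump makes theta's true gradient zero almost
    everywhere, so the backward pass uses a straight-through estimator."""
    @staticmethod
    def forward(ctx, z, theta, bandwidth):
        ctx.save_for_backward(z, theta)
        ctx.bw = bandwidth
        return z * (z > theta)

    @staticmethod
    def backward(ctx, g):
        z, theta = ctx.saved_tensors
        grad_z = g * (z > theta)                          # exact away from the jump
        near = ((z - theta).abs() < ctx.bw / 2).float()   # kernel around the jump
        grad_theta = -(g * z * near / ctx.bw).sum()       # STE signal for theta
        return grad_z, grad_theta, None

z = torch.randn(8, requires_grad=True)
theta = torch.tensor(0.5, requires_grad=True)
JumpReLUFn.apply(z, theta, 1e-1).sum().backward()         # theta now receives a gradient
```

The same trick lets the L0 sparsity penalty (a count of active units, itself discontinuous) be trained directly rather than through an L1 proxy, which is how the shrinkage problem is avoided.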
Updated: 2024-07-19 16:07:19
Domains: cs.LG
Bounding the Excess Risk for Linear Models Trained on Marginal-Preserving, Differentially-Private, Synthetic Data
The growing use of machine learning (ML) has raised concerns that an ML model may reveal private information about an individual who has contributed to the training dataset. To prevent leakage of sensitive data, we consider using differentially-private (DP), synthetic training data instead of real training data to train an ML model. A key desirable property of synthetic data is its ability to preserve the low-order marginals of the original distribution. Our main contribution comprises novel upper and lower bounds on the excess empirical risk of linear models trained on such synthetic data, for continuous and Lipschitz loss functions. We perform extensive experimentation alongside our theoretical results.
Updated: 2024-07-19 16:01:49
Domains: cs.LG,cs.CR
The Extrapolation Power of Implicit Models
In this paper, we investigate the extrapolation capabilities of implicit deep learning models in handling unobserved data, where traditional deep neural networks may falter. Implicit models, distinguished by their adaptability in layer depth and incorporation of feedback within their computational graph, are put to the test across various extrapolation scenarios: out-of-distribution, geographical, and temporal shifts. Our experiments consistently demonstrate significant performance advantage with implicit models. Unlike their non-implicit counterparts, which often rely on meticulous architectural design for each task, implicit models demonstrate the ability to learn complex model structures without the need for task-specific design, highlighting their robustness in handling unseen data.
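For readers unfamiliar with the class, an implicit model replaces a fixed stack of layers with an equilibrium condition; a standard formulation (not specific to this paper):

$$
z^\star \;=\; \phi\big(W z^\star + U x\big), \qquad \hat{y} \;=\; B z^\star,
$$

where the state $z^\star$ is found by iterating (or root-finding) the fixed-point equation at whatever effective depth the input demands. This is the feedback-in-the-computational-graph property that the extrapolation experiments probe under distribution shift.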
Updated: 2024-07-19 16:01:37
Domains: cs.LG,cs.AI
Automated Gateways: A Smart Contract-Powered Solution for Interoperability Across Blockchains
Interoperability is a significant challenge in blockchain technology, hindering seamless data and service sharing across diverse blockchain networks. This study introduces Automated Gateways as a novel framework leveraging smart contracts to facilitate interoperability. Unlike existing solutions, which often require adopting new technologies or relying on external services, Automated Gateways framework is integrated directly with a blockchain's core infrastructure to enhance systems with built-in interoperability features. By implementing fine-grained access control mechanisms, smart contracts within this framework manage accessibility and authorization for cross-chain interactions and facilitate streamlining the selective sharing of services between blockchains. Our evaluation demonstrates the framework's capability to handle cross-chain interactions efficiently, significantly reduce operational complexities, and uphold transactional integrity and security across different blockchain networks. With its focus on user-friendliness, self-managed permissions, and independence from external platforms, this framework is designed to achieve broader adoption within the blockchain community.
Updated: 2024-07-19 15:59:28
Domains: cs.DC,cs.CR
From Principles to Practices: Lessons Learned from Applying Partnership on AI's (PAI) Synthetic Media Framework to 11 Use Cases
2023 was the year the world woke up to generative AI, and 2024 is the year policymakers are responding more firmly. Importantly, this policy momentum is taking place alongside real world creation and distribution of synthetic media. Social media platforms, news organizations, dating apps, image generation companies, and more are already navigating a world of AI-generated visuals and sounds, already changing hearts and minds, as policymakers try to catch up. How, then, can AI governance capture the complexity of the synthetic media landscape? How can it attend to synthetic media's myriad uses, ranging from storytelling to privacy preservation, to deception, fraud, and defamation, taking into account the many stakeholders involved in its development, creation, and distribution? And what might it mean to govern synthetic media in a manner that upholds the truth while bolstering freedom of expression? What follows is the first known collection of diverse examples of the implementation of synthetic media governance that responds to these questions, specifically through Partnership on AI's (PAI) Responsible Practices for Synthetic Media - a voluntary, normative Framework for creating, distributing, and building technology for synthetic media responsibly, launched in February 2023. In this paper, we present a case bank of real world examples that help operationalize the Framework - highlighting areas synthetic media governance can be applied, augmented, expanded, and refined for use, in practice. Read together, the cases emphasize distinct elements of AI policymaking and seven emergent best practices supporting transparency, safety, expression, and digital dignity online: consent, disclosure, and differentiation between harmful and creative use cases.
Updated: 2024-07-19 15:57:35
标题: 从原则到实践:将人工智能合作伙伴关系(PAI)的合成媒体框架应用于11个使用案例的经验教训
摘要: 2023年是世界开始关注生成式人工智能的一年,2024年是决策者更加坚定地回应的一年。重要的是,这种政策势头正在与合成媒体的真实创作和分发同时发生。社交媒体平台、新闻机构、约会应用程序、图像生成公司等已经在处理由人工智能生成的视觉和声音的世界,已经改变了人们的心智,而政策制定者正在努力跟上步伐。那么,AI治理如何能够把握合成媒体领域的复杂性呢?它如何能够关注合成媒体的多种用途,从讲故事到保护隐私,再到欺骗、欺诈和诽谤,考虑到参与其开发、创作和分发的众多利益相关者?并且,如何才能在维护真相的同时增强言论自由来治理合成媒体?接下来是首个已知的多样化示例集,展示了对这些问题做出回应的合成媒体治理实施,特别是通过AI合作伙伴关系(PAI)的合成媒体负责实践框架 - 一个自愿的、规范的框架,用于负责地创建、分发和构建合成媒体技术,于2023年2月推出。在本文中,我们提供了一个真实世界示例库,帮助操作化这一框架 - 突出合成媒体治理可以应用、增强、扩展和完善的领域,以供实践使用。综合阅读这些案例,强调了人工智能政策制定的不同要素和支持透明、安全、表达和在线数字尊严的七大新兴最佳实践:同意、披露以及区分有害和创造性用例。
更新时间: 2024-07-19 15:57:35
领域: cs.CY,cs.AI
TTP-Based Cyber Resilience Index: A Probabilistic Quantitative Approach to Measure Defence Effectiveness Against Cyber Attacks
In the dynamic cyber threat landscape, effective decision-making under uncertainty is crucial for maintaining robust information security. This paper introduces the Cyber Resilience Index (CRI), a TTP-based probabilistic approach to quantifying an organisation's defence effectiveness against cyber-attacks (campaigns). Building upon the Threat-Intelligence Based Security Assessment (TIBSA) methodology, we present a mathematical model that translates complex threat intelligence into an actionable, unified metric, similar to a stock market index, that executives can understand and interact with and that teams can act upon. Our method leverages Partially Observable Markov Decision Processes (POMDPs) to simulate attacker behaviour considering real-world uncertainties and the latest threat actor tactics, techniques, and procedures (TTPs). This allows for dynamic, context-aware evaluation of an organisation's security posture, moving beyond static compliance-based assessments. As a result, decision-makers are equipped with a single metric of cyber resilience that bridges the gap between quantitative and qualitative assessments, enabling data-driven resource allocation and strategic planning. This can ultimately lead to more informed decision-making, mitigate under- or overspending, and assist in resource allocation.
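To make the POMDP machinery concrete, here is a small Python sketch of a belief update over hypothetical attacker stages. The transition and observation matrices, the actions, and the toy resilience score are all illustrative assumptions, not the paper's model.

```python
import numpy as np

# Hypothetical 3-state attacker model: recon -> lateral movement -> exfiltration.
# T[a][s, s'] is the transition probability under defensive action a;
# O[a][s', o] is the probability of observing telemetry signal o in state s'.
T = {"monitor": np.array([[0.6, 0.4, 0.0],
                          [0.0, 0.7, 0.3],
                          [0.0, 0.0, 1.0]]),
     "patch":   np.array([[0.9, 0.1, 0.0],
                          [0.0, 0.9, 0.1],
                          [0.0, 0.0, 1.0]])}
O = {"monitor": np.array([[0.8, 0.2],
                          [0.4, 0.6],
                          [0.1, 0.9]]),
     "patch":   np.array([[0.7, 0.3],
                          [0.5, 0.5],
                          [0.2, 0.8]])}

def belief_update(belief, action, obs):
    """Standard POMDP belief update: predict with T, correct with O, renormalise."""
    predicted = belief @ T[action]
    corrected = predicted * O[action][:, obs]
    return corrected / corrected.sum()

# One defensive step: we patched and then saw a high-severity alert (obs = 1).
belief = np.array([1.0, 0.0, 0.0])          # attacker assumed to start at recon
belief = belief_update(belief, "patch", 1)
cri = 1.0 - belief[2]                        # toy resilience score: P(not exfiltrated)
print(f"belief={belief.round(3)}, toy CRI={cri:.3f}")
```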
Updated: 2024-07-19 15:56:01
标题: 基于TTP的网络韧性指数:一种概率量化方法来衡量抵御网络攻击的有效性
摘要: 在动态的网络威胁环境中,有效地在不确定性下做出决策对于维护强大的信息安全至关重要。本文介绍了网络弹性指数(CRI),这是一种基于TTP的概率方法,用于量化组织对网络攻击(攻击活动)的防御效果。基于威胁情报安全评估(TIBSA)方法,我们提出了一个数学模型,将复杂的威胁情报转化为可操作的、类似于股票市场指数的统一指标,高管可以理解和互动,团队可以采取行动。我们的方法利用部分可观测马尔可夫决策过程(POMDPs)来模拟攻击者行为,考虑现实世界的不确定性和最新的威胁行为者战术、技术和程序(TTPs)。这允许对组织的安全姿态进行动态、上下文感知的评估,超越静态的基于合规性的评估。因此,决策者可以获得一种网络弹性的单一指标,弥合了定量和定性评估之间的差距,实现数据驱动的资源分配和战略规划。这最终可以带来更加明智的决策,减轻过度或不足的支出,并协助资源分配。
更新时间: 2024-07-19 15:56:01
领域: cs.CR
Efficient Neural Network Approaches for Conditional Optimal Transport with Applications in Bayesian Inference
We present two neural network approaches that approximate the solutions of static and dynamic conditional optimal transport (COT) problems. Both approaches enable conditional sampling and conditional density estimation, which are core tasks in Bayesian inference, particularly in the simulation-based ("likelihood-free") setting. Our methods represent the target conditional distributions as transformations of a tractable reference distribution and, therefore, fall into the framework of measure transport. Although many measure transport approaches model the transformation as COT maps, obtaining the map is computationally challenging, even in moderate dimensions. To improve scalability, our numerical algorithms use neural networks to parameterize COT maps and further exploit the structure of the COT problem. Our static approach approximates the map as the gradient of a partially input-convex neural network. It uses a novel numerical implementation to increase computational efficiency compared to state-of-the-art alternatives. Our dynamic approach approximates the conditional optimal transport via the flow map of a regularized neural ODE; compared to the static approach, it is slower to train but offers more modeling choices and can lead to faster sampling. We demonstrate both algorithms numerically, comparing them with competing state-of-the-art approaches, using benchmark datasets and simulation-based Bayesian inverse problems.
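As a rough illustration of the static approach, here is a PyTorch sketch of a partially input-convex network whose gradient with respect to x serves as the conditional transport map. The architecture is heavily simplified relative to the paper's; it only shows the convexity constraint and the gradient-map idea.

```python
import torch

class SimplePICNN(torch.nn.Module):
    """Minimal partially input-convex network: convex in x for each condition y."""
    def __init__(self, dx, dy, h=64):
        super().__init__()
        self.Wy = torch.nn.Linear(dy, h)             # unconstrained path for the condition
        self.Wx = torch.nn.Linear(dx, h)             # first x-layer may be unconstrained
        self.Wz = torch.nn.Linear(h, 1, bias=False)  # weights kept non-negative below
        self.act = torch.nn.Softplus()               # convex, non-decreasing activation

    def forward(self, x, y):
        z = self.act(self.Wx(x) + self.Wy(y))
        # Clamping enforces non-negative weights so the x -> output map stays convex.
        return torch.nn.functional.linear(z, self.Wz.weight.clamp(min=0.0))

def transport(net, x, y):
    """The conditional map is the gradient of the potential with respect to x."""
    x = x.clone().requires_grad_(True)
    potential = net(x, y).sum()
    return torch.autograd.grad(potential, x, create_graph=True)[0]

net = SimplePICNN(dx=2, dy=3)
x_ref = torch.randn(5, 2)    # samples from the tractable reference distribution
y_obs = torch.randn(5, 3)    # conditioning variables (e.g. observed data)
print(transport(net, x_ref, y_obs).shape)  # -> torch.Size([5, 2])
```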
Updated: 2024-07-19 15:55:46
标题: 高效的神经网络方法用于条件最优传输,在贝叶斯推断中的应用
摘要: 我们提出了两种神经网络方法,用于近似静态和动态条件最优输运(COT)问题的解决方案。这两种方法都可以进行条件抽样和条件密度估计,这是贝叶斯推断中的核心任务,特别是在基于模拟的(“无似然”)设置中。我们的方法将目标条件分布表示为可处理的参考分布的变换,因此属于测度输运框架。尽管许多测度输运方法将变换建模为COT映射,但即使在适度维度中,获得映射也具有计算挑战。为了提高可扩展性,我们的数值算法使用神经网络来参数化COT映射,并进一步利用COT问题的结构。我们的静态方法将映射近似为部分输入凸神经网络的梯度。它使用一种新颖的数值实现,与最先进的替代方法相比,可以提高计算效率。我们的动态方法通过正则化神经ODE的流图来近似条件最优输运;与静态方法相比,它训练速度较慢,但提供更多建模选择,并且可能导致更快的抽样。我们通过数值演示了这两种算法,使用基准数据集和基于模拟的贝叶斯逆问题,将它们与竞争的最先进方法进行了比较。
更新时间: 2024-07-19 15:55:46
领域: stat.ML,cs.LG,62F15, 62M45
Mixture of Experts with Mixture of Precisions for Tuning Quality of Service
The increasing demand for deploying large Mixture-of-Experts (MoE) models in resource-constrained environments necessitates efficient approaches to address their high memory and computational requirements. Moreover, given that tasks come with different user-defined constraints and the available resources change over time in multi-tenant environments, it is necessary to design an approach which provides a flexible configuration space. This paper presents an adaptive serving approach for the efficient deployment of MoE models, capitalizing on partial quantization of the experts. By dynamically determining the number of quantized experts and their distribution across CPU and GPU, our approach explores the Pareto frontier and offers a fine-grained range of configurations for tuning throughput and model quality. Our evaluation on an NVIDIA A100 GPU using a Mixtral 8x7B MoE model for three language modelling benchmarks demonstrates that the throughput of token generation can be adjusted from 0.63 to 13.00 tokens per second. This enhancement comes with a marginal perplexity increase of 2.62 to 2.80, 6.48 to 7.24, and 3.24 to 3.53 for the WikiText2, PTB, and C4 datasets respectively under maximum quantization. These results highlight the practical applicability of our approach in dynamic and accuracy-sensitive applications where both memory usage and output quality are important.
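A minimal Python sketch of the configuration sweep implied above: each choice of how many experts to quantize yields a (memory, quality-penalty) point on a Pareto-style frontier. The per-expert sizes and the linear quality penalty are hypothetical placeholders; in practice both are measured empirically.

```python
from dataclasses import dataclass

@dataclass
class MoEServingConfig:
    n_experts: int            # experts per MoE layer (8 for Mixtral 8x7B)
    n_quantized: int          # how many of them are served in low precision
    quantized_on_gpu: bool    # place quantized experts on GPU, the rest on CPU

    def memory_gb(self, fp16_gb=2.0, int4_gb=0.5):
        """Rough per-layer footprint under hypothetical per-expert sizes."""
        full = (self.n_experts - self.n_quantized) * fp16_gb
        return full + self.n_quantized * int4_gb

# Sweep the configuration space: each point trades memory (a throughput proxy)
# against expected quality loss from quantizing more experts.
frontier = []
for k in range(9):  # 0..8 quantized experts
    cfg = MoEServingConfig(n_experts=8, n_quantized=k, quantized_on_gpu=True)
    quality_penalty = 0.02 * k        # hypothetical, measured empirically in practice
    frontier.append((cfg.memory_gb(), quality_penalty, cfg))

for mem, penalty, cfg in frontier:
    print(f"quantized={cfg.n_quantized}  mem={mem:4.1f} GB  penalty~{penalty:.2f}")
```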
Updated: 2024-07-19 15:42:49
标题: 专家混合模型与精度混合模型在调整服务质量中的应用
摘要: 随着在资源受限环境中部署大型专家混合模型(MoE)的需求不断增加,需要高效的方法来解决其高内存和计算需求挑战。此外,考虑到任务具有不同的用户定义约束,并且在多租户环境中可用资源随时间变化,必须设计一种提供灵活配置空间的方法。本文提出了一种自适应服务方法,用于高效部署MoE模型,利用专家的部分量化。通过动态确定量化专家的数量及其在CPU和GPU上的分布,我们的方法探索帕累托前沿,并为调整吞吐量和模型质量提供了精细的配置范围。我们在NVIDIA A100 GPU上评估了一个Mixtral 8x7B MoE模型,用于三种语言建模基准测试,结果显示令牌生成的吞吐量可以从每秒0.63个调整到13.00个。在最大量化下,对于WikiText2、PTB和C4数据集,这种增强带来的困惑度增加分别为2.62至2.80、6.48至7.24和3.24至3.53。这些结果突显了我们方法在动态和对输出质量敏感的应用中的实际适用性,其中内存使用和输出质量都很重要。
更新时间: 2024-07-19 15:42:49
领域: cs.DC,cs.AI,cs.LG,cs.PF
System-1.x: Learning to Balance Fast and Slow Planning with Language Models
Language models can be used to solve long-horizon planning problems in two distinct modes: a fast 'System-1' mode, directly generating plans without any explicit search or backtracking, and a slow 'System-2' mode, planning step-by-step by explicitly searching over possible actions. While System-2 is typically more effective, it is also more computationally expensive, making it infeasible for long plans or large action spaces. Moreover, using System-1 or System-2 in isolation ignores the user's end goals and provides no way to control the model's behavior. To this end, we propose the System-1.x Planner, a controllable planning framework with LLMs that is capable of generating hybrid plans and balancing between the two planning modes based on the difficulty of the problem at hand. System-1.x consists of (i) a controller, (ii) a System-1 Planner, and (iii) a System-2 Planner. Based on a user-specified hybridization factor (x) governing the mixture between System-1 and 2, the controller decomposes a problem into sub-goals and classifies them as easy or hard, to be solved by System-1 or System-2, respectively. We fine-tune all three components on top of a single base LLM, requiring only search traces as supervision. Experiments with two diverse planning tasks -- Maze Navigation and Blocksworld -- show that our System-1.x Planner outperforms a System-1 Planner, a System-2 Planner trained to approximate A* search, and also a symbolic planner (A*). We demonstrate the following key properties of our planner: (1) controllability: increasing the hybridization factor (e.g., System-1.75 vs 1.5) performs more search, improving performance, (2) flexibility: by building a neuro-symbolic variant with a neural System-1 and a symbolic System-2, we can use existing symbolic methods, and (3) generalizability: by being able to learn from different search algorithms, our method is robust to the choice of search algorithm.
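For concreteness, here is a Python sketch of the routing logic a System-1.x controller might implement, with the hybridization factor x deciding what fraction of sub-goals goes to the slow planner. The decomposer, classifier, and planners below are stand-in callables, not the paper's fine-tuned LLMs.

```python
def system_1x_plan(problem, decompose, classify_hard, fast_planner, slow_planner, x=1.5):
    """Hybrid planning sketch: route sub-goals to a fast or a slow planner.

    `x` in [1, 2] is the hybridization factor: x=1 is pure System-1 (no search),
    x=2 is pure System-2; in between, the hardest fraction (x - 1) of sub-goals
    is given to the slow, search-based planner.
    """
    sub_goals = decompose(problem)
    ranked = sorted(sub_goals, key=classify_hard)          # easiest first
    n_slow = round((x - 1.0) * len(ranked))                # budget for System-2
    slow_set = set(id(g) for g in ranked[len(ranked) - n_slow:])
    plan = []
    for goal in sub_goals:                                 # keep original order
        planner = slow_planner if id(goal) in slow_set else fast_planner
        plan.extend(planner(goal))
    return plan

# Toy usage with stand-in planners (the paper fine-tunes LLMs for each role).
steps = system_1x_plan(
    problem=list(range(6)),
    decompose=lambda p: [[g] for g in p],
    classify_hard=lambda g: g[0],                # pretend higher ids are harder
    fast_planner=lambda g: [f"guess {g[0]}"],
    slow_planner=lambda g: [f"search {g[0]}"],
    x=1.5,
)
print(steps)
```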
Updated: 2024-07-19 15:40:59
标题: System-1.x:利用语言模型学习平衡快速和缓慢规划
摘要: 语言模型可以用于解决长期规划问题,有两种不同的模式:快速的“系统-1”模式,直接生成计划而无需任何显式搜索或回溯,以及慢速的“系统-2”模式,通过显式搜索可能的行动逐步规划。虽然系统-2通常更有效,但也更耗费计算资源,使其在长期计划或大型行动空间中不可行。此外,孤立的系统-1或2忽略了用户的最终目标,未能提供控制模型行为的方法。为此,我们提出了System-1.x Planner,这是一个具有LLMs的可控规划框架,能够生成混合计划并根据手头问题的难度在两种规划模式之间平衡。System-1.x包括(i)一个控制器,(ii)一个系统-1规划器和(iii)一个系统-2规划器。基于用户指定的混合化因子(x)控制系统-1和2之间的混合,控制器将问题分解为子目标,并将其分类为易解或难解,分别由系统-1或2解决。我们在单个基础LLM之上对这三个组件进行微调,只需要搜索迹作为监督。通过两个不同的规划任务——迷宫导航和Blocksworld的实验表明,我们的System-1.x Planner优于系统-1 Planner、训练以逼近A*搜索的系统-2 Planner以及符号规划器(A*)。我们展示了我们的规划器的以下关键特性:(1)可控性:增加混合化因子(例如,System-1.75 vs 1.5)进行更多搜索,提高性能;(2)灵活性:通过构建一个神经符号变体,具有神经系统-1和符号系统-2,我们可以使用现有的符号方法;(3)泛化性:通过能够从不同的搜索算法中学习,我们的方法对搜索算法的选择具有鲁棒性。
更新时间: 2024-07-19 15:40:59
领域: cs.AI,cs.CL,cs.LG
DEAL: Disentangle and Localize Concept-level Explanations for VLMs
Large pre-trained Vision-Language Models (VLMs) have become ubiquitous foundational components of other models and downstream tasks. Although such models are powerful, our empirical results reveal that they might not be able to identify fine-grained concepts. Specifically, the explanations of VLMs with respect to fine-grained concepts are entangled and mislocalized. To address this issue, we propose to DisEntAngle and Localize (DEAL) the concept-level explanations for VLMs without human annotations. The key idea is encouraging the concept-level explanations to be distinct while maintaining consistency with category-level explanations. We conduct extensive experiments and ablation studies on a wide range of benchmark datasets and vision-language models. Our empirical results demonstrate that the proposed method significantly improves the concept-level explanations of the model in terms of disentanglability and localizability. Surprisingly, the improved explainability alleviates the model's reliance on spurious correlations, which further benefits the prediction accuracy.
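A hedged PyTorch sketch of the two properties DEAL encourages: distinct concept-level maps that remain consistent with the category-level map. The specific loss terms below are illustrative, not the paper's exact objective.

```python
import torch

def deal_style_loss(concept_maps, category_map):
    """Illustrative loss combining the two properties DEAL encourages.

    concept_maps: (K, H, W) saliency maps, one per fine-grained concept.
    category_map: (H, W) saliency map for the whole category.
    """
    k = concept_maps.shape[0]
    flat = concept_maps.flatten(1)
    flat = flat / (flat.norm(dim=1, keepdim=True) + 1e-8)
    # Disentanglement: penalise overlap (cosine similarity) between concept maps.
    overlap = flat @ flat.T
    disentangle = (overlap - torch.eye(k)).clamp(min=0).sum() / (k * (k - 1))
    # Consistency: the concepts together should explain what the category map does.
    combined = concept_maps.sum(0)
    consistency = torch.nn.functional.mse_loss(combined, category_map)
    return disentangle + consistency

maps = torch.rand(4, 7, 7)          # e.g. 4 concepts ("beak", "wing", ...)
cat = torch.rand(7, 7)
print(deal_style_loss(maps, cat))
```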
Updated: 2024-07-19 15:39:19
标题: DEAL: 为VLMs分离和定位概念级解释
摘要: 大型预训练的视觉-语言模型(VLMs)已经成为其他模型和下游任务的普遍基础组件。尽管强大,但我们的实证结果显示,这种模型可能无法识别细粒度概念。具体来说,关于细粒度概念的VLMs的解释是混乱的并且错位的。为了解决这个问题,我们提出了一种不需要人工注释的DEAL(DisEntAngle and Localize)概念级解释的方法。关键思想是鼓励概念级解释保持独特性,同时与类别级解释保持一致。我们在各种基准数据集和视觉-语言模型上进行了大量实验和消融研究。我们的实证结果表明,所提出的方法在概念级解释的分离性和定位性方面显著改善了模型。令人惊讶的是,改进的可解释性减轻了模型对偶然相关性的依赖,进一步提高了预测准确性。
更新时间: 2024-07-19 15:39:19
领域: cs.CV,cs.AI,cs.LG
HiPPO-Prophecy: State-Space Models can Provably Learn Dynamical Systems in Context
This work explores the in-context learning capabilities of State Space Models (SSMs) and presents, to the best of our knowledge, the first theoretical explanation of a possible underlying mechanism. We introduce a novel weight construction for SSMs, enabling them to predict the next state of any dynamical system after observing previous states without parameter fine-tuning. This is accomplished by extending the HiPPO framework to demonstrate that continuous SSMs can approximate the derivative of any input signal. Specifically, we find an explicit weight construction for continuous SSMs and provide an asymptotic error bound on the derivative approximation. The discretization of this continuous SSM subsequently yields a discrete SSM that predicts the next state. Finally, we demonstrate the effectiveness of our parameterization empirically. This work should be an initial step toward understanding how sequence models based on SSMs learn in context.
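A minimal NumPy sketch of the mechanical pipeline: bilinear discretization of a continuous SSM followed by an Euler step using the derivative estimate. The state matrices below are random placeholders; the paper derives an explicit HiPPO-based construction so that the readout actually approximates the input's derivative.

```python
import numpy as np

def discretize_bilinear(A, B, dt):
    """Bilinear (Tustin) discretization of x' = A x + B u, standard for SSMs."""
    n = A.shape[0]
    inv = np.linalg.inv(np.eye(n) - (dt / 2) * A)
    return inv @ (np.eye(n) + (dt / 2) * A), inv @ B * dt

# Hypothetical small SSM; the paper derives specific (A, B, C) via HiPPO so that
# y_k approximates the derivative u'(t_k) of the input signal.
rng = np.random.default_rng(0)
n, dt = 16, 0.01
A = -np.eye(n) + 0.1 * rng.standard_normal((n, n))   # placeholder, not HiPPO
B = rng.standard_normal((n, 1))
C = rng.standard_normal((1, n))
Ad, Bd = discretize_bilinear(A, B, dt)

u = np.sin(2 * np.pi * np.arange(0, 1, dt))          # observed trajectory
x = np.zeros((n, 1))
for u_k in u:
    x = Ad @ x + Bd * u_k
deriv_estimate = (C @ x).item()                      # stands in for u'(t) here
next_state = u[-1] + dt * deriv_estimate             # Euler step: predict u(t + dt)
print(next_state)
```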
Updated: 2024-07-19 15:34:25
标题: HiPPO-Prophecy:状态空间模型可证明地在上下文中学习动态系统
摘要: 这项工作探讨了状态空间模型(SSMs)的上下文学习能力,并据我们所知,首次提出了可能的潜在机制的理论解释。我们引入了一种新颖的权重构造方法,使得SSMs能够在观察到先前状态后预测任何动态系统的下一个状态,而无需参数微调。通过将HiPPO框架扩展,我们证明连续的SSMs可以近似任何输入信号的导数。具体来说,我们找到了连续SSMs的显式权重构造,并提供了导数近似的渐近误差界限。对这种连续SSM的离散化随后产生了一个可以预测下一个状态的离散SSM。最后,我们通过经验证明了参数化的有效性。这项工作应该是理解基于SSMs的序列模型如何在上下文中学习的初步步骤。
更新时间: 2024-07-19 15:34:25
领域: cs.LG,stat.ML
The Vision of Autonomic Computing: Can LLMs Make It a Reality?
The Vision of Autonomic Computing (ACV), proposed over two decades ago, envisions computing systems that self-manage akin to biological organisms, adapting seamlessly to changing environments. Despite decades of research, achieving ACV remains challenging due to the dynamic and complex nature of modern computing systems. Recent advancements in Large Language Models (LLMs) offer promising solutions to these challenges by leveraging their extensive knowledge, language understanding, and task automation capabilities. This paper explores the feasibility of realizing ACV through an LLM-based multi-agent framework for microservice management. We introduce a five-level taxonomy for autonomous service maintenance and present an online evaluation benchmark based on the Sock Shop microservice demo project to assess our framework's performance. Our findings demonstrate significant progress towards achieving Level 3 autonomy, highlighting the effectiveness of LLMs in detecting and resolving issues within microservice architectures. This study contributes to advancing autonomic computing by pioneering the integration of LLMs into microservice management frameworks, paving the way for more adaptive and self-managing computing systems. The code will be made available at https://aka.ms/ACV-LLM.
Updated: 2024-07-19 15:30:32
标题: 自主计算的愿景:LLMs能使其变为现实吗?
摘要: 自动计算(ACV)的愿景,提出了二十多年前,设想计算系统自我管理,类似于生物有机体,能够无缝地适应不断变化的环境。尽管经过几十年的研究,实现ACV仍然具有挑战性,这是由于现代计算系统的动态和复杂性。最近大型语言模型(LLMs)的进步为这些挑战提供了有希望的解决方案,利用它们的广泛知识、语言理解和任务自动化能力。本文通过基于LLM的多代理框架探讨了实现ACV的可行性,用于微服务管理。我们引入了一个五级分类法,用于自主服务维护,并基于Sock Shop微服务演示项目提出了一个在线评估基准,以评估我们的框架性能。我们的研究结果表明,朝着实现第三级自治性取得了显著进展,突显了LLMs在检测和解决微服务架构中问题的有效性。这项研究通过将LLMs集成到微服务管理框架中,推动了自动计算的进步,为更具适应性和自我管理的计算系统铺平了道路。代码将在https://aka.ms/ACV-LLM上提供。
更新时间: 2024-07-19 15:30:32
领域: cs.AI,cs.CL,cs.DC,cs.MA,cs.SE
On the Impact of PRB Load Uncertainty Forecasting for Sustainable Open RAN
The transition to sustainable Open Radio Access Network (O-RAN) architectures brings new challenges for resource management, especially in predicting the utilization of Physical Resource Blocks (PRBs). In this paper, we propose a novel approach to characterize the PRB load using probabilistic forecasting techniques. First, we provide background information on the O-RAN architecture and components and emphasize the importance of energy/power consumption models for sustainable implementations. The problem statement highlights the need for accurate PRB load prediction to optimize resource allocation and power efficiency. We then investigate probabilistic forecasting techniques, including Simple-Feed-Forward (SFF), DeepAR, and Transformers, and discuss their likelihood model assumptions. The simulation results show that DeepAR estimators predict the PRBs with less uncertainty and effectively capture the temporal dependencies in the dataset compared to SFF- and Transformer-based models, leading to power savings. Different percentile selections can also increase power savings, but at the cost of over- or under-provisioning. At the same time, the performance of Long Short-Term Memory (LSTM) is shown to be inferior to the probabilistic estimators with respect to all error metrics. Finally, we outline the importance of probabilistic, prediction-based characterization for sustainable O-RAN implementations and highlight avenues for future research.
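For a sense of the workflow, here is a short sketch of probabilistic PRB-load forecasting with DeepAR, assuming a recent GluonTS version with the PyTorch backend; the synthetic trace and training settings are placeholders, not the paper's data or configuration.

```python
import numpy as np
from gluonts.dataset.common import ListDataset
from gluonts.torch import DeepAREstimator

# Synthetic stand-in for a PRB-utilisation trace (one sample per hour).
prb_load = 50 + 20 * np.sin(np.arange(600) * 2 * np.pi / 24) + np.random.randn(600) * 5
train = ListDataset([{"target": prb_load, "start": "2024-01-01 00:00"}], freq="H")

estimator = DeepAREstimator(
    freq="H",
    prediction_length=24,                 # forecast one day of PRB load ahead
    trainer_kwargs={"max_epochs": 5},     # kept tiny for the sketch
)
predictor = estimator.train(train)

forecast = next(iter(predictor.predict(train)))
# Probabilistic output: pick a percentile to trade power savings vs. provisioning.
print("median:", forecast.quantile(0.5)[:4])
print("p90   :", forecast.quantile(0.9)[:4])  # conservative, avoids under-provisioning
```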
Updated: 2024-07-19 15:25:20
标题: 关于PRB负载不确定性预测对可持续开放式无线接入网络的影响
摘要: 转向可持续的开放无线接入网络(O-RAN)架构为资源管理带来了新的挑战,特别是在预测物理资源块(PRB)的利用方面。本文提出了一种利用概率预测技术对PRB负载进行特征化的新方法。首先,我们提供了关于O-RAN架构和组件的背景信息,并强调了能源/功耗模型对可持续实现的重要性。问题陈述强调了准确的PRB负载预测对于优化资源分配和功率效率的必要性。然后,我们调查了包括Simple-Feed-Forward(SFF)、DeepAR和Transformers在内的概率预测技术,并讨论了它们的可能性模型假设。模拟结果显示,DeepAR估计器比基于SFF和Transformer的模型对PRB的预测具有更少的不确定性,并有效地捕捉了数据集中的时间依赖性,从而实现了节能。不同的百分位数选择也可以增加节能,但同时会造成过度/欠配置的代价。与此同时,长短期记忆(LSTM)的性能被证明不如概率估计器在所有误差度量方面。最后,我们概述了基于概率预测的特征化对于可持续的O-RAN实现的重要性,并突出了未来研究的方向。
更新时间: 2024-07-19 15:25:20
领域: cs.NI,cs.AI,cs.DC,cs.IT,cs.LG,math.IT
Wildfire Risk Prediction: A Review
Wildfires have significant impacts on global vegetation, wildlife, and humans. They destroy plant communities and wildlife habitats and contribute to increased emissions of carbon dioxide, nitrogen oxides, methane, and other pollutants. The prediction of wildfires relies on various independent variables combined with regression or machine learning methods. In this technical review, we describe the options for independent variables, data processing techniques, models, independent-variable collinearity and importance estimation methods, and model performance evaluation metrics. First, we divide the independent variables into four aspects, including climate and meteorology conditions, socio-economic factors, terrain and hydrological features, and wildfire historical records. Second, preprocessing methods are described for different magnitudes, different spatial-temporal resolutions, and different formats of data. Third, the collinearity and importance evaluation methods of independent variables are also considered. Fourth, we discuss the application of statistical models, traditional machine learning models, and deep learning models in wildfire risk prediction. Compared with other reviews, this manuscript particularly discusses the evaluation metrics and recent advancements in deep learning methods in this part. Lastly, addressing the limitations of current research, this paper emphasizes the need for more effective deep learning time series forecasting algorithms, the utilization of three-dimensional data including ground and trunk fuel, extraction of more accurate historical fire point data, and improved model evaluation metrics.
Updated: 2024-07-19 15:25:15
标题: 野火风险预测:一项综述
摘要: 野火对全球植被、野生动物和人类造成重大影响。它们摧毁植物群落和野生动物栖息地,并导致二氧化碳、氮氧化物、甲烷等排放物的增加。预测野火依赖于各种独立变量结合回归或机器学习方法。在这篇技术评论中,我们描述了独立变量的选项、数据处理技术、模型、独立变量共线性和重要性评估方法以及模型性能评估指标。首先,我们将独立变量分为4个方面,包括气候和气象条件、社会经济因素、地形和水文特征以及野火历史记录。其次,描述了不同幅度、不同空间-时间分辨率和不同格式数据的预处理方法。第三,还考虑了独立变量的共线性和重要性评估方法。第四,我们讨论了在野火风险预测中应用统计模型、传统机器学习模型和深度学习模型。在本小节中,与其他评论相比,本文特别讨论了评估指标和深度学习方法的最新进展。最后,针对当前研究的局限性,本文强调了需要更有效的深度学习时间序列预测算法、利用三维数据包括地面和树干燃料、提取更准确的历史火点数据以及改进模型评估指标的必要性。
更新时间: 2024-07-19 15:25:15
领域: cs.LG,cs.CV
Regularized KL-Divergence for Well-Defined Function-Space Variational Inference in Bayesian neural networks
Bayesian neural networks (BNNs) promise to combine the predictive performance of neural networks with principled uncertainty modeling important for safety-critical systems and decision making. However, posterior uncertainty estimates depend on the choice of prior, and finding informative priors in weight-space has proven difficult. This has motivated variational inference (VI) methods that pose priors directly on the function generated by the BNN rather than on weights. In this paper, we address a fundamental issue with such function-space VI approaches pointed out by Burt et al. (2020), who showed that the objective function (ELBO) is negative infinity for most priors of interest. Our solution builds on generalized VI (Knoblauch et al., 2019) with the regularized KL divergence (Quang, 2019) and is, to the best of our knowledge, the first well-defined variational objective for function-space inference in BNNs with Gaussian process (GP) priors. Experiments show that our method incorporates the properties specified by the GP prior on synthetic and small real-world data sets, and provides competitive uncertainty estimates for regression, classification and out-of-distribution detection compared to BNN baselines with both function- and weight-space priors.
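As a sketch of the central quantity, here is the regularized KL divergence between a Gaussian approximation of the variational predictive and a GP prior, evaluated at a finite set of measurement points; the gamma-regularization is what keeps the objective finite. The setup below is a toy stand-in, not the paper's estimator.

```python
import numpy as np

def regularized_kl(mu_q, cov_q, mu_p, cov_p, gamma=1e-2):
    """KL(N(mu_q, cov_q + gamma I) || N(mu_p, cov_p + gamma I)) on a finite set
    of measurement points; the gamma-regularisation keeps it finite even when
    the two function distributions would otherwise be mutually singular."""
    k = len(mu_q)
    cq = cov_q + gamma * np.eye(k)
    cp = cov_p + gamma * np.eye(k)
    cp_inv = np.linalg.inv(cp)
    diff = mu_p - mu_q
    return 0.5 * (np.trace(cp_inv @ cq) + diff @ cp_inv @ diff - k
                  + np.linalg.slogdet(cp)[1] - np.linalg.slogdet(cq)[1])

# Hypothetical setup: evaluate the BNN's (approximate Gaussian) predictive and a
# GP prior with RBF kernel at a few random measurement points x.
rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=8)
prior_cov = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)   # RBF GP prior
bnn_mean, bnn_cov = rng.standard_normal(8) * 0.1, 0.5 * prior_cov
print(regularized_kl(bnn_mean, bnn_cov, np.zeros(8), prior_cov))
```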
Updated: 2024-07-19 15:19:55
标题: 贝叶斯神经网络中明确定义的函数空间变分推断的正则化KL散度
摘要: 贝叶斯神经网络(BNN)承诺结合神经网络的预测性能和对安全关键系统和决策制定至关重要的原则性不确定性建模。然而,后验不确定性估计取决于先验的选择,并且在权重空间中找到信息丰富的先验已被证明是困难的。这激发了变分推断(VI)方法,其直接将先验放在BNN生成的函数上,而不是在权重上。在本文中,我们解决了Burt等人(2020)指出的这种函数空间VI方法的一个基本问题,他们表明对于大多数感兴趣的先验,目标函数(ELBO)是负无穷的。我们的解决方案基于广义VI(Knoblauch等人,2019)和正则化KL散度(Quang,2019),据我们所知,这是第一个对GP先验的BNN中的函数空间推断提供了明确定义的变分目标。实验表明,我们的方法在合成和小型真实数据集上包含了GP先验指定的属性,并且与具有函数和权重空间先验的BNN基线相比,为回归、分类和超出分布检测提供了竞争力强的不确定性估计。
更新时间: 2024-07-19 15:19:55
领域: cs.LG,stat.ML
TTT: A Temporal Refinement Heuristic for Tenuously Tractable Discrete Time Reachability Problems
Reachable set computation is an important tool for analyzing control systems. Simulating a control system can show that the system is generally functioning as desired, but a formal tool like reachability analysis can provide a guarantee of correctness. For linear systems, reachability analysis is straightforward and fast, but as more complex components are added to the control system such as nonlinear dynamics or a neural network controller, reachability analysis may slow down or become overly conservative. To address these challenges, much literature has focused on spatial refinement, e.g., tuning the discretization of the input sets and intermediate reachable sets. However, this paper addresses a different dimension: temporal refinement. The basic idea of temporal refinement is to automatically choose when along the horizon of the reachability problem to execute slow symbolic queries which incur less approximation error versus fast concrete queries which incur more approximation error. Temporal refinement can be combined with other refinement approaches and offers an additional "tuning knob" with which to trade off tractability and tightness in approximate reachable set computation. Here, we introduce an automatic framework for performing temporal refinement and we demonstrate the effectiveness of this technique on computing approximate reachable sets for nonlinear systems with neural network control policies. We demonstrate the calculation of reachable sets of varying approximation error under varying computational budget and show that our algorithm is able to generate approximate reachable sets with a similar amount of error to the baseline approach in 20-70% less time.
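A toy Python sketch of the scheduling idea: given per-step error estimates, spend the slow symbolic queries where the fast concrete queries would hurt most. The error model and budget below are hypothetical; the paper's framework chooses the schedule automatically.

```python
def schedule_queries(horizon, error_estimates, symbolic_budget):
    """Toy temporal-refinement schedule: spend the slow, tighter symbolic queries
    on the time steps where the fast concrete queries are estimated to incur the
    most approximation error, and use concrete queries everywhere else."""
    worst_first = sorted(range(horizon), key=lambda t: -error_estimates[t])
    symbolic_steps = set(worst_first[:symbolic_budget])
    return ["symbolic" if t in symbolic_steps else "concrete" for t in range(horizon)]

# Hypothetical per-step error estimates (e.g. from a cheap pre-pass); error often
# accumulates late in the horizon, so later steps win the symbolic budget here.
errors = [0.1 * (t ** 1.5) for t in range(10)]
print(schedule_queries(horizon=10, error_estimates=errors, symbolic_budget=3))
```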
Updated: 2024-07-19 15:16:25
标题: TTT:一种用于繁琐可处理的离散时间可达性问题的时间细化启发式算法
摘要: 可达集计算是分析控制系统的重要工具。模拟控制系统可以显示系统通常按预期方式运作,但类似可达性分析的形式工具可以提供正确性保证。对于线性系统,可达性分析是直接且快速的,但是当控制系统添加更复杂的组件,如非线性动态或神经网络控制器时,可达性分析可能会变慢或变得过于保守。为了解决这些挑战,许多文献集中在空间细化上,例如,调整输入集和中间可达集的离散化。然而,本文讨论了一个不同的维度:时间细化。时间细化的基本思想是自动选择在可达性问题的时间范围内何时执行慢速符号查询,这些查询产生较少的近似误差,而快速具体查询则产生更多的近似误差。时间细化可以与其他细化方法结合,并提供一个额外的“调节旋钮”,用于在近似可达集计算中权衡可处理性和紧凑性。在这里,我们介绍了一个执行时间细化的自动框架,并展示了该技术在计算具有神经网络控制策略的非线性系统的近似可达集时的有效性。我们展示了在不同计算预算下,计算具有不同近似误差的可达集,并展示了我们的算法能够以比基准方法少20-70%的时间生成具有相似误差量的近似可达集。
更新时间: 2024-07-19 15:16:25
领域: eess.SY,cs.AI,cs.LO,cs.SY
The Future of Large Language Model Pre-training is Federated
Generative pre-trained large language models (LLMs) have demonstrated impressive performance over a wide range of tasks, thanks to the unprecedented amount of data they have been trained on. As established scaling laws indicate, LLMs' future performance improvement depends on the amount of computing and data sources they can leverage for pre-training. Federated learning (FL) has the potential to unleash the majority of the planet's data and computational resources, which are underutilized by the data-center-focused training methodology of current LLM practice. Our work presents a robust, flexible, reproducible FL approach that enables large-scale collaboration across institutions to train LLMs. We propose a scalable deployment system called Photon to enable the investigation and development of this new training paradigm for LLM pre-training. We show that Photon can be used by organizations interested in collaborating with their private data sources and computational resources for pre-training LLMs with billions of parameters. This paradigm would mobilize more computational and data resources while matching or potentially exceeding centralized performance. We further show that the effectiveness of federated training scales with model size and present our approach for training a billion-scale federated LLM using limited resources. Finally, we show that LLM training is highly resilient to the classical challenges of federated statistical and hardware heterogeneity. Furthermore, we show that convergence is robust to partial participation, opening the avenue for compute-efficient collaborative training. Photon will help data-rich actors to become the protagonists of LLM pre-training instead of leaving the stage to compute-rich actors alone.
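For orientation, here is a minimal NumPy sketch of one Federated Averaging round, the basic aggregation pattern such cross-institution training builds on; Photon's actual deployment system is far more involved, and the toy objective below is purely illustrative.

```python
import numpy as np

def fedavg_round(global_params, client_datasets, local_update, weights=None):
    """One round of Federated Averaging: each institution updates the model on
    its private shard, and only the parameters are aggregated centrally."""
    locals_ = [local_update(global_params.copy(), d) for d in client_datasets]
    if weights is None:                       # weight clients by data size
        weights = np.array([len(d) for d in client_datasets], dtype=float)
    weights = weights / weights.sum()
    return sum(w * p for w, p in zip(weights, locals_))

# Toy quadratic "pre-training" objective per client: min ||params - data_mean||^2.
def sgd_steps(params, data, lr=0.1, steps=10):
    for _ in range(steps):
        params -= lr * 2 * (params - data.mean(axis=0))
    return params

rng = np.random.default_rng(0)
shards = [rng.normal(loc=i, size=(100, 4)) for i in range(3)]  # heterogeneous data
params = np.zeros(4)
for _ in range(20):
    params = fedavg_round(params, shards, sgd_steps)
print(params)   # converges near the weighted mean of the shard means
```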
Updated: 2024-07-19 15:16:17
标题: 大型语言模型预训练的未来是联邦的
摘要: 预训练的大型生成式语言模型(LLMs)在广泛任务上展示了出色的性能,这要归功于它们接受了前所未有的大量数据训练。正如已经建立的缩放定律所指出的那样,LLMs未来的性能改进取决于它们可以利用的计算量和数据来源。联邦学习(FL)有潜力释放地球上大多数数据和计算资源,这些资源在当前LLM实践的数据中心集中培训方法中被低估利用。我们的工作提出了一种强大、灵活、可复制的FL方法,可以促进机构之间进行大规模协作训练LLMs。我们提出了一个可扩展的部署系统称为Photon,以促进对LLM预训练的这种新训练范式的研究和开发。我们展示了Photon可以被有兴趣与他们的私人数据来源和计算资源合作进行预训练具有数十亿参数的LLMs的组织使用。这种范式将动员更多的计算和数据资源,同时匹配或可能超过集中性能。我们进一步展示了联邦训练的有效性随着模型规模的增大而增加,并展示了我们利用有限资源训练十亿规模的联邦LLM的方法。最后,我们展示了LLM训练对于联邦统计和硬件异质性的经典挑战具有高度的韧性。此外,我们展示了收敛对于部分参与是稳健的,为计算高效的协作训练开辟了道路。Photon将帮助数据丰富的参与者成为LLMs预训练的主角,而不是将舞台留给计算丰富的参与者。
更新时间: 2024-07-19 15:16:17
领域: cs.LG,cs.AI,cs.DC
Do Parameters Reveal More than Loss for Membership Inference?
Membership inference attacks aim to infer whether an individual record was used to train a model, serving as a key tool for disclosure auditing. While such evaluations are useful to demonstrate risk, they are computationally expensive and often make strong assumptions about potential adversaries' access to models and training environments, and thus do not provide very tight bounds on leakage from potential attacks. We show that prior claims that black-box access is sufficient for optimal membership inference do not hold for most useful settings such as stochastic gradient descent, and that optimal membership inference indeed requires white-box access. We validate our findings with a new white-box inference attack IHA (Inverse Hessian Attack) that explicitly uses model parameters by taking advantage of computing inverse-Hessian vector products. Our results show that both audits and adversaries may be able to benefit from access to model parameters, and we advocate for further research into white-box methods for membership privacy auditing.
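Here is a sketch of the inverse-Hessian-vector product machinery such a white-box attack relies on, computed with conjugate gradient and double backprop in PyTorch. The g^T H^{-1} g statistic at the end is illustrative only; IHA's exact score is defined in the paper.

```python
import torch

def hvp(loss_fn, params, v):
    """Hessian-vector product via double backprop; avoids forming H explicitly."""
    grad = torch.autograd.grad(loss_fn(params), params, create_graph=True)[0]
    return torch.autograd.grad(grad @ v, params)[0]

def inverse_hvp_cg(loss_fn, params, g, damping=1e-2, iters=50):
    """Solve (H + damping I) x = g with conjugate gradient, i.e. an iHVP."""
    x = torch.zeros_like(g)
    r = g.clone(); p = r.clone(); rs = r @ r
    for _ in range(iters):
        Ap = hvp(loss_fn, params, p) + damping * p
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if rs_new < 1e-10:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Toy convex problem standing in for the trained model's empirical risk.
A = torch.randn(6, 6); A = A @ A.T + torch.eye(6)
theta = torch.randn(6, requires_grad=True)
loss = lambda p: 0.5 * p @ A @ p
g = torch.autograd.grad(loss(theta), theta)[0]
score = g @ inverse_hvp_cg(loss, theta, g)   # g^T H^{-1} g style statistic
print(score)
```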
Updated: 2024-07-19 15:13:45
标题: 参数是否比损失更多地揭示了成员推理?
摘要: 成员推理攻击旨在推断个人记录是否用于训练模型,作为揭示审计的关键工具。虽然这种评估对于展示风险很有用,但它们在计算上很昂贵,并且通常对潜在对手对模型和训练环境的访问做出强烈假设,因此并不能提供关于潜在攻击泄漏的非常严格的界限。我们展示了先前关于黑盒访问足以实现最佳成员推理的主张在大多数有用的设置(如随机梯度下降)中不成立,并且最佳成员推理确实需要白盒访问。我们用一种新的白盒推理攻击IHA(逆Hessian攻击)验证了我们的发现,该攻击明确利用模型参数,利用计算逆Hessian向量乘积。我们的结果显示,审计和对手都可能从访问模型参数中受益,我们主张进一步研究白盒方法用于成员隐私审计。
更新时间: 2024-07-19 15:13:45
领域: cs.LG,cs.AI,cs.CR
Honest Computing: Achieving demonstrable data lineage and provenance for driving data and process-sensitive policies
Data is the foundation of any scientific, industrial or commercial process. Its journey typically flows from collection to transport, storage, management and processing. While best practices and regulations guide data management and protection, recent events have underscored its vulnerability. Academic research and commercial data handling have been marred by scandals, revealing the brittleness of data management. Data, despite its importance, is susceptible to undue disclosures, leaks, losses, manipulation, or fabrication. These incidents often occur without visibility or accountability, necessitating a systematic structure for safe, honest, and auditable data management. In this paper, we introduce the concept of Honest Computing as the practice and approach that emphasizes transparency, integrity, and ethical behaviour within the realm of computing and technology. It ensures that computer systems and software operate honestly and reliably without hidden agendas, biases, or unethical practices. It enables privacy and confidentiality of data and code by design and by default. We also introduce a reference framework to achieve demonstrable data lineage and provenance, contrasting it with Secure Computing, a related but differently-orientated form of computing. At its core, Honest Computing leverages Trustless Computing, Confidential Computing, Distributed Computing, Cryptography and AAA security concepts. Honest Computing opens new ways of creating technology-based processes and workflows which permit the migration of regulatory frameworks for data protection from principle-based approaches to rule-based ones. Addressing use cases in many fields, from AI model protection and ethical layering to digital currency formation for finance and banking, trading, and healthcare, this foundational layer approach can help define new standards for appropriate data custody and processing.
Updated: 2024-07-19 15:13:42
标题: 诚实计算:实现可证明的数据渊源和来源,以驱动数据和过程敏感政策
摘要: 数据是任何科学、工业或商业过程的基础。它的流程通常从收集到传输、存储、管理和处理。尽管最佳实践和法规指导数据管理和保护,但最近的事件突显了其脆弱性。学术研究和商业数据处理受到丑闻的困扰,揭示了数据管理的脆弱性。尽管数据非常重要,但容易受到不当披露、泄露、丢失、篡改或伪造的影响。这些事件经常发生在没有可见性或问责制的情况下,需要建立一个系统结构,用于安全、诚实和可审计的数据管理。在本文中,我们介绍了诚实计算的概念,作为在计算和技术领域强调透明、诚信和道德行为的实践和方法。它确保计算机系统和软件在没有隐藏议程、偏见或不道德做法的情况下诚实可靠地运行。它通过设计和默认方式实现数据和代码的隐私和保密性。我们还介绍了一个参考框架,以实现可证明的数据谱系和出处,将其与安全计算相对比,后者是一种相关但定位不同的计算形式。在其核心,诚实计算利用了无信任计算、保密计算、分布式计算、密码学和AAA安全概念。诚实计算开启了创造技术为基础的流程和工作流程的新途径,允许将数据保护的监管框架从基于原则的方法转变为基于规则的方法。通过解决许多领域的使用案例,从AI模型保护和道德层次到金融和银行业的数字货币形成、交易和医疗保健,这种基础层次的方法可以帮助定义适当数据保管和处理的新标准。
更新时间: 2024-07-19 15:13:42
领域: cs.CY,cs.CR,H.1.0; H.4.2; K.5.1
GLAudio Listens to the Sound of the Graph
We propose GLAudio: Graph Learning on Audio representation of the node features and the connectivity structure. This novel architecture propagates the node features through the graph network according to the discrete wave equation and then employs a sequence learning architecture to learn the target node function from the audio wave signal. This leads to a new paradigm of learning on graph-structured data, in which information propagation and information processing are separated into two distinct steps. We theoretically characterize the expressivity of our model, introducing the notion of the receptive field of a vertex, and investigate our model's susceptibility to over-smoothing and over-squashing both theoretically as well as experimentally on various graph datasets.
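To make the propagation step concrete, here is a NumPy sketch of leapfrog integration of the discrete wave equation on a tiny graph; each node's resulting waveform is what a sequence learner would then consume. The step count and wave speed are arbitrary choices for the sketch.

```python
import numpy as np

def wave_propagate(L, x0, steps=32, c=0.2):
    """Leapfrog integration of the discrete wave equation u'' = -c^2 L u on a
    graph with Laplacian L; each node's trajectory becomes its 'audio' signal."""
    u_prev, u = x0.copy(), x0.copy()
    signal = [u.copy()]
    for _ in range(steps):
        u_next = 2 * u - u_prev - (c ** 2) * (L @ u)
        u_prev, u = u, u_next
        signal.append(u.copy())
    return np.stack(signal)          # (steps + 1, n_nodes, n_features)

# Tiny path graph a-b-c with scalar node features.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
L = np.diag(A.sum(1)) - A
x0 = np.array([[1.0], [0.0], [0.0]])          # impulse at node a
audio = wave_propagate(L, x0)
print(audio.shape)   # (33, 3, 1): per-node waveform fed to a sequence learner
```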
Updated: 2024-07-19 15:13:22
标题: GLAudio倾听图形的声音
摘要: 我们提出GLAudio:在节点特征和连接结构的音频表示上进行图学习。这种新颖的架构根据离散波动方程将节点特征通过图网络传播,然后采用序列学习架构从音频波形信号中学习目标节点功能。这导致了一种新的图结构数据学习范式,其中信息传播和信息处理分为两个不同的步骤。我们理论上表征了我们模型的表达能力,引入了顶点的感受域的概念,并在各种图数据集上从理论和实验两方面研究了我们模型对过度平滑和过度压缩的敏感性。
更新时间: 2024-07-19 15:13:22
领域: cs.LG,cs.AI
Frontiers of Deep Learning: From Novel Application to Real-World Deployment
Deep learning continues to re-shape numerous fields, from natural language processing and imaging to data analytics and recommendation systems. This report studies two research papers that represent recent progress on deep learning from two largely different aspects. The first paper applies transformer networks, which are typically used in language models, to improve the quality of synthetic aperture radar images by effectively reducing speckle noise. The second paper presents an in-storage computing design solution to enable cost-efficient and high-performance implementations of deep learning recommendation systems. In addition to summarizing each paper in terms of motivation, key ideas and techniques, and evaluation results, this report also presents thoughts and discussions about possible future research directions. By carrying out an in-depth study of these two representative papers and related references, this doctoral candidate has developed a better understanding of the far-reaching impact and efficient implementation of deep learning models.
Updated: 2024-07-19 15:11:55
标题: 深度学习的前沿:从新应用到实际部署
摘要: 深度学习继续重塑许多领域,从自然语言处理和图像处理到数据分析和推荐系统。本报告研究了两篇代表深度学习最新进展的研究论文,从两个大不相同的方面:第一篇论文应用了通常用于语言模型的变压器网络,通过有效减少斑点噪声来提高合成孔径雷达图像的质量。第二篇论文提出了一种存储计算设计解决方案,以实现深度学习推荐系统的成本效益和高性能实现。除了总结每篇论文的动机、关键思想和技术以及评估结果,本报告还提出了关于可能未来研究方向的思考和讨论。通过深入研究这两篇代表性论文和相关参考文献,这位博士候选人对深度学习模型的深远影响和高效实现有了更好的理解。
更新时间: 2024-07-19 15:11:55
领域: cs.LG
The Sticky Path to Expressive Querying: Decidability of Navigational Queries under Existential Rules
Extensive research in the field of ontology-based query answering has led to the identification of numerous fragments of existential rules (also known as tuple-generating dependencies) that exhibit decidable answering of atomic and conjunctive queries. Motivated by the increased theoretical and practical interest in navigational queries, this paper considers the question for which of these fragments decidability of querying extends to regular path queries (RPQs). In fact, decidability of RPQs has recently been shown to generally hold for the comprehensive family of all fragments that come with the guarantee of universal models being reasonably well-shaped (that is, being of finite cliquewidth). Yet, for the second major family of fragments, known as finite unification sets (short: fus), which are based on first-order-rewritability, corresponding results have been largely elusive so far. We complete the picture by showing that RPQ answering over arbitrary fus rulesets is undecidable. On the positive side, we establish that the problem is decidable for the prominent fus subclass of sticky rulesets, with the caveat that a very mild extension of the RPQ formalism turns the problem undecidable again.
Updated: 2024-07-19 15:11:09
标题: 通往表达式查询的粘性路径:存在规则下导航查询的可决定性
摘要: 在基于本体的查询回答领域进行了广泛的研究,已经确定了许多存在规则片段(也称为元组生成依赖关系),这些片段展现了原子和合取查询的可判定回答。由于导航查询的理论和实际兴趣增加,本文考虑了这些片段中哪些片段的查询可判定性扩展到正则路径查询(RPQs)。事实上,最近已经证明了RPQs的可判定性通常适用于具有合理形状的所有片段的综合家族(即,有限团宽)。然而,对于第二类重要片段家族,即基于一阶可重写性的有限统一集合(简称为fus),相应的结果迄今为止大部分难以理解。我们通过展示在任意fus规则集上的RPQ回答是不可判定的来完善这一图景。在积极的一面,我们证明了这个问题对于显著的fus子类别粘性规则集来说是可判定的,但需要注意的是,对RPQ形式主义的非常轻微的扩展会使问题再次变得不可判定。
更新时间: 2024-07-19 15:11:09
领域: cs.DB,cs.AI,cs.LO
Improving GBDT Performance on Imbalanced Datasets: An Empirical Study of Class-Balanced Loss Functions
Class imbalance remains a significant challenge in machine learning, particularly for tabular data classification tasks. While Gradient Boosting Decision Trees (GBDT) models have proven highly effective for such tasks, their performance can be compromised when dealing with imbalanced datasets. This paper presents the first comprehensive study on adapting class-balanced loss functions to three GBDT algorithms across various tabular classification tasks, including binary, multi-class, and multi-label classification. We conduct extensive experiments on multiple datasets to evaluate the impact of class-balanced losses on different GBDT models, establishing a valuable benchmark. Our results demonstrate the potential of class-balanced loss functions to enhance GBDT performance on imbalanced datasets, offering a robust approach for practitioners facing class imbalance challenges in real-world applications. Additionally, we introduce a Python package that facilitates the integration of class-balanced loss functions into GBDT workflows, making these advanced techniques accessible to a wider audience.
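As one common instance of a class-balanced loss, here is a sketch using effective-number-of-samples weights (Cui et al., 2019) passed to LightGBM as sample weights; whether the paper uses this exact weighting is not stated in the abstract, so treat the scheme as an assumption.

```python
import numpy as np
from lightgbm import LGBMClassifier

def class_balanced_weights(y, beta=0.999):
    """Per-sample weights from the 'effective number of samples'
    (Cui et al., 2019): w_c = (1 - beta) / (1 - beta^{n_c})."""
    classes, counts = np.unique(y, return_counts=True)
    w_class = (1.0 - beta) / (1.0 - np.power(beta, counts))
    w_class = w_class / w_class.sum() * len(classes)   # normalise around 1
    lookup = dict(zip(classes, w_class))
    return np.array([lookup[c] for c in y])

# Imbalanced toy problem: mostly negatives, few positives.
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 10))
y = (X[:, 0] + 0.5 * rng.standard_normal(2000) > 1.8).astype(int)

model = LGBMClassifier(n_estimators=200)
model.fit(X, y, sample_weight=class_balanced_weights(y))
print("positive rate:", y.mean(), "| predicted:", model.predict(X).mean())
```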
Updated: 2024-07-19 15:10:46
标题: 在不平衡数据集上改善GBDT性能:类平衡损失函数的实证研究
摘要: 类别不平衡仍然是机器学习中的一个重要挑战,特别是对于表格数据分类任务。虽然梯度提升决策树(GBDT)模型在这些任务中已被证明非常有效,但在处理不平衡数据集时,它们的性能可能会受到影响。本文首次对三种GBDT算法在各种表格分类任务中采用类别平衡损失函数进行了全面研究,包括二元分类、多类分类和多标签分类。我们在多个数据集上进行了大量实验,评估了类别平衡损失对不同GBDT模型的影响,建立了一个有价值的基准。我们的结果表明,类别平衡损失函数有可能增强GBDT在不平衡数据集上的性能,为在现实应用中面临类别不平衡挑战的从业者提供了一个强大的方法。此外,我们还推出了一个Python包,可以方便地将类别平衡损失函数集成到GBDT工作流程中,使这些先进技术能够被更广泛的人群使用。
更新时间: 2024-07-19 15:10:46
领域: cs.LG
Enhancing Cloud-Native Resource Allocation with Probabilistic Forecasting Techniques in O-RAN
The need for intelligent and efficient resource provisioning for productive resource management in real-world scenarios is growing with the evolution of telecommunications towards the 6G era. Technologies such as Open Radio Access Network (O-RAN) can help to build interoperable solutions for the management of complex systems. Probabilistic forecasting, in contrast to deterministic single-point estimators, can offer a different approach to resource allocation by quantifying the uncertainty of the generated predictions. This paper examines the cloud-native aspects of O-RAN together with the radio App (rApp) deployment options. The integration of probabilistic forecasting techniques as an rApp in O-RAN is also emphasized, along with case studies of real-world applications. Through a comparative analysis of forecasting models using the error metric, we show the advantages of the Deep Autoregressive Recurrent network (DeepAR) over deterministic and other probabilistic estimators. Furthermore, the simplicity of Simple-Feed-Forward (SFF) leads to a fast runtime but does not capture the temporal dependencies of the input data. Finally, we present some aspects related to the practical applicability of cloud-native O-RAN with probabilistic forecasting.
Updated: 2024-07-19 15:04:15
标题: 利用概率预测技术增强O-RAN中的云原生资源分配
摘要: 随着电信技术向6G时代发展,对于在现实场景中智能高效地调配资源以实现资源的生产性管理的需求正在增长。诸如开放式无线接入网络(O-RAN)等技术可以帮助构建用于管理复杂系统的可互操作解决方案。与确定性单点估计器相比,概率预测可以通过量化生成的预测的不确定性,提供资源分配的不同方法。本文研究了O-RAN的云原生方面以及无线应用(rApp)的部署选项。强调将概率预测技术作为rApp整合到O-RAN中,以及真实应用案例。通过使用误差度量进行预测模型的比较分析,我们展示了深度自回归递归网络(DeepAR)相对于其他确定性概率估计器的优势。此外,简单前馈(SFF)的简单性导致运行时间快,但无法捕捉输入数据的时间依赖性。最后,我们介绍了与云原生O-RAN与概率预测的实际适用性相关的一些方面。
更新时间: 2024-07-19 15:04:15
领域: cs.NI,cs.AI,cs.DC,cs.IT,cs.LG,math.IT
On the use of Probabilistic Forecasting for Network Analysis in Open RAN
Unlike other single-point Artificial Intelligence (AI)-based prediction techniques, such as Long-Short Term Memory (LSTM), probabilistic forecasting techniques (e.g., DeepAR and Transformer) provide a range of possible outcomes and associated probabilities that enable decision makers to make more informed and robust decisions. At the same time, the architecture of Open RAN has emerged as a revolutionary approach for mobile networks, aiming at openness, interoperability and innovation in the ecosystem of RAN. In this paper, we propose the use of probabilistic forecasting techniques as a radio App (rApp) within the Open RAN architecture. We investigate and compare different probabilistic and single-point forecasting methods and algorithms to estimate the utilization and resource demands of Physical Resource Blocks (PRBs) of cellular base stations. Through our evaluations, we demonstrate the numerical advantages of probabilistic forecasting techniques over traditional single-point forecasting methods and show that they are capable of providing more accurate and reliable estimates. In particular, DeepAR clearly outperforms single-point forecasting techniques such as LSTM and Seasonal-Naive (SN) baselines and other probabilistic forecasting techniques such as Simple-Feed-Forward (SFF) and Transformer neural networks.
Updated: 2024-07-19 15:03:38
标题: 关于在Open RAN中利用概率预测进行网络分析的研究
摘要: 与其他基于单点人工智能(AI)预测技术(如长短期记忆网络(LSTM))不同,概率预测技术(例如DeepAR和Transformer)提供一系列可能的结果及相关概率,使决策者能够做出更明智和稳健的决策。同时,开放式无线接入网络(Open RAN)的架构已经成为移动网络的一种革命性方法,旨在实现RAN生态系统中的开放性、互操作性和创新。本文提出将概率预测技术作为一种无线应用(rApp)应用于Open RAN架构中。我们研究并比较了不同的概率和单点预测方法和算法,以估计蜂窝基站的物理资源块(PRB)的利用率和资源需求。通过我们的评估,我们展示了概率预测技术相对于传统的单点预测方法的数值优势,并表明它们能够提供更准确和可靠的估算。特别是,DeepAR明显优于单点预测技术(如LSTM和季节性-朴素(SN)基线)以及其他概率预测技术(如简单前馈(SFF)和Transformer神经网络)。
更新时间: 2024-07-19 15:03:38
领域: cs.NI,cs.AI,cs.DC,cs.IT,cs.LG,math.IT
SCoPE: Evaluating LLMs for Software Vulnerability Detection
In recent years, code security has become increasingly important, especially with the rise of interconnected technologies. Detecting vulnerabilities early in the software development process has demonstrated numerous benefits. Consequently, the scientific community started using machine learning for automated detection of source code vulnerabilities. This work explores and refines the CVEFixes dataset, which is commonly used to train models for code-related tasks, specifically the C/C++ subset. To this end, we present the Source Code Processing Engine (SCoPE), a framework of strategized techniques for reducing the size of and normalizing C/C++ functions. The output generated by SCoPE was used to create a new version of CVEFixes. This refined dataset was then employed in a feature representation analysis to assess the effectiveness of the tool's code processing techniques, which consisted of fine-tuning three pre-trained LLMs for software vulnerability detection. The results show that SCoPE successfully helped to identify 905 duplicates within the evaluated subset. The LLM results corroborate the literature regarding their suitability for software vulnerability detection, with the best model achieving a 53% F1-score.
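Here is a toy Python pass in the spirit of SCoPE's size reduction and normalization; the real framework's transformations are more elaborate than this comment stripping and whitespace collapsing.

```python
import re

def normalize_c_function(src: str) -> str:
    """Toy normalisation pass in the spirit of SCoPE: strip comments and
    collapse whitespace to shrink functions before feeding them to a model.
    (The real framework's exact transformations are described in the paper.)"""
    src = re.sub(r"/\*.*?\*/", " ", src, flags=re.DOTALL)   # block comments
    src = re.sub(r"//[^\n]*", " ", src)                     # line comments
    src = re.sub(r"\s+", " ", src)                          # collapse whitespace
    return src.strip()

vulnerable = """
int copy(char *dst, const char *src) {
    /* classic unbounded copy */
    strcpy(dst, src);   // CWE-120 candidate
    return 0;
}
"""
print(normalize_c_function(vulnerable))
# -> int copy(char *dst, const char *src) { strcpy(dst, src); return 0; }
```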
Updated: 2024-07-19 15:02:00
标题: SCoPE:评估LLMs在软件漏洞检测中的应用
摘要: 近年来,代码安全性变得越来越重要,特别是随着互联技术的兴起。在软件开发过程中早期检测漏洞已经证明了许多好处。因此,科学界开始使用机器学习来自动检测源代码漏洞。本文探讨和完善了CVEFixes数据集,该数据集通常用于训练与代码相关的模型,特别是C/C++子集。为此,提出了源代码处理引擎(SCoPE),这是一个由策略化技术组成的框架,可用于减少C/C++函数的大小和标准化。SCoPE生成的输出用于创建CVEFixes的新版本。然后将这个完善的数据集用于特征表示分析,评估工具的代码处理技术的有效性,包括对三个预训练的LLM进行微调,用于软件漏洞检测。结果显示,SCoPE成功帮助识别了评估子集中的905个重复项。LLM的结果与文献中关于它们适用于软件漏洞检测的观点相一致,最佳模型达到了53%的F1分数。
更新时间: 2024-07-19 15:02:00
领域: cs.SE,cs.AI,cs.CR,cs.LG
Open Artificial Knowledge
The tremendous success of chat-based AI systems like ChatGPT, Claude, and Gemini stems from Large Language Models (LLMs) trained on vast amounts of data. However, acquiring high-quality, diverse, and ethically sourced training data remains a significant challenge. We introduce the Open Artificial Knowledge (OAK) dataset, a large-scale resource of over 500 million tokens (at the moment of writing) designed to address this issue. OAK leverages an ensemble of state-of-the-art LLMs, including GPT4o, LLaMa3-70B, LLaMa3-8B, Mixtral-8x7B, Gemma-7B, and Gemma-2-9B, to generate high-quality text across diverse domains, guided by Wikipedia's main categories. Our methodology ensures broad knowledge coverage while maintaining coherence and factual accuracy. The OAK dataset aims to foster the development of more capable and aligned language models while addressing critical issues of data scarcity and privacy in LLM training, and it is freely available on www.oakdataset.org.
Updated: 2024-07-19 15:01:24
标题: 开放式人工知识
摘要: 聊天式人工智能系统如ChatGPT、Claude和Gemini的巨大成功源自基于大规模语言模型(LLMs)的训练,这些模型在大量数据集上进行了训练。然而,获取高质量、多样化和符合伦理标准的训练数据仍然是一个重大挑战。我们介绍了开放人工知识(OAK)数据集,这是一个大规模资源,目前包含超过5亿个标记,旨在解决这一问题。OAK利用一组最先进的LLMs,包括GPT4o、LLaMa3-70B、LLaMa3-8B、Mixtral-8x7B、Gemma-7B和Gemma-2-9B,跨不同领域生成高质量文本,受维基百科主要类别的指导。我们的方法确保了广泛的知识覆盖范围,同时保持连贯性和事实准确性。OAK数据集旨在促进更具能力和对齐的语言模型的发展,同时解决LLM训练中的数据稀缺和隐私关键问题,并可在www.oakdataset.org免费获取。
更新时间: 2024-07-19 15:01:24
领域: cs.CL,cs.LG
Towards Assessing Data Replication in Music Generation with Music Similarity Metrics on Raw Audio
Recent advancements in music generation are raising multiple concerns about the implications of AI in creative music processes, current business models, and impacts related to intellectual property management. A relevant challenge is the potential replication and plagiarism of the training set in AI-generated music, which could lead to misuse of data and intellectual property rights violations. To tackle this issue, we present the Music Replication Assessment (MiRA) tool: a model-independent open evaluation method based on diverse audio music similarity metrics to assess data replication of the training set. We evaluate the ability of five metrics to identify exact replication by conducting a controlled replication experiment in different music genres based on synthetic samples. Our results show that the proposed methodology can detect exact data replication when the replicated proportion exceeds 10%. By introducing the MiRA tool, we intend to encourage the open evaluation of music generative models by researchers, developers and users concerning data replication, highlighting the importance of the ethical, social, legal and economic consequences of generative AI in the music domain.
Updated: 2024-07-19 14:52:11
标题: 朝向使用原始音频上的音乐相似性度量评估音乐生成中的数据复制
摘要: 音乐生成领域的最新进展引发了人们对人工智能在创意音乐过程、当前商业模式和知识产权管理相关影响的多重关注。一个相关的挑战是人工智能生成音乐中训练集的潜在复制和抄袭,这可能导致数据和知识产权权利的滥用。为了解决这个问题,我们提出了音乐复制评估(MiRA)工具:一个基于多种音频音乐相似度指标的独立模型开放评估方法,用于评估训练集数据复制情况。我们通过在不同音乐流派基于合成样本进行受控复制实验来评估五个指标识别精确复制的能力。我们的结果显示,所提出的方法可以估计高于10%的精确数据复制比例。通过引入MiRA工具,我们旨在鼓励研究人员、开发者和用户就音乐生成模型的数据复制进行开放评估,强调了人工智能在音乐领域的道德、社会、法律和经济后果的重要性。
更新时间: 2024-07-19 14:52:11
领域: cs.SD,cs.AI,cs.MM,eess.AS
FuzzTheREST: An Intelligent Automated Black-box RESTful API Fuzzer
Software's pervasive impact and increasing reliance in the era of digital transformation raise concerns about vulnerabilities, emphasizing the need for software security. Fuzz testing is a dynamic analysis software testing technique that consists of feeding faulty input data to a System Under Test (SUT) and observing its behavior. Specifically regarding black-box RESTful API testing, recent literature has attempted to automate this technique using heuristics to perform the input search and using the HTTP response status codes for classification. However, most approaches do not keep track of code coverage, which is important to validate the solution. This work introduces a black-box RESTful API fuzz testing tool that employs Reinforcement Learning (RL) for vulnerability detection. The fuzzer operates via the OpenAPI Specification (OAS) file and a scenarios file, which include the information needed to communicate with the SUT and the sequences of functionalities to test, respectively. To evaluate its effectiveness, the tool was tested on the Petstore API. The tool found a total of six unique vulnerabilities and achieved 55% code coverage.
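For flavor, here is a toy Q-learning loop of the kind an RL-guided REST fuzzer might run, with HTTP status classes as states and mutation strategies as actions. The simulated SUT, reward values, and hyperparameters are all hypothetical, not the tool's actual design.

```python
import random
from collections import defaultdict

# Toy Q-learning loop: states are the last HTTP status class, actions are
# input-mutation strategies, and a 500 (server error, a likely bug) yields
# the highest reward. All names are hypothetical.
ACTIONS = ["drop_field", "wrong_type", "huge_string", "negative_number"]
REWARD = {"2xx": 0.0, "4xx": 0.2, "5xx": 1.0}

def fake_sut(action):                      # stand-in for a real HTTP call
    return random.choices(["2xx", "4xx", "5xx"],
                          weights=[5, 3, 1 if action != "huge_string" else 4])[0]

Q = defaultdict(float)
state, eps, alpha, gamma = "2xx", 0.2, 0.5, 0.9
for _ in range(2000):
    if random.random() < eps:              # epsilon-greedy exploration
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: Q[(state, a)])
    nxt = fake_sut(action)
    best_next = max(Q[(nxt, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (REWARD[nxt] + gamma * best_next - Q[(state, action)])
    state = nxt

print(max(ACTIONS, key=lambda a: Q[("2xx", a)]))   # mutation the agent favours
```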
Updated: 2024-07-19 14:43:35
标题: FuzzTheREST:一个智能自动化的黑盒RESTful API模糊器
摘要: 软件在数字转型时代的无处不在的影响和日益依赖引起了对漏洞的担忧,强调了对软件安全性的需求。模糊测试是一种动态分析软件测试技术,其包括向被测系统(SUT)提供错误的输入数据并观察其行为。具体针对黑盒式RESTful API测试,最近的文献尝试使用启发式方法自动化这种技术,以进行输入搜索,并使用HTTP响应状态码进行分类。然而,大多数方法并不跟踪代码覆盖率,这对验证解决方案很重要。本文介绍了一种利用强化学习(RL)进行漏洞检测的黑盒式RESTful API模糊测试工具。该模糊器通过OpenAPI规范(OAS)文件和场景文件运行,场景文件包含与SUT通信的信息和要测试的功能序列。为了评估其有效性,该工具在Petstore API上进行了测试。该工具发现了总共六个独特的漏洞,并实现了55%的代码覆盖率。
更新时间: 2024-07-19 14:43:35
领域: cs.SE,cs.AI,cs.LG
Stable Audio Open
Open generative models are vitally important for the community, allowing for fine-tunes and serving as baselines when presenting new models. However, most current text-to-audio models are private and not accessible for artists and researchers to build upon. Here we describe the architecture and training process of a new open-weights text-to-audio model trained with Creative Commons data. Our evaluation shows that the model's performance is competitive with the state-of-the-art across various metrics. Notably, the reported FDopenl3 results (measuring the realism of the generations) showcase its potential for high-quality stereo sound synthesis at 44.1kHz.
Updated: 2024-07-19 14:40:23
标题: 稳定的音频开放
摘要: 开放生成模型对社区至关重要,可以进行微调,并在推出新模型时作为基线。然而,目前大多数文本到音频模型是私有的,无法供艺术家和研究人员构建。在这里,我们描述了一个新的开放权重文本到音频模型的架构和训练过程,该模型是通过使用创意共享数据进行训练的。我们的评估表明,该模型在各种指标上的表现与最先进技术相媲美。值得注意的是,报告的FDopenl3结果(衡量生成的逼真度)展示了其在44.1kHz的高质量立体声合成的潜力。
更新时间: 2024-07-19 14:40:23
领域: cs.SD,cs.AI,eess.AS
Rapid Biomedical Research Classification: The Pandemic PACT Advanced Categorisation Engine
This paper introduces the Pandemic PACT Advanced Categorisation Engine (PPACE) along with its associated dataset. PPACE is a fine-tuned model developed to automatically classify research abstracts from funded biomedical projects according to WHO-aligned research priorities. This task is crucial for monitoring research trends and identifying gaps in global health preparedness and response. Our approach builds on human-annotated projects, which are allocated one or more categories from a predefined list. A large language model is then used to generate `rationales' explaining the reasoning behind these annotations. This augmented data, comprising expert annotations and rationales, is subsequently used to fine-tune a smaller, more efficient model. Developed as part of the Pandemic PACT project, which aims to track and analyse research funding and clinical evidence for a wide range of diseases with outbreak potential, PPACE supports informed decision-making by research funders, policymakers, and independent researchers. We introduce and release both the trained model and the instruction-based dataset used for its training. Our evaluation shows that PPACE significantly outperforms its baselines. The release of PPACE and its associated dataset offers valuable resources for researchers in multilabel biomedical document classification and supports advancements in aligning biomedical research with key global health priorities.
Updated: 2024-07-19 14:28:26
标题: 快速生物医学研究分类:疫情PACT先进分类引擎
摘要: 本文介绍了流行病PACT高级分类引擎(PPACE)及其相关数据集。 PPACE是一个经过优化的模型,旨在根据世界卫生组织的研究优先事项自动分类资助的生物医学项目的研究摘要。这项任务对监测研究趋势并识别全球卫生应急准备和响应中的差距至关重要。我们的方法基于人工注释的项目,这些项目被分配为一个或多个预定义列表中的类别。然后使用大型语言模型生成解释这些注释背后推理的“原理”。随后使用包含专家注释和原理的增强数据来对一个更小、更高效的模型进行优化。作为旨在跟踪和分析具有爆发潜力的多种疾病的研究资助和临床证据的流行病PACT项目的一部分,PPACE支持研究资助机构、政策制定者和独立研究人员做出知情决策。我们介绍并发布了训练模型和用于训练的基于指令的数据集。我们的评估表明,PPACE明显优于其基线。PPACE及其相关数据集的发布为多标签生物医学文档分类的研究人员提供了宝贵资源,并支持将生物医学研究与关键全球卫生优先事项对齐的进展。
更新时间: 2024-07-19 14:28:26
领域: cs.CL,cs.AI,68T50,I.2.7
LLMs left, right, and center: Assessing GPT's capabilities to label political bias from web domains
This research investigates whether OpenAI's GPT-4, a state-of-the-art large language model, can accurately classify the political bias of news sources based solely on their URLs. Given the subjective nature of political labels, third-party bias ratings like those from Ad Fontes Media, AllSides, and Media Bias/Fact Check (MBFC) are often used in research to analyze news source diversity. This study aims to determine if GPT-4 can replicate these human ratings on a seven-degree scale ("far-left" to "far-right"). The analysis compares GPT-4's classifications against MBFC's, and controls for website popularity using Open PageRank scores. Findings reveal a high correlation ($\text{Spearman's } \rho = .89$, $n = 5,877$, $p < 0.001$) between GPT-4's and MBFC's ratings, indicating the model's potential reliability. However, GPT-4 abstained from classifying approximately $\frac{2}{3}$ of the dataset, particularly less popular and less biased sources. The study also identifies a slight leftward skew in GPT-4's classifications compared to MBFC's. The analysis suggests that while GPT-4 can be a scalable, cost-effective tool for political bias classification of news websites, its use should complement human judgment to mitigate biases. Further research is recommended to explore the model's performance across different settings, languages, and additional datasets.
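The headline statistic is easy to reproduce in form: a short SciPy sketch computing Spearman's rho between two ordinal label sets on the same domains. The data below is simulated for illustration, not the study's.

```python
import numpy as np
from scipy.stats import spearmanr

# Toy reproduction of the reported analysis: compare two sets of ordinal bias
# labels on the same domains (7-point scale, -3 = far-left ... +3 = far-right).
rng = np.random.default_rng(42)
mbfc = rng.integers(-3, 4, size=500)                     # reference ratings
noise = rng.integers(-1, 2, size=500)                    # simulated disagreement
gpt4 = np.clip(mbfc + noise, -3, 3)                      # simulated GPT-4 labels

rho, p = spearmanr(gpt4, mbfc)
print(f"Spearman's rho = {rho:.2f}, p = {p:.2g}")        # paper reports rho = .89
```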
Updated: 2024-07-19 14:28:07
标题: 左中右的LLMs:评估GPT从网络域名标记政治偏见的能力
摘要: 这项研究调查了OpenAI的GPT-4,这是一种最先进的大型语言模型,是否能够准确地根据新闻来源的URL来分类政治偏见。鉴于政治标签的主观性质,第三方偏见评级,如Ad Fontes Media、AllSides和Media Bias/Fact Check(MBFC)的评级经常被用于研究分析新闻来源的多样性。本研究旨在确定GPT-4是否能够在七级标度上(从“极左”到“极右”)复制这些人类评级。该分析将GPT-4的分类与MBFC进行比较,并使用Open PageRank分数控制网站的流行度。研究结果显示,GPT-4的评级与MBFC的评级之间存在很高的相关性($\text{Spearman's } \rho = .89$, $n = 5,877$, $p < 0.001$),表明该模型具有潜在的可靠性。然而,GPT-4放弃对数据集中约$\frac{2}{3}$进行分类,特别是对较不受欢迎和较不偏见的来源。研究还发现,与MBFC相比,GPT-4的分类略微向左倾斜。分析表明,虽然GPT-4可以成为新闻网站政治偏见分类的可扩展、成本效益高的工具,但其使用应该与人类判断相结合,以减轻偏见。建议进一步研究探索该模型在不同环境、语言和额外数据集中的表现。
更新时间: 2024-07-19 14:28:07
领域: cs.CL,cs.AI,cs.CY
Blockchain in Healthcare: Implementing Hyperledger Fabric for Electronic Health Records at Frere Provincial Hospital
As healthcare systems worldwide continue to grapple with the challenges of interoperability, data security, and accessibility, integrating emerging technologies becomes imperative. This paper investigates the implementation of blockchain technology, specifically Hyperledger Fabric, for Electronic Health Records (EHR) management at Frere Hospital in the Eastern Cape province of South Africa. The paper examines the benefits and challenges of integrating blockchain into healthcare information systems. Hyperledger Fabric's modular architecture is harnessed to create a secure, transparent, and decentralized platform for storing, managing, and sharing EHRs among stakeholders. The study used a mixed-methods approach, integrating case studies and data collection through observation and informal questions, with the specific goal of understanding current record management methods and challenges. This method offers practical insights and validates the approach. The results demonstrate the role of blockchain in transforming healthcare, framed within a rigorous exploration and analysis. The findings of this study have broader implications for healthcare institutions seeking advanced solutions to address the persistent challenges in electronic health record management. Ultimately, the research underscores the transformative potential of blockchain technology in healthcare settings, fostering trust, security, and efficiency in the management of sensitive patient data.
Updated: 2024-07-19 14:27:55
标题: 区块链在医疗保健领域的应用:在弗雷尔省立医院实施超级账本面向电子健康记录
摘要: 随着全球各地的医疗系统继续应对互操作性、数据安全性和可访问性方面的挑战,整合新兴技术变得至关重要。本文研究了在南非东开普省弗雷尔医院实施区块链技术(特别是Hyperledger Fabric)用于电子健康记录(EHR)管理。本文探讨了将区块链整合到医疗信息系统中的益处和挑战。利用Hyperledger Fabric的模块化架构,创建了一个安全、透明和去中心化的平台,用于在利益相关方之间存储、管理和共享EHR。该研究采用了混合方法,通过观察和非正式提问集成案例研究和数据收集方法,旨在了解当前记录管理方法和挑战。这种方法提供了实用见解并验证了这种方法。结果展示了区块链在转变医疗保健领域中的作用,结合了严谨的探索和分析。本研究的发现对寻求解决电子健康记录管理中持久挑战的先进解决方案的医疗机构具有更广泛的影响。最终,研究强调了区块链技术在医疗环境中的转变潜力,促进了对敏感患者数据的信任、安全性和效率管理。
更新时间: 2024-07-19 14:27:55
领域: cs.CR,cs.ET
Do LLMs have Consistent Values?
Values are a basic driving force underlying human behavior. Large Language Models (LLM) technology is constantly improving towards human-like dialogue. However, little research has been done to study the values exhibited in text generated by LLMs. Here we study this question by turning to the rich literature on value structure in psychology. We ask whether LLMs exhibit the same value structure that has been demonstrated in humans, including the ranking of values, and correlation between values. We show that the results of this analysis strongly depend on how the LLM is prompted, and that under a particular prompting strategy (referred to as 'Value Anchoring') the agreement with human data is quite compelling. Our results serve both to improve our understanding of values in LLMs, as well as introduce novel methods for assessing consistency in LLM responses.
Updated: 2024-07-19 14:24:47
标题: LLM是否具有一致的价值观?
摘要: 价值观是人类行为的基本驱动力。大型语言模型(LLM)技术不断改进,朝着类似于人类对话的方向发展。然而,很少有研究探讨LLMs生成文本中展示的价值观。在这里,我们通过借鉴心理学中丰富的价值结构文献来研究这个问题。我们问LLMs是否展示出在人类中已经证明的相同的价值结构,包括价值的排名和价值之间的相关性。我们表明,这项分析的结果强烈依赖于LLM如何被提示,并且在一种特定的提示策略(称为“价值锚定”)下,与人类数据的一致性非常令人信服。我们的结果既有助于改进我们对LLMs中价值观的理解,也引入了评估LLM响应一致性的新方法。
更新时间: 2024-07-19 14:24:47
领域: cs.CL,cs.AI
Quantifying the value of positive transfer: An experimental case study
In traditional approaches to structural health monitoring, challenges associated with the availability of labelled data often arise. Population-based structural health monitoring seeks to overcome these challenges by leveraging data and information from similar structures via technologies such as transfer learning. The current paper demonstrates a methodology for quantifying the value of information transfer in the context of operation and maintenance decision-making. This demonstration, based on a population of laboratory-scale aircraft models, highlights the steps required to evaluate the expected value of information transfer, including similarity assessment and prediction of transfer efficacy. Once evaluated for a given population, the value of information transfer can be used to optimise transfer-learning strategies for newly-acquired target domains.
Updated: 2024-07-19 14:23:20
标题: 量化积极迁移的价值:一项实验案例研究
摘要: 在传统的结构健康监测方法中,通常会出现与标记数据的可用性相关的挑战。基于人口的结构健康监测通过利用来自类似结构的数据/信息,借助转移学习等技术来克服这些挑战。本文展示了在运营和维护决策过程中量化信息传递价值的方法论。这项演示基于一组实验室规模的飞机模型,突显了评估信息传递期望价值所需的步骤,包括相似性评估和传递效果预测。一旦为给定人口评估完毕,信息传递的价值可以用于优化新获得的目标域的转移学习策略。
更新时间: 2024-07-19 14:23:20
领域: cs.LG
Quantifying the Blockchain Trilemma: A Comparative Analysis of Algorand, Ethereum 2.0, and Beyond
Blockchain technology is essential for the digital economy and metaverse, supporting applications from decentralized finance to virtual assets. However, its potential is constrained by the "Blockchain Trilemma," which necessitates balancing decentralization, security, and scalability. This study evaluates and compares two leading proof-of-stake (PoS) systems, Algorand and Ethereum 2.0, against these critical metrics. Our research interprets existing indices to measure decentralization, evaluates scalability through transactional data, and assesses security by identifying potential vulnerabilities. Utilizing real-world data, we analyze each platform's strategies in a structured manner to understand their effectiveness in addressing trilemma challenges. The findings highlight each platform's strengths and propose general methodologies for evaluating key blockchain characteristics applicable to other systems. This research advances the understanding of blockchain technologies and their implications for the future digital economy. Data and code are available on GitHub as open source.
Updated: 2024-07-19 14:15:29
Domains: econ.GN,cs.CE,cs.CR,q-fin.CP,q-fin.EC,stat.CO
Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection
Visual relationship detection aims to identify objects and their relationships in images. Prior methods approach this task by adding separate relationship modules or decoders to existing object detection architectures. This separation increases complexity and hinders end-to-end training, which limits performance. We propose a simple and highly efficient decoder-free architecture for open-vocabulary visual relationship detection. Our model consists of a Transformer-based image encoder that represents objects as tokens and models their relationships implicitly. To extract relationship information, we introduce an attention mechanism that selects object pairs likely to form a relationship. We provide a single-stage recipe to train this model on a mixture of object and relationship detection data. Our approach achieves state-of-the-art relationship detection performance on Visual Genome and on the large-vocabulary GQA benchmark at real-time inference speeds. We provide ablations, real-world qualitative examples, and analyses of zero-shot performance.
Updated: 2024-07-19 14:07:25
Domains: cs.CV,cs.CL,cs.LG,cs.RO
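To make the pair-selection idea concrete, here is a minimal PyTorch sketch of a decoder-free relationship head that scores every ordered pair of object tokens and classifies only the most likely pairs. The layer names, the dot-product pair scorer, and the top-k rule are illustrative assumptions, not the authors' exact design:

```python
import torch
import torch.nn as nn

class PairwiseRelationshipHead(nn.Module):
    """Score ordered (subject, object) token pairs, then classify the
    predicate of the top-scoring pairs. A sketch, not the paper's model."""
    def __init__(self, dim: int, num_predicates: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)            # subject projection
        self.k = nn.Linear(dim, dim)            # object projection
        self.cls = nn.Linear(2 * dim, num_predicates)

    def forward(self, tokens: torch.Tensor, top_k: int = 100):
        n = tokens.shape[0]                     # (n, dim) object tokens
        scores = self.q(tokens) @ self.k(tokens).t() / tokens.shape[-1] ** 0.5
        flat = scores.flatten().topk(min(top_k, n * n)).indices
        subj, obj = flat // n, flat % n         # recover pair indices
        pair_feats = torch.cat([tokens[subj], tokens[obj]], dim=-1)
        return subj, obj, self.cls(pair_feats)  # predicate logits per pair
```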
Modality-Order Matters! A Novel Hierarchical Feature Fusion Method for CoSAm: A Code-Switched Autism Corpus
Autism Spectrum Disorder (ASD) is a complex neuro-developmental condition, presenting a spectrum of difficulties in social interaction, communication, and the expression of repetitive behaviors across situations. The disorder's increasing prevalence underscores its importance as a major public health concern and the need for comprehensive research initiatives to advance our understanding of the disorder and its early detection methods. This study introduces a novel hierarchical feature fusion method aimed at enhancing the early detection of ASD in children through the analysis of code-switched speech (English and Hindi). Employing advanced audio processing techniques, the research integrates acoustic, paralinguistic, and linguistic information using Transformer Encoders. This fusion strategy is designed to improve classification robustness and accuracy, crucial for early and precise ASD identification. The methodology involves collecting a code-switched speech corpus, CoSAm, from children diagnosed with ASD and a matched control group. The dataset comprises 61 voice recordings from 30 children diagnosed with ASD and 31 from neurotypical children, aged between 3 and 13 years, for a total of 159.75 minutes of voice recordings. The feature analysis focuses on MFCCs and extensive statistical attributes to capture speech pattern variability and complexity. The best model performance, an accuracy of 98.75%, is achieved with a hierarchical fusion technique that first combines acoustic and linguistic features and then incorporates paralinguistic features.
Updated: 2024-07-19 14:06:01
Domains: cs.LG
Panoptic Segmentation of Mammograms with Text-To-Image Diffusion Model
Mammography is crucial for breast cancer surveillance and early diagnosis. However, analyzing mammography images is a demanding task for radiologists, who often review hundreds of mammograms daily, leading to overdiagnosis and overtreatment. Computer-Aided Diagnosis (CAD) systems have been developed to assist in this process, but their capabilities, particularly in lesion segmentation, have remained limited. With contemporary advances in deep learning, their performance may be improved. Recently, vision-language diffusion models emerged, demonstrating outstanding performance in image generation and transferability to various downstream tasks. We aim to harness their capabilities for breast lesion segmentation in a panoptic setting, which encompasses both semantic and instance-level predictions. Specifically, we propose leveraging pretrained features from a Stable Diffusion model as inputs to a state-of-the-art panoptic segmentation architecture, resulting in accurate delineation of individual breast lesions. To bridge the gap between the natural and medical imaging domains, we incorporated a mammography-specific MAM-E diffusion model and BiomedCLIP image and text encoders into this framework. We evaluated our approach on two recently published mammography datasets, CDD-CESM and VinDr-Mammo. For the instance segmentation task, we obtained 40.25 AP0.1 and 46.82 AP0.05, as well as 25.44 PQ0.1 and 26.92 PQ0.05. For the semantic segmentation task, we achieved Dice scores of 38.86 and 40.92, respectively.
Updated: 2024-07-19 14:04:05
Domains: cs.CV,cs.AI
Towards Responsible Development of Generative AI for Education: An Evaluation-Driven Approach
A major challenge facing the world is the provision of equitable and universal access to quality education. Recent advances in generative AI (gen AI) have created excitement about the potential of new technologies to offer a personal tutor for every learner and a teaching assistant for every teacher. The full extent of this dream, however, has not yet materialised. We argue that this is primarily due to the difficulties with verbalising pedagogical intuitions into gen AI prompts and the lack of good evaluation practices, reinforced by the challenges in defining excellent pedagogy. Here we present our work collaborating with learners and educators to translate high level principles from learning science into a pragmatic set of seven diverse educational benchmarks, spanning quantitative, qualitative, automatic and human evaluations; and to develop a new set of fine-tuning datasets to improve the pedagogical capabilities of Gemini, introducing LearnLM-Tutor. Our evaluations show that LearnLM-Tutor is consistently preferred over a prompt tuned Gemini by educators and learners on a number of pedagogical dimensions. We hope that this work can serve as a first step towards developing a comprehensive educational evaluation framework, and that this can enable rapid progress within the AI and EdTech communities towards maximising the positive impact of gen AI in education.
Updated: 2024-07-19 14:03:41
Domains: cs.CY,cs.AI,cs.LG
Truly No-Regret Learning in Constrained MDPs
Constrained Markov decision processes (CMDPs) are a common way to model safety constraints in reinforcement learning. State-of-the-art methods for efficiently solving CMDPs are based on primal-dual algorithms. For these algorithms, all currently known regret bounds allow for error cancellations -- one can compensate for a constraint violation in one round with a strict constraint satisfaction in another. This makes the online learning process unsafe since it only guarantees safety for the final (mixture) policy but not during learning. As Efroni et al. (2020) pointed out, it is an open question whether primal-dual algorithms can provably achieve sublinear regret if we do not allow error cancellations. In this paper, we give the first affirmative answer. We first generalize a result on last-iterate convergence of regularized primal-dual schemes to CMDPs with multiple constraints. Building upon this insight, we propose a model-based primal-dual algorithm to learn in an unknown CMDP. We prove that our algorithm achieves sublinear regret without error cancellations.
Updated: 2024-07-19 14:00:43
Domains: cs.LG,stat.ML
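To state the gap precisely: write $c(\pi_k)$ for the expected constraint cost of the policy played in episode $k$ and $\bar{c}$ for the constraint threshold (notation assumed here, following standard CMDP conventions). The usual and the cancellation-free constraint regrets are

\[
R(K) = \sum_{k=1}^{K} \big( c(\pi_k) - \bar{c} \big)
\qquad \text{versus} \qquad
R_{+}(K) = \sum_{k=1}^{K} \big[ c(\pi_k) - \bar{c} \big]_{+},
\]

with $[x]_{+} = \max(x, 0)$. Bounding $R(K)$ lets a violation in one episode be offset by strict satisfaction in another; the paper's contribution is a sublinear bound on $R_{+}(K)$, where no such offsetting is possible.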
Towards Interpretable Sequence Continuation: Analyzing Shared Circuits in Large Language Models
While transformer models exhibit strong capabilities on linguistic tasks, their complex architectures make them difficult to interpret. Recent work has aimed to reverse engineer transformer models into human-readable representations called circuits that implement algorithmic functions. We extend this research by analyzing and comparing circuits for similar sequence continuation tasks, which include increasing sequences of Arabic numerals, number words, and months. By applying circuit interpretability analysis, we identify a key sub-circuit in both GPT-2 Small and Llama-2-7B responsible for detecting sequence members and for predicting the next member in a sequence. Our analysis reveals that semantically related sequences rely on shared circuit subgraphs with analogous roles. Additionally, we show that this sub-circuit has effects on various math-related prompts, such as on intervaled circuits, Spanish number word and months continuation, and natural language word problems. Overall, documenting shared computational structures enables better model behavior predictions, identification of errors, and safer editing procedures. This mechanistic understanding of transformers is a critical step towards building more robust, aligned, and interpretable language models.
Updated: 2024-07-19 13:57:52
Domains: cs.CL,cs.AI,cs.LG
Joint or Disjoint: Mixing Training Regimes for Early-Exit Models
Early exits are an important efficiency mechanism for deep neural networks, allowing the forward pass to terminate before processing through all of the network's layers. By halting inference early for less complex inputs that reach high confidence, early exits significantly reduce the amount of computation required. Early-exit methods add trainable internal classifiers, which makes the training process more intricate. However, there is no consistent verification of training approaches for early-exit methods, and no unified scheme for training such models. Most early-exit methods employ a training strategy that either trains the backbone network and the exit heads simultaneously or trains the exit heads separately. We propose a training approach in which the backbone is initially trained on its own, followed by a phase in which both the backbone and the exit heads are trained together. Thus, we advocate organizing early-exit training strategies into three distinct categories, and we validate them for their performance and efficiency. In this benchmark, we perform both theoretical and empirical analyses of early-exit training regimes. We study the methods in terms of information flow, loss landscape, and numerical rank of activations, and gauge the suitability of each regime for various architectures and datasets.
Updated: 2024-07-19 13:56:57
Domains: cs.LG
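The three regime families can be phrased as a single toy training step. `model.features` returning the per-exit intermediate activations (with the final representation last) is a hypothetical API used only for illustration:

```python
import torch

def train_step(model, heads, batch, optimizer, criterion,
               train_backbone: bool, train_heads: bool):
    # Toy early-exit step. Freezing the backbone via requires_grad_(False)
    # is assumed during phases where only the heads are trained.
    x, y = batch
    feats = model.features(x)                 # hypothetical per-exit features
    loss = torch.zeros((), device=x.device)
    if train_backbone:
        loss = loss + criterion(model.classify(feats[-1]), y)
    if train_heads:
        loss = loss + sum(criterion(h(f), y) for h, f in zip(heads, feats))
    optimizer.zero_grad(); loss.backward(); optimizer.step()

# The three categories, in these terms (names ours):
#   joint:    backbone and heads trained together from the start;
#   disjoint: backbone alone first, then heads with the backbone frozen;
#   mixed:    backbone alone first, then a phase training both together.
```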
Cognitive Bias in High-Stakes Decision-Making with LLMs
Large language models (LLMs) offer significant potential as tools to support an expanding range of decision-making tasks. Given their training on human (created) data, LLMs have been shown to inherit societal biases against protected groups, as well as be subject to bias functionally resembling cognitive bias. Human-like bias can impede fair and explainable decisions made with LLM assistance. Our work introduces BiasBuster, a framework designed to uncover, evaluate, and mitigate cognitive bias in LLMs, particularly in high-stakes decision-making tasks. Inspired by prior research in psychology and cognitive science, we develop a dataset containing 16,800 prompts to evaluate different cognitive biases (e.g., prompt-induced, sequential, inherent). We test various bias mitigation strategies and propose a novel method that uses LLMs to debias their own prompts. Our analysis provides a comprehensive picture of the presence and effects of cognitive bias across commercial and open-source models. We demonstrate that our self-help debiasing effectively mitigates model answers that display patterns akin to human cognitive bias without having to manually craft examples for each bias.
Updated: 2024-07-19 13:47:15
Domains: cs.AI,cs.CL
EmoCAM: Toward Understanding What Drives CNN-based Emotion Recognition
Convolutional Neural Networks are particularly suited for image analysis tasks, such as Image Classification, Object Recognition or Image Segmentation. Like all Artificial Neural Networks, however, they are "black box" models, and suffer from poor explainability. This work is concerned with the specific downstream task of Emotion Recognition from images, and proposes a framework that combines CAM-based techniques with Object Detection on a corpus level to better understand on which image cues a particular model, in our case EmoNet, relies to assign a specific emotion to an image. We demonstrate that the model mostly focuses on human characteristics, but also explore the pronounced effect of specific image modifications.
Updated: 2024-07-19 13:47:02
Domains: cs.CV,cs.AI
How to Engage Your Readers? Generating Guiding Questions to Promote Active Reading
Using questions in written text is an effective strategy to enhance readability. However, what makes an active reading question good, what the linguistic role of these questions is, and what is their impact on human reading remains understudied. We introduce GuidingQ, a dataset of 10K in-text questions from textbooks and scientific articles. By analyzing the dataset, we present a comprehensive understanding of the use, distribution, and linguistic characteristics of these questions. Then, we explore various approaches to generate such questions using language models. Our results highlight the importance of capturing inter-question relationships and the challenge of question position identification in generating these questions. Finally, we conduct a human study to understand the implication of such questions on reading comprehension. We find that the generated questions are of high quality and are almost as effective as human-written questions in terms of improving readers' memorization and comprehension.
Updated: 2024-07-19 13:42:56
Domains: cs.CL,cs.AI
Complementary Learning for Real-World Model Failure Detection
In real-world autonomous driving, deep learning models can experience performance degradation due to distributional shifts between the training data and the driving conditions encountered. As is typical in machine learning, it is difficult to acquire a large and potentially representative labeled test set to validate models in preparation for deployment in the wild. In this work, we introduce complementary learning, where we use learned characteristics from different training paradigms to detect model errors. We demonstrate our approach by learning semantic and predictive motion labels in point clouds in a supervised and self-supervised manner and detect and classify model discrepancies subsequently. We perform a large-scale qualitative analysis and present LidarCODA, the first dataset with labeled anomalies in lidar point clouds, for an extensive quantitative analysis.
Updated: 2024-07-19 13:36:35
Domains: cs.RO,cs.AI
Multi-Source and Test-Time Domain Adaptation on Multivariate Signals using Spatio-Temporal Monge Alignment
Machine learning applications on signals such as computer vision or biomedical data often face significant challenges due to the variability that exists across hardware devices or session recordings. This variability poses a Domain Adaptation (DA) problem, as training and testing data distributions often differ. In this work, we propose Spatio-Temporal Monge Alignment (STMA) to mitigate these variabilities. This Optimal Transport (OT) based method adapts the cross-power spectrum density (cross-PSD) of multivariate signals by mapping them to the Wasserstein barycenter of source domains (multi-source DA). Predictions for new domains can be done with a filtering without the need for retraining a model with source data (test-time DA). We also study and discuss two special cases of the method, Temporal Monge Alignment (TMA) and Spatial Monge Alignment (SMA). Non-asymptotic concentration bounds are derived for the mappings estimation, which reveals a bias-plus-variance error structure with a variance decay rate of $\mathcal{O}(n_\ell^{-1/2})$ with $n_\ell$ the signal length. This theoretical guarantee demonstrates the efficiency of the proposed computational schema. Numerical experiments on multivariate biosignals and image data show that STMA leads to significant and consistent performance gains between datasets acquired with very different settings. Notably, STMA is a pre-processing step complementary to state-of-the-art deep learning methods.
Updated: 2024-07-19 13:33:38
Domains: cs.LG,eess.SP
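For intuition, the temporal special case (TMA) admits a compact sketch: each signal is filtered so that its power spectral density (PSD) matches the Wasserstein barycenter of the source PSDs, which for Gaussian signals is the squared average of the square-root PSDs. The filter length and the Welch estimator below are arbitrary illustrative choices:

```python
import numpy as np
from scipy.signal import welch, firwin2

def temporal_monge_align(signals, fs=1.0, nfft=256, numtaps=65):
    """Map each 1-D signal's PSD to the barycenter PSD via a linear filter.
    A sketch of the temporal case only; the full method is spatio-temporal."""
    freqs, _ = welch(signals[0], fs=fs, nperseg=nfft)
    psds = [welch(s, fs=fs, nperseg=nfft)[1] for s in signals]
    bary = np.mean([np.sqrt(p) for p in psds], axis=0) ** 2   # W2 barycenter
    aligned = []
    for s, p in zip(signals, psds):
        gain = np.sqrt(bary / np.maximum(p, 1e-12))  # Monge map between PSDs
        h = firwin2(numtaps, freqs / (fs / 2), gain) # FIR filter with that gain
        aligned.append(np.convolve(s, h, mode="same"))
    return aligned
```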
Theoretical Analysis on Block Time Distributions in Byzantine Fault-Tolerant Consensus Blockchains
Some blockchain networks employ a distributed consensus algorithm featuring Byzantine fault tolerance. Notably, certain public chains, such as Cosmos and Tezos, which operate on a proof-of-stake mechanism, have adopted this algorithm. While it is commonly assumed that these blockchains maintain a nearly constant block creation time, empirical analysis reveals fluctuations in this interval; this phenomenon has received limited attention. In this paper, we propose a mathematical model to account for the processes of block propagation and validation within Byzantine fault-tolerant consensus blockchains, aiming to theoretically analyze the probability distribution of block time. First, we propose stochastic processes governing the broadcasting communications among validator nodes. Consequently, we theoretically demonstrate that the probability distribution of broadcast time among validator nodes adheres to the Gumbel distribution. This finding indicates that the distribution of block time typically arises from convolving multiple Gumbel distributions. Additionally, we derive an approximate formula for the block time distribution suitable for data analysis purposes. By fitting this approximation to real-world block time data, we demonstrate the consistent estimation of block time distribution parameters.
Updated: 2024-07-19 13:30:46
Domains: cs.DC,cs.CR,physics.data-an
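The key distributional claim is easy to state. If a broadcast round completes only once the slowest of many validators has been reached, extreme-value theory suggests a Gumbel law for the round's duration $T$, with location $\mu$ and scale $\beta$ fitted per network (parameterization assumed here):

\[
\Pr(T \le t) = \exp\!\left( -\exp\!\left( -\frac{t - \mu}{\beta} \right) \right),
\]

so a block time composed of several successive broadcast and validation rounds is distributed as the convolution of such Gumbel densities, which is the structure the paper's approximate formula exploits.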
CoVoSwitch: Machine Translation of Synthetic Code-Switched Text Based on Intonation Units
Multilingual code-switching research is often hindered by the lack and linguistically biased status of available datasets. To expand language representation, we synthesize code-switching data by replacing intonation units detected through PSST, a speech segmentation model fine-tuned from OpenAI's Whisper, using a speech-to-text translation dataset, CoVoST 2. With our dataset, CoVoSwitch, spanning 13 languages, we evaluate the code-switching translation performance of two multilingual translation models, M2M-100 418M and NLLB-200 600M. We reveal that the inclusion of code-switching units results in higher translation performance than monolingual settings and that models are better at code-switching translation into English than non-English. Further, low-resource languages gain most from integration of code-switched units when translating into English but much less when translating into non-English. Translations into low-resource languages also perform worse than even raw code-switched inputs. We find that systems excel at copying English tokens but struggle with non-English tokens, that the off-target problem in monolingual settings is also relevant in code-switching settings, and that models hallucinate in code-switching translation by introducing words absent in both of the original source sentences. CoVoSwitch and code are available at https://github.com/sophiayk20/covoswitch.
Updated: 2024-07-19 13:26:35
Domains: cs.CL,cs.AI,eess.AS
Save It All: Enabling Full Parameter Tuning for Federated Large Language Models via Cycle Block Gradient Descent
The advent of large language models (LLMs) has revolutionized the deep learning paradigm, yielding impressive results across a wide array of tasks. However, the pre-training or fine-tuning of LLMs within a federated learning (FL) framework poses substantial challenges, including considerable computational and memory resource demands, as well as communication bottlenecks between servers and clients. Existing solutions either make the unrealistic assumption that the entire model is exchanged for training, or apply parameter-effective fine-tuning methods from centralized learning to train LLMs in FL which tend to underperform during training or fine-tuning stages due to the limited search subspace of parameter updating. In this paper, we introduce a novel method for the efficient training and fine-tuning of LLMs in FL, with minimal resource consumption. Our approach, termed FedCyBGD, utilizes Cycle Block Gradient Descent to periodically update the model. In particular, we design a compression scheme for FedCyBGD, aiming to further decrease the model download cost. It enables full parameter training in FL with only selected block updates and uploads, thereby reducing communication, computation, and memory costs. Our method achieves state-of-the-art performance for FL LLM training, while significantly reducing associated costs. Codes are provided here.
Updated: 2024-07-19 13:22:02
Domains: cs.LG
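A toy sketch of the cycle-block idea: in each round only one parameter block is trainable, and only that block is uploaded and averaged. The client API (`download`, `loss`) and the name-based block partition are assumptions for illustration, not the paper's implementation:

```python
import torch

def fedcybgd_round(global_model, clients, blocks, round_idx, local_steps, lr):
    """One round of cycle block gradient descent (sketch). `blocks` is a list
    of sets of parameter names; blocks are visited cyclically over rounds."""
    active = blocks[round_idx % len(blocks)]
    updates = []
    for client in clients:
        local = client.download(global_model)          # hypothetical API
        for name, p in local.named_parameters():
            p.requires_grad_(name in active)           # train one block only
        opt = torch.optim.SGD((p for n, p in local.named_parameters()
                               if n in active), lr=lr)
        for _ in range(local_steps):
            loss = client.loss(local)                  # hypothetical API
            opt.zero_grad(); loss.backward(); opt.step()
        updates.append({n: p.detach() for n, p in local.named_parameters()
                        if n in active})               # upload the block only
    with torch.no_grad():                              # average the active block
        for n, p in global_model.named_parameters():
            if n in active:
                p.copy_(torch.stack([u[n] for u in updates]).mean(0))
```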
Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability
World models can foresee the outcomes of different actions, which is of paramount importance for autonomous driving. Nevertheless, existing driving world models still have limitations in generalization to unseen environments, prediction fidelity of critical details, and action controllability for flexible application. In this paper, we present Vista, a generalizable driving world model with high fidelity and versatile controllability. Based on a systematic diagnosis of existing methods, we introduce several key ingredients to address these limitations. To accurately predict real-world dynamics at high resolution, we propose two novel losses to promote the learning of moving instances and structural information. We also devise an effective latent replacement approach to inject historical frames as priors for coherent long-horizon rollouts. For action controllability, we incorporate a versatile set of controls from high-level intentions (command, goal point) to low-level maneuvers (trajectory, angle, and speed) through an efficient learning strategy. After large-scale training, the capabilities of Vista can seamlessly generalize to different scenarios. Extensive experiments on multiple datasets show that Vista outperforms the most advanced general-purpose video generator in over 70% of comparisons and surpasses the best-performing driving world model by 55% in FID and 27% in FVD. Moreover, for the first time, we utilize the capacity of Vista itself to establish a generalizable reward for real-world action evaluation without accessing the ground truth actions.
Updated: 2024-07-19 13:20:05
Domains: cs.CV,cs.AI
PACCOR4ESP: Embedded Device Security Attestation using Platform Attribute Certificates
Verifying the integrity of embedded device characteristics is required to ensure secure operation of a device. One central challenge is to securely extract and store device-specific configurations for future verification. Existing device attestation schemes suffer from notable limitations, including a lack of standardization and a failure to encompass all hardware and software aspects inherent to a platform. This paper proposes an extension of the NSA Cybersecurity Directorate's Platform Attribute Certificate Creator (PACCOR) for the ESP32, a widely-used microcontroller series. Platform Attribute Certificates store device characteristics as per the Trusted Computing Group's Platform Certificate Profile. As of today, there is little research on hybrid attestation schemes utilizing Platform Attribute Certificates on embedded devices, which this work addresses. This paper presents a collection of attacks that can be detected using PACCOR4ESP. The toolkit extracts security-relevant information from an ESP32-S3, such as the firmware hash, bootloader hash, GPIO pin configuration, and a reference to the endorsement key of the secure element, and automatically embeds it into a Platform Attribute Certificate. Lastly, this work shows how PACCOR4ESP can be integrated with existing embedded device attestation frameworks, such as RAS, CRAFT, and SEDA.
Updated: 2024-07-19 13:17:00
Domains: cs.CR
Are you still on track!? Catching LLM Task Drift with Activations
Large Language Models (LLMs) are routinely used in retrieval-augmented applications to orchestrate tasks and process inputs from users and other sources. These inputs, even in a single LLM interaction, can come from a variety of sources, of varying trustworthiness and provenance. This opens the door to prompt injection attacks, where the LLM receives and acts upon instructions from supposedly data-only sources, thus deviating from the user's original instructions. We define this as task drift, and we propose to catch it by scanning and analyzing the LLM's activations. We compare the LLM's activations before and after processing the external input in order to detect whether this input caused instruction drift. We develop two probing methods and find that simply using a linear classifier can detect drift with near perfect ROC AUC on an out-of-distribution test set. We show that this approach generalizes surprisingly well to unseen task domains, such as prompt injections, jailbreaks, and malicious instructions, without being trained on any of these attacks. Our setup does not require any modification of the LLM (e.g., fine-tuning) or any text generation, thus maximizing deployability and cost efficiency and avoiding reliance on unreliable model output. To foster future research on activation-based task inspection, decoding, and interpretability, we will release our large-scale TaskTracker toolkit, comprising a dataset of over 500K instances, representations from 5 SoTA language models, and inspection tools.
Updated: 2024-07-19 13:07:25
Domains: cs.CR,cs.CL,cs.CY
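The probing idea reduces to a very small amount of code: collect activations at a chosen layer before and after the external text is processed, and fit a linear classifier on the difference. The hidden-state extraction and layer choice are stand-ins here, not the paper's exact setup:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def fit_drift_probe(clean_pairs, drifted_pairs):
    """Each pair is (activation_before, activation_after) as 1-D arrays.
    Label 1 marks conversations whose external input carried instructions."""
    pairs = clean_pairs + drifted_pairs
    X = np.stack([after - before for before, after in pairs])
    y = np.array([0] * len(clean_pairs) + [1] * len(drifted_pairs))
    return LogisticRegression(max_iter=1000).fit(X, y)

# Evaluation on held-out pairs:
#   auc = roc_auc_score(y_test, probe.decision_function(X_test))
```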
Learn and Don't Forget: Adding a New Language to ASR Foundation Models
Foundation ASR models often support many languages, e.g. 100 languages in Whisper. However, there has been limited work on integrating an additional, typically low-resource, language, while maintaining performance on the original language set. Fine-tuning, while simple, may degrade the accuracy of the original set. We compare three approaches that exploit adaptation parameters: soft language code tuning, train only the language code; soft prompt tuning, train prepended tokens; and LoRA where a small set of additional parameters are optimised. Elastic Weight Consolidation (EWC) offers an alternative compromise with the potential to maintain performance in specific target languages. Results show that direct fine-tuning yields the best performance for the new language but degrades existing language capabilities. EWC can address this issue for specific languages. If only adaptation parameters are used, the language capabilities are maintained but at the cost of performance in the new language.
Updated: 2024-07-19 13:07:06
Domains: eess.AS,cs.CL,cs.LG,cs.SD
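The EWC alternative mentioned above is the classic quadratic penalty. Assuming a diagonal Fisher estimate $F_i$ computed on the original language set and pretrained weights $\theta^{*}$, fine-tuning on the new language minimizes

\[
\mathcal{L}(\theta) = \mathcal{L}_{\text{new}}(\theta) + \frac{\lambda}{2} \sum_i F_i \big( \theta_i - \theta_i^{*} \big)^2 ,
\]

where larger $\lambda$ better preserves the original languages at the cost of plasticity on the new one.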
How to Blend Concepts in Diffusion Models
For the last decade, there has been a push to use multi-dimensional (latent) spaces to represent concepts; and yet how to manipulate these concepts or reason with them remains largely unclear. Some recent methods exploit multiple latent representations and their connection, making this research question even more entangled. Our goal is to understand how operations in the latent space affect the underlying concepts. To that end, we explore the task of concept blending through diffusion models. Diffusion models are based on a connection between a latent representation of textual prompts and a latent space that enables image reconstruction and generation. This task allows us to try different text-based combination strategies, and evaluate easily through a visual analysis. Our conclusion is that concept blending through space manipulation is possible, although the best strategy depends on the context of the blend.
Updated: 2024-07-19 13:05:57
Domains: cs.CV,cs.AI
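One of the simplest text-based combination strategies is linear interpolation of prompt embeddings before conditioning the denoiser. `encode` and `sample` below stand in for a text encoder and a full diffusion sampling loop; they are placeholders, not a specific library API:

```python
import torch

def blend_concepts(encode, sample, prompt_a, prompt_b, alpha=0.5):
    """Generate an image conditioned on a blend of two prompts (sketch).
    alpha=0 reproduces prompt_a; alpha=1 reproduces prompt_b."""
    e_a, e_b = encode(prompt_a), encode(prompt_b)
    e_mix = torch.lerp(e_a, e_b, alpha)   # (1 - alpha) * e_a + alpha * e_b
    return sample(e_mix)

# e.g. blend_concepts(encode, sample, "a lion", "an owl", alpha=0.5)
```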
Does Refusal Training in LLMs Generalize to the Past Tense?
Refusal training is widely used to prevent LLMs from generating harmful, undesirable, or illegal outputs. We reveal a curious generalization gap in the current refusal training approaches: simply reformulating a harmful request in the past tense (e.g., "How to make a Molotov cocktail?" to "How did people make a Molotov cocktail?") is often sufficient to jailbreak many state-of-the-art LLMs. We systematically evaluate this method on Llama-3 8B, Claude-3.5 Sonnet, GPT-3.5 Turbo, Gemma-2 9B, Phi-3-Mini, GPT-4o mini, GPT-4o, and R2D2 models using GPT-3.5 Turbo as a reformulation model. For example, the success rate of this simple attack on GPT-4o increases from 1% using direct requests to 88% using 20 past tense reformulation attempts on harmful requests from JailbreakBench with GPT-4 as a jailbreak judge. Interestingly, we also find that reformulations in the future tense are less effective, suggesting that refusal guardrails tend to consider past historical questions more benign than hypothetical future questions. Moreover, our experiments on fine-tuning GPT-3.5 Turbo show that defending against past reformulations is feasible when past tense examples are explicitly included in the fine-tuning data. Overall, our findings highlight that the widely used alignment techniques -- such as SFT, RLHF, and adversarial training -- employed to align the studied models can be brittle and do not always generalize as intended. We provide code and jailbreak artifacts at https://github.com/tml-epfl/llm-past-tense.
Updated: 2024-07-19 13:03:01
Domains: cs.CL,cs.AI,cs.LG
LLM-ARC: Enhancing LLMs with an Automated Reasoning Critic
We introduce LLM-ARC, a neuro-symbolic framework designed to enhance the logical reasoning capabilities of Large Language Models (LLMs), by combining them with an Automated Reasoning Critic (ARC). LLM-ARC employs an Actor-Critic method where the LLM Actor generates declarative logic programs along with tests for semantic correctness, while the Automated Reasoning Critic evaluates the code, runs the tests and provides feedback on test failures for iterative refinement. Implemented using Answer Set Programming (ASP), LLM-ARC achieves a new state-of-the-art accuracy of 88.32% on the FOLIO benchmark which tests complex logical reasoning capabilities. Our experiments demonstrate significant improvements over LLM-only baselines, highlighting the importance of logic test generation and iterative self-refinement. We achieve our best result using a fully automated self-supervised training loop where the Actor is trained on end-to-end dialog traces with Critic feedback. We discuss potential enhancements and provide a detailed error analysis, showcasing the robustness and efficacy of LLM-ARC for complex natural language reasoning tasks.
Updated: 2024-07-19 12:59:11
Domains: cs.CL,cs.AI,cs.LO
Mixed-precision Neural Networks on RISC-V Cores: ISA extensions for Multi-Pumped Soft SIMD Operations
Recent advancements in quantization and mixed-precision approaches offer substantial opportunities to improve the speed and energy efficiency of Neural Networks (NN). Research has shown that individual parameters with varying low precision can attain accuracies comparable to full-precision counterparts. However, modern embedded microprocessors provide very limited support for mixed-precision NNs in terms of both Instruction Set Architecture (ISA) extensions and hardware design for efficient execution of mixed-precision operations, introducing several performance bottlenecks due to numerous instructions for data packing and unpacking, under-utilization of arithmetic units, etc. In this work, we bring together, for the first time, ISA extensions tailored to mixed-precision hardware optimizations, targeting energy-efficient DNN inference on leading RISC-V CPU architectures. To this end, we introduce a hardware-software co-design framework that enables cooperative hardware design, mixed-precision quantization, ISA extensions and inference in cycle-accurate emulations. At the hardware level, we first expand the ALU unit within our proof-of-concept micro-architecture to support configurable fine-grained mixed-precision arithmetic operations. Subsequently, we implement multi-pumping to minimize execution latency, with an additional soft SIMD optimization applied for 2-bit operations. At the ISA level, three distinct MAC instructions are encoded extending the RISC-V ISA and exposed up to the compiler level, each corresponding to a different mixed-precision operational mode. Our extensive experimental evaluation over widely used DNNs and datasets, such as CIFAR10 and ImageNet, demonstrates that our framework can achieve, on average, a 15x energy reduction for less than 1% accuracy loss, and outperforms the ISA-agnostic state-of-the-art RISC-V cores.
Updated: 2024-07-19 12:54:04
Domains: cs.AR,cs.AI
A Learning-based Adaptive Compliance Method for Symmetric Bi-manual Manipulation
Symmetric bi-manual manipulation is an essential skill in on-orbit operations due to its high load capacity. Previous works have applied compliant control to maintain the stability of manipulations. However, traditional methods have viewed motion planning and compliant control as two separate modules, which can lead to conflicts when the desired trajectory and impedance parameters change simultaneously in the presence of external forces and disturbances. Additionally, the joint usage of these two modules requires experts to manually adjust parameters. To achieve high efficiency while enhancing adaptability, we propose a novel Learning-based Adaptive Compliance algorithm (LAC) that improves the efficiency and robustness of symmetric bi-manual manipulation. First, the algorithm framework integrates desired trajectory generation and impedance-parameter adjustment under a unified framework to mitigate contradictions and improve efficiency. Second, we introduce a centralized Actor-Critic framework with LSTM networks preprocessing the force states, enhancing the synchronization of bi-manual manipulation. When evaluated in dual-arm peg-in-hole assembly experiments, our method outperforms baseline algorithms in terms of optimality and robustness.
Updated: 2024-07-19 12:53:52
Domains: cs.RO,cs.AI
Simple, unified analysis of Johnson-Lindenstrauss with applications
We present a simplified and unified analysis of the Johnson-Lindenstrauss (JL) lemma, a cornerstone of dimensionality reduction for managing high-dimensional data. Our approach simplifies understanding and unifies various constructions under the JL framework, including spherical, binary-coin, sparse JL, Gaussian, and sub-Gaussian models. This unification preserves the intrinsic geometry of data, essential for applications from streaming algorithms to reinforcement learning. We provide the first rigorous proof of the spherical construction's effectiveness and introduce a general class of sub-Gaussian constructions within this simplified framework. Central to our contribution is an innovative extension of the Hanson-Wright inequality to high dimensions, complete with explicit constants. By using simple yet powerful probabilistic tools and analytical techniques, such as an enhanced diagonalization process, our analysis solidifies the theoretical foundation of the JL lemma by removing an independence assumption and extends its practical applicability to contemporary algorithms.
Updated: 2024-07-19 12:49:42
Domains: stat.ML,cs.DS,cs.LG,math.PR
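For reference, the lemma under analysis states: for any $0 < \varepsilon < 1$ and any $n$ points $x_1, \dots, x_n \in \mathbb{R}^d$, a suitably scaled random linear map $A \in \mathbb{R}^{m \times d}$ with $m = O(\varepsilon^{-2} \log n)$ satisfies, with high probability,

\[
(1 - \varepsilon) \, \|x_i - x_j\|^2 \;\le\; \|A x_i - A x_j\|^2 \;\le\; (1 + \varepsilon) \, \|x_i - x_j\|^2
\quad \text{for all } i, j .
\]

The constructions unified by the paper differ only in how the entries of $A$ are drawn (spherical, binary-coin, sparse, Gaussian, sub-Gaussian).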
L^2CL: Embarrassingly Simple Layer-to-Layer Contrastive Learning for Graph Collaborative Filtering
Graph neural networks (GNNs) have recently emerged as an effective approach to model neighborhood signals in collaborative filtering. Towards this research line, graph contrastive learning (GCL) demonstrates robust capabilities to address the supervision label shortage issue through generating massive self-supervised signals. Despite its effectiveness, GCL for recommendation suffers seriously from two main challenges: i) GCL relies on graph augmentation to generate semantically different views for contrasting, which could potentially disrupt key information and introduce unwanted noise; ii) current works for GCL primarily focus on contrasting representations using sophisticated networks architecture (usually deep) to capture high-order interactions, which leads to increased computational complexity and suboptimal training efficiency. To this end, we propose L2CL, a principled Layer-to-Layer Contrastive Learning framework that contrasts representations from different layers. By aligning the semantic similarities between different layers, L2CL enables the learning of complex structural relationships and gets rid of the noise perturbation in stochastic data augmentation. Surprisingly, we find that L2CL, using only one-hop contrastive learning paradigm, is able to capture intrinsic semantic structures and improve the quality of node representation, leading to a simple yet effective architecture. We also provide theoretical guarantees for L2CL in minimizing task-irrelevant information. Extensive experiments on five real-world datasets demonstrate the superiority of our model over various state-of-the-art collaborative filtering methods. Our code is available at https://github.com/downeykking/L2CL.
Updated: 2024-07-19 12:45:21
Domains: cs.IR,cs.LG
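A layer-to-layer contrastive term is only a few lines. In this sketch, each node's embedding at one GNN layer is the positive for its own embedding at another layer, with all other nodes as negatives; the temperature and the choice of layer pair are assumptions, not the paper's settings:

```python
import torch
import torch.nn.functional as F

def layer_to_layer_infonce(z_a, z_b, temperature=0.2):
    """InfoNCE between two layers' node embeddings, shape (N, dim) each.
    Positives sit on the diagonal of the similarity matrix."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature
    labels = torch.arange(z_a.shape[0], device=z_a.device)
    return F.cross_entropy(logits, labels)
```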
Sequential Model for Predicting Patient Adherence in Subcutaneous Immunotherapy for Allergic Rhinitis
Objective: Subcutaneous Immunotherapy (SCIT) is the long-lasting causal treatment of allergic rhinitis (AR). How to enhance patient adherence to maximize the benefit of allergen immunotherapy (AIT) plays a crucial role in the management of AIT. This study aims to leverage novel machine learning models to precisely predict the risk of non-adherence of AR patients and related local symptom scores over three years of SCIT. Methods: The research develops and analyzes two models, a sequential latent-variable model (SLVM) in the form of a Stochastic Latent Actor-Critic (SLAC) and a Long Short-Term Memory (LSTM) network, evaluating them based on their scoring and adherence-prediction capabilities. Results: Excluding the biased samples at the first time step, the predictive adherence accuracy of the SLAC models ranges from 60% to 72%, and that of the LSTM models from 66% to 84%, varying according to the time steps. The Root Mean Square Error (RMSE) for the SLAC models is between 0.93 and 2.22, while for the LSTM models it is between 1.09 and 1.77. Notably, these RMSEs are significantly lower than the random prediction error of 4.55. Conclusion: We creatively apply sequential models in the long-term management of SCIT, with promising accuracy in the prediction of SCIT non-adherence in AR patients. While LSTM outperforms SLAC in adherence prediction, SLAC excels in score prediction for patients undergoing SCIT for AR. The state-action-based SLAC adds flexibility, presenting a novel and effective approach for managing long-term AIT.
Updated: 2024-07-19 12:42:17
Domains: cs.LG,q-bio.QM
Towards Scene Graph Anticipation
Spatio-temporal scene graphs represent interactions in a video by decomposing scenes into individual objects and their pair-wise temporal relationships. Long-term anticipation of the fine-grained pair-wise relationships between objects is a challenging problem. To this end, we introduce the task of Scene Graph Anticipation (SGA). We adapt state-of-the-art scene graph generation methods as baselines to anticipate future pair-wise relationships between objects and propose a novel approach SceneSayer. In SceneSayer, we leverage object-centric representations of relationships to reason about the observed video frames and model the evolution of relationships between objects. We take a continuous time perspective and model the latent dynamics of the evolution of object interactions using concepts of NeuralODE and NeuralSDE, respectively. We infer representations of future relationships by solving an Ordinary Differential Equation and a Stochastic Differential Equation, respectively. Extensive experimentation on the Action Genome dataset validates the efficacy of the proposed methods.
Updated: 2024-07-19 12:40:28
Domains: cs.CV,cs.AI
Hyperparameter Optimization for Driving Strategies Based on Reinforcement Learning
This paper focuses on hyperparameter optimization for autonomous driving strategies based on Reinforcement Learning (RL). We provide a detailed description of training the RL agent in a simulation environment. Subsequently, we employ the Efficient Global Optimization algorithm, which uses Gaussian Process fitting, for hyperparameter optimization in RL. Before this optimization phase, Gaussian process interpolation is applied to fit the surrogate model, for which the hyperparameter set is generated using Latin hypercube sampling. To accelerate the evaluation, parallelization techniques are employed. Following the hyperparameter optimization procedure, a set of hyperparameters is identified, resulting in a noteworthy enhancement in overall driving performance: a substantial increase of 4% compared to the existing manually tuned parameters and the hyperparameters discovered during the initialization process using Latin hypercube sampling. After the optimization, we analyze the obtained results thoroughly and conduct a sensitivity analysis to assess the robustness and generalization capabilities of the learned autonomous driving strategies. The findings from this study contribute to the advancement of Gaussian-process-based Bayesian optimization for tuning the hyperparameters of autonomous driving in RL, providing valuable insights for the development of efficient and reliable autonomous driving systems.
Updated: 2024-07-19 12:40:08
Domains: cs.LG,cs.AI,math.OC
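The surrogate loop combines three standard pieces: a Latin hypercube initial design, a Gaussian process surrogate, and the Expected Improvement acquisition. A minimal sketch for minimizing a scalar objective (e.g., a negative driving score; dimensions and sizes below are placeholders):

```python
import numpy as np
from scipy.stats import norm
from scipy.stats.qmc import LatinHypercube
from sklearn.gaussian_process import GaussianProcessRegressor

def expected_improvement(gp, X_cand, y_best):
    """EI acquisition for minimization on unit-cube-scaled hyperparameters."""
    mu, sigma = gp.predict(X_cand, return_std=True)
    z = (y_best - mu) / np.maximum(sigma, 1e-9)
    return (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

dim, n_init = 4, 20
X = LatinHypercube(d=dim, seed=0).random(n_init)   # initial design
# Loop: evaluate the objective at X (in parallel); fit a
# GaussianProcessRegressor on (X, y); pick the candidate maximizing EI;
# train and score the RL agent there; append the result and refit.
```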
Voices in a Crowd: Searching for Clusters of Unique Perspectives
Language models have been shown to reproduce underlying biases existing in their training data, which by default reflect the majority perspective. Proposed solutions aim to capture minority perspectives by either modelling annotator disagreements or grouping annotators based on shared metadata, both of which face significant challenges. We propose a framework that trains models without encoding annotator metadata, extracts latent embeddings informed by annotator behaviour, and creates clusters of similar opinions, which we refer to as voices. Resulting clusters are validated post-hoc via internal and external quantitative metrics, as well as a qualitative analysis to identify the type of voice that each cluster represents. Our results demonstrate the strong generalisation capability of our framework, indicated by the resulting clusters being adequately robust, while also capturing minority perspectives based on different demographic factors across two distinct datasets.
Updated: 2024-07-19 12:37:15
Domains: cs.CL,cs.LG
Obfuscated Location Disclosure for Remote ID Enabled Drones
The Remote ID (RID) regulation recently introduced by several aviation authorities worldwide (including the US and EU) forces commercial drones to regularly (max. every second) broadcast plaintext messages on the wireless channel, providing information about the drone identifier and current location, among others. Although these regulations increase the accountability of drone operations and improve traffic management, they allow malicious users to track drones via the disclosed information, possibly leading to drone capture and severe privacy leaks. In this paper, we propose Obfuscated Location disclOsure for RID-enabled drones (OLO-RID), a solution modifying and extending the RID regulation while preserving drones' location privacy. Rather than disclosing the actual drone's location, drones equipped with OLO-RID disclose a differentially private obfuscated location in a mobile scenario. OLO-RID also extends RID messages with encrypted location information, accessible only by authorized entities and valuable to obtain the current drone's location in safety-critical use cases. We design, implement, and deploy OLO-RID on a Raspberry Pi 3 and release the code of our implementation as open-source. We also perform an extensive performance assessment of the runtime overhead of our solution in terms of processing, communication, memory, and energy consumption. We show that OLO-RID can generate RID messages on a constrained device in less than 0.16 s while also requiring a minimal energy toll on a relevant device (0.0236% of energy for a DJI Mini 2). We also evaluate the utility of the proposed approach in the context of three reference use cases involving the drones' location usage, demonstrating minimal performance degradation when trading off location privacy and utility for next-generation RID-compliant drone ecosystems.
Updated: 2024-07-19 12:35:49
标题: 面向启用远程识别(RID)的无人机的混淆位置披露
摘要: 最近,全球多个航空管理机构(包括美国和欧盟)引入了远程识别(RID)规定,强制商用无人机定期(最多每秒一次)在无线信道上广播明文消息,提供有关无人机标识和当前位置等信息。尽管这些规定增加了无人机操作的问责制度并改善了交通管理,但它们允许恶意用户通过披露的信息跟踪无人机,可能导致无人机被捕获和严重侵犯隐私。在本文中,我们提出了面向启用RID的无人机的混淆位置披露(OLO-RID)解决方案,该解决方案修改和扩展了RID规定,同时保护了无人机的位置隐私。与披露实际无人机位置不同,配备OLO-RID的无人机在移动场景中披露差分私密化的混淆位置。 OLO-RID还通过加密位置信息扩展RID消息,仅由授权实体访问,对于获取当前无人机位置在安全关键的使用情况下非常有价值。我们在树莓派3上设计、实施和部署OLO-RID,并将我们的实施代码作为开源发布。我们还对我们的解决方案在处理、通信、内存和能量消耗方面的运行时开销进行了广泛的性能评估。我们展示了OLO-RID可以在受限设备上生成RID消息少于0.16秒,同时对相关设备的能量开销要求最小(对于DJI Mini 2的能量消耗为0.0236%)。我们还评估了在涉及无人机位置使用的三个参考用例背景下所提出方法的效用,表明在权衡下一代RID兼容无人机生态系统的位置隐私和效用时,性能下降最小。
更新时间: 2024-07-19 12:35:49
领域: cs.CR
Bridging the Gap: A Survey and Classification of Research-Informed Ethical Hacking Tools
The majority of Ethical Hacking (EH) tools utilised in penetration testing are developed by practitioners within the industry or underground communities. Similarly, academic researchers have also contributed to developing security tools. However, there appears to be limited awareness among practitioners of academic contributions in this domain, creating a significant gap between industry and academia's contributions to EH tools. This research paper aims to survey the current state of EH academic research, primarily focusing on research-informed security tools. We categorise these tools into process-based frameworks (such as PTES and MITRE ATT&CK) and knowledge-based frameworks (such as CyBOK and ACM CCS). This classification provides a comprehensive overview of novel, research-informed tools, considering their functionality and application areas. The analysis covers licensing, release dates, source code availability, development activity, and peer review status, providing valuable insights into the current state of research in this field.
Updated: 2024-07-19 12:35:17
标题: 填补空白:研究导向的道德黑客工具的调查和分类
摘要: 大多数在渗透测试中使用的道德黑客(EH)工具是由行业从业者或地下社区开发的。同样,学术研究人员也为开发安全工具做出了贡献。然而,从业人员似乎对学术界在这一领域的贡献了解有限,导致了行业和学术界在EH工具方面的贡献之间存在显著差距。本研究旨在调查当前EH学术研究的现状,主要关注以研究为基础的安全工具。我们将这些工具分类为基于过程的框架(如PTES和Mitre ATT&CK)和基于知识的框架(如CyBOK和ACM CCS)。这种分类提供了对新颖的、研究为基础的工具的全面概述,考虑了它们的功能和应用领域。分析涵盖了许可、发布日期、源代码可用性、开发活动和同行评审状态,为了解该领域的当前研究状况提供了有价值的见解。
更新时间: 2024-07-19 12:35:17
领域: cs.CR
Personalized Multi-tier Federated Learning
The key challenge of personalized federated learning (PerFL) is to capture the statistical heterogeneity properties of data with inexpensive communications and gain customized performance for participating devices. To address these, we introduced personalized federated learning in multi-tier architecture (PerMFL) to obtain optimized and personalized local models when there are known team structures across devices. We provide theoretical guarantees of PerMFL, which offers linear convergence rates for smooth strongly convex problems and sub-linear convergence rates for smooth non-convex problems. We conduct numerical experiments demonstrating the robust empirical performance of PerMFL, outperforming the state-of-the-art in multiple personalized federated learning tasks.
Updated: 2024-07-19 12:31:15
标题: 个性化多层联邦学习
摘要: 个性化联邦学习(PerFL)的关键挑战是通过廉价的通信捕捉数据的统计异质性属性,并为参与设备获得定制化性能。为了解决这些问题,我们引入了多层架构中的个性化联邦学习(PerMFL),在设备之间已知团队结构的情况下,获得优化和个性化的本地模型。我们提供了PerMFL的理论保证,针对平滑强凸问题提供了线性收敛率,并针对平滑非凸问题提供了次线性收敛率。我们进行了数值实验,展示了PerMFL的强大实证性能,超越了多个个性化联邦学习任务中的最新技术。
更新时间: 2024-07-19 12:31:15
领域: cs.LG,cs.AI,math.OC
An Attention-based Representation Distillation Baseline for Multi-Label Continual Learning
The field of Continual Learning (CL) has inspired numerous researchers over the years, leading to increasingly advanced countermeasures to the issue of catastrophic forgetting. Most studies have focused on the single-class scenario, where each example comes with a single label. The recent literature has successfully tackled such a setting, with impressive results. In contrast, we shift our attention to the multi-label scenario, as we believe it to be more representative of real-world open problems. In our work, we show that existing state-of-the-art CL methods fail to achieve satisfactory performance, thus questioning the real advance claimed in recent years. Therefore, we assess both old-style and novel strategies and propose, on top of them, an approach called Selective Class Attention Distillation (SCAD). It relies on a knowledge transfer technique that seeks to align the representations of the student network -- which trains continuously and is subject to forgetting -- with the teacher ones, which is pretrained and kept frozen. Importantly, our method is able to selectively transfer the relevant information from the teacher to the student, thereby preventing irrelevant information from harming the student's performance during online training. To demonstrate the merits of our approach, we conduct experiments on two different multi-label datasets, showing that our method outperforms the current state-of-the-art Continual Learning methods. Our findings highlight the importance of addressing the unique challenges posed by multi-label environments in the field of Continual Learning. The code of SCAD is available at https://github.com/aimagelab/SCAD-LOD-2024.
Updated: 2024-07-19 12:30:03
标题: 一种基于注意力的表示蒸馏基线模型,用于多标签持续学习
摘要: 持续学习(CL)领域多年来一直吸引了众多研究人员,导致对灾难性遗忘问题的越来越先进的对策。大多数研究都集中在单类别情景上,每个示例都带有一个标签。最近的文献成功地解决了这种情况,并取得了令人印象深刻的结果。不同的是,我们将注意力转向多标签情景,因为我们认为这更具代表性,代表了现实世界中的开放性问题。在我们的工作中,我们展示了现有最先进的CL方法无法实现令人满意的性能,因此质疑了近年来声称的真正进步。因此,我们评估了传统和新颖的策略,并提出了一种称为选择性类别关注蒸馏(SCAD)的方法。它依赖于一种知识传输技术,旨在使持续训练并容易遗忘的学生网络的表示与预训练并保持冻结的教师网络的表示一致。重要的是,我们的方法能够选择性地从教师向学生传递相关信息,从而防止无关信息在在线训练期间损害学生的性能。为了展示我们方法的优点,我们在两个不同的多标签数据集上进行实验,结果显示我们的方法优于当前最先进的持续学习方法。我们的发现突出了在持续学习领域应对多标签环境所提出的独特挑战的重要性。SCAD的代码可在https://github.com/aimagelab/SCAD-LOD-2024获得。
更新时间: 2024-07-19 12:30:03
领域: cs.CV,cs.LG
Liquid Staking Tokens in Automated Market Makers
This paper studies liquid staking tokens (LSTs) on automated market makers (AMMs), both theoretically and empirically. LSTs are tokenized representations of staked assets on proof-of-stake blockchains. First, we model LST-liquidity on AMMs theoretically, categorizing suitable AMM types for LST liquidity and deriving formulas for the necessary returns from trading fees to adequately compensate liquidity providers under the particular price trajectories of LSTs. For the latter, two relevant metrics are considered: (1) losses compared to holding the liquidity outside the AMM (loss-versus-holding, or "impermanent loss"), and (2) the relative profitability compared to fully staking the capital (loss-versus-staking) which is specifically tailored to the case of LST-liquidity. Next, we empirically measure these metrics for Ethereum LSTs across the most relevant AMM pools. We find that, while trading fees often compensate for impermanent loss, fully staking is more profitable for many pools, raising questions about the sustainability of the current LST liquidity allocation to AMMs.
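For a 50/50 constant-product pool, the loss-versus-holding metric reduces to the familiar impermanent-loss curve in the token price ratio; a quick sketch of that special case (the paper derives analogous expressions per AMM type and adds the LST-specific loss-versus-staking metric, which is not shown here):

import math

def loss_versus_holding(price_ratio):
    # Value of a constant-product LP position relative to simply holding,
    # as a function of r = p_now / p_entry; always <= 0, with 0 at r = 1.
    return 2.0 * math.sqrt(price_ratio) / (1.0 + price_ratio) - 1.0

print(loss_versus_holding(1.05))   # LST drifting 5% vs its pair: about -0.03%

Because LST prices tend to drift slowly and monotonically against their underlying, this loss is small but persistent, which is why the comparison against fully staking the capital becomes the decisive metric.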
Updated: 2024-07-19 12:23:36
标题: Automated Market Makers 中的流动性质押代币
摘要: 这篇论文从理论和实证两方面研究了自动做市商(AMMs)上的流动质押代币(LSTs)。LSTs是基于权益证明区块链上抵押资产的代币化表示。首先,我们从理论上对AMMs上的LST流动性进行建模,将适用于LST流动性的AMM类型进行分类,并推导出关于交易费用的必要回报的公式,以充分补偿特定价格轨迹下的流动性提供者。对于后者,考虑了两个相关的指标:(1)与在AMM之外保留流动性相比的损失(损失与持有,或“不完全损失”),以及(2)相对于完全抵押资本的相对盈利性(损失与抵押),这是专门针对LST流动性的情况量身定制的。接下来,我们实证地测量了以太坊LST在最相关的AMM池中的这些指标。我们发现,虽然交易费通常可以补偿不完全损失,但对于许多池而言,完全抵押更具盈利性,这引发了关于当前LST流动性分配给AMMs的可持续性的问题。
更新时间: 2024-07-19 12:23:36
领域: cs.CR
KoMA: Knowledge-driven Multi-agent Framework for Autonomous Driving with Large Language Models
Large language models (LLMs) as autonomous agents offer a novel avenue for tackling real-world challenges through a knowledge-driven manner. These LLM-enhanced methodologies excel in generalization and interpretability. However, the complexity of driving tasks often necessitates the collaboration of multiple, heterogeneous agents, underscoring the need for such LLM-driven agents to engage in cooperative knowledge sharing and cognitive synergy. Despite the promise of LLMs, current applications predominantly center around single agent scenarios. To broaden the horizons of knowledge-driven strategies and bolster the generalization capabilities of autonomous agents, we propose the KoMA framework consisting of multi-agent interaction, multi-step planning, shared-memory, and ranking-based reflection modules to enhance multi-agents' decision-making in complex driving scenarios. Based on the framework's generated text descriptions of driving scenarios, the multi-agent interaction module enables LLM agents to analyze and infer the intentions of surrounding vehicles, akin to human cognition. The multi-step planning module enables LLM agents to analyze and obtain final action decisions layer by layer to ensure consistent goals for short-term action decisions. The shared memory module can accumulate collective experience to make superior decisions, and the ranking-based reflection module can evaluate and improve agent behavior with the aim of enhancing driving safety and efficiency. The KoMA framework not only enhances the robustness and adaptability of autonomous driving agents but also significantly elevates their generalization capabilities across diverse scenarios. Empirical results demonstrate the superiority of our approach over traditional methods, particularly in its ability to handle complex, unpredictable driving environments without extensive retraining.
Updated: 2024-07-19 12:13:08
标题: KoMA:基于大型语言模型的自主驾驶知识驱动多Agent框架
摘要: 大型语言模型(LLMs)作为自主代理提供了一种新颖的途径,通过知识驱动的方式来解决现实世界的挑战。这些LLM增强方法在概括性和可解释性方面表现优异。然而,驾驶任务的复杂性往往需要多个异构代理的协作,强调了这些LLM驱动代理需要参与合作知识共享和认知协同的必要性。尽管LLMs具有潜力,但当前应用主要集中在单一代理情景中。为了拓展知识驱动策略的视野并增强自主代理的概括能力,我们提出了KoMA框架,包括多代理交互、多步规划、共享内存和基于排名的反思模块,以增强多代理在复杂驾驶情景下的决策能力。基于框架生成的驾驶情景文本描述,多代理交互模块使LLM代理能够分析和推断周围车辆的意图,类似于人类认知。多步规划模块使LLM代理能够逐层分析并获取最终行动决策,以确保短期行动决策的一致性目标。共享内存模块可以积累集体经验以做出更优越的决策,而基于排名的反思模块可以评估和改进代理行为,旨在增强驾驶安全性和效率。KoMA框架不仅增强了自主驾驶代理的稳健性和适应性,还显著提升了它们在各种情景下的概括能力。实证结果表明,我们的方法在处理复杂、不可预测的驾驶环境方面优于传统方法,尤其是在不需要大规模重新训练的情况下。
更新时间: 2024-07-19 12:13:08
领域: cs.AI
Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer
Direct speech-to-speech translation (S2ST) with discrete self-supervised representations has achieved remarkable accuracy, but is unable to preserve the speaker timbre of the source speech. Meanwhile, the scarcity of high-quality speaker-parallel data poses a challenge for learning style transfer during translation. We design an S2ST pipeline with style-transfer capability on the basis of discrete self-supervised speech representations and codec units. The acoustic language model we introduce for style transfer leverages self-supervised in-context learning, acquiring style transfer ability without relying on any speaker-parallel data, thereby overcoming data scarcity. By using extensive training data, our model achieves zero-shot cross-lingual style transfer on previously unseen source languages. Experiments show that our model generates translated speeches with high fidelity and speaker similarity. Audio samples are available at http://stylelm.github.io/ .
Updated: 2024-07-19 12:11:52
标题: 使用基于离散单元风格转移的语音到语音翻译
摘要: 具有离散自监督表示的直接语音到语音翻译(S2ST)已经取得了显著的准确性,但无法保留源语音的说话者音色。同时,高质量说话者平行数据的稀缺性对学习翻译过程中的风格转移构成挑战。我们基于离散自监督语音表示和编解码单元设计了具有风格转移能力的S2ST管道。我们引入用于风格转移的声学语言模型利用自监督上下文学习,获得了风格转移能力,无需依赖任何说话者平行数据,从而克服了数据稀缺性。通过使用大量训练数据,我们的模型实现了在以前未见的源语言上的零样本跨语言风格转移。实验证明,我们的模型生成了具有高保真度和说话者相似性的翻译语音。音频样本可在http://stylelm.github.io/ 上获得。
更新时间: 2024-07-19 12:11:52
领域: cs.SD,cs.AI,eess.AS
Hyper-Heuristics Can Profit From Global Variation Operators
In recent work, Lissovoi, Oliveto, and Warwicker (Artificial Intelligence (2023)) proved that the Move Acceptance Hyper-Heuristic (MAHH) leaves the local optimum of the multimodal CLIFF benchmark with remarkable efficiency. The $O(n^3)$ runtime of the MAHH, for almost all cliff widths $d\ge 2,$ is significantly better than the $\Theta(n^d)$ runtime of simple elitist evolutionary algorithms (EAs) on CLIFF. In this work, we first show that this advantage is specific to the CLIFF problem and does not extend to the JUMP benchmark, the most prominent multi-modal benchmark in the theory of randomized search heuristics. We prove that for any choice of the MAHH selection parameter $p$, the expected runtime of the MAHH on a JUMP function with gap size $m = O(n^{1/2})$ is at least $\Omega(n^{2m-1} / (2m-1)!)$. This is significantly slower than the $O(n^m)$ runtime of simple elitist EAs. Encouragingly, we also show that replacing the local one-bit mutation operator in the MAHH with the global bit-wise mutation operator, commonly used in EAs, yields a runtime of $\min\{1, O(\frac{e\ln(n)}{m})^m\} \, O(n^m)$ on JUMP functions. This is at least as good as the runtime of simple elitist EAs. For larger values of $m$, this result proves an asymptotic performance gain over simple EAs. As our proofs reveal, the MAHH profits from its ability to walk through the valley of lower objective values in moderate-size steps, always accepting inferior solutions. This is the first time that such an optimization behavior is proven via mathematical means. Generally, our result shows that combining two ways of coping with local optima, global mutation and accepting inferior solutions, can lead to considerable performance gains.
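A sketch of the two ingredients this result combines: the Move Acceptance Hyper-Heuristic with global bit-wise mutation, evaluated on JUMP. Parameter values and the exact acceptance variant below are illustrative assumptions, not taken from the paper:

import random

def jump(x, m):
    # JUMP_m: fitness m + |x|_1 outside the gap; a deceptive valley of
    # width m sits just before the all-ones optimum (fitness n + m).
    n, k = len(x), sum(x)
    return m + k if (k <= n - m or k == n) else n - k

def mahh_global(n=30, m=3, p=0.01, max_iters=500_000):
    x = [random.randint(0, 1) for _ in range(n)]
    fx = jump(x, m)
    for _ in range(max_iters):
        y = [b ^ (random.random() < 1.0 / n) for b in x]   # global mutation
        fy = jump(y, m)
        if fy >= fx or random.random() < p:   # accept worsenings w.p. p
            x, fx = y, fy
        if fx == n + m:                       # all-ones optimum reached
            return x
    return x

Replacing the global mutation line with a single random bit flip recovers the local-mutation MAHH whose runtime on JUMP the paper lower-bounds.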
Updated: 2024-07-19 12:10:05
标题: 超启发式算法可以从全局变异算子中获益
摘要: 在最近的工作中,Lissovoi, Oliveto和Warwicker(《人工智能》(2023))证明了移动接受超启发式(MAHH)以非凡的效率离开了多模态CLIFF基准的局部最优解。对于几乎所有悬崖宽度$d\ge 2$,MAHH的$O(n^3)$运行时间明显优于CLIFF上简单精英进化算法(EAs)的$\Theta(n^d)$运行时间。 在这项工作中,我们首先展示了这种优势仅适用于CLIFF问题,而不适用于JUMP基准,这是随机搜索启发式理论中最重要的多模态基准。我们证明,对于MAHH选择参数$p$的任何选择,在JUMP函数中,间隙大小为$m = O(n^{1/2})$的情况下,MAHH的期望运行时间至少为$\Omega(n^{2m-1} / (2m-1)!)$。这比简单精英EAs的$O(n^m)$运行时间慢得多。令人鼓舞的是,我们还展示了将MAHH中的局部单比特突变算子替换为全局比特突变算子,通常在EAs中使用,可以在JUMP函数上获得$\min\{1,O(\frac{e\ln(n)}{m})^m\} \, O(n^m)$的运行时间。这至少与简单精英EAs的运行时间一样好。对于较大的$m$值,这个结果证明了相对于简单EAs的渐近性能提升。正如我们的证明所揭示的,MAHH受益于其能够在中等规模的步长中走过较低目标值谷底的能力,始终接受较差的解决方案。这是第一次通过数学手段证明了这种优化行为。总的来说,我们的结果表明,结合应对局部最优解的两种方式,全局突变和接受较差解决方案,可以带来显著的性能提升。
更新时间: 2024-07-19 12:10:05
领域: cs.NE,cs.AI,cs.DS
Cross-Validation Is All You Need: A Statistical Approach To Label Noise Estimation
Machine learning models experience deteriorated performance when trained in the presence of noisy labels. This is particularly problematic for medical tasks, such as survival prediction, which typically face high label noise complexity with few clear-cut solutions. Inspired by the large fluctuations across folds in the cross-validation performance of survival analyses, we design Monte-Carlo experiments to show that such fluctuation could be caused by label noise. We propose two novel and straightforward label noise detection algorithms that effectively identify noisy examples by pinpointing the samples that more frequently contribute to inferior cross-validation results. We first introduce Repeated Cross-Validation (ReCoV), a parameter-free label noise detection algorithm that is robust to model choice. We further develop fastReCoV, a less robust but more tractable and efficient variant of ReCoV suitable for deep learning applications. Through extensive experiments, we show that ReCoV and fastReCoV achieve state-of-the-art label noise detection performance in a wide range of modalities, models and tasks, including survival analysis, which has yet to be addressed in the literature. Our code and data are publicly available at https://github.com/GJiananChen/ReCoV.
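The core ReCoV idea, as described, can be sketched in a few lines: repeat cross-validation many times and count how often each sample lands in the worst-scoring fold. The base model and the interpretation of the scores are placeholders here; the paper adds a statistical test on these counts and the fastReCoV variant for deep learning:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold

def recov_scores(X, y, n_repeats=100, n_splits=10, seed=0):
    rng = np.random.RandomState(seed)
    blame = np.zeros(len(y))
    for _ in range(n_repeats):
        kf = KFold(n_splits=n_splits, shuffle=True,
                   random_state=rng.randint(1 << 30))
        folds, scores = [], []
        for tr, te in kf.split(X):
            clf = LogisticRegression(max_iter=1000).fit(X[tr], y[tr])
            scores.append(accuracy_score(y[te], clf.predict(X[te])))
            folds.append(te)
        blame[folds[int(np.argmin(scores))]] += 1   # charge the worst fold
    return blame / n_repeats   # unusually high values suggest noisy labels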
Updated: 2024-07-19 12:08:27
标题: 交叉验证就是你所需要的:一种统计方法来估计标签噪声
摘要: 机器学习模型在存在嘈杂标签的情况下训练时性能会下降。这对于医疗任务特别有问题,比如生存预测,通常面临着高标签噪声复杂度,而解决方案很少清晰。受交叉验证中生存分析性能在不同折叠中出现大幅波动的启发,我们设计了蒙特卡洛实验,展示这种波动可能是由标签噪声引起的。我们提出了两种新颖且直观的标签噪声检测算法,通过定位更频繁导致交叉验证结果较差的样本,有效识别出嘈杂样本。首先介绍了Repeated Cross-Validation(ReCoV),这是一个无需参数选择的标签噪声检测算法,对模型选择具有鲁棒性。我们进一步开发了fastReCoV,这是ReCoV的一个不太鲁棒但更易处理和高效的变体,适用于深度学习应用。通过广泛实验,我们展示了ReCoV和fastReCoV在各种模态、模型和任务中,包括尚未在文献中讨论的生存分析中,实现了最先进的标签噪声检测性能。我们的代码和数据可以在 https://github.com/GJiananChen/ReCoV 上公开获取。
更新时间: 2024-07-19 12:08:27
领域: cs.LG,cs.CV
DUPLEX: Dual GAT for Complex Embedding of Directed Graphs
Current directed graph embedding methods build upon undirected techniques but often inadequately capture directed edge information, leading to challenges such as: (1) Suboptimal representations for nodes with low in/out-degrees, due to the insufficient neighbor interactions; (2) Limited inductive ability for representing new nodes post-training; (3) Narrow generalizability, as training is overly coupled with specific tasks. In response, we propose DUPLEX, an inductive framework for complex embeddings of directed graphs. It (1) leverages Hermitian adjacency matrix decomposition for comprehensive neighbor integration, (2) employs a dual GAT encoder for directional neighbor modeling, and (3) features two parameter-free decoders to decouple training from particular tasks. DUPLEX outperforms state-of-the-art models, especially for nodes with sparse connectivity, and demonstrates robust inductive capability and adaptability across various tasks. The code is available at https://github.com/alipay/DUPLEX.
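For reference, one common Hermitian adjacency encoding of a digraph -- the kind of matrix whose decomposition DUPLEX builds on -- assigns 1 to reciprocal edges and ±i to one-directional ones. A sketch under that assumption (the paper's exact parametrization may differ):

import numpy as np

def hermitian_adjacency(A):
    # A: dense 0/1 adjacency of a digraph. Reciprocal edges -> 1,
    # u->v only -> +i, v->u only -> -i, so H == H.conj().T holds.
    A = A.astype(bool)
    sym = A & A.T
    fwd = A & ~A.T
    return sym.astype(complex) + 1j * fwd - 1j * fwd.T

A = np.array([[0, 1, 0], [0, 0, 1], [1, 1, 0]])
H = hermitian_adjacency(A)
assert np.allclose(H, H.conj().T)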
Updated: 2024-07-19 12:04:29
标题: DUPLEX:用于有向图复杂嵌入的双GAT
摘要: 当前的有向图嵌入方法建立在无向技术的基础上,但通常不能充分捕捉有向边信息,导致诸如以下挑战:(1)由于邻居交互不足,低入/出度节点的表示不够优化;(2)对于在训练后表示新节点的归纳能力有限;(3)由于训练过于与特定任务耦合,广义性较窄。为此,我们提出了DUPLEX,一个复杂的有向图嵌入的归纳框架。它(1)利用Hermitian邻接矩阵分解来实现全面的邻居整合,(2)采用双GAT编码器进行方向性邻居建模,(3)具有两个无参数解码器来解耦训练与特定任务。DUPLEX在性能上优于最先进的模型,尤其在稀疏连接节点上表现出色,并展示了对各种任务的强大归纳能力和适应性。代码可在https://github.com/alipay/DUPLEX 获取。
更新时间: 2024-07-19 12:04:29
领域: cs.LG
Realistic Evaluation of Test-Time Adaptation Algorithms: Unsupervised Hyperparameter Selection
Test-Time Adaptation (TTA) has recently emerged as a promising strategy for tackling the problem of machine learning model robustness under distribution shifts by adapting the model during inference without access to any labels. Because of task difficulty, hyperparameters strongly influence the effectiveness of adaptation. However, the literature has provided little exploration into optimal hyperparameter selection. In this work, we tackle this problem by evaluating existing TTA methods using surrogate-based hp-selection strategies (which do not assume access to the test labels) to obtain a more realistic evaluation of their performance. We show that some of the recent state-of-the-art methods exhibit inferior performance compared to the previous algorithms when using our more realistic evaluation setup. Further, we show that forgetting is still a problem in TTA as the only method that is robust to hp-selection resets the model to the initial state at every step. We analyze different types of unsupervised selection strategies, and while they work reasonably well in most scenarios, the only strategies that work consistently well use some kind of supervision (either by a limited number of annotated test samples or by using pretraining data). Our findings underscore the need for further research with more rigorous benchmarking by explicitly stating model selection strategies, to facilitate which we open-source our code.
Updated: 2024-07-19 11:58:30
标题: 测试时间适应算法的现实评估:无监督超参数选择
摘要: 测试时间适应(TTA)最近作为一种有前景的策略出现,用于解决机器学习模型在分布转移下的鲁棒性问题,通过在推断过程中调整模型而无需访问任何标签。由于任务难度,超参数强烈影响适应的有效性。然而,文献对最佳超参数选择提供了很少的探索。在本研究中,我们通过使用基于替代的hp选择策略(不假设访问测试标签)评估现有的TTA方法,以获得更真实的性能评估。我们发现,一些最近的最先进方法在使用我们更真实的评估设置时表现不佳,与以前的算法相比。此外,我们发现在TTA中遗忘仍然是一个问题,因为唯一能够在hp选择时保持稳健的方法是在每一步将模型重置为初始状态。我们分析了不同类型的无监督选择策略,虽然它们在大多数情况下表现良好,但唯一能够始终良好运行的策略使用某种形式的监督(通过有限数量的注释测试样本或使用预训练数据)。我们的发现强调了需要有更严格的基准测试进行进一步研究,明确说明模型选择策略,为此我们开源我们的代码。
更新时间: 2024-07-19 11:58:30
领域: cs.LG,cs.CV
ETSCL: An Evidence Theory-Based Supervised Contrastive Learning Framework for Multi-modal Glaucoma Grading
Glaucoma is one of the leading causes of vision impairment. Digital imaging techniques, such as color fundus photography (CFP) and optical coherence tomography (OCT), provide quantitative and noninvasive methods for glaucoma diagnosis. Recently, in the field of computer-aided glaucoma diagnosis, multi-modality methods that integrate the CFP and OCT modalities have achieved greater diagnostic accuracy compared to single-modality methods. However, it remains challenging to extract reliable features due to the high similarity of medical images and the unbalanced multi-modal data distribution. Moreover, existing methods overlook the uncertainty estimation of different modalities, leading to unreliable predictions. To address these challenges, we propose a novel framework, namely ETSCL, which consists of a contrastive feature extraction stage and a decision-level fusion stage. Specifically, the supervised contrastive loss is employed to enhance the discriminative power in the feature extraction process, resulting in more effective features. In addition, we utilize the Frangi vesselness algorithm as a preprocessing step to incorporate vessel information to assist in the prediction. In the decision-level fusion stage, an evidence theory-based multi-modality classifier is employed to combine multi-source information with uncertainty estimation. Extensive experiments demonstrate that our method achieves state-of-the-art performance. The code is available at \url{https://github.com/master-Shix/ETSCL}.
Updated: 2024-07-19 11:57:56
标题: ETSCL:基于证据理论的监督对比学习框架用于多模式青光眼分级
摘要: 青光眼是视力损伤的主要原因之一。数字成像技术,如彩色眼底照相(CFP)和光学相干断层扫描(OCT),为青光眼诊断提供了定量且无创的方法。最近,在计算机辅助青光眼诊断领域,将CFP和OCT模态集成的多模态方法相较于单模态方法具有更高的诊断准确性。然而,由于医学图像的高相似性和不平衡的多模态数据分布,提取可靠特征仍然具有挑战性。此外,现有方法忽视了不同模态的不确定性估计,导致预测不可靠。为了解决这些挑战,我们提出了一种新颖的框架,即ETSCL,它包括对比特征提取阶段和决策级融合阶段。具体来说,采用监督对比损失来增强特征提取过程中的区分能力,从而产生更有效的特征。此外,我们利用Frangi血管算法作为预处理步骤,将血管信息融入以辅助预测。在决策级融合阶段,采用基于证据理论的多模态分类器将多源信息与不确定性估计相结合。大量实验证明我们的方法达到了最先进的性能水平。代码可在\url{https://github.com/master-Shix/ETSCL}获取。
更新时间: 2024-07-19 11:57:56
领域: cs.CV,cs.LG
Words2Contact: Identifying Support Contacts from Verbal Instructions Using Foundation Models
This paper presents Words2Contact, a language-guided multi-contact placement pipeline leveraging large language models and vision language models. Our method is a key component for language-assisted teleoperation and human-robot cooperation, where human operators can instruct the robots where to place their support contacts before whole-body reaching or manipulation using natural language. Words2Contact transforms the verbal instructions of a human operator into contact placement predictions; it also deals with iterative corrections, until the human is satisfied with the contact location identified in the robot's field of view. We benchmark state-of-the-art LLMs and VLMs for size and performance in contact prediction. We demonstrate the effectiveness of the iterative correction process, showing that users, even naive, quickly learn how to instruct the system to obtain accurate locations. Finally, we validate Words2Contact in real-world experiments with the Talos humanoid robot, instructed by human operators to place support contacts on different locations and surfaces to avoid falling when reaching for distant objects.
Updated: 2024-07-19 11:57:34
标题: Words2Contact:使用基础模型从口头指令中识别支撑接触点
摘要: 本文介绍了Words2Contact,这是一个利用大型语言模型和视觉语言模型的语言引导多接触点放置流水线。我们的方法是语言辅助远程操作和人机合作的关键组成部分,其中人类操作员可以通过自然语言指导机器人在全身伸展或操作之前放置支撑接触点。Words2Contact将人类操作员的口头指令转化为接触点放置预测;它还处理迭代校正,直到人类满意于机器人视野中确定的接触位置。我们在接触点预测的大小和性能方面对最先进的LLMs和VLMs进行了基准测试。我们展示了迭代校正过程的有效性,表明用户甚至是初学者也能快速学会如何指导系统以获取准确的位置。最后,我们在真实世界实验中验证了Words2Contact,通过人类操作员指导Talos人形机器人在不同位置和表面放置支撑接触点,以避免在伸手取远处物体时摔倒。
更新时间: 2024-07-19 11:57:34
领域: cs.RO,cs.AI
Towards End-to-End Spoken Grammatical Error Correction
Grammatical feedback is crucial for L2 learners, teachers, and testers. Spoken grammatical error correction (GEC) aims to supply feedback to L2 learners on their use of grammar when speaking. This process usually relies on a cascaded pipeline comprising an ASR system, disfluency removal, and GEC, with the associated concern of propagating errors between these individual modules. In this paper, we introduce an alternative "end-to-end" approach to spoken GEC, exploiting a speech recognition foundation model, Whisper. This foundation model can be used to replace the whole framework or part of it, e.g., ASR and disfluency removal. These end-to-end approaches are compared to more standard cascaded approaches on the data obtained from a free-speaking spoken language assessment test, Linguaskill. Results demonstrate that end-to-end spoken GEC is possible within this architecture, but the lack of available data limits current performance compared to a system using large quantities of text-based GEC data. Conversely, end-to-end disfluency detection and removal, which is easier for the attention-based Whisper to learn, does outperform cascaded approaches. Additionally, the paper discusses the challenges of providing feedback to candidates when using end-to-end systems for spoken GEC.
Updated: 2024-07-19 11:32:32
标题: 朝向端到端口语语法错误纠正
摘要: 语法反馈对于第二语言学习者、教师和测试者至关重要。口语语法错误纠正(GEC)旨在为第二语言学习者在说话时提供语法使用方面的反馈。这个过程通常依赖于一个级联管道,包括自动语音识别系统、不流利内容去除和GEC,并伴随着错误在这些单独模块之间传播的隐患。在本文中,我们介绍了一种替代的"端到端"口语GEC方法,利用了语音识别基础模型Whisper。这个基础模型可以用来替换整个框架或其中的一部分,例如自动语音识别和不流利内容去除。这些端到端方法与更标准的级联方法在从自由口语语言评估测试Linguaskill获得的数据上进行了比较。结果表明,在这种架构中端到端口语GEC是可能的,但由于可用数据的缺乏,其当前性能仍不及使用大量基于文本的GEC数据的系统。另一方面,端到端的不流利检测和去除对于基于注意力的Whisper来说更容易学习,确实优于级联方法。此外,本文还讨论了在使用端到端系统进行口语GEC时向考生提供反馈的挑战。
更新时间: 2024-07-19 11:32:32
领域: cs.CL,cs.LG,eess.AS
Domain Adaptation for Industrial Time-series Forecasting via Counterfactual Inference
Industrial time series, as structured data reflecting production process information, can be used for data-driven decision-making and effective monitoring of industrial production processes. However, time-series forecasting in industry faces several challenges, e.g., few-shot prediction caused by data shortage and decision confusion caused by unknown treatment policies. To cope with these problems, we propose a novel causal domain adaptation framework, the Causal Domain Adaptation (CDA) forecaster, to improve performance on the domain of interest with limited data (the target). Firstly, we analyze the causality that exists alongside treatments, thereby ensuring that causality is shared over time. Subsequently, we propose an answer-based attention mechanism that achieves a domain-invariant representation through the causality shared by both domains. Then, a novel domain-adaptation model is built to model treatments and outcomes, jointly trained on the source and target domains. The main insights are that our answer-based attention mechanism allows the target domain to leverage the causality present in the source time series even under different treatments, and that our forecaster can predict the counterfactual outcome of an industrial time series, providing guidance for the production process. Compared with common baselines, our method demonstrates effective cross-domain prediction and practical value in guiding production processes on real-world and synthetic oilfield datasets.
Updated: 2024-07-19 11:19:43
标题: 领域自适应在工业时间序列预测中的应用:通过反事实推断
摘要: 工业时间序列作为结构化数据对生产过程信息做出响应,可以用于进行基于数据的决策,以有效监测工业生产过程。然而,在工业领域进行时间序列预测面临一些挑战,比如由数据短缺引起的少样本预测,以及由未知处理政策引起的决策混淆。为了解决这些问题,我们提出了一种新颖的因果领域适应框架,Causal Domain Adaptation (CDA) 预测器,以提高在有限数据(目标)领域的性能。首先,我们分析了沿着处理存在的因果关系,并确保随着时间的推移共享因果关系。随后,我们提出了一种基于答案的注意机制,通过在两个领域中共享的因果关系实现领域不变表示。然后,建立了一种新颖的领域适应模型,同时在源领域和目标领域上进行处理和结果的联合训练。我们的主要见解是,我们设计的基于答案的注意机制允许目标领域利用源时间序列中存在的因果关系,即使处理不同,我们的预测器可以预测工业时间序列的反事实结果,为生产过程提供指导。与常见基线方法相比,我们在真实世界和合成油田数据集上的方法展示了跨领域预测的有效性和在指导生产过程中的实用性。
更新时间: 2024-07-19 11:19:43
领域: cs.LG,cs.IT,math.IT
Enhanced Mortality Prediction in ICU Stroke Patients via Deep Learning
Background: Stroke is the second-leading cause of disability and death among adults. Approximately 17 million people suffer a stroke annually, with about 85% being ischemic strokes. Predicting the mortality of ischemic stroke patients in the intensive care unit (ICU) is crucial for optimizing treatment strategies, allocating resources, and improving survival rates. Methods: We acquired data on ICU ischemic stroke patients from the MIMIC-IV database, including diagnoses, vital signs, laboratory tests, medications, procedures, treatments, and clinical notes. Stroke patients were randomly divided into training (70%, n=2441), test (15%, n=523), and validation (15%, n=523) sets. To address data imbalance, we applied the Synthetic Minority Over-sampling Technique (SMOTE). We selected 30 features for model development, significantly reducing the feature count from the 1095 used in the best prior study. We developed a deep learning model to assess mortality risk and implemented several baseline machine learning models for comparison. Results: The XGB-DL model, combining XGBoost for feature selection with deep learning, effectively minimized false positives. Model AUROC improved from 0.865 (95% CI: 0.821 - 0.905) on the first day to 0.903 (95% CI: 0.868 - 0.936) by the fourth day, using data from 3,646 ICU mortality patients in the MIMIC-IV database, with an AUROC of 0.945 (95% CI: 0.944 - 0.947) during training. Although other ML models also performed well in terms of AUROC, we chose deep learning for its higher specificity. Conclusions: Through enhanced feature selection and data cleaning, the proposed model demonstrates a 13% AUROC improvement over existing models while reducing the feature count from the 1095 of previous studies to 30.
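A schematic of the rebalancing-plus-feature-selection stage described above, on stand-in data (the cohort, labels and hyperparameters here are synthetic placeholders; in the paper the selected features then feed the deep model):

import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from xgboost import XGBClassifier

# Synthetic stand-in for the imbalanced ICU mortality cohort.
X, y = make_classification(n_samples=3000, n_features=120,
                           weights=[0.85], random_state=0)

X_bal, y_bal = SMOTE(random_state=0).fit_resample(X, y)   # rebalance classes
xgb = XGBClassifier(n_estimators=200, eval_metric="logloss").fit(X_bal, y_bal)

top30 = np.argsort(xgb.feature_importances_)[::-1][:30]   # keep 30 features
X_selected = X_bal[:, top30]      # input to the downstream deep model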
Updated: 2024-07-19 11:17:42
标题: 通过深度学习提高ICU中中风患者的死亡预测
摘要: 背景:中风是成年人残疾和死亡的第二大原因。每年约有1700万人患中风,其中约85%为缺血性中风。预测ICU中缺血性中风患者的死亡率对于优化治疗策略、分配资源和提高生存率至关重要。方法:我们从MIMIC-IV数据库获取了ICU中缺血性中风患者的数据,包括诊断、生命体征、实验室检查、药物、程序、治疗和临床记录。中风患者被随机分为训练组(70%,n=2441)、测试组(15%,n=523)和验证组(15%,n=523)。为解决数据不平衡问题,我们应用了Synthetic Minority Over-sampling Technique(SMOTE)。我们选择了30个特征进行模型开发,显著减少了特征数量,从之前研究中使用的1095个特征。我们开发了一个深度学习模型来评估死亡风险,并实施了几种基准机器学习模型进行比较。结果:XGB-DL模型,结合XGBoost进行特征选择和深度学习,有效地减少了误报率。模型的AUROC从第一天的0.865(95% CI: 0.821-0.905)提高到第四天的0.903(95% CI: 0.868-0.936),使用MIMIC-IV数据库中3,646名ICU死亡患者的数据,训练期间的AUROC为0.945(95% CI: 0.944-0.947)。尽管其他机器学习模型在AUROC方面表现良好,但我们选择深度学习是因为其更高的特异性。结论:通过增强特征选择和数据清洗,所提出的模型与现有模型相比,AUROC提高了13%,同时将特征数量从之前研究中的1095个减少到30个。
更新时间: 2024-07-19 11:17:42
领域: cs.LG
Fair Overlap Number of Balls (Fair-ONB): A Data-Morphology-based Undersampling Method for Bias Reduction
Given the current magnitude and speed of data generation, the use of machine learning is increasingly important. When data include protected features that might give rise to discrimination, special care must be taken. Data quality is critical in these cases, as biases in training data can be reflected in classification models. This has devastating consequences and fails to comply with current regulations. Data-Centric Artificial Intelligence proposes dataset modifications to improve their quality. Instance selection via undersampling can foster balanced learning of classes and protected feature values in the classifier. When such undersampling is done close to the decision boundary, its effect on the classifier is bolstered. This work proposes Fair Overlap Number of Balls (Fair-ONB), an undersampling method that harnesses the data morphology of the different data groups (obtained from the combination of classes and protected feature values) to perform guided undersampling in the areas where they overlap. It employs attributes of the groups' ball coverage, such as the radius, number of covered instances and density, to select the most suitable areas for undersampling and reduce bias. Results show that the Fair-ONB method reduces bias with low impact on the classifier's predictive performance.
Updated: 2024-07-19 11:16:02
标题: 公平重叠球数(Fair-ONB):一种基于数据形态学的欠采样方法,用于减少偏差
摘要: 鉴于当前数据生成的规模之大,无论是数量还是速度,机器学习的应用变得越来越重要。当数据包含可能导致歧视的受保护特征时,必须特别小心。在这些情况下,数据质量至关重要,因为训练数据中的偏见可能会体现在分类模型中。这将产生灾难性后果,并且不符合当前的法规。数据中心人工智能提出了数据集修改以提高其质量。通过欠采样进行实例选择可以促进分类器中类和受保护特征值的平衡学习。当这种欠采样接近决策边界时,对分类器的影响会得到加强。本文提出了公平重叠球数(Fair-ONB),一种利用不同数据组的数据形态(从类和受保护特征值的组合中获得)进行引导欠采样的方法,以在它们重叠的区域执行欠采样。它利用球覆盖组的属性,如半径、覆盖实例数和密度,选择最适合欠采样的区域,减少偏见。结果显示,Fair-ONB方法减少了偏见,对分类器的预测性能影响较小。
更新时间: 2024-07-19 11:16:02
领域: cs.LG,cs.AI
Unlearning Concepts from Text-to-Video Diffusion Models
With the advancement of computer vision and natural language processing, text-to-video generation, enabled by text-to-video diffusion models, has become more prevalent. These models are trained using a large amount of data from the internet. However, the training data often contain copyrighted content, including cartoon character icons and artist styles, private portraits, and unsafe videos. Since filtering the data and retraining the model is challenging, methods for unlearning specific concepts from text-to-video diffusion models have been investigated. However, due to the high computational complexity and relatively large optimization scale, there is little work on unlearning methods for text-to-video diffusion models. We propose a novel concept-unlearning method by transferring the unlearning capability of the text encoder of text-to-image diffusion models to text-to-video diffusion models. Specifically, the method optimizes the text encoder using few-shot unlearning, where several generated images are used. We then use the optimized text encoder in text-to-video diffusion models to generate videos. Our method requires little computation and has a small optimization scale. We discuss the generated videos after unlearning a concept. The experiments demonstrate that our method can unlearn copyrighted cartoon characters, artist styles, objects, and people's facial characteristics. Our method can unlearn a concept within about 100 seconds on an RTX 3070. Since there was no concept unlearning method for text-to-video diffusion models before, we make concept unlearning feasible and more accessible in the text-to-video domain.
Updated: 2024-07-19 11:15:02
标题: 从文本到视频扩散模型中的概念去学习
摘要: 随着计算机视觉和自然语言处理的进步,由文本到视频生成的技术,通过文本到视频扩散模型实现,变得更加普遍。这些模型是通过使用大量来自互联网的数据进行训练的。然而,训练数据通常包含受版权保护的内容,包括卡通人物图标和艺术家风格、私人肖像和不安全视频。由于过滤数据并重新训练模型具有挑战性,因此已经研究了从文本到视频扩散模型中取消特定概念的方法。然而,由于高计算复杂性和相对较大的优化规模,目前对于文本到视频扩散模型的取消方法的研究还很少。我们提出了一种新颖的概念取消方法,通过将文本到图像扩散模型的文本编码器的取消能力转移到文本到视频扩散模型中。具体来说,该方法使用少量生成的图像对文本编码器进行优化。然后,将优化后的文本编码器应用于文本到视频扩散模型中生成视频。我们的方法计算资源消耗低,优化规模小。我们讨论了取消概念后生成的视频。实验证明,我们的方法可以取消受版权保护的卡通人物、艺术家风格、物体和人物面部特征。我们的方法可以在RTX 3070上约100秒内取消一个概念。由于以前没有针对文本到视频扩散模型的概念取消方法,我们使概念取消在文本到视频领域变得可行和更易于实现。
更新时间: 2024-07-19 11:15:02
领域: cs.LG,cs.CV
Longhorn: State Space Models are Amortized Online Learners
The most fundamental capability of modern AI methods such as Large Language Models (LLMs) is the ability to predict the next token in a long sequence of tokens, known as "sequence modeling." Although the Transformer model is currently the dominant approach to sequence modeling, its quadratic computational cost with respect to sequence length is a significant drawback. State-space models (SSMs) offer a promising alternative due to their linear decoding efficiency and high parallelizability during training. However, existing SSMs often rely on seemingly ad hoc linear recurrence designs. In this work, we explore SSM design through the lens of online learning, conceptualizing SSMs as meta-modules for specific online learning problems. This approach links SSM design to formulating precise online learning objectives, with state transition rules derived from optimizing these objectives. Based on this insight, we introduce a novel deep SSM architecture based on the implicit update for optimizing an online regression objective. Our experimental results show that our models outperform state-of-the-art SSMs, including the Mamba model, on standard sequence modeling benchmarks and language modeling tasks.
Updated: 2024-07-19 11:12:08
标题: Longhorn:状态空间模型是摊销的在线学习者
摘要: 现代人工智能方法(例如大型语言模型(LLMs))最基本的能力是能够预测长序列中的下一个标记,这被称为“序列建模”。尽管Transformers模型是当前主流的序列建模方法,但是其与序列长度成二次关系的计算成本是一个显著的缺点。状态空间模型(SSMs)由于其线性解码效率和训练过程中的高并行性,提供了一个有前途的替代方案。然而,现有的SSMs通常依赖看似临时的线性递归设计。在这项工作中,我们通过在线学习的视角探索SSM设计,将SSMs概念化为特定在线学习问题的元模块。这种方法将SSM设计与制定精确的在线学习目标联系起来,状态转换规则由优化这些目标导出。基于这一洞察,我们引入了一种基于隐式更新的新型深度SSM体系结构,用于优化在线回归目标。我们的实验结果显示,我们的模型在标准序列建模基准和语言建模任务中胜过了最先进的SSMs,包括Mamba模型。
更新时间: 2024-07-19 11:12:08
领域: cs.LG
Robust agents learn causal world models
It has long been hypothesised that causal reasoning plays a fundamental role in robust and general intelligence. However, it is not known if agents must learn causal models in order to generalise to new domains, or if other inductive biases are sufficient. We answer this question, showing that any agent capable of satisfying a regret bound under a large set of distributional shifts must have learned an approximate causal model of the data generating process, which converges to the true causal model for optimal agents. We discuss the implications of this result for several research areas including transfer learning and causal inference.
Updated: 2024-07-19 11:12:08
标题: 稳健的代理学习因果世界模型
摘要: 长期以来,人们一直假设因果推理在强大和普遍智能中起着基础性作用。然而,目前尚不清楚代理人是否必须学习因果模型才能推广到新的领域,或者其他归纳偏差是否足够。我们回答了这个问题,表明任何能够在大量分布变化下满足后悔界限的代理人必须已经学习了数据生成过程的近似因果模型,对于最优代理人来说,这一模型会收敛到真实的因果模型。我们讨论了这一结果对于包括迁移学习和因果推理在内的若干研究领域的影响。
更新时间: 2024-07-19 11:12:08
领域: cs.AI,cs.LG
Watermark Smoothing Attacks against Language Models
Watermarking is a technique used to embed a hidden signal in the probability distribution of text generated by large language models (LLMs), enabling attribution of the text to the originating model. We introduce smoothing attacks and show that existing watermarking methods are not robust against minor modifications of text. An adversary can use weaker language models to smooth out the distribution perturbations caused by watermarks without significantly compromising the quality of the generated text. The modified text resulting from the smoothing attack remains close to the distribution of text that the original model (without watermark) would have produced. Our attack reveals a fundamental limitation of a wide range of watermarking techniques.
Updated: 2024-07-19 11:04:54
标题: 针对语言模型的水印平滑攻击
摘要: 数字水印技术是一种用于在大型语言模型(LLMs)生成的文本概率分布中嵌入隐藏信号的技术,使得可以将文本归因于原始模型。我们引入了平滑攻击,并展示现有数字水印方法对文本的轻微修改不具有强大的鲁棒性。攻击者可以使用较弱的语言模型来平滑水印引起的分布扰动,而不会显著影响生成文本的质量。平滑攻击产生的修改文本仍然接近于原始模型(无水印)可能生成的文本分布。我们的攻击揭示了广泛范围的数字水印技术存在的一个基本限制。
更新时间: 2024-07-19 11:04:54
领域: cs.LG
SHS: Scorpion Hunting Strategy Swarm Algorithm
We introduce the Scorpion Hunting Strategy (SHS), a novel population-based, nature-inspired optimisation algorithm. This algorithm draws inspiration from the hunting strategy of scorpions, which identify, locate, and capture their prey using the alpha and beta vibration operators. These operators control the SHS algorithm's exploitation and exploration abilities. To formulate an optimisation method, we mathematically simulate these dynamic events and behaviors. We evaluate the effectiveness of the SHS algorithm by employing 20 benchmark functions (including 10 conventional and 10 CEC2020 functions), using both qualitative and quantitative analyses. Through a comparative analysis with 12 state-of-the-art meta-heuristic algorithms, we demonstrate that the proposed SHS algorithm yields exceptionally promising results. These findings are further supported by statistically significant results obtained through the Wilcoxon rank sum test. Additionally, the ranking of SHS, as determined by the average rank derived from the Friedman test, positions it at the forefront when compared to other algorithms. Going beyond theoretical validation, we showcase the practical utility of the SHS algorithm by applying it to six distinct real-world optimisation tasks. These applications illustrate the algorithm's potential in addressing complex optimisation challenges. In summary, this work not only introduces the innovative SHS algorithm but also substantiates its effectiveness and versatility through rigorous benchmarking and real-world problem-solving scenarios.
Updated: 2024-07-19 10:58:42
标题: SHS:蝎子狩猎策略群算法
摘要: 我们介绍了蝎子狩猎策略(SHS),这是一种新颖的基于种群的、受自然启发的优化算法。该算法借鉴了蝎子的狩猎策略,利用alpha和beta振动算子识别、定位和捕捉猎物。这些算子控制了SHS算法的开发和探索能力。为了制定一个优化方法,我们对这些动态事件和行为进行了数学模拟。我们通过使用20个基准函数(包括10个传统函数和10个CEC2020函数),从定性和定量分析两个方面评估了SHS算法的有效性。通过与12种最先进的元启发算法进行比较分析,我们证明了提出的SHS算法产生了异常有希望的结果。这些发现得到了通过Wilcoxon秩和检验获得的具有统计显著性的结果的进一步支持。此外,通过Friedman检验得出的平均排名确定的SHS排名将其置于其他算法之前的位置。超越理论验证,我们展示了SHS算法在六个不同的现实世界优化任务中的实际效用。这些应用展示了该算法在解决复杂优化问题方面的潜力。总之,这项工作不仅介绍了创新的SHS算法,还通过严格的基准测试和实际问题解决方案证实了其有效性和多功能性。
更新时间: 2024-07-19 10:58:42
领域: cs.NE,cs.AI
On the matrix code of quadratic relationships for a Goppa code
In this article, we continue the analysis started in [CMT23] for the matrix code of quadratic relationships associated with a Goppa code. We provide new sparse and low-rank elements in the matrix code and categorize them according to their shape. Thanks to this description, we prove that the set of rank 2 matrices in the matrix codes associated with square-free binary Goppa codes, i.e. those used in Classic McEliece, is much larger than what is expected, at least in the case where the Goppa polynomial degree is 2. We build upon the algebraic determinantal modeling introduced in [CMT23] to derive a structural attack on these instances. Our method can break in just a few seconds some recent challenges about key-recovery attacks on the McEliece cryptosystem, consistently reducing their estimated security level. We also provide a general method, valid for any Goppa polynomial degree, to transform a generic pair of support and multiplier into a pair of support and Goppa polynomial.
Updated: 2024-07-19 10:52:12
标题: 关于Goppa码的二次关系矩阵代码
摘要: 在这篇文章中,我们继续分析与Goppa码相关的二次关系的矩阵码,该分析始于[CMT23]。我们提供了矩阵码中的新的稀疏和低秩元素,并根据它们的形状进行分类。通过这种描述,我们证明了与无平方二进制Goppa码相关的矩阵码中秩为2的矩阵集合比预期的要大得多,至少在Goppa多项式次数为2的情况下。我们借鉴了[CMT23]中引入的代数行列式建模,以推导对这些实例的结构攻击。我们的方法可以在几秒钟内破解一些关于McEliece加密系统密钥恢复攻击的最新挑战,从而持续降低它们的预估安全级别。我们还提供了一种通用方法,适用于任何Goppa多项式次数,将一个通用的支持和乘子对转化为一个支持和Goppa多项式对。
更新时间: 2024-07-19 10:52:12
领域: cs.IT,cs.CR,math.IT
Enhancing Variable Importance in Random Forests: A Novel Application of Global Sensitivity Analysis
The present work provides an application of Global Sensitivity Analysis to supervised machine learning methods such as Random Forests. These methods act as black boxes, selecting features in high-dimensional data sets so as to provide accurate classifiers in terms of prediction when new data are fed into the system. In supervised machine learning, predictors are generally ranked by importance based on their contribution to the final prediction. Global Sensitivity Analysis is primarily used in mathematical modelling to investigate the effect of the uncertainties of the input variables on the output. We apply it here as a novel way to rank the input features by their importance to the explainability of the data generating process, shedding light on how the response is determined by the dependence structure of its predictors. A simulation study shows that our proposal can be used to explore what advances can be achieved either in terms of efficiency, explanatory ability, or simply by way of confirming existing results.
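Concretely, the recipe amounts to treating the fitted forest as the function under study and computing Sobol indices over its inputs. A sketch using SALib (the library choice, toy response and sample sizes are assumptions, not the paper's setup):

import numpy as np
from SALib.sample import saltelli
from SALib.analyze import sobol
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(2000, 5))
y = X[:, 0] + 2 * X[:, 1] * X[:, 2] + 0.1 * rng.normal(size=2000)  # toy target
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

problem = {"num_vars": 5, "names": [f"x{i}" for i in range(5)],
           "bounds": [[0.0, 1.0]] * 5}
X_s = saltelli.sample(problem, 1024)        # Saltelli design over the inputs
Si = sobol.analyze(problem, rf.predict(X_s))
print(Si["S1"])   # first-order Sobol indices as variable importances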
Updated: 2024-07-19 10:45:36
标题: 提高随机森林中的变量重要性:全局敏感性分析的新应用
摘要: 这项研究将全局敏感性分析应用于监督机器学习方法,如随机森林。这些方法作为黑匣子,在高维数据集中选择特征,以提供准确的分类器,当新数据输入系统时进行预测。在监督机器学习中,预测器通常根据它们对最终预测的贡献进行排序。全局敏感性分析主要用于数学建模,以研究输入变量的不确定性对输出的影响。我们将其应用在这里,作为一种新颖的方式来根据它们对数据生成过程的可解释性的重要性对输入特征进行排序,揭示了响应是如何由其预测变量的依赖结构决定的。一项模拟研究表明,我们的提议可以用于探索在效率、解释能力方面或仅通过确认现有结果方面可能取得的进展。
更新时间: 2024-07-19 10:45:36
领域: stat.ML,cs.LG,stat.AP,stat.CO
LLM See, LLM Do: Guiding Data Generation to Target Non-Differentiable Objectives
The widespread adoption of synthetic data raises new questions about how models generating the data can influence other large language models (LLMs) via distilled data. To start, our work exhaustively characterizes the impact of passive inheritance of model properties by systematically studying the consequences of synthetic data integration. We provide one of the most comprehensive studies to date of how the source of synthetic data shapes models' internal biases, calibration, and generations' textual attributes and preferences. We find that models are surprisingly sensitive towards certain attributes even when the synthetic data prompts appear "neutral", which invites the question of whether this sensitivity can be exploited for good. Our findings invite the question: can we explicitly steer models towards the properties we want at test time by exploiting the data generation process? This would have historically been considered infeasible due to the cost of collecting data with a specific characteristic or objective in mind. However, improvement in the quality of synthetic data, as well as a shift towards general-purpose models designed to follow a diverse range of instructions, means this question is timely. We propose active inheritance as a term to describe intentionally constraining synthetic data according to a non-differentiable objective. We demonstrate how active inheritance can steer the generation profiles of models towards desirable non-differentiable attributes, e.g. high lexical diversity or low toxicity.
Updated: 2024-07-19 10:45:21
标题: 看LLM,做LLM:引导数据生成以达到非可微目标
摘要: 合成数据的广泛采用引发了关于生成数据的模型如何通过精炼数据影响其他大型语言模型(LLMs)的新问题。首先,我们的工作通过系统地研究合成数据整合的后果,详尽地描述了模型属性的被动继承对模型的影响。我们提供了迄今为止关于合成数据来源如何塑造模型内部偏见、校准以及生成的文本属性和偏好的最全面的研究之一。我们发现,即使合成数据提示看起来“中立”,模型对某些属性也表现出惊人的敏感性,这引发了一个问题,即这种敏感性是否可以被利用为善。 我们的发现引发了一个问题,即我们是否可以通过利用数据生成过程来明确引导模型在测试时朝着我们想要的属性发展?这在历史上被认为不可行,因为收集具有特定特征或目标的数据的成本很高。然而,合成数据质量的提高,以及转向设计以遵循多样化指令的通用模型,意味着这个问题现在是及时的。我们提出“主动继承”作为一个术语,用来描述根据不可微分目标有意地限制合成数据。我们展示了如何通过主动继承来引导模型的生成概况朝着理想的不可微分属性发展,例如高词汇多样性或低毒性。
更新时间: 2024-07-19 10:45:21
领域: cs.CL,cs.AI,cs.LG
Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers
Self-attention in Transformers comes with a high computational cost because of its quadratic complexity in sequence length, but its effectiveness in addressing problems in language and vision has sparked extensive research aimed at enhancing efficiency. However, diverse experimental conditions, spanning multiple input domains, prevent a fair comparison based solely on reported results, posing challenges for model selection. To address this gap in comparability, we perform a large-scale benchmark of more than 45 models for image classification, evaluating key efficiency aspects, including accuracy, speed, and memory usage. Our benchmark provides a standardized baseline for efficiency-oriented transformers. We analyze the results based on the Pareto front -- the boundary of optimal models. Surprisingly, despite claims of other models being more efficient, ViT remains Pareto optimal across multiple metrics. We observe that hybrid attention-CNN models exhibit remarkable inference memory- and parameter-efficiency. Moreover, our benchmark shows that using a larger model in general is more efficient than using higher resolution images. Thanks to our holistic evaluation, we provide a centralized resource for practitioners and researchers, facilitating informed decisions when selecting or developing efficient transformers.
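The Pareto-front reading of such a benchmark is simple to reproduce: keep every model that no other model beats on all metrics at once. A sketch on hypothetical (accuracy, latency) entries, with all numbers made up for illustration:

def pareto_front(models):
    # Keep entries not dominated on (higher accuracy, lower latency).
    front = []
    for name, acc, ms in models:
        dominated = any(a >= acc and m <= ms and (a > acc or m < ms)
                        for _, a, m in models)
        if not dominated:
            front.append((name, acc, ms))
    return front

entries = [("ViT-B", 81.8, 7.1), ("Hybrid-A", 81.2, 6.0),
           ("Efficient-X", 80.1, 9.4)]
print(pareto_front(entries))   # Efficient-X is dominated and drops out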
Updated: 2024-07-19 10:44:53
标题: 应该选择哪种Transformer:视觉Transformer效率的比较分析
摘要: Transformer中的自注意力机制具有较高的计算成本,因为其二次计算复杂度,但它们在语言和视觉问题上的有效性引发了大量研究,旨在提高其效率。然而,不同的实验条件涵盖多个输入领域,仅基于报告的结果进行公平比较存在挑战,对模型选择产生困难。为了弥补这一可比性差距,我们对45多个图像分类模型进行了大规模基准测试,评估关键的效率方面,包括准确性、速度和内存使用。我们的基准测试为面向效率的Transformer提供了标准化基准线。我们基于帕累托前沿(最优模型的边界)对结果进行分析。令人惊讶的是,尽管其他模型声称更高效,但ViT在多个指标上仍然是帕累托最优的。我们观察到,混合注意力-CNN模型展现出卓越的推理内存和参数效率。此外,我们的基准测试显示,通常情况下使用更大的模型比使用更高分辨率的图像更有效。由于我们的全面评估,我们为从业者和研究人员提供了一个集中的资源,促进在选择或开发高效Transformer时做出明智决策。
更新时间: 2024-07-19 10:44:53
领域: cs.CV,cs.AI,cs.LG,68T07,I.4.0; I.2.10; I.5.1
LeKUBE: A Legal Knowledge Update BEnchmark
Recent advances in Large Language Models (LLMs) have significantly shaped the applications of AI in multiple fields, including the studies of legal intelligence. Trained on extensive legal texts, including statutes and legal documents, the legal LLMs can capture important legal knowledge/concepts effectively and provide important support for downstream legal applications such as legal consultancy. Yet, the dynamic nature of legal statutes and interpretations also poses new challenges to the use of LLMs in legal applications. Particularly, how to update the legal knowledge of LLMs effectively and efficiently has become an important research problem in practice. Existing benchmarks for evaluating knowledge update methods are mostly designed for the open domain and cannot address the specific challenges of the legal domain, such as the nuanced application of new legal knowledge, the complexity and lengthiness of legal regulations, and the intricate nature of legal reasoning. To address this gap, we introduce the Legal Knowledge Update BEnchmark, i.e. LeKUBE, which evaluates knowledge update methods for legal LLMs across five dimensions. Specifically, we categorize the needs of knowledge updates in the legal domain with the help of legal professionals, and then hire annotators from law schools to create synthetic updates to the Chinese Criminal and Civil Code as well as sets of questions of which the answers would change after the updates. Through a comprehensive evaluation of state-of-the-art knowledge update methods, we reveal a notable gap between existing knowledge update methods and the unique needs of the legal domain, emphasizing the need for further research and development of knowledge update mechanisms tailored for legal LLMs.
Updated: 2024-07-19 10:40:10
标题: LeKUBE:法律知识更新基准
摘要: 最近大规模语言模型(LLMs)的进展显著地塑造了人工智能在多个领域的应用,包括法律智能的研究。经过在大量法律文本上训练,包括法规和法律文件,法律LLMs能够有效捕捉重要的法律知识/概念,并为法律咨询等下游法律应用提供重要支持。然而,法律法规和解释的动态性也给LLMs在法律应用中的使用提出了新的挑战。特别是,如何有效和高效地更新LLMs的法律知识已成为实践中的一个重要研究问题。现有用于评估知识更新方法的基准大多设计用于开放领域,并不能解决法律领域的特定挑战,如新法律知识的微妙应用、法规的复杂性和冗长性,以及法律推理的复杂性。为了填补这一空白,我们引入了法律知识更新基准,即LeKUBE,它评估了法律LLMs的知识更新方法在五个维度上的表现。具体而言,我们借助法律专业人员将法律领域的知识更新需求进行分类,然后聘请法学院的注释员为中国刑法和民法创造合成更新,并提出一系列问题,这些问题的答案将在更新后发生变化。通过对最先进的知识更新方法进行全面评估,我们揭示了现有知识更新方法与法律领域独特需求之间的显著差距,强调了进一步研究和开发为法律LLMs量身定制的知识更新机制的必要性。
更新时间: 2024-07-19 10:40:10
领域: cs.CL,cs.AI
Evaluating the Impact of Different Quantum Kernels on the Classification Performance of Support Vector Machine Algorithm: A Medical Dataset Application
The support vector machine algorithm with a quantum kernel estimator (QSVM-Kernel), as a leading example of a quantum machine learning technique, has undergone significant advancements. Nevertheless, its integration with classical data presents unique challenges. While quantum computers primarily interact with data in quantum states, embedding classical data into quantum states using feature mapping techniques is essential for leveraging quantum algorithms. Despite the recognized importance of feature mapping, its specific impact on data classification outcomes remains largely unexplored. This study addresses this gap by comprehensively assessing the effects of various feature mapping methods on classification results, taking medical data analysis as a case study. In this study, the QSVM-Kernel method was applied to classification problems in two different and publicly available medical datasets, namely, the Wisconsin Breast Cancer (original) and The Cancer Genome Atlas (TCGA) Glioma datasets. In the QSVM-Kernel algorithm, quantum kernel matrices obtained from 9 different quantum feature maps were used. Thus, the effects of these quantum feature maps on the classification results of the QSVM-Kernel algorithm were examined in terms of both classifier performance and total execution time. As a result, in the Wisconsin Breast Cancer (original) and TCGA Glioma datasets, when Rx and Ry rotational gates were used, respectively, as feature maps in the QSVM-Kernel algorithm, the best results were achieved in terms of both classification performance and total execution time. The contributions of this study are that (1) it highlights the significant impact of feature mapping techniques on medical data classification outcomes using the QSVM-Kernel algorithm, and (2) it also guides undertaking research for improved QSVM classification performance.
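A classically simulable toy version of one such feature map makes the QSVM-Kernel pipeline concrete: encode each feature with an Ry rotation on its own qubit and take the state fidelity as the kernel. This is a simplification for illustration, not the paper's exact nine-map setup:

import numpy as np

def ry_state(theta):
    # Ry(theta)|0> = cos(theta/2)|0> + sin(theta/2)|1>
    return np.array([np.cos(theta / 2.0), np.sin(theta / 2.0)])

def quantum_kernel(x, z):
    # Fidelity |<phi(x)|phi(z)>|^2 of two product states, one qubit per feature.
    overlap = 1.0
    for xi, zi in zip(x, z):
        overlap *= ry_state(xi) @ ry_state(zi)
    return overlap ** 2

def kernel_matrix(X):
    return np.array([[quantum_kernel(a, b) for b in X] for a in X])
# kernel_matrix(X_train) can be fed to sklearn.svm.SVC(kernel="precomputed").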
Updated: 2024-07-19 10:37:03
标题: 评估不同量子核对支持向量机算法分类性能的影响:一种医学数据集应用
摘要: 采用量子核估计器(QSVM-Kernel)的支持向量机算法作为量子机器学习技术的主要示例,已经取得了显著进展。然而,其与经典数据的集成面临独特挑战。尽管量子计算机主要与量子状态的数据进行交互,但使用特征映射技术将经典数据嵌入量子状态对于利用量子算法至关重要。尽管人们已经认识到特征映射的重要性,但它对数据分类结果的具体影响仍然大部分未被探索。本研究通过全面评估各种特征映射方法对分类结果的影响,以医疗数据分析为案例,来填补这一空白。在这项研究中,QSVM-Kernel方法被应用于两个不同的公开医学数据集,分别是威斯康辛乳腺癌(原始)和癌症基因组图谱(TCGA)脑胶质瘤数据集的分类问题。在QSVM-Kernel算法中,使用了来自9种不同量子特征映射的量子核矩阵。因此,从分类器性能和总执行时间的角度探讨了这些量子特征映射对QSVM-Kernel算法的分类结果的影响。结果显示,在威斯康辛乳腺癌(原始)和TCGA脑胶质瘤数据集中,当分别使用Rx和Ry旋转门作为特征映射时,QSVM-Kernel算法实现了最佳的分类性能,无论是在分类性能还是总执行时间方面。本研究的贡献在于:(1)强调了特征映射技术对使用QSVM-Kernel算法进行医疗数据分类结果的显著影响,(2)指导进行改进QSVM分类性能的研究。
更新时间: 2024-07-19 10:37:03
领域: cs.LG,cs.AI,quant-ph
Achieving Well-Informed Decision-Making in Drug Discovery: A Comprehensive Calibration Study using Neural Network-Based Structure-Activity Models
In the drug discovery process, where experiments can be costly and time-consuming, computational models that predict drug-target interactions are valuable tools to accelerate the development of new therapeutic agents. Estimating the uncertainty inherent in these neural network predictions provides valuable information that facilitates optimal decision-making when risk assessment is crucial. However, such models can be poorly calibrated, which results in unreliable uncertainty estimates that do not reflect the true predictive uncertainty. In this study, we compare different metrics, including accuracy and calibration scores, used for model hyperparameter tuning to investigate which model selection strategy achieves well-calibrated models. Furthermore, we propose to use a computationally efficient Bayesian uncertainty estimation method named Bayesian Linear Probing (BLP), which generates Hamiltonian Monte Carlo (HMC) trajectories to obtain samples for the parameters of a Bayesian Logistic Regression fitted to the hidden layer of the baseline neural network. We report that BLP improves model calibration and achieves the performance of common uncertainty quantification methods by combining the benefits of uncertainty estimation and probability calibration methods. Finally, we show that combining post hoc calibration method with well-performing uncertainty quantification approaches can boost model accuracy and calibration.
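One of the calibration scores such a study relies on, the expected calibration error, can be computed as below for binary activity predictions (the equal-width binning scheme is the usual choice; the paper's exact metric definitions may differ):

import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    # Mean |accuracy - confidence| over confidence bins, weighted by bin mass.
    # probs are predicted P(y=1); labels are in {0, 1}.
    conf = np.where(probs >= 0.5, probs, 1.0 - probs)
    pred = (probs >= 0.5).astype(int)
    edges = np.linspace(0.5, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf >= lo) & ((conf < hi) | (hi == 1.0))
        if mask.any():
            acc = (pred[mask] == labels[mask]).mean()
            ece += mask.mean() * abs(acc - conf[mask].mean())
    return ece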
Updated: 2024-07-19 10:29:00
标题: 实现药物发现中的明智决策:使用基于神经网络的结构活性模型进行全面校准研究
摘要: 在药物发现过程中,实验可能昂贵且耗时,因此能够预测药物靶点相互作用的计算模型是加速新治疗药物开发的有价值工具。估计这些神经网络预测中固有不确定性提供了有价值的信息,有助于在风险评估至关重要时进行最佳决策。然而,这些模型可能校准不佳,导致不可靠的不确定性估计,无法反映真实的预测不确定性。在本研究中,我们比较了用于模型超参数调整的不同指标,包括准确性和校准分数,以研究哪种模型选择策略能够实现良好校准的模型。此外,我们建议使用一种计算高效的贝叶斯不确定性估计方法,名为贝叶斯线性探测(BLP),该方法生成哈密顿蒙特卡洛(HMC)轨迹,以获得贝叶斯逻辑回归的参数样本,该逻辑回归适用于基线神经网络的隐层。我们报告称BLP提高了模型的校准性,并通过结合不确定性估计和概率校准方法的优势,实现了常见不确定性量化方法的性能。最后,我们表明,将事后校准方法与性能良好的不确定性量化方法相结合,可以提高模型的准确性和校准性。
更新时间: 2024-07-19 10:29:00
领域: cs.LG,stat.ML
Relational Representation Distillation
Knowledge distillation (KD) is an effective method for transferring knowledge from a large, well-trained teacher model to a smaller, more efficient student model. Despite its success, one of the main challenges in KD is ensuring the efficient transfer of complex knowledge while maintaining the student's computational efficiency. Unlike previous works that applied contrastive objectives promoting explicit negative instances, we introduce Relational Representation Distillation (RRD). Our approach leverages pairwise similarities to explore and reinforce the relationships between the teacher and student models. Inspired by self-supervised learning principles, it uses a relaxed contrastive loss that focuses on similarity rather than exact replication. This method aligns the output distributions of teacher samples in a large memory buffer, improving the robustness and performance of the student model without the need for strict negative instance differentiation. Our approach demonstrates superior performance on CIFAR-100, outperforming traditional KD techniques and surpassing 13 state-of-the-art methods. It also transfers successfully to other datasets like Tiny ImageNet and STL-10. The code will be made public soon.
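The relation-level objective can be sketched as matching the softmax over pairwise similarities of student and teacher embeddings rather than the embeddings themselves. The temperature and the KL form below are assumptions in the spirit of the description, not the paper's exact loss:

import torch
import torch.nn.functional as F

def relational_distillation_loss(student, teacher, tau=0.1):
    # Align similarity *distributions*: for each anchor, compare how the
    # student and the (frozen) teacher spread similarity over the batch.
    s = F.normalize(student, dim=1)
    t = F.normalize(teacher, dim=1)
    n = s.shape[0]
    off_diag = ~torch.eye(n, dtype=torch.bool)          # drop self-similarity
    sim_s = (s @ s.T)[off_diag].view(n, -1) / tau
    sim_t = (t @ t.T)[off_diag].view(n, -1) / tau
    return F.kl_div(F.log_softmax(sim_s, dim=1),
                    F.softmax(sim_t, dim=1), reduction="batchmean")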
Updated: 2024-07-19 10:25:18
标题: 关系表示蒸馏
摘要: 知识蒸馏(KD)是一种有效的方法,可以将大型、经过良好训练的教师模型的知识传递给更小、更高效的学生模型。尽管取得了成功,但在知识蒸馏中的一个主要挑战是确保复杂知识的有效传递,同时保持学生模型的计算效率。与之前应用对比目标促进显式负例的作品不同,我们引入了关系表示蒸馏(RRD)。我们的方法利用成对相似性来探索并加强教师和学生模型之间的关系。受自监督学习原则的启发,它使用了一个放松的对比损失,侧重于相似性而不是精确复制。这种方法调整了教师样本在大型内存缓冲区中的输出分布,提高了学生模型的鲁棒性和性能,而无需严格的负实例区分。我们的方法在CIFAR-100上表现出优越性能,优于传统的KD技术,并超过13种最先进的方法。它还成功地转移到其他数据集,如Tiny ImageNet和STL-10。代码将很快公开。
更新时间: 2024-07-19 10:25:18
领域: cs.CV,cs.AI,68T07,I.4; I.2
On Policy Evaluation Algorithms in Distributional Reinforcement Learning
We introduce a novel class of algorithms to efficiently approximate the unknown return distributions in policy evaluation problems from distributional reinforcement learning (DRL). The proposed distributional dynamic programming algorithms are suitable for underlying Markov decision processes (MDPs) having an arbitrary probabilistic reward mechanism, including continuous reward distributions with unbounded support being potentially heavy-tailed. For a plain instance of our proposed class of algorithms we prove error bounds, both within Wasserstein and Kolmogorov--Smirnov distances. Furthermore, for return distributions having probability density functions the algorithms yield approximations for these densities; error bounds are given within supremum norm. We introduce the concept of quantile-spline discretizations to come up with algorithms showing promising results in simulation experiments. While the performance of our algorithms can rigorously be analysed they can be seen as universal black box algorithms applicable to a large class of MDPs. We also derive new properties of probability metrics commonly used in DRL on which our quantitative analysis is based.
Updated: 2024-07-19 10:06:01
Domains: stat.ML,cs.LG,math.PR,60E05, 60H25 (Primary) 68T05, 90C40 (Secondary)
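As a toy illustration of distributional policy evaluation (not the paper's algorithm), the sketch below represents each state's return distribution by quantile atoms and iterates a sample-based distributional Bellman operator; the paper's quantile-spline discretization is replaced here by a plain quantile projection.

    import numpy as np

    rng = np.random.default_rng(0)
    S, N, gamma = 3, 64, 0.9
    P = np.array([[0.9, 0.1, 0.0],      # fixed-policy transition matrix
                  [0.0, 0.8, 0.2],
                  [0.1, 0.0, 0.9]])
    atoms = np.zeros((S, N))            # quantile atoms per state
    qs = (np.arange(N) + 0.5) / N       # midpoint quantile levels

    for _ in range(200):
        new_atoms = np.empty_like(atoms)
        for s in range(S):
            s_next = rng.choice(S, size=N, p=P[s])
            rewards = rng.normal(loc=float(s), scale=1.0, size=N)  # stochastic rewards
            targets = rewards + gamma * atoms[s_next, rng.integers(0, N, N)]
            new_atoms[s] = np.quantile(targets, qs)  # project back onto N quantiles
        atoms = new_atoms

    print(atoms.mean(axis=1))  # expected returns recovered from the atoms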
On Maximum Entropy Linear Feature Inversion
We revisit the classical problem of inverting dimension-reducing linear mappings using the maximum entropy (MaxEnt) criterion. In the literature, solutions are problem-dependent, inconsistent, and use different entropy measures. We propose a new unified approach that not only recovers the existing approaches as special cases, but also offers solutions to new cases, such as when data values are constrained to [0, 1], which has new applications in machine learning.
Updated: 2024-07-19 09:52:18
Domains: cs.LG
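For the classical Shannon-entropy case with nonnegative values, the inversion has a clean dual form, sketched below: the MaxEnt solution of A x = b is x_i = exp(-1 - (A^T lam)_i), with lam found by minimizing a convex dual. The [0, 1]-constrained case treated in the paper requires a different entropy and is not covered by this sketch.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    m, n = 3, 10
    A = rng.random((m, n))
    x_true = rng.random(n)
    x_true /= x_true.sum()
    b = A @ x_true

    def dual(lam):
        # g(lam) = sum_i exp(-1 - (A^T lam)_i) + lam . b, convex in lam
        return np.sum(np.exp(-1.0 - A.T @ lam)) + lam @ b

    res = minimize(dual, np.zeros(m), method="BFGS")
    x_maxent = np.exp(-1.0 - A.T @ res.x)
    print(np.abs(A @ x_maxent - b).max())  # constraints approximately satisfied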
DeepHGCN: Toward Deeper Hyperbolic Graph Convolutional Networks
Hyperbolic graph convolutional networks (HGCNs) have demonstrated significant potential in extracting information from hierarchical graphs. However, existing HGCNs are limited to shallow architectures due to the computational expense of hyperbolic operations and the issue of over-smoothing as depth increases. Although treatments have been applied to alleviate over-smoothing in GCNs, developing a hyperbolic solution presents distinct challenges since operations must be carefully designed to fit the hyperbolic nature. Addressing these challenges, we propose DeepHGCN, the first deep multi-layer HGCN architecture with dramatically improved computational efficiency and substantially reduced over-smoothing. DeepHGCN features two key innovations: (1) a novel hyperbolic feature transformation layer that enables fast and accurate linear mappings, and (2) techniques such as hyperbolic residual connections and regularization for both weights and features, facilitated by an efficient hyperbolic midpoint method. Extensive experiments demonstrate that DeepHGCN achieves significant improvements in link prediction and node classification tasks compared to both Euclidean and shallow hyperbolic GCN variants.
Updated: 2024-07-19 09:51:59
Domains: cs.LG
Debiasing surgeon: fantastic weights and how to find them
The emergence of algorithmic biases that can lead to unfair models is a growing concern. Several debiasing approaches have been proposed in the realm of deep learning, employing more or less sophisticated techniques to discourage these models from relying heavily on these biases. However, a question emerges: is this extra complexity really necessary? Does a vanilla-trained model already embody some "unbiased sub-networks" that can be used in isolation and provide a solution without relying on the algorithmic biases? In this work, we show that such a sub-network typically exists, and can be extracted from a vanilla-trained model without requiring additional training. We further validate that such a specific architecture is incapable of learning a specific bias, suggesting that there are possible architectural countermeasures to the problem of biases in deep neural networks.
Updated: 2024-07-19 09:50:51
Domains: cs.LG,cs.AI,cs.CV,cs.CY
Machine learning emulation of precipitation from km-scale regional climate simulations using a diffusion model
High-resolution climate simulations are very valuable for understanding climate change impacts and planning adaptation measures. This has motivated use of regional climate models at sufficiently fine resolution to capture important small-scale atmospheric processes, such as convective storms. However, these regional models have very high computational costs, limiting their applicability. We present CPMGEM, a novel application of a generative machine learning model, a diffusion model, to skilfully emulate precipitation simulations from such a high-resolution model over England and Wales at much lower cost. This emulator enables stochastic generation of high-resolution (8.8km), daily-mean precipitation samples conditioned on coarse-resolution (60km) weather states from a global climate model. The output is fine enough for use in applications such as flood inundation modelling. The emulator produces precipitation predictions with realistic intensities and spatial structures and captures most of the 21st century climate change signal. We show evidence that the emulator has skill for extreme events up to and including 1-in-100 year intensities. Potential applications include producing high-resolution precipitation predictions for large-ensemble climate simulations and downscaling different climate models and climate change scenarios to better sample uncertainty in climate changes at local-scale.
Updated: 2024-07-19 09:42:20
Domains: physics.ao-ph,cs.LG,J.2
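A minimal sketch of the training objective behind such an emulator, assuming a standard DDPM-style noise-prediction loss with the coarse-resolution state as conditioning input; the toy convolution stands in for whatever denoiser architecture CPMGEM actually uses, and the coarse field is assumed pre-interpolated to the target grid.

    import torch
    import torch.nn.functional as F

    T = 1000
    betas = torch.linspace(1e-4, 0.02, T)
    alphabar = torch.cumprod(1.0 - betas, dim=0)

    net = torch.nn.Conv2d(2, 1, 3, padding=1)  # toy stand-in for the denoiser

    def model(x_t, t, cond):
        # condition by channel-concatenating the coarse state (t embedding omitted)
        return net(torch.cat([x_t, cond], dim=1))

    def diffusion_loss(x_hires, cond_coarse):
        t = torch.randint(0, T, (x_hires.shape[0],))
        a = alphabar[t].view(-1, 1, 1, 1)
        eps = torch.randn_like(x_hires)
        x_t = a.sqrt() * x_hires + (1 - a).sqrt() * eps  # forward noising
        return F.mse_loss(model(x_t, t, cond_coarse), eps)

    x = torch.rand(4, 1, 64, 64)  # high-res precipitation fields
    c = torch.rand(4, 1, 64, 64)  # coarse weather state, pre-interpolated
    loss = diffusion_loss(x, c)
    loss.backward()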
Enhancing Human-Centered Dynamic Scene Understanding via Multiple LLMs Collaborated Reasoning
Human-centered dynamic scene understanding plays a pivotal role in enhancing the capability of robotic and autonomous systems, in which Video-based Human-Object Interaction (V-HOI) detection is a crucial task in semantic scene understanding, aimed at comprehensively understanding HOI relationships within a video to benefit the behavioral decisions of mobile robots and autonomous driving systems. Although previous V-HOI detection models have made significant strides in accurate detection on specific datasets, they still lack the general reasoning ability of human beings to effectively infer HOI relationships. In this study, we propose V-HOI Multi-LLMs Collaborated Reasoning (V-HOI MLCR), a novel framework consisting of a series of plug-and-play modules that could facilitate the performance of current V-HOI detection models by leveraging the strong reasoning ability of different off-the-shelf pre-trained large language models (LLMs). We design a two-stage collaboration system of different LLMs for the V-HOI task. Specifically, in the first stage, we design a Cross-Agents Reasoning scheme to leverage the LLMs to conduct reasoning from different aspects. In the second stage, we perform Multi-LLMs Debate to get the final reasoning answer based on the different knowledge in different LLMs. Additionally, we devise an auxiliary training strategy that utilizes CLIP, a large vision-language model, to enhance the base V-HOI models' discriminative ability to better cooperate with LLMs. We validate the superiority of our design by demonstrating its effectiveness in improving the prediction accuracy of the base V-HOI model via reasoning from multiple perspectives.
Updated: 2024-07-19 09:38:18
Domains: cs.CV,cs.AI,cs.MM
Multi-View Symbolic Regression
Symbolic regression (SR) searches for analytical expressions representing the relationship between a set of explanatory and response variables. Current SR methods assume a single dataset extracted from a single experiment. Nevertheless, frequently, the researcher is confronted with multiple sets of results obtained from experiments conducted with different setups. Traditional SR methods may fail to find the underlying expression since the parameters of each experiment can be different. In this work we present Multi-View Symbolic Regression (MvSR), which takes into account multiple datasets simultaneously, mimicking experimental environments, and outputs a general parametric solution. This approach fits the evaluated expression to each independent dataset and returns a parametric family of functions f(x; theta) simultaneously capable of accurately fitting all datasets. We demonstrate the effectiveness of MvSR using data generated from known expressions, as well as real-world data from astronomy, chemistry and economics, for which an a priori analytical expression is not available. Results show that MvSR obtains the correct expression more frequently and is robust to hyperparameter changes. In real-world data, it is able to grasp the group behavior, recovering known expressions from the literature as well as promising alternatives, thus enabling the use of SR to a large range of experimental scenarios.
Updated: 2024-07-19 09:37:48
Domains: cs.LG,astro-ph.IM,stat.AP
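The multi-view idea can be sketched directly: one candidate expression f(x; theta) is fit independently to each experiment, and a good expression is one that fits all of them with dataset-specific parameters. In the toy below, curve_fit stands in for the SR inner loop; the candidate expression and scoring rule are illustrative.

    import numpy as np
    from scipy.optimize import curve_fit

    def f(x, a, b):                 # candidate expression produced by SR
        return a * np.exp(b * x)

    rng = np.random.default_rng(0)
    datasets = []
    for a, b in [(1.0, 0.5), (2.0, 0.3), (0.5, 0.8)]:   # different "experiments"
        x = np.linspace(0, 2, 50)
        datasets.append((x, f(x, a, b) + 0.01 * rng.standard_normal(50)))

    # score = total fit error across all views; MvSR keeps expressions minimizing it
    score = 0.0
    for x, y in datasets:
        theta, _ = curve_fit(f, x, y, p0=[1.0, 1.0])
        score += np.mean((f(x, *theta) - y) ** 2)
    print(score)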
SparQ Attention: Bandwidth-Efficient LLM Inference
The computational difficulties of large language model (LLM) inference remain a significant obstacle to their widespread deployment. The need for many applications to support long input sequences and process them in large batches typically causes token-generation to be bottlenecked by data transfer. For this reason, we introduce SparQ Attention, a technique for increasing the inference throughput of LLMs by utilising memory bandwidth more efficiently within the attention layers, through selective fetching of the cached history. Our proposed technique can be applied directly to off-the-shelf LLMs during inference, without requiring any modification to the pre-training setup or additional fine-tuning. We show that SparQ Attention brings up to 8x savings in attention data transfers without substantial drops in accuracy, by evaluating Llama 2 and 3, Mistral, Gemma and Pythia models on a wide range of downstream tasks.
Updated: 2024-07-19 09:37:19
Domains: cs.LG
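The core trick can be sketched in a few lines for single-query decoding: use the largest-magnitude components of the query to cheaply approximate attention scores, then fetch only the most promising rows of the key/value cache. The numpy sketch below is a simplification that omits the paper's re-scaling and mean-value details.

    import numpy as np

    def sparq_attention(q, K, V, r=16, k=64):
        # 1) approximate scores using the r largest-magnitude query components
        idx = np.argsort(-np.abs(q))[:r]
        approx = K[:, idx] @ q[idx]
        # 2) transfer only the k most promising rows of the key/value cache
        top = np.argsort(-approx)[:k]
        scores = K[top] @ q / np.sqrt(q.shape[0])
        w = np.exp(scores - scores.max())
        w /= w.sum()
        return w @ V[top]

    rng = np.random.default_rng(0)
    d, n = 128, 4096
    q = rng.standard_normal(d)
    K = rng.standard_normal((n, d))
    V = rng.standard_normal((n, d))
    out = sparq_attention(q, K, V)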
Where is the Testbed for my Federated Learning Research?
Progressing beyond centralized AI is of paramount importance, yet distributed AI solutions, in particular various federated learning (FL) algorithms, are often not comprehensively assessed, which prevents the research community from identifying the most promising approaches and practitioners from being convinced that a certain solution is deployment-ready. The largest hurdle to FL algorithm evaluation is the difficulty of conducting real-world experiments over a variety of FL client devices and different platforms, with different datasets and data distribution, all while assessing various dimensions of algorithm performance, such as inference accuracy, energy consumption, and time to convergence, to name a few. In this paper, we present CoLExT, a real-world testbed for FL research. CoLExT is designed to streamline experimentation with custom FL algorithms in a rich testbed configuration space, with a large number of heterogeneous edge devices, ranging from single-board computers to smartphones, and provides real-time collection and visualization of a variety of metrics through automatic instrumentation. According to our evaluation, porting FL algorithms to CoLExT requires minimal involvement from the developer, and the instrumentation introduces minimal resource usage overhead. Furthermore, through an initial investigation involving popular FL algorithms running on CoLExT, we reveal previously unknown trade-offs, inefficiencies, and programming bugs.
Updated: 2024-07-19 09:34:04
Domains: cs.LG,cs.DC
A Comparative Study of Deep Reinforcement Learning Models: DQN vs PPO vs A2C
This study conducts a comparative analysis of three advanced Deep Reinforcement Learning models: Deep Q-Networks (DQN), Proximal Policy Optimization (PPO), and Advantage Actor-Critic (A2C), within the BreakOut Atari game environment. Our research assesses the performance and effectiveness of these models in a controlled setting. Through rigorous experimentation, we examine each model's learning efficiency, strategy development, and adaptability under dynamic game conditions. The findings provide critical insights into the practical applications of these models in game-based learning environments and contribute to the broader understanding of their capabilities. The code is publicly available at github.com/Neilus03/DRL_comparative_study.
Updated: 2024-07-19 09:29:38
Domains: cs.LG
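A comparison in this spirit can be reproduced with stable-baselines3, assuming gym's Atari dependencies and ROMs are installed; the hyperparameters below (including the reduced DQN replay buffer) are illustrative rather than those of the study.

    from stable_baselines3 import A2C, DQN, PPO

    # smaller replay buffer so the image observations fit in memory
    for Algo, kwargs in ((DQN, {"buffer_size": 50_000}), (PPO, {}), (A2C, {})):
        model = Algo("CnnPolicy", "BreakoutNoFrameskip-v4", verbose=0, **kwargs)
        model.learn(total_timesteps=100_000)
        model.save(f"{Algo.__name__.lower()}_breakout")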
Agent-driven Generative Semantic Communication with Cross-Modality and Prediction
In the era of 6G, with compelling visions of intelligent transportation systems and digital twins, remote surveillance is poised to become a ubiquitous practice. Substantial data volume and frequent updates present challenges in wireless networks. To address these challenges, we propose a novel agent-driven generative semantic communication (A-GSC) framework based on reinforcement learning. In contrast to the existing research on semantic communication (SemCom), which mainly focuses on either semantic extraction or semantic sampling, we seamlessly integrate both by jointly considering the intrinsic attributes of source information and the contextual information regarding the task. Notably, the introduction of generative artificial intelligence (GAI) enables the independent design of semantic encoders and decoders. In this work, we develop an agent-assisted semantic encoder with cross-modality capability, which can track the semantic changes, channel condition, to perform adaptive semantic extraction and sampling. Accordingly, we design a semantic decoder with both predictive and generative capabilities, consisting of two tailored modules. Moreover, the effectiveness of the designed models has been verified using the UA-DETRAC dataset, demonstrating the performance gains of the overall A-GSC framework in both energy saving and reconstruction accuracy.
Updated: 2024-07-19 09:28:54
Domains: cs.NI,cs.LG
Learning from Teaching Regularization: Generalizable Correlations Should be Easy to Imitate
Generalization remains a central challenge in machine learning. In this work, we propose Learning from Teaching (LoT), a novel regularization technique for deep neural networks to enhance generalization. Inspired by the human ability to capture concise and abstract patterns, we hypothesize that generalizable correlations are expected to be easier to imitate. LoT operationalizes this concept to improve generalization of the main model with auxiliary student learners. The student learners are trained by the main model and, in turn, provide feedback to help the main model capture more generalizable and imitable correlations. Our experimental results across several domains, including Computer Vision, Natural Language Processing, and methodologies like Reinforcement Learning, demonstrate that the introduction of LoT brings significant benefits compared to training models on the original dataset. The results suggest the effectiveness and efficiency of LoT in identifying generalizable information at the right scales while discarding spurious data correlations, thus making LoT a valuable addition to current machine learning. Code is available at https://github.com/jincan333/LoT.
Updated: 2024-07-19 09:26:30
Domains: cs.LG,cs.AI
PassTSL: Modeling Human-Created Passwords through Two-Stage Learning
Textual passwords are still the most widely used user authentication mechanism. Due to the close connections between textual passwords and natural languages, advanced technologies in natural language processing (NLP) and machine learning (ML) could be used to model passwords for different purposes such as studying human password-creation behaviors and developing more advanced password cracking methods for informing better defence mechanisms. In this paper, we propose PassTSL (modeling human-created Passwords through Two-Stage Learning), inspired by the popular pretraining-finetuning framework in NLP and deep learning (DL). We report how different pretraining settings affected PassTSL and proved its effectiveness by applying it to six large leaked password databases. Experimental results showed that it outperforms five state-of-the-art (SOTA) password cracking methods on password guessing by significant margins, ranging from 4.11% to 64.69% at the maximum point. Based on PassTSL, we also implemented a password strength meter (PSM), and our experiments showed that it estimates password strength more accurately, causing fewer unsafe errors (overestimating password strength) than two other SOTA PSMs, a neural-network based method and zxcvbn, when all produce the same rate of safe errors (underestimating password strength). Furthermore, we explored multiple finetuning settings, and our evaluations showed that, even a small amount of additional training data, e.g., only 0.1% of the pretrained data, can lead to over 3% improvement in password guessing on average. We also proposed a heuristic approach to selecting finetuning passwords based on JS (Jensen-Shannon) divergence and experimental results validated its usefulness. In summary, our contributions demonstrate the potential and feasibility of applying advanced NLP and ML methods to password modeling and cracking.
Updated: 2024-07-19 09:23:30
Domains: cs.CR,cs.AI,cs.CL
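The JS-divergence heuristic for selecting finetuning passwords can be illustrated directly. The sketch below compares character-level distributions of a candidate set against the target leak; the character-level granularity is a simplification of whatever units the paper actually uses.

    import math
    from collections import Counter

    def char_dist(passwords):
        c = Counter("".join(passwords))
        n = sum(c.values())
        return {ch: v / n for ch, v in c.items()}

    def js_divergence(p, q):
        keys = set(p) | set(q)
        m = {k: 0.5 * (p.get(k, 0) + q.get(k, 0)) for k in keys}
        def kl(a, b):
            return sum(a[k] * math.log2(a[k] / b[k]) for k in keys if a.get(k, 0) > 0)
        return 0.5 * kl(p, m) + 0.5 * kl(q, m)

    target = char_dist(["hunter2", "password1"])
    candidate = char_dist(["letmein", "qwerty123"])
    print(js_divergence(candidate, target))  # smaller = closer match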
Diffusion Tempering Improves Parameter Estimation with Probabilistic Integrators for Ordinary Differential Equations
Ordinary differential equations (ODEs) are widely used to describe dynamical systems in science, but identifying parameters that explain experimental measurements is challenging. In particular, although ODEs are differentiable and would allow for gradient-based parameter optimization, the nonlinear dynamics of ODEs often lead to many local minima and extreme sensitivity to initial conditions. We therefore propose diffusion tempering, a novel regularization technique for probabilistic numerical methods which improves convergence of gradient-based parameter optimization in ODEs. By iteratively reducing a noise parameter of the probabilistic integrator, the proposed method converges more reliably to the true parameters. We demonstrate that our method is effective for dynamical systems of different complexity and show that it obtains reliable parameter estimates for a Hodgkin-Huxley model with a practically relevant number of parameters.
Updated: 2024-07-19 09:21:04
Domains: cs.LG
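The annealing mechanism can be illustrated with a continuation-method toy (this is not the paper's probabilistic ODE solver): a loss full of local minima is Gaussian-smoothed, the smoothing width playing the role of the solver's noise parameter, and each stage warm-starts the next.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    t = np.linspace(0, 10, 200)
    y = np.sin(2.5 * t)                       # data from true frequency 2.5

    def loss(omega):
        return np.mean((np.sin(omega * t) - y) ** 2)

    omega = np.array([0.5])                   # poor initial guess
    for width in [2.0, 1.0, 0.5, 0.1, 0.0]:   # "noise" schedule, iteratively reduced
        eps = width * rng.standard_normal(64)
        obj = lambda w, eps=eps: float(np.mean([loss(w[0] + e) for e in eps]))
        omega = minimize(obj, omega, method="Nelder-Mead").x
    print(omega)  # should end up near the true frequency 2.5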
Class-Incremental Learning with CLIP: Adaptive Representation Adjustment and Parameter Fusion
Class-incremental learning is a challenging problem, where the goal is to train a model that can classify data from an increasing number of classes over time. With the advancement of vision-language pre-trained models such as CLIP, they demonstrate good generalization ability that allows them to excel in class-incremental learning with completely frozen parameters. However, further adaptation to downstream tasks by simply fine-tuning the model leads to severe forgetting. Most existing works with pre-trained models assume that the forgetting of old classes is uniform when the model acquires new knowledge. In this paper, we propose a method named Adaptive Representation Adjustment and Parameter Fusion (RAPF). During training for new data, we measure the influence of new classes on old ones and adjust the representations, using textual features. After training, we employ a decomposed parameter fusion to further mitigate forgetting during adapter module fine-tuning. Experiments on several conventional benchmarks show that our method achieves state-of-the-art results. Our code is available at https://github.com/linlany/RAPF.
Updated: 2024-07-19 09:20:33
Domains: cs.CV,cs.LG
Early Preparation Pays Off: New Classifier Pre-tuning for Class Incremental Semantic Segmentation
Class incremental semantic segmentation aims to preserve old knowledge while learning new tasks, however, it is impeded by catastrophic forgetting and background shift issues. Prior works indicate the pivotal importance of initializing new classifiers and mainly focus on transferring knowledge from the background classifier or preparing classifiers for future classes, neglecting the flexibility and variance of new classifiers. In this paper, we propose a new classifier pre-tuning (NeST) method applied before the formal training process, learning a transformation from old classifiers to generate new classifiers for initialization rather than directly tuning the parameters of new classifiers. Our method can make new classifiers align with the backbone and adapt to the new data, preventing drastic changes in the feature extractor when learning new classes. Besides, we design a strategy considering the cross-task class similarity to initialize matrices used in the transformation, helping achieve the stability-plasticity trade-off. Experiments on Pascal VOC 2012 and ADE20K datasets show that the proposed strategy can significantly improve the performance of previous methods. The code is available at https://github.com/zhengyuan-xie/ECCV24_NeST.
Updated: 2024-07-19 09:19:29
Domains: cs.CV,cs.LG
Comparing and Contrasting Deep Learning Weather Prediction Backbones on Navier-Stokes and Atmospheric Dynamics
Remarkable progress in the development of Deep Learning Weather Prediction (DLWP) models positions them to become competitive with traditional numerical weather prediction (NWP) models. Indeed, a wide number of DLWP architectures -- based on various backbones, including U-Net, Transformer, Graph Neural Network (GNN), and Fourier Neural Operator (FNO) -- have demonstrated their potential at forecasting atmospheric states. However, due to differences in training protocols, forecast horizons, and data choices, it remains unclear which (if any) of these methods and architectures are most suitable for weather forecasting and for future model development. Here, we step back and provide a detailed empirical analysis, under controlled conditions, comparing and contrasting the most prominent DLWP models, along with their backbones. We accomplish this by predicting synthetic two-dimensional incompressible Navier-Stokes and real-world global weather dynamics. In terms of accuracy, memory consumption, and runtime, our results illustrate various tradeoffs. For example, on synthetic data, we observe favorable performance of FNO; and on the real-world WeatherBench dataset, our results demonstrate the suitability of ConvLSTM and SwinTransformer for short-to-mid-ranged forecasts. For long-ranged weather rollouts of up to 365 days, we observe superior stability and physical soundness in architectures that formulate a spherical data representation, i.e., GraphCast and Spherical FNO. In addition, we observe that all of these model backbones "saturate," i.e., none of them exhibit so-called neural scaling, which highlights an important direction for future work on these and related models.
Updated: 2024-07-19 08:59:00
Domains: cs.LG
zIA: a GenAI-powered local auntie assists tourists in Italy
The Tourism and Destination Management Organization (DMO) industry is rapidly evolving to adapt to new technologies and traveler expectations. Generative Artificial Intelligence (AI) offers an astonishing and innovative opportunity to enhance the tourism experience by providing personalized, interactive and engaging assistance. In this article, we propose a generative AI-based chatbot for tourism assistance. The chatbot leverages the ability of AI to generate realistic and creative text and adopts the friendly persona of the well-known, all-knowing Italian aunties to provide tourists with personalized information; tailored, dynamic recommendations and trip plans before, during, and after the trip; and personalized itineraries. It uses both text and voice commands and supports different languages to satisfy the expectations of Italian and foreign tourists. This work is under development in the Molise CTE research project, funded by the Italian Ministry of Economic Growth (MIMIT), with the aim of leveraging the best emerging technologies available, such as Cloud and AI, to produce state-of-the-art solutions in the Smart City environment.
Updated: 2024-07-19 08:56:25
Domains: cs.DC,cs.AI
SPADE: Sparsity-Guided Debugging for Deep Neural Networks
It is known that sparsity can improve interpretability for deep neural networks. However, existing methods in the area either require networks that are pre-trained with sparsity constraints, or impose sparsity after the fact, altering the network's general behavior. In this paper, we demonstrate, for the first time, that sparsity can instead be incorporated into the interpretation process itself, as a sample-specific preprocessing step. Unlike previous work, this approach, which we call SPADE, does not place constraints on the trained model and does not affect its behavior during inference on the sample. Given a trained model and a target sample, SPADE uses sample-targeted pruning to provide a "trace" of the network's execution on the sample, reducing the network to the most important connections prior to computing an interpretation. We demonstrate that preprocessing with SPADE significantly increases the accuracy of image saliency maps across several interpretability methods. Additionally, SPADE improves the usefulness of neuron visualizations, aiding humans in reasoning about network behavior. Our code is available at https://github.com/IST-DASLab/SPADE.
Updated: 2024-07-19 08:53:18
Domains: cs.LG
Global optimality under amenable symmetry constraints
Consider a convex function that is invariant under a group of transformations. If it has a minimizer, does it also have an invariant minimizer? Variants of this problem appear in nonparametric statistics and in a number of adjacent fields. The answer depends on the choice of function, and on what one may loosely call the geometry of the problem -- the interplay between convexity, the group, and the underlying vector space, which is typically infinite-dimensional. We observe that this geometry is completely encoded in the smallest closed convex invariant subsets of the space, and proceed to study these sets, for groups that are amenable but not necessarily compact. We then apply this toolkit to the invariant optimality problem. It yields new results on invariant kernel mean embeddings and risk-optimal invariant couplings, and clarifies relations between seemingly distinct ideas, such as the summation trick used in machine learning to construct equivariant neural networks and the classic Hunt-Stein theorem of statistics.
Updated: 2024-07-19 08:50:31
Domains: math.ST,cs.LG,stat.ML,stat.TH
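For compact groups the underlying averaging argument is one line, which amenability generalizes via invariant means when no Haar probability measure is available: if f is convex and G-invariant, x* is a minimizer, and the orbit average is taken under the Haar measure mu, then by Jensen's inequality and invariance,

    \[
    \bar{x} \;=\; \int_G g \cdot x^{*}\, \mathrm{d}\mu(g),
    \qquad
    f(\bar{x}) \;\le\; \int_G f(g \cdot x^{*})\, \mathrm{d}\mu(g)
    \;=\; f(x^{*}),
    \]

so the orbit average is an invariant minimizer. The non-compact amenable case, where this average must be replaced by an invariant mean on an infinite-dimensional space, is the setting the paper analyzes.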
A Survey of Backdoor Attacks and Defenses on Large Language Models: Implications for Security Measures
The large language models (LLMs), which bridge the gap between human language understanding and complex problem-solving, achieve state-of-the-art performance on several NLP tasks, particularly in few-shot and zero-shot settings. Despite the demonstrable efficacy of LLMs, due to constraints on computational resources, users have to engage with open-source language models or outsource the entire training process to third-party platforms. However, research has demonstrated that language models are susceptible to potential security vulnerabilities, particularly in backdoor attacks. Backdoor attacks are designed to introduce targeted vulnerabilities into language models by poisoning training samples or model weights, allowing attackers to manipulate model responses through malicious triggers. While existing surveys on backdoor attacks provide a comprehensive overview, they lack an in-depth examination of backdoor attacks specifically targeting LLMs. To bridge this gap and grasp the latest trends in the field, this paper presents a novel perspective on backdoor attacks for LLMs by focusing on fine-tuning methods. Specifically, we systematically classify backdoor attacks into three categories: full-parameter fine-tuning, parameter-efficient fine-tuning, and attacks without fine-tuning. Based on insights from a substantial review, we also discuss crucial issues for future research on backdoor attacks, such as further exploring attack algorithms that do not require fine-tuning, or developing more covert attack algorithms.
Updated: 2024-07-19 08:50:24
Domains: cs.CR,cs.AI,cs.CL
Generating multi-scale NMC particles with radial grain architectures using spatial stochastics and GANs
Understanding structure-property relationships of Li-ion battery cathodes is crucial for optimizing rate-performance and cycle-life resilience. However, correlating the morphology of cathode particles, such as in NMC811, and their inner grain architecture with electrode performance is challenging, particularly, due to the significant length-scale difference between grain and particle sizes. Experimentally, it is currently not feasible to image such a high number of particles with full granular detail to achieve representativeness. A second challenge is that sufficiently high-resolution 3D imaging techniques remain expensive and are sparsely available at research institutions. To address these challenges, a stereological generative adversarial network (GAN)-based model fitting approach is presented that can generate representative 3D information from 2D data, enabling characterization of materials in 3D using cost-effective 2D data. Once calibrated, this multi-scale model is able to rapidly generate virtual cathode particles that are statistically similar to experimental data, and thus is suitable for virtual characterization and materials testing through numerical simulations. A large dataset of simulated particles with inner grain architecture has been made publicly available.
Updated: 2024-07-19 08:44:39
Domains: physics.app-ph,cs.AI
The Cardinality of Identifying Code Sets for Soccer Ball Graph with Application to Remote Sensing
In the context of satellite monitoring of the earth, we can assume that the surface of the earth is divided into a set of regions. We assume that the impact of a big social/environmental event spills into neighboring regions. Using Identifying Code Sets (ICSes), we can deploy sensors in such a way that the region in which an event takes place can be uniquely identified, even with fewer sensors than regions. As Earth is almost a sphere, we use a soccer ball as a model. We construct a Soccer Ball Graph (SBG), and provide human-oriented, analytical proofs that 1) the SBG has at least 26 ICSes of cardinality ten, implying that there are at least 26 different ways to deploy ten satellites to monitor the Earth and 2) that the cardinality of the minimum Identifying Code Set (MICS) for the SBG is at least nine. We then provide a machine-oriented formal proof that the cardinality of the MICS for the SBG is in fact ten, meaning that one must deploy at least ten satellites to monitor the Earth in the SBG model. We also provide machine-oriented proof that there are exactly 26 ICSes of cardinality ten for the SBG.
Updated: 2024-07-19 08:36:44
Domains: cs.AI,I.2.3
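The defining property is easy to verify computationally: S is an identifying code if every closed neighborhood N[v] intersects S in a nonempty, pairwise-distinct set. The brute-force check below enumerates identifying codes of a given size on the Petersen graph rather than the Soccer Ball Graph, purely for brevity.

    import itertools
    import networkx as nx

    def is_identifying_code(G, S):
        sigs = set()
        for v in G.nodes:
            sig = frozenset((set(G[v]) | {v}) & set(S))  # N[v] intersected with S
            if not sig or sig in sigs:                   # must be nonempty and unique
                return False
            sigs.add(sig)
        return True

    G = nx.petersen_graph()
    codes = [S for S in itertools.combinations(G.nodes, 4)
             if is_identifying_code(G, S)]
    print(len(codes))  # number of identifying codes of cardinality 4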
Shape and Style GAN-based Multispectral Data Augmentation for Crop/Weed Segmentation in Precision Farming
The use of deep learning methods for precision farming is gaining increasing interest. However, collecting training data in this application field is particularly challenging and costly due to the need of acquiring information during the different growing stages of the cultivation of interest. In this paper, we present a method for data augmentation that uses two GANs to create artificial images to augment the training data. To obtain a higher image quality, instead of re-creating the entire scene, we take original images and replace only the patches containing objects of interest with artificial ones containing new objects with different shapes and styles. In doing this, we take into account both the foreground (i.e., crop samples) and the background (i.e., the soil) of the patches. Quantitative experiments, conducted on publicly available datasets, demonstrate the effectiveness of the proposed approach. The source code and data discussed in this work are available as open source.
Updated: 2024-07-19 08:36:25
Domains: cs.CV,cs.LG
AuditNet: A Conversational AI-based Security Assistant [DEMO]
In the age of information overload, professionals across various fields face the challenge of navigating vast amounts of documentation and ever-evolving standards. Ensuring compliance with standards, regulations, and contractual obligations is a critical yet complex task across various professional fields. We propose a versatile conversational AI assistant framework designed to facilitate compliance checking on the go, in diverse domains, including but not limited to network infrastructure, legal contracts, educational standards, environmental regulations, and government policies. By leveraging retrieval-augmented generation using large language models, our framework automates the review, indexing, and retrieval of relevant, context-aware information, streamlining the process of verifying adherence to established guidelines and requirements. This AI assistant not only reduces the manual effort involved in compliance checks but also enhances accuracy and efficiency, supporting professionals in maintaining high standards of practice and ensuring regulatory compliance in their respective fields. We propose and demonstrate AuditNet, the first conversational AI security assistant designed to assist IoT network security experts by providing instant access to security standards, policies, and regulations.
Updated: 2024-07-19 08:33:07
Domains: cs.CR,cs.LG
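The retrieval-augmented backbone of such an assistant reduces to a short loop: embed the standards corpus, retrieve the passages most similar to the question, and prepend them to the LLM prompt. In the sketch below the embedding function is a deterministic random placeholder (a real deployment would use a sentence encoder), the documents are invented, and the final generation call is omitted.

    import numpy as np

    def embed(text):
        # placeholder: deterministic random vector, NOT a real sentence encoder
        rng = np.random.default_rng(abs(hash(text)) % 2**32)
        return rng.standard_normal(384)

    docs = ["IoT devices MUST rotate keys every 90 days.",
            "Default passwords MUST be changed before deployment."]
    doc_vecs = np.stack([embed(d) for d in docs])

    def build_prompt(question, k=1):
        q = embed(question)
        sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
        context = "\n".join(docs[i] for i in np.argsort(-sims)[:k])
        return f"Context:\n{context}\n\nQuestion: {question}"

    print(build_prompt("How often should keys be rotated?"))  # then sent to the LLM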
A3Rank: Augmentation Alignment Analysis for Prioritizing Overconfident Failing Samples for Deep Learning Models
Sharpening deep learning models by training them with examples close to the decision boundary is a well-known best practice. Nonetheless, these models are still error-prone in producing predictions. In practice, the inference of the deep learning models in many application systems is guarded by a rejector, such as a confidence-based rejector, to filter out samples with insufficient prediction confidence. Such confidence-based rejectors cannot effectively guard against failing samples with high confidence. Existing test case prioritization techniques effectively distinguish confusing samples from confident samples to identify failing samples among the confusing ones, yet prioritizing the failing ones high among many confident ones is challenging. In this paper, we propose A3Rank, a novel test case prioritization technique with augmentation alignment analysis, to address this problem. A3Rank generates augmented versions of each test case and assesses the extent of the prediction result for the test case misaligned with these of the augmented versions and vice versa. Our experiment shows that A3Rank can effectively rank failing samples escaping from the checking of confidence-based rejectors, which significantly outperforms the peer techniques by 163.63% in the detection ratio of top-ranked samples. We also provide a framework to construct a detector devoted to augmenting these rejectors to defend these failing samples, and our detector can achieve a significantly higher defense success rate.
Updated: 2024-07-19 08:32:10
Domains: cs.SE,cs.AI
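One plausible reading of the alignment signal, sketched below: score a test case by the symmetric KL divergence between the model's prediction on the original input and its average prediction over augmented versions, then rank high-confidence samples by this disagreement. The scoring rule and the toy model are simplifications of the paper's.

    import torch
    import torch.nn.functional as F

    def a3_score(model, x, augment, n_aug=8):
        with torch.no_grad():
            p = F.softmax(model(x.unsqueeze(0)), dim=1)
            p_aug = torch.stack([F.softmax(model(augment(x).unsqueeze(0)), dim=1)
                                 for _ in range(n_aug)]).mean(dim=0)
        # symmetric KL between the original and averaged augmented predictions
        return (F.kl_div(p_aug.log(), p, reduction="sum")
                + F.kl_div(p.log(), p_aug, reduction="sum")).item()

    model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
    x = torch.rand(1, 28, 28)
    augment = lambda t: t + 0.05 * torch.randn_like(t)
    print(a3_score(model, x, augment))  # higher score = more misaligned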
A Mirror Descent-Based Algorithm for Corruption-Tolerant Distributed Gradient Descent
Distributed gradient descent algorithms have come to the fore in modern machine learning, especially in parallelizing the handling of large datasets that are distributed across several workers. However, scant attention has been paid to analyzing the behavior of distributed gradient descent algorithms in the presence of adversarial corruptions instead of random noise. In this paper, we formulate a novel problem in which adversarial corruptions are present in a distributed learning system. We show how to use ideas from (lazy) mirror descent to design a corruption-tolerant distributed optimization algorithm. Extensive convergence analysis for (strongly) convex loss functions is provided for different choices of the stepsize. We carefully optimize the stepsize schedule to accelerate the convergence of the algorithm, while at the same time amortizing the effect of the corruption over time. Experiments based on linear regression, support vector classification, and softmax classification on the MNIST dataset corroborate our theoretical findings.
Updated: 2024-07-19 08:29:12
Domains: eess.SP,cs.IT,cs.LG,math.IT
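The (lazy) mirror descent template the algorithm builds on keeps an unprojected dual state that accumulates the aggregated, possibly corrupted, gradients g_t and maps it back through the conjugate mirror map:

    \[
    z_{t+1} = z_t - \eta_t \, g_t,
    \qquad
    x_{t+1} = \nabla \phi^{*}(z_{t+1})
            = \arg\max_{x} \; \langle z_{t+1}, x \rangle - \phi(x).
    \]

The stepsizes eta_t are the knob the paper tunes: fast enough for convergence while amortizing the effect of the corruption over time.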
Exploiting Uncommon Text-Encoded Structures for Automated Jailbreaks in LLMs
Large Language Models (LLMs) are widely used in natural language processing but face the risk of jailbreak attacks that maliciously induce them to generate harmful content. Existing jailbreak attacks, including character-level and context-level attacks, mainly focus on the prompt of the plain text without specifically exploring the significant influence of its structure. In this paper, we focus on studying how prompt structure contributes to the jailbreak attack. We introduce a novel structure-level attack method based on tail structures that are rarely used during LLM training, which we refer to as Uncommon Text-Encoded Structure (UTES). We extensively study 12 UTES templates and 6 obfuscation methods to build an effective automated jailbreak tool named StructuralSleight that contains three escalating attack strategies: Structural Attack, Structural and Character/Context Obfuscation Attack, and Fully Obfuscated Structural Attack. Extensive experiments on existing LLMs show that StructuralSleight significantly outperforms baseline methods. In particular, the attack success rate reaches 94.62% on GPT-4o, which has not been addressed by state-of-the-art techniques.
Updated: 2024-07-19 08:23:38
Domains: cs.CL,cs.CR
TorchGT: A Holistic System for Large-scale Graph Transformer Training
Graph Transformer is a new architecture that surpasses GNNs in graph learning. While inspiring algorithmic advances have emerged, their practical adoption is still limited, particularly on real-world graphs involving up to millions of nodes. We observe existing graph transformers fail on large-scale graphs mainly due to heavy computation, limited scalability and inferior model quality. Motivated by these observations, we propose TorchGT, the first efficient, scalable, and accurate graph transformer training system. TorchGT optimizes training at different levels. At algorithm level, by harnessing the graph sparsity, TorchGT introduces a Dual-interleaved Attention that is computationally efficient while maintaining accuracy. At runtime level, TorchGT scales training across workers with a communication-light Cluster-aware Graph Parallelism. At kernel level, an Elastic Computation Reformation further optimizes the computation by reducing memory access latency in a dynamic way. Extensive experiments demonstrate that TorchGT boosts training by up to 62.7x and supports graph sequence lengths of up to 1M.
Updated: 2024-07-19 08:21:42
Domains: cs.DC,cs.AI,cs.LG
Multi-Texture Synthesis through Signal Responsive Neural Cellular Automata
Neural Cellular Automata (NCA) have proven to be effective in a variety of fields, with numerous biologically inspired applications. One field in which NCAs perform well is the generation of textures, modelling global patterns from local interactions governed by uniform and coherent rules. This paper aims to enhance the usability of NCAs in texture synthesis by addressing a shortcoming of current NCA architectures for texture generation, which require a separately trained NCA for each individual texture. In this work, we train a single NCA for the evolution of multiple textures, based on individual examples. Our solution provides texture information in the state of each cell, in the form of an internally coded genomic signal, which enables the NCA to generate the expected texture. Such a neural cellular automaton not only maintains its regenerative capability but also allows for interpolation between learned textures and supports grafting techniques. This demonstrates the ability to edit generated textures and the potential for them to merge and coexist within the same automaton. We also address questions related to the influence of the genomic information and the cost function on the evolution of the NCA.
Updated: 2024-07-19 08:17:44
Domains: cs.NE,cs.AI
ParamsDrag: Interactive Parameter Space Exploration via Image-Space Dragging
Numerical simulation serves as a cornerstone in scientific modeling, yet the process of fine-tuning simulation parameters poses significant challenges. Conventionally, parameter adjustment relies on extensive numerical simulations, data analysis, and expert insights, resulting in substantial computational costs and low efficiency. The emergence of deep learning in recent years has provided promising avenues for more efficient exploration of parameter spaces. However, existing approaches often lack intuitive methods for precise parameter adjustment and optimization. To tackle these challenges, we introduce ParamsDrag, a model that facilitates parameter space exploration through direct interaction with visualizations. Inspired by DragGAN, our ParamsDrag model operates in three steps. First, the generative component of ParamsDrag generates visualizations based on the input simulation parameters. Second, by directly dragging structure-related features in the visualizations, users can intuitively understand the controlling effect of different parameters. Third, with the understanding from the earlier step, users can steer ParamsDrag to produce dynamic visual outcomes. Through experiments conducted on real-world simulations and comparisons with state-of-the-art deep learning-based approaches, we demonstrate the efficacy of our solution.
Updated: 2024-07-19 08:12:41
Domains: cs.GR,cs.AI,cs.LG
DPER: Diffusion Prior Driven Neural Representation for Limited Angle and Sparse View CT Reconstruction
Limited-angle and sparse-view computed tomography (LACT and SVCT) are crucial for expanding the scope of X-ray CT applications. However, they face challenges due to incomplete data acquisition, resulting in diverse artifacts in the reconstructed CT images. Emerging implicit neural representation (INR) techniques, such as NeRF, NeAT, and NeRP, have shown promise in under-determined CT imaging reconstruction tasks. However, the unsupervised nature of INR architecture imposes limited constraints on the solution space, particularly for the highly ill-posed reconstruction task posed by LACT and ultra-SVCT. In this study, we introduce the Diffusion Prior Driven Neural Representation (DPER), an advanced unsupervised framework designed to address the exceptionally ill-posed CT reconstruction inverse problems. DPER adopts the Half Quadratic Splitting (HQS) algorithm to decompose the inverse problem into data fidelity and distribution prior sub-problems. The two sub-problems are respectively addressed by INR reconstruction scheme and pre-trained score-based diffusion model. This combination first injects the implicit image local consistency prior from INR. Additionally, it effectively augments the feasibility of the solution space for the inverse problem through the generative diffusion model, resulting in increased stability and precision in the solutions. We conduct comprehensive experiments to evaluate the performance of DPER on LACT and ultra-SVCT reconstruction with two public datasets (AAPM and LIDC), an in-house clinical COVID-19 dataset and a public raw projection dataset created by Mayo Clinic. The results show that our method outperforms the state-of-the-art reconstruction methods on in-domain datasets, while achieving significant performance improvements on out-of-domain (OOD) datasets.
Updated: 2024-07-19 08:12:24
Domains: eess.IV,cs.AI,cs.CV,I.2.10; I.4.5
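Schematically, the HQS splitting alternates a data-fidelity step on the INR output with a prior step delegated to the pre-trained diffusion model (A: projection operator, y: measurements, mu: penalty weight); the diffusion step is only indicated here, since its exact form follows the score-based model:

    \[
    x_{k+1} = \arg\min_{x} \; \|A x - y\|_2^2 + \mu \,\|x - z_k\|_2^2,
    \qquad
    z_{k+1} = \mathrm{Prox}_{\text{diffusion}}\!\left(x_{k+1}\right).
    \]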
Conditional Generative Models are Provably Robust: Pointwise Guarantees for Bayesian Inverse Problems
Conditional generative models have become a very powerful tool to sample from Bayesian inverse problem posteriors. It is well-known in classical Bayesian literature that posterior measures are quite robust with respect to perturbations of both the prior measure and the negative log-likelihood, which includes perturbations of the observations. However, to the best of our knowledge, the robustness of conditional generative models with respect to perturbations of the observations has not been investigated yet. In this paper, we prove for the first time that appropriately learned conditional generative models provide robust results for single observations.
Updated: 2024-07-19 08:11:11
标题: 条件生成模型被证明是稳健的:贝叶斯逆问题的逐点保证
摘要: 条件生成模型已成为从贝叶斯反问题后验中抽样的强大工具。在经典贝叶斯文献中已知,后验测度对先验测度和负对数似然的扰动都是相当稳健的,其中包括观测的扰动。然而,据我们所知,条件生成模型对观测的扰动的稳健性尚未被研究。在本文中,我们首次证明,适当学习的条件生成模型为单个观测提供稳健的结果。
更新时间: 2024-07-19 08:11:11
领域: cs.LG,math.ST,stat.TH
On the Robustness of Fully-Spiking Neural Networks in Open-World Scenarios using Forward-Only Learning Algorithms
In the last decade, Artificial Intelligence (AI) models have rapidly integrated into production pipelines propelled by their excellent modeling performance. However, the development of these models has not been matched by advancements in algorithms ensuring their safety, failing to guarantee robust behavior against Out-of-Distribution (OoD) inputs outside their learning domain. Furthermore, there is a growing concern with the sustainability of AI models and their required energy consumption in both training and inference phases. To mitigate these issues, this work explores the use of the Forward-Forward Algorithm (FFA), a biologically plausible alternative to Backpropagation, adapted to the spiking domain to enhance the overall energy efficiency of the model. By capitalizing on the highly expressive topology emerging from the latent space of models trained with FFA, we develop a novel FF-SCP algorithm for OoD Detection. Our approach measures the likelihood of a sample belonging to the in-distribution (ID) data by using the distance from the latent representation of samples to class-representative manifolds. Additionally, to provide deeper insights into our OoD pipeline, we propose a gradient-free attribution technique that highlights the features of a sample pushing it away from the distribution of any class. Multiple experiments using our spiking FFA adaptation demonstrate that the achieved accuracy levels are comparable to those seen in analog networks trained via back-propagation. Furthermore, OoD detection experiments on multiple datasets prove that FF-SCP outperforms avant-garde OoD detectors within the spiking domain in terms of several metrics used in this area. We also present a qualitative analysis of our explainability technique, exposing the precision by which the method detects OoD features, such as embedded artifacts or missing regions.
Updated: 2024-07-19 08:08:17
标题: 关于在开放世界场景中使用仅向前学习算法的全脉冲神经网络的鲁棒性
摘要: 在过去的十年中,人工智能(AI)模型凭借其出色的建模性能迅速融入生产流程。然而,这些模型的发展并未伴随保障其安全性的算法进步,无法确保其对学习域之外的分布外(OoD)输入具有鲁棒的行为。此外,人们对AI模型的可持续性及其在训练和推理阶段所需的能源消耗也越来越关注。为了缓解这些问题,本研究探讨了前向-前向算法(FFA)的使用,这是一种生物学上可信的反向传播替代方案,并将其适配到脉冲领域,以提高模型的整体能效。通过利用以FFA训练的模型潜在空间中涌现的高度表达性拓扑结构,我们开发了一种用于OoD检测的新颖FF-SCP算法。我们的方法通过样本的潜在表示到类别代表流形的距离来衡量样本属于分布内(ID)数据的可能性。此外,为了更深入地了解我们的OoD管道,我们提出了一种无梯度归因技术,用于突出将样本推离任何类别分布的特征。使用我们的脉冲FFA适配进行的多次实验表明,所达到的准确率水平与通过反向传播训练的模拟网络相当。此外,在多个数据集上进行的OoD检测实验证明,就该领域常用的多个度量标准而言,FF-SCP在脉冲领域内优于前沿的OoD检测器。我们还对我们的可解释性技术进行了定性分析,展示了该方法检测OoD特征(如嵌入的伪影或缺失区域)的精确程度。
更新时间: 2024-07-19 08:08:17
领域: cs.LG,cs.AI
Geolocation Predicting of Tweets Using BERT-Based Models
This research aims to solve the tweet/user geolocation prediction task and to provide a flexible methodology for geotagging textual big data. The suggested approach uses neural networks for natural language processing (NLP) to estimate locations as coordinate pairs (longitude, latitude) and as two-dimensional Gaussian Mixture Models (GMMs). The proposed models were fine-tuned on a Twitter dataset using pretrained Bidirectional Encoder Representations from Transformers (BERT) as base models. Performance metrics show a median error of fewer than 30 km on worldwide-level data and fewer than 15 km on US-level datasets for models trained and evaluated on text features of tweet content and metadata context. Our source code and data are available at https://github.com/K4TEL/geo-twitter.git
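To illustrate the GMM-based output described above, the following is a minimal mixture-density head in PyTorch; the hidden size, component count, and diagonal-covariance choice are assumptions for the sketch, not the released architecture.

```python
import math
import torch
import torch.nn as nn

class GeoGMMHead(nn.Module):
    """Maps a BERT pooled embedding to a K-component 2D Gaussian
    mixture over (longitude, latitude) with diagonal covariances."""

    def __init__(self, hidden=768, k=5):
        super().__init__()
        self.k = k
        # per component: 2 means + 2 log-stds + 1 mixture logit
        self.proj = nn.Linear(hidden, k * 5)

    def forward(self, h):                        # h: (batch, hidden)
        p = self.proj(h).view(-1, self.k, 5)
        mu, log_sigma, logit = p[..., :2], p[..., 2:4], p[..., 4]
        return mu, log_sigma, torch.log_softmax(logit, dim=-1)

def gmm_nll(mu, log_sigma, log_w, target):
    """Negative log-likelihood of true coordinates under the mixture."""
    t = target.unsqueeze(1)                      # (batch, 1, 2)
    log_comp = -0.5 * (((t - mu) / log_sigma.exp()) ** 2
                       + 2 * log_sigma + math.log(2 * math.pi)).sum(-1)
    return -torch.logsumexp(log_w + log_comp, dim=-1).mean()

head = GeoGMMHead()
mu, ls, lw = head(torch.randn(4, 768))
loss = gmm_nll(mu, ls, lw, torch.tensor([[13.4, 52.5]] * 4))  # lon/lat pairs
```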
Updated: 2024-07-19 08:06:30
标题: 使用基于BERT的模型预测tweets的地理位置
摘要: 这项研究旨在解决推文/用户地理定位预测任务,并为文本大数据的地理标注提供一种灵活的方法论。所提出的方法采用自然语言处理(NLP)神经网络,将位置估计为坐标对(经度、纬度)以及二维高斯混合模型(GMM)。所提出的模型以预训练的双向编码器表示(BERT)为基础模型,在Twitter数据集上进行了微调。对基于推文内容和元数据上下文的文本特征训练和评估的模型,性能指标显示其在全球范围数据上的中位误差小于30公里,在美国范围数据集上小于15公里。我们的源代码和数据可在https://github.com/K4TEL/geo-twitter.git 上找到。
更新时间: 2024-07-19 08:06:30
领域: cs.CL,cs.AI,68T50,I.2.7
People use fast, goal-directed simulation to reason about novel games
We can evaluate features of problems and their potential solutions well before we can effectively solve them. When considering a game we have never played, for instance, we might infer whether it is likely to be challenging, fair, or fun simply from hearing the game rules, prior to deciding whether to invest time in learning the game or trying to play it well. Many studies of game play have focused on optimality and expertise, characterizing how people and computational models play based on moderate to extensive search and after playing a game dozens (if not thousands or millions) of times. Here, we study how people reason about a range of simple but novel connect-n style board games. We ask people to judge how fair and how fun the games are from very little experience: just thinking about the game for a minute or so, before they have ever actually played with anyone else, and we propose a resource-limited model that captures their judgments using only a small number of partial game simulations and almost no lookahead search.
Updated: 2024-07-19 07:59:04
标题: 人们使用快速、目标导向的模拟来推理新颖游戏
摘要: 我们可以在能够有效解决问题之前,就很好地评估问题及其潜在解决方案的特征。例如,当考虑一个我们从未玩过的游戏时,仅凭听到游戏规则,我们就可能推断它是否具有挑战性、公平性或趣味性,然后再决定是否投入时间学习这个游戏或设法玩好它。许多关于游戏对弈的研究专注于最优性和专业性,刻画人和计算模型在进行中等到大规模搜索、并将一个游戏玩过几十次(甚至数千或数百万次)之后的表现。在这里,我们研究人们如何推理一系列简单而新颖的connect-n类型棋盘游戏。我们要求人们在经验极少的情况下判断这些游戏的公平性和趣味性:仅对游戏思考一分钟左右,在他们真正与任何人对弈之前。我们提出了一个资源受限的模型,仅使用少量的部分游戏模拟和几乎没有前瞻搜索,就能刻画他们的判断。
更新时间: 2024-07-19 07:59:04
领域: cs.GT,cs.AI,q-bio.NC
User-Creator Feature Dynamics in Recommender Systems with Dual Influence
Recommender systems present relevant contents to users and help content creators reach their target audience. The dual nature of these systems influences both users and creators: users' preferences are affected by the items they are recommended, while creators are incentivized to alter their contents such that it is recommended more frequently. We define a model, called user-creator feature dynamics, to capture the dual influences of recommender systems. We prove that a recommender system with dual influence is guaranteed to polarize, causing diversity loss in the system. We then investigate, both theoretically and empirically, approaches for mitigating polarization and promoting diversity in recommender systems. Unexpectedly, we find that common diversity-promoting approaches do not work in the presence of dual influence, while relevancy-optimizing methods like top-$k$ recommendation can prevent polarization and improve diversity of the system.
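The polarization result can be conveyed with a toy simulation of the dual influence; the update rule below (users and creators nudged toward each other on the unit sphere under top-1 recommendation) is an illustrative stand-in for the paper's user-creator feature dynamics, not its exact model.

```python
import numpy as np

def simulate(n_users=50, n_creators=10, dim=2, steps=200, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    unit = lambda m: m / np.linalg.norm(m, axis=1, keepdims=True)
    users = unit(rng.normal(size=(n_users, dim)))
    creators = unit(rng.normal(size=(n_creators, dim)))
    for _ in range(steps):
        rec = (users @ creators.T).argmax(axis=1)  # top-1 relevance
        for u, c in enumerate(rec):
            users[u] += lr * creators[c]           # user preference drifts
            creators[c] += lr * users[u]           # creator adapts content
        users, creators = unit(users), unit(creators)
    sims = creators @ creators.T                   # pairwise cosine similarity
    return 1 - sims[np.triu_indices(n_creators, 1)].mean()  # 0 = polarized

print("creator diversity after dynamics:", round(simulate(), 3))
```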
Updated: 2024-07-19 07:58:26
标题: 《具有双重影响的推荐系统中的用户创作者特征动态》
摘要: 推荐系统向用户呈现相关内容,并帮助内容创作者触达其目标受众。这些系统的双重性质同时影响着用户和创作者:用户的偏好受到被推荐物品的影响,而创作者则有动力改变其内容,使之更频繁地被推荐。我们定义了一个称为用户-创作者特征动态的模型,以刻画推荐系统的双重影响。我们证明了具有双重影响的推荐系统必然会极化,导致系统中的多样性丧失。随后,我们从理论和实证两方面研究了减轻极化、促进推荐系统多样性的方法。出人意料的是,我们发现在双重影响存在的情况下,常见的促进多样性的方法并不奏效,而像top-$k$推荐这样的相关性优化方法反而可以防止极化并提高系统的多样性。
更新时间: 2024-07-19 07:58:26
领域: cs.IR,cs.CY,cs.GT,cs.LG
Integrated Push-and-Pull Update Model for Goal-Oriented Effective Communication
This paper studies decision-making for goal-oriented effective communication. We consider an end-to-end status update system where a sensing agent (SA) observes a source, generates and transmits updates to an actuation agent (AA), while the AA takes actions to accomplish a goal at the endpoint. We integrate the push- and pull-based update communication models to obtain a push-and-pull model, which allows the transmission controller at the SA to decide to push an update to the AA and the query controller at the AA to pull updates by raising queries at specific time instances. To gauge effectiveness, we utilize a grade of effectiveness (GoE) metric incorporating updates' freshness, usefulness, and timeliness of actions as qualitative attributes. We then derive effect-aware policies to maximize the expected discounted sum of updates' effectiveness subject to induced costs. The effect-aware policy at the SA considers the potential effectiveness of communicated updates at the endpoint, while at the AA, it accounts for the probabilistic evolution of the source and importance of generated updates. Our results show the proposed push-and-pull model outperforms models solely based on push- or pull-based updates both in terms of efficiency and effectiveness. Additionally, using effect-aware policies at both agents enhances effectiveness compared to periodic and/or probabilistic effect-agnostic policies at either or both agents.
Updated: 2024-07-19 07:57:31
标题: 整合的推拉更新模型用于目标导向有效沟通
摘要: 本文研究面向目标的有效通信的决策制定。我们考虑一个端到端的状态更新系统,其中感知代理(SA)观察一个信息源,生成更新并将其传输给执行代理(AA),而AA采取行动以在终端实现目标。我们将基于推送和基于拉取的更新通信模型整合为一个推拉模型:SA处的传输控制器可以决定向AA推送更新,AA处的查询控制器则可以在特定时刻发起查询来拉取更新。为了衡量有效性,我们采用有效性等级(GoE)指标,将更新的新鲜度、有用性以及行动的及时性作为定性属性纳入其中。随后,我们推导出效果感知策略,在考虑所产生成本的约束下,最大化更新有效性的期望折现总和。SA处的效果感知策略考虑所传达更新在终端的潜在有效性,而AA处的策略则考虑信息源的概率演化以及所生成更新的重要性。我们的结果表明,所提出的推拉模型在效率和有效性方面均优于仅基于推送或仅基于拉取更新的模型。此外,与在任一或两个代理处使用周期性和/或概率性的效果无关策略相比,在两个代理处都使用效果感知策略能进一步提升有效性。
更新时间: 2024-07-19 07:57:31
领域: cs.IT,cs.AI,cs.MA,cs.NI,math.IT
A Tutorial on the Pretrain-Finetune Paradigm for Natural Language Processing
Given that natural language serves as the primary conduit for expressing thoughts and emotions, text analysis has become a key technique in psychological research. It enables the extraction of valuable insights from natural language, facilitating endeavors like personality traits assessment, mental health monitoring, and sentiment analysis in interpersonal communications. In text analysis, existing studies often resort to either human coding, which is time-consuming, using pre-built dictionaries, which often fails to cover all possible scenarios, or training models from scratch, which requires large amounts of labeled data. In this tutorial, we introduce the pretrain-finetune paradigm. The pretrain-finetune paradigm represents a transformative approach in text analysis and natural language processing. This paradigm distinguishes itself through the use of large pretrained language models, demonstrating remarkable efficiency in finetuning tasks, even with limited training data. This efficiency is especially beneficial for research in social sciences, where the number of annotated samples is often quite limited. Our tutorial offers a comprehensive introduction to the pretrain-finetune paradigm. We first delve into the fundamental concepts of pretraining and finetuning, followed by practical exercises using real-world applications. We demonstrate the application of the paradigm across various tasks, including multi-class classification and regression. Emphasizing its efficacy and user-friendliness, the tutorial aims to encourage broader adoption of this paradigm. To this end, we have provided open access to all our code and datasets. The tutorial is highly beneficial across various psychology disciplines, providing a comprehensive guide to employing text analysis in diverse research settings.
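As a minimal illustration of the paradigm, the sketch below fine-tunes a pretrained BERT for binary classification with the Hugging Face transformers library; the two-sentence dataset and the hyperparameters are placeholders, and the tutorial's released code may differ.

```python
import torch
from torch.utils.data import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

texts = ["I feel hopeful about the future.", "Nothing seems to matter anymore."]
labels = [1, 0]                      # toy stand-in for an annotated corpus

tok = AutoTokenizer.from_pretrained("bert-base-uncased")

class TextDataset(Dataset):
    def __init__(self, texts, labels):
        self.enc = tok(texts, truncation=True, padding=True, return_tensors="pt")
        self.labels = torch.tensor(labels)
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        return {**{k: v[i] for k, v in self.enc.items()},
                "labels": self.labels[i]}

# pretrained backbone + freshly initialized classification head
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

args = TrainingArguments(output_dir="out", num_train_epochs=1,
                         per_device_train_batch_size=2, logging_steps=1)
Trainer(model=model, args=args,
        train_dataset=TextDataset(texts, labels)).train()
```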
Updated: 2024-07-19 07:47:18
标题: 一个关于自然语言处理的预训练-微调范式的教程
摘要: 鉴于自然语言是表达思想和情感的主要媒介,文本分析已成为心理研究中的关键技术。它能够从自然语言中提取有价值的见解,促进个性特征评估、心理健康监测和人际交流中的情感分析等工作。在文本分析中,现有研究通常采用人工编码、使用预先构建的词典或从头开始训练模型等方法,这些方法要么耗时,要么无法覆盖所有可能的场景,要么需要大量标记数据。在本教程中,我们介绍了预训练-微调范式。预训练-微调范式代表了文本分析和自然语言处理中的一种革命性方法。该范式通过使用大型预训练语言模型,在微调任务中展现出卓越的效率,即使只有有限的训练数据。这种效率在社会科学研究中尤其有益,因为已标记样本数量通常相当有限。我们的教程全面介绍了预训练-微调范式。我们首先深入探讨了预训练和微调的基本概念,然后通过使用真实应用实践进行实际练习。我们展示了该范式在各种任务中的应用,包括多类别分类和回归。强调其有效性和易用性,本教程旨在鼓励更广泛地采用这种范式。为此,我们提供了所有代码和数据集的开放访问。该教程在各种心理学学科中具有极大的益处,为在不同研究环境中运用文本分析提供了全面指南。
更新时间: 2024-07-19 07:47:18
领域: cs.CL,cs.AI
Uncertainty Management in the Construction of Knowledge Graphs: a Survey
Knowledge Graphs (KGs) are a major asset for companies thanks to their great flexibility in data representation and their numerous applications, e.g., vocabulary sharing, Q/A or recommendation systems. To build a KG it is a common practice to rely on automatic methods for extracting knowledge from various heterogeneous sources. But in a noisy and uncertain world, knowledge may not be reliable and conflicts between data sources may occur. Integrating unreliable data would directly impact the use of the KG, therefore such conflicts must be resolved. This could be done manually by selecting the best data to integrate. This first approach is highly accurate, but costly and time-consuming. That is why recent efforts focus on automatic approaches, which represents a challenging task since it requires handling the uncertainty of extracted knowledge throughout its integration into the KG. We survey state-of-the-art approaches in this direction and present constructions of both open and enterprise KGs and how their quality is maintained. We then describe different knowledge extraction methods, introducing additional uncertainty. We also discuss downstream tasks after knowledge acquisition, including KG completion using embedding models, knowledge alignment, and knowledge fusion in order to address the problem of knowledge uncertainty in KG construction. We conclude with a discussion on the remaining challenges and perspectives when constructing a KG taking into account uncertainty.
Updated: 2024-07-19 07:46:07
标题: 知识图谱构建中的不确定性管理:一项调查
摘要: 知识图谱(KGs)是公司的重要资产,因为它们在数据表示方面具有极大的灵活性,且具有诸多应用,例如词汇共享、问答或推荐系统。构建KG通常依赖于从各种异构来源提取知识的自动方法。但在一个嘈杂和不确定的世界中,知识可能并不可靠,数据源之间可能发生冲突。集成不可靠数据会直接影响KG的使用,因此这些冲突必须得到解决。这可以通过手动选择要集成的最佳数据来完成。这种第一种方法非常准确,但成本高且耗时。因此,近年来的努力集中在自动方法上,这是一项具有挑战性的任务,因为它要求在将提取的知识整合到KG中时处理不确定性。我们调查了该方向的最新方法,并介绍了开放和企业KG的构建方式以及如何维护其质量。然后我们描述了不同的知识提取方法,引入了额外的不确定性。我们还讨论了知识获取后的下游任务,包括使用嵌入模型完成KG、知识对齐和知识融合,以解决KG构建中的知识不确定性问题。最后,我们讨论了在考虑不确定性时构建KG时剩余的挑战和展望。
更新时间: 2024-07-19 07:46:07
领域: cs.AI
ESQA: Event Sequences Question Answering
Event sequences (ESs) arise in many practical domains including finance, retail, social networks, and healthcare. In the context of machine learning, event sequences can be seen as a special type of tabular data with annotated timestamps. Despite the importance of ESs modeling and analysis, little effort was made in adapting large language models (LLMs) to the ESs domain. In this paper, we highlight the common difficulties of ESs processing and propose a novel solution capable of solving multiple downstream tasks with little or no finetuning. In particular, we solve the problem of working with long sequences and improve time and numeric features processing. The resulting method, called ESQA, effectively utilizes the power of LLMs and, according to extensive experiments, achieves state-of-the-art results in the ESs domain.
Updated: 2024-07-19 07:38:35
标题: ESQA:事件序列问答
摘要: 事件序列(ES)出现在金融、零售、社交网络和医疗保健等许多实际领域中。在机器学习的背景下,事件序列可以被看作一种带有时间戳标注的特殊表格数据。尽管ES的建模和分析十分重要,但将大型语言模型(LLM)适配到ES领域的工作却很少。在本文中,我们指出了ES处理的常见困难,并提出了一种新颖的解决方案,能够在几乎不需要微调的情况下解决多个下游任务。特别地,我们解决了处理长序列的问题,并改进了时间和数值特征的处理。所得到的方法称为ESQA,有效利用了LLM的能力,并且根据大量实验,在ES领域取得了最先进的结果。
更新时间: 2024-07-19 07:38:35
领域: cs.CL,cs.LG
DisenSemi: Semi-supervised Graph Classification via Disentangled Representation Learning
Graph classification is a critical task in numerous multimedia applications, where graphs are employed to represent diverse types of multimedia data, including images, videos, and social networks. Nevertheless, in real-world scenarios, labeled graph data can be limited or scarce. To address this issue, we focus on the problem of semi-supervised graph classification, which involves both supervised and unsupervised models learning from labeled and unlabeled data. In contrast to recent approaches that transfer the entire knowledge from the unsupervised model to the supervised one, we argue that an effective transfer should only retain the relevant semantics that align well with the supervised task. In this paper, we propose a novel framework named DisenSemi, which learns disentangled representation for semi-supervised graph classification. Specifically, a disentangled graph encoder is proposed to generate factor-wise graph representations for both supervised and unsupervised models. Then we train two models via supervised objective and mutual information (MI)-based constraints respectively. To ensure the meaningful transfer of knowledge from the unsupervised encoder to the supervised one, we further define an MI-based disentangled consistency regularization between two models and identify the corresponding rationale that aligns well with the current graph classification task. Experimental results on a range of publicly accessible datasets reveal the effectiveness of our DisenSemi.
Updated: 2024-07-19 07:31:32
标题: DisenSemi: 通过解耦表示学习的半监督图分类
摘要: 图分类是许多多媒体应用中的关键任务,其中图被用来表示各种类型的多媒体数据,包括图像、视频和社交网络。然而,在现实场景中,有标记的图数据可能有限或稀缺。为了解决这个问题,我们关注半监督图分类问题,它涉及从有标记和无标记数据中学习的监督模型和无监督模型。与最近将无监督模型的全部知识转移到监督模型的方法不同,我们认为有效的转移应当只保留与监督任务良好对齐的相关语义。在本文中,我们提出了一个名为DisenSemi的新框架,为半监督图分类学习解耦表示。具体地,我们提出了一个解耦图编码器,为监督和无监督模型生成按因子划分的图表示。然后,我们分别通过监督目标和基于互信息(MI)的约束训练这两个模型。为了确保知识从无监督编码器向监督编码器的有意义转移,我们进一步定义了两个模型之间基于互信息的解耦一致性正则化,并识别出与当前图分类任务良好对齐的相应原理。在一系列公开数据集上的实验结果显示了我们DisenSemi的有效性。
更新时间: 2024-07-19 07:31:32
领域: cs.LG,cs.AI,cs.IR,cs.SI
Temporal receptive field in dynamic graph learning: A comprehensive analysis
Dynamic link prediction is a critical task in the analysis of evolving networks, with applications ranging from recommender systems to economic exchanges. However, the concept of the temporal receptive field, which refers to the temporal context that models use for making predictions, has been largely overlooked and insufficiently analyzed in existing research. In this study, we present a comprehensive analysis of the temporal receptive field in dynamic graph learning. By examining multiple datasets and models, we formalize the role of temporal receptive field and highlight their crucial influence on predictive accuracy. Our results demonstrate that appropriately chosen temporal receptive field can significantly enhance model performance, while for some models, overly large windows may introduce noise and reduce accuracy. We conduct extensive benchmarking to validate our findings, ensuring that all experiments are fully reproducible. Code is available at https://github.com/ykrmm/BenchmarkTW .
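In its simplest form, the temporal receptive field amounts to restricting a timestamped edge stream to a window of recent history before aggregation, as in the sketch below; real dynamic-graph models then aggregate within that window in learned ways.

```python
def temporal_window(edges, t_now, window):
    """Keep only interactions within `window` time units before `t_now`.
    Edges are (src, dst, timestamp) triples."""
    return [(u, v, t) for (u, v, t) in edges if t_now - window <= t < t_now]

edges = [(0, 1, 1.0), (1, 2, 4.0), (0, 2, 9.5), (2, 3, 9.9)]
print(temporal_window(edges, t_now=10.0, window=1.0))  # the two recent edges
```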
Updated: 2024-07-19 07:27:14
标题: 动态图学习中的时间感受野:全面分析
摘要: 动态链接预测是分析演化网络中的关键任务,其应用范围从推荐系统到经济交易。然而,时间感受野的概念,即模型进行预测时所使用的时间上下文,在现有研究中很大程度上被忽视且分析不足。在本研究中,我们对动态图学习中的时间感受野进行了全面分析。通过考察多个数据集和模型,我们形式化了时间感受野的作用,并强调其对预测精度的关键影响。我们的结果表明,恰当选择的时间感受野可以显著提高模型性能,而对于某些模型,过大的窗口可能会引入噪声并降低准确性。我们进行了广泛的基准测试来验证我们的发现,确保所有实验完全可复现。代码可在 https://github.com/ykrmm/BenchmarkTW 获取。
更新时间: 2024-07-19 07:27:14
领域: cs.LG
Structure-aware World Model for Probe Guidance via Large-scale Self-supervised Pre-train
The complex structure of the heart leads to significant challenges in echocardiography, especially in acquisition cardiac ultrasound images. Successful echocardiography requires a thorough understanding of the structures on the two-dimensional plane and the spatial relationships between planes in three-dimensional space. In this paper, we innovatively propose a large-scale self-supervised pre-training method to acquire a cardiac structure-aware world model. The core innovation lies in constructing a self-supervised task that requires structural inference by predicting masked structures on a 2D plane and imagining another plane based on pose transformation in 3D space. To support large-scale pre-training, we collected over 1.36 million echocardiograms from ten standard views, along with their 3D spatial poses. In the downstream probe guidance task, we demonstrate that our pre-trained model consistently reduces guidance errors across the ten most common standard views on the test set with 0.29 million samples from 74 routine clinical scans, indicating that structure-aware pre-training benefits the scanning.
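The masked-structure pretext task can be sketched generically: hide random patches of a frame and train the encoder to predict them. Patch size and masking ratio below are assumptions, and the sketch omits the paper's second component (imagining another plane from a 3D pose transform).

```python
import torch

def mask_patches(img, patch=16, ratio=0.5, seed=0):
    """Zero out a random subset of non-overlapping patches of an image,
    returning the corrupted image and the boolean visibility mask."""
    g = torch.Generator().manual_seed(seed)
    h, w = img.shape[-2] // patch, img.shape[-1] // patch
    keep = torch.rand(h, w, generator=g) > ratio            # True = visible
    mask = keep.repeat_interleave(patch, 0).repeat_interleave(patch, 1)
    return img * mask, keep

x = torch.randn(1, 224, 224)            # one ultrasound frame (toy data)
corrupted, visible = mask_patches(x)
# a structure-aware encoder would be trained to reconstruct x from `corrupted`
```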
Updated: 2024-07-19 07:15:07
标题: 结构感知的世界模型用于通过大规模自监督预训练引导探针
摘要: 心脏的复杂结构给超声心动图检查带来了重大挑战,尤其是在心脏超声图像的采集环节。成功的超声心动图检查需要透彻理解二维平面上的结构以及三维空间中各平面之间的空间关系。本文创新性地提出了一种大规模自监督预训练方法,以获得具有心脏结构感知能力的世界模型。其核心创新在于构建了一个需要结构推理的自监督任务:预测2D平面上被遮挡的结构,并基于3D空间中的姿态变换推想另一个平面。为支持大规模预训练,我们收集了超过136万张涵盖十个标准切面的超声心动图及其3D空间姿态。在下游的探头引导任务中,我们在包含来自74例常规临床扫描的29万个样本的测试集上证明,我们的预训练模型在十个最常见的标准切面上均一致地降低了引导误差,表明结构感知的预训练有益于扫描。
更新时间: 2024-07-19 07:15:07
领域: cs.CV,cs.AI
Domain-Specific Pretraining of Language Models: A Comparative Study in the Medical Field
There are many cases where LLMs are used for specific tasks in a single domain. These usually require less general, but more domain-specific knowledge. Highly capable, general-purpose state-of-the-art language models like GPT-4 or Claude-3-opus can often be used for such tasks, but they are very large and cannot be run locally, even if they were not proprietary. This can be a problem when working with sensitive data. This paper focuses on domain-specific and mixed-domain pretraining as potentially more efficient methods than general pretraining for specialized language models. We will take a look at work related to domain-specific pretraining, specifically in the medical area, and compare benchmark results of specialized language models to general-purpose language models.
Updated: 2024-07-19 07:12:43
标题: 领域特定的语言模型预训练:医学领域的比较研究
摘要: 在许多情况下,LLM被用于单一领域内的特定任务。这些任务通常需要较少的通用知识,但需要更多的领域特定知识。像GPT-4或Claude-3-opus这样能力强大的最先进通用语言模型通常可以胜任此类任务,但它们规模庞大,即便不是专有模型也无法在本地运行。在处理敏感数据时,这可能成为一个问题。本文重点研究特定领域预训练和混合领域预训练,将其作为相比通用预训练可能更高效地获得专门语言模型的方法。我们将回顾与特定领域预训练相关的工作,特别是医疗领域的工作,并将专门语言模型与通用语言模型的基准结果进行比较。
更新时间: 2024-07-19 07:12:43
领域: cs.LG,cs.AI,cs.CL,I.2.6; I.2.7
DISCO: Efficient Diffusion Solver for Large-Scale Combinatorial Optimization Problems
Combinatorial Optimization (CO) problems are fundamentally crucial in numerous practical applications across diverse industries, characterized by entailing enormous solution space and demanding time-sensitive response. Despite significant advancements made by recent neural solvers, their limited expressiveness does not conform well to the multi-modal nature of CO landscapes. While some research has pivoted towards diffusion models, they require simulating a Markov chain with many steps to produce a sample, which is time-consuming and does not meet the efficiency requirement of real applications, especially at scale. We propose DISCO, an efficient DIffusion Solver for Combinatorial Optimization problems that excels in both solution quality and inference speed. DISCO's efficacy is two-pronged: Firstly, it achieves rapid denoising of solutions through an analytically solvable form, allowing for direct sampling from the solution space with very few reverse-time steps, thereby drastically reducing inference time. Secondly, DISCO enhances solution quality by restricting the sampling space to a more constrained, meaningful domain guided by solution residues, while still preserving the inherent multi-modality of the output probabilistic distributions. DISCO achieves state-of-the-art results on very large Traveling Salesman Problems with 10000 nodes and challenging Maximal Independent Set benchmarks, with its per-instance denoising time up to 44.8 times faster. Through further combining a divide-and-conquer strategy, DISCO can be generalized to solve arbitrary-scale problem instances off the shelf, even outperforming models trained specifically on corresponding scales.
Updated: 2024-07-19 07:10:24
标题: DISCO:大规模组合优化问题的高效扩散求解器
摘要: 组合优化(CO)问题在各行业的众多实际应用中至关重要,其特点是解空间巨大且要求对时间敏感的响应。尽管最近的神经求解器取得了显著进展,但其有限的表达能力并不能很好地适应CO问题景观的多模态特性。虽然一些研究已转向扩散模型,但它们需要模拟多步的马尔可夫链才能生成一个样本,这非常耗时,无法满足实际应用的效率要求,尤其是在大规模场景下。我们提出DISCO,一种高效的组合优化问题扩散求解器,在解质量和推理速度两方面均表现出色。DISCO的有效性体现在两个方面:首先,它通过可解析求解的形式实现快速去噪,只需极少的反向时间步即可直接从解空间采样,从而大幅缩短推理时间。其次,DISCO通过将采样空间限制在由解残差引导的更受约束、更有意义的域内来提升解质量,同时仍保留输出概率分布固有的多模态性。DISCO在具有10000个节点的超大规模旅行商问题和具有挑战性的最大独立集基准上取得了最先进的结果,其单实例去噪速度最高可提升44.8倍。通过进一步结合分而治之策略,DISCO可以直接泛化到任意规模的问题实例,甚至优于专门针对相应规模训练的模型。
更新时间: 2024-07-19 07:10:24
领域: cs.AI
LoAS: Fully Temporal-Parallel Datatflow for Dual-Sparse Spiking Neural Networks
Spiking Neural Networks (SNNs) have gained significant research attention in the last decade due to their potential to drive resource-constrained edge devices. Though existing SNN accelerators offer high efficiency in processing sparse spikes with dense weights, opportunities are less explored in SNNs with sparse weights, i.e., dual-sparsity. In this work, we study the acceleration of dual-sparse SNNs, focusing on their core operation, sparse-matrix-sparse-matrix multiplication (spMspM). We observe that naively running a dual-sparse SNN on existing spMspM accelerators designed for dual-sparse Artificial Neural Networks (ANNs) exhibits sub-optimal efficiency. The main challenge is that processing timesteps, a natural property of SNNs, introduces an extra loop to ANN spMspM, leading to longer latency and more memory traffic. To address the problem, we propose a fully temporal-parallel (FTP) dataflow, which minimizes both data movement across timesteps and the end-to-end latency of dual-sparse SNNs. To maximize the efficiency of FTP dataflow, we propose an FTP-friendly spike compression mechanism that efficiently compresses single-bit spikes and ensures contiguous memory access. We further propose an FTP-friendly inner-join circuit that can lower the cost of the expensive prefix-sum circuits with almost no throughput penalty. All the above techniques for FTP dataflow are encapsulated in LoAS, a Low-latency inference Accelerator for dual-sparse SNNs. With FTP dataflow, compression, and inner-join, running dual-sparse SNN workloads on LoAS demonstrates significant speedup (up to $8.51\times$) and energy reduction (up to $3.68\times$) compared to running it on prior dual-sparse accelerators.
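A software analogue of two of the ideas helps fix intuition: packing single-bit spikes into a compact index list, and an inner-join-style accumulate where binary spikes turn multiplication into gathering sparse weight rows. This illustrates the dataflow's arithmetic only; LoAS's actual hardware formats and circuits differ.

```python
import numpy as np

def compress_spikes(spike_bits):
    """Pack a single-bit spike vector into (count, nonzero indices) so only
    nonzeros are streamed, with contiguous reads."""
    idx = np.flatnonzero(spike_bits).astype(np.uint16)
    return len(idx), idx

def spmspm_timestep(spike_idx, weight_rows):
    """Sparse-spike x sparse-weight accumulate for one timestep: with binary
    spikes, 'multiply' degenerates to summing selected sparse weight rows."""
    out = {}
    for i in spike_idx:
        for j, w in weight_rows.get(int(i), []):   # row i as (col, value) pairs
            out[j] = out.get(j, 0.0) + w
    return out

spikes = np.array([0, 1, 0, 0, 1, 0, 0, 1], dtype=np.uint8)
n, idx = compress_spikes(spikes)
weights = {1: [(0, 0.5)], 4: [(0, -0.25), (2, 1.0)], 7: [(2, 0.75)]}
print(n, spmspm_timestep(idx, weights))    # accumulated outputs per column
```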
Updated: 2024-07-19 07:02:26
标题: LoAS:双稀疏脉冲神经网络的全时空并行数据流
摘要: 脉冲神经网络(SNN)因其驱动资源受限边缘设备的潜力,在过去十年中引起了广泛的研究关注。尽管现有的SNN加速器在处理稀疏脉冲与稠密权重方面效率很高,但对权重同样稀疏的SNN(即双重稀疏)的探索较少。在这项工作中,我们研究双重稀疏SNN的加速,重点关注其核心运算:稀疏矩阵-稀疏矩阵乘法(spMspM)。我们观察到,在为双重稀疏人工神经网络(ANN)设计的现有spMspM加速器上直接运行双重稀疏SNN,效率并不理想。主要挑战在于,时间步处理这一SNN的固有属性会为ANN的spMspM引入一层额外的循环,导致更长的延迟和更多的内存流量。为了解决这个问题,我们提出了一种完全时间并行(FTP)数据流,最大限度地减少跨时间步的数据移动以及双重稀疏SNN的端到端延迟。为了最大化FTP数据流的效率,我们提出了一种FTP友好的脉冲压缩机制,可高效压缩单比特脉冲并确保连续的内存访问。我们进一步提出了一种FTP友好的内连接电路,能够在几乎不损失吞吐量的情况下降低昂贵的前缀和电路的成本。上述所有面向FTP数据流的技术都封装在LoAS中,这是一种面向双重稀疏SNN的低延迟推理加速器。借助FTP数据流、压缩和内连接,在LoAS上运行双重稀疏SNN工作负载,相比之前的双重稀疏加速器,表现出显著的加速(高达8.51倍)和能耗降低(高达3.68倍)。
更新时间: 2024-07-19 07:02:26
领域: cs.AR,cs.AI,cs.NE
Causal foundations of bias, disparity and fairness
The study of biases, such as gender or racial biases, is an important topic in the social and behavioural sciences. However, the literature does not always clearly define the concept. Definitions of bias are often ambiguous or not provided at all. To study biases in a precise manner, it is important to have a well-defined concept of bias. We propose to define bias as a direct causal effect that is unjustified. We propose to define the closely related concept of disparity as a direct or indirect causal effect that includes a bias. Our proposed definitions can be used to study biases and disparities in a more rigorous and systematic way. We compare our definitions of bias and disparity with various criteria of fairness introduced in the artificial intelligence literature. In addition, we discuss how our definitions relate to discrimination. We illustrate our definitions of bias and disparity in two case studies, focusing on gender bias in science and racial bias in police shootings. Our proposed definitions aim to contribute to a better appreciation of the causal intricacies of studies of biases and disparities. We hope that this will also promote an improved understanding of the policy implications of such studies.
Updated: 2024-07-19 06:54:26
标题: 偏见、不平等和公平的因果基础
摘要: 对偏见(如性别或种族偏见)的研究是社会和行为科学中的重要话题。然而,文献并不总是清楚地界定这一概念:偏见的定义经常含糊不清,甚至根本没有给出。为了精确地研究偏见,有必要对偏见给出明确的定义。我们建议将偏见定义为一种不正当的直接因果效应,并将与之密切相关的差异(disparity)定义为包含偏见的直接或间接因果效应。我们提出的定义可用于更严谨、更系统地研究偏见和差异。我们将这些定义与人工智能文献中提出的各种公平性标准进行了比较,并讨论了它们与歧视的关系。我们通过两个案例研究来说明偏见和差异的定义,分别关注科学界的性别偏见和警察枪击事件中的种族偏见。我们提出的定义旨在帮助人们更好地认识偏见与差异研究中的因果复杂性,并希望由此促进对此类研究政策意涵的更好理解。
更新时间: 2024-07-19 06:54:26
领域: cs.DL,cs.AI,stat.AP
360VFI: A Dataset and Benchmark for Omnidirectional Video Frame Interpolation
With the development of VR-related techniques, viewers can enjoy a realistic and immersive experience through a head-mounted display, while omnidirectional video with a low frame rate can lead to user dizziness. However, the prevailing plane frame interpolation methodologies are unsuitable for Omnidirectional Video Interpolation, chiefly due to the lack of models tailored to such videos with strong distortion, compounded by the scarcity of valuable datasets for Omnidirectional Video Frame Interpolation. In this paper, we introduce the benchmark dataset, 360VFI, for Omnidirectional Video Frame Interpolation. We present a practical implementation that introduces a distortion prior from omnidirectional video into the network to modulate distortions. We especially propose a pyramid distortion-sensitive feature extractor that uses the unique characteristics of equirectangular projection (ERP) format as prior information. Moreover, we devise a decoder that uses an affine transformation to facilitate the synthesis of intermediate frames further. 360VFI is the first dataset and benchmark that explores the challenge of Omnidirectional Video Frame Interpolation. Through our benchmark analysis, we presented four different distortion conditions scenes in the proposed 360VFI dataset to evaluate the challenge triggered by distortion during interpolation. Besides, experimental results demonstrate that Omnidirectional Video Interpolation can be effectively improved by modeling for omnidirectional distortion.
Updated: 2024-07-19 06:50:24
标题: 360VFI:一种用于全向视频帧插值的数据集和基准
摘要: 随着虚拟现实相关技术的发展,观众可以通过头戴式显示器享受逼真和沉浸式的体验,而具有低帧率的全向视频可能会导致用户头晕。然而,目前流行的平面帧插值方法不适用于全向视频插值,主要是因为缺乏针对具有强烈畸变的此类视频的模型,再加上缺乏有价值的全向视频帧插值数据集。在本文中,我们介绍了用于全向视频帧插值的基准数据集360VFI。我们提出了一个实用的实现,将来自全向视频的畸变先验引入网络以调节畸变。我们特别提出了一个金字塔畸变敏感特征提取器,利用等距投影(ERP)格式的独特特征作为先验信息。此外,我们设计了一个解码器,使用仿射变换来促进进一步合成中间帧。360VFI是探索全向视频帧插值挑战的第一个数据集和基准。通过我们的基准分析,我们在提出的360VFI数据集中呈现了四种不同的畸变条件场景,以评估插值过程中畸变引发的挑战。此外,实验结果表明,通过对全向畸变建模,全向视频插值可以得到有效改善。
更新时间: 2024-07-19 06:50:24
领域: cs.CV,cs.LG,cs.MM
MSCT: Addressing Time-Varying Confounding with Marginal Structural Causal Transformer for Counterfactual Post-Crash Traffic Prediction
Traffic crashes profoundly impede traffic efficiency and pose economic challenges. Accurate prediction of post-crash traffic status provides essential information for evaluating traffic perturbations and developing effective solutions. Previous studies have established a series of deep learning models to predict post-crash traffic conditions, however, these correlation-based methods cannot accommodate the biases caused by time-varying confounders and the heterogeneous effects of crashes. The post-crash traffic prediction model needs to estimate the counterfactual traffic speed response to hypothetical crashes under various conditions, which demonstrates the necessity of understanding the causal relationship between traffic factors. Therefore, this paper presents the Marginal Structural Causal Transformer (MSCT), a novel deep learning model designed for counterfactual post-crash traffic prediction. To address the issue of time-varying confounding bias, MSCT incorporates a structure inspired by Marginal Structural Models and introduces a balanced loss function to facilitate learning of invariant causal features. The proposed model is treatment-aware, with a specific focus on comprehending and predicting traffic speed under hypothetical crash intervention strategies. In the absence of ground-truth data, a synthetic data generation procedure is proposed to emulate the causal mechanism between traffic speed, crashes, and covariates. The model is validated using both synthetic and real-world data, demonstrating that MSCT outperforms state-of-the-art models in multi-step-ahead prediction performance. This study also systematically analyzes the impact of time-varying confounding bias and dataset distribution on model performance, contributing valuable insights into counterfactual prediction for intelligent transportation systems.
Updated: 2024-07-19 06:42:41
标题: MSCT:用边际结构因果Transformer解决时变混杂,实现事故后交通的反事实预测
摘要: 交通事故严重影响交通效率,并带来经济挑战。准确预测事故后的交通状态,可为评估交通扰动和制定有效解决方案提供必要信息。先前的研究建立了一系列深度学习模型来预测事故后的交通状况,然而,这些基于相关性的方法无法应对时变混杂因素带来的偏差以及事故的异质效应。事故后交通预测模型需要估计在各种条件下对假想事故的反事实交通速度响应,这表明有必要理解交通因素之间的因果关系。因此,本文提出了边际结构因果Transformer(MSCT),这是一种专为事故后交通反事实预测设计的新型深度学习模型。为解决时变混杂偏差问题,MSCT采用了受边际结构模型启发的结构,并引入平衡损失函数以促进不变因果特征的学习。所提出的模型具有处理(treatment)感知能力,特别关注在假想的事故干预策略下理解和预测交通速度。在缺乏真实标注数据的情况下,我们提出了一种合成数据生成流程,以模拟交通速度、事故和协变量之间的因果机制。该模型使用合成数据和真实数据进行了验证,结果表明MSCT在多步预测性能上优于现有最先进的模型。本研究还系统分析了时变混杂偏差和数据集分布对模型性能的影响,为智能交通系统的反事实预测提供了有价值的见解。
更新时间: 2024-07-19 06:42:41
领域: cs.LG,stat.ML
On the Causal Sufficiency and Necessity of Multi-Modal Representation Learning
An effective paradigm of multi-modal learning (MML) is to learn unified representations among modalities. From a causal perspective, constraining the consistency between different modalities can mine causal representations that convey primary events. However, such simple consistency may face the risk of learning insufficient or unnecessary information: a necessary but insufficient cause is invariant across modalities but may not have the required accuracy; a sufficient but unnecessary cause tends to adapt well to specific modalities but may be hard to adapt to new data. To address this issue, in this paper, we aim to learn representations that are both causal sufficient and necessary, i.e., Causal Complete Cause ($C^3$), for MML. Firstly, we define the concept of $C^3$ for MML, which reflects the probability of being causal sufficiency and necessity. We also propose the identifiability and measurement of $C^3$, i.e., $C^3$ risk, to ensure calculating the learned representations' $C^3$ scores in practice. Then, we theoretically prove the effectiveness of $C^3$ risk by establishing the performance guarantee of MML with a tight generalization bound. Based on these theoretical results, we propose a plug-and-play method, namely Causal Complete Cause Regularization ($C^3$R), to learn causal complete representations by constraining the $C^3$ risk bound. Extensive experiments conducted on various benchmark datasets empirically demonstrate the effectiveness of $C^3$R.
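The notions of causal sufficiency and necessity echo the classical probability of necessity and sufficiency (PNS); as a point of reference only (this is the standard definition and its Tian-Pearl bounds, not necessarily the paper's $C^3$), for a binary cause $c$ and effect $e$:

\[
\mathrm{PNS}(c \Rightarrow e) = P\big(e_{do(c)} \wedge \neg e_{do(\neg c)}\big),
\]
\[
\max\{0,\; P(e \mid do(c)) - P(e \mid do(\neg c))\} \;\le\; \mathrm{PNS} \;\le\; \min\{P(e \mid do(c)),\; 1 - P(e \mid do(\neg c))\}.
\]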
Updated: 2024-07-19 06:35:49
标题: 多模态表示学习的因果充分性和必要性
摘要: 多模态学习(MML)的一种有效范式是学习模态之间统一的表示。从因果视角看,约束不同模态之间的一致性可以挖掘出传达主要事件的因果表示。然而,这种简单的一致性可能面临学习到不充分或不必要信息的风险:一个必要但不充分的原因在不同模态间保持不变,但可能达不到所需的准确性;一个充分但不必要的原因往往能很好地适配特定模态,却可能难以适应新数据。为了解决这个问题,本文旨在为MML学习既因果充分又因果必要的表示,即因果完备原因($C^3$)。首先,我们为MML定义了$C^3$的概念,它反映了因果充分性与必要性的概率。我们还提出了$C^3$的可识别性及其度量,即$C^3$风险,以确保在实践中能够计算所学表示的$C^3$分数。然后,我们通过建立具有紧致泛化界的MML性能保证,从理论上证明了$C^3$风险的有效性。基于这些理论结果,我们提出了一种即插即用的方法,即因果完备原因正则化($C^3$R),通过约束$C^3$风险界来学习因果完备的表示。在各种基准数据集上进行的大量实验从经验上证明了$C^3$R的有效性。
更新时间: 2024-07-19 06:35:49
领域: cs.LG
LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference
The inference of transformer-based large language models consists of two sequential stages: 1) a prefilling stage to compute the KV cache of prompts and generate the first token, and 2) a decoding stage to generate subsequent tokens. For long prompts, the KV cache must be computed for all tokens during the prefilling stage, which can significantly increase the time needed to generate the first token. Consequently, the prefilling stage may become a bottleneck in the generation process. An open question remains whether all prompt tokens are essential for generating the first token. To answer this, we introduce a novel method, LazyLLM, that selectively computes the KV for tokens important for the next token prediction in both the prefilling and decoding stages. Contrary to static pruning approaches that prune the prompt at once, LazyLLM allows language models to dynamically select different subsets of tokens from the context in different generation steps, even though they might be pruned in previous steps. Extensive experiments on standard datasets across various tasks demonstrate that LazyLLM is a generic method that can be seamlessly integrated with existing language models to significantly accelerate the generation without fine-tuning. For instance, in the multi-document question-answering task, LazyLLM accelerates the prefilling stage of the LLama 2 7B model by 2.34x while maintaining accuracy.
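A minimal sketch of dynamic token selection over a KV cache; the importance score and keep ratio below are placeholders, not LazyLLM's actual criterion.

```python
import torch

def prune_kv(keys, values, importance, keep_ratio=0.5):
    """Keep KV entries only for the most important prompt tokens.
    `importance` could be, e.g., attention mass received from the last
    token; any scoring rule plugs in here."""
    n = keys.shape[0]
    k = max(1, int(n * keep_ratio))
    kept = torch.topk(importance, k).indices.sort().values  # preserve order
    return keys[kept], values[kept], kept

keys, values = torch.randn(8, 4), torch.randn(8, 4)   # 8 prompt tokens
importance = torch.rand(8)
k2, v2, kept = prune_kv(keys, values, importance, keep_ratio=0.25)
print(kept)  # indices whose KV is actually computed/retained this step
# pruned tokens are not gone for good: a later decoding step may re-select
# them, which is what makes the pruning dynamic rather than static
```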
Updated: 2024-07-19 06:34:45
标题: LazyLLM:用于高效长上下文LLM推理的动态标记修剪
摘要: 基于transformer的大型语言模型的推理包括两个连续阶段:1)预填充阶段,计算提示的KV缓存并生成第一个token;2)解码阶段,生成后续token。对于长提示,必须在预填充阶段为所有token计算KV缓存,这可能显著增加生成第一个token所需的时间。因此,预填充阶段可能成为生成过程的瓶颈。一个悬而未决的问题是:是否所有提示token对生成第一个token都是必不可少的。为了回答这个问题,我们提出了一种新方法LazyLLM,它在预填充和解码两个阶段都只为对下一个token预测重要的token选择性地计算KV。与一次性修剪提示的静态修剪方法不同,LazyLLM允许语言模型在不同的生成步骤中从上下文中动态选择不同的token子集,即使这些token在之前的步骤中已被修剪。在各种任务的标准数据集上进行的大量实验表明,LazyLLM是一种通用方法,可以与现有语言模型无缝集成,在无需微调的情况下显著加速生成。例如,在多文档问答任务中,LazyLLM将LLaMA 2 7B模型的预填充阶段加速了2.34倍,同时保持准确性。
更新时间: 2024-07-19 06:34:45
领域: cs.CL,cs.AI,cs.LG
Rasa: Building Expressive Speech Synthesis Systems for Indian Languages in Low-resource Settings
We release Rasa, the first multilingual expressive TTS dataset for any Indian language, which contains 10 hours of neutral speech and 1-3 hours of expressive speech for each of the 6 Ekman emotions covering 3 languages: Assamese, Bengali, & Tamil. Our ablation studies reveal that just 1 hour of neutral and 30 minutes of expressive data can yield a Fair system as indicated by MUSHRA scores. Increasing neutral data to 10 hours, with minimal expressive data, significantly enhances expressiveness. This offers a practical recipe for resource-constrained languages, prioritizing easily obtainable neutral data alongside smaller amounts of expressive data. We show the importance of syllabically balanced data and pooling emotions to enhance expressiveness. We also highlight challenges in generating specific emotions, e.g., fear and surprise.
Updated: 2024-07-19 06:33:10
标题: Rasa: 在低资源环境中为印度语言构建富有表现力的语音合成系统
摘要: 我们发布了Rasa,这是首个适用于任何印度语言的多语言表达性TTS数据集,涵盖阿萨姆语、孟加拉语和泰米尔语3种语言,每种语言包含10小时中性语音,以及6种Ekman情绪各1-3小时的表达性语音。我们的消融研究表明,仅用1小时中性数据和30分钟表达性数据,即可获得MUSHRA评分所示的"一般(Fair)"水平的系统。将中性数据增加到10小时,即便表达性数据极少,也能显著增强表现力。这为资源受限的语言提供了一个实用的配方:优先获取容易得到的中性数据,同时搭配少量表达性数据。我们展示了音节均衡的数据以及情绪数据汇集对增强表现力的重要性,并强调了生成特定情绪(如恐惧和惊讶)方面的挑战。
更新时间: 2024-07-19 06:33:10
领域: cs.CL,cs.LG,cs.SD,eess.AS
Quantum Hamiltonian Embedding of Images for Data Reuploading Classifiers
When applying quantum computing to machine learning tasks, one of the first considerations is the design of the quantum machine learning model itself. Conventionally, the design of quantum machine learning algorithms relies on the ``quantisation" of classical learning algorithms, such as using quantum linear algebra to implement important subroutines of classical algorithms, if not the entire algorithm, seeking to achieve quantum advantage through possible run-time accelerations brought by quantum computing. However, recent research has started questioning whether quantum advantage via speedup is the right goal for quantum machine learning [1]. Research also has been undertaken to exploit properties that are unique to quantum systems, such as quantum contextuality, to better design quantum machine learning models [2]. In this paper, we take an alternative approach by incorporating the heuristics and empirical evidences from the design of classical deep learning algorithms to the design of quantum neural networks. We first construct a model based on the data reuploading circuit [3] with the quantum Hamiltonian data embedding unitary [4]. Through numerical experiments on images datasets, including the famous MNIST and FashionMNIST datasets, we demonstrate that our model outperforms the quantum convolutional neural network (QCNN)[5] by a large margin (up to over 40% on MNIST test set). Based on the model design process and numerical results, we then laid out six principles for designing quantum machine learning models, especially quantum neural networks.
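The Hamiltonian embedding can be illustrated numerically: build a Hermitian matrix from the pixel values and exponentiate it into a unitary. The symmetrization used to obtain $H$ below is an assumption made for the sketch, not necessarily the paper's construction.

```python
import numpy as np
from scipy.linalg import expm

def hamiltonian_embedding(img, t=1.0):
    """Embed an image as the unitary U = exp(-i t H), with H a Hermitian
    matrix derived from the pixels (here by simple symmetrization)."""
    a = np.asarray(img, dtype=float)
    H = (a + a.T) / 2.0                  # real symmetric => Hermitian
    return expm(-1j * t * H)

img = np.random.rand(4, 4)               # tiny 4x4 "image" -> 2-qubit unitary
U = hamiltonian_embedding(img)
state = np.zeros(4, dtype=complex)
state[0] = 1.0
print(np.round(U @ state, 3))             # embedded state for the classifier
assert np.allclose(U @ U.conj().T, np.eye(4), atol=1e-8)  # U is unitary
```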
Updated: 2024-07-19 06:31:22
标题: 图像的量子哈密顿嵌入用于数据重新上传分类器
摘要: 将量子计算应用于机器学习任务时,首要考虑的是量子机器学习模型的设计。传统上,量子机器学习算法的设计依赖于对经典学习算法进行“量子化”,例如使用量子线性代数来实现经典算法的重要子程序,甚至整个算法,以通过量子计算带来的可能的运行加速度来实现量子优势。然而,最近的研究开始质疑通过加速来实现量子优势是否是量子机器学习的正确目标。研究还着手利用量子系统独特的性质,如量子上下文性,来更好地设计量子机器学习模型。在本文中,我们采用了一种替代方法,将经典深度学习算法的设计启发和经验证据引入到量子神经网络的设计中。我们首先构建了一个基于数据重新上传电路和量子哈密顿数据嵌入酉操作的模型。通过对图像数据集(包括著名的MNIST和FashionMNIST数据集)的数值实验,我们证明了我们的模型在性能上大大优于量子卷积神经网络(QCNN)(在MNIST测试集上最多超过40%)。基于模型设计过程和数值结果,我们提出了设计量子机器学习模型(尤其是量子神经网络)的六项原则。
更新时间: 2024-07-19 06:31:22
领域: quant-ph,cs.AI,cs.CV,cs.LG
Interactive-KBQA: Multi-Turn Interactions for Knowledge Base Question Answering with Large Language Models
This study explores the realm of knowledge base question answering (KBQA). KBQA is considered a challenging task, particularly in parsing intricate questions into executable logical forms. Traditional semantic parsing (SP)-based methods require extensive data annotations, which result in significant costs. Recently, the advent of few-shot in-context learning, powered by large language models (LLMs), has showcased promising capabilities. However, fully leveraging LLMs to parse questions into logical forms in low-resource scenarios poses a substantial challenge. To tackle these hurdles, we introduce Interactive-KBQA, a framework designed to generate logical forms through direct interaction with knowledge bases (KBs). Within this framework, we have developed three generic APIs for KB interaction. For each category of complex question, we devised exemplars to guide LLMs through the reasoning processes. Our method achieves competitive results on the WebQuestionsSP, ComplexWebQuestions, KQA Pro, and MetaQA datasets with a minimal number of examples (shots). Importantly, our approach supports manual intervention, allowing for the iterative refinement of LLM outputs. By annotating a dataset with step-wise reasoning processes, we showcase our model's adaptability and highlight its potential for contributing significant enhancements to the field.
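A schematic of the interaction loop is sketched below; the three tool names and the llm() decision function are assumptions standing in for the framework's actual APIs, and the KB is a toy in-memory stub.

```python
def search_nodes(kb, surface_form):        # entity lookup stub
    return [e for e in kb["entities"] if surface_form.lower() in e.lower()]

def search_graph_patterns(kb, entity):     # relation discovery stub
    return [r for (s, r, o) in kb["triples"] if s == entity]

def execute_sparql(kb, query):             # execution stub
    return kb["answers"].get(query, [])

def interactive_kbqa(question, kb, llm, max_turns=5):
    """The LLM alternates between issuing tool calls and, once confident,
    emitting a final executable query."""
    history = [("question", question)]
    for _ in range(max_turns):
        action, arg = llm(history)         # e.g. ("SearchNodes", "Obama")
        if action == "SearchNodes":
            history.append((action, search_nodes(kb, arg)))
        elif action == "SearchGraphPatterns":
            history.append((action, search_graph_patterns(kb, arg)))
        elif action == "ExecuteSPARQL":
            return execute_sparql(kb, arg)
    return None
```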
Updated: 2024-07-19 06:14:20
标题: Interactive-KBQA:大型语言模型的知识库问答多轮交互
摘要: 这项研究探讨了知识库问答(KBQA)的领域。KBQA被认为是一项具有挑战性的任务,特别是在将复杂问题解析为可执行的逻辑形式方面。传统的基于语义解析(SP)的方法需要大量的数据注释,这导致了显著的成本。最近,由大型语言模型(LLMs)驱动的少样本上下文学习的出现展示了有希望的能力。然而,在低资源情境下充分利用LLMs将问题解析为逻辑形式面临着重大挑战。为了克服这些障碍,我们引入了交互式知识库问答(Interactive-KBQA),这是一个旨在通过直接与知识库(KBs)互动生成逻辑形式的框架。在这个框架内,我们开发了三个用于知识库交互的通用API。针对每一类复杂问题,我们设计了示例来指导LLMs进行推理过程。我们的方法在WebQuestionsSP、ComplexWebQuestions、KQA Pro和MetaQA数据集上以最少数量的示例(shots)取得了竞争性的结果。重要的是,我们的方法支持手动干预,允许对LLM输出进行迭代优化。通过对一个数据集进行逐步推理过程的注释,我们展示了我们模型的适应性,并突出了其为领域做出显著改进的潜力。
更新时间: 2024-07-19 06:14:20
领域: cs.CL,cs.AI,I.2.7
SQLfuse: Enhancing Text-to-SQL Performance through Comprehensive LLM Synergy
Text-to-SQL conversion is a critical innovation, simplifying the transition from complex SQL to intuitive natural language queries, especially significant given SQL's prevalence in the job market across various roles. The rise of Large Language Models (LLMs) like GPT-3.5 and GPT-4 has greatly advanced this field, offering improved natural language understanding and the ability to generate nuanced SQL statements. However, the potential of open-source LLMs in Text-to-SQL applications remains underexplored, with many frameworks failing to leverage their full capabilities, particularly in handling complex database queries and incorporating feedback for iterative refinement. Addressing these limitations, this paper introduces SQLfuse, a robust system integrating open-source LLMs with a suite of tools to enhance Text-to-SQL translation's accuracy and usability. SQLfuse features four modules: schema mining, schema linking, SQL generation, and a SQL critic module, to not only generate but also continuously enhance SQL query quality. Demonstrated by its leading performance on the Spider Leaderboard and deployment by Ant Group, SQLfuse showcases the practical merits of open-source LLMs in diverse business contexts.
Updated: 2024-07-19 06:01:57
标题: SQLfuse:通过综合LLM协同提高文本到SQL性能
摘要: 文本到SQL转换是一项关键的创新,简化了从复杂的SQL到直观的自然语言查询的过渡,尤其是考虑到SQL在各种角色的就业市场中的普遍存在。像GPT-3.5和GPT-4这样的大型语言模型(LLMs)的兴起极大推动了这一领域的发展,提供了改进的自然语言理解能力和生成细致SQL语句的能力。然而,在文本到SQL应用中开源LLMs的潜力尚未得到充分挖掘,许多框架未能充分利用它们的全部功能,特别是在处理复杂数据库查询和融入反馈以进行迭代改进方面。为了解决这些限制,本文介绍了SQLfuse,一个强大的系统,将开源LLMs与一套工具集成在一起,以提高文本到SQL转换的准确性和可用性。SQLfuse具有四个模块:模式挖掘、模式链接、SQL生成和SQL评论模块,不仅可以生成,还可以不断提升SQL查询的质量。通过在Spider Leaderboard上的领先表现和被蚂蚁集团部署,SQLfuse展示了开源LLMs在多样化商业环境中的实际优点。
更新时间: 2024-07-19 06:01:57
领域: cs.CL,cs.AI,cs.DB
Towards A Post-Quantum Cryptography in Blockchain I: Basic Review on Theoretical Cryptography and Quantum Information Theory
Recently, the invention of quantum computers has proven so revolutionary that it brings transformative challenges to a variety of fields, especially to traditional cryptographic blockchains, and it may become a real threat to most of the cryptocurrencies on the market. That is, it has become inevitable to consider implementing post-quantum cryptography, also referred to as quantum-resistant cryptography, to attain quantum resistance in blockchains.
Updated: 2024-07-19 05:59:21
标题: 走向区块链中的后量子密码学 I:理论密码学和量子信息理论基础综述
摘要: 最近,量子计算机的发明是如此革命性,它们为各个领域带来了变革性的挑战,特别是传统的加密区块链,它可能成为市场上大多数加密货币的真正威胁。也就是说,考虑实施后量子密码学是不可避免的,这也被称为量子抗性密码学,以实现区块链的量子抗性。
更新时间: 2024-07-19 05:59:21
领域: cs.CR,q-fin.CP
OCTrack: Benchmarking the Open-Corpus Multi-Object Tracking
We study a novel yet practical problem of open-corpus multi-object tracking (OCMOT), which extends the MOT into localizing, associating, and recognizing generic-category objects of both seen (base) and unseen (novel) classes, but without the category text list as prompt. To study this problem, the top priority is to build a benchmark. In this work, we build OCTrackB, a large-scale and comprehensive benchmark, to provide a standard evaluation platform for the OCMOT problem. Compared to previous datasets, OCTrackB has more abundant and balanced base/novel classes and the corresponding samples for evaluation with less bias. We also propose a new multi-granularity recognition metric to better evaluate the generative object recognition in OCMOT. By conducting the extensive benchmark evaluation, we report and analyze the results of various state-of-the-art methods, which demonstrate the rationale of OCMOT, as well as the usefulness and advantages of OCTrackB.
Updated: 2024-07-19 05:58:01
标题: OCTrack:开放语料库多目标跟踪基准
摘要: 我们研究了一个新颖但实际的问题,即开放语料库多目标跟踪(OCMOT),它将MOT扩展为定位、关联和识别已知(基本)和未知(新颖)类别的通用类别对象,但没有类别文本列表作为提示。为了研究这个问题,首要任务是建立一个基准。在这项工作中,我们建立了OCTrackB,一个大规模且全面的基准,为OCMOT问题提供了一个标准评估平台。与之前的数据集相比,OCTrackB具有更丰富和平衡的基本/新颖类别及相应的样本,评估时具有更少的偏见。我们还提出了一种新的多粒度识别度量标准,以更好地评估OCMOT中的生成对象识别。通过进行广泛的基准评估,我们报告和分析了各种最先进方法的结果,证明了OCMOT的合理性,以及OCTrackB的实用性和优势。
更新时间: 2024-07-19 05:58:01
领域: cs.CV,cs.AI
Energy-Efficient Channel Decoding for Wireless Federated Learning: Convergence Analysis and Adaptive Design
One of the most critical challenges for deploying distributed learning solutions, such as federated learning (FL), in wireless networks is the limited battery capacity of mobile clients. While it is a common belief that the major energy consumption of mobile clients comes from the uplink data transmission, this paper presents a novel finding, namely the channel decoding operation also contributes significantly to the overall energy consumption of mobile clients in FL. Motivated by this new observation, we propose an energy-efficient adaptive channel decoding scheme that leverages the intrinsic robustness of FL to model errors. In particular, the robustness is exploited to reduce the energy consumption of channel decoders at mobile clients by adaptively adjusting the number of decoding iterations. We theoretically prove that wireless FL with communication errors can converge at the same rate as the case with error-free communication as long as the bit error rate (BER) is properly constrained. An adaptive channel decoding scheme is then proposed to improve the energy efficiency of wireless FL systems. Experimental results demonstrate that the proposed method maintains the same learning accuracy while reducing the channel decoding energy consumption by 20% when compared to existing approaches.
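The energy argument rests on stopping decoding as soon as the parity checks pass, so easy (high-SNR) codewords cost few iterations. The bit-flipping decoder below illustrates that early-exit mechanism on a (7,4) Hamming code; it is a generic stand-in, since the paper adapts iteration counts to FL's tolerance of residual bit errors rather than to exact decoding.

```python
import numpy as np

def adaptive_decode(H, llr, max_iters=50):
    """Hard-decision bit flipping with early termination on zero syndrome."""
    bits = (llr < 0).astype(int)              # initial hard decisions
    for it in range(max_iters):
        syndrome = (H @ bits) % 2
        if not syndrome.any():                # all checks pass: stop early
            return bits, it
        votes = H.T @ syndrome                # unsatisfied checks per bit
        bits[np.argmax(votes)] ^= 1           # flip the worst offender
    return bits, max_iters

H = np.array([[1, 1, 0, 1, 1, 0, 0],          # (7,4) Hamming parity checks
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])
llr = np.array([2.1, -1.8, 3.0, 0.4, 2.2, -2.5, 1.7])  # channel LLRs
bits, iters = adaptive_decode(H, llr)
print(bits, "decoded in", iters, "iterations")
```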
Updated: 2024-07-19 05:57:16
标题: 无线联邦学习的节能信道解码:收敛分析和自适应设计
摘要: 在无线网络中部署联邦学习(FL)等分布式学习解决方案,最关键的挑战之一是移动客户端有限的电池容量。尽管人们普遍认为移动客户端的主要能耗来自上行数据传输,但本文提出了一个新的发现:信道解码操作同样对FL中移动客户端的整体能耗有显著贡献。受这一新观察的启发,我们提出了一种能效自适应信道解码方案,利用FL对模型误差的固有鲁棒性。具体而言,通过自适应调整解码迭代次数,利用这种鲁棒性来降低移动客户端信道解码器的能耗。我们从理论上证明,只要比特误码率(BER)得到适当约束,存在通信误差的无线FL可以以与无误差通信情形相同的速率收敛。随后,我们提出了一种自适应信道解码方案,以提高无线FL系统的能效。实验结果表明,与现有方法相比,所提出的方法在保持相同学习准确性的同时,将信道解码能耗降低了20%。
更新时间: 2024-07-19 05:57:16
领域: cs.IT,cs.LG,eess.SP,math.IT
ECCO: Can We Improve Model-Generated Code Efficiency Without Sacrificing Functional Correctness?
Although large language models (LLMs) have been largely successful in generating functionally correct programs, conditioning models to produce efficient solutions while ensuring correctness remains a challenge. Further, unreliability in benchmarking code efficiency is a hurdle across varying hardware specifications for popular interpreted languages such as Python. In this paper, we present ECCO, a reproducible benchmark for evaluating program efficiency via two paradigms: natural language (NL) based code generation and history-based code editing. On ECCO, we adapt and thoroughly investigate the three most promising existing LLM-based approaches: in-context learning, iterative refinement with execution or NL feedback, and fine-tuning conditioned on execution and editing history. While most methods degrade functional correctness and moderately increase program efficiency, we find that adding execution information often helps maintain functional correctness, and NL feedback enhances more on efficiency. We release our benchmark to support future work on LLM-based generation of efficient code.
Updated: 2024-07-19 05:47:40
标题: ECCO:我们可以在不牺牲功能正确性的情况下提高模型生成的代码效率吗?
摘要: 尽管大型语言模型(LLMs)在生成功能正确的程序方面取得了很大成功,但是在确保正确性的同时,对模型进行条件化以生成高效解决方案仍然是一个挑战。此外,对于流行的解释性语言(如Python),在不同硬件规格下对代码效率进行基准测试的不可靠性是一个障碍。在本文中,我们提出了ECCO,这是一个可重现的基准测试,用于通过两种范式评估程序效率:基于自然语言(NL)的代码生成和基于历史的代码编辑。在ECCO上,我们调整并深入研究了三种最有前景的基于LLM的方法:上下文学习,迭代改进与执行或NL反馈,以及基于执行和编辑历史的微调。尽管大多数方法会降低功能正确性并适度提高程序效率,但我们发现添加执行信息通常有助于保持功能正确性,而NL反馈更有利于提高效率。我们发布了我们的基准测试,以支持未来关于基于LLM生成高效代码的工作。
更新时间: 2024-07-19 05:47:40
领域: cs.CL,cs.AI
Generative Language Model for Catalyst Discovery
Discovery of novel and promising materials is a critical challenge in the field of chemistry and material science, traditionally approached through methodologies ranging from trial-and-error to machine learning-driven inverse design. Recent studies suggest that transformer-based language models can be utilized as material generative models to expand chemical space and explore materials with desired properties. In this work, we introduce the Catalyst Generative Pretrained Transformer (CatGPT), trained to generate string representations of inorganic catalyst structures from a vast chemical space. CatGPT not only demonstrates high performance in generating valid and accurate catalyst structures but also serves as a foundation model for generating desired types of catalysts by fine-tuning with sparse and specified datasets. As an example, we fine-tuned the pretrained CatGPT using a binary alloy catalyst dataset designed for screening two-electron oxygen reduction reaction (2e-ORR) catalyst and generate catalyst structures specialized for 2e-ORR. Our work demonstrates the potential of language models as generative tools for catalyst discovery.
Updated: 2024-07-19 05:34:08
标题: 催化剂发现的生成式语言模型
摘要: 在化学和材料科学领域,发现新颖且有前景的材料是一个关键挑战,传统上通过从试错到机器学习驱动的逆向设计等方法论来解决。最近的研究表明,基于transformer的语言模型可以作为材料生成模型,扩展化学空间并探索具有所需性质的材料。在这项工作中,我们介绍了Catalyst Generative Pretrained Transformer(CatGPT),经过训练能够从广阔的化学空间中生成无机催化剂结构的字符串表示。CatGPT不仅在生成有效和准确的催化剂结构方面表现出高性能,而且还作为一个基础模型,通过与稀疏和指定的数据集微调,生成所需类型的催化剂。作为一个例子,我们使用设计用于筛选双电子氧还原反应(2e-ORR)催化剂的二元合金数据集,对预训练的CatGPT进行微调,生成专门用于2e-ORR的催化剂结构。我们的工作展示了语言模型作为催化剂发现的生成工具的潜力。
更新时间: 2024-07-19 05:34:08
领域: cs.LG
BERTer: The Efficient One
We explore advanced fine-tuning techniques to boost BERT's performance in sentiment analysis, paraphrase detection, and semantic textual similarity. Our approach leverages SMART regularization to combat overfitting, improves hyperparameter choices, employs a cross-embedding Siamese architecture for improved sentence embeddings, and introduces innovative early exiting methods. Our fine-tuning findings reveal substantial improvements in model efficiency and effectiveness when combining multiple fine-tuning architectures, achieving a state-of-the-art performance score on the test set, surpassing current benchmarks and highlighting BERT's adaptability in multifaceted linguistic tasks.
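Of the techniques listed, early exiting is the easiest to sketch: attach a classifier head to each encoder layer and return as soon as the prediction is confident. The sizes and threshold below are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

class EarlyExitClassifier(nn.Module):
    """Exit at the first layer whose softmax confidence clears `tau`."""

    def __init__(self, encoder_layers, hidden=768, num_labels=2, tau=0.9):
        super().__init__()
        self.layers = nn.ModuleList(encoder_layers)
        self.heads = nn.ModuleList(
            nn.Linear(hidden, num_labels) for _ in encoder_layers)
        self.tau = tau

    def forward(self, x):                      # x: (batch=1, seq, hidden)
        for layer, head in zip(self.layers, self.heads):
            x = layer(x)
            probs = head(x[:, 0]).softmax(-1)  # classify from [CLS] position
            if probs.max() >= self.tau:        # confident: skip later layers
                return probs
        return probs                           # fell through to final layer

layers = [nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True)
          for _ in range(4)]
print(EarlyExitClassifier(layers)(torch.randn(1, 16, 768)))
```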
Updated: 2024-07-19 05:33:09
标题: BERTer:高效率的模型
摘要: 我们探索了先进的微调技术,以提高BERT在情感分析、释义检测和语义文本相似性方面的性能。我们的方法利用SMART正则化来对抗过拟合,改进超参数选择,采用交叉嵌入的孪生架构来改进句子嵌入,并引入创新的早期退出方法。我们的微调结果目前显示,在结合多种微调架构时,模型的效率和有效性都有显著提高,在测试集上取得了最先进的性能得分,超过了当前的基准,并突显了BERT在多方面语言任务中的适应性。
更新时间: 2024-07-19 05:33:09
领域: cs.CL,cs.LG
An Empirical Study of LLaMA3 Quantization: From LLMs to MLLMs
The LLaMA family has become one of the most powerful open-source Large Language Models (LLMs) and a popular LLM backbone for Multimodal Large Language Models (MLLMs), widely applied in Computer Vision (CV) and Natural Language Understanding (NLU) tasks. Notably, LLaMA3 models were recently released and achieve impressive performance across various tasks thanks to super-large-scale pre-training on over 15T tokens of data. Given the wide application of low-bit quantization for LLMs in resource-limited scenarios, we explore LLaMA3's capabilities when quantized to low bit-width. This exploration can potentially unveil new insights and challenges for low-bit quantization of LLaMA3 and other forthcoming LLMs, especially in addressing the performance degradation problems that arise in LLM compression. Specifically, we comprehensively evaluate the 10 existing post-training quantization and LoRA-finetuning methods of LLaMA3 on 1-8 bits and diverse datasets to reveal LLaMA3's low-bit quantization performance. To uncover the capabilities of low-bit quantized MLLMs, we assessed the performance of the LLaMA3-based LLaVA-Next-8B model under 2-4 ultra-low bits with post-training quantization methods. Our experimental results indicate that LLaMA3 still suffers non-negligible degradation in linguistic and visual contexts, particularly under ultra-low bit widths. This highlights the significant performance gap at low bit-width that needs to be bridged in future developments. We expect that this empirical study will prove valuable in advancing future models, driving LLMs and MLLMs to achieve higher accuracy at lower bit-width to enhance practicality.
Updated: 2024-07-19 05:31:53
标题: LLaMA3量化的实证研究:从LLMs到MLLMs
摘要: LLaMA家族已经成为最强大的开源大型语言模型(LLM)之一,也是多模态大型语言模型(MLLM)流行的LLM骨干,在计算机视觉(CV)和自然语言理解(NLU)任务中得到广泛应用。值得注意的是,LLaMA3模型最近发布,凭借在超过15T token数据上的超大规模预训练,在各类任务上取得了令人印象深刻的性能。鉴于低比特量化在资源受限场景下对LLM的广泛应用,我们探索了LLaMA3量化到低比特宽度时的能力。这一探索有望为LLaMA3及其他后续LLM的低比特量化揭示新的见解和挑战,特别是在解决LLM压缩中出现的性能退化问题方面。具体而言,我们在1-8比特和多种数据集上全面评估了LLaMA3的10种现有训练后量化和LoRA微调方法,以揭示LLaMA3的低比特量化性能。为了揭示低比特量化MLLM的能力,我们采用训练后量化方法,评估了基于LLaMA3的LLaVA-Next-8B模型在2-4超低比特下的性能。我们的实验结果表明,LLaMA3在语言和视觉上仍然存在不可忽视的退化,特别是在超低比特宽度下。这突显了未来发展中需要弥合的低比特宽度下的显著性能差距。我们期望这项实证研究有助于推进未来模型的发展,推动LLM和MLLM在更低比特宽度下实现更高的准确性,以增强实用性。
更新时间: 2024-07-19 05:31:53
领域: cs.LG
Operating System And Artificial Intelligence: A Systematic Review
In the dynamic landscape of technology, the convergence of Artificial Intelligence (AI) and Operating Systems (OS) has emerged as a pivotal arena for innovation. Our exploration focuses on the symbiotic relationship between AI and OS, emphasizing how AI-driven tools enhance OS performance, security, and efficiency, while OS advancements facilitate more sophisticated AI applications. We delve into various AI techniques employed to optimize OS functionalities, including memory management, process scheduling, and intrusion detection. Simultaneously, we analyze the role of OS in providing essential services and infrastructure that enable effective AI application execution, from resource allocation to data processing. The article also addresses challenges and future directions in this domain, emphasizing the imperative of secure and efficient AI integration within OS frameworks. By examining case studies and recent developments, our review provides a comprehensive overview of the current state of AI-OS integration, underscoring its significance in shaping the next generation of computing technologies. Finally, we explore the promising prospects of Intelligent OSes, considering not only how innovative OS architectures will pave the way for groundbreaking opportunities but also how AI will significantly contribute to advancing these next-generation OSs.
Updated: 2024-07-19 05:29:34
标题: 操作系统与人工智能:系统性综述
摘要: 在技术领域的动态景观中,人工智能(AI)和操作系统(OS)的融合已经成为创新的关键领域。我们的探索集中在AI和OS之间的共生关系上,强调AI驱动工具如何提高OS的性能、安全性和效率,同时OS的进步促进了更复杂的AI应用。我们深入研究了各种AI技术,用于优化OS功能,包括内存管理、进程调度和入侵检测。同时,我们分析了OS在提供基本服务和基础设施方面的作用,从资源分配到数据处理,使有效的AI应用执行成为可能。本文还讨论了该领域的挑战和未来方向,强调了在OS框架中安全高效地整合AI的必要性。通过案例研究和最新进展的审视,我们的评论提供了AI-OS集成当前状态的全面概述,强调了它在塑造下一代计算技术中的重要性。最后,我们探讨了智能操作系统的前景,考虑到创新的OS架构将为开创性机会铺平道路,同时AI将显著促进这些下一代OS的发展。
更新时间: 2024-07-19 05:29:34
领域: cs.OS,cs.AI
Revisiting Adaptive Cellular Recognition Under Domain Shifts: A Contextual Correspondence View
Cellular nuclei recognition serves as a fundamental and essential step in the workflow of digital pathology. However, with disparate source organs and staining procedures among histology image clusters, the scanned tiles inherently follow a non-uniform data distribution, which degrades performance in general cross-cohort usage. Despite recent efforts leveraging domain adaptation to mitigate distributional discrepancy, those methods are limited to modeling the morphological characteristics of each cell individually, disregarding the hierarchical latent structure and intrinsic contextual correspondences across the tumor micro-environment. In this work, we identify the importance of implicit correspondences across biological contexts for exploiting domain-invariant pathological composition, and we thereby propose to exploit the dependence over various biological structures for domain-adaptive cellular recognition. We discover those high-level correspondences via unsupervised contextual modeling and use them as bridges to facilitate adaptation over diverse organs and stains. In addition, to further exploit the rich spatial contexts embedded amongst nuclear communities, we propose self-adaptive dynamic distillation to secure instance-aware trade-offs across different model constituents. The proposed method is extensively evaluated on a broad spectrum of cross-domain settings under miscellaneous data distribution shifts and outperforms the state-of-the-art methods by a substantial margin. Code is available at https://github.com/camwew/CellularRecognition_DA_CC.
Updated: 2024-07-19 05:26:06
标题: 重新审视领域变化下的自适应细胞识别:一个情境对应视角
摘要: 细胞核识别在数字病理学工作流程中起着基础和关键作用。然而,由于组织学图像集群中存在不同来源器官和染色程序,扫描的图块本质上符合不均匀的数据分布,这导致了对一般交叉队列用途的承诺恶化。尽管最新的努力利用域适应来减轻分布差异,但这些方法往往是对每个细胞的形态特征进行建模,而忽略了在肿瘤微环境中跨层次潜在结构和内在上下文对应关系。在这项工作中,我们确定了跨生物学背景之间的隐式对应的重要性,以利用域不变的病理组成,并因此提出利用各种生物结构之间的依赖性来适应细胞识别。我们通过无监督的上下文建模发现这些高级对应关系,并将它们用作促进在不同器官和染料上的适应的桥梁。此外,为了进一步利用核社区之间嵌入的丰富空间上下文,我们提出了自适应动态提炼,以确保跨不同模型成分的实例感知权衡。所提出的方法在广泛的跨域设置下进行了深入评估,超越了最先进的方法。代码可在https://github.com/camwew/CellularRecognition_DA_CC获取。
更新时间: 2024-07-19 05:26:06
领域: q-bio.QM,cs.LG,eess.IV
Cross-Task Data Augmentation by Pseudo-label Generation for Region Based Coronary Artery Instance Segmentation
Coronary Artery Diseases (CADs), although preventable, are one of the leading causes of death and disability. Diagnosis of these diseases is often difficult and resource intensive. Angiographic image segmentation of the arteries has evolved into an assistive tool that helps clinicians make an accurate diagnosis. However, due to the limited amount of data and the difficulty of curating a dataset, the segmentation task has proven challenging. In this study, we introduce the use of pseudo-labels to address the issue of limited data in the angiographic dataset and enhance the performance of the baseline YOLO model. Unlike existing data augmentation techniques that improve a model within the constraints of a fixed dataset, we use pseudo-labels generated on a dataset for a separate but related task to diversify the training data and improve model performance. This method increases the baseline F1 score by 9% on the validation data set and by 3% on the test data set.
Updated: 2024-07-19 05:23:28
标题: 基于区域的冠状动脉实例分割的伪标签生成的跨任务数据增强
摘要: 虽然冠状动脉疾病(CADs)是可预防的,但却是导致死亡和残疾的主要原因之一。这些疾病的诊断通常很困难且需要大量资源。动脉的血管造影成像分割已经发展成为一种辅助工具,帮助临床医生做出准确诊断。然而,由于数据量有限且数据集的筛选困难,分割任务一直很具挑战性。在这项研究中,我们介绍了使用伪标签来解决血管造影数据集中数据有限的问题,以提高基线YOLO模型的性能。与现有的数据增强技术不同,这些技术仅改善固定数据集的模型,我们引入了在一个相关任务的数据集上生成的伪标签的使用,以增加和改善模型性能。这种方法在验证数据集中将基线F1得分提高了9%,在测试数据集中提高了3%。
更新时间: 2024-07-19 05:23:28
领域: eess.IV,cs.CV,cs.LG
Transforming and Combining Rewards for Aligning Large Language Models
A common approach for aligning language models to human preferences is to first learn a reward model from preference data, and then use this reward model to update the language model. We study two closely related problems that arise in this approach. First, any monotone transformation of the reward model preserves preference ranking; is there a choice that is ``better'' than others? Second, we often wish to align language models to multiple properties: how should we combine multiple reward models? Using a probabilistic interpretation of the alignment procedure, we identify a natural choice for transformation for (the common case of) rewards learned from Bradley-Terry preference models. The derived transformation is straightforward: we apply a log-sigmoid function to the centered rewards, a method we term ``LSC-transformation'' (log-sigmoid-centered transformation). This transformation has two important properties. First, it emphasizes improving poorly-performing outputs, rather than outputs that already score well. This mitigates both underfitting (where some prompts are not improved) and reward hacking (where the model learns to exploit misspecification of the reward model). Second, it enables principled aggregation of rewards by linking summation to logical conjunction: the sum of transformed rewards corresponds to the probability that the output is ``good'' in all measured properties, in a sense we make precise. Experiments aligning language models to be both helpful and harmless using RLHF show substantial improvements over the baseline (non-transformed) approach.
Updated: 2024-07-19 05:12:15
标题: 将大型语言模型的奖励转化和组合以实现对齐
摘要: 一种将语言模型与人类偏好对齐的常见方法是首先从偏好数据中学习奖励模型,然后使用此奖励模型来更新语言模型。我们研究了在这种方法中出现的两个密切相关的问题。首先,奖励模型的任何单调变换都会保持偏好排序;是否有一种选择比其他选择更好?其次,我们经常希望将语言模型与多个属性对齐:我们应该如何组合多个奖励模型?通过对齐过程的概率解释,我们确定了一个自然的转换选择,适用于从布拉德利-特里偏好模型学习奖励的常见情况。推导出的转换方法很简单:我们将对中心化奖励应用对数sigmoid函数,这种方法我们称之为“LSC转换”(对数sigmoid中心化转换)。这种转换具有两个重要特性。首先,它强调改善表现不佳的输出,而不是已经得分良好的输出。这有助于减轻欠拟合(其中一些提示没有得到改善)和奖励欺骗(模型学会利用奖励模型的误差描述)。其次,它通过将求和与逻辑连接相关联,使奖励的聚合变得有原则:转换后的奖励之和对应于输出在所有已测量属性上均“优秀”的概率,我们将其明确化。使用RLHF将语言模型对齐为既有帮助又无害的实验显示,相较于基线(未转换)方法,取得了显著的改进。
更新时间: 2024-07-19 05:12:15
领域: cs.CL,cs.AI,68T50,I.2
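The LSC-transformation described above is short enough to state directly. The sketch below shows the log-sigmoid applied to centered rewards and the summation that corresponds to logical conjunction over properties; the centering reference is left abstract in this summary, so `ref_reward` is an assumption standing in for whatever baseline the method centers against.

```python
import torch
import torch.nn.functional as F

def lsc_transform(reward: torch.Tensor, ref_reward: torch.Tensor) -> torch.Tensor:
    """Log-sigmoid-centered (LSC) transformation of a Bradley-Terry reward.

    The centering choice (`ref_reward`) is an assumption here; the method
    centers each reward before applying the log-sigmoid.
    """
    return F.logsigmoid(reward - ref_reward)

def combine_rewards(rewards: list, refs: list) -> torch.Tensor:
    """Sum of transformed rewards ~ log P(output is 'good' in all properties)."""
    return sum(lsc_transform(r, c) for r, c in zip(rewards, refs))

# Two properties (helpfulness, harmlessness) scored for a batch of 3 outputs
helpful = torch.tensor([1.2, -0.3, 0.8])
harmless = torch.tensor([0.5, 0.9, -1.1])
refs = [torch.tensor(0.0), torch.tensor(0.0)]
print(combine_rewards([helpful, harmless], refs))
```

Note how the log-sigmoid saturates for high rewards, so the combined objective puts its gradient on poorly-scoring outputs, matching the underfitting and reward-hacking mitigation claimed above.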
HeCiX: Integrating Knowledge Graphs and Large Language Models for Biomedical Research
Despite advancements in drug development strategies, 90% of clinical trials fail. This suggests overlooked aspects in target validation and drug optimization. In order to address this, we introduce HeCiX-KG, Hetionet-Clinicaltrials neXus Knowledge Graph, a novel fusion of data from ClinicalTrials.gov and Hetionet in a single knowledge graph. HeCiX-KG combines data on previously conducted clinical trials from ClinicalTrials.gov, and domain expertise on diseases and genes from Hetionet. This offers a thorough resource for clinical researchers. Further, we introduce HeCiX, a system that uses LangChain to integrate HeCiX-KG with GPT-4, and increase its usability. HeCiX shows high performance during evaluation against a range of clinically relevant issues, proving this model to be promising for enhancing the effectiveness of clinical research. Thus, this approach provides a more holistic view of clinical trials and existing biological data.
Updated: 2024-07-19 05:04:24
标题: HeCiX:整合知识图谱和大型语言模型用于生物医学研究
摘要: 尽管药物开发策略取得了进展,但90%的临床试验失败。这表明目标验证和药物优化中存在被忽视的方面。为了解决这个问题,我们引入了HeCiX-KG,即Hetionet-Clinicaltrials neXus知识图,这是ClinicalTrials.gov和Hetionet数据的一个新型融合。HeCiX-KG将来自ClinicalTrials.gov的先前进行的临床试验数据与来自Hetionet的疾病和基因的领域专业知识相结合。这为临床研究人员提供了一个全面的资源。此外,我们引入了HeCiX系统,使用LangChain将HeCiX-KG与GPT-4集成,提高其可用性。HeCiX在针对一系列临床相关问题的评估中表现出较高的性能,证明这一模型在增强临床研究效果方面具有潜力。因此,这种方法提供了更全面的临床试验和现有生物数据的视角。
更新时间: 2024-07-19 05:04:24
领域: cs.CL,cs.AI,cs.IR,cs.LG
PASS++: A Dual Bias Reduction Framework for Non-Exemplar Class-Incremental Learning
Class-incremental learning (CIL) aims to recognize new classes incrementally while maintaining the discriminability of old classes. Most existing CIL methods are exemplar-based, i.e., storing a part of old data for retraining. Without relearning old data, those methods suffer from catastrophic forgetting. In this paper, we figure out two inherent problems in CIL, i.e., representation bias and classifier bias, that cause catastrophic forgetting of old knowledge. To address these two biases, we present a simple and novel dual bias reduction framework that employs self-supervised transformation (SST) in input space and prototype augmentation (protoAug) in deep feature space. On the one hand, SST alleviates the representation bias by learning generic and diverse representations that can transfer across different tasks. On the other hand, protoAug overcomes the classifier bias by explicitly or implicitly augmenting prototypes of old classes in the deep feature space, which poses tighter constraints to maintain previously learned decision boundaries. We further propose hardness-aware prototype augmentation and multi-view ensemble strategies, leading to significant improvements. The proposed framework can be easily integrated with pre-trained models. Without storing any samples of old classes, our method can perform comparably with state-of-the-art exemplar-based approaches which store plenty of old data. We hope to draw the attention of researchers back to non-exemplar CIL by rethinking the necessity of storing old samples in CIL.
Updated: 2024-07-19 05:03:16
标题: PASS++:一种用于非示例类增量学习的双偏差减少框架
摘要: Class-incremental learning (CIL)旨在在保持原有类别的可辨识性的同时,逐步识别新类别。大多数现有的CIL方法都是基于样本的,即存储部分旧数据进行重新训练。没有重新学习旧数据,这些方法会遭受灾难性遗忘。在本文中,我们发现CIL中存在两个固有问题,即表征偏差和分类器偏差,这会导致旧知识的灾难性遗忘。为了解决这两种偏差,我们提出了一个简单而新颖的双偏差减少框架,该框架在输入空间中采用自监督转换(SST)和在深度特征空间中采用原型增强(protoAug)。一方面,SST通过学习通用和多样化的表示来减轻表征偏差,这些表示可以在不同任务之间转移。另一方面,protoAug通过明确或隐式地增强深度特征空间中旧类别的原型,克服了分类器偏差,这给维持先前学习的决策边界带来了更严格的约束。我们进一步提出了基于难度的原型增强和多视图集成策略,实现了显著的改进。该提出的框架可以轻松与预训练模型集成。在不存储任何旧类别样本的情况下,我们的方法可以与存储大量旧数据的最先进的基于样本的方法进行相当的性能。我们希望通过重新思考在CIL中存储旧样本的必要性,引起研究人员对非样本CIL的关注。
更新时间: 2024-07-19 05:03:16
领域: cs.CV,cs.LG
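The explicit variant of prototype augmentation can be sketched in a few lines: perturb each stored old-class prototype with Gaussian noise and train the classifier on the resulting pseudo-features. The noise radius and per-class sample count below are illustrative knobs; the hardness-aware and multi-view ensemble strategies in the paper go further.

```python
import torch

def proto_augment(prototypes: torch.Tensor, radius: float, n: int):
    """Generate pseudo deep features for old classes by perturbing their
    stored prototypes with Gaussian noise (the explicit protoAug idea).

    prototypes: (C, D) one mean feature per old class
    radius:     noise scale, e.g. estimated from intra-class feature variance
    n:          pseudo-features to draw per class
    Returns (C*n, D) features and (C*n,) labels for classifier retraining.
    """
    c, _ = prototypes.shape
    feats = prototypes.repeat_interleave(n, dim=0)
    feats = feats + radius * torch.randn_like(feats)
    labels = torch.arange(c).repeat_interleave(n)
    return feats, labels

# Example: 10 old classes, 512-d features, 20 pseudo-features each
protos = torch.randn(10, 512)
feats, labels = proto_augment(protos, radius=0.1, n=20)
print(feats.shape, labels.shape)  # torch.Size([200, 512]) torch.Size([200])
```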
MVPatch: More Vivid Patch for Adversarial Camouflaged Attacks on Object Detectors in the Physical World
Recent studies have shown that Adversarial Patches (APs) can effectively manipulate object detection models. However, the conspicuous patterns often associated with these patches tend to attract human attention, posing a significant challenge. Existing research has primarily focused on enhancing attack efficacy in the physical domain while often neglecting the optimization of stealthiness and transferability. Furthermore, applying APs in real-world scenarios faces major challenges related to transferability, stealthiness, and practicality. To address these challenges, we introduce generalization theory into the context of APs, enabling our iterative process to simultaneously enhance transferability and refine visual correlation with realistic images. We propose a Dual-Perception-Based Framework (DPBF) to generate the More Vivid Patch (MVPatch), which enhances transferability, stealthiness, and practicality. The DPBF integrates two key components: the Model-Perception-Based Module (MPBM) and the Human-Perception-Based Module (HPBM), along with regularization terms. The MPBM employs ensemble strategy to reduce object confidence scores across multiple detectors, thereby improving AP transferability with robust theoretical support. Concurrently, the HPBM introduces a lightweight method for achieving visual similarity, creating natural and inconspicuous adversarial patches without relying on additional generative models. The regularization terms further enhance the practicality of the generated APs in the physical domain. Additionally, we introduce naturalness and transferability scores to provide an unbiased assessment of APs. Extensive experimental validation demonstrates that MVPatch achieves superior transferability and a natural appearance in both digital and physical domains, underscoring its effectiveness and stealthiness.
Updated: 2024-07-19 04:55:57
标题: MVPatch:对物理世界中目标检测器进行对抗性伪装攻击的更生动补丁
摘要: 最近的研究表明,对抗性贴纸(APs)可以有效地操纵物体检测模型。然而,与这些贴纸经常相关的显眼图案往往会吸引人类注意力,构成了一个重大挑战。现有研究主要集中在增强物理领域中的攻击效果,往往忽视了潜在性和可转移性的优化。此外,在真实世界场景中应用APs面临与可转移性、潜在性和实用性相关的重大挑战。为了解决这些挑战,我们将泛化理论引入APs的背景中,使我们的迭代过程能够同时增强可转移性和细化与现实图像的视觉相关性。我们提出了一个基于双感知的框架(DPBF)来生成更生动的贴纸(MVPatch),它增强了可转移性、潜在性和实用性。DPBF整合了两个关键组成部分:基于模型感知的模块(MPBM)和基于人类感知的模块(HPBM),以及正则化项。MPBM采用集成策略来降低多个检测器的对象置信度分数,从而改善AP的可转移性并获得强大的理论支持。与此同时,HPBM引入了一种轻量级方法来实现视觉相似性,创造出自然且不显眼的对抗性贴纸,而无需依赖额外的生成模型。正则化项进一步增强了在物理领域生成的APs的实用性。此外,我们引入了自然性和可转移性评分,以提供对APs的公正评估。广泛的实验证实,MVPatch在数字和物理领域均实现了出色的可转移性和自然外观,突显了其有效性和潜在性。
更新时间: 2024-07-19 04:55:57
领域: cs.CR,cs.CV
TTA-OOD: Test-time Augmentation for Improving Out-of-Distribution Detection in Gastrointestinal Vision
Deep learning has significantly advanced the field of gastrointestinal vision, enhancing disease diagnosis capabilities. One major challenge in automating diagnosis within gastrointestinal settings is the detection of abnormal cases in endoscopic images. Due to the sparsity of data, this process of distinguishing normal from abnormal cases has faced significant challenges, particularly with rare and unseen conditions. To address this issue, we frame abnormality detection as an out-of-distribution (OOD) detection problem. In this setup, a model trained on In-Distribution (ID) data, which represents a healthy GI tract, can accurately identify healthy cases, while abnormalities are detected as OOD, regardless of their class. We introduce a test-time augmentation segment into the OOD detection pipeline, which enhances the distinction between ID and OOD examples, thereby improving the effectiveness of existing OOD methods with the same model. This augmentation shifts the pixel space, which translates into a more distinct semantic representation for OOD examples compared to ID examples. We evaluated our method against existing state-of-the-art OOD scores, showing improvements with test-time augmentation over the baseline approach.
Updated: 2024-07-19 04:50:54
标题: TTA-OOD:用于改进胃肠视觉领域中超出分布检测的测试时间数据增强
摘要: 深度学习显著推动了胃肠视觉领域的发展,增强了疾病诊断能力。在胃肠设置中自动化诊断面临的一个主要挑战是在内窥镜图像中检测异常病例。由于数据稀疏,区分正常和异常病例的过程面临着重大挑战,尤其是对于罕见和未见状况。为了解决这个问题,我们将异常检测定位为一种超出分布(OOD)检测问题。在这种设置中,一个在内部分布(ID)数据上训练的模型,代表着健康的胃肠道,可以准确识别健康病例,而异常病例被检测为OOD,无论其类别如何。我们在OOD检测管道中引入了一个测试时间增强段,增强了ID和OOD示例之间的区分,从而提高了使用相同模型的现有OOD方法的有效性。这种增强改变了像素空间,使OOD示例相对于ID示例具有更明显的语义表示。我们将我们的方法与现有的最新OOD分数进行了评估,结果显示测试时间增强比基线方法有所改善。
更新时间: 2024-07-19 04:50:54
领域: cs.CV,cs.AI
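The pipeline change is easy to picture in code: score several augmented views of the same endoscopic image and average an existing OOD score over them. The sketch below uses the maximum softmax probability (MSP) as the base score and a generic augmentation set; both are illustrative assumptions, since the paper plugs its augmentation segment into several existing OOD scores.

```python
import torch
import torchvision.transforms as T

def tta_ood_score(model, image: torch.Tensor, n_views: int = 8) -> float:
    """OOD score with test-time augmentation: average the maximum softmax
    probability (MSP) over several augmented views; a lower average MSP
    suggests an out-of-distribution (abnormal) case.

    image: (3, H, W) float tensor in [0, 1]. Augmentations and base score
    are illustrative choices, not the paper's exact configuration.
    """
    augment = T.Compose([
        T.RandomResizedCrop(224, scale=(0.8, 1.0)),
        T.RandomHorizontalFlip(),
        T.ColorJitter(brightness=0.2, contrast=0.2),
    ])
    model.eval()
    scores = []
    with torch.no_grad():
        for _ in range(n_views):
            view = augment(image).unsqueeze(0)      # (1, 3, 224, 224)
            probs = model(view).softmax(dim=-1)
            scores.append(probs.max().item())       # MSP for this view
    return sum(scores) / len(scores)
```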
A Survey on Efficient Inference for Large Language Models
Large Language Models (LLMs) have attracted extensive attention due to their remarkable performance across various tasks. However, the substantial computational and memory requirements of LLM inference pose challenges for deployment in resource-constrained scenarios. Efforts within the field have been directed towards developing techniques aimed at enhancing the efficiency of LLM inference. This paper presents a comprehensive survey of the existing literature on efficient LLM inference. We start by analyzing the primary causes of the inefficient LLM inference, i.e., the large model size, the quadratic-complexity attention operation, and the auto-regressive decoding approach. Then, we introduce a comprehensive taxonomy that organizes the current literature into data-level, model-level, and system-level optimization. Moreover, the paper includes comparative experiments on representative methods within critical sub-fields to provide quantitative insights. Last but not least, we provide some knowledge summary and discuss future research directions.
Updated: 2024-07-19 04:47:36
标题: 大语言模型高效推理调查
摘要: 大型语言模型(LLMs)由于其在各种任务中表现出色而受到广泛关注。然而,LLM推断所需的大量计算和内存资源需求给在资源受限场景下的部署带来挑战。该领域的努力主要集中在开发旨在提高LLM推断效率的技术上。本文对现有文献中关于高效LLM推断的综合调查进行了介绍。我们首先分析了导致LLM推断低效的主要原因,即庞大的模型大小、二次复杂度的注意力操作以及自回归解码方法。然后,我们引入了一个将当前文献分为数据级、模型级和系统级优化的全面分类体系。此外,本文还包括对代表性方法进行比较实验,以提供定量洞见。最后,我们提供了一些知识总结,并讨论了未来的研究方向。
更新时间: 2024-07-19 04:47:36
领域: cs.CL,cs.AI
Causal Inference with Complex Treatments: A Survey
Causal inference plays an important role in explanatory analysis and decision making across various fields like statistics, marketing, health care, and education. Its main task is to estimate treatment effects and design intervention policies. Traditionally, most previous works focus on the binary treatment setting, in which there is only one treatment that a unit either adopts or not. However, in practice, the treatment can be much more complex, encompassing multi-valued, continuous, or bundled options. In this paper, we refer to these as complex treatments and systematically and comprehensively review the causal inference methods for addressing them. First, we formally revisit the problem definition, the basic assumptions, and their possible variations under specific conditions. Second, we sequentially review the related methods for the multi-valued, continuous, and bundled treatment settings. In each situation, we tentatively divide the methods into two categories: those conforming to the unconfoundedness assumption and those violating it. Subsequently, we discuss the available datasets and open-source code. Finally, we provide a brief summary of these works and suggest potential directions for future research.
Updated: 2024-07-19 04:46:58
标题: 使用复杂治疗的因果推断:一项调查
摘要: 因果推断在统计学、营销、医疗保健和教育等各个领域的解释性分析和决策中发挥着重要作用。其主要任务是估计治疗效果并制定干预政策。传统上,大多数先前的研究通常集中在二元治疗设定上,即单位只能选择接受或不接受一种治疗。然而,在实践中,治疗可能更加复杂,包括多值、连续或捆绑选项。在本文中,我们将这些称为复杂治疗,并系统全面地审查解决这些问题的因果推断方法。首先,我们正式重新审视问题定义、基本假设及其在特定条件下的可能变化。其次,我们按顺序审查相关方法,针对多值、连续和捆绑治疗设定。在每种情况下,我们试图将方法分为两类:符合无偏性假设和违反该假设的方法。随后,我们讨论可用数据集和开源代码。最后,我们对这些工作进行简要总结,并提出未来研究的潜在方向。
更新时间: 2024-07-19 04:46:58
领域: stat.ME,cs.LG
NeuroBind: Towards Unified Multimodal Representations for Neural Signals
Understanding neural activity and information representation is crucial for advancing knowledge of brain function and cognition. Neural activity, measured through techniques like electrophysiology and neuroimaging, reflects various aspects of information processing. Recent advances in deep neural networks offer new approaches to analyzing these signals using pre-trained models. However, challenges arise due to discrepancies between different neural signal modalities and the limited scale of high-quality neural data. To address these challenges, we present NeuroBind, a general representation that unifies multiple brain signal types, including EEG, fMRI, calcium imaging, and spiking data. To achieve this, we align the neural signals in these image-paired neural datasets to pre-trained vision-language embeddings. NeuroBind is the first model that studies different neural modalities interconnectedly and is able to leverage high-resource modality models for various neuroscience tasks. We also show that by combining information from different neural signal modalities mapped to the same space, NeuroBind enhances downstream performance, demonstrating the complementary strengths of different neural modalities. This approach holds significant potential for advancing neuroscience research, improving AI systems, and developing neuroprosthetics and brain-computer interfaces.
Updated: 2024-07-19 04:42:52
标题: NeuroBind: 朝向神经信号的统一多模态表示
摘要: 理解神经活动和信息表示对于推进大脑功能和认知的知识至关重要。神经活动,通过诸如电生理学和神经影像学等技术测量,反映了信息处理的各个方面。深度神经网络的最新进展提供了使用预训练模型分析这些信号的新方法。然而,由于不同神经信号模式之间的差异以及高质量神经数据的有限规模,挑战也随之而来。为了解决这些挑战,我们提出了NeuroBind,一个统一多种脑信号类型的通用表示,包括EEG、fMRI、钙成像和尖峰数据。为了实现这一目标,我们将这些图像配对的神经数据集中的神经信号与预先训练的视觉-语言嵌入对齐。Neurobind是第一个研究不同神经模态之间相互关联的模型,并能够利用高资源模态模型进行各种神经科学任务。我们还展示了通过结合不同神经信号模态的信息,NeuroBind增强了下游性能,展示了不同神经模态的互补优势的有效性。因此,我们可以利用映射到相同空间的多种类型的神经信号来改进下游任务,并展示不同神经模态的互补优势。这种方法具有推动神经科学研究、改进人工智能系统以及开发神经假肢和脑-计算机界面的重大潜力。
更新时间: 2024-07-19 04:42:52
领域: q-bio.NC,cs.LG
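Binding a new modality to a frozen vision-language embedding space typically reduces to a symmetric contrastive objective over paired batches. The sketch below shows that standard recipe; the InfoNCE form, the EEG encoder output, and the frozen CLIP image embeddings are assumptions, since the abstract only states that neural signals are aligned to pre-trained vision-language embeddings.

```python
import torch
import torch.nn.functional as F

def clip_align_loss(neural_emb: torch.Tensor, image_emb: torch.Tensor,
                    temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss pulling each neural-signal embedding toward
    its paired (frozen) image embedding and away from other pairs in the batch."""
    n = F.normalize(neural_emb, dim=-1)
    v = F.normalize(image_emb, dim=-1)
    logits = n @ v.t() / temperature                    # (B, B) similarities
    labels = torch.arange(n.shape[0], device=n.device)  # matched pairs on diagonal
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

eeg = torch.randn(16, 512)    # output of a trainable EEG encoder (assumed)
img = torch.randn(16, 512)    # frozen CLIP image embeddings of paired stimuli
print(clip_align_loss(eeg, img))
```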
Towards Faithful Explanations: Boosting Rationalization with Shortcuts Discovery
The remarkable success of neural networks has motivated selective rationalization, which explains prediction results by identifying a small subset of the inputs sufficient to support them. Since existing methods still suffer from relying on shortcuts in the data to compose rationales, and large-scale human-annotated rationales remain scarce, in this paper we propose a Shortcuts-fused Selective Rationalization (SSR) method, which boosts rationalization by discovering and exploiting potential shortcuts. Specifically, SSR first designs a shortcuts discovery approach to detect several potential shortcuts. Then, by introducing the identified shortcuts, we propose two strategies to mitigate the problem of utilizing shortcuts to compose rationales. Finally, we develop two data augmentation methods to close the gap in the number of annotated rationales. Extensive experimental results on real-world datasets clearly validate the effectiveness of our proposed method.
Updated: 2024-07-19 04:31:38
标题: 朝着忠实解释之路:通过快捷发现提升合理化
摘要: 在神经网络方面取得的显著成功引发了选择性合理化的讨论。该方法通过识别支持预测结果的少量输入来解释预测结果。由于现有方法仍然存在使用数据中的捷径来构成理由以及人类有限的大规模注释理由的问题,因此在本文中,我们提出了一种名为“快捷方式融合选择性合理化(SSR)”方法,通过发现和利用潜在的捷径来提升合理化。具体来说,SSR首先设计了一种捷径发现方法来检测几种潜在的捷径。然后,通过引入识别的捷径,我们提出了两种策略来缓解利用捷径构成理由的问题。最后,我们开发了两种数据增强方法来填补注释理由数量的差距。对真实世界数据集的广泛实验结果清楚地验证了我们提出的方法的有效性。
更新时间: 2024-07-19 04:31:38
领域: cs.LG,cs.AI
Deep Copula-Based Survival Analysis for Dependent Censoring with Identifiability Guarantees
Censoring is the central problem in survival analysis, where either the time-to-event (for instance, death) or the time-to-censoring (such as loss of follow-up) is observed for each sample. The majority of existing machine learning-based survival analysis methods assume that survival is conditionally independent of censoring given a set of covariates; an assumption that cannot be verified since only marginal distributions are available from the data. The existence of dependent censoring, along with the inherent bias in current estimators, has been demonstrated in a variety of applications, accentuating the need for a more nuanced approach. However, existing methods that adjust for dependent censoring require practitioners to specify the ground-truth copula. This requirement poses a significant challenge for practical applications, as model misspecification can lead to substantial bias. In this work, we propose a flexible deep learning-based survival analysis method that simultaneously accommodates dependent censoring and eliminates the requirement of specifying the ground-truth copula. We theoretically prove the identifiability of our model under a broad family of copulas and survival distributions. Experimental results on a wide range of datasets demonstrate that our approach successfully discerns the underlying dependency structure and significantly reduces survival estimation bias when compared to existing methods.
Updated: 2024-07-19 04:29:39
标题: 深层Copula基础上的生存分析,面对具有可识别性保证的相关截尾
摘要: 审查是生存分析中的核心问题,在每个样本中观察到事件发生时间(例如死亡)或审查时间(如失访)。现有的大多数基于机器学习的生存分析方法假设在给定一组协变量的情况下,生存与审查是条件独立的;这一假设无法验证,因为数据中只有边际分布可用。已经在各种应用中证明了相关审查的存在,以及当前估计器中固有的偏差,突显了对更复杂方法的需求。然而,现有的调整相关审查的方法需要从业者指定地面真实的copula。这一要求对实际应用构成了重大挑战,因为模型误差可能导致显著偏差。在这项工作中,我们提出了一种基于深度学习的灵活生存分析方法,同时适应相关审查并消除了指定地面真实copula的要求。我们在广泛的copula和生存分布家族下理论上证明了我们模型的可识别性。来自各种数据集的实验结果表明,与现有方法相比,我们的方法成功辨别了潜在的依赖结构,并显著减少了生存估计偏差。
更新时间: 2024-07-19 04:29:39
领域: stat.ML,cs.LG
A Survey of Retrieval Algorithms in Ad and Content Recommendation Systems
This survey examines the most effective retrieval algorithms utilized in ad recommendation and content recommendation systems. Ad targeting algorithms rely on detailed user profiles and behavioral data to deliver personalized advertisements, thereby driving revenue through targeted placements. Conversely, organic retrieval systems aim to improve user experience by recommending content that matches user preferences. This paper compares these two applications and explains the most effective methods employed in each.
Updated: 2024-07-19 04:16:03
标题: 广告和内容推荐系统中检索算法的调查
摘要: 本调查研究了广告推荐和内容推荐系统中使用的最有效的检索算法。广告定位算法依赖于详细的用户资料和行为数据,以提供个性化广告,从而通过定向投放实现收入增长。相反,有机检索系统旨在通过推荐符合用户偏好的内容来提高用户体验。本文比较了这两种应用,并解释了各自采用的最有效方法。
更新时间: 2024-07-19 04:16:03
领域: cs.IR,cs.AI
SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks
Large language models (LLMs) have proven to be highly effective across various natural language processing tasks. However, their large number of parameters poses significant challenges for practical deployment. Pruning, a technique aimed at reducing the size and complexity of LLMs, offers a potential solution by removing redundant components from the network. Despite the promise of pruning, existing methods often struggle to achieve substantial end-to-end LLM inference speedup. In this paper, we introduce SLEB, a novel approach designed to streamline LLMs by eliminating redundant transformer blocks. We choose the transformer block as the fundamental unit for pruning, because LLMs exhibit block-level redundancy with high similarity between the outputs of neighboring blocks. This choice allows us to effectively enhance the processing speed of LLMs. Our experimental results demonstrate that SLEB outperforms previous LLM pruning methods in accelerating LLM inference while also maintaining superior perplexity and accuracy, making SLEB as a promising technique for enhancing the efficiency of LLMs. The code is available at: https://github.com/jiwonsong-dev/SLEB.
Updated: 2024-07-19 04:13:59
标题: SLEB:通过冗余验证和消除Transformer块来简化LLMs
摘要: 大型语言模型(LLMs)已被证明在各种自然语言处理任务中非常有效。然而,它们庞大的参数数量对于实际部署提出了重大挑战。修剪是一种旨在减小LLMs大小和复杂性的技术,通过从网络中删除冗余组件提供潜在解决方案。尽管修剪具有潜力,但现有方法通常难以实现实质性的端到端LLM推理加速。在本文中,我们介绍了SLEB,一种旨在通过消除冗余的transformer块来简化LLMs的新方法。我们选择transformer块作为修剪的基本单位,因为LLMs表现出块级冗余,相邻块的输出具有高度相似性。这个选择使我们能够有效提高LLMs的处理速度。我们的实验结果表明,SLEB在加速LLM推理方面优于先前的LLM修剪方法,同时保持更优异的困惑度和准确性,使SLEB成为提高LLMs效率的一种有前途的技术。代码可在以下网址找到:https://github.com/jiwonsong-dev/SLEB。
更新时间: 2024-07-19 04:13:59
领域: cs.CL,cs.LG
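The block-level redundancy that motivates SLEB can be probed directly: run calibration hidden states through the stack and measure how close each block's output is to its input. The sketch below scores blocks this way as a minimal illustration; SLEB's actual verification-and-elimination procedure is more elaborate.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def block_redundancy_scores(blocks, hidden: torch.Tensor) -> list:
    """Score each transformer block by the similarity between its output
    and input on calibration data; near-identity blocks are pruning candidates.

    blocks: iterable of modules mapping (B, T, D) -> (B, T, D)
    hidden: calibration hidden states entering the first block
    """
    scores = []
    for block in blocks:
        out = block(hidden)
        sim = F.cosine_similarity(out.flatten(1), hidden.flatten(1), dim=-1)
        scores.append(sim.mean().item())   # high similarity => redundant block
        hidden = out                        # feed forward to the next block
    return scores

# Blocks whose score exceeds a chosen threshold (say 0.98) would be removed
# first, then the remaining stack is re-verified on the calibration set.
```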
Adversarial Examples in the Physical World: A Survey
Deep neural networks (DNNs) have demonstrated high vulnerability to adversarial examples, raising broad security concerns about their applications. Besides the attacks in the digital world, the practical implications of adversarial examples in the physical world present significant challenges and safety concerns. However, current research on physical adversarial examples (PAEs) lacks a comprehensive understanding of their unique characteristics, limiting their significance and understanding. In this paper, we address this gap by thoroughly examining the characteristics of PAEs within a practical workflow encompassing training, manufacturing, and re-sampling processes. By analyzing the links between physical adversarial attacks, we identify manufacturing and re-sampling as the primary sources of distinct attributes and particularities in PAEs. Leveraging this knowledge, we develop a comprehensive analysis and classification framework for PAEs based on their specific characteristics, covering over 100 studies on physical-world adversarial examples. Furthermore, we investigate defense strategies against PAEs and identify open challenges and opportunities for future research. We aim to provide a fresh, thorough, and systematic understanding of PAEs, thereby promoting the development of robust adversarial learning and its application in open-world scenarios, and to provide the community with a continuously updated list of physical-world adversarial example resources (papers, code, etc.) within the proposed framework.
Updated: 2024-07-19 04:06:19
标题: 现实世界中的对抗性示例:一项调查
摘要: 深度神经网络(DNNs)已经表现出对对抗样本具有很高的脆弱性,引发了人们对其应用的广泛安全关注。除了数字世界中的攻击外,对抗样本在物理世界中的实际影响也带来了重大挑战和安全问题。然而,目前关于物理对抗样本(PAEs)的研究缺乏对其独特特征的全面理解,导致了其重要性和理解的限制。本文通过彻底审查PAEs的特征,涵盖了训练、制造和再取样过程,来填补这一空白。通过分析物理对抗攻击之间的联系,我们确定制造和再取样是PAEs中独特属性和特点的主要来源。利用这一知识,我们基于其特定特征开发了一个全面的分析和分类框架,涵盖了100多项关于物理世界对抗样本的研究。此外,我们调查了对抗PAEs的防御策略,并确定了未来研究的开放挑战和机会。我们旨在提供对PAEs的新鲜、全面和系统化的理解,从而促进强大对抗学习的发展及其在开放世界场景中的应用,为社区提供一个持续更新的物理世界对抗样本资源清单,包括论文、代码等,在提出的框架内。
更新时间: 2024-07-19 04:06:19
领域: cs.CV,cs.AI,cs.LG
Investigating the Indirect Object Identification circuit in Mamba
How well will current interpretability techniques generalize to future models? A relevant case study is Mamba, a recent recurrent architecture with scaling comparable to Transformers. We adapt pre-Mamba techniques to Mamba and partially reverse-engineer the circuit responsible for the Indirect Object Identification (IOI) task. Our techniques provide evidence that 1) Layer 39 is a key bottleneck, 2) Convolutions in layer 39 shift names one position forward, and 3) The name entities are stored linearly in Layer 39's SSM. Finally, we adapt an automatic circuit discovery tool, positional Edge Attribution Patching, to identify a Mamba IOI circuit. Our contributions provide initial evidence that circuit-based mechanistic interpretability tools work well for the Mamba architecture.
Updated: 2024-07-19 03:45:27
标题: 研究曼巴间接宾语识别回路
摘要: 当前可解释性技术能否很好地推广到未来的模型?一个相关的案例研究是Mamba,这是一个具有与Transformers相当的规模的最新循环架构。我们将先前的技术调整到Mamba,并部分逆向工程了负责间接对象识别(IOI)任务的电路。我们的技术提供了证据:1)第39层是一个关键的瓶颈,2)第39层的卷积将名称向前移动一个位置,3)名称实体线性存储在第39层的SSM中。最后,我们将自动电路发现工具,位置边缘归因补丁,调整为识别一个Mamba IOI电路。我们的贡献提供了初步证据,即基于电路的机械解释性工具对Mamba架构效果良好。
更新时间: 2024-07-19 03:45:27
领域: cs.LG
Multi-modal Relation Distillation for Unified 3D Representation Learning
Recent advancements in multi-modal pre-training for 3D point clouds have demonstrated promising results by aligning heterogeneous features across 3D shapes and their corresponding 2D images and language descriptions. However, current straightforward solutions often overlook intricate structural relations among samples, potentially limiting the full capabilities of multi-modal learning. To address this issue, we introduce Multi-modal Relation Distillation (MRD), a tri-modal pre-training framework, which is designed to effectively distill reputable large Vision-Language Models (VLM) into 3D backbones. MRD aims to capture both intra-relations within each modality as well as cross-relations between different modalities and produce more discriminative 3D shape representations. Notably, MRD achieves significant improvements in downstream zero-shot classification tasks and cross-modality retrieval tasks, delivering new state-of-the-art performance.
Updated: 2024-07-19 03:43:48
标题: 多模态关系蒸馏用于统一的3D表示学习
摘要: 最近在多模态三维点云预训练方面取得了显著进展,通过对齐三维形状及其对应的二维图像和语言描述之间的异构特征,展现出了令人期待的结果。然而,当前的简单解决方案通常忽略了样本之间复杂的结构关系,可能限制了多模态学习的全部能力。为了解决这个问题,我们引入了多模态关系蒸馏(MRD),这是一个三模态预训练框架,旨在有效地将知名的大规模视觉语言模型(VLM)蒸馏到三维骨干中。MRD旨在捕捉每个模态内的内部关系以及不同模态之间的交叉关系,并产生更具区分性的三维形状表示。值得注意的是,MRD在下游零样本分类任务和跨模态检索任务中取得了显著的改进,提供了新的最先进性能。
更新时间: 2024-07-19 03:43:48
领域: cs.CV,cs.AI
Made to Order: Discovering monotonic temporal changes via self-supervised video ordering
Our objective is to discover and localize monotonic temporal changes in a sequence of images. To achieve this, we exploit a simple proxy task of ordering a shuffled image sequence, with `time' serving as a supervisory signal, since only changes that are monotonic with time can give rise to the correct ordering. We also introduce a transformer-based model for ordering of image sequences of arbitrary length with built-in attribution maps. After training, the model successfully discovers and localizes monotonic changes while ignoring cyclic and stochastic ones. We demonstrate applications of the model in multiple domains covering different scene and object types, discovering both object-level and environmental changes in unseen sequences. We also demonstrate that the attention-based attribution maps function as effective prompts for segmenting the changing regions, and that the learned representations can be used for downstream applications. Finally, we show that the model achieves the state-of-the-art on standard benchmarks for image ordering.
Updated: 2024-07-19 03:31:34
标题: 按需定制:通过自监督视频排序发现单调时间变化
摘要: 我们的目标是发现和定位一系列图像中的单调时间变化。为了实现这一目标,我们利用一个简单的代理任务,即对一组打乱顺序的图像进行排序,其中“时间”作为监督信号,因为只有随时间单调变化的变化才能导致正确的排序。我们还引入了基于transformer的模型,用于对任意长度的图像序列进行排序,并具有内置的属性图。在训练之后,该模型成功地发现和定位单调变化,同时忽略循环和随机变化。我们展示了该模型在涵盖不同场景和对象类型的多个领域中的应用,发现了未见序列中的物体级和环境变化。我们还展示了基于注意力的属性图作为有效提示用于分割变化区域,并且学习的表示可以用于下游应用。最后,我们展示该模型在图像排序的标准基准上取得了最先进的成果。
更新时间: 2024-07-19 03:31:34
领域: cs.CV,cs.LG
Detecting and Characterising Mobile App Metamorphosis in Google Play Store
App markets have evolved into highly competitive and dynamic environments for developers. While the traditional app life cycle involves incremental updates for feature enhancements and issue resolution, some apps deviate from this norm by undergoing significant transformations in their use cases or market positioning. We define this previously unstudied phenomenon as 'app metamorphosis'. In this paper, we propose a novel and efficient multi-modal search methodology to identify apps undergoing metamorphosis and apply it to analyse two snapshots of the Google Play Store taken five years apart. Our methodology uncovers various metamorphosis scenarios, including re-births, re-branding, re-purposing, and others, enabling comprehensive characterisation. Although these transformations may register as successful for app developers based on our defined success score metric (e.g., re-branded apps performing approximately 11.3% better than an average top app), we shed light on the concealed security and privacy risks that lurk within, potentially impacting even tech-savvy end-users.
Updated: 2024-07-19 03:26:40
标题: 在Google Play商店中检测和描述移动应用程序的变形
摘要: 应用市场已经发展成为开发者之间竞争激烈且动态变化的环境。虽然传统的应用生命周期包括逐步更新功能增强和问题解决,但有些应用偏离这一规范,经历了在使用案例或市场定位上的重大转变。我们将这一先前未研究的现象定义为“应用变形”。在本文中,我们提出了一种新颖高效的多模式搜索方法,用于识别正在经历变形的应用,并将其应用于分析相隔五年的两个Google Play商店的快照。我们的方法揭示了各种变形情景,包括重生、重新品牌、重新定位等,实现了全面的特征描述。尽管根据我们定义的成功评分指标(例如,重新品牌的应用表现比平均顶级应用好约11.3%)这些转变可能被开发者视为成功,但我们揭示了其中潜在的隐藏的安全和隐私风险,可能会影响即使是技术熟练的最终用户。
更新时间: 2024-07-19 03:26:40
领域: cs.SE,cs.AI,cs.CV
Time Series Generative Learning with Application to Brain Imaging Analysis
This paper focuses on the analysis of sequential image data, particularly brain imaging data such as MRI, fMRI, CT, with the motivation of understanding the brain aging process and neurodegenerative diseases. To achieve this goal, we investigate image generation in a time series context. Specifically, we formulate a min-max problem derived from the $f$-divergence between neighboring pairs to learn a time series generator in a nonparametric manner. The generator enables us to generate future images by transforming prior lag-k observations and a random vector from a reference distribution. With a deep neural network learned generator, we prove that the joint distribution of the generated sequence converges to the latent truth under a Markov and a conditional invariance condition. Furthermore, we extend our generation mechanism to a panel data scenario to accommodate multiple samples. The effectiveness of our mechanism is evaluated by generating real brain MRI sequences from the Alzheimer's Disease Neuroimaging Initiative. These generated image sequences can be used as data augmentation to enhance the performance of further downstream tasks, such as Alzheimer's disease detection.
Updated: 2024-07-19 03:24:20
标题: 时间序列生成学习及其在脑成像分析中的应用
摘要: 这篇论文关注于顺序图像数据的分析,特别是脑成像数据,例如MRI、fMRI、CT,旨在理解大脑衰老过程和神经退行性疾病。为了实现这一目标,我们研究了时间序列背景下的图像生成。具体来说,我们制定了一个由邻近对之间的$f$-散度导出的最小-最大问题,以非参数方式学习时间序列生成器。生成器使我们能够通过转换先前的滞后k观察和来自参考分布的随机向量来生成未来图像。通过学习的深度神经网络生成器,我们证明了生成序列的联合分布在马尔可夫和条件不变性条件下收敛于潜在真相。此外,我们将生成机制扩展到面板数据场景,以适应多个样本。我们通过从阿尔茨海默病神经影像学计划生成真实的脑MRI序列来评估我们机制的有效性。这些生成的图像序列可以用作数据增强,以增强进一步下游任务的性能,例如阿尔茨海默病检测。
更新时间: 2024-07-19 03:24:20
领域: stat.ML,cs.LG,eess.IV,stat.ME
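Structurally, the generator maps the prior lag-k frames plus a random vector from the reference distribution to the next frame. The convolutional encoder-decoder below is purely an illustrative architecture; only the input-output contract is taken from the abstract, and the f-divergence min-max training objective is omitted.

```python
import torch
import torch.nn as nn

class LagKGenerator(nn.Module):
    """Sketch of a time-series image generator: x_next = G(x_{t-k..t-1}, z),
    with z drawn from a reference distribution. Architecture is assumed."""
    def __init__(self, k: int = 3, noise_dim: int = 64):
        super().__init__()
        self.encode = nn.Sequential(                     # stack lag frames as channels
            nn.Conv2d(k, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.fuse = nn.Conv2d(64 + noise_dim, 64, 1)     # inject noise spatially
        self.decode = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, lags: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        h = self.encode(lags)                            # (B, 64, H/4, W/4)
        z_map = z[:, :, None, None].expand(-1, -1, *h.shape[2:])
        return self.decode(self.fuse(torch.cat([h, z_map], dim=1)))

gen = LagKGenerator(k=3)
lags, z = torch.randn(2, 3, 64, 64), torch.randn(2, 64)
print(gen(lags, z).shape)  # torch.Size([2, 1, 64, 64])
```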
Quantum One-Wayness of the Single-Round Sponge with Invertible Permutations
Sponge hashing is a widely used class of cryptographic hash algorithms which underlies the current international hash function standard SHA-3. In a nutshell, a sponge function takes as input a bit-stream of any length and processes it via a simple iterative procedure: it repeatedly feeds each block of the input into a so-called block function, and then produces a digest by once again iterating the block function on the final output bits. While much is known about the post-quantum security of the sponge construction when the block function is modeled as a random function or one-way permutation, the case of invertible permutations, which more accurately models the construction underlying SHA-3, has so far remained a fundamental open problem. In this work, we make new progress towards overcoming this barrier and show several results. First, we prove the "double-sided zero-search" conjecture proposed by Unruh (eprint' 2021) and show that finding zero-pairs in a random $2n$-bit permutation requires at least $\Omega(2^{n/2})$ many queries -- and this is tight due to Grover's algorithm. At the core of our proof lies a novel "symmetrization argument" which uses insights from the theory of Young subgroups. Second, we consider more general variants of the double-sided search problem and show similar query lower bounds for them. As an application, we prove the quantum one-wayness of the single-round sponge with invertible permutations in the quantum random oracle model.
Updated: 2024-07-19 03:17:22
标题: 单轮海绵结构与可逆排列的量子单向性
摘要: 海绵哈希是一类广泛使用的密码哈希算法,是当前国际哈希函数标准SHA-3的基础。简而言之,海绵函数接受任意长度的比特流作为输入,并通过一个简单的迭代过程处理它:它反复将输入的每个块馈送到所谓的块函数中,然后通过再次迭代块函数在最终输出比特上产生摘要。虽然关于海绵结构的量子后安全性在块函数被建模为随机函数或单向置换时已知甚多,但将可逆置换的情况,更准确地模拟了SHA-3的构造,迄今仍然是一个基本的开放问题。 在这项工作中,我们取得了新的进展以克服这一障碍,并展示了几个结果。首先,我们证明了Unruh提出的“双面零搜索”猜想(eprint' 2021),并表明在随机$2n$比特置换中寻找零对至少需要$\Omega(2^{n/2})$次查询——这是由于Grover算法的紧密性。我们的证明的核心是一种新颖的“对称化论证”,利用了年轻子群理论的见解。其次,我们考虑双面搜索问题的更一般变体,并为它们展示了类似的查询下限。作为应用,我们证明了在量子随机预言机模型中具有可逆置换的单轮海绵的量子单向性。
更新时间: 2024-07-19 03:17:22
领域: quant-ph,cs.CR
Clinical Reading Comprehension with Encoder-Decoder Models Enhanced by Direct Preference Optimization
Extractive question answering over clinical text is crucial for coping with the deluge of clinical text generated in hospitals. While encoder models (e.g., BERT) have been popular for this reading comprehension task, encoder-decoder models (e.g., T5) have recently been on the rise. Preference optimization techniques have also emerged to align decoder-only LLMs with human preferences. In this paper, we combine encoder-decoder models with the direct preference optimization (DPO) method to improve over the prior state of the art for the RadQA radiology question answering task by 12-15 F1 points. To the best of our knowledge, this effort is the first to show that the DPO method also works for reading comprehension, via novel heuristics to generate preference data without human inputs.
Updated: 2024-07-19 03:12:10
标题: 用直接优化偏好增强的编码器-解码器模型进行临床阅读理解
摘要: 在医院产生的大量临床文本中进行抽取式问答是一项至关重要的需求。虽然编码器模型(例如BERT)在这项阅读理解任务中很受欢迎,但最近编码-解码模型(例如T5)也在崛起。此外,还出现了优化技术,以使解码器专用的LLMs与人类偏好保持一致。本文将编码-解码模型与直接偏好优化(DPO)方法相结合,通过改进RadQA放射学问答任务的先前最先进技术12-15个F1分数。据我们所知,这一努力是首次表明DPO方法也适用于通过新颖的启发式算法生成偏好数据而无需人类输入的阅读理解。
更新时间: 2024-07-19 03:12:10
领域: cs.IR,cs.CL,cs.LG
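For reference, the DPO objective on (preferred, rejected) answer pairs is compact enough to state directly. The sketch below assumes sequence log-probabilities from the policy and a frozen reference model; the paper's novel part, the heuristic generation of preference pairs without human input, is not shown.

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta: float = 0.1):
    """Direct Preference Optimization loss.

    logp_* are sequence log-probabilities of the chosen (w) and rejected (l)
    answers under the policy and a frozen reference model; for an
    encoder-decoder these would come from, e.g., a T5 scoring answer spans.
    """
    policy_margin = logp_w - logp_l
    ref_margin = ref_logp_w - ref_logp_l
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()

# Toy batch of 3 preference pairs
lw, ll = torch.tensor([-5.0, -4.2, -6.1]), torch.tensor([-7.0, -4.0, -8.3])
rw, rl = torch.tensor([-5.5, -4.5, -6.0]), torch.tensor([-6.5, -4.4, -7.9])
print(dpo_loss(lw, ll, rw, rl))
```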
RAG-QA Arena: Evaluating Domain Robustness for Long-form Retrieval Augmented Question Answering
Question answering based on retrieval augmented generation (RAG-QA) is an important research topic in NLP and has a wide range of real-world applications. However, most existing datasets for this task are either constructed using a single source corpus or consist of short extractive answers, which fall short of evaluating large language model (LLM) based RAG-QA systems on cross-domain generalization. To address these limitations, we create Long-form RobustQA (LFRQA), a new dataset comprising human-written long-form answers that integrate short extractive answers from multiple documents into a single, coherent narrative, covering 26K queries and large corpora across seven different domains. We further propose RAG-QA Arena by directly comparing model-generated answers against LFRQA's answers using LLMs as evaluators. We show via extensive experiments that RAG-QA Arena and human judgments on answer quality are highly correlated. Moreover, only 41.3% of the most competitive LLM's answers are preferred to LFRQA's answers, demonstrating RAG-QA Arena as a challenging evaluation platform for future research.
Updated: 2024-07-19 03:02:51
标题: RAG-QA竞技场:评估长文检索增强问答领域的鲁棒性
摘要: 基于检索增强生成的问答(RAG-QA)是自然语言处理中一个重要的研究课题,具有广泛的实际应用。然而,目前大多数用于此任务的现有数据集要么是使用单个源语料库构建的,要么由短的抽取式答案组成,这些数据集无法评估基于大型语言模型(LLM)的RAG-QA系统在跨领域泛化上的表现。为了解决这些限制,我们创建了Long-form RobustQA(LFRQA)数据集,该数据集包含人工撰写的长篇答案,将来自多个文档的短抽取式答案整合为一个连贯的叙述,涵盖了26K个查询和七个不同领域的大型语料库。我们进一步提出了RAG-QA Arena,通过直接比较模型生成的答案与LLM作为评估者的LFRQA答案。我们通过大量实验证明,RAG-QA Arena和人类对答案质量的判断高度相关。此外,仅有41.3%最具竞争力的LLM答案优于LFRQA答案,表明RAG-QA Arena是未来研究中具有挑战性的评估平台。
更新时间: 2024-07-19 03:02:51
领域: cs.CL,cs.AI
Omni-Dimensional Frequency Learner for General Time Series Analysis
Frequency-domain representations of time series features offer a concise way to handle real-world time series data with inherent complexity and dynamic nature. However, current frequency-based methods with complex operations still fall short of state-of-the-art time-domain methods for general time series analysis. In this work, we present the Omni-Dimensional Frequency Learner (ODFL) model, based on an in-depth analysis of all three aspects of the spectrum feature: channel redundancy along the frequency dimension, the sparse and un-salient frequency energy distribution along the frequency dimension, and the semantic diversity along the variable dimension. Technically, our method is composed of a semantic-adaptive global filter with attention to un-salient frequency bands and partial operations along the channel dimension. Empirical results show that ODFL achieves consistent state-of-the-art performance in five mainstream time series analysis tasks, including short- and long-term forecasting, imputation, classification, and anomaly detection, offering a promising foundation for time series analysis.
Updated: 2024-07-19 03:00:16
标题: 全方位频率学习器用于一般时间序列分析
摘要: 频域表示时间序列特征提供了处理具有固有复杂性和动态性质的现实世界时间序列数据的简洁表示。然而,当前基于频率的方法在复杂操作方面仍然不及最先进的时域方法用于一般时间序列分析。在这项工作中,我们提出了基于全方位分析的全维频率学习器(ODFL)模型,其中包括频谱特征的三个方面:频率维度间的通道冗余属性,频率维度间的稀疏和不显著的频率能量分布,以及变量维度间的语义多样性。在技术上,我们的方法由一个语义自适应全局过滤器组成,注意不显著的频率带,并在通道维度间进行部分操作。实证结果表明,ODFL在包括短期和长期预测、插补、分类和异常检测在内的五个主流时间序列分析任务中取得了始终如一的最先进水平,为时间序列分析提供了有希望的基础。
更新时间: 2024-07-19 03:00:16
领域: cs.CV,cs.LG
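The generic building block behind such frequency-domain learners is a learnable global filter: FFT along time, elementwise complex multiplication, inverse FFT. The sketch below shows that block only; ODFL's semantic-adaptive filtering and partial channel operations are additions on top of it.

```python
import torch
import torch.nn as nn

class GlobalFrequencyFilter(nn.Module):
    """Learnable frequency-domain filter for a (B, T, C) series: rFFT along
    time, multiply by a learned complex filter, inverse rFFT back to time."""
    def __init__(self, seq_len: int, channels: int):
        super().__init__()
        n_freq = seq_len // 2 + 1                        # rfft output length
        self.weight = nn.Parameter(torch.randn(n_freq, channels, 2) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        spec = torch.fft.rfft(x, dim=1)                  # (B, n_freq, C) complex
        filt = torch.view_as_complex(self.weight)        # (n_freq, C)
        return torch.fft.irfft(spec * filt, n=x.shape[1], dim=1)

x = torch.randn(8, 96, 7)                                # batch, time, variables
print(GlobalFrequencyFilter(96, 7)(x).shape)             # torch.Size([8, 96, 7])
```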
Estimating the stability number of a random graph using convolutional neural networks
Graph combinatorial optimization problems are widely applicable and notoriously difficult to compute; for example, consider the traveling salesman or facility location problems. In this paper, we explore the feasibility of using convolutional neural networks (CNNs) on graph images to predict the cardinality of combinatorial properties of random graphs and networks. Specifically, we use image representations of modified adjacency matrices of random graphs as training samples for a CNN model to predict the stability number of random graphs, where the stability number is the cardinality of a maximum set of vertices in a graph that contains no pairwise adjacency between vertices. The model and results presented in this study suggest potential for applying deep learning to combinatorial optimization problems previously not tackled with simple deep learning techniques.
Updated: 2024-07-19 02:51:01
标题: 使用卷积神经网络估计随机图的稳定性数
摘要: 图组合优化问题具有广泛的适用性,计算起来非常困难;例如,考虑旅行推销员或设施选址问题。在本文中,我们探讨了在图像上使用卷积神经网络(CNNs)来预测随机图和网络的组合特性基数的可行性。具体来说,我们使用修改后的随机图的邻接矩阵的图像表示作为CNN模型的训练样本,以预测随机图的稳定数;其中,稳定数是图中不包含任何顶点之间的成对邻接的最大顶点集的基数。本研究中提出的模型和结果表明,在以前未考虑简单深度学习技术的组合优化问题中应用深度学习的潜力。
更新时间: 2024-07-19 02:51:01
领域: cs.LG,math.CO
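A minimal version of the training pipeline: render a random G(n, p) graph's adjacency matrix as a one-channel image and regress the stability number with a small CNN. Note that the paper uses modified adjacency matrices, whereas the sketch uses the plain one, and the CNN depth and widths below are illustrative assumptions.

```python
import numpy as np
import torch
import torch.nn as nn

def adjacency_image(n: int = 64, p: float = 0.1) -> torch.Tensor:
    """Render a random G(n, p) graph's symmetric adjacency matrix as a
    one-channel image, the kind of training sample fed to the CNN."""
    upper = np.triu(np.random.rand(n, n) < p, k=1)
    adj = (upper | upper.T).astype(np.float32)
    return torch.from_numpy(adj).unsqueeze(0)            # (1, n, n)

# Small CNN regressor for the stability number (architecture assumed)
model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 64), nn.ReLU(),
    nn.Linear(64, 1),                                    # predicted stability number
)
x = adjacency_image().unsqueeze(0)                       # add batch dim: (1, 1, 64, 64)
print(model(x).shape)                                    # torch.Size([1, 1])
```

Training would pair such images with exact stability numbers computed on small graphs, which is the expensive step the learned model amortizes.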
LLAssist: Simple Tools for Automating Literature Review Using Large Language Models
This paper introduces LLAssist, an open-source tool designed to streamline literature reviews in academic research. In an era of exponential growth in scientific publications, researchers face mounting challenges in efficiently processing vast volumes of literature. LLAssist addresses this issue by leveraging Large Language Models (LLMs) and Natural Language Processing (NLP) techniques to automate key aspects of the review process. Specifically, it extracts important information from research articles and evaluates their relevance to user-defined research questions. The goal of LLAssist is to significantly reduce the time and effort required for comprehensive literature reviews, allowing researchers to focus more on analyzing and synthesizing information rather than on initial screening tasks. By automating parts of the literature review workflow, LLAssist aims to help researchers manage the growing volume of academic publications more efficiently.
Updated: 2024-07-19 02:48:54
标题: LLAssist:使用大型语言模型自动化文献综述的简单工具
摘要: 这篇论文介绍了LLAssist,一个旨在简化学术研究文献综述的开源工具。在科学出版物呈指数增长的时代,研究者们面临着处理大量文献的巨大挑战。LLAssist通过利用大型语言模型(LLMs)和自然语言处理(NLP)技术来自动化审阅过程的关键方面,以解决这一问题。具体而言,它从研究文章中提取重要信息,并评估其与用户定义的研究问题的相关性。LLAssist的目标是显著减少进行全面文献综述所需的时间和精力,让研究者能够更多地专注于分析和综合信息,而不是初步筛选任务。通过自动化文献综述工作流程的部分内容,LLAssist旨在帮助研究者更有效地管理不断增长的学术出版物数量。
更新时间: 2024-07-19 02:48:54
领域: cs.DL,cs.AI
AI-native Memory: A Pathway from LLMs Towards AGI
Large language models (LLMs) have shown the world sparks of artificial general intelligence (AGI). One opinion, especially from some startups working on LLMs, argues that an LLM with a nearly unlimited context length can realize AGI. However, they might be too optimistic about the long-context capability of (existing) LLMs -- (1) recent literature has shown that their effective context length is significantly smaller than their claimed context length; and (2) our reasoning-in-a-haystack experiments further demonstrate that simultaneously finding the relevant information from a long context and conducting (simple) reasoning is nearly impossible. In this paper, we envision a pathway from LLMs to AGI through the integration of \emph{memory}. We believe that AGI should be a system where LLMs serve as core processors. In addition to raw data, the memory in this system would store a large number of important conclusions derived from reasoning processes. Compared with retrieval-augmented generation (RAG), which merely processes raw data, this approach not only connects semantically related information more closely, but also simplifies complex inferences at query time. As an intermediate stage, the memory will likely take the form of natural language descriptions, which can be directly consumed by users too. Ultimately, every agent/person should have its own large personal model, a deep neural network model (thus \emph{AI-native}) that parameterizes and compresses all types of memory, even those that cannot be described by natural languages. Finally, we discuss the significant potential of AI-native memory as the transformative infrastructure for (proactive) engagement, personalization, distribution, and social interaction in the AGI era, as well as the privacy and security challenges it incurs, with preliminary solutions.
Updated: 2024-07-19 02:37:42
标题: AI本地内存:从LLMs走向AGI的路径
摘要: 大型语言模型(LLMs)已经展示了人工通用智能(AGI)的火花。一种观点,特别是一些从事LLMs工作的初创企业,认为一个具有几乎无限上下文长度的LLM可以实现AGI。然而,他们可能对(现有)LLMs的长上下文能力过于乐观——(1)最近的文献表明,它们的有效上下文长度明显小于它们声称的上下文长度;(2)我们的“草堆中的推理”实验进一步证明,同时从长篇上下文中找到相关信息并进行(简单)推理几乎是不可能的。在本文中,我们展望了通过整合“记忆”从LLMs到AGI的路径。我们认为AGI应该是一个LLMs作为核心处理器的系统。除了原始数据,该系统中的记忆还将存储大量从推理过程中得出的重要结论。与仅处理原始数据的检索增强生成(RAG)相比,这种方法不仅将语义相关信息更紧密地连接起来,还简化了在查询时的复杂推理。作为一个中间阶段,记忆很可能是以自然语言描述的形式存在,这也可以直接被用户消费。最终,每个代理/个人应该拥有自己的大型个人模型,一个深度神经网络模型(因此是“AI本地化”),对所有类型的记忆进行参数化和压缩,甚至那些不能用自然语言描述的记忆。最后,我们讨论了AI本地化记忆作为AGI时代的变革基础设施在(主动)参与、个性化、分发和社交方面的巨大潜力,以及初步解决方案带来的隐私和安全挑战。
更新时间: 2024-07-19 02:37:42
领域: cs.CL,cs.AI
Commute-Time-Optimised Graphs for GNNs
We explore graph rewiring methods that optimise commute time. Recent graph rewiring approaches facilitate long-range interactions in sparse graphs, making such rewirings commute-time-optimal $\textit{on average}$. However, when an expert prior exists on which node pairs should or should not interact, a superior rewiring would favour short commute times between these privileged node pairs. We construct two synthetic datasets with known priors reflecting realistic settings, and use these to motivate two bespoke rewiring methods that incorporate the known prior. We investigate the regimes where our rewiring improves test performance on the synthetic datasets. Finally, we perform a case study on a real-world citation graph to investigate the practical implications of our work.
Updated: 2024-07-19 02:36:08
标题: 为GNNs优化通勤时间的图网络
摘要: 我们探讨了优化通勤时间的图重连方法。最近的图重连方法促进了稀疏图中的远距离交互,使这种重连在平均情况下是通勤时间最优的。然而,当存在专家先验知识以确定哪些节点对应该或不应该进行交互时,更优秀的重连应该偏向于这些特权节点对之间的短通勤时间。我们构建了两个具有已知先验知识的合成数据集,反映了现实情境,并利用这些数据集激发了两种定制的重连方法,将已知先验知识纳入其中。我们研究了我们的重连在合成数据集上改善测试性能的情况。最后,我们对一个真实的引用图进行案例研究,探讨了我们工作的实际影响。
更新时间: 2024-07-19 02:36:08
领域: cs.SI,cs.LG
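Commute time itself has a closed form in terms of the Moore-Penrose pseudoinverse of the graph Laplacian, C(u, v) = vol(G) * (L+_uu + L+_vv - 2 L+_uv), which is the quantity any commute-time-aware rewiring must evaluate. A NumPy sketch:

```python
import numpy as np

def commute_times(adj: np.ndarray) -> np.ndarray:
    """Pairwise commute times via the Laplacian pseudoinverse:
    C(u, v) = vol(G) * (L+_uu + L+_vv - 2 * L+_uv)."""
    deg = adj.sum(axis=1)
    lap = np.diag(deg) - adj
    lp = np.linalg.pinv(lap)
    vol = deg.sum()                                      # 2 * |E| for simple graphs
    d = np.diag(lp)
    return vol * (d[:, None] + d[None, :] - 2.0 * lp)

# 4-node path graph: commute time grows with distance along the path
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
print(commute_times(adj).round(1))
```

A prior-aware rewiring would add or drop edges to shrink these values specifically between the privileged node pairs, which is the idea the two bespoke methods above build on.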
Enhancing Data-Limited Graph Neural Networks by Actively Distilling Knowledge from Large Language Models
Graphs have emerged as critical data structures for content analysis in various domains, such as social network analysis, bioinformatics, and recommendation systems. Node classification, a fundamental task in this context, is typically tackled using graph neural networks (GNNs). Unfortunately, conventional GNNs still face challenges in scenarios with few labeled nodes, despite the prevalence of few-shot node classification tasks in real-world applications. To address this challenge, various approaches have been proposed, including graph meta-learning, transfer learning, and methods based on Large Language Models (LLMs). However, traditional meta-learning and transfer learning methods often require prior knowledge from base classes or fail to exploit the potential advantages of unlabeled nodes. Meanwhile, LLM-based methods may overlook the zero-shot capabilities of LLMs and rely heavily on the quality of generated contexts. In this paper, we propose a novel approach that integrates LLMs and GNNs, leveraging the zero-shot inference and reasoning capabilities of LLMs and employing a Graph-LLM-based active learning paradigm to enhance GNNs' performance. Extensive experiments demonstrate the effectiveness of our model in improving node classification accuracy with considerably limited labeled data, surpassing state-of-the-art baselines by significant margins.
Updated: 2024-07-19 02:34:10
标题: 通过积极从大型语言模型中提炼知识来增强数据有限的图神经网络
摘要: 图形已经成为各个领域中内容分析的关键数据结构,例如社交网络分析、生物信息学和推荐系统。节点分类是这种情境中的一个基本任务,通常使用图神经网络(GNNs)来解决。然而,尽管在现实世界应用中存在很多少标记节点分类任务,传统的GNNs仍然面临挑战。为了解决这一挑战,提出了各种方法,包括图元学习、迁移学习和基于大型语言模型(LLMs)的方法。然而,传统的元学习和迁移学习方法通常需要来自基类的先验知识,或者无法充分利用未标记节点的潜在优势。同时,基于LLMs的方法可能忽视LLMs的零样本能力,并且过度依赖生成上下文的质量。本文提出了一种新的方法,将LLMs和GNNs进行整合,利用LLMs的零样本推理和推理能力,并采用基于图形LLM的主动学习范式来提升GNNs的性能。大量实验表明,我们的模型在提高节点分类准确性方面具有显著优势,超过了现有技术水平的基线。
更新时间: 2024-07-19 02:34:10
领域: cs.LG,cs.AI
Resilient Consensus Sustained Collaboratively
Decentralized systems built around blockchain technology promise clients an immutable ledger. They add a transaction to the ledger after it undergoes consensus among the replicas that run a Proof-of-Stake (PoS) or Byzantine Fault-Tolerant (BFT) consensus protocol. Unfortunately, these protocols face a long-range attack where an adversary having access to the private keys of the replicas can rewrite the ledger. One solution is forcing each committed block from these protocols to undergo another consensus, Proof-of-Work(PoW) consensus; PoW protocol leads to wastage of computational resources as miners compete to solve complex puzzles. In this paper, we present the design of our Power-of-Collaboration (PoC) protocol, which guards existing PoS/BFT blockchains against long-range attacks and requires miners to collaborate rather than compete. PoC guarantees fairness and accountability and only marginally degrades the throughput of the underlying system.
Updated: 2024-07-19 02:29:07
标题: 抗干扰一致性:协作维持下的稳健共识
摘要: 基于区块链技术构建的去中心化系统承诺客户一个不可变的账本。在经过运行权益证明(PoS)或拜占庭容错(BFT)共识协议的副本之间达成共识后,它们将交易添加到账本中。不幸的是,这些协议面临着一种长程攻击,即一个拥有副本私钥访问权限的对手可以重写账本。其中一种解决方案是强制这些协议中的每个已确认区块再经过另一个共识,即工作量证明(PoW)共识;PoW协议导致计算资源的浪费,因为矿工竞争解决复杂的谜题。在本文中,我们提出了我们的“合作力量”(PoC)协议的设计,该协议保护现有的PoS/BFT区块链免受长程攻击,并要求矿工合作而不是竞争。PoC保证了公平性和问责制,并仅对基础系统的吞吐量略有降低。
更新时间: 2024-07-19 02:29:07
领域: cs.CR,cs.DB,cs.DC
Decomposed Direct Preference Optimization for Structure-Based Drug Design
Diffusion models have achieved promising results for Structure-Based Drug Design (SBDD). Nevertheless, high-quality protein subpocket and ligand data are relatively scarce, which hinders the models' generation capabilities. Recently, Direct Preference Optimization (DPO) has emerged as a pivotal tool for the alignment of generative models such as large language models and diffusion models, providing greater flexibility and accuracy by directly aligning model outputs with human preferences. Building on this advancement, we introduce DPO to SBDD in this paper. We tailor diffusion models to pharmaceutical needs by aligning them with elaborately designed chemical score functions. We propose a new structure-based molecular optimization method called DecompDPO, which decomposes the molecule into arms and scaffolds and performs preference optimization at both local substructure and global molecule levels, allowing for more precise control with fine-grained preferences. Notably, DecompDPO can be effectively used for two main purposes: (1) fine-tuning pretrained diffusion models for molecule generation across various protein families, and (2) molecular optimization given a specific protein subpocket after generation. Extensive experiments on the CrossDocked2020 benchmark show that DecompDPO significantly improves model performance in both molecule generation and optimization, with up to 100% Median High Affinity and a 54.9% Success Rate.
Updated: 2024-07-19 02:12:25
标题: 基于结构的药物设计的分解直接偏好优化
摘要: 扩散模型在基于结构的药物设计(SBDD)中取得了令人期待的结果。然而,高质量的蛋白亚口袋和配体数据相对稀缺,这限制了模型的生成能力。最近,直接优先选择优化(DPO)已经成为一种对生成模型(如大型语言模型和扩散模型)进行对齐的关键工具,通过直接将模型输出与人类偏好对齐,提供了更大的灵活性和准确性。在这一进展的基础上,本文介绍了DPO在SBDD中的应用。我们通过将扩散模型与精心设计的化学评分函数进行对齐,为药物制造需求量身定制了新的基于结构的分子优化方法,称为DecompDPO。该方法将分子分解为臂和支架,并在局部亚结构和全局分子水平上进行偏好优化,允许更精确地控制细粒度偏好。值得注意的是,DecompDPO可以有效用于两个主要目的:(1)对各种蛋白质家族的分子生成进行预训练扩散模型的微调,以及(2)在生成后给定特定蛋白亚口袋的分子优化。对CrossDocked2020基准进行的广泛实验表明,DecompDPO显著提高了模型在分子生成和优化方面的性能,最高可达100%的中位高亲和力和54.9%的成功率。
更新时间: 2024-07-19 02:12:25
领域: q-bio.BM,cs.LG
Byzantine-tolerant distributed learning of finite mixture models
This paper proposes two split-and-conquer (SC) learning estimators for finite mixture models that are tolerant to Byzantine failures. In SC learning, individual machines obtain local estimates, which are then transmitted to a central server for aggregation. During this communication, the server may receive malicious or incorrect information from some local machines, a scenario known as Byzantine failures. While SC learning approaches have been devised to mitigate Byzantine failures in statistical models with Euclidean parameters, developing Byzantine-tolerant methods for finite mixture models with non-Euclidean parameters requires a distinct strategy. Our proposed distance-based methods are hyperparameter tuning free, unlike existing methods, and are resilient to Byzantine failures while achieving high statistical efficiency. We validate the effectiveness of our methods both theoretically and empirically via experiments on simulated and real data from machine learning applications for digit recognition. The code for the experiment can be found at https://github.com/SarahQiong/RobustSCGMM.
Updated: 2024-07-19 02:11:26
标题: 拜占庭容错的有限混合模型分布式学习
摘要: 这篇论文提出了两种针对有限混合模型的分割与征服(SC)学习估计器,这些估计器对拜占庭故障具有容忍性。在SC学习中,各个机器获得局部估计,然后将其传输到中央服务器进行聚合。在此通信过程中,服务器可能会收到一些局部机器发送的恶意或不正确的信息,这种情况被称为拜占庭故障。虽然已经设计了SC学习方法来减轻具有欧几里德参数的统计模型中的拜占庭故障,但为具有非欧几里德参数的有限混合模型开发具有拜占庭容忍性的方法需要一种不同的策略。我们提出的基于距离的方法无需调整超参数,不同于现有方法,并且在实现高统计效率的同时对拜占庭故障具有弹性。我们通过模拟和实际数据实验在机器学习应用中的数字识别验证了我们方法的有效性。实验的代码可以在https://github.com/SarahQiong/RobustSCGMM 找到。
更新时间: 2024-07-19 02:11:26
领域: stat.ME,cs.LG,stat.ML,G.3; I.5.3
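A hyperparameter-free, distance-based aggregation rule can be illustrated with a medoid: return the local estimate whose total distance to all others is smallest, which tolerates a Byzantine minority. This is a simplified stand-in on raw parameter vectors; the paper's estimators use distances adapted to mixture models with non-Euclidean parameters.

```python
import numpy as np

def medoid_aggregate(estimates: np.ndarray) -> np.ndarray:
    """Return the local estimate minimizing total distance to all others.

    With fewer than half the machines Byzantine, the medoid stays close to
    the honest estimates and requires no hyperparameter tuning. Illustrative
    only: distances here are Euclidean on flattened parameter vectors.
    """
    diffs = estimates[:, None, :] - estimates[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)               # (m, m) pairwise distances
    return estimates[dists.sum(axis=1).argmin()]

# 8 honest machines near the truth, 3 Byzantine machines sending garbage
rng = np.random.default_rng(0)
honest = rng.normal(loc=1.0, scale=0.1, size=(8, 5))
byzantine = rng.normal(loc=50.0, scale=10.0, size=(3, 5))
print(medoid_aggregate(np.vstack([honest, byzantine])))  # ~ all-ones vector
```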
Region-centric Image-Language Pretraining for Open-Vocabulary Detection
We present a new open-vocabulary detection approach based on region-centric image-language pretraining to bridge the gap between image-level pretraining and open-vocabulary object detection. At the pretraining phase, we incorporate the detector architecture on top of the classification backbone, which better serves the region-level recognition needs of detection by enabling the detector heads to learn from large-scale image-text pairs. Using only standard contrastive loss and no pseudo-labeling, our approach is a simple yet effective extension of the contrastive learning method to learn emergent object-semantic cues. In addition, we propose a shifted-window learning approach upon window attention to make the backbone representation more robust, translation-invariant, and less biased by the window pattern. On the popular LVIS open-vocabulary detection benchmark, our approach sets a new state of the art of 37.6 mask APr using the common ViT-L backbone and public LAION dataset, and 40.5 mask APr using the DataComp-1B dataset, significantly outperforming the best existing approach by +3.7 mask APr at system level. On the COCO benchmark, we achieve very competitive 39.6 novel AP without pseudo labeling or weak supervision. In addition, we evaluate our approach on the transfer detection setup, where it demonstrates notable improvement over the baseline. Visualization reveals emerging object locality from the pretraining recipes compared to the baseline.
Updated: 2024-07-19 02:11:04
Categories: cs.CV,cs.AI,cs.LG
Truthfulness of Calibration Measures
We initiate the study of the truthfulness of calibration measures in sequential prediction. A calibration measure is said to be truthful if the forecaster (approximately) minimizes the expected penalty by predicting the conditional expectation of the next outcome, given the prior distribution of outcomes. Truthfulness is an important property of calibration measures, ensuring that the forecaster is not incentivized to exploit the system with deliberate poor forecasts. This makes it an essential desideratum for calibration measures, alongside typical requirements, such as soundness and completeness. We conduct a taxonomy of existing calibration measures and their truthfulness. Perhaps surprisingly, we find that all of them are far from being truthful. That is, under existing calibration measures, there are simple distributions on which a polylogarithmic (or even zero) penalty is achievable, while truthful prediction leads to a polynomial penalty. Our main contribution is the introduction of a new calibration measure termed the Subsampled Smooth Calibration Error (SSCE) under which truthful prediction is optimal up to a constant multiplicative factor.
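For intuition about what a calibration measure computes (and what truthfulness asks of it), here is a toy binned calibration error together with a simulated truthful forecaster; the paper's SSCE is a different, subsampled smooth measure, so this is purely illustrative.

    import numpy as np

    rng = np.random.default_rng(0)

    def binned_calibration_error(preds, outcomes, n_bins=10):
        # Toy calibration measure: per-bin gap between the average outcome
        # and the average prediction, weighted by the bin's mass.
        edges = np.linspace(0.0, 1.0, n_bins + 1)
        err = 0.0
        for lo, hi in zip(edges[:-1], edges[1:]):
            mask = (preds >= lo) & (preds < hi)
            if mask.any():
                err += mask.mean() * abs(outcomes[mask].mean() - preds[mask].mean())
        return err

    # A truthful forecaster predicts the conditional mean p of each outcome
    # and should incur a small penalty under a reasonable calibration measure.
    p = rng.uniform(size=10_000)
    y = (rng.uniform(size=p.size) < p).astype(float)
    print(binned_calibration_error(p, y))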
Updated: 2024-07-19 02:07:55
Categories: cs.LG,cs.DS,stat.ML
Double Gradient Reversal Network for Single-Source Domain Generalization in Multi-mode Fault Diagnosis
Domain generalization enables fault diagnosis on unseen operating modes. In process industrial systems, fault samples are limited, and only single-mode fault data can be obtained. Extracting domain-invariant fault features from single-mode data for unseen-mode fault diagnosis is therefore challenging. Existing methods use a generator module to simulate samples of unseen modes, but multi-mode samples contain complex spatiotemporal information, which makes accurate sample generation difficult. We therefore propose the double gradient reversal network (DGRN). First, the model is pre-trained to acquire fault knowledge from the single seen mode. Then, a pseudo-fault feature generation strategy based on adaptive instance normalization is designed to simulate fault features of unseen modes. A dual adversarial training strategy enhances the diversity of the pseudo-fault features, modeling unseen modes with significant distribution differences. Subsequently, a domain-invariant feature extraction strategy is constructed via contrastive learning and adversarial learning; it extracts features common to faults and supports multi-mode fault diagnosis. Finally, experiments were conducted on the Tennessee Eastman process and a continuous stirred-tank reactor, demonstrating that DGRN achieves high classification accuracy on unseen modes while maintaining a small model size.
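The adaptive-instance-normalization step can be sketched as follows: normalize each sample's feature statistics and re-inject randomly perturbed statistics to imitate an unseen operating mode. This is a schematic stand-in for DGRN's pseudo-fault feature generation, with sigma as an assumed perturbation scale.

    import numpy as np

    rng = np.random.default_rng(0)

    def adain_pseudo_features(feats, sigma=0.5):
        # Sketch: strip each sample's (mean, std) and replace them with
        # randomly drawn statistics, imitating a distribution-shifted mode.
        mu = feats.mean(axis=1, keepdims=True)
        std = feats.std(axis=1, keepdims=True) + 1e-6
        normalized = (feats - mu) / std
        new_mu = mu + sigma * rng.standard_normal(mu.shape)
        new_std = std * np.exp(sigma * rng.standard_normal(std.shape))
        return normalized * new_std + new_mu

    fault_feats = rng.standard_normal((8, 64))   # 8 samples, 64-dim features
    pseudo = adain_pseudo_features(fault_feats)  # simulated unseen-mode features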
Updated: 2024-07-19 02:06:41
Categories: cs.LG
A Unified Confidence Sequence for Generalized Linear Models, with Applications to Bandits
We present a unified likelihood ratio-based confidence sequence (CS) for any (self-concordant) generalized linear model (GLM) that is guaranteed to be convex and numerically tight. We show that this is on par with or improves upon known CSs for various GLMs, including Gaussian, Bernoulli, and Poisson. In particular, for the first time, our CS for Bernoulli has a poly(S)-free radius, where S is the norm of the unknown parameter. Our first technical novelty is its derivation, which utilizes a time-uniform PAC-Bayesian bound with a uniform prior/posterior, despite the latter being a rather unpopular choice for deriving CSs. As a direct application of our new CS, we propose a simple and natural optimistic algorithm called OFUGLB, applicable to any generalized linear bandit (GLB; Filippi et al. (2010)). Our analysis shows that the celebrated optimistic approach simultaneously attains state-of-the-art regret for various self-concordant (not necessarily bounded) GLBs, and even poly(S)-free regret for bounded GLBs, including logistic bandits. The regret analysis, our second technical novelty, follows from combining our new CS with a new proof technique that completely avoids the previously widely used self-concordant control lemma (Faury et al., 2020, Lemma 9). Finally, we verify numerically that OFUGLB significantly outperforms the prior state of the art (Lee et al., 2024) for logistic bandits.
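To see where a confidence sequence enters a bandit algorithm, here is a schematic optimism-in-the-face-of-uncertainty step with an l2-ball confidence set. The ball is only a stand-in: the paper's CS is likelihood-ratio-based and convex but generally not a ball, and OFUGLB maximizes over that set instead.

    import numpy as np

    def ofu_action(theta_hat, arms, radius):
        # For a ball, max over theta of <a, theta> has the closed form
        # <a, theta_hat> + radius * ||a||, giving a UCB-style score per arm.
        arms = np.asarray(arms, dtype=float)
        scores = arms @ theta_hat + radius * np.linalg.norm(arms, axis=1)
        return int(np.argmax(scores))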
Updated: 2024-07-19 02:06:08
Categories: stat.ML,cs.LG
PG-Rainbow: Using Distributional Reinforcement Learning in Policy Gradient Methods
This paper introduces PG-Rainbow, a novel algorithm that incorporates a distributional reinforcement learning framework with a policy gradient algorithm. Existing policy gradient methods are sample inefficient and rely on the mean of returns when calculating the state-action value function, neglecting the distributional nature of returns in reinforcement learning tasks. To address this issue, we use an Implicit Quantile Network that provides the quantile information of the reward distribution to the critic network of the Proximal Policy Optimization algorithm. Our empirical results show that, by integrating reward distribution information into the policy network, the policy agent acquires an enhanced ability to comprehensively evaluate the consequences of potential actions in a given state, enabling more sophisticated and informed decision-making. We evaluate the performance of the proposed algorithm on the Atari-2600 game suite, simulated via the Arcade Learning Environment (ALE).
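The distributional critic can be trained with the standard quantile Huber loss used by implicit quantile networks; the sketch below is a generic single-sample version with assumed array shapes, not the paper's exact implementation.

    import numpy as np

    def quantile_huber_loss(pred_quantiles, target, taus, kappa=1.0):
        # pred_quantiles[i] estimates the taus[i]-quantile of the return;
        # the asymmetric quantile weight is applied to a Huber TD penalty.
        u = np.asarray(target, dtype=float) - np.asarray(pred_quantiles, dtype=float)
        huber = np.where(np.abs(u) <= kappa,
                         0.5 * u ** 2,
                         kappa * (np.abs(u) - 0.5 * kappa))
        weight = np.abs(np.asarray(taus) - (u < 0).astype(float))
        return float((weight * huber / kappa).mean())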
Updated: 2024-07-19 02:00:01
Categories: cs.LG,cs.AI
Historical Ink: Semantic Shift Detection for 19th Century Spanish
This paper explores the evolution of word meanings in 19th-century Spanish texts, with an emphasis on Latin American Spanish, using computational linguistics techniques. It addresses the Semantic Shift Detection (SSD) task, which is crucial for understanding linguistic evolution, particularly in historical contexts. The study focuses on analyzing a set of Spanish target words. To achieve this, a 19th-century Spanish corpus is constructed, and a customizable pipeline for SSD tasks is developed. The pipeline identifies the senses of a word and measures their semantic change between two corpora, using BERT-like models fine-tuned on historical Spanish texts, for both the Latin American and the general Spanish case. The results provide valuable insights into the cultural and societal shifts reflected in language changes over time.
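A common baseline for measuring shift between two corpora is the cosine distance between a word's mean contextual embeddings; the paper's pipeline additionally resolves word senses with fine-tuned BERT-like models, so the sketch below is a simplified stand-in.

    import numpy as np

    def shift_score(embeddings_a, embeddings_b):
        # Cosine distance between the word's mean contextual embedding in
        # corpus A (e.g., early 1800s) and corpus B (e.g., late 1800s).
        c_a = np.mean(embeddings_a, axis=0)
        c_b = np.mean(embeddings_b, axis=0)
        cos = c_a @ c_b / (np.linalg.norm(c_a) * np.linalg.norm(c_b))
        return 1.0 - float(cos)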
Updated: 2024-07-19 01:54:26
Categories: cs.CL,cs.AI,I.2.7
KAN-ODEs: Kolmogorov-Arnold Network Ordinary Differential Equations for Learning Dynamical Systems and Hidden Physics
Kolmogorov-Arnold networks (KANs) as an alternative to multi-layer perceptrons (MLPs) are a recent development demonstrating strong potential for data-driven modeling. This work applies KANs as the backbone of a neural ordinary differential equation (ODE) framework, generalizing their use to the time-dependent and temporal grid-sensitive cases often seen in dynamical systems and scientific machine learning applications. The proposed KAN-ODEs retain the flexible dynamical system modeling framework of Neural ODEs while leveraging the many benefits of KANs compared to MLPs, including higher accuracy and faster neural scaling, stronger interpretability and generalizability, and lower parameter counts. First, we quantitatively demonstrate these improvements in a comprehensive study of the classical Lotka-Volterra predator-prey model. We then showcase the KAN-ODE framework's ability to learn symbolic source terms and complete solution profiles in higher-complexity and data-lean scenarios, including wave propagation and shock formation, the complex Schrödinger equation, and the Allen-Cahn phase separation equation. The successful training of KAN-ODEs, and their improved performance compared to traditional Neural ODEs, implies significant potential in leveraging this novel network architecture in myriad scientific machine learning applications for discovering hidden physics and predicting dynamic evolution.
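A minimal KAN-ODE can be sketched by placing a learnable univariate function on every (input state, output derivative) edge and integrating the resulting right-hand side. The sketch uses random, untrained weights and radial basis functions in place of B-spline edges; training (e.g., via adjoint gradients) is omitted.

    import numpy as np
    from scipy.integrate import solve_ivp

    rng = np.random.default_rng(0)
    centers = np.linspace(0.0, 4.0, 6)
    # One learnable univariate function per (output state, input state) edge.
    C = rng.normal(scale=0.1, size=(2, 2, centers.size))

    def edge(x, c):
        # Stand-in for a KAN's B-spline edge: a small radial-basis expansion.
        return float(np.dot(c, np.exp(-((x - centers) ** 2))))

    def kan_ode_rhs(t, y):
        # Each derivative is a sum of univariate functions of each state
        # (the Kolmogorov-Arnold structure inside a Neural ODE).
        return [sum(edge(y[j], C[i, j]) for j in range(2)) for i in range(2)]

    sol = solve_ivp(kan_ode_rhs, (0.0, 5.0), [1.0, 1.0])
    print(sol.y[:, -1])  # state at t = 5 under the (untrained) KAN-ODE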
Updated: 2024-07-19 01:36:34
Categories: cs.LG,I.6.5; G.1.7
Private Mean Estimation with Person-Level Differential Privacy
We study person-level differentially private (DP) mean estimation in the case where each person holds multiple samples. DP here requires the usual notion of distributional stability when $\textit{all}$ of a person's datapoints can be modified. Informally, if $n$ people each have $m$ samples from an unknown $d$-dimensional distribution with bounded $k$-th moments, we show that \[n = \tilde \Theta\left(\frac{d}{\alpha^2 m} + \frac{d}{\alpha m^{1/2} \varepsilon} + \frac{d}{\alpha^{k/(k-1)} m \varepsilon} + \frac{d}{\varepsilon}\right)\] people are necessary and sufficient to estimate the mean up to distance $\alpha$ in $\ell_2$-norm under $\varepsilon$-differential privacy (and its common relaxations). In the multivariate setting, we give computationally efficient algorithms under approximate-DP and computationally inefficient algorithms under pure DP, and our nearly matching lower bounds hold for the most permissive case of approximate DP. Our computationally efficient estimators are based on the standard clip-and-noise framework, but the analysis for our setting requires both new algorithmic techniques and new analyses. In particular, our new bounds on the tails of sums of independent, vector-valued, bounded-moments random variables may be of interest.
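The clip-and-noise framework mentioned above can be sketched at person level as follows: average each person's samples, clip the per-person average, and add Gaussian noise scaled to the per-person sensitivity. The noise calibration here is schematic; an actual guarantee requires choosing noise_mult for the target (epsilon, delta) via the Gaussian mechanism, and the paper's estimators are considerably more refined.

    import numpy as np

    rng = np.random.default_rng(0)

    def person_level_dp_mean(per_person_samples, clip_norm, noise_mult):
        # Average each person's m samples first; clipping the per-person
        # average bounds how much changing ALL of one person's data can
        # move the final mean, which is the person-level DP requirement.
        means = np.stack([np.mean(s, axis=0) for s in per_person_samples])
        norms = np.linalg.norm(means, axis=1, keepdims=True)
        clipped = means * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
        n, d = clipped.shape
        sigma = noise_mult * 2.0 * clip_norm / n   # schematic sensitivity scaling
        return clipped.mean(axis=0) + sigma * rng.standard_normal(d)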
Updated: 2024-07-19 01:35:15
Categories: cs.DS,cs.CR,cs.IT,cs.LG,math.IT,stat.ML
Why Does New Knowledge Create Messy Ripple Effects in LLMs?
Extensive previous research has focused on post-training knowledge editing (KE) for language models (LMs) to ensure that knowledge remains accurate and up-to-date. One desired property and open question in KE is to let edited LMs correctly handle ripple effects, where the edited LM is expected to answer questions about logically related knowledge accurately. In this paper, we answer the question of why most KE methods still create messy ripple effects. We conduct extensive analysis and identify a salient indicator, GradSim, that effectively reveals when and why updated knowledge ripples through LMs. GradSim is computed as the cosine similarity between the gradients of the original fact and its related knowledge. We observe a strong positive correlation between ripple-effect performance and GradSim across different LMs, KE methods, and evaluation metrics. Further investigation of three counter-intuitive failure cases (Negation, Over-Ripple, Multi-Lingual) of ripple effects demonstrates that these failures are often associated with very low GradSim. This finding validates that GradSim is an effective indicator of when knowledge ripples in LMs.
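Since GradSim is defined as the cosine similarity between gradients of the original fact and its related knowledge, it is straightforward to compute once the two gradients are in hand; the sketch below assumes each gradient is given as a list of arrays (one per parameter tensor).

    import numpy as np

    def grad_sim(grads_fact, grads_related):
        # GradSim: cosine similarity between the flattened gradient of the
        # edited fact and that of a logically related fact; low values are
        # associated with ripple-effect failures in the paper's analysis.
        g1 = np.concatenate([np.ravel(g) for g in grads_fact])
        g2 = np.concatenate([np.ravel(g) for g in grads_related])
        return float(g1 @ g2 / (np.linalg.norm(g1) * np.linalg.norm(g2)))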
Updated: 2024-07-19 01:33:56
Categories: cs.CL,cs.AI
Optimizing Agricultural Order Fulfillment Systems: A Hybrid Tree Search Approach
Efficient order fulfillment is vital in the agricultural industry, particularly due to the seasonal nature of seed supply chains. This paper addresses the challenge of optimizing seed order fulfillment in a centralized warehouse where orders are processed in waves, taking into account the unpredictable arrival of seed stocks and strict order deadlines. We model the wave scheduling problem as a Markov decision process and propose an adaptive hybrid tree search algorithm that combines Monte Carlo tree search with domain-specific knowledge to efficiently navigate the complex, dynamic environment of seed distribution. By leveraging historical data and stochastic modeling, our method enables forecast-informed scheduling decisions that balance immediate requirements with long-term operational efficiency. The key idea is to augment the Monte Carlo tree search algorithm with problem-specific side information that dynamically reduces the number of candidate actions at each decision step, handling the large state and action spaces that render traditional solution methods computationally intractable. Extensive simulations with realistic parameters, including a diverse range of products, a high volume of orders, and authentic seasonal durations, demonstrate that the proposed approach significantly outperforms existing industry-standard methods.
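The action-pruning idea can be sketched as a UCT selection step that first passes the candidate waves through a domain-specific filter; the stats layout and prune hook below are assumptions for illustration, not the authors' interface.

    import math

    def uct_select(stats, candidates, prune, c=1.4):
        # Domain-specific side information shrinks the candidate set before
        # the usual UCT exploration bonus is applied.
        candidates = prune(candidates)   # e.g., drop waves that miss deadlines
        total = sum(stats[a]["visits"] for a in candidates) + 1
        def score(a):
            s = stats[a]
            if s["visits"] == 0:
                return float("inf")      # always expand unvisited actions first
            return s["value"] / s["visits"] + c * math.sqrt(math.log(total) / s["visits"])
        return max(candidates, key=score)

    stats = {"wave_a": {"visits": 3, "value": 2.0},
             "wave_b": {"visits": 0, "value": 0.0}}
    print(uct_select(stats, ["wave_a", "wave_b"], prune=lambda cs: cs))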
Updated: 2024-07-19 01:25:39
Categories: cs.AI
Introduction to Online Nonstochastic Control
This text presents an introduction to an emerging paradigm in control of dynamical systems and differentiable reinforcement learning called online nonstochastic control. The new approach applies techniques from online convex optimization and convex relaxations to obtain new methods with provable guarantees for classical settings in optimal and robust control. The primary distinction between online nonstochastic control and other frameworks is the objective. In optimal control, robust control, and other control methodologies that assume stochastic noise, the goal is to perform comparably to an offline optimal strategy. In online nonstochastic control, both the cost functions as well as the perturbations from the assumed dynamical model are chosen by an adversary. Thus the optimal policy is not defined a priori. Rather, the target is to attain low regret against the best policy in hindsight from a benchmark class of policies. This objective suggests the use of the decision making framework of online convex optimization as an algorithmic methodology. The resulting methods are based on iterative mathematical optimization algorithms, and are accompanied by finite-time regret and computational complexity guarantees.
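The online convex optimization engine underneath this framework is online gradient descent: play a decision, receive an adversarially chosen convex cost, and take a (sub)gradient step. In nonstochastic control the decision variables parameterize a policy (for example, a disturbance-action controller); the sketch below shows only the generic OGD loop.

    import numpy as np

    def online_gradient_descent(grad_oracles, x0, lr=0.1):
        # Regret is measured against the best fixed decision in hindsight.
        x = np.asarray(x0, dtype=float)
        plays = []
        for grad in grad_oracles:        # one (sub)gradient oracle per round
            plays.append(x.copy())
            x = x - lr * grad(x)
        return plays

    # Two rounds of adversarial quadratic costs (x - c_t)^2.
    plays = online_gradient_descent([lambda x: 2 * (x - 1.0),
                                     lambda x: 2 * (x + 1.0)], x0=[0.0])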
Updated: 2024-07-19 00:46:18
Categories: cs.LG,cs.RO,cs.SY,eess.SY,math.OC,stat.ML
Bayesian Semi-supervised Multi-category Classification under Nonparanormality
Semi-supervised learning is a model training method that uses both labeled and unlabeled data. This paper proposes a fully Bayesian semi-supervised learning algorithm that can be applied to any multi-category classification problem. We assume the labels are missing at random when using unlabeled data in a semi-supervised setting. Suppose we have $K$ classes in the data. We assume that the observations follow $K$ multivariate normal distributions depending on their true class labels, after some common unknown transformation is applied to each component of the observation vector. The transformation is expanded in a B-spline series, and a prior is placed on the coefficients. We consider a normal prior on the coefficients and constrain them to satisfy the normality and identifiability requirements. The precision matrices of the Gaussian distributions are given a conjugate Wishart prior, while the means are given an improper uniform prior. The resulting posterior is still conditionally conjugate, so a Gibbs sampler aided by a data-augmentation technique can be adopted. An extensive simulation study compares the proposed method with several other available methods. The proposed method is also applied to real datasets on diagnosing breast cancer and signal classification. We conclude that the proposed method has better prediction accuracy in various cases.
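One step of the data-augmented Gibbs sampler can be sketched as label imputation: draw each unlabeled point's missing class from its posterior class probabilities under the current Gaussian parameters. The sampler in the paper also updates the B-spline transformation coefficients and the Wishart-distributed precision matrices, which this sketch omits.

    import numpy as np
    from scipy.stats import multivariate_normal

    rng = np.random.default_rng(0)

    def impute_labels(X_unlabeled, means, covs, weights):
        # Labels are assumed missing at random; draw each one from its
        # posterior class distribution given the current parameters.
        logp = np.stack([np.log(w) + multivariate_normal.logpdf(X_unlabeled, m, S)
                         for w, m, S in zip(weights, means, covs)], axis=1)
        p = np.exp(logp - logp.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        return np.array([rng.choice(len(weights), p=row) for row in p])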
Updated: 2024-07-19 00:41:43
Categories: stat.ML,cs.LG,stat.AP
The Group Robustness is in the Details: Revisiting Finetuning under Spurious Correlations
Modern machine learning models are prone to over-reliance on spurious correlations, which can often lead to poor performance on minority groups. In this paper, we identify surprising and nuanced behavior of finetuned models on worst-group accuracy via comprehensive experiments on four well-established benchmarks across vision and language tasks. We first show that the commonly used class-balancing techniques of mini-batch upsampling and loss upweighting can induce a decrease in worst-group accuracy (WGA) with training epochs, leading to performance no better than without class-balancing. While removing data to create a class-balanced subset is more effective in some scenarios, we show this depends on group structure and propose a mixture method which can outperform both techniques. Next, we show that scaling pretrained models is generally beneficial for worst-group accuracy, but only in conjunction with appropriate class-balancing. Finally, we identify spectral imbalance in finetuning features as a potential source of group disparities: minority-group covariance matrices incur a larger spectral norm than majority groups once conditioned on the classes. Our results show more nuanced interactions of modern finetuned models with group robustness than was previously known. Our code is available at https://github.com/tmlabonte/revisiting-finetuning.
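The two class-balancing schemes under study are easy to state in code: per-sample loss upweighting versus subsetting every class down to the size of the smallest one. This generic sketch reflects the techniques being compared, not the authors' training scripts.

    import numpy as np

    rng = np.random.default_rng(0)

    def class_balance(labels, mode="upweight"):
        labels = np.asarray(labels)
        classes, counts = np.unique(labels, return_counts=True)
        if mode == "upweight":
            # Inverse-frequency loss weights; rarer classes weigh more.
            w = {c: labels.size / (classes.size * n) for c, n in zip(classes, counts)}
            return np.array([w[y] for y in labels])
        # Otherwise: indices of a subset with equally many samples per class.
        idx = [rng.choice(np.where(labels == c)[0], size=counts.min(), replace=False)
               for c in classes]
        return np.concatenate(idx)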
Updated: 2024-07-19 00:34:03
Categories: cs.LG
Layerwise Proximal Replay: A Proximal Point Method for Online Continual Learning
In online continual learning, a neural network incrementally learns from a non-i.i.d. data stream. Nearly all online continual learning methods employ experience replay to simultaneously prevent catastrophic forgetting and underfitting on past data. Our work demonstrates a limitation of this approach: neural networks trained with experience replay tend to have unstable optimization trajectories, impeding their overall accuracy. Surprisingly, these instabilities persist even when the replay buffer stores all previous training examples, suggesting that this issue is orthogonal to catastrophic forgetting. We minimize these instabilities through a simple modification of the optimization geometry. Our solution, Layerwise Proximal Replay (LPR), balances learning from new and replay data while only allowing for gradual changes in the hidden activation of past data. We demonstrate that LPR consistently improves replay-based online continual learning methods across multiple problem settings, regardless of the amount of available replay memory.
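As a rough, penalty-style stand-in for the idea (LPR itself modifies the optimization geometry with a layerwise proximal point update rather than adding a loss term), one can picture a regularizer that discourages abrupt changes in each layer's hidden activations on replayed examples:

    import numpy as np

    def proximal_replay_penalty(hidden_now, hidden_before, lam=1.0):
        # hidden_now / hidden_before: per-layer activations on replay data
        # under the current and previous parameters. Penalizing their drift
        # keeps past representations changing only gradually.
        return lam * sum(float(((h - h0) ** 2).mean())
                         for h, h0 in zip(hidden_now, hidden_before))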
Updated: 2024-07-19 00:33:49
Categories: cs.LG
Diffusion Models for Offline Multi-agent Reinforcement Learning with Safety Constraints
With recent advances in Multi-agent Reinforcement Learning (MARL), its applications have extended to various safety-critical scenarios. However, most methods focus on online learning, which presents substantial risks when deployed in real-world settings. Addressing this challenge, we introduce an innovative framework integrating diffusion models within the MARL paradigm. This approach notably enhances the safety of actions taken by multiple agents through risk mitigation while modeling coordinated action. Our framework is grounded in the Centralized Training with Decentralized Execution (CTDE) architecture, augmented by a diffusion model for prediction trajectory generation. Additionally, we incorporate a specialized algorithm to further ensure operational safety. We evaluate our model against baselines on the DSRL benchmark. Experiment results demonstrate that our model not only adheres to stringent safety constraints but also achieves superior performance compared to existing methodologies. This underscores the potential of our approach in advancing the safety and efficacy of MARL in real-world applications.
Updated: 2024-07-19 00:30:01
Categories: cs.AI,cs.LG,cs.RO
LionGuard: Building a Contextualized Moderation Classifier to Tackle Localized Unsafe Content
As large language models (LLMs) become increasingly prevalent in a wide variety of applications, concerns about the safety of their outputs have become more significant. Most efforts at safety-tuning or moderation today take on a predominantly Western-centric view of safety, especially for toxic, hateful, or violent speech. In this paper, we describe LionGuard, a Singapore-contextualized moderation classifier that can serve as guardrails against unsafe LLM outputs. When assessed on Singlish data, LionGuard outperforms existing widely-used moderation APIs, which are not finetuned for the Singapore context, by 14% (binary) and up to 51% (multi-label). Our work highlights the benefits of localization for moderation classifiers and presents a practical and scalable approach for low-resource languages.
Updated: 2024-07-19 00:27:42
Categories: cs.CL,cs.AI
Pruning One More Token is Enough: Leveraging Latency-Workload Non-Linearities for Vision Transformers on the Edge
This paper investigates how to efficiently deploy vision transformers on edge devices. Recent methods reduce the latency of transformer neural networks by removing or merging tokens, with small accuracy degradation. However, these methods are not designed with edge device deployment in mind: they do not leverage information about the latency vs. workload trends to improve efficiency. First, we show the latency-workload size relationship is nonlinear for certain workload sizes. We consider this relationship to create a token pruning schedule. Second, we demonstrate a training-free token pruning method utilizing this schedule. We show that for single batch inference, other methods increase latency by 18.6-30.3% with respect to the baseline, while we can reduce it by 9%. For similar latency (within 5.2%) across devices we achieve 78.6%-84.5% ImageNet1K accuracy, while the state-of-the-art, Token Merging, achieves 45.8%-85.4%.
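The schedule exploits the measured latency profile directly: because latency can jump in steps as the workload grows, pruning just past a step buys a large speedup. A minimal sketch, with a hypothetical latency profile:

    import numpy as np

    def tokens_to_keep(workload_sizes, measured_latency_ms, budget_ms):
        # Keep the largest token count whose measured latency fits the
        # budget; non-linearity means one more pruned token can matter a lot.
        sizes = np.asarray(workload_sizes)
        lats = np.asarray(measured_latency_ms)
        feasible = sizes[lats <= budget_ms]
        return int(feasible.max()) if feasible.size else int(sizes.min())

    # Hypothetical device profile with a latency step between 192 and 193 tokens.
    sizes = np.array([190, 191, 192, 193, 194])
    lats = np.array([8.1, 8.1, 8.2, 11.9, 12.0])
    print(tokens_to_keep(sizes, lats, budget_ms=9.0))  # -> 192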
Updated: 2024-07-19 00:23:06
Categories: cs.LG,cs.CV
Differentially Private Latent Diffusion Models
Diffusion models (DMs) are one of the most widely used generative models for producing high quality images. However, a flurry of recent papers points out that DMs are among the least private forms of image generators, as a significant number of near-identical replicas of training images can be extracted from them. Existing privacy-enhancing techniques for DMs, unfortunately, do not provide a good privacy-utility tradeoff. In this paper, we aim to improve the current state of DMs with differential privacy (DP) by adopting the \textit{Latent} Diffusion Models (LDMs). LDMs are equipped with powerful pre-trained autoencoders that map the high-dimensional pixels into lower-dimensional latent representations, in which DMs are trained, yielding a more efficient and fast training of DMs. Rather than fine-tuning the entire LDMs, we fine-tune only the $\textit{attention}$ modules of LDMs with DP-SGD, reducing the number of trainable parameters by roughly $90\%$ and achieving a better privacy-accuracy trade-off. Our approach allows us to generate realistic, high-dimensional images (256x256) conditioned on text prompts with DP guarantees, which, to the best of our knowledge, has not been attempted before. Our approach provides a promising direction for training more powerful, yet training-efficient differentially private DMs, producing high-quality DP images. Our code is available at https://anonymous.4open.science/r/DP-LDM-4525.
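The attention-only fine-tuning recipe amounts to freezing all other parameters before attaching a DP-SGD engine (for example, Opacus). The toy module below is not an LDM and the DP engine is omitted; it only demonstrates the parameter selection, with the ".attn." name filter as an assumed convention.

    import torch.nn as nn

    class TinyBlock(nn.Module):
        def __init__(self, d=32):
            super().__init__()
            self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
            self.mlp = nn.Sequential(nn.Linear(d, d), nn.GELU(), nn.Linear(d, d))

    model = nn.Sequential(TinyBlock(), TinyBlock())

    # Train only the attention modules, mirroring the paper's recipe of
    # fine-tuning attention with DP-SGD (DP engine omitted in this sketch).
    for name, param in model.named_parameters():
        param.requires_grad = ".attn." in name

    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"trainable fraction: {trainable / total:.2%}")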
Updated: 2024-07-19 00:21:06
Categories: stat.ML,cs.CR,cs.LG
Samplable Anonymous Aggregation for Private Federated Data Analysis
We revisit the problem of designing scalable protocols for private statistics and private federated learning when each device holds its private data. Locally differentially private algorithms require little trust but are (provably) limited in their utility. Centrally differentially private algorithms can allow significantly better utility but require a trusted curator. This gap has led to significant interest in the design and implementation of simple cryptographic primitives, that can allow central-like utility guarantees without having to trust a central server. Our first contribution is to propose a new primitive that allows for efficient implementation of several commonly used algorithms, and allows for privacy accounting that is close to that in the central setting without requiring the strong trust assumptions it entails. Shuffling and aggregation primitives that have been proposed in earlier works enable this for some algorithms, but have significant limitations as primitives. We propose a Samplable Anonymous Aggregation primitive, which computes an aggregate over a random subset of the inputs and show that it leads to better privacy-utility trade-offs for various fundamental tasks. Secondly, we propose a system architecture that implements this primitive and perform a security analysis of the proposed system. Our design combines additive secret-sharing with anonymization and authentication infrastructures.
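A toy version of additive secret sharing plus subset sampling conveys the flavor of the primitive: each sampled client splits its value into shares that individually look uniformly random, each aggregator sums the shares it receives, and only the subset total is revealed. The real system adds anonymization and authentication layers that this sketch omits.

    import numpy as np

    rng = np.random.default_rng(0)
    MOD = 2 ** 32

    def share(value, n_shares):
        # Additive secret sharing: any n_shares - 1 shares are uniform;
        # only their sum modulo MOD reveals the value.
        parts = rng.integers(0, MOD, size=n_shares - 1)
        last = (value - int(parts.sum())) % MOD
        return np.append(parts, last)

    client_values = np.array([5, 17, 2, 9, 11])
    subset = rng.choice(client_values.size, size=3, replace=False)  # sampled clients
    shares = np.stack([share(int(client_values[i]), n_shares=3) for i in subset])
    aggregator_totals = shares.sum(axis=0) % MOD   # each aggregator sums a column
    aggregate = int(aggregator_totals.sum()) % MOD
    print(aggregate == int(client_values[subset].sum()) % MOD)   # True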
Updated: 2024-07-19 00:18:40
Categories: cs.CR,cs.LG
Leveraging Hierarchical Feature Sharing for Efficient Dataset Condensation
Given a real-world dataset, data condensation (DC) aims to synthesize a small synthetic dataset that captures the knowledge of a natural dataset while being usable for training models with comparable accuracy. Recent works propose to enhance DC with data parameterization, which condenses data into very compact parameterized data containers instead of images. The intuition behind data parameterization is to encode shared features of images to avoid additional storage costs. In this paper, we recognize that images share common features in a hierarchical way due to the inherent hierarchical structure of the classification system, which is overlooked by current data parameterization methods. To better align DC with this hierarchical nature and encourage more efficient information sharing inside data containers, we propose a novel data parameterization architecture, Hierarchical Memory Network (HMN). HMN stores condensed data in a three-tier structure, representing the dataset-level, class-level, and instance-level features. Another helpful property of the hierarchical architecture is that HMN naturally ensures good independence among images despite achieving information sharing. This enables instance-level pruning for HMN to reduce redundant information, thereby further minimizing redundancy and enhancing performance. We evaluate HMN on five public datasets and show that our proposed method outperforms all baselines.
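The three-tier layout can be pictured as follows: one dataset-level code, one code per class, and one small code per instance, with training features decoded from all three. The additive decoder here is a toy; the paper learns the decoding networks.

    import numpy as np

    rng = np.random.default_rng(0)

    d, n_classes, per_class = 64, 10, 5
    dataset_code = rng.standard_normal(d)                        # tier 1: shared
    class_codes = rng.standard_normal((n_classes, d))            # tier 2: per class
    inst_codes = rng.standard_normal((n_classes, per_class, d))  # tier 3: per instance

    def decode(c, i):
        # Shared features are stored once and reused, so per-instance
        # storage stays small (toy decoder: a plain sum).
        return dataset_code + class_codes[c] + inst_codes[c, i]

    x = decode(c=3, i=0)   # one synthetic training feature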
Updated: 2024-07-19 00:14:59
Categories: cs.CV,cs.LG
Neural topology optimization: the good, the bad, and the ugly
Neural networks (NNs) hold great promise for advancing inverse design via topology optimization (TO), yet misconceptions about their application persist. This article focuses on neural topology optimization (neural TO), which leverages NNs to reparameterize the decision space and reshape the optimization landscape. While the method is still in its infancy, our analysis tools reveal critical insights into the NNs' impact on the optimization process. We demonstrate that the choice of NN architecture significantly influences the objective landscape and the optimizer's path to an optimum. Notably, NNs introduce non-convexities even in otherwise convex landscapes, potentially delaying convergence in convex problems but enhancing exploration for non-convex problems. This analysis lays the groundwork for future advancements by highlighting: 1) the potential of neural TO for non-convex problems and dedicated GPU hardware (the "good"), 2) the limitations in smooth landscapes (the "bad"), and 3) the complex challenge of selecting optimal NN architectures and hyperparameters for superior performance (the "ugly").
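The reparameterization at the heart of neural TO replaces per-element design densities with the output of a small coordinate network, so the optimizer searches over network weights. A minimal, untrained sketch:

    import numpy as np

    rng = np.random.default_rng(0)
    W1, b1 = 0.5 * rng.standard_normal((2, 16)), np.zeros(16)
    W2, b2 = 0.5 * rng.standard_normal((16, 1)), np.zeros(1)

    def density(coords):
        # Material density in (0, 1) at each (x, y) coordinate; the design
        # variables are now the weights W1, b1, W2, b2.
        h = np.tanh(coords @ W1 + b1)
        return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))

    xs, ys = np.meshgrid(np.linspace(0, 1, 8), np.linspace(0, 1, 8))
    grid = np.stack([xs.ravel(), ys.ravel()], axis=1)   # 8x8 design grid
    rho = density(grid)                                 # densities per element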
Updated: 2024-07-19 00:10:56
Categories: cs.LG,cs.NA,math.NA
Sequential Recommendation with Controllable Diversification: Representation Degeneration and Diversity
Sequential recommendation (SR) models the dynamic user preferences and generates the next-item prediction as the affinity between the sequence and items, in a joint latent space with low dimensions (i.e., the sequence and item embedding space). Both sequence and item representations suffer from the representation degeneration issue due to the user/item long-tail distributions, where tail users/items are indistinguishably distributed as a narrow cone in the latent space. We argue that the representation degeneration issue is the root cause of insufficient recommendation diversity in existing SR methods, impairing the user potential exploration and further worsening the echo chamber issue. In this work, we first disclose the connection between the representation degeneration and recommendation diversity, in which more severe representation degeneration indicates lower recommendation diversity. We then propose a novel Singular sPectrum sMoothing regularization for Recommendation (SPMRec), which acts as a controllable surrogate to alleviate the degeneration and achieve the balance between recommendation diversity and performance. The proposed smoothing regularization alleviates the degeneration by maximizing the area under the singular value curve, which is also the diversity surrogate. We conduct experiments on four benchmark datasets to demonstrate the superiority of SPMRec, and show that the proposed singular spectrum smoothing can control the balance of recommendation performance and diversity simultaneously.
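Since the regularizer is described as maximizing the area under the singular value curve, a plausible sketch (returned as a loss to minimize; the paper's exact normalization may differ) is:

    import numpy as np

    def spectrum_smoothing_penalty(embeddings):
        # Flattening the normalized singular-value curve counteracts
        # representation degeneration, where tail users/items collapse
        # into a narrow cone of the embedding space.
        s = np.linalg.svd(np.asarray(embeddings, dtype=float), compute_uv=False)
        s = s / s.sum()
        return float(-np.cumsum(s).sum())   # negative area under the curve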
Updated: 2024-07-19 00:03:11
Categories: cs.IR,cs.LG
Knowledge Distillation Approaches for Accurate and Efficient Recommender System
Despite its breakthroughs in classification problems, knowledge distillation (KD) for recommendation models and ranking problems has not been well studied in the literature. This dissertation is devoted to developing knowledge distillation methods for recommender systems to fully improve the performance of a compact model. We propose novel distillation methods designed for recommender systems, categorized by their knowledge sources as follows. (1) Latent knowledge: we propose two methods that transfer latent knowledge of user/item representations. They effectively transfer knowledge of niche tastes with a balanced distillation strategy that prevents the KD process from being biased towards a small number of large preference groups. We also propose a new method that transfers user/item relations in the representation space; it selectively transfers essential relations considering the limited capacity of the compact model. (2) Ranking knowledge: we propose three methods that transfer ranking knowledge from the recommendation results. They formulate the KD process as a ranking matching problem and transfer the knowledge via a listwise learning strategy. Further, we present a new learning framework that compresses the ranking knowledge of heterogeneous recommendation models; it is designed to ease the computational burden of model ensembles, a dominant solution in many recommendation applications. We validate the benefit of our proposed methods and frameworks through extensive experiments. To summarize, this dissertation sheds light on knowledge distillation approaches for a better accuracy-efficiency trade-off in recommendation models.
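The listwise transfer of ranking knowledge can be illustrated with a ListNet-style surrogate that matches the student's softmax score distribution over candidate items to the teacher's; this generic sketch stands in for the dissertation's specific losses.

    import numpy as np

    def listwise_distill_loss(student_scores, teacher_scores, temp=1.0):
        # Cross-entropy between softmaxed teacher and student scores over
        # the same candidate item list (a ranking-matching surrogate).
        def softmax(x):
            z = np.exp((np.asarray(x, dtype=float) - np.max(x)) / temp)
            return z / z.sum()
        p_t, p_s = softmax(teacher_scores), softmax(student_scores)
        return float(-(p_t * np.log(p_s + 1e-12)).sum())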
Updated: 2024-07-19 00:01:18
Categories: cs.IR,cs.AI