Arxiv Day: Article

Demystifying Trajectory Recovery From Ash: An Open-Source Evaluation and Enhancement

Once analysed, location trajectories can provide valuable insights beneficial to various applications. However, such data is also highly sensitive, rendering them susceptible to privacy risks in the event of mismanagement, for example, revealing an individual's identity, home address, or political affiliations. Hence, ensuring that privacy is preserved for this data is a priority. One commonly taken measure to mitigate this concern is aggregation. Previous work by Xu et al. shows that trajectories are still recoverable from anonymised and aggregated datasets. However, the study lacks implementation details, obfuscating the mechanisms of the attack. Additionally, the attack was evaluated on commercial non-public datasets, rendering the results and subsequent claims unverifiable. This study reimplements the trajectory recovery attack from scratch and evaluates it on two open-source datasets, detailing the preprocessing steps and implementation. Results confirm that privacy leakage still exists despite common anonymisation and aggregation methods but also indicate that the initial accuracy claims may have been overly ambitious. We release all code as open-source to ensure the results are entirely reproducible and, therefore, verifiable. Moreover, we propose a stronger attack by designing a series of enhancements to the baseline attack. These enhancements yield higher accuracies by up to 16%, providing an improved benchmark for future research in trajectory recovery methods. Our improvements also enable online execution of the attack, allowing partial attacks on larger datasets previously considered unprocessable, thereby furthering the extent of privacy leakage. The findings emphasise the importance of using strong privacy-preserving mechanisms when releasing aggregated mobility data and not solely relying on aggregation as a means of anonymisation.

Updated: 2024-10-01 23:50:33

标题: 揭秘从灰烬中恢复轨迹：一个开源评估和增强

摘要: 一旦分析，位置轨迹可以为各种应用提供宝贵的见解。然而，这些数据也非常敏感，使其容易受到隐私风险的影响，例如，泄露个人身份、家庭地址或政治关联。因此，确保这些数据的隐私得到保护是当务之急。缓解这一问题的一种常见措施是聚合。徐等人以前的研究表明，轨迹仍然可以从匿名化和聚合的数据集中恢复。然而，该研究缺乏实施细节，使攻击机制变得模糊。此外，该攻击是在商业非公开数据集上评估的，使结果和后续声明无法验证。本研究重新实现了从头开始的轨迹恢复攻击，并在两个开源数据集上进行了评估，详细说明了预处理步骤和实施细节。结果确认，即使采用常见的匿名化和聚合方法，隐私泄漏仍然存在，但也表明初始准确性声明可能过于雄心勃勃。我们将所有代码作为开源发布，以确保结果完全可重现，从而可验证。此外，我们通过设计一系列基线攻击的增强措施提出了一种更强大的攻击。这些增强措施可以提高高达16%的准确性，为未来轨迹恢复方法的研究提供了改进的基准。我们的改进还实现了攻击的在线执行，允许对先前被认为无法处理的较大数据集进行部分攻击，从而进一步扩大了隐私泄漏的范围。研究结果强调了在发布聚合移动数据时使用强大的隐私保护机制的重要性，而不仅仅依赖于聚合作为匿名化手段。

更新时间: 2024-10-01 23:50:33

领域: cs.CR,cs.LG

下载: http://arxiv.org/abs/2409.14645v2

nGPT: Normalized Transformer with Representation Learning on the Hypersphere

We propose a novel neural network architecture, the normalized Transformer (nGPT) with representation learning on the hypersphere. In nGPT, all vectors forming the embeddings, MLP, attention matrices and hidden states are unit norm normalized. The input stream of tokens travels on the surface of a hypersphere, with each layer contributing a displacement towards the target output predictions. These displacements are defined by the MLP and attention blocks, whose vector components also reside on the same hypersphere. Experiments show that nGPT learns much faster, reducing the number of training steps required to achieve the same accuracy by a factor of 4 to 20, depending on the sequence length.

Updated: 2024-10-01 23:50:09

标题: nGPT：在超球面上进行表示学习的归一化Transformer

摘要: 我们提出了一种新颖的神经网络架构，即在超球面上进行表示学习的归一化Transformer（nGPT）。在nGPT中，形成嵌入、MLP、注意力矩阵和隐藏状态的所有向量均进行单位范数归一化。标记的输入流沿着超球面的表面传播，每一层都向目标输出预测提供位移。这些位移由MLP和注意力块定义，其向量分量也驻留在同一超球面上。实验表明，nGPT学习速度更快，将实现相同准确性所需的训练步骤数量减少了4至20倍，取决于序列长度。

更新时间: 2024-10-01 23:50:09

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.01131v1

Observational Scaling Laws and the Predictability of Language Model Performance

Understanding how language model performance varies with scale is critical to benchmark and algorithm development. Scaling laws are one approach to building this understanding, but the requirement of training models across many different scales has limited their use. We propose an alternative, observational approach that bypasses model training and instead builds scaling laws from ~100 publically available models. Building a single scaling law from multiple model families is challenging due to large variations in their training compute efficiencies and capabilities. However, we show that these variations are consistent with a simple, generalized scaling law where language model performance is a function of a low-dimensional capability space, and model families only vary in their efficiency in converting training compute to capabilities. Using this approach, we show the surprising predictability of complex scaling phenomena: we show that several emergent phenomena follow a smooth, sigmoidal behavior and are predictable from small models; we show that the agent performance of models such as GPT-4 can be precisely predicted from simpler non-agentic benchmarks; and we show how to predict the impact of post-training interventions like Chain-of-Thought and Self-Consistency as language model capabilities continue to improve.

Updated: 2024-10-01 23:38:10

标题: 观察定律和语言模型性能的可预测性

摘要: 理解语言模型性能如何随规模变化是基准和算法开发的关键。尺度律是建立这种理解的一种方法，但是需要跨越许多不同尺度训练模型的要求限制了它们的使用。我们提出了一种替代的观察方法，绕过模型训练，而是从约100个公开可用的模型中构建尺度律。从多个模型家族构建单一的尺度律具有挑战性，因为它们的训练计算效率和能力存在很大变化。然而，我们显示这些变化与一个简单的、广义的尺度律一致，其中语言模型性能是一个低维能力空间的函数，模型家族只在将训练计算转换为能力的效率上有所不同。使用这种方法，我们展示了复杂尺度现象的令人惊讶的可预测性：我们展示了几种新兴现象遵循平滑的、S形的行为，并可以从小型模型预测；我们展示了像GPT-4这样的模型的代理性能可以从简单的非代理基准精确预测；我们展示了如何预测后训练干预的影响，比如Thought-Chain和自一致性，随着语言模型能力的持续提升。

更新时间: 2024-10-01 23:38:10

领域: cs.LG,cs.AI,cs.CL,stat.ML

下载: http://arxiv.org/abs/2405.10938v3

Skill Issues: An Analysis of CS:GO Skill Rating Systems

The meteoric rise of online games has created a need for accurate skill rating systems for tracking improvement and fair matchmaking. Although many skill rating systems are deployed, with various theoretical foundations, less work has been done at analysing the real-world performance of these algorithms. In this paper, we perform an empirical analysis of Elo, Glicko2 and TrueSkill through the lens of surrogate modelling, where skill ratings influence future matchmaking with a configurable acquisition function. We look both at overall performance and data efficiency, and perform a sensitivity analysis based on a large dataset of Counter-Strike: Global Offensive matches.

Updated: 2024-10-01 23:19:31

标题: 技能问题：CS:GO技能评级系统分析

摘要: 在线游戏的迅速崛起引发了对准确技能评分系统的需求，以跟踪改进和公平匹配。尽管部署了许多技能评分系统，具有不同的理论基础，但对这些算法在现实世界中的性能进行分析的工作较少。在本文中，我们通过代理建模的视角对Elo、Glicko2和TrueSkill进行了实证分析，其中技能评分影响未来的匹配，并具有可配置的获取函数。我们对整体表现和数据效率进行了研究，并基于大量Counter-Strike：Global Offensive比赛的数据进行了敏感性分析。

更新时间: 2024-10-01 23:19:31

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.02831v1

DropEdge not Foolproof: Effective Augmentation Method for Signed Graph Neural Networks

The paper discusses signed graphs, which model friendly or antagonistic relationships using edges marked with positive or negative signs, focusing on the task of link sign prediction. While Signed Graph Neural Networks (SGNNs) have advanced, they face challenges like graph sparsity and unbalanced triangles. The authors propose using data augmentation (DA) techniques to address these issues, although many existing methods are not suitable for signed graphs due to a lack of side information. They highlight that the random DropEdge method, a rare DA approach applicable to signed graphs, does not enhance link sign prediction performance. In response, they introduce the Signed Graph Augmentation (SGA) framework, which includes a structure augmentation module to identify candidate edges and a strategy for selecting beneficial candidates, ultimately improving SGNN training. Experimental results show that SGA significantly boosts the performance of SGNN models, with a notable 32.3% improvement in F1-micro for SGCN on the Slashdot dataset.

Updated: 2024-10-01 23:15:48

标题: DropEdge并不是万能的：针对带符号图神经网络的有效增强方法

摘要: 本文讨论了有符号图，以正负符号标记的边来模拟友好或敌对关系，着重于链路符号预测任务。虽然有符号图神经网络（SGNNs）已经取得进展，但面临着图稀疏和不平衡三角形等挑战。作者提出使用数据增强（DA）技术来解决这些问题，尽管许多现有方法并不适用于有符号图，因为缺乏侧面信息。他们强调，随机DropEdge方法是一种适用于有符号图的罕见的DA方法，但并未提高链路符号预测性能。作为回应，他们引入了Signed Graph Augmentation（SGA）框架，该框架包括一个结构增强模块来识别候选边，以及一个选择有益候选者的策略，最终改善了SGNN的训练。实验结果表明，SGA显著提升了SGNN模型的性能，在Slashdot数据集上，SGCN的F1-micro有显著的32.3%改善。

更新时间: 2024-10-01 23:15:48

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2409.19620v2

You are what you eat? Feeding foundation models a regionally diverse food dataset of World Wide Dishes

Foundation models are increasingly ubiquitous in our daily lives, used in everyday tasks such as text-image searches, interactions with chatbots, and content generation. As use increases, so does concern over the disparities in performance and fairness of these models for different people in different parts of the world. To assess these growing regional disparities, we present World Wide Dishes, a mixed text and image dataset consisting of 765 dishes, with dish names collected in 131 local languages. World Wide Dishes has been collected purely through human contribution and decentralised means, by creating a website widely distributed through social networks. Using the dataset, we demonstrate a novel means of operationalising capability and representational biases in foundation models such as language models and text-to-image generative models. We enrich these studies with a pilot community review to understand, from a first-person perspective, how these models generate images for people in five African countries and the United States. We find that these models generally do not produce quality text and image outputs of dishes specific to different regions. This is true even for the US, which is typically considered to be more well-resourced in training data - though the generation of US dishes does outperform that of the investigated African countries. The models demonstrate a propensity to produce outputs that are inaccurate as well as culturally misrepresentative, flattening, and insensitive. These failures in capability and representational bias have the potential to further reinforce stereotypes and disproportionately contribute to erasure based on region. The dataset and code are available at https://github.com/oxai/world-wide-dishes/.

Updated: 2024-10-01 23:11:00

标题: 你是什么，取决于你吃什么？用各地多样化食物数据集训练基础模型

摘要: 基础模型在我们日常生活中越来越普遍，用于日常任务，如文本-图像搜索，与聊天机器人的交互和内容生成。随着使用量的增加，人们对这些模型在不同地区的不同人群中的性能和公平性差异的担忧也在增加。为了评估这些不断增长的区域差异，我们提出了“World Wide Dishes”，这是一个由765种菜肴组成的混合文本和图像数据集，菜名采集了131种当地语言。World Wide Dishes是纯粹通过人类贡献和去中心化手段收集的，通过创建一个广泛分布在社交网络中的网站。利用这个数据集，我们展示了一种新颖的方法来操作基础模型（如语言模型和文本到图像生成模型）中的能力和表现偏见。我们通过一项试点社区审查来丰富这些研究，以了解这些模型如何为来自五个非洲国家和美国的人生成图像。我们发现，这些模型通常无法生成特定地区的菜肴的高质量文本和图像输出。即使对于通常被认为在训练数据方面更具资源的美国来说也是如此 - 尽管美国菜肴的生成结果优于调查的非洲国家。这些模型显示出倾向于产生不准确、文化上误代表、平淡和不敏感的输出。这种能力和表现偏见上的失败有可能进一步强化刻板印象，并在区域基础上不成比例地促成擦除。数据集和代码可在https://github.com/oxai/world-wide-dishes/上获得。

更新时间: 2024-10-01 23:11:00

领域: cs.CY,cs.AI

下载: http://arxiv.org/abs/2406.09496v2

Explainable Diagnosis Prediction through Neuro-Symbolic Integration

Diagnosis prediction is a critical task in healthcare, where timely and accurate identification of medical conditions can significantly impact patient outcomes. Traditional machine learning and deep learning models have achieved notable success in this domain but often lack interpretability which is a crucial requirement in clinical settings. In this study, we explore the use of neuro-symbolic methods, specifically Logical Neural Networks (LNNs), to develop explainable models for diagnosis prediction. Essentially, we design and implement LNN-based models that integrate domain-specific knowledge through logical rules with learnable thresholds. Our models, particularly $M_{\text{multi-pathway}}$ and $M_{\text{comprehensive}}$, demonstrate superior performance over traditional models such as Logistic Regression, SVM, and Random Forest, achieving higher accuracy (up to 80.52\%) and AUROC scores (up to 0.8457) in the case study of diabetes prediction. The learned weights and thresholds within the LNN models provide direct insights into feature contributions, enhancing interpretability without compromising predictive power. These findings highlight the potential of neuro-symbolic approaches in bridging the gap between accuracy and explainability in healthcare AI applications. By offering transparent and adaptable diagnostic models, our work contributes to the advancement of precision medicine and supports the development of equitable healthcare solutions. Future research will focus on extending these methods to larger and more diverse datasets to further validate their applicability across different medical conditions and populations.

Updated: 2024-10-01 22:47:24

标题: 可解释的诊断预测通过神经符号集成解释

摘要: 诊断预测是医疗保健中的关键任务，及时和准确地识别医疗状况可以显著影响患者的结果。传统的机器学习和深度学习模型在这一领域取得了显著成功，但往往缺乏可解释性，这在临床环境中是一个至关重要的要求。在这项研究中，我们探索了神经符号方法的应用，特别是逻辑神经网络（LNNs），以开发可解释的诊断预测模型。基本上，我们设计和实现了基于LNN的模型，通过逻辑规则和可学习的阈值将领域特定知识整合进来。我们的模型，特别是$M_{\text{multi-pathway}}$和$M_{\text{comprehensive}}$，在糖尿病预测案例研究中表现出优于传统模型如逻辑回归、支持向量机和随机森林的性能，实现了更高的准确度（高达80.52\%）和AUROC分数（高达0.8457）。LNN模型中学习的权重和阈值直接提供了特征贡献的见解，增强了可解释性而不影响预测能力。这些发现突显了神经符号方法在医疗人工智能应用中在准确性和可解释性之间的潜力。通过提供透明且可调整的诊断模型，我们的工作有助于推动精准医学的发展，并支持公平医疗解决方案的发展。未来的研究将重点放在将这些方法扩展到更大、更多样化的数据集，进一步验证它们在不同医疗状况和人群中的适用性。

更新时间: 2024-10-01 22:47:24

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.01855v1

FairCoT: Enhancing Fairness in Diffusion Models via Chain of Thought Reasoning of Multimodal Language Models

In the domain of text-to-image generative models, biases inherent in training datasets often propagate into generated content, posing significant ethical challenges, particularly in socially sensitive contexts. We introduce FairCoT, a novel framework that enhances fairness in diffusion models through Chain-of-Thought (CoT) reasoning within multimodal generative large language models (LLMs). FairCoT employs iterative CoT refinement and attire-based attribute prediction to systematically mitigate biases, ensuring diverse and equitable representation in generated images. By integrating iterative reasoning processes, FairCoT addresses the limitations of zero-shot CoT in sensitive scenarios, balancing creativity with ethical responsibility. Experimental evaluations across multiple models, including DALL-E and various Stable Diffusion variants, demonstrate that FairCoT significantly improves fairness and diversity metrics without compromising image quality or relevance. Our approach advances ethical AI practices in generative modeling, promoting socially responsible content generation and setting new standards for fairness in AI-generated imagery.

Updated: 2024-10-01 22:45:20

标题: 公平CoT：通过多模态语言模型的思维链推理增强扩散模型中的公平性

摘要: 在文本到图像生成模型领域，训练数据集中固有的偏见经常会传播到生成的内容中，尤其是在社会敏感的环境中，这会带来重大的伦理挑战。我们引入了FairCoT，这是一个通过链式思维（CoT）在多模态生成大型语言模型（LLM）中增强公平性的新框架。FairCoT利用迭代的CoT优化和基于服装的属性预测来系统地减少偏见，确保生成的图像具有多样性和公平的代表性。通过整合迭代推理过程，FairCoT解决了在敏感场景中零冲击CoT的局限性，平衡了创造力和伦理责任。跨多个模型（包括DALL-E和各种稳定扩散变体）的实验评估表明，FairCoT显著提高了公平性和多样性指标，而不会影响图像质量或相关性。我们的方法推动了生成建模中的伦理AI实践，促进了社会责任内容的生成，并为AI生成的图像中的公平标准设立了新的标准。

更新时间: 2024-10-01 22:45:20

领域: cs.LG,cs.AI,cs.CV

下载: http://arxiv.org/abs/2406.09070v2

Learning to Build by Building Your Own Instructions

Structural understanding of complex visual objects is an important unsolved component of artificial intelligence. To study this, we develop a new technique for the recently proposed Break-and-Make problem in LTRON where an agent must learn to build a previously unseen LEGO assembly using a single interactive session to gather information about its components and their structure. We attack this problem by building an agent that we call \textbf{\ours} that is able to make its own visual instruction book. By disassembling an unseen assembly and periodically saving images of it, the agent is able to create a set of instructions so that it has the information necessary to rebuild it. These instructions form an explicit memory that allows the model to reason about the assembly process one step at a time, avoiding the need for long-term implicit memory. This in turn allows us to train on much larger LEGO assemblies than has been possible in the past. To demonstrate the power of this model, we release a new dataset of procedurally built LEGO vehicles that contain an average of 31 bricks each and require over one hundred steps to disassemble and reassemble. We train these models using online imitation learning which allows the model to learn from its own mistakes. Finally, we also provide some small improvements to LTRON and the Break-and-Make problem that simplify the learning environment and improve usability.

Updated: 2024-10-01 22:39:58

标题: 学习通过自己的指导建造

摘要: 复杂视觉对象的结构理解是人工智能中一个重要未解决的组成部分。为了研究这一问题，我们开发了一种新的技术，用于LTRON中最近提出的Break-and-Make问题，其中代理必须学会使用单个交互会话收集关于其组件及其结构的信息来构建一个以前未见过的乐高装配。我们通过构建一个我们称为\ours 的代理来解决这个问题，该代理能够制作自己的视觉指导书。通过拆解一个未知的装配并定期保存其图像，代理能够创建一组指令，以便具备重建所需的信息。这些指令形成一个明确的记忆，使模型能够逐步推理装配过程，避免长期的隐式记忆的需要。这反过来使我们能够训练比以往可能的更大的乐高装配。为了展示这个模型的能力，我们发布了一个新的数据集，包含平均每个31个砖块的程序构建的乐高车辆，需要超过一百步来拆卸和重组。我们使用在线模仿学习来训练这些模型，这允许模型从自己的错误中学习。最后，我们还对LTRON和Break-and-Make问题做了一些小的改进，简化了学习环境并提高了可用性。

更新时间: 2024-10-01 22:39:58

领域: cs.AI,cs.RO

下载: http://arxiv.org/abs/2410.01111v1

Mixing It Up: The Cocktail Effect of Multi-Task Fine-Tuning on LLM Performance -- A Case Study in Finance

The application of large language models (LLMs) in domain-specific contexts, including finance, has expanded rapidly. Domain-specific LLMs are typically evaluated based on their performance in various downstream tasks relevant to the domain. In this work, we present a detailed analysis of fine-tuning LLMs for such tasks. Somewhat counterintuitively, we find that in domain-specific cases, fine-tuning exclusively on the target task is not always the most effective strategy. Instead, multi-task fine-tuning - where models are trained on a cocktail of related tasks - can significantly enhance performance. We demonstrate how this approach enables a small model, such as Phi-3-Mini, to achieve state-of-the-art results, even surpassing the much larger GPT-4-o model on financial benchmarks. Our study involves a large-scale experiment, training over 200 models using several widely adopted LLMs as baselines, and empirically confirms the benefits of multi-task fine-tuning. Additionally, we explore the use of general instruction data as a form of regularization, suggesting that it helps minimize performance degradation. We also investigate the inclusion of mathematical data, finding improvements in numerical reasoning that transfer effectively to financial tasks. Finally, we note that while fine-tuning for downstream tasks leads to targeted improvements in task performance, it does not necessarily result in broader gains in domain knowledge or complex domain reasoning abilities.

Updated: 2024-10-01 22:35:56

标题: 混合搭配：多任务微调对LLM性能的鸡尾酒效应——金融案例研究

摘要: 大型语言模型（LLMs）在特定领域上的应用，包括金融领域，迅速扩展。特定领域的LLMs通常根据它们在与领域相关的各种下游任务中的表现来进行评估。在这项工作中，我们对LLMs进行了详细的微调分析，针对这些任务。有点出人意料的是，我们发现在特定领域的情况下，仅在目标任务上进行微调并不总是最有效的策略。相反，多任务微调——在这种方法中，模型被训练在一系列相关任务上——可以显著提升性能。我们展示了这种方法如何使小型模型，例如Phi-3-Mini，实现了最先进的结果，甚至超越了规模更大的GPT-4-o模型在金融基准上的表现。我们的研究涉及大规模实验，训练了超过200个模型，使用几种广泛采用的LLMs作为基线，并经验性地证实了多任务微调的好处。此外，我们探讨了将通用指令数据作为一种正则化形式的应用，表明它有助于最小化性能退化。我们还研究了数学数据的包含，发现在数值推理方面的改进有效地转移到金融任务中。最后，我们注意到，虽然为下游任务进行微调会导致任务表现的有针对性改进，但不一定会导致对领域知识或复杂领域推理能力的更广泛提升。

更新时间: 2024-10-01 22:35:56

领域: cs.AI,cs.CE,cs.CL

下载: http://arxiv.org/abs/2410.01109v1

Augmentation through Laundering Attacks for Audio Spoof Detection

Recent text-to-speech (TTS) developments have made voice cloning (VC) more realistic, affordable, and easily accessible. This has given rise to many potential abuses of this technology, including Joe Biden's New Hampshire deepfake robocall. Several methodologies have been proposed to detect such clones. However, these methodologies have been trained and evaluated on relatively clean databases. Recently, ASVspoof 5 Challenge introduced a new crowd-sourced database of diverse acoustic conditions including various spoofing attacks and codec conditions. This paper is our submission to the ASVspoof 5 Challenge and aims to investigate the performance of Audio Spoof Detection, trained using data augmentation through laundering attacks, on the ASVSpoof 5 database. The results demonstrate that our system performs worst on A18, A19, A20, A26, and A30 spoofing attacks and in the codec and compression conditions of C08, C09, and C10.

Updated: 2024-10-01 22:34:51

标题: 通过洗牌攻击增强音频欺骗检测

摘要: 最近的文本转语音（TTS）发展使语音克隆（VC）变得更加逼真、价格实惠和易于获取。这引发了许多潜在的技术滥用，包括乔·拜登在新罕布什尔州的深度伪造电话。已经提出了几种方法来检测这种克隆。然而，这些方法已经在相对干净的数据库上进行了训练和评估。最近，ASVspoof 5挑战引入了一个新的众包数据库，包括各种声学条件，包括各种欺骗攻击和编解码器条件。本文是我们提交给ASVspoof 5挑战的论文，旨在研究通过洗钱攻击进行数据增强训练的音频欺骗检测在ASVSpoof 5数据库上的表现。结果表明，我们的系统在A18、A19、A20、A26和A30的欺骗攻击以及C08、C09和C10的编解码器和压缩条件下表现最差。

更新时间: 2024-10-01 22:34:51

领域: eess.AS,cs.AI,cs.SD

下载: http://arxiv.org/abs/2410.01108v1

Count of Monte Crypto: Accounting-based Defenses for Cross-Chain Bridges

Between 2021 and 2023, crypto assets valued at over \$US2.6 billion were stolen via attacks on "bridges" -- decentralized services designed to allow inter-blockchain exchange. While the individual exploits in each attack vary, a single design flaw underlies them all: the lack of end-to-end value accounting in cross-chain transactions. In this paper, we empirically analyze twenty million transactions used by key bridges during this period. We show that a simple invariant that balances cross-chain inflows and outflows is compatible with legitimate use, yet precisely identifies every known attack (and several likely attacks) in this data. Further, we show that this approach is not only sufficient for post-hoc audits, but can be implemented in-line in existing bridge designs to provide generic protection against a broad array of bridge vulnerabilities.

Updated: 2024-10-01 22:33:03

标题: 《蒙特克里斯托计数：基于会计的跨链桥防御》

摘要: 在2021年至2023年期间，通过对“桥梁”进行攻击，即旨在允许不同区块链之间交换的去中心化服务，加密资产价值超过26亿美元被盗。尽管每次攻击中的个别利用方式各不相同，但一个共同的设计缺陷贯穿其中：跨链交易中缺乏端到端价值核算。本文对在这一期间由关键桥梁使用的两千万笔交易进行了实证分析。我们展示了一个简单的不变量，即平衡跨链流入和流出，与合法使用兼容，但精确识别了这些数据中的每次已知攻击（以及几次可能的攻击）。此外，我们还展示了这种方法不仅足以用于事后审计，而且可以内联实施在现有桥梁设计中，以提供对各种桥梁漏洞的普遍保护。

更新时间: 2024-10-01 22:33:03

领域: cs.CR

下载: http://arxiv.org/abs/2410.01107v1

Approximately Aligned Decoding

It is common to reject undesired outputs of Large Language Models (LLMs); however, current methods to do so require an excessive amount of computation, or severely distort the distribution of outputs. We present a method to balance the distortion of the output distribution with computational efficiency, allowing for the generation of long sequences of text with difficult-to-satisfy constraints, with less amplification of low probability outputs compared to existing methods. We show through a series of experiments that the task-specific performance of our method is comparable to methods that do not distort the output distribution, while being much more computationally efficient.

Updated: 2024-10-01 22:22:13

标题: 近似对齐解码

摘要: 通常会拒绝大型语言模型（LLMs）的不良输出；然而，目前的方法需要大量计算，或严重扭曲输出分布。我们提出了一种方法，可以平衡输出分布的扭曲和计算效率，允许生成具有难以满足约束条件的长文本序列，与现有方法相比，低概率输出的放大较少。通过一系列实验，我们展示了我们的方法在特定任务性能上与不扭曲输出分布的方法相当，同时计算效率更高。

更新时间: 2024-10-01 22:22:13

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.01103v1

Generative AI Application for Building Industry

This paper investigates the transformative potential of generative AI technologies, particularly large language models (LLMs), within the building industry. By leveraging these advanced AI tools, the study explores their application across key areas such as energy code compliance, building design optimization, and workforce training. The research highlights how LLMs can automate labor-intensive processes, significantly improving efficiency, accuracy, and safety in building practices. The paper also addresses the challenges associated with interpreting complex visual and textual data in architectural plans and regulatory codes, proposing innovative solutions to enhance AI-driven compliance checking and design processes. Additionally, the study considers the broader implications of AI integration, including the development of AI-powered tools for comprehensive code compliance across various regulatory domains and the potential for AI to revolutionize workforce training through realistic simulations. This paper provides a comprehensive analysis of the current capabilities of generative AI in the building industry while outlining future directions for research and development, aiming to pave the way for smarter, more sustainable, and responsive construction practices.

Updated: 2024-10-01 21:59:08

标题: 建筑行业的生成式人工智能应用

摘要: 本文探讨了生成式人工智能技术，特别是大型语言模型(LLMs)在建筑行业中的变革潜力。通过利用这些先进的人工智能工具，研究探讨了它们在能源法规合规、建筑设计优化和员工培训等关键领域的应用。研究突出了LLMs如何可以自动化劳动密集型流程，显著提高建筑实践中的效率、准确性和安全性。本文还探讨了在解释复杂的视觉和文本数据在建筑计划和法规代码中的挑战，并提出了创新解决方案，以增强基于人工智能的合规检查和设计流程。此外，研究考虑了人工智能整合的更广泛影响，包括开发针对各种法规领域的全面法规合规的人工智能工具和人工智能通过逼真模拟来革命性地改变员工培训的潜力。本文对建筑行业中生成式人工智能的当前能力进行了全面分析，同时概述了未来的研究和发展方向，旨在为更智能、更可持续和更响应的建筑实践铺平道路。

更新时间: 2024-10-01 21:59:08

领域: cs.AI,cs.SY,eess.IV,eess.SY

下载: http://arxiv.org/abs/2410.01098v1

Mechanic Maker: Accessible Game Development Via Symbolic Learning Program Synthesis

Game development is a highly technical practice that traditionally requires programming skills. This serves as a barrier to entry for would-be developers or those hoping to use games as part of their creative expression. While there have been prior game development tools focused on accessibility, they generally still require programming, or have major limitations in terms of the kinds of games they can make. In this paper we introduce Mechanic Maker, a tool for creating a wide-range of game mechanics without programming. It instead relies on a backend symbolic learning system to synthesize game mechanics from examples. We conducted a user study to evaluate the benefits of the tool for participants with a variety of programming and game development experience. Our results demonstrated that participants' ability to use the tool was unrelated to programming ability. We conclude that tools like ours could help democratize game development, making the practice accessible regardless of programming skills.

Updated: 2024-10-01 21:58:28

标题: 机械制造者：通过符号学习程序综合实现可访问的游戏开发

摘要: 游戏开发是一项高度技术化的实践，传统上需要编程技能。这对于那些希望成为开发者或将游戏作为创意表达一部分的人来说构成了入门障碍。虽然之前有一些专注于可访问性的游戏开发工具，但它们通常仍然需要编程，或者在制作游戏种类方面存在重大限制。在本文中，我们介绍了Mechanic Maker，这是一个无需编程即可创建各种游戏机制的工具。它依赖于后端符号学习系统，从示例中合成游戏机制。我们进行了一项用户研究，评估了该工具对具有各种编程和游戏开发经验的参与者的益处。我们的结果表明，参与者使用该工具的能力与编程能力无关。我们得出结论，像我们这样的工具可以帮助民主化游戏开发，使该实践无论编程技能如何都能得以实现。

更新时间: 2024-10-01 21:58:28

领域: cs.HC,cs.AI

下载: http://arxiv.org/abs/2410.01096v1

Efficient and Private Marginal Reconstruction with Local Non-Negativity

Differential privacy is the dominant standard for formal and quantifiable privacy and has been used in major deployments that impact millions of people. Many differentially private algorithms for query release and synthetic data contain steps that reconstruct answers to queries from answers to other queries measured by the mechanism. Reconstruction is an important subproblem for such mechanisms to economize the privacy budget, minimize error on reconstructed answers, and allow for scalability to high-dimensional datasets. In this paper, we introduce a principled and efficient postprocessing method ReM (Residuals-to-Marginals) for reconstructing answers to marginal queries. Our method builds on recent work on efficient mechanisms for marginal query release, based on making measurements using a residual query basis that admits efficient pseudoinversion, which is an important primitive used in reconstruction. An extension GReM-LNN (Gaussian Residuals-to-Marginals with Local Non-negativity) reconstructs marginals under Gaussian noise satisfying consistency and non-negativity, which often reduces error on reconstructed answers. We demonstrate the utility of ReM and GReM-LNN by applying them to improve existing private query answering mechanisms: ResidualPlanner and MWEM.

Updated: 2024-10-01 21:39:28

标题: 高效且私密的局部非负边际重建

摘要: 差分隐私是正式和可量化隐私的主要标准，并已在影响数百万人的主要部署中使用。许多用于查询发布和合成数据的差分私有算法包含重建答案的步骤，这些答案来自通过机制测量的其他查询的答案。重建是这种机制的一个重要子问题，可以节省隐私预算，最小化重建答案上的误差，并实现对高维数据集的可扩展性。在本文中，我们介绍了一种基于原则和高效的后处理方法ReM（残差到边际），用于重建边际查询的答案。我们的方法建立在最近关于边际查询发布的高效机制的工作基础上，基于使用残差查询基础进行测量，该基础可以进行高效伪逆运算，这是重建中使用的一个重要基元。扩展GReM-LNN（具有局部非负性的高斯残差到边际）在满足一致性和非负性的高斯噪声下重建边际，这通常可以减少重建答案的误差。我们通过将ReM和GReM-LNN应用于改进现有的私有查询回答机制：ResidualPlanner和MWEM来展示它们的实用性。

更新时间: 2024-10-01 21:39:28

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.01091v1

Extracting Memorized Training Data via Decomposition

The widespread use of Large Language Models (LLMs) in society creates new information security challenges for developers, organizations, and end-users alike. LLMs are trained on large volumes of data, and their susceptibility to reveal the exact contents of the source training datasets poses security and safety risks. Although current alignment procedures restrict common risky behaviors, they do not completely prevent LLMs from leaking data. Prior work demonstrated that LLMs may be tricked into divulging training data by using out-of-distribution queries or adversarial techniques. In this paper, we demonstrate a simple, query-based decompositional method to extract news articles from two frontier LLMs. We use instruction decomposition techniques to incrementally extract fragments of training data. Out of 3723 New York Times articles, we extract at least one verbatim sentence from 73 articles, and over 20% of verbatim sentences from 6 articles. Our analysis demonstrates that this method successfully induces the LLM to generate texts that are reliable reproductions of news articles, meaning that they likely originate from the source training dataset. This method is simple, generalizable, and does not fine-tune or change the production model. If replicable at scale, this training data extraction methodology could expose new LLM security and safety vulnerabilities, including privacy risks and unauthorized data leaks. These implications require careful consideration from model development to its end-use.

Updated: 2024-10-01 21:34:42

标题: 通过分解提取记忆训练数据

摘要: 在社会中广泛使用大型语言模型（LLM）在开发人员、组织和最终用户中都带来了新的信息安全挑战。LLM是在大量数据上训练的，它们易于透露源培训数据集的确切内容，从而带来安全和安全风险。尽管当前的对齐程序限制了常见的风险行为，但并不能完全阻止LLM泄露数据。之前的研究表明，LLM可能会被利用到透露训练数据，使用超出分布查询或对抗性技术。在本文中，我们展示了一种简单的基于查询的分解方法，从两个前沿的LLM中提取新闻文章。我们使用指令分解技术逐步提取训练数据的片段。在3723篇《纽约时报》文章中，我们至少从73篇文章中提取了一句原文，从6篇文章中提取了超过20%的原文句子。我们的分析表明，这种方法成功地诱使LLM生成可靠地复制新闻文章的文本，这意味着它们很可能起源于源训练数据集。这种方法简单、可推广，并且不需要对生产模型进行微调或更改。如果在规模上可复制，这种训练数据提取方法可能暴露新的LLM安全性和安全性漏洞，包括隐私风险和未经授权的数据泄漏。这些影响需要从模型开发到最终使用中慎重考虑。

更新时间: 2024-10-01 21:34:42

领域: cs.LG,cs.AI,cs.CR

下载: http://arxiv.org/abs/2409.12367v2

Large Language Models Can Self-Improve At Web Agent Tasks

Training models to act as agents that can effectively navigate and perform actions in a complex environment, such as a web browser, has typically been challenging due to lack of training data. Large language models (LLMs) have recently demonstrated some capability to navigate novel environments as agents in a zero-shot or few-shot fashion, purely guided by natural language instructions as prompts. Recent research has also demonstrated LLMs have the capability to exceed their base performance through self-improvement, i.e. fine-tuning on data generated by the model itself. In this work, we explore the extent to which LLMs can self-improve their performance as agents in long-horizon tasks in a complex environment using the WebArena benchmark. In WebArena, an agent must autonomously navigate and perform actions on web pages to achieve a specified objective. We explore fine-tuning on three distinct synthetic training data mixtures and achieve a 31\% improvement in task completion rate over the base model on the WebArena benchmark through a self-improvement procedure. We additionally contribute novel evaluation metrics for assessing the performance, robustness, capabilities, and quality of trajectories of our fine-tuned agent models to a greater degree than simple, aggregate-level benchmark scores currently used to measure self-improvement.

Updated: 2024-10-01 21:28:29

标题: 大型语言模型可以在网络代理任务中进行自我改进

摘要: 训练模型以充当代理人，在复杂环境中有效导航和执行操作，例如 Web 浏览器，通常具有挑战性，因为缺乏训练数据。大型语言模型（LLMs）最近展示了一些能力，能够以零次或少次方式在新环境中作为代理人导航，纯粹受自然语言指令的引导。最近的研究还表明，LLMs 通过自我改进（即在模型自身生成的数据上微调）的能力可以超越其基本性能。在这项工作中，我们探讨了在复杂环境中使用 WebArena 基准测试自我改进其作为代理人在长期任务中的表现的LLMs的程度。在WebArena中，代理必须自主导航并在网页上执行操作以实现指定的目标。我们探讨了在三种不同的合成训练数据混合上微调，并通过自我改进程序在WebArena基准测试中实现了任务完成率比基本模型提高31％。我们还提供了新颖的评估指标，用于评估我们微调的代理模型的性能，鲁棒性，能力和轨迹质量，比目前用于衡量自我改进的简单、聚合级别基准分数更为详细。

更新时间: 2024-10-01 21:28:29

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2405.20309v2

Are Large Language Models Consistent over Value-laden Questions?

Large language models (LLMs) appear to bias their survey answers toward certain values. Nonetheless, some argue that LLMs are too inconsistent to simulate particular values. Are they? To answer, we first define value consistency as the similarity of answers across (1) paraphrases of one question, (2) related questions under one topic, (3) multiple-choice and open-ended use-cases of one question, and (4) multilingual translations of a question to English, Chinese, German, and Japanese. We apply these measures to small and large, open LLMs including llama-3, as well as gpt-4o, using 8,000 questions spanning more than 300 topics. Unlike prior work, we find that models are relatively consistent across paraphrases, use-cases, translations, and within a topic. Still, some inconsistencies remain. Models are more consistent on uncontroversial topics (e.g., in the U.S., "Thanksgiving") than on controversial ones ("euthanasia"). Base models are both more consistent compared to fine-tuned models and are uniform in their consistency across topics, while fine-tuned models are more inconsistent about some topics ("euthanasia") than others ("women's rights") like our human subjects (n=165).

Updated: 2024-10-01 21:23:18

标题: 大型语言模型在价值导向问题上一致吗？

摘要: 大型语言模型（LLMs）似乎会偏向某些价值观来回答调查问题。然而，有人认为LLMs太不一致，无法模拟特定价值观。它们是吗？为了回答这个问题，我们首先定义价值观一致性为（1）同一问题的释义之间的答案相似度，（2）同一主题下相关问题之间的答案相似度，（3）同一问题的多选和开放式用例之间的答案相似度，以及（4）将问题翻译成英文、中文、德文和日文的多语言翻译之间的答案相似度。我们将这些度量应用于小型和大型的开放式LLM，包括llama-3和gpt-4o，使用超过300个主题的8,000个问题。与之前的研究不同的是，我们发现模型在释义、用例、翻译以及同一主题下相对一致。然而，仍然存在一些不一致性。在美国，如“感恩节”等不太有争议的话题上，模型更加一致，而在有争议的话题上（如“安乐死”）则较不一致。基础模型相对于微调模型来说更一致，并且在不同主题上的一致性是统一的，而微调模型在某些话题（如“安乐死”）上比其他话题（如“妇女权利”）更不一致，就像我们的165个人类受试者一样。

更新时间: 2024-10-01 21:23:18

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2407.02996v2

Convergent Privacy Loss of Noisy-SGD without Convexity and Smoothness

We study the Differential Privacy (DP) guarantee of hidden-state Noisy-SGD algorithms over a bounded domain. Standard privacy analysis for Noisy-SGD assumes all internal states are revealed, which leads to a divergent R'enyi DP bound with respect to the number of iterations. Ye & Shokri (2022) and Altschuler & Talwar (2022) proved convergent bounds for smooth (strongly) convex losses, and raise open questions about whether these assumptions can be relaxed. We provide positive answers by proving convergent R'enyi DP bound for non-convex non-smooth losses, where we show that requiring losses to have H\"older continuous gradient is sufficient. We also provide a strictly better privacy bound compared to state-of-the-art results for smooth strongly convex losses. Our analysis relies on the improvement of shifted divergence analysis in multiple aspects, including forward Wasserstein distance tracking, identifying the optimal shifts allocation, and the H"older reduction lemma. Our results further elucidate the benefit of hidden-state analysis for DP and its applicability.

Updated: 2024-10-01 20:52:08

标题: 无凸性和光滑性的含噪SGD的隐私损失的收敛

摘要: 我们研究了在有界域上隐藏状态噪声SGD算法的差分隐私（DP）保证。标准的Noisy-SGD隐私分析假设所有内部状态都被揭示，这导致关于迭代次数的R'enyi DP边界发散。Ye＆Shokri（2022年）和Altschuler＆Talwar（2022年）证明了对于平滑（强）凸损失的收敛边界，并提出了关于这些假设是否可以放宽的开放问题。我们通过证明非凸非光滑损失的收敛R'enyi DP边界来提供积极答案，其中我们表明要求损失具有H\"older连续梯度是足够的。与针对平滑强凸损失的最新结果相比，我们还提供了严格更好的隐私边界。我们的分析依赖于改进的转移分析在多个方面，包括前向Wasserstein距离跟踪，确定最佳的转移分配以及H"older减少引理。我们的结果进一步阐明了对于DP的隐藏状态分析及其适用性的好处。

更新时间: 2024-10-01 20:52:08

领域: cs.LG,cs.CR

下载: http://arxiv.org/abs/2410.01068v1

From Natural Language to SQL: Review of LLM-based Text-to-SQL Systems

Since the onset of LLMs, translating natural language queries to structured SQL commands is assuming increasing. Unlike the previous reviews, this survey provides a comprehensive study of the evolution of LLM-based text-to-SQL systems, from early rule-based models to advanced LLM approaches, and how LLMs impacted this field. We discuss benchmarks, evaluation methods and evaluation metrics. Also, we uniquely study the role of integration of knowledge graphs for better contextual accuracy and schema linking in these systems. The current techniques fall into two categories: in-context learning of corpus and fine-tuning, which then leads to approaches such as zero-shot, few-shot learning from the end, and data augmentation. Finally, we highlight key challenges such as computational efficiency, model robustness, and data privacy with perspectives toward their development and improvements in potential areas for future of LLM-based text-to-SQL system.

Updated: 2024-10-01 20:46:25

标题: 从自然语言到SQL：基于LLM的文本到SQL系统综述

摘要: 自从LLM的出现以来，将自然语言查询转换为结构化SQL命令正在逐渐增加。与先前的评论不同，本调查提供了LLM文本到SQL系统演变的综合研究，从早期基于规则的模型到先进的LLM方法，以及LLM如何影响这一领域。我们讨论了基准、评估方法和评估指标。此外，我们独特地研究了知识图谱在这些系统中更好的语境准确性和模式链接的整合作用。当前的技术分为两类：语境学习和微调，然后导致零样本、少样本学习以及数据增强等方法。最后，我们强调了关键挑战，如计算效率、模型鲁棒性和数据隐私，展望它们的发展和改进的潜在领域，以及LLM文本到SQL系统未来的可能方向。

更新时间: 2024-10-01 20:46:25

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.01066v1

Truth or Deceit? A Bayesian Decoding Game Enhances Consistency and Reliability

Large Language Models (LLMs) often produce outputs that -- though plausible -- can lack consistency and reliability, particularly in ambiguous or complex scenarios. Challenges arise from ensuring that outputs align with both factual correctness and human intent. This is problematic in existing approaches that trade improved consistency for lower accuracy. To mitigate these challenges, we propose a novel game-theoretic approach to enhance consistency and reliability during the decoding stage of LLM output generation. Our method models the decoding process as a multistage Bayesian decoding game. This ensures consistency through Correctness Alignment and enhances reliability via Ambiguity Calibration. The model dynamically converges to a consensus on the most reliable outputs and distinguishes {Valid, Specious} outputs without human feedback or additional training. Our game design allows smaller models to outperform much larger models through game mechanisms (e.g., 78.1 LLaMA13B vs 76.6 PaLM540B), as well as integrating various LL strategies and models, demonstrating the potential of game-theoretic tools to improve the truthfulness and reliability of LLMs.

Updated: 2024-10-01 20:46:10

标题: 真相还是欺骗？贝叶斯解码游戏提升一致性和可靠性

摘要: 大型语言模型（LLMs）通常会产生输出，尽管看起来合理，但在模糊或复杂的场景中可能缺乏一致性和可靠性。挑战在于确保输出既与事实正确性一致，又符合人类意图。现有方法在提高一致性的同时牺牲了准确性，这是有问题的。为了减轻这些挑战，我们提出了一种新颖的博弈论方法，在LLM输出生成的解码阶段增强一致性和可靠性。我们的方法将解码过程建模为多阶段贝叶斯解码游戏。通过正确性对齐确保一致性，并通过模糊度校准增强可靠性。该模型动态收敛到最可靠的输出共识，并在没有人类反馈或额外训练的情况下区分{有效，虚假}的输出。我们的游戏设计允许较小的模型通过游戏机制（例如，78.1 LLaMA13B vs 76.6 PaLM540B）胜过更大的模型，并整合各种LL策略和模型，展示了博弈论工具改善LLMs真实性和可靠性的潜力。

更新时间: 2024-10-01 20:46:10

领域: cs.AI

下载: http://arxiv.org/abs/2410.01064v1

CA-BERT: Leveraging Context Awareness for Enhanced Multi-Turn Chat Interaction

Effective communication in automated chat systems hinges on the ability to understand and respond to context. Traditional models often struggle with determining when additional context is necessary for generating appropriate responses. This paper introduces Context-Aware BERT (CA-BERT), a transformer-based model specifically fine-tuned to address this challenge. CA-BERT innovatively applies deep learning techniques to discern context necessity in multi-turn chat interactions, enhancing both the relevance and accuracy of responses. We describe the development of CA-BERT, which adapts the robust architecture of BERT with a novel training regimen focused on a specialized dataset of chat dialogues. The model is evaluated on its ability to classify context necessity, demonstrating superior performance over baseline BERT models in terms of accuracy and efficiency. Furthermore, CA-BERT's implementation showcases significant reductions in training time and resource usage, making it feasible for real-time applications. The results indicate that CA-BERT can effectively enhance the functionality of chatbots by providing a nuanced understanding of context, thereby improving user experience and interaction quality in automated systems. This study not only advances the field of NLP in chat applications but also provides a framework for future research into context-sensitive AI developments.

Updated: 2024-10-01 20:45:26

标题: CA-BERT：利用上下文感知增强多轮对话交互

摘要: 自动聊天系统中的有效沟通取决于理解和回应上下文。传统模型通常很难确定生成适当响应时是否需要额外的上下文。本文介绍了Context-Aware BERT（CA-BERT），这是一个专门针对这一挑战进行微调的基于transformer的模型。CA-BERT创新地应用深度学习技术来识别多轮对话互动中的上下文必要性，提高了响应的相关性和准确性。我们描述了CA-BERT的开发过程，该模型通过将BERT的坚固结构与专门针对聊天对话的数据集的新颖训练方案相结合。该模型在能否对上下文必要性进行分类方面进行评估，表现出比基准BERT模型更高的准确性和效率。此外，CA-BERT的实现展示了培训时间和资源使用的显着减少，使其适用于实时应用。结果表明，CA-BERT可以通过提供对上下文的微妙理解有效增强聊天机器人的功能，从而改善自动系统中用户体验和互动质量。这项研究不仅推动了NLP在聊天应用中的发展，还为未来研究提供了一个关于上下文敏感AI发展的框架。

更新时间: 2024-10-01 20:45:26

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2409.13701v2

Logicbreaks: A Framework for Understanding Subversion of Rule-based Inference

We study how to subvert large language models (LLMs) from following prompt-specified rules. We model rule-following as inference in propositional Horn logic, a mathematical system in which rules have the form ``if $P$ and $Q$, then $R$'' for some propositions $P$, $Q$, and $R$. We prove that although LLMs can faithfully follow such rules, maliciously crafted prompts can mislead even idealized, theoretically constructed models. Empirically, we find that the reasoning behavior of LLMs aligns with that of our theoretical constructions, and popular attack algorithms find adversarial prompts with characteristics predicted by our theory. Our logic-based framework provides a novel perspective for mechanistically understanding the behavior of LLMs in rule-based settings such as jailbreak attacks.

Updated: 2024-10-01 20:42:41

标题: 逻辑破坏：理解基于规则推理的颠覆框架

摘要: 我们研究如何颠覆大型语言模型(LLMs)遵循特定提示的规则。我们将遵循规则建模为命题Horn逻辑中的推理，这是一个数学系统，其中规则的形式为“如果$P$和$Q$，则$R$”，其中$P$，$Q$和$R$是一些命题。我们证明，尽管LLMs可以忠实地遵循这些规则，但恶意制作的提示可以误导甚至是理想化的、在理论上构建的模型。在实证研究中，我们发现LLMs的推理行为与我们的理论构建一致，流行的攻击算法找到的对抗提示具有我们理论预测的特征。我们基于逻辑的框架为在基于规则的设置中理解LLMs行为提供了一种新的视角，如越狱攻击。

更新时间: 2024-10-01 20:42:41

领域: cs.AI,cs.CL,cs.CR,cs.LG

下载: http://arxiv.org/abs/2407.00075v2

Provably Secure Commitment-based Protocols over Unauthenticated Channels

In this work we construct an alternative Unauthenticated Model, intended to build a theoretic security framework to cover communications protocols whose characteristics may not always concur with the specifics of already existing models for authenticated exchanges. This model is constructed from the notion of commitment schemes, employing ephemeral information, therefore avoiding the exchange of long-term cryptographic material. From this model, we propose a number of Commitment-based protocols to establish a shared secret between two parties, and study their resistance over unauthenticated channels. This means analyzing the security of the protocol itself, and its robustness against Man-in-the-Middle attacks, by formalizing their security under this model. The key-exchange protocols are constructed from KEX and KEM primitives, to show that this model can be applied to both established and new paradigms. We highlight the differences that arise naturally, due to the nature of KEM constructions, in terms of the protocol itself and the types of attacks that they are subject to. We provide practical go-to protocols instances to migrate to, both for KEM-based and KEX-based cryptographic primitives.

Updated: 2024-10-01 20:41:38

标题: 在未认证通道上基于承诺的协议的可证安全性

摘要: 在这项工作中，我们构建了一种替代的未认证模型，旨在建立一个理论安全框架，覆盖通信协议的特征可能并不总是与已经存在的认证交换模型的具体要求一致。该模型是基于承诺方案的概念构建的，利用瞬时信息，因此避免了长期加密材料的交换。从这个模型中，我们提出了一些基于承诺的协议，用于在两个参与方之间建立共享秘密，并研究它们在未认证通道上的抵抗力。这意味着分析协议本身的安全性，以及它们对中间人攻击的抵抗力，通过在这个模型下形式化它们的安全性。密钥交换协议是从KEX和KEM原语构建的，以展示这个模型可以应用于已经建立的和新的范式。我们强调由于KEM构造的性质，协议本身和它们面临的攻击类型在自然情况下产生的差异。我们提供了实用的基于协议的实例，可用于迁移至基于KEM和KEX的加密原语。

更新时间: 2024-10-01 20:41:38

领域: cs.CR

下载: http://arxiv.org/abs/2307.15465v3

Simulation of Graph Algorithms with Looped Transformers

The execution of graph algorithms using neural networks has recently attracted significant interest due to promising empirical progress. This motivates further understanding of how neural networks can replicate reasoning steps with relational data. In this work, we study the ability of transformer networks to simulate algorithms on graphs from a theoretical perspective. The architecture we use is a looped transformer with extra attention heads that interact with the graph. We prove by construction that this architecture can simulate individual algorithms such as Dijkstra's shortest path, Breadth- and Depth-First Search, and Kosaraju's strongly connected components, as well as multiple algorithms simultaneously. The number of parameters in the networks does not increase with the input graph size, which implies that the networks can simulate the above algorithms for any graph. Despite this property, we show a limit to simulation in our solution due to finite precision. Finally, we show a Turing Completeness result with constant width when the extra attention heads are utilized.

Updated: 2024-10-01 20:30:37

标题: 使用循环变压器模拟图算法

摘要: 最近，使用神经网络执行图算法引起了很大的兴趣，因为有了令人期待的经验进展。这促使我们进一步了解神经网络如何能够复制与关系数据相关的推理步骤。在这项工作中，我们从理论角度研究了transformer网络模拟图算法的能力。我们使用的架构是一个带有额外注意力头的循环transformer，这些头与图交互。我们通过构建证明这种架构可以模拟个别算法，如Dijkstra的最短路径、广度和深度优先搜索，以及Kosaraju的强连接组件，以及同时模拟多个算法。网络中的参数数量不随输入图的大小增加，这意味着网络可以为任何图模拟上述算法。尽管具有这种属性，我们在解决方案中展示了由于有限精度而受到模拟的限制。最后，我们展示了当利用额外的注意力头时，具有恒定宽度的图灵完备性结果。

更新时间: 2024-10-01 20:30:37

领域: cs.LG,cs.AI,cs.DS

下载: http://arxiv.org/abs/2402.01107v3

Reasoning about the Unseen for Efficient Outdoor Object Navigation

Robots should exist anywhere humans do: indoors, outdoors, and even unmapped environments. In contrast, the focus of recent advancements in Object Goal Navigation(OGN) has targeted navigating in indoor environments by leveraging spatial and semantic cues that do not generalize outdoors. While these contributions provide valuable insights into indoor scenarios, the broader spectrum of real-world robotic applications often extends to outdoor settings. As we transition to the vast and complex terrains of outdoor environments, new challenges emerge. Unlike the structured layouts found indoors, outdoor environments lack clear spatial delineations and are riddled with inherent semantic ambiguities. Despite this, humans navigate with ease because we can reason about the unseen. We introduce a new task OUTDOOR, a new mechanism for Large Language Models (LLMs) to accurately hallucinate possible futures, and a new computationally aware success metric for pushing research forward in this more complex domain. Additionally, we show impressive results on both a simulated drone and physical quadruped in outdoor environments. Our agent has no premapping and our formalism outperforms naive LLM-based approaches

Updated: 2024-10-01 20:29:26

标题: 推理无人看到的东西以实现高效的户外物体导航

摘要: 机器人应该存在于人类所存在的任何地方：室内、室外，甚至未被绘制的环境中。相比之下，最近在目标导航(OGN)方面的进展集中在利用室内环境中的空间和语义线索进行导航，这些线索在室外环境中并不具备泛化能力。虽然这些贡献为室内场景提供了宝贵的见解，但更广泛的真实世界机器人应用往往延伸到室外设置。当我们转向室外环境的广阔和复杂地形时，新的挑战出现了。与室内结构化布局不同，室外环境缺乏明确的空间界线，并且充满了固有的语义歧义。尽管如此，人类可以轻松地导航，因为我们可以推理看不见的事物。我们引入了一个新任务OUTDOOR，一种用于大型语言模型(LLMs)准确幻想可能未来的新机制，以及一种新的计算感知成功度量标准，推动研究在这个更复杂的领域取得进展。此外，我们在模拟无人机和室外环境中的实体四足动物上展示了令人印象深刻的结果。我们的代理没有预映射，我们的形式主义胜过了朴素的基于LLM的方法。

更新时间: 2024-10-01 20:29:26

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2309.10103v2

Watch Your Steps: Observable and Modular Chains of Thought

We propose a variant of chain of thought (CoT) prompting called Program Trace Prompting that makes explanations more observable while preserving the power, generality and flexibility of CoT. In our approach, few-shot CoT demonstrations are wrapped in a formal syntax based on Python, and each prompt: identifies and names steps; defines the input/output behavior of steps; and replaces CoT explanations of in-context examples with chains of these formalized steps on the same examples. Program Trace Prompting is applicable to many tasks, achieving strong results on the 23 diverse tasks in the BIG-Bench Hard benchmark. More importantly, by instrumenting explanations in this way, we enable new types of analysis. In particular, we identify "non-local errors" (which correspond to incorrectly learning the reasoning method illustrated in the demonstrations) as an unaddressed issue in CoT learning, and we present methods for verifying the modularity of steps in a CoT explanation.

Updated: 2024-10-01 20:24:38

标题: 注意你的步伐：可观察和模块化的思维链。

摘要: 我们提出了一种被称为程序跟踪提示的思维链（CoT）提示的变体，该方法使解释更加可观察，同时保留了CoT的强大性、普适性和灵活性。在我们的方法中，少样本CoT演示被包装在基于Python的形式语法中，每个提示：标识和命名步骤；定义步骤的输入/输出行为；并用这些形式化步骤的链替换同一示例上的CoT解释。程序跟踪提示适用于许多任务，在BIG-Bench Hard基准测试中的23个不同任务上取得了很好的结果。更重要的是，通过以这种方式调节解释，我们使得新类型的分析成为可能。特别是，我们确定了“非局部错误”（对应于错误地学习示范中所展示的推理方法）作为CoT学习中未解决的问题，并提出了验证CoT解释中步骤的模块化性的方法。

更新时间: 2024-10-01 20:24:38

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2409.15359v2

Cookie Monster: Efficient On-device Budgeting for Differentially-Private Ad-Measurement Systems

With the impending removal of third-party cookies from major browsers and the introduction of new privacy-preserving advertising APIs, the research community has a timely opportunity to assist industry in qualitatively improving the Web's privacy. This paper discusses our efforts, within a W3C community group, to enhance existing privacy-preserving advertising measurement APIs. We analyze designs from Google, Apple, Meta and Mozilla, and augment them with a more rigorous and efficient differential privacy (DP) budgeting component. Our approach, called Cookie Monster, enforces well-defined DP guarantees and enables advertisers to conduct more private measurement queries accurately. By framing the privacy guarantee in terms of an individual form of DP, we can make DP budgeting more efficient than in current systems that use a traditional DP definition. We incorporate Cookie Monster into Chrome and evaluate it on microbenchmarks and advertising datasets. Across workloads, Cookie Monster significantly outperforms baselines in enabling more advertising measurements under comparable DP protection.

Updated: 2024-10-01 20:06:48

标题: 饼干怪兽：针对差分隐私广告测量系统的高效设备端预算管理

摘要: 随着主要浏览器即将移除第三方Cookie，并引入新的保护隐私的广告API，研究界有一个及时的机会来帮助行业在提高网络隐私方面取得质的改进。本文讨论了我们在W3C社区小组内的努力，以增强现有的保护隐私广告测量API。我们分析了来自谷歌、苹果、Meta和Mozilla的设计，并通过更严格和高效的差分隐私（DP）预算组件对其进行增强。我们的方法称为Cookie Monster，实施了明确定义的DP保证，使广告商能够准确地进行更私密的测量查询。通过将隐私保证框定为一种个体形式的DP，我们可以使DP预算比当前使用传统DP定义的系统更有效率。我们将Cookie Monster整合到Chrome中，并在微基准测试和广告数据集上进行评估。在不同工作负载下，Cookie Monster在提供相当的DP保护的同时显著优于基准，使更多广告测量成为可能。

更新时间: 2024-10-01 20:06:48

领域: cs.CR

下载: http://arxiv.org/abs/2405.16719v5

RATIONALYST: Pre-training Process-Supervision for Improving Reasoning

The reasoning steps generated by LLMs might be incomplete, as they mimic logical leaps common in everyday communication found in their pre-training data: underlying rationales are frequently left implicit (unstated). To address this challenge, we introduce RATIONALYST, a model for process-supervision of reasoning based on pre-training on a vast collection of rationale annotations extracted from unlabeled data. We extract 79k rationales from web-scale unlabelled dataset (the Pile) and a combination of reasoning datasets with minimal human intervention. This web-scale pre-training for reasoning allows RATIONALYST to consistently generalize across diverse reasoning tasks, including mathematical, commonsense, scientific, and logical reasoning. Fine-tuned from LLaMa-3-8B, RATIONALYST improves the accuracy of reasoning by an average of 3.9% on 7 representative reasoning benchmarks. It also demonstrates superior performance compared to significantly larger verifiers like GPT-4 and similarly sized models fine-tuned on matching training sets.

Updated: 2024-10-01 20:05:51

标题: RATIONALYST：用于改善推理的预训练过程监督

摘要: 由LLMs生成的推理步骤可能是不完整的，因为它们模仿了在它们的预训练数据中发现的日常交流中常见的逻辑跳跃：基本原因经常被留下隐含（未明示）。为了解决这一挑战，我们引入了RATIONALYST，这是一个基于预先训练的模型，用于处理从未标记数据中提取的大量基本原因注释。我们从网络规模的未标记数据集（Pile）和一些推理数据集中提取了79k个基本原因，并结合最少的人工干预。这种面向推理的网络规模预训练使RATIONALYST能够在各种推理任务中保持一致的泛化能力，包括数学推理、常识推理、科学推理和逻辑推理。通过从LLaMa-3-8B进行微调，RATIONALYST在7个代表性推理基准测试中平均提高了3.9%的推理准确性。与GPT-4等规模更大的验证器以及在匹配训练集上进行微调的同等规模模型相比，它还表现出更优越的性能。

更新时间: 2024-10-01 20:05:51

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2410.01044v1

Learning from Demonstration with Implicit Nonlinear Dynamics Models

Learning from Demonstration (LfD) is a useful paradigm for training policies that solve tasks involving complex motions, such as those encountered in robotic manipulation. In practice, the successful application of LfD requires overcoming error accumulation during policy execution, i.e. the problem of drift due to errors compounding over time and the consequent out-of-distribution behaviours. Existing works seek to address this problem through scaling data collection, correcting policy errors with a human-in-the-loop, temporally ensembling policy predictions or through learning a dynamical system model with convergence guarantees. In this work, we propose and validate an alternative approach to overcoming this issue. Inspired by reservoir computing, we develop a recurrent neural network layer that includes a fixed nonlinear dynamical system with tunable dynamical properties for modelling temporal dynamics. We validate the efficacy of our neural network layer on the task of reproducing human handwriting motions using the LASA Human Handwriting Dataset. Through empirical experiments we demonstrate that incorporating our layer into existing neural network architectures addresses the issue of compounding errors in LfD. Furthermore, we perform a comparative evaluation against existing approaches including a temporal ensemble of policy predictions and an Echo State Network (ESN) implementation. We find that our approach yields greater policy precision and robustness on the handwriting task while also generalising to multiple dynamics regimes and maintaining competitive latency scores.

Updated: 2024-10-01 20:05:35

标题: 学习演示与隐式非线性动力学模型

摘要: 学习演示（LfD）是一个有用的范例，用于训练解决涉及复杂运动的任务的策略，例如在机器人操作中遇到的任务。在实践中，成功应用LfD需要克服策略执行过程中的误差积累，即由于错误随时间累积导致的漂移问题以及随之而来的超出分布行为问题。现有的研究试图通过扩大数据收集、利用人机协作纠正策略错误、暂时集成策略预测或通过学习具有收敛保证的动态系统模型来解决这个问题。在这项工作中，我们提出并验证了一种克服这个问题的另类方法。受沉箱计算的启发，我们开发了一个包含具有可调动态属性的固定非线性动态系统的递归神经网络层，用于建模时间动态。我们验证了我们神经网络层在使用LASA人类手写数据集重现人类手写运动任务中的效力。通过实证实验，我们证明将我们的层结合到现有神经网络架构中可以解决LfD中错误累积的问题。此外，我们与现有方法进行了比较评估，包括策略预测的时间集成和Echo State Network（ESN）实现。我们发现我们的方法在手写任务上产生更高的策略精度和鲁棒性，同时也能推广到多个动态区域并保持竞争性的延迟分数。

更新时间: 2024-10-01 20:05:35

领域: cs.AI,cs.LG,cs.RO,cs.SY,eess.SY,I.2

下载: http://arxiv.org/abs/2409.18768v2

Reinforcement learning-assisted quantum architecture search for variational quantum algorithms

A significant hurdle in the noisy intermediate-scale quantum (NISQ) era is identifying functional quantum circuits. These circuits must also adhere to the constraints imposed by current quantum hardware limitations. Variational quantum algorithms (VQAs), a class of quantum-classical optimization algorithms, were developed to address these challenges in the currently available quantum devices. However, the overall performance of VQAs depends on the initialization strategy of the variational circuit, the structure of the circuit (also known as ansatz), and the configuration of the cost function. Focusing on the structure of the circuit, in this thesis, we improve the performance of VQAs by automating the search for an optimal structure for the variational circuits using reinforcement learning (RL). Within the thesis, the optimality of a circuit is determined by evaluating its depth, the overall count of gates and parameters, and its accuracy in solving the given problem. The task of automating the search for optimal quantum circuits is known as quantum architecture search (QAS). The majority of research in QAS is primarily focused on a noiseless scenario. Yet, the impact of noise on the QAS remains inadequately explored. In this thesis, we tackle the issue by introducing a tensor-based quantum circuit encoding, restrictions on environment dynamics to explore the search space of possible circuits efficiently, an episode halting scheme to steer the agent to find shorter circuits, a double deep Q-network (DDQN) with an $\epsilon$-greedy policy for better stability. The numerical experiments on noiseless and noisy quantum hardware show that in dealing with various VQAs, our RL-based QAS outperforms existing QAS. Meanwhile, the methods we propose in the thesis can be readily adapted to address a wide range of other VQAs.

Updated: 2024-10-01 19:58:40

标题: 强化学习辅助的变分量子算法量子架构搜索

摘要: 在嘈杂的中等规模量子（NISQ）时代，一个重要的障碍是识别功能性的量子电路。这些电路必须也遵守当前量子硬件限制所施加的约束。变分量子算法（VQA）是一类量子-经典优化算法，旨在解决当前可用量子设备中的挑战。然而，VQA的整体性能取决于变分电路的初始化策略、电路的结构（也被称为ansatz）和成本函数的配置。在本论文中，我们着重于电路的结构，通过使用强化学习（RL）自动搜索变分电路的最佳结构来改善VQA的性能。在论文中，电路的最佳性由评估其深度、门和参数的总数以及在解决给定问题上的准确性来确定。自动搜索最佳量子电路的任务被称为量子架构搜索（QAS）。目前，QAS的研究主要集中在无噪声的情况下。然而，噪声对QAS的影响尚未充分探讨。在本论文中，我们通过引入基于张量的量子电路编码、限制环境动态以有效地探索可能电路的搜索空间、一个用于引导代理找到更短电路的剧集停止方案、一个双深度Q网络（DDQN）和一个$\epsilon$-贪婪策略以提高稳定性来解决这一问题。对无噪声和有噪声的量子硬件进行的数值实验表明，在处理各种VQA时，我们基于RL的QAS胜过现有的QAS。同时，我们在论文中提出的方法可以轻松地适应解决各种其他VQA。

更新时间: 2024-10-01 19:58:40

领域: quant-ph,cs.AI,cs.LG

下载: http://arxiv.org/abs/2402.13754v4

MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages

The rise of foundation models (FMs), coupled with regulatory efforts addressing their risks and impacts, has sparked significant interest in open-source models. However, existing speech FMs (SFMs) fall short of full compliance with the open-source principles, even if claimed otherwise, as no existing SFM has model weights, code, and training data publicly available under open-source terms. In this work, we take the first step toward filling this gap by focusing on the 24 official languages of the European Union (EU). We collect suitable training data by surveying automatic speech recognition datasets and unlabeled speech corpora under open-source compliant licenses, for a total of 950k hours. Additionally, we release automatic transcripts for 441k hours of unlabeled data under the permissive CC-BY license, thereby facilitating the creation of open-source SFMs for the EU languages.

Updated: 2024-10-01 19:54:10

标题: MOSEL：950,000小时语音数据用于欧盟语言的开源语音基础模型训练

摘要: 基金会模型（FMs）的兴起，以及针对其风险和影响的监管努力，引发了对开源模型的重大兴趣。然而，现有的语音基金会模型（SFMs）在遵守开源原则方面存在不足，即使声称相反，因为没有一个现有的SFMs具有模型权重、代码和训练数据公开可用的开源条款。在这项工作中，我们首先着手填补这一空白，重点关注欧盟（EU）的24种官方语言。我们通过调查自动语音识别数据集和未标记语音语料库，收集适当的训练数据，这些数据符合开源许可，总共达到950k小时。此外，我们在宽松的CC-BY许可下发布了441k小时的未标记数据的自动转录，从而促进了为欧盟语言创建开源SFMs的进程。

更新时间: 2024-10-01 19:54:10

领域: cs.CL,cs.AI,cs.SD,eess.AS

下载: http://arxiv.org/abs/2410.01036v1

SQFT: Low-cost Model Adaptation in Low-precision Sparse Foundation Models

Large pre-trained models (LPMs), such as large language models, have become ubiquitous and are employed in many applications. These models are often adapted to a desired domain or downstream task through a fine-tuning stage. This paper proposes SQFT, an end-to-end solution for low-precision sparse parameter-efficient fine-tuning of LPMs, allowing for effective model manipulation in resource-constrained environments. Additionally, an innovative strategy enables the merging of sparse weights with low-rank adapters without losing sparsity and accuracy, overcoming the limitations of previous approaches. SQFT also addresses the challenge of having quantized weights and adapters with different numerical precisions, enabling merging in the desired numerical format without sacrificing accuracy. Multiple adaptation scenarios, models, and comprehensive sparsity levels demonstrate the effectiveness of SQFT. Models and code are available at https://github.com/IntelLabs/Hardware-Aware-Automated-Machine-Learning.

Updated: 2024-10-01 19:49:35

标题: SQFT: 低成本模型适应在低精度稀疏基础模型中

摘要: 大型预训练模型（LPMs），如大型语言模型，已经变得无处不在，并被应用于许多应用中。这些模型通常通过微调阶段来适应所需的领域或下游任务。本文提出了SQFT，一种用于低精度稀疏参数高效微调LPMs的端到端解决方案，允许在资源受限环境中进行有效的模型操作。此外，一种创新策略使得稀疏权重能够与低秩适配器合并而不丢失稀疏性和准确性，克服了以往方法的限制。SQFT还解决了具有不同数值精度的量化权重和适配器的挑战，使得能够在所需的数值格式中进行合并而不损失准确性。多种适应场景、模型和综合稀疏水平展示了SQFT的有效性。模型和代码可在https://github.com/IntelLabs/Hardware-Aware-Automated-Machine-Learning找到。

更新时间: 2024-10-01 19:49:35

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2410.03750v1

A Generalized Approach to Root-based Attacks against PLWE

The Polynomial Learning With Errors problem (PLWE) serves as the background of two of the three cryptosystems standardized in August 2024 by the National Institute of Standards and Technology to replace non-quantum resistant current primitives like those based on RSA, Diffie-Hellman or its elliptic curve analogue. Although PLWE is highly believed to be quantum resistant, this fact has not yet been established, contrariwise to other post-quantum proposals like multivariate and some code based ones. Moreover, several vulnerabilities have been encountered for a number of specific instances. In a search for more flexibility, it becomes fully relevant to study the robustness of PLWE based on other polynomials, not necessarily cyclotomic. In 2015, Elias et al found a good number of attacks based on different features of the roots of the polynomial. In the present work we present an overview of the approximations made against PLWE derived from this and subsequent works, along with several new attacks which refine those by Elias et al. exploiting the order of the trace of roots over finite extensions of the finite field under the three scenarios laid out by Elias et al., allowing to generalize the setting in which the attacks can be carried out.

Updated: 2024-10-01 19:25:04

标题: 一种针对PLWE的基于根的攻击的广义方法

摘要: The Polynomial Learning With Errors problem (PLWE) is the basis for two of the three cryptosystems standardized by the National Institute of Standards and Technology in August 2024. These systems are intended to replace current non-quantum resistant primitives such as RSA, Diffie-Hellman, and elliptic curve cryptography. While PLWE is believed to be quantum resistant, this has not been definitively proven, unlike other post-quantum proposals. Vulnerabilities have been found in specific instances of PLWE, prompting further study into the robustness of PLWE based on different types of polynomials, not just cyclotomic ones. In 2015, Elias et al identified attacks on PLWE based on characteristics of the polynomial's roots. This paper provides an overview of the approximations made against PLWE based on these attacks, as well as new attacks that build upon Elias et al.'s work. These new attacks exploit the order of the trace of roots over finite extensions of the finite field under different scenarios, allowing for a more generalized understanding of where attacks can be carried out.

更新时间: 2024-10-01 19:25:04

领域: cs.CR

下载: http://arxiv.org/abs/2410.01017v1

LLMs May Not Be Human-Level Players, But They Can Be Testers: Measuring Game Difficulty with LLM Agents

Recent advances in Large Language Models (LLMs) have demonstrated their potential as autonomous agents across various tasks. One emerging application is the use of LLMs in playing games. In this work, we explore a practical problem for the gaming industry: Can LLMs be used to measure game difficulty? We propose a general game-testing framework using LLM agents and test it on two widely played strategy games: Wordle and Slay the Spire. Our results reveal an interesting finding: although LLMs may not perform as well as the average human player, their performance, when guided by simple, generic prompting techniques, shows a statistically significant and strong correlation with difficulty indicated by human players. This suggests that LLMs could serve as effective agents for measuring game difficulty during the development process. Based on our experiments, we also outline general principles and guidelines for incorporating LLMs into the game testing process.

Updated: 2024-10-01 18:40:43

标题: LLMs可能不是人类水平的玩家，但它们可以成为测试者：使用LLM代理测量游戏难度

摘要: 最近对大型语言模型（LLMs）的进展表明它们在各种任务中作为自主代理的潜力。一个新兴的应用是在游戏中使用LLMs。在这项工作中，我们探索了游戏行业的一个实际问题：LLMs能否用于衡量游戏难度？我们提出了一个使用LLM代理进行游戏测试的通用框架，并在两款广受欢迎的策略游戏Wordle和Slay the Spire上进行了测试。我们的结果揭示了一个有趣的发现：尽管LLMs的表现可能不如平均人类玩家，但在简单、通用的提示技术指导下，它们的表现与人类玩家所指示的难度有着显著的统计相关性和强相关性。这表明LLMs在开发过程中可以作为测量游戏难度的有效代理。根据我们的实验，我们还概述了将LLMs纳入游戏测试过程的一般原则和指导方针。

更新时间: 2024-10-01 18:40:43

领域: cs.AI,cs.HC,cs.LG

下载: http://arxiv.org/abs/2410.02829v1

Distributed AI Platform for the 6G RAN

Cellular Radio Access Networks (RANs) are rapidly evolving towards 6G, driven by the need to reduce costs and introduce new revenue streams for operators and enterprises. In this context, AI emerges as a key enabler in solving complex RAN problems spanning both the management and application domains. Unfortunately, and despite the undeniable promise of AI, several practical challenges still remain, hindering the widespread adoption of AI applications in the RAN space. This article attempts to shed light to these challenges and argues that existing approaches in addressing them are inadequate for realizing the vision of a truly AI-native 6G network. Motivated by this lack of solutions, it proposes a generic distributed AI platform architecture, tailored to the needs of an AI-native RAN and discusses its alignment with ongoing standardization efforts.

Updated: 2024-10-01 18:35:25

标题: 分布式人工智能平台用于6G RAN

摘要: 移动通信网络（RANs）正迅速向6G发展，驱动力来自于降低成本和为运营商和企业引入新的收入流。在这种背景下，人工智能（AI）被视为解决复杂RAN问题的关键推动因素，涵盖管理和应用领域。尽管AI具有不可否认的潜力，但仍然存在几个实际挑战，阻碍了AI应用在RAN领域的广泛推广。本文试图探讨这些挑战，并认为目前的解决方法不足以实现真正的AI本地化6G网络的愿景。受到这种解决方案缺乏的启发，本文提出了一个通用的分布式AI平台架构，针对AI本地化RAN的需求，讨论了其与正在进行的标准化工作的对齐情况。

更新时间: 2024-10-01 18:35:25

领域: cs.NI,cs.AI

下载: http://arxiv.org/abs/2410.03747v1

Large Language Models and Games: A Survey and Roadmap

Recent years have seen an explosive increase in research on large language models (LLMs), and accompanying public engagement on the topic. While starting as a niche area within natural language processing, LLMs have shown remarkable potential across a broad range of applications and domains, including games. This paper surveys the current state of the art across the various applications of LLMs in and for games, and identifies the different roles LLMs can take within a game. Importantly, we discuss underexplored areas and promising directions for future uses of LLMs in games and we reconcile the potential and limitations of LLMs within the games domain. As the first comprehensive survey and roadmap at the intersection of LLMs and games, we are hopeful that this paper will serve as the basis for groundbreaking research and innovation in this exciting new field.

Updated: 2024-10-01 18:34:37

标题: 大型语言模型和游戏：调查和路线图

摘要: 近年来，关于大型语言模型（LLMs）的研究呈爆炸式增长，伴随着公众对该主题的积极参与。虽然起初是自然语言处理领域的一个小众领域，但LLMs已经展现出在广泛应用和领域中的显著潜力，包括游戏。本文调查了LLMs在游戏中及为游戏的各种应用的最新技术水平，并确定了LLMs在游戏中可以扮演的不同角色。重要的是，我们讨论了尚未充分开发的领域和LLMs在游戏中未来用途的有前途的方向，同时协调了LLMs在游戏领域内的潜力和限制。作为LLMs和游戏交叉领域的第一份全面调查和路线图，我们希望这篇论文将为这一令人兴奋的新领域的开创性研究和创新奠定基础。

更新时间: 2024-10-01 18:34:37

领域: cs.CL,cs.AI,cs.HC

下载: http://arxiv.org/abs/2402.18659v4

Robust Guided Diffusion for Offline Black-Box Optimization

Offline black-box optimization aims to maximize a black-box function using an offline dataset of designs and their measured properties. Two main approaches have emerged: the forward approach, which learns a mapping from input to its value, thereby acting as a proxy to guide optimization, and the inverse approach, which learns a mapping from value to input for conditional generation. (a) Although proxy-free~(classifier-free) diffusion shows promise in robustly modeling the inverse mapping, it lacks explicit guidance from proxies, essential for generating high-performance samples beyond the training distribution. Therefore, we propose \textit{proxy-enhanced sampling} which utilizes the explicit guidance from a trained proxy to bolster proxy-free diffusion with enhanced sampling control. (b) Yet, the trained proxy is susceptible to out-of-distribution issues. To address this, we devise the module \textit{diffusion-based proxy refinement}, which seamlessly integrates insights from proxy-free diffusion back into the proxy for refinement. To sum up, we propose \textit{\textbf{R}obust \textbf{G}uided \textbf{D}iffusion for Offline Black-box Optimization}~(\textbf{RGD}), combining the advantages of proxy~(explicit guidance) and proxy-free diffusion~(robustness) for effective conditional generation. RGD achieves state-of-the-art results on various design-bench tasks, underscoring its efficacy. Our code is at https://anonymous.4open.science/r/RGD-27A5/README.md.

Updated: 2024-10-01 18:14:25

标题: 离线黑盒优化的稳健引导扩散

摘要: 离线黑盒优化旨在最大化黑盒函数，利用设计及其测量特性的离线数据集。出现了两种主要方法：前向方法学习从输入到其值的映射，因此充当代理以指导优化，而逆向方法学习从值到输入的映射，用于条件生成。(a) 尽管无代理（无分类器）扩散显示了在稳健建模逆向映射方面的潜力，但缺乏来自代理的明确指导，这是生成超出训练分布高性能样本所必需的。因此，我们提出了\textit{代理增强采样}，利用训练代理的明确指导来增强无代理扩散的采样控制。(b) 然而，训练代理容易受到超出分布问题的影响。为了解决这个问题，我们设计了\textit{基于扩散的代理细化}模块，无缝地将无代理扩散的见解整合回代理进行改进。总之，我们提出了\textit{\textbf{R}obust \textbf{G}uided \textbf{D}iffusion for Offline Black-box Optimization}（\textbf{RGD}），结合了代理（明确指导）和无代理扩散（稳健性）的优势，用于有效的条件生成。RGD在各种设计基准任务上取得了最先进的结果，凸显了其有效性。我们的代码位于https://anonymous.4open.science/r/RGD-27A5/README.md。

更新时间: 2024-10-01 18:14:25

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.00983v1

Heterogeneous sound classification with the Broad Sound Taxonomy and Dataset

Automatic sound classification has a wide range of applications in machine listening, enabling context-aware sound processing and understanding. This paper explores methodologies for automatically classifying heterogeneous sounds characterized by high intra-class variability. Our study evaluates the classification task using the Broad Sound Taxonomy, a two-level taxonomy comprising 28 classes designed to cover a heterogeneous range of sounds with semantic distinctions tailored for practical user applications. We construct a dataset through manual annotation to ensure accuracy, diverse representation within each class and relevance in real-world scenarios. We compare a variety of both traditional and modern machine learning approaches to establish a baseline for the task of heterogeneous sound classification. We investigate the role of input features, specifically examining how acoustically derived sound representations compare to embeddings extracted with pre-trained deep neural networks that capture both acoustic and semantic information about sounds. Experimental results illustrate that audio embeddings encoding acoustic and semantic information achieve higher accuracy in the classification task. After careful analysis of classification errors, we identify some underlying reasons for failure and propose actions to mitigate them. The paper highlights the need for deeper exploration of all stages of classification, understanding the data and adopting methodologies capable of effectively handling data complexity and generalizing in real-world sound environments.

Updated: 2024-10-01 18:09:02

标题: 具有广义声音分类和数据集的异质声音分类

摘要: 自动声音分类在机器听觉中有着广泛的应用，可以实现上下文感知的声音处理和理解。本文探讨了针对高内类别变异性的异质声音自动分类的方法。我们的研究使用Broad Sound Taxonomy评估分类任务，这是一个包含28个类别的两级分类法，旨在覆盖具有语义区别的异质声音范围，适用于实际用户应用。我们通过手动注释构建数据集，以确保准确性、每个类别内的多样性表示和在实际场景中的相关性。我们比较了各种传统和现代机器学习方法，以建立异质声音分类任务的基线。我们研究了输入特征的作用，特别是检查声学衍生的声音表示与使用预先训练的深度神经网络提取的嵌入之间的比较，后者捕捉了有关声音的声学和语义信息。实验结果表明，编码声学和语义信息的音频嵌入在分类任务中实现了更高的准确性。通过仔细分析分类错误，我们确定了一些失败的潜在原因，并提出了缓解措施。本文强调了对分类的所有阶段的深入探索的需求，理解数据并采用能够有效处理数据复杂性并在实际声音环境中泛化的方法。

更新时间: 2024-10-01 18:09:02

领域: cs.SD,cs.AI,eess.AS

下载: http://arxiv.org/abs/2410.00980v1

Iteration of Thought: Leveraging Inner Dialogue for Autonomous Large Language Model Reasoning

Iterative human engagement is a common and effective means of leveraging the advanced language processing power of large language models (LLMs). Using well-structured prompts in a conversational manner, human users can effectively influence an LLM to develop more thoughtful and accurate responses. Motivated by this insight, we propose the Iteration of Thought (IoT) framework for enhancing LLM responses by generating "thought"-provoking prompts vis a vis an input query and the current iteration of an LLM's response. Unlike static or semi-static approaches, e.g. Chain of Thought (CoT) or Tree of Thoughts (ToT), IoT adapts its reasoning path dynamically, based on evolving context, and without generating alternate explorative thoughts which are ultimately discarded. The three components of the IoT framework are (1) an Inner Dialogue Agent (IDA) responsible for generating instructive, context-specific prompts; (2) an LLM Agent (LLMA) that processes these prompts to refine its responses; and (3) an iterative prompting loop that implements a conversation between the former two components. We introduce two variants of our framework: Autonomous Iteration of Thought (AIoT), where an LLM decides when to stop iterating, and Guided Iteration of Thought (GIoT), which always forces a fixed number iterations. We investigate the performance of IoT across various datasets, spanning complex reasoning tasks from the GPQA dataset, explorative problem-solving in Game of 24, puzzle solving in Mini Crosswords, and multi-hop question answering from the HotpotQA dataset. Our results show that IoT represents a viable paradigm for autonomous response refinement in LLMs, showcasing significant improvements over CoT and thereby enabling more adaptive and efficient reasoning systems that minimize human intervention.

Updated: 2024-10-01 17:50:25

标题: 思维的迭代：利用内部对话促进自主大型语言模型推理

摘要: 迭代式人类参与是利用大型语言模型（LLMs）的先进语言处理能力的常见和有效手段。通过以对话方式使用结构良好的提示，人类用户可以有效地影响LLM以制定更加深思熟虑和准确的回答。受此启发，我们提出了“思维的迭代”（IoT）框架，通过生成与输入查询和LLM当前迭代响应相关的引人深思的提示来增强LLM的回应。与静态或半静态方法（例如思维链（CoT）或思维树（ToT））不同，IoT根据不断发展的上下文动态地调整其推理路径，而不生成最终被丢弃的替代性探索思维。IoT框架的三个组成部分是：（1）负责生成指导性、上下文特定提示的内部对话代理（IDA）；（2）处理这些提示以完善其回应的LLM代理（LLMA）；以及（3）实施前两个组件之间对话的迭代提示循环。我们引入了我们框架的两个变体：自主思维迭代（AIoT），其中LLM决定何时停止迭代；以及引导思维迭代（GIoT），它始终强制进行固定数量的迭代。我们研究了IoT在各种数据集上的性能，涵盖了来自GPQA数据集的复杂推理任务、《24点游戏》中的探索性问题解决、《迷你填字游戏》中的拼图解决，以及来自HotpotQA数据集的多跳问题回答。我们的结果表明，IoT代表了LLMs中自主回应完善的可行范式，相比CoT有显著改进，从而实现更加自适应和高效的推理系统，最大程度地减少人类干预。

更新时间: 2024-10-01 17:50:25

领域: cs.CL,cs.AI,cs.LG,cs.MA

下载: http://arxiv.org/abs/2409.12618v2

AIPatient: Simulating Patients with EHRs and LLM Powered Agentic Workflow

Simulated patient systems play a crucial role in modern medical education and research, providing safe, integrative learning environments and enabling clinical decision-making simulations. Large Language Models (LLM) could advance simulated patient systems by replicating medical conditions and patient-doctor interactions with high fidelity and low cost. However, ensuring the effectiveness and trustworthiness of these systems remains a challenge, as they require a large, diverse, and precise patient knowledgebase, along with a robust and stable knowledge diffusion to users. Here, we developed AIPatient, an advanced simulated patient system with AIPatient Knowledge Graph (AIPatient KG) as the input and the Reasoning Retrieval-Augmented Generation (Reasoning RAG) agentic workflow as the generation backbone. AIPatient KG samples data from Electronic Health Records (EHRs) in the Medical Information Mart for Intensive Care (MIMIC)-III database, producing a clinically diverse and relevant cohort of 1,495 patients with high knowledgebase validity (F1 0.89). Reasoning RAG leverages six LLM powered agents spanning tasks including retrieval, KG query generation, abstraction, checker, rewrite, and summarization. This agentic framework reaches an overall accuracy of 94.15% in EHR-based medical Question Answering (QA), outperforming benchmarks that use either no agent or only partial agent integration. Our system also presents high readability (median Flesch Reading Ease 77.23; median Flesch Kincaid Grade 5.6), robustness (ANOVA F-value 0.6126, p>0.1), and stability (ANOVA F-value 0.782, p>0.1). The promising performance of the AIPatient system highlights its potential to support a wide range of applications, including medical education, model evaluation, and system integration.

Updated: 2024-10-01 17:49:00

标题: AIPatient: 使用电子健康记录和LLM动力代理工作流模拟患者

摘要: 模拟患者系统在现代医学教育和研究中发挥着关键作用，提供安全、整合的学习环境，并实现临床决策模拟。大型语言模型（LLM）可以通过高度保真和低成本的方式复制医疗条件和医生-患者互动，推进模拟患者系统的发展。然而，确保这些系统的有效性和可信度仍然是一个挑战，因为它们需要一个大、多样化和精确的患者知识库，以及向用户提供强大和稳定的知识传播。在这里，我们开发了AIPatient，这是一个先进的模拟患者系统，采用AIPatient Knowledge Graph（AIPatient KG）作为输入，采用推理检索增强生成（Reasoning RAG）代理工作流作为生成骨干。AIPatient KG从医疗信息管理系统（MIMIC）-III数据库中的电子健康记录（EHRs）中提取数据，生成了一个包含1,495名患者的临床多样化和相关的队列，具有高知识库有效性（F1 0.89）。Reasoning RAG利用六个由LLM驱动的代理，涵盖检索、KG查询生成、抽象、检查、重写和摘要等任务。这种代理框架在基于EHR的医学问答（QA）中达到了94.15%的整体准确率，优于使用没有代理或仅部分代理集成的基准。我们的系统还具有高可读性（中值Flesch阅读易度77.23；中值Flesch Kincaid等级5.6）、健壮性（ANOVA F值0.6126，p>0.1）和稳定性（ANOVA F值0.782，p>0.1）。AIPatient系统的良好表现突显了其支持广泛应用的潜力，包括医学教育、模型评估和系统集成。

更新时间: 2024-10-01 17:49:00

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2409.18924v2

Synthesizing Tight Privacy and Accuracy Bounds via Weighted Model Counting

Programmatically generating tight differential privacy (DP) bounds is a hard problem. Two core challenges are (1) finding expressive, compact, and efficient encodings of the distributions of DP algorithms, and (2) state space explosion stemming from the multiple quantifiers and relational properties of the DP definition. We address the first challenge by developing a method for tight privacy and accuracy bound synthesis using weighted model counting on binary decision diagrams, a state-of-the-art technique from the artificial intelligence and automated reasoning communities for exactly computing probability distributions. We address the second challenge by developing a framework for leveraging inherent symmetries in DP algorithms. Our solution benefits from ongoing research in probabilistic programming languages, allowing us to succinctly and expressively represent different DP algorithms with approachable language syntax that can be used by non-experts. We provide a detailed case study of our solution on the binary randomized response algorithm. We also evaluate an implementation of our solution using the Dice probabilistic programming language for the randomized response and truncated geometric above threshold algorithms. We compare to prior work on exact DP verification using Markov chain probabilistic model checking and the decision procedure DiPC. Very few existing works consider mechanized analysis of accuracy guarantees for DP algorithms. We additionally provide a detailed analysis using our technique for finding tight accuracy bounds for DP algorithms.

Updated: 2024-10-01 17:45:37

标题: 通过加权模型计数综合紧密的隐私和准确性边界

摘要: 以程序方式生成紧密的差分隐私（DP）边界是一个困难的问题。两个核心挑战是（1）找到表达丰富、紧凑和高效的DP算法分布编码，以及（2）由于DP定义中的多个量词和关系属性而产生的状态空间爆炸。我们通过使用二进制决策图上的加权模型计数来开发一种用于紧密隐私和准确性边界综合的方法，这是来自人工智能和自动推理社区的最先进技术，用于精确计算概率分布，解决了第一个挑战。我们通过开发一个利用DP算法中固有对称性的框架来解决第二个挑战。我们的解决方案受益于概率编程语言的持续研究，使我们能够用易于理解的语法简洁地和表现力地表示不同的DP算法，可供非专家使用。我们提供了关于二进制随机响应算法的详细案例研究。我们还使用Dice概率编程语言对随机响应和截断几何大于阈值算法的实现进行了评估。我们与使用马尔可夫链概率模型检查和决策程序DiPC进行精确DP验证的先前工作进行了比较。很少有现有的工作考虑对DP算法的准确性保证进行机械化分析。此外，我们还提供了使用我们的技术找到DP算法紧密准确性边界的详细分析。

更新时间: 2024-10-01 17:45:37

领域: cs.CR,cs.PL

下载: http://arxiv.org/abs/2402.16982v3

FLRT: Fluent Student-Teacher Redteaming

Many publicly available language models have been safety tuned to reduce the likelihood of toxic or liability-inducing text. To redteam or jailbreak these models for compliance with toxic requests, users and security analysts have developed adversarial prompting techniques. One attack method is to apply discrete optimization techniques to the prompt. However, the resulting attack strings are often gibberish text, easily filtered by defenders due to high measured perplexity, and may fail for unseen tasks and/or well-tuned models. In this work, we improve existing algorithms (primarily GCG and BEAST) to develop powerful and fluent attacks on safety-tuned models like Llama-2 and Phi-3. Our technique centers around a new distillation-based approach that encourages the victim model to emulate a toxified finetune, either in terms of output probabilities or internal activations. To encourage human-fluent attacks, we add a multi-model perplexity penalty and a repetition penalty to the objective. We also enhance optimizer strength by allowing token insertions, token swaps, and token deletions and by using longer attack sequences. The resulting process is able to reliably jailbreak the most difficult target models with prompts that appear similar to human-written prompts. On Advbench we achieve attack success rates $>93$% for Llama-2-7B, Llama-3-8B, and Vicuna-7B, while maintaining model-measured perplexity $<33$; we achieve $95$% attack success for Phi-3, though with higher perplexity. We also find a universally-optimized single fluent prompt that induces $>88$% compliance on previously unseen tasks across Llama-2-7B, Phi-3-mini and Vicuna-7B and transfers to other black-box models.

Updated: 2024-10-01 17:39:09

标题: FLRT：流利的学生-教师红队合作

摘要: 许多公开可用的语言模型已经进行了安全调整，以减少有毒或引发责任的文本的可能性。为了符合有毒请求，用户和安全分析师开发了对抗性提示技术，以对这些模型进行红队测试或越狱。一种攻击方法是将离散优化技术应用于提示。然而，由于高度测量的困惑度，结果攻击字符串通常是无意义的文本，容易被防御者过滤，并且可能在未见任务和/或调整良好的模型上失败。在这项工作中，我们改进了现有算法（主要是GCG和BEAST），以开发针对像Llama-2和Phi-3这样的安全调整模型的强大而流畅的攻击。我们的技术围绕一种新的基于蒸馏的方法展开，鼓励受害模型模仿有毒的微调，无论是在输出概率还是内部激活方面。为了鼓励人类流畅的攻击，我们在目标中添加了多模型困惑度惩罚和重复惩罚。我们还通过允许标记插入、标记交换和标记删除以及使用更长的攻击序列来增强优化器的强度。由此产生的过程能够可靠地越狱最困难的目标模型，使用看起来类似于人类编写的提示。在Advbench上，我们实现了对Llama-2-7B、Llama-3-8B和Vicuna-7B的攻击成功率超过93%，同时维持模型测量的困惑度低于33%；我们对Phi-3实现了95%的攻击成功率，尽管困惑度更高。我们还发现了一种通用优化的单个流畅提示，可以在Llama-2-7B、Phi-3-mini和Vicuna-7B上诱导超过88%的遵从度，并转移到其他黑匣子模型。

更新时间: 2024-10-01 17:39:09

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2407.17447v2

Measuring and Mitigating Bias for Tabular Datasets with Multiple Protected Attributes

Motivated by the recital (67) of the current corrigendum of the AI Act in the European Union, we propose and present measures and mitigation strategies for discrimination in tabular datasets. We specifically focus on datasets that contain multiple protected attributes, such as nationality, age, and sex. This makes measuring and mitigating bias more challenging, as many existing methods are designed for a single protected attribute. This paper comes with a twofold contribution: Firstly, new discrimination measures are introduced. These measures are categorized in our framework along with existing ones, guiding researchers and practitioners in choosing the right measure to assess the fairness of the underlying dataset. Secondly, a novel application of an existing bias mitigation method, FairDo, is presented. We show that this strategy can mitigate any type of discrimination, including intersectional discrimination, by transforming the dataset. By conducting experiments on real-world datasets (Adult, Bank, COMPAS), we demonstrate that de-biasing datasets with multiple protected attributes is possible. All transformed datasets show a reduction in discrimination, on average by 28%. Further, these datasets do not compromise any of the tested machine learning models' performances significantly compared to the original datasets. Conclusively, this study demonstrates the effectiveness of the mitigation strategy used and contributes to the ongoing discussion on the implementation of the European Union's AI Act.

Updated: 2024-10-01 17:39:02

标题: 测量和减轻具有多个受保护属性的表格数据集的偏见

摘要: 受欧盟AI法案当前勘误的影响，我们提出并展示了针对表格数据集中的歧视问题的措施和缓解策略。我们特别关注包含多个受保护属性（如国籍、年龄和性别）的数据集。这使得衡量和减轻偏见变得更具挑战性，因为许多现有方法是为单个受保护属性设计的。本文提出了双重贡献：首先，引入了新的歧视度量方法。这些度量方法在我们的框架中与现有的方法进行分类，指导研究人员和从业者选择正确的度量方法来评估底层数据集的公平性。其次，介绍了一个现有偏见缓解方法FairDo的新应用。我们展示这种策略可以通过转换数据集来减轻任何类型的歧视，包括交叉歧视。通过在真实数据集（成人、银行、COMPAS）上进行实验，我们证明了对具有多个受保护属性的数据集进行去偏见处理是可行的。所有转换后的数据集平均减少了28%的歧视。此外，与原始数据集相比，这些数据集不会显著损害任何经过测试的机器学习模型的性能。最后，本研究展示了所使用的缓解策略的有效性，并为有关欧盟AI法案实施的持续讨论做出了贡献。

更新时间: 2024-10-01 17:39:02

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.19300v3

The Gradient of Health Data Privacy

In the era of digital health and artificial intelligence, the management of patient data privacy has become increasingly complex, with significant implications for global health equity and patient trust. This paper introduces a novel "privacy gradient" approach to health data governance, offering a more nuanced and adaptive framework than traditional binary privacy models. Our multidimensional concept considers factors such as data sensitivity, stakeholder relationships, purpose of use, and temporal aspects, allowing for context-sensitive privacy protections. Through policy analyses, ethical considerations, and case studies spanning adolescent health, integrated care, and genomic research, we demonstrate how this approach can address critical privacy challenges in diverse healthcare settings worldwide. The privacy gradient model has the potential to enhance patient engagement, improve care coordination, and accelerate medical research while safeguarding individual privacy rights. We provide policy recommendations for implementing this approach, considering its impact on healthcare systems, research infrastructures, and global health initiatives. This work aims to inform policymakers, healthcare leaders, and digital health innovators, contributing to a more equitable, trustworthy, and effective global health data ecosystem in the digital age.

Updated: 2024-10-01 17:35:18

标题: 健康数据隐私的梯度

摘要: 在数字健康和人工智能时代，患者数据隐私管理变得越来越复杂，对全球卫生公平和患者信任具有重要影响。本文介绍了一种新颖的“隐私梯度”方法来进行健康数据治理，提供了比传统的二元隐私模型更为细致和灵活的框架。我们的多维概念考虑了数据敏感性、利益相关者关系、使用目的和时间因素等因素，允许进行与上下文相关的隐私保护。通过政策分析、伦理考虑和涵盖青少年健康、综合护理和基因组研究的案例研究，我们展示了这种方法如何能够解决全球各种医疗环境中的关键隐私挑战。隐私梯度模型有潜力增强患者参与、改善护理协调，并加速医学研究，同时保护个人隐私权。我们提供了实施这一方法的政策建议，考虑了其对医疗系统、研究基础设施和全球卫生倡议的影响。这项工作旨在为决策者、医疗领导者和数字健康创新者提供信息，为数字时代的全球卫生数据生态系统贡献更加公平、可信赖和有效的贡献。

更新时间: 2024-10-01 17:35:18

领域: cs.CY,cs.AI,cs.HC,q-bio.OT

下载: http://arxiv.org/abs/2410.00897v1

Paths to Equilibrium in Games

In multi-agent reinforcement learning (MARL) and game theory, agents repeatedly interact and revise their strategies as new data arrives, producing a sequence of strategy profiles. This paper studies sequences of strategies satisfying a pairwise constraint inspired by policy updating in reinforcement learning, where an agent who is best responding in one period does not switch its strategy in the next period. This constraint merely requires that optimizing agents do not switch strategies, but does not constrain the non-optimizing agents in any way, and thus allows for exploration. Sequences with this property are called satisficing paths, and arise naturally in many MARL algorithms. A fundamental question about strategic dynamics is such: for a given game and initial strategy profile, is it always possible to construct a satisficing path that terminates at an equilibrium? The resolution of this question has implications about the capabilities or limitations of a class of MARL algorithms. We answer this question in the affirmative for normal-form games. Our analysis reveals a counterintuitive insight that reward deteriorating strategic updates are key to driving play to equilibrium along a satisficing path.

Updated: 2024-10-01 17:33:13

标题: 博弈中的均衡路径

摘要: 在多智能体强化学习（MARL）和博弈论中，智能体反复互动并根据新数据调整策略，产生一系列策略配置。本文研究满足一种受强化学习中政策更新启发的成对约束的策略序列，其中在一个周期中表现最佳的智能体在下一个周期不会改变其策略。这种约束仅要求优化智能体不改变策略，但不以任何方式限制非优化智能体，因此允许探索。具有这种特性的序列被称为满意路径，并在许多MARL算法中自然出现。关于战略动态的一个基本问题是：对于一个给定的游戏和初始策略配置，是否总是可以构建一个在平衡点终止的满意路径？这个问题的解决对于一类MARL算法的能力或限制具有重要意义。我们对正态形式游戏的这个问题作出肯定回答。我们的分析揭示了一个违反直觉的见解，即奖励恶化的战略更新是推动游戏沿着满意路径走向平衡的关键。

更新时间: 2024-10-01 17:33:13

领域: cs.GT,cs.AI,cs.LG

下载: http://arxiv.org/abs/2403.18079v2

Conversational Complexity for Assessing Risk in Large Language Models

Large Language Models (LLMs) present a dual-use dilemma: they enable beneficial applications while harboring potential for harm, particularly through conversational interactions. Despite various safeguards, advanced LLMs remain vulnerable. A watershed case was Kevin Roose's notable conversation with Bing, which elicited harmful outputs after extended interaction. This contrasts with simpler early jailbreaks that produced similar content more easily, raising the question: How much conversational effort is needed to elicit harmful information from LLMs? We propose two measures: Conversational Length (CL), which quantifies the conversation length used to obtain a specific response, and Conversational Complexity (CC), defined as the Kolmogorov complexity of the user's instruction sequence leading to the response. To address the incomputability of Kolmogorov complexity, we approximate CC using a reference LLM to estimate the compressibility of user instructions. Applying this approach to a large red-teaming dataset, we perform a quantitative analysis examining the statistical distribution of harmful and harmless conversational lengths and complexities. Our empirical findings suggest that this distributional analysis and the minimisation of CC serve as valuable tools for understanding AI safety, offering insights into the accessibility of harmful information. This work establishes a foundation for a new perspective on LLM safety, centered around the algorithmic complexity of pathways to harm.

Updated: 2024-10-01 17:21:28

标题: 大语言模型中用于评估风险的对话复杂性

摘要: 大型语言模型（LLMs）存在双重用途困境：它们能够实现有益的应用，同时也潜藏着潜在的危害，特别是通过会话交互。尽管有各种各样的保障措施，先进的LLMs仍然容易受到攻击。一个具有标志性意义的案例是Kevin Roose与Bing之间的对话，经过长时间的互动后产生了有害的结果。这与早期简单的越狱行为形成鲜明对比，后者更容易产生类似的内容，这引发了一个问题：需要多少对话努力才能从LLMs获取有害信息？我们提出了两个度量标准：对话长度（CL），它量化了用于获得特定响应的对话长度，以及对话复杂度（CC），它定义为导致响应的用户指令序列的科尔莫哥罗夫复杂度。为了解决科尔莫哥罗夫复杂度的不可计算性，我们使用一个参考LLM来近似CC，以估计用户指令的可压缩性。将这种方法应用于大规模的红队测试数据集，我们进行了定量分析，检查有害和无害对话长度和复杂度的统计分布。我们的实证研究结果表明，这种分布分析和对CC的最小化作为理解AI安全性的有价值工具，为了解有害信息的可获取性提供了见解。这项工作为围绕通往危害的路径的算法复杂度建立了LLM安全性的新视角的基础。

更新时间: 2024-10-01 17:21:28

领域: cs.AI,cs.CL,cs.IT,math.IT

下载: http://arxiv.org/abs/2409.01247v2

GEMS: Generative Expert Metric System through Iterative Prompt Priming

Across domains, metrics and measurements are fundamental to identifying challenges, informing decisions, and resolving conflicts. Despite the abundance of data available in this information age, not only can it be challenging for a single expert to work across multi-disciplinary data, but non-experts can also find it unintuitive to create effective measures or transform theories into context-specific metrics that are chosen appropriately. This technical report addresses this challenge by examining software communities within large software corporations, where different measures are used as proxies to locate counterparts within the organization to transfer tacit knowledge. We propose a prompt-engineering framework inspired by neural activities, demonstrating that generative models can extract and summarize theories and perform basic reasoning, thereby transforming concepts into context-aware metrics to support software communities given software repository data. While this research zoomed in on software communities, we believe the framework's applicability extends across various fields, showcasing expert-theory-inspired metrics that aid in triaging complex challenges.

Updated: 2024-10-01 17:14:54

标题: GEMS：通过迭代提示引导的生成专家度量系统

摘要: 在各个领域，度量和测量是识别挑战、指导决策和解决冲突的基础。尽管在这个信息时代有大量的数据可用，但不仅对于单个专家来说跨学科数据可能具有挑战性，非专家也可能发现创造有效的度量或将理论转化为选择适当的上下文特定度量是不直观的。本技术报告通过研究大型软件公司内的软件社区来解决这一挑战，在这些社区中，不同的度量被用作代理以定位组织内的对应者来传递隐性知识。我们提出了一个受神经活动启发的即时工程框架，证明生成模型可以提取和总结理论并进行基本推理，从而将概念转化为支持软件社区的上下文感知度量，给定软件存储库数据。虽然这项研究聚焦于软件社区，但我们相信该框架的适用性跨越各个领域，展示了专家理论启发的度量，有助于解决复杂挑战。

更新时间: 2024-10-01 17:14:54

领域: cs.SE,cs.AI

下载: http://arxiv.org/abs/2410.00880v1

Empirical Perturbation Analysis of Linear System Solvers from a Data Poisoning Perspective

The perturbation analysis of linear solvers applied to systems arising broadly in machine learning settings -- for instance, when using linear regression models -- establishes an important perspective when reframing these analyses through the lens of a data poisoning attack. By analyzing solvers' responses to such attacks, this work aims to contribute to the development of more robust linear solvers and provide insights into poisoning attacks on linear solvers. In particular, we investigate how the errors in the input data will affect the fitting error and accuracy of the solution from a linear system-solving algorithm under perturbations common in adversarial attacks. We propose data perturbation through two distinct knowledge levels, developing a poisoning optimization and studying two methods of perturbation: Label-guided Perturbation (LP) and Unconditioning Perturbation (UP). Existing works mainly focus on deriving the worst-case perturbation bound from a theoretical perspective, and the analysis is often limited to specific kinds of linear system solvers. Under the circumstance that the data is intentionally perturbed -- as is the case with data poisoning -- we seek to understand how different kinds of solvers react to these perturbations, identifying those algorithms most impacted by different types of adversarial attacks.

Updated: 2024-10-01 17:14:05

标题: 从数据污染的角度对线性系统求解器进行的经验性扰动分析

摘要: 线性求解器的扰动分析应用于机器学习设置中广泛出现的系统，例如在使用线性回归模型时，通过数据中毒攻击的视角重新构建这些分析，建立了重要的视角。通过分析求解器对这种攻击的响应，本文旨在为更健壮的线性求解器的发展做出贡献，并深入了解线性求解器的中毒攻击。具体而言，我们研究了输入数据中的错误如何影响线性系统求解算法在对抗性攻击中常见的扰动下的拟合误差和解的准确性。我们提出了通过两种不同的知识水平进行数据扰动，开发了一种中毒优化并研究了两种扰动方法：标签引导扰动（LP）和非条件扰动（UP）。现有研究主要集中在从理论角度推导最坏情况的扰动界限，并且分析通常局限于特定类型的线性系统求解器。在数据被故意扰动的情况下，正如数据中毒的情况一样，我们试图了解不同类型的求解器如何对这些扰动做出反应，识别哪些算法受到不同类型对抗性攻击的影响最大。

更新时间: 2024-10-01 17:14:05

领域: cs.LG,cs.CR,cs.NA,math.NA

下载: http://arxiv.org/abs/2410.00878v1

Generative Expansion of Small Datasets: An Expansive Graph Approach

Limited data availability in machine learning significantly impacts performance and generalization. Traditional augmentation methods enhance moderately sufficient datasets. GANs struggle with convergence when generating diverse samples. Diffusion models, while effective, have high computational costs. We introduce an Expansive Synthesis model generating large-scale, information-rich datasets from minimal samples. It uses expander graph mappings and feature interpolation to preserve data distribution and feature relationships. The model leverages neural networks' non-linear latent space, captured by a Koopman operator, to create a linear feature space for dataset expansion. An autoencoder with self-attention layers and optimal transport refines distributional consistency. We validate by comparing classifiers trained on generated data to those trained on original datasets. Results show comparable performance, demonstrating the model's potential to augment training data effectively. This work advances data generation, addressing scarcity in machine learning applications.

Updated: 2024-10-01 17:12:57

标题: 小数据集的生成扩展：一种扩展图方法

摘要: 在机器学习中，有限的数据可用性显着影响了性能和泛化能力。传统的数据增强方法可以增强相对充足的数据集。生成对抗网络在生成多样化样本时存在收敛困难。扩散模型虽然有效，但计算成本高昂。我们引入了一个扩张综合模型，可以从最少的样本生成大规模、信息丰富的数据集。该模型利用扩展图映射和特征插值来保持数据分布和特征关系。该模型利用神经网络的非线性潜在空间，通过Koopman算子捕获线性特征空间，实现数据集扩展。一个具有自注意力层和最优传输的自编码器可以提高分布一致性。我们通过将在生成数据上训练的分类器与在原始数据集上训练的分类器进行比较来验证。结果显示出可比较的性能，表明该模型有效增强了训练数据的潜力。这项工作推进了数据生成，解决了机器学习应用中的稀缺性问题。

更新时间: 2024-10-01 17:12:57

领域: cs.LG,cs.CV,eess.IV

下载: http://arxiv.org/abs/2406.17238v2

Inference Optimization of Foundation Models on AI Accelerators

Powerful foundation models, including large language models (LLMs), with Transformer architectures have ushered in a new era of Generative AI across various industries. Industry and research community have witnessed a large number of new applications, based on those foundation models. Such applications include question and answer, customer services, image and video generation, and code completions, among others. However, as the number of model parameters reaches to hundreds of billions, their deployment incurs prohibitive inference costs and high latency in real-world scenarios. As a result, the demand for cost-effective and fast inference using AI accelerators is ever more higher. To this end, our tutorial offers a comprehensive discussion on complementary inference optimization techniques using AI accelerators. Beginning with an overview of basic Transformer architectures and deep learning system frameworks, we deep dive into system optimization techniques for fast and memory-efficient attention computations and discuss how they can be implemented efficiently on AI accelerators. Next, we describe architectural elements that are key for fast transformer inference. Finally, we examine various model compression and fast decoding strategies in the same context.

Updated: 2024-10-01 17:10:07

标题: 人工智能加速器上基础模型推理优化

摘要: 强大的基础模型，包括采用Transformer架构的大型语言模型（LLMs），已经在各个行业引入了生成式人工智能的新时代。工业界和研究社区见证了许多基于这些基础模型的新应用的出现。这些应用包括问答系统、客户服务、图像和视频生成以及代码补全等。然而，随着模型参数数量达到数百亿，它们在实际场景中的部署会导致高昂的推理成本和高延迟。因此，对于使用人工智能加速器进行成本效益高且快速推理的需求更加迫切。为此，我们的教程提供了关于使用人工智能加速器进行互补推理优化技术的全面讨论。从基本Transformer架构和深度学习系统框架的概述开始，我们深入探讨了用于快速和内存高效的注意力计算的系统优化技术，并讨论了它们如何可以在人工智能加速器上高效实现。接下来，我们描述了对于快速Transformer推理至关重要的架构元素。最后，我们在相同的背景下探讨了各种模型压缩和快速解码策略。

更新时间: 2024-10-01 17:10:07

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2407.09111v2

Do Music Generation Models Encode Music Theory?

Music foundation models possess impressive music generation capabilities. When people compose music, they may infuse their understanding of music into their work, by using notes and intervals to craft melodies, chords to build progressions, and tempo to create a rhythmic feel. To what extent is this true of music generation models? More specifically, are fundamental Western music theory concepts observable within the "inner workings" of these models? Recent work proposed leveraging latent audio representations from music generation models towards music information retrieval tasks (e.g. genre classification, emotion recognition), which suggests that high-level musical characteristics are encoded within these models. However, probing individual music theory concepts (e.g. tempo, pitch class, chord quality) remains under-explored. Thus, we introduce SynTheory, a synthetic MIDI and audio music theory dataset, consisting of tempos, time signatures, notes, intervals, scales, chords, and chord progressions concepts. We then propose a framework to probe for these music theory concepts in music foundation models (Jukebox and MusicGen) and assess how strongly they encode these concepts within their internal representations. Our findings suggest that music theory concepts are discernible within foundation models and that the degree to which they are detectable varies by model size and layer.

Updated: 2024-10-01 17:06:30

标题: 音乐生成模型是否编码音乐理论？

摘要: 音乐基础模型具有令人印象深刻的音乐生成能力。当人们创作音乐时，他们可能会将对音乐的理解融入到自己的作品中，通过使用音符和音程来构建旋律，和弦来建立和弦进行，以及节奏来创造节奏感。音乐生成模型在多大程度上符合这一点？更具体地说，在这些模型的“内部运作”中是否可以观察到基本的西方音乐理论概念？最近的研究提出利用音乐生成模型中的潜在音频表示进行音乐信息检索任务（例如流派分类、情感识别），这表明这些模型中编码了高级音乐特征。然而，对个别音乐理论概念（例如节奏、音高类别、和弦品质）的探究仍未被充分探讨。因此，我们引入了SynTheory，一个合成的MIDI和音频音乐理论数据集，包括节奏、拍号、音符、音程、音阶、和弦以及和弦进行的概念。然后，我们提出了一个框架，用于探究这些音乐理论概念在音乐基础模型（Jukebox和MusicGen）中的表现，并评估它们在内部表示中编码这些概念的程度。我们的研究结果表明，音乐理论概念在基础模型中是可辨识的，并且它们的可检测程度因模型大小和层次而异。

更新时间: 2024-10-01 17:06:30

领域: cs.SD,cs.AI,cs.CL,cs.LG,eess.AS

下载: http://arxiv.org/abs/2410.00872v1

MAP: Unleashing Hybrid Mamba-Transformer Vision Backbone's Potential with Masked Autoregressive Pretraining

Mamba has achieved significant advantages in long-context modeling and autoregressive tasks, but its scalability with large parameters remains a major limitation in vision applications. pretraining is a widely used strategy to enhance backbone model performance. Although the success of Masked Autoencoder in Transformer pretraining is well recognized, it does not significantly improve Mamba's visual learning performance. We found that using the correct autoregressive pretraining can significantly boost the performance of the Mamba architecture. Based on this analysis, we propose Masked Autoregressive Pretraining (MAP) to pretrain a hybrid Mamba-Transformer vision backbone network. This strategy combines the strengths of both MAE and Autoregressive pretraining, improving the performance of Mamba and Transformer modules within a unified paradigm. Additionally, in terms of integrating Mamba and Transformer modules, we empirically found that inserting Transformer layers at regular intervals within Mamba layers can significantly enhance downstream task performance. Experimental results show that both the pure Mamba architecture and the hybrid Mamba-Transformer vision backbone network pretrained with MAP significantly outperform other pretraining strategies, achieving state-of-the-art performance. We validate the effectiveness of the method on both 2D and 3D datasets and provide detailed ablation studies to support the design choices for each component.

Updated: 2024-10-01 17:05:08

标题: MAP: 通过掩码自回归预训练释放混合曼巴-变压器视觉骨干的潜力

摘要: 曼巴在长上下文建模和自回归任务方面取得了显著优势，但其在大参数下的可扩展性仍然是视觉应用中的一个主要限制。预训练是增强骨干模型性能的常用策略。尽管掩蔽自编码器在Transformer预训练中的成功被广泛认可，但并不能显著提高曼巴的视觉学习性能。我们发现使用正确的自回归预训练可以显著提升曼巴架构的性能。基于这一分析，我们提出了掩蔽自回归预训练（MAP）来预训练混合曼巴-Transformer视觉骨干网络。这一策略结合了MAE和自回归预训练的优势，提升了曼巴和Transformer模块在一个统一范式内的性能。此外，就整合曼巴和Transformer模块而言，我们经验性地发现在曼巴层内定期插入Transformer层可以显著提升下游任务性能。实验结果显示，纯曼巴架构和通过MAP预训练的混合曼巴-Transformer视觉骨干网络在性能上明显优于其他预训练策略，实现了最先进的性能。我们验证了该方法在2D和3D数据集上的有效性，并提供详细的消融研究来支持每个组件的设计选择。

更新时间: 2024-10-01 17:05:08

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2410.00871v1

PyRIT: A Framework for Security Risk Identification and Red Teaming in Generative AI System

Generative Artificial Intelligence (GenAI) is becoming ubiquitous in our daily lives. The increase in computational power and data availability has led to a proliferation of both single- and multi-modal models. As the GenAI ecosystem matures, the need for extensible and model-agnostic risk identification frameworks is growing. To meet this need, we introduce the Python Risk Identification Toolkit (PyRIT), an open-source framework designed to enhance red teaming efforts in GenAI systems. PyRIT is a model- and platform-agnostic tool that enables red teamers to probe for and identify novel harms, risks, and jailbreaks in multimodal generative AI models. Its composable architecture facilitates the reuse of core building blocks and allows for extensibility to future models and modalities. This paper details the challenges specific to red teaming generative AI systems, the development and features of PyRIT, and its practical applications in real-world scenarios.

Updated: 2024-10-01 17:00:59

标题: PyRIT：生成式人工智能系统中安全风险识别和红队行动的框架

摘要: 生成人工智能（GenAI）正在成为我们日常生活中无处不在的存在。计算能力的增加和数据的可用性导致单模和多模型的激增。随着GenAI生态系统的成熟，对可扩展和模型不可知的风险识别框架的需求正在增长。为了满足这一需求，我们介绍了Python风险识别工具包（PyRIT），这是一个旨在增强GenAI系统中红队工作的开源框架。 PyRIT是一个模型和平台不可知的工具，使红队人员能够探索和识别多模生成人工智能模型中的新伤害、风险和越狱。其可组合的架构促进了核心构建块的重复使用，并允许将来模型和模态的可扩展性。本文详细介绍了红队测试生成人工智能系统所面临的挑战，PyRIT的开发和特性，以及其在现实场景中的实际应用。

更新时间: 2024-10-01 17:00:59

领域: cs.CR,cs.AI,cs.CL

下载: http://arxiv.org/abs/2410.02828v1

Timber! Poisoning Decision Trees

We present Timber, the first white-box poisoning attack targeting decision trees. Timber is based on a greedy attack strategy leveraging sub-tree retraining to efficiently estimate the damage performed by poisoning a given training instance. The attack relies on a tree annotation procedure which enables sorting training instances so that they are processed in increasing order of computational cost of sub-tree retraining. This sorting yields a variant of Timber supporting an early stopping criterion designed to make poisoning attacks more efficient and feasible on larger datasets. We also discuss an extension of Timber to traditional random forest models, which is useful because decision trees are normally combined into ensembles to improve their predictive power. Our experimental evaluation on public datasets shows that our attacks outperform existing baselines in terms of effectiveness, efficiency or both. Moreover, we show that two representative defenses can mitigate the effect of our attacks, but fail at effectively thwarting them.

Updated: 2024-10-01 16:58:54

标题: 木材！毒害决策树

摘要: 我们提出了Timber，这是针对决策树的第一种白盒中毒攻击。Timber基于一种贪婪攻击策略，利用子树重新训练高效地估计中毒给定训练实例造成的损害。该攻击依赖于一种树注释过程，使训练实例按照子树重新训练的计算成本递增的顺序进行处理。这种排序产生了Timber的一个变体，支持早停止准则，旨在使中毒攻击在更大的数据集上更加高效和可行。我们还讨论了将Timber扩展到传统随机森林模型的方法，这很有用，因为决策树通常被组合成集成以提高它们的预测能力。我们在公共数据集上进行的实验评估显示，我们的攻击在效果、效率或两者方面均优于现有基线。此外，我们展示了两种代表性的防御方法可以减轻我们的攻击带来的影响，但无法有效地挫败它们。

更新时间: 2024-10-01 16:58:54

领域: cs.LG,cs.CR,stat.ML

下载: http://arxiv.org/abs/2410.00862v1

LAViTeR: Learning Aligned Visual and Textual Representations Assisted by Image and Caption Generation

Pre-training visual and textual representations from large-scale image-text pairs is becoming a standard approach for many downstream vision-language tasks. The transformer-based models learn inter and intra-modal attention through a list of self-supervised learning tasks. This paper proposes LAViTeR, a novel architecture for visual and textual representation learning. The main module, Visual Textual Alignment (VTA) will be assisted by two auxiliary tasks, GAN-based image synthesis and Image Captioning. We also propose a new evaluation metric measuring the similarity between the learnt visual and textual embedding. The experimental results on two public datasets, CUB and MS-COCO, demonstrate superior visual and textual representation alignment in the joint feature embedding space

Updated: 2024-10-01 16:54:57

标题: LAViTeR：通过图像和标题生成辅助学习对齐的视觉和文本表示

摘要: 从大规模图像文本对预训练视觉和文本表示正变成许多下游视觉语言任务的标准方法。基于Transformer的模型通过一系列自监督学习任务学习跨模态和内模态关注。本文提出了LAViTeR，一种用于视觉和文本表示学习的新型架构。主要模块Visual Textual Alignment（VTA）将由两个辅助任务，基于GAN的图像合成和图像字幕，辅助完成。我们还提出了一种衡量学习视觉和文本嵌入之间相似性的新评估指标。在两个公共数据集CUB和MS-COCO上的实验结果表明，在联合特征嵌入空间中表现出卓越的视觉和文本表示对齐。

更新时间: 2024-10-01 16:54:57

领域: cs.CV,cs.AI,cs.CL

下载: http://arxiv.org/abs/2109.04993v4

PITCH: AI-assisted Tagging of Deepfake Audio Calls using Challenge-Response

The rise of AI voice-cloning technology, particularly audio Real-time Deepfakes (RTDFs), has intensified social engineering attacks by enabling real-time voice impersonation that bypasses conventional enrollment-based authentication. To address this, we propose PITCH, a robust challenge-response method to detect and tag interactive deepfake audio calls. We developed a comprehensive taxonomy of audio challenges based on the human auditory system, linguistics, and environmental factors, yielding 20 prospective challenges. These were tested against leading voice-cloning systems using a novel dataset comprising 18,600 original and 1.6 million deepfake samples from 100 users. PITCH's prospective challenges enhanced machine detection capabilities to 88.7% AUROC score on the full unbalanced dataset, enabling us to shortlist 10 functional challenges that balance security and usability. For human evaluation and subsequent analyses, we filtered a challenging, balanced subset. On this subset, human evaluators independently scored 72.6% accuracy, while machines achieved 87.7%. Acknowledging that call environments require higher human control, we aided call receivers in making decisions with them using machines. Our solution uses an early warning system to tag suspicious incoming calls as "Deepfake-likely." Contrary to prior findings, we discovered that integrating human intuition with machine precision offers complementary advantages. Our solution gave users maximum control and boosted detection accuracy to 84.5%. Evidenced by this jump in accuracy, PITCH demonstrated the potential for AI-assisted pre-screening in call verification processes, offering an adaptable and usable approach to combat real-time voice-cloning attacks. Code to reproduce and access data at \url{https://github.com/mittalgovind/PITCH-Deepfakes}.

Updated: 2024-10-01 16:54:49

标题: 标题翻译：利用挑战-响应技术辅助AI标记深度伪造音频通话

摘要: 人工智能语音克隆技术的崛起，特别是音频实时Deepfakes（RTDFs），通过实时语音冒充绕过传统的基于注册的身份验证，加剧了社会工程攻击。为了解决这个问题，我们提出了PITCH，一种强大的挑战-响应方法，用于检测和标记交互式深度伪造音频通话。我们基于人类听觉系统、语言学和环境因素开发了一套全面的音频挑战分类法，得出了20个潜在的挑战。这些挑战通过一个新数据集对领先的语音克隆系统进行了测试，该数据集包括来自100个用户的18,600个原始样本和1.6百万个深度伪造样本。PITCH的潜在挑战将机器检测能力提高到了88.7%的AUROC分数，从而使我们能够列出10个平衡安全性和可用性的功能性挑战。为了进行人类评估和随后的分析，我们筛选了一个具有挑战性、平衡的子集。在这个子集上，人类评估者独立评分准确率为72.6%，而机器则达到了87.7%。鉴于通话环境需要更高的人类控制，我们帮助通话接收者使用机器做出决策。我们的解决方案使用早期警告系统将可疑的来电标记为“深度伪造可能”。与先前的研究结果相反，我们发现将人类直觉与机器精度整合提供了互补优势。我们的解决方案为用户提供了最大的控制权，并将检测准确率提高到了84.5%。通过这一准确率的提升，PITCH展示了AI辅助预筛选在通话验证过程中的潜力，提供了一种适应性和可用性强的方法来对抗实时语音克隆攻击。可在\url{https://github.com/mittalgovind/PITCH-Deepfakes}上复制和获取数据的代码。

更新时间: 2024-10-01 16:54:49

领域: cs.SD,cs.CR,eess.AS

下载: http://arxiv.org/abs/2402.18085v3

Enhancing Web Spam Detection through a Blockchain-Enabled Crowdsourcing Mechanism

The proliferation of spam on the Web has necessitated the development of machine learning models to automate their detection. However, the dynamic nature of spam and the sophisticated evasion techniques employed by spammers often lead to low accuracy in these models. Traditional machine-learning approaches struggle to keep pace with spammers' constantly evolving tactics, resulting in a persistent challenge to maintain high detection rates. To address this, we propose blockchain-enabled incentivized crowdsourcing as a novel solution to enhance spam detection systems. We create an incentive mechanism for data collection and labeling by leveraging blockchain's decentralized and transparent framework. Contributors are rewarded for accurate labels and penalized for inaccuracies, ensuring high-quality data. A smart contract governs the submission and evaluation process, with participants staking cryptocurrency as collateral to guarantee integrity. Simulations show that incentivized crowdsourcing improves data quality, leading to more effective machine-learning models for spam detection. This approach offers a scalable and adaptable solution to the challenges of traditional methods.

Updated: 2024-10-01 16:53:42

标题: 通过区块链技术增强网络垃圾邮件检测的众包机制

摘要: 网络垃圾邮件的泛滥促使了机器学习模型的发展，以自动化检测。然而，垃圾邮件的动态性和黑客常用的复杂规避技术往往导致这些模型的准确性较低。传统的机器学习方法难以跟上黑客不断演变的战术，导致保持高检测率一直是一个持续的挑战。为了解决这个问题，我们提出了利用区块链启用的激励众包作为增强垃圾邮件检测系统的新颖解决方案。我们通过利用区块链的去中心化和透明框架，为数据采集和标记创建了激励机制。贡献者将因准确标记而获得奖励，因不准确标记而受到惩罚，确保高质量的数据。智能合约管理提交和评估过程，参与者抵押加密货币以保证完整性。模拟显示，激励众包提高了数据质量，从而为垃圾邮件检测提供了更有效的机器学习模型。这种方法提供了一种可扩展和适应性强的解决方案，以解决传统方法的挑战。

更新时间: 2024-10-01 16:53:42

领域: cs.CR,cs.SI

下载: http://arxiv.org/abs/2410.00860v1

BarraCUDA: Edge GPUs do Leak DNN Weights

Over the last decade, applications of neural networks (NNs) have spread to various aspects of our lives. A large number of companies base their businesses on building products that use neural networks for tasks such as face recognition, machine translation, and self-driving cars. Much of the intellectual property underpinning these products is encoded in the exact parameters of the neural networks. Consequently, protecting these is of utmost priority to businesses. At the same time, many of these products need to operate under a strong threat model, in which the adversary has unfettered physical control of the product. In this work, we present BarraCUDA, a novel attack on general purpose Graphic Processing Units (GPUs) that can extract parameters of neural networks running on the popular Nvidia Jetson Nano device. BarraCUDA uses correlation electromagnetic analysis to recover parameters of real-world convolutional neural networks.

Updated: 2024-10-01 16:50:38

标题: BarraCUDA：边缘GPU泄漏DNN权重

摘要: 在过去的十年里，神经网络（NNs）的应用已经扩展到我们生活的各个方面。许多公司以构建使用神经网络执行任务的产品为基础，例如人脸识别、机器翻译和自动驾驶汽车。支撑这些产品的大部分知识产权编码在神经网络的确切参数中。因此，保护这些参数对企业至关重要。同时，许多这些产品需要在强大的威胁模型下运行，其中对手可以无限制地控制产品。在这项工作中，我们提出了BarraCUDA，这是一种对通用图形处理单元（GPUs）的新型攻击，可以提取运行在流行的Nvidia Jetson Nano设备上的神经网络的参数。BarraCUDA使用相关电磁分析来恢复真实世界卷积神经网络的参数。

更新时间: 2024-10-01 16:50:38

领域: cs.CR

下载: http://arxiv.org/abs/2312.07783v3

Dual-Space Knowledge Distillation for Large Language Models

Knowledge distillation (KD) is known as a promising solution to compress large language models (LLMs) via transferring their knowledge to smaller models. During this process, white-box KD methods usually minimize the distance between the output distributions of the two models so that more knowledge can be transferred. However, in the current white-box KD framework, the output distributions are from the respective output spaces of the two models, using their own prediction heads. We argue that the space discrepancy will lead to low similarity between the teacher model and the student model on both representation and distribution levels. Furthermore, this discrepancy also hinders the KD process between models with different vocabularies, which is common for current LLMs. To address these issues, we propose a dual-space knowledge distillation (DSKD) framework that unifies the output spaces of the two models for KD. On the basis of DSKD, we further develop a cross-model attention mechanism, which can automatically align the representations of the two models with different vocabularies. Thus, our framework is not only compatible with various distance functions for KD (e.g., KL divergence) like the current framework, but also supports KD between any two LLMs regardless of their vocabularies. Experiments on task-agnostic instruction-following benchmarks show that DSKD significantly outperforms the current white-box KD framework with various distance functions, and also surpasses existing KD methods for LLMs with different vocabularies.

Updated: 2024-10-01 16:45:12

标题: 大型语言模型的双重空间知识蒸馏

摘要: 知识蒸馏（KD）被认为是一种将大型语言模型（LLMs）压缩的有希望的解决方案，通过将它们的知识转移给较小的模型。在这个过程中，白盒KD方法通常通过最小化两个模型的输出分布之间的距离来转移更多的知识。然而，在当前的白盒KD框架中，输出分布来自两个模型的各自输出空间，使用它们自己的预测头。我们认为，这种空间差异将导致教师模型和学生模型在表示和分布级别上相似度较低。此外，这种差异也阻碍了具有不同词汇表的模型之间的KD过程，这对当前的LLMs来说是常见的。为了解决这些问题，我们提出了一个双空间知识蒸馏（DSKD）框架，统一了两个模型的输出空间用于KD。基于DSKD，我们进一步开发了一个跨模型注意机制，可以自动对齐具有不同词汇表的两个模型的表示。因此，我们的框架不仅与当前框架一样兼容各种KD距离函数（例如KL散度），而且还支持任意两个LLMs之间的KD，而不考虑它们的词汇表。对任务无关的指令遵循基准的实验表明，DSKD在各种距离函数下明显优于当前的白盒KD框架，并且也超过了现有的针对具有不同词汇表的LLMs的KD方法。

更新时间: 2024-10-01 16:45:12

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2406.17328v3

What is the Role of Large Language Models in the Evolution of Astronomy Research?

ChatGPT and other state-of-the-art large language models (LLMs) are rapidly transforming multiple fields, offering powerful tools for a wide range of applications. These models, commonly trained on vast datasets, exhibit human-like text generation capabilities, making them useful for research tasks such as ideation, literature review, coding, drafting, and outreach. We conducted a study involving 13 astronomers at different career stages and research fields to explore LLM applications across diverse tasks over several months and to evaluate their performance in research-related activities. This work was accompanied by an anonymous survey assessing participants' experiences and attitudes towards LLMs. We provide a detailed analysis of the tasks attempted and the survey answers, along with specific output examples. Our findings highlight both the potential and limitations of LLMs in supporting research while also addressing general and research-specific ethical considerations. We conclude with a series of recommendations, emphasizing the need for researchers to complement LLMs with critical thinking and domain expertise, ensuring these tools serve as aids rather than substitutes for rigorous scientific inquiry.

Updated: 2024-10-01 16:34:13

标题: 大型语言模型在天文学研究进化中扮演的角色是什么？

摘要: ChatGPT和其他最先进的大型语言模型(LLMs)正迅速改变多个领域，为广泛的应用提供强大的工具。这些模型通常在庞大的数据集上进行训练，展现出类似人类的文本生成能力，使它们在研究任务如构思、文献综述、编码、起草和推广等方面非常有用。我们进行了一项研究，涉及不同职业阶段和研究领域的13名天文学家，以探索LLM在多个任务中的应用，持续数月并评估它们在研究相关活动中的表现。这项工作伴随着一项匿名调查，评估参与者对LLMs的体验和态度。我们提供了尝试的任务和调查答案的详细分析，以及具体的输出示例。我们的发现突显了LLMs在支持研究中的潜力和局限性，同时也涉及一般和研究特定的伦理考虑。我们最后得出一系列建议，强调研究人员需要用批判性思维和领域专业知识补充LLMs，确保这些工具作为辅助而非替代严格的科学探究。

更新时间: 2024-10-01 16:34:13

领域: astro-ph.IM,cs.AI

下载: http://arxiv.org/abs/2409.20252v2

Dynamic Pricing in Securities Lending Market: Application in Revenue Optimization for an Agent Lender Portfolio

Securities lending is an important part of the financial market structure, where agent lenders help long term institutional investors to lend out their securities to short sellers in exchange for a lending fee. Agent lenders within the market seek to optimize revenue by lending out securities at the highest rate possible. Typically, this rate is set by hard-coded business rules or standard supervised machine learning models. These approaches are often difficult to scale and are not adaptive to changing market conditions. Unlike a traditional stock exchange with a centralized limit order book, the securities lending market is organized similarly to an e-commerce marketplace, where agent lenders and borrowers can transact at any agreed price in a bilateral fashion. This similarity suggests that the use of typical methods for addressing dynamic pricing problems in e-commerce could be effective in the securities lending market. We show that existing contextual bandit frameworks can be successfully utilized in the securities lending market. Using offline evaluation on real historical data, we show that the contextual bandit approach can consistently outperform typical approaches by at least 15% in terms of total revenue generated.

Updated: 2024-10-01 16:33:36

标题: 证券借贷市场的动态定价：在代理出借方组合中的收入优化应用

摘要: 证券借贷是金融市场结构中的一个重要组成部分，代理出借人帮助长期机构投资者将他们的证券出借给卖空者，以换取借贷费。市场内的代理出借人努力通过以尽可能高的利率出借证券来优化收入。通常情况下，这一利率由硬编码的商业规则或标准的监督机器学习模型设定。这些方法通常难以扩展，并且无法适应不断变化的市场条件。与传统股票交易所不同，证券借贷市场类似于电子商务市场，代理出借人和借款人可以双边协商任何同意的价格进行交易。这种相似性表明，在证券借贷市场中使用电子商务中处理动态定价问题的典型方法可能是有效的。我们展示了现有的上下文强盗框架可以成功应用于证券借贷市场。通过对真实历史数据的离线评估，我们展示了上下文强盗方法在总收入生成方面至少比典型方法提高了15%。

更新时间: 2024-10-01 16:33:36

领域: q-fin.TR,cs.LG

下载: http://arxiv.org/abs/2407.13687v3

Short vs. Long-term Coordination of Drones: When Distributed Optimization Meets Deep Reinforcement Learning

Swarms of autonomous interactive drones can provide compelling sensing capabilities in Smart City applications, such as traffic monitoring. This paper focuses on the task assignment problem for large-scale spatio-temporal sensing by a drone swarm. However, existing approaches have distinct challenges: distributed evolutionary optimization, such as collective learning, lacks long-term adaptability in dynamic environments, while deep reinforcement learning (DRL) is limited to scale effectively due to the curse of dimensionality. Therefore, this paper proposes a novel synergetic optimization approach by integrating long-term DRL and short-term collective learning. Through this approach, each drone independently and proactively determines its flying direction and recharging location using DRL, while evolving their navigation and sensing policies through collective learning based on a structured tree communication model. Extensive experiments with datasets generated from realistic urban mobility demonstrate an outstanding performance of the proposed solution in complex scenarios. New insights show that this approach provides a win-win synthesis of short-term and long-term strategies for drone-based traffic monitoring, with short-term methods addressing training complexity and energy management, while long-term methods preserving high sensing performance.

Updated: 2024-10-01 16:11:27

标题: 短期与长期无人机协调：分布式优化与深度强化学习的结合

摘要: 一群自主互动的无人机群体可以为智能城市应用提供引人注目的感知能力，如交通监控。本文关注无人机群体在大规模时空感知中的任务分配问题。然而，现有方法存在明显挑战：分布式进化优化（如集体学习）在动态环境中缺乏长期适应性，而深度强化学习（DRL）由于维度灾难而无法有效扩展。因此，本文提出了一种新颖的协同优化方法，将长期DRL和短期集体学习相结合。通过这种方法，每架无人机独立并主动地使用DRL确定其飞行方向和充电位置，同时通过基于结构化树通信模型的集体学习来演化它们的导航和感知策略。通过使用从现实城市移动性生成的数据集进行广泛实验，证明了所提出解决方案在复杂场景中的出色表现。新的见解显示，这种方法为基于无人机的交通监测提供了短期和长期策略的双赢综合，短期方法解决了训练复杂性和能源管理问题，而长期方法保持了高感知性能。

更新时间: 2024-10-01 16:11:27

领域: cs.RO,cs.LG,cs.MA

下载: http://arxiv.org/abs/2311.09852v7

Clustering Three-Way Data with Outliers

Matrix-variate distributions are a recent addition to the model-based clustering field, thereby making it possible to analyze data in matrix form with complex structure such as images and time series. Due to its recent appearance, there is limited literature on matrix-variate data, with even less on dealing with outliers in these models. An approach for clustering matrix-variate normal data with outliers is discussed. The approach, which uses the distribution of subset log-likelihoods, extends the OCLUST algorithm to matrix-variate normal data and uses an iterative approach to detect and trim outliers.

Updated: 2024-10-01 16:08:52

标题: 使用异常值进行三维数据聚类

摘要: 矩阵变量分布是模型基础聚类领域的最新进展，从而使得可能分析具有复杂结构的矩阵形式数据，如图像和时间序列。由于其最近出现，关于矩阵变量数据的文献有限，对于在这些模型中处理异常值的文献更少。本文讨论了一种用于聚类具有异常值的矩阵变量正态数据的方法。该方法利用子集对数似然分布，将OCLUST算法扩展到矩阵变量正态数据，并采用迭代方法来检测和修剪异常值。

更新时间: 2024-10-01 16:08:52

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2310.05288v3

WiGNet: Windowed Vision Graph Neural Network

In recent years, Graph Neural Networks (GNNs) have demonstrated strong adaptability to various real-world challenges, with architectures such as Vision GNN (ViG) achieving state-of-the-art performance in several computer vision tasks. However, their practical applicability is hindered by the computational complexity of constructing the graph, which scales quadratically with the image size. In this paper, we introduce a novel Windowed vision Graph neural Network (WiGNet) model for efficient image processing. WiGNet explores a different strategy from previous works by partitioning the image into windows and constructing a graph within each window. Therefore, our model uses graph convolutions instead of the typical 2D convolution or self-attention mechanism. WiGNet effectively manages computational and memory complexity for large image sizes. We evaluate our method in the ImageNet-1k benchmark dataset and test the adaptability of WiGNet using the CelebA-HQ dataset as a downstream task with higher-resolution images. In both of these scenarios, our method achieves competitive results compared to previous vision GNNs while keeping memory and computational complexity at bay. WiGNet offers a promising solution toward the deployment of vision GNNs in real-world applications. We publicly released the code at https://github.com/EIDOSLAB/WiGNet.

Updated: 2024-10-01 15:54:07

标题: WiGNet：窗口化视觉图神经网络

摘要: 近年来，图神经网络（GNNs）已经展示出对各种现实世界挑战的强大适应性，例如Vision GNN（ViG）等结构在几个计算机视觉任务中实现了最先进的性能。然而，它们的实际适用性受到构建图的计算复杂性的限制，该复杂性随图像大小呈二次增长。在本文中，我们提出了一种新颖的窗口化视觉图神经网络（WiGNet）模型，用于高效的图像处理。WiGNet探索了与以往工作不同的策略，通过将图像分割成窗口并在每个窗口内构建图。因此，我们的模型使用图卷积而不是典型的2D卷积或自注意机制。WiGNet有效地管理了大尺寸图像的计算和内存复杂性。我们在ImageNet-1k基准数据集中评估了我们的方法，并使用CelebA-HQ数据集作为下游任务测试了WiGNet的适应性，其中包含更高分辨率的图像。在这两种情况下，我们的方法与以前的视觉GNNs相比取得了竞争性的结果，同时保持了内存和计算复杂性的平衡。WiGNet为在实际应用中部署视觉GNNs提供了一个有前途的解决方案。我们在https://github.com/EIDOSLAB/WiGNet上公开发布了代码。

更新时间: 2024-10-01 15:54:07

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2410.00807v1

Large-Scale Security Analysis of Real-World Backend Deployments Speaking IoT-Focused Protocols

Internet-of-Things (IoT) devices, ranging from smart home assistants to health devices, are pervasive: Forecasts estimate their number to reach 29 billion by 2030. Understanding the security of their machine-to-machine communication is crucial. Prior work focused on identifying devices' vulnerabilities or proposed protocol-specific solutions. Instead, we investigate the security of backends speaking IoT protocols, that is, the backbone of the IoT ecosystem. We focus on three real-world protocols for our large-scale analysis: MQTT, CoAP, and XMPP. We gather a dataset of over 337,000 backends, augment it with geographical and provider data, and perform non-invasive active measurements to investigate three major security threats: information leakage, weak authentication, and denial of service. Our results provide quantitative evidence of a problematic immaturity in the IoT ecosystem. Among other issues, we find that 9.44% backends expose information, 30.38% CoAP-speaking backends are vulnerable to denial of service attacks, and 99.84% of MQTT- and XMPP-speaking backends use insecure transport protocols (only 0.16% adopt TLS, of which 70.93% adopt a vulnerable version).

Updated: 2024-10-01 15:52:15

标题: 大规模安全分析现实世界后端部署，讨论物联网协议

摘要: 物联网(IoT)设备，从智能家居助手到健康设备，已经无处不在：预测估计到2030年它们的数量将达到290亿。了解它们之间的机器对机器通信的安全性是至关重要的。先前的工作侧重于识别设备的漏洞或提出特定于协议的解决方案。相反，我们调查了后端说话的物联网协议的安全性，也就是物联网生态系统的支柱。我们针对我们的大规模分析关注三种现实世界的协议：MQTT，CoAP和XMPP。我们收集了超过33.7万个后端的数据集，补充了地理和供应商数据，并进行了非侵入式主动测量，以调查三个主要的安全威胁：信息泄露，弱身份验证和拒绝服务。我们的结果提供了物联网生态系统中问题成熟度的定量证据。在其他问题中，我们发现9.44%的后端暴露信息，30.38%的CoAP后端容易受到拒绝服务攻击，而99.84%的MQTT和XMPP后端使用不安全的传输协议(只有0.16%采用TLS，其中70.93%采用易受攻击的版本)。

更新时间: 2024-10-01 15:52:15

领域: cs.CR,cs.NI

下载: http://arxiv.org/abs/2405.09662v2

GAMMA-PD: Graph-based Analysis of Multi-Modal Motor Impairment Assessments in Parkinson's Disease

The rapid advancement of medical technology has led to an exponential increase in multi-modal medical data, including imaging, genomics, and electronic health records (EHRs). Graph neural networks (GNNs) have been widely used to represent this data due to their prominent performance in capturing pairwise relationships. However, the heterogeneity and complexity of multi-modal medical data still pose significant challenges for standard GNNs, which struggle with learning higher-order, non-pairwise relationships. This paper proposes GAMMA-PD (Graph-based Analysis of Multi-modal Motor Impairment Assessments in Parkinson's Disease), a novel heterogeneous hypergraph fusion framework for multi-modal clinical data analysis. GAMMA-PD integrates imaging and non-imaging data into a "hypernetwork" (patient population graph) by preserving higher-order information and similarity between patient profiles and symptom subtypes. We also design a feature-based attention-weighted mechanism to interpret feature-level contributions towards downstream decision tasks. We evaluate our approach with clinical data from the Parkinson's Progression Markers Initiative (PPMI) and a private dataset. We demonstrate gains in predicting motor impairment symptoms in Parkinson's disease. Our end-to-end framework also learns associations between subsets of patient characteristics to generate clinically relevant explanations for disease and symptom profiles. The source code is available at https://github.com/favour-nerrise/GAMMA-PD.

Updated: 2024-10-01 15:51:33

标题: GAMMA-PD：帕金森病多模态运动障碍评估的基于图的分析

摘要: 医学技术的快速发展导致多模态医学数据的指数增长，包括成像、基因组学和电子健康记录（EHRs）。图神经网络（GNNs）广泛用于表示这些数据，因为它们在捕捉成对关系方面表现突出。然而，多模态医学数据的异质性和复杂性仍然对标准GNNs构成重大挑战，这些模型在学习高阶、非成对关系方面存在困难。本文提出了GAMMA-PD（帕金森病多模态运动障碍评估的基于图的分析）这一新颖的异质超图融合框架，用于多模态临床数据分析。GAMMA-PD将成像和非成像数据整合到一个“超网络”（患者人口图）中，通过保留患者配置文件和症状亚型之间的高阶信息和相似性。我们还设计了一个基于特征的注意力加权机制，以解释特征级别对下游决策任务的贡献。我们使用帕金森病进展标记倡议（PPMI）和一个私有数据集的临床数据评估我们的方法。我们展示了在预测帕金森病运动障碍症状方面的收益。我们的端到端框架还学习了患者特征子集之间的关联，以生成与疾病和症状配置文件相关的临床解释。源代码可在https://github.com/favour-nerrise/GAMMA-PD 上找到。

更新时间: 2024-10-01 15:51:33

领域: q-bio.QM,cs.AI,cs.LG,eess.IV,q-bio.NC

下载: http://arxiv.org/abs/2410.00944v1

Mitigating Shortcut Learning with Diffusion Counterfactuals and Diverse Ensembles

Spurious correlations in the data, where multiple cues are predictive of the target labels, often lead to a phenomenon known as shortcut learning, where a model relies on erroneous, easy-to-learn cues while ignoring reliable ones. In this work, we propose DiffDiv an ensemble diversification framework exploiting Diffusion Probabilistic Models (DPMs) to mitigate this form of bias. We show that at particular training intervals, DPMs can generate images with novel feature combinations, even when trained on samples displaying correlated input features. We leverage this crucial property to generate synthetic counterfactuals to increase model diversity via ensemble disagreement. We show that DPM-guided diversification is sufficient to remove dependence on shortcut cues, without a need for additional supervised signals. We further empirically quantify its efficacy on several diversification objectives, and finally show improved generalization and diversification on par with prior work that relies on auxiliary data collection.

Updated: 2024-10-01 15:50:57

标题: 用扩散对抗事实和多样化集成缓解快捷学习

摘要: 数据中的虚假相关性，即多个线索都能预测目标标签，通常会导致一种被称为快捷学习的现象，即模型依赖错误的、容易学习的线索，而忽略可靠的线索。在这项工作中，我们提出了一个利用扩散概率模型（DPMs）的集成多样化框架DiffDiv来缓解这种偏见。我们展示了在特定的训练间隔中，DPMs可以生成具有新特征组合的图像，即使在显示相关输入特征的样本上训练。我们利用这一关键特性通过集成争议生成合成对照实验，以增加模型多样性。我们展示了DPM引导的多样化足以消除对快捷线索的依赖，无需额外的监督信号。我们进一步通过实证量化了其在几个多样化目标上的有效性，并最终展示了与依赖辅助数据收集的既往工作相当的改进泛化和多样化。

更新时间: 2024-10-01 15:50:57

领域: cs.LG,cs.AI,cs.CV

下载: http://arxiv.org/abs/2311.16176v4

DRIM: Learning Disentangled Representations from Incomplete Multimodal Healthcare Data

Real-life medical data is often multimodal and incomplete, fueling the growing need for advanced deep learning models capable of integrating them efficiently. The use of diverse modalities, including histopathology slides, MRI, and genetic data, offers unprecedented opportunities to improve prognosis prediction and to unveil new treatment pathways. Contrastive learning, widely used for deriving representations from paired data in multimodal tasks, assumes that different views contain the same task-relevant information and leverages only shared information. This assumption becomes restrictive when handling medical data since each modality also harbors specific knowledge relevant to downstream tasks. We introduce DRIM, a new multimodal method for capturing these shared and unique representations, despite data sparsity. More specifically, given a set of modalities, we aim to encode a representation for each one that can be divided into two components: one encapsulating patient-related information common across modalities and the other, encapsulating modality-specific details. This is achieved by increasing the shared information among different patient modalities while minimizing the overlap between shared and unique components within each modality. Our method outperforms state-of-the-art algorithms on glioma patients survival prediction tasks, while being robust to missing modalities. To promote reproducibility, the code is made publicly available at https://github.com/Lucas-rbnt/DRIM

Updated: 2024-10-01 15:47:14

标题: DRIM：从不完整的多模态医疗数据中学习分离表示

摘要: 真实生活中的医疗数据通常是多模态且不完整的，这加剧了对能够有效整合它们的先进深度学习模型的需求。使用包括组织病理学切片、MRI和基因数据在内的多种模态提供了改善预后预测和揭示新治疗途径的前所未有机会。对比学习广泛用于多模态任务中从配对数据中导出表示，假定不同视图包含相同的任务相关信息并且利用共享信息。然而，在处理医疗数据时，这种假设变得过于严格，因为每种模态还包含与下游任务相关的特定知识。我们介绍了DRIM，这是一种新的多模态方法，可以捕获这些共享和独特的表示，尽管数据稀疏。具体来说，给定一组模态，我们的目标是为每个模态编码一个表示，该表示可以分为两个组成部分：一个封装了跨模态共同的与患者相关信息，另一个封装了模态特定的细节。通过增加不同患者模态之间的共享信息，同时最小化每个模态中共享和独特组件之间的重叠，实现了这一目标。我们的方法在胶质瘤患者生存预测任务上优于最先进的算法，同时对缺失的模态具有鲁棒性。为了促进可重现性，我们将代码公开发布在https://github.com/Lucas-rbnt/DRIM。

更新时间: 2024-10-01 15:47:14

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2409.17055v2

M$^{2}$M: Learning controllable Multi of experts and multi-scale operators are the Partial Differential Equations need

Learning the evolutionary dynamics of Partial Differential Equations (PDEs) is critical in understanding dynamic systems, yet current methods insufficiently learn their representations. This is largely due to the multi-scale nature of the solution, where certain regions exhibit rapid oscillations while others evolve more slowly. This paper introduces a framework of multi-scale and multi-expert (M$^2$M) neural operators designed to simulate and learn PDEs efficiently. We employ a divide-and-conquer strategy to train a multi-expert gated network for the dynamic router policy. Our method incorporates a controllable prior gating mechanism that determines the selection rights of experts, enhancing the model's efficiency. To optimize the learning process, we have implemented a PI (Proportional, Integral) control strategy to adjust the allocation rules precisely. This universal controllable approach allows the model to achieve greater accuracy. We test our approach on benchmark 2D Navier-Stokes equations and provide a custom multi-scale dataset. M$^2$M can achieve higher simulation accuracy and offer improved interpretability compared to baseline methods.

Updated: 2024-10-01 15:42:09

标题: M$^{2}$M：学习可控的专家组合和多尺度操作符是需要的偏微分方程

摘要: 学习偏微分方程（PDEs）的演变动力学对于理解动态系统至关重要，然而当前的方法未能充分学习它们的表示。这在很大程度上是由于解的多尺度性质，其中某些区域表现出快速振荡，而其他区域演变较慢。本文介绍了一个多尺度和多专家（M$^2$M）神经算子框架，旨在高效模拟和学习PDEs。我们采用分而治之的策略来训练一个多专家门控网络，用于动态路由策略。我们的方法结合了一个可控的先验门控机制，确定专家的选择权利，增强了模型的效率。为了优化学习过程，我们实施了一个PI（比例、积分）控制策略，精确调整分配规则。这种通用的可控方法使模型能够达到更高的准确性。我们在基准2D Navier-Stokes方程上测试了我们的方法，并提供了一个定制的多尺度数据集。与基准方法相比，M$^2$M能够实现更高的模拟准确性，并提供更好的可解释性。

更新时间: 2024-10-01 15:42:09

领域: cs.LG,cs.AI,cs.CV

下载: http://arxiv.org/abs/2410.11617v1

The Use of Large Language Models (LLM) for Cyber Threat Intelligence (CTI) in Cybercrime Forums

Large language models (LLMs) can be used to analyze cyber threat intelligence (CTI) data from cybercrime forums, which contain extensive information and key discussions about emerging cyber threats. However, to date, the level of accuracy and efficiency of LLMs for such critical tasks has yet to be thoroughly evaluated. Hence, this study assesses the performance of an LLM system built on the OpenAI GPT-3.5-turbo model [8] to extract CTI information. To do so, a random sample of more than 700 daily conversations from three cybercrime forums - XSS, Exploit_in, and RAMP - was extracted, and the LLM system was instructed to summarize the conversations and predict 10 key CTI variables, such as whether a large organization and/or a critical infrastructure is being targeted, with only simple human-language instructions. Then, two coders reviewed each conversation and evaluated whether the information extracted by the LLM was accurate. The LLM system performed well, with an average accuracy score of 96.23%, an average precision of 90% and an average recall of 88.2%. Various ways to enhance the model were uncovered, such as the need to help the LLM distinguish between stories and past events, as well as being careful with verb tenses in prompts. Nevertheless, the results of this study highlight the relevance of using LLMs for cyber threat intelligence.

Updated: 2024-10-01 15:41:22

标题: 大型语言模型（LLM）在网络犯罪论坛中用于网络威胁情报（CTI）的应用

摘要: 大型语言模型（LLMs）可以用于分析来自网络犯罪论坛的网络威胁情报（CTI）数据，这些数据包含有关新兴网络威胁的广泛信息和关键讨论。然而，迄今为止，LLMs在这些关键任务中的准确性和效率水平尚未得到彻底评估。因此，本研究评估了建立在OpenAI GPT-3.5-turbo模型上的LLM系统的性能，用于提取CTI信息。为此，从三个网络犯罪论坛（XSS、Exploit_in和RAMP）中提取了700多个日常对话的随机样本，并指示LLM系统总结对话并预测10个关键的CTI变量，如是否针对大型组织和/或关键基础设施，仅通过简单的人类语言指示。然后，两名编码员审查每个对话，并评估LLM提取的信息是否准确。LLM系统表现良好，平均准确率为96.23%，平均精确度为90%，平均召回率为88.2%。发现了增强模型的各种方法，例如需要帮助LLM区分故事和过去事件，以及在提示中小心使用动词时态。尽管如此，本研究的结果突显了使用LLMs进行网络威胁情报的相关性。

更新时间: 2024-10-01 15:41:22

领域: cs.CR,cs.AI,cs.CL

下载: http://arxiv.org/abs/2408.03354v3

MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents

Recognizing if LLM output can be grounded in evidence is central to many tasks in NLP: retrieval-augmented generation, summarization, document-grounded dialogue, and more. Current approaches to this kind of fact-checking are based on verifying each piece of a model generation against potential evidence using an LLM. However, this process can be very computationally expensive, requiring many calls to a model to check a single response. In this work, we show how to build small fact-checking models that have GPT-4-level performance but for 400x lower cost. We do this by constructing synthetic training data with GPT-4, which involves creating realistic yet challenging instances of factual errors via a structured generation procedure. Training on this data teaches models to check each fact in the claim and recognize synthesis of information across sentences. For evaluation, we unify datasets from recent work on fact-checking and grounding LLM generations into a new benchmark, LLM-AggreFact. Our best system MiniCheck-FT5 (770M parameters) outperforms all systems of comparable size and reaches GPT-4 accuracy. We release LLM-AggreFact, code for data synthesis, and models.

Updated: 2024-10-01 15:39:48

标题: MiniCheck：在基础文件上高效核实LLMs

摘要: 识别LLM输出是否可以基于证据进行支撑对于自然语言处理中的许多任务至关重要：检索增强生成、摘要、文档支撑对话等。目前针对这种事实核查的方法是基于使用LLM验证模型生成的每一部分与潜在证据的匹配。然而，这个过程可能非常昂贵，需要对一个单一响应进行多次调用模型。在这项工作中，我们展示了如何构建具有GPT-4级性能但成本降低400倍的小型事实核查模型。我们通过使用GPT-4构建合成训练数据来实现这一点，这涉及通过结构化生成过程创建现实但具有挑战性的事实错误实例。在这些数据上进行训练教导模型检查每个声明中的事实，并识别跨句子的信息综合。为了评估，我们将最近关于事实核查和基于LLM生成的数据集统一到一个新的基准数据集LLM-AggreFact中。我们最好的系统MiniCheck-FT5（770M参数）优于所有相同规模系统，并达到了GPT-4的准确度。我们发布了LLM-AggreFact、数据合成的代码和模型。

更新时间: 2024-10-01 15:39:48

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2404.10774v2

Fast Multiplication and the PLWE-RLWE Equivalence for an Infinite Family of Cyclotomic Subextensions

We prove the equivalence between the Ring Learning With Errors (RLWE) and the Polynomial Learning With Errors (PLWE) problems for the maximal totally real subfield of the $2^r 3^s$-th cyclotomic field for $r \geq 3$ and $s \geq 1$. Moreover, we describe a fast algorithm for computing the product of two elements in the ring of integers of these subfields. This multiplication algorithm has quasilinear complexity in the dimension of the field, as it makes use of the fast Discrete Cosine Transform (DCT). Our approach assumes that the two input polynomials are given in a basis of Chebyshev-like polynomials, in contrast to the customary power basis. To validate this assumption, we prove that the change of basis from the power basis to the Chebyshev-like basis can be computed with $\mathcal{O}(n \log n)$ arithmetic operations, where $n$ is the problem dimension. Finally, we provide a heuristic and theoretical comparison of the vulnerability to some attacks for the $p$-th cyclotomic field versus the maximal totally real subextension of the $4p$-th cyclotomic field for a reasonable set of parameters of cryptographic size.

Updated: 2024-10-01 15:32:02

标题: 快速乘法和一个无限循环子扩展族的PLWE-RLWE等价性

摘要: 我们证明了对于$r \geq 3$和$s \geq 1$的$2^r 3^s$-th旋转域的最大完全实子域，环学习带错误（RLWE）和多项式学习带错误（PLWE）问题之间的等价性。此外，我们描述了一个快速算法，用于计算这些子域的整数环中两个元素的乘积。这个乘法算法在域的维度方面具有准线性复杂度，因为它利用了快速离散余弦变换（DCT）。我们的方法假设两个输入多项式以Chebyshev样式多项式为基础，而不是通常的幂基础。为了验证这一假设，我们证明了从幂基础到Chebyshev样式基础的基变换可以通过$\mathcal{O}(n \log n)$算术运算来计算，其中$n$是问题的维度。最后，我们对密码尺寸的一组合理参数进行了$p$-th旋转域与$4p$-th旋转域的最大完全实子扩展的一些攻击的脆弱性进行了启发式和理论比较。

更新时间: 2024-10-01 15:32:02

领域: cs.CR,math.NT,11T71 (Primary), 94A60 (Secondary)

下载: http://arxiv.org/abs/2410.00792v1

NeuroPath: A Neural Pathway Transformer for Joining the Dots of Human Connectomes

Although modern imaging technologies allow us to study connectivity between two distinct brain regions in-vivo, an in-depth understanding of how anatomical structure supports brain function and how spontaneous functional fluctuations emerge remarkable cognition is still elusive. Meanwhile, tremendous efforts have been made in the realm of machine learning to establish the nonlinear mapping between neuroimaging data and phenotypic traits. However, the absence of neuroscience insight in the current approaches poses significant challenges in understanding cognitive behavior from transient neural activities. To address this challenge, we put the spotlight on the coupling mechanism of structural connectivity (SC) and functional connectivity (FC) by formulating such network neuroscience question into an expressive graph representation learning problem for high-order topology. Specifically, we introduce the concept of topological detour to characterize how a ubiquitous instance of FC (direct link) is supported by neural pathways (detour) physically wired by SC, which forms a cyclic loop interacted by brain structure and function. In the clich\'e of machine learning, the multi-hop detour pathway underlying SC-FC coupling allows us to devise a novel multi-head self-attention mechanism within Transformer to capture multi-modal feature representation from paired graphs of SC and FC. Taken together, we propose a biological-inspired deep model, coined as NeuroPath, to find putative connectomic feature representations from the unprecedented amount of neuroimages, which can be plugged into various downstream applications such as task recognition and disease diagnosis. We have evaluated NeuroPath on large-scale public datasets including HCP and UK Biobank under supervised and zero-shot learning, where the state-of-the-art performance by our NeuroPath indicates great potential in network neuroscience.

Updated: 2024-10-01 15:23:56

标题: 神经路径转换器NeuroPath：连接人类连接组的节点

摘要: 尽管现代成像技术使我们能够在体内研究两个不同脑区之间的连接性，但对解剖结构如何支持大脑功能以及自发功能波动如何产生卓越认知的深入理解仍然难以捉摸。与此同时，机器学习领域已经做出了巨大努力，以建立神经影像数据和表型特征之间的非线性映射。然而，当前方法中缺乏神经科学的洞察力，这在理解认知行为从瞬时神经活动中的挑战中带来了重大障碍。为了解决这一挑战，我们将焦点放在结构连接（SC）和功能连接（FC）的耦合机制上，将这种网络神经科学问题表达为一个高阶拓扑的表达式图学习问题。具体地，我们引入了拓扑绕道的概念，以描述一个普遍的FC实例（直接链接）如何通过由SC物理连通的神经通路（绕道）支持，形成一个由大脑结构和功能互动的循环环。在机器学习的陈词滥调中，潜在的SC-FC耦合下的多跳绕道路径使我们能够在Transformer内设计一种新颖的多头自注意机制，以从SC和FC的成对图中捕获多模态特征表示。综上所述，我们提出了一个生物启发的深度模型，称为NeuroPath，用于从前所未有的大量神经影像中找到假定的连接组特征表示，这些特征表示可以应用到各种下游应用中，如任务识别和疾病诊断。我们已经在包括HCP和英国生物银行在内的大规模公共数据集上评估了NeuroPath，采用监督学习和零样本学习，在这些数据集上我们的NeuroPath表现出的最新性能显示出了在网络神经科学中的巨大潜力。

更新时间: 2024-10-01 15:23:56

领域: q-bio.NC,cs.AI,cs.CV,cs.LG

下载: http://arxiv.org/abs/2409.17510v2

Early Detection of Coronary Heart Disease Using Hybrid Quantum Machine Learning Approach

Coronary heart disease (CHD) is a severe cardiac disease, and hence, its early diagnosis is essential as it improves treatment results and saves money on medical care. The prevailing development of quantum computing and machine learning (ML) technologies may bring practical improvement to the performance of CHD diagnosis. Quantum machine learning (QML) is receiving tremendous interest in various disciplines due to its higher performance and capabilities. A quantum leap in the healthcare industry will increase processing power and optimise multiple models. Techniques for QML have the potential to forecast cardiac disease and help in early detection. To predict the risk of coronary heart disease, a hybrid approach utilizing an ensemble machine learning model based on QML classifiers is presented in this paper. Our approach, with its unique ability to address multidimensional healthcare data, reassures the method's robustness by fusing quantum and classical ML algorithms in a multi-step inferential framework. The marked rise in heart disease and death rates impacts worldwide human health and the global economy. Reducing cardiac morbidity and mortality requires early detection of heart disease. In this research, a hybrid approach utilizes techniques with quantum computing capabilities to tackle complex problems that are not amenable to conventional machine learning algorithms and to minimize computational expenses. The proposed method has been developed in the Raspberry Pi 5 Graphics Processing Unit (GPU) platform and tested on a broad dataset that integrates clinical and imaging data from patients suffering from CHD and healthy controls. Compared to classical machine learning models, the accuracy, sensitivity, F1 score, and specificity of the proposed hybrid QML model used with CHD are manifold higher.

Updated: 2024-10-01 15:21:05

标题: 使用混合量子机器学习方法早期检测冠心病

摘要: 冠心病（CHD）是一种严重的心脏疾病，因此早期诊断对改善治疗结果和节省医疗费用至关重要。量子计算和机器学习（ML）技术的不断发展可能会实现对CHD诊断性能的实际改进。量子机器学习（QML）由于其更高的性能和能力，正在各个学科中引起巨大关注。医疗保健行业的量子飞跃将增加处理能力，并优化多个模型。QML技术具有预测心脏疾病和帮助早期检测的潜力。为了预测冠心病的风险，本文提出了一种利用基于QML分类器的集成机器学习模型的混合方法。我们的方法通过在多步推理框架中融合量子和经典ML算法，具有处理多维医疗数据的独特能力，从而保证了方法的稳健性。心脏疾病和死亡率的显著增加影响全球人类健康和全球经济。减少心脏发病率和死亡率需要早期检测心脏疾病。在这项研究中，一种混合方法利用具有量子计算能力的技术解决传统机器学习算法无法解决的复杂问题，并最小化计算开销。所提出的方法已在树莓派5图形处理单元（GPU）平台上开发，并在整合了CHD患者和健康对照者的临床和影像数据的广泛数据集上进行了测试。与经典机器学习模型相比，使用CHD的提出的混合QML模型的准确性、敏感性、F1分数和特异性都更高。

更新时间: 2024-10-01 15:21:05

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2409.10932v2

Divide And Conquer: Learning Chaotic Dynamical Systems With Multistep Penalty Neural Ordinary Differential Equations

Forecasting high-dimensional dynamical systems is a fundamental challenge in various fields, such as geosciences and engineering. Neural Ordinary Differential Equations (NODEs), which combine the power of neural networks and numerical solvers, have emerged as a promising algorithm for forecasting complex nonlinear dynamical systems. However, classical techniques used for NODE training are ineffective for learning chaotic dynamical systems. In this work, we propose a novel NODE-training approach that allows for robust learning of chaotic dynamical systems. Our method addresses the challenges of non-convexity and exploding gradients associated with underlying chaotic dynamics. Training data trajectories from such systems are split into multiple, non-overlapping time windows. In addition to the deviation from the training data, the optimization loss term further penalizes the discontinuities of the predicted trajectory between the time windows. The window size is selected based on the fastest Lyapunov time scale of the system. Multi-step penalty(MP) method is first demonstrated on Lorenz equation, to illustrate how it improves the loss landscape and thereby accelerates the optimization convergence. MP method can optimize chaotic systems in a manner similar to least-squares shadowing with significantly lower computational costs. Our proposed algorithm, denoted the Multistep Penalty NODE, is applied to chaotic systems such as the Kuramoto-Sivashinsky equation, the two-dimensional Kolmogorov flow, and ERA5 reanalysis data for the atmosphere. It is observed that MP-NODE provide viable performance for such chaotic systems, not only for short-term trajectory predictions but also for invariant statistics that are hallmarks of the chaotic nature of these dynamics.

Updated: 2024-10-01 15:19:42

标题: 分而治之：利用多步惩罚神经普通微分方程学习混沌动力系统

摘要: 预测高维动力系统是各个领域的基本挑战，如地球科学和工程学。神经常微分方程（NODEs）结合了神经网络和数值求解器的强大功能，已经成为预测复杂非线性动力系统的一种有前途的算法。然而，用于NODE训练的经典技术对于学习混沌动力系统是无效的。在这项工作中，我们提出了一种新颖的NODE训练方法，允许对混沌动力系统进行强大的学习。我们的方法解决了与底层混沌动态相关的非凸性和梯度爆炸的挑战。来自这些系统的训练数据轨迹被分割为多个不重叠的时间窗口。除了与训练数据的偏差之外，优化损失项进一步惩罚了预测轨迹在时间窗口之间的不连续性。窗口大小是根据系统的最快Lyapunov时间尺度选择的。首先在Lorenz方程上展示了多步惩罚（MP）方法，以说明它如何改善损失景观，从而加速优化收敛。MP方法可以以类似于最小二乘阴影法的方式优化混沌系统，并且计算成本显著降低。我们提出的算法，称为多步惩罚NODE，应用于混沌系统，如Kuramoto-Sivashinsky方程、二维Kolmogorov流动和ERA5大气再分析数据。观察到MP-NODE为这些混沌系统提供了可行的性能，不仅用于短期轨迹预测，还用于这些动力学混沌特性的不变统计数据。

更新时间: 2024-10-01 15:19:42

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2407.00568v4

Adaptive Motion Generation Using Uncertainty-Driven Foresight Prediction

Uncertainty of environments has long been a difficult characteristic to handle, when performing real-world robot tasks. This is because the uncertainty produces unexpected observations that cannot be covered by manual scripting. Learning based robot controlling methods are a promising approach for generating flexible motions against unknown situations, but still tend to suffer under uncertainty due to its deterministic nature. In order to adaptively perform the target task under such conditions, the robot control model must be able to accurately understand the possible uncertainty, and to exploratively derive the optimal action that minimizes such uncertainty. This paper extended an existing predictive learning based robot control method, which employ foresight prediction using dynamic internal simulation. The foresight module refines the model's hidden states by sampling multiple possible futures and replace with the one that led to the lower future uncertainty. The adaptiveness of the model was evaluated on a door opening task. The door can be opened either by pushing, pulling, or sliding, but robot cannot visually distinguish which way, and is required to adapt on the fly. The results showed that the proposed model adaptively diverged its motion through interaction with the door, whereas conventional methods failed to stably diverge. The models were analyzed on Lyapunov exponents of RNN hidden states which reflect the possible divergence at each time step during task execution. The result indicated that the foresight module biased the model to consider future consequences, which lead to embedding uncertainties at the policy of the robot controller, rather than the resultant observation. This is beneficial for implementing adaptive behaviors, which indices derivation of diverse motion during exploration.

Updated: 2024-10-01 15:13:27

标题: 利用不确定性驱动的先见预测进行自适应运动生成

摘要: 环境的不确定性长期以来一直是在执行现实世界机器人任务时难以处理的一个特征。这是因为不确定性会产生无法通过手动脚本覆盖的意外观察结果。基于学习的机器人控制方法是一种有希望的途径，可以针对未知情况生成灵活的动作，但由于其确定性特性，仍然容易在不确定性下受到影响。为了在这种条件下适应性地执行目标任务，机器人控制模型必须能够准确理解可能的不确定性，并进行探索性地推导最小化这种不确定性的最佳行动。本文扩展了一种基于预测学习的机器人控制方法，该方法利用动态内部模拟进行预测。预测模块通过对多种可能的未来进行采样并替换为导致未来不确定性较低的未来之一，来精炼模型的隐藏状态。该模型的适应性在一项开门任务中进行了评估。门可以通过推、拉或滑动打开，但机器人无法在视觉上区分哪种方式，并且需要在运行时进行适应。结果显示，所提出的模型通过与门的互动自适应地分歧了其运动，而传统方法未能稳定地分歧。模型在任务执行过程中每个时间步的RNN隐藏状态的Lyapunov指数上进行了分析，这反映了可能的分歧。结果表明，预测模块使模型倾向于考虑未来后果，这导致了嵌入在机器人控制器策略中的不确定性，而不是结果观察。这有利于实现自适应行为，从而在探索过程中推导出多样化的动作。

更新时间: 2024-10-01 15:13:27

领域: cs.RO,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.00774v1

BabelBench: An Omni Benchmark for Code-Driven Analysis of Multimodal and Multistructured Data

Large language models (LLMs) have become increasingly pivotal across various domains, especially in handling complex data types. This includes structured data processing, as exemplified by ChartQA and ChatGPT-Ada, and multimodal unstructured data processing as seen in Visual Question Answering (VQA). These areas have attracted significant attention from both industry and academia. Despite this, there remains a lack of unified evaluation methodologies for these diverse data handling scenarios. In response, we introduce BabelBench, an innovative benchmark framework that evaluates the proficiency of LLMs in managing multimodal multistructured data with code execution. BabelBench incorporates a dataset comprising 247 meticulously curated problems that challenge the models with tasks in perception, commonsense reasoning, logical reasoning, and so on. Besides the basic capabilities of multimodal understanding, structured data processing as well as code generation, these tasks demand advanced capabilities in exploration, planning, reasoning and debugging. Our experimental findings on BabelBench indicate that even cutting-edge models like ChatGPT 4 exhibit substantial room for improvement. The insights derived from our comprehensive analysis offer valuable guidance for future research within the community. The benchmark data can be found at https://github.com/FFD8FFE/babelbench.

Updated: 2024-10-01 15:11:24

标题: BabelBench：用于多模态和多结构数据的代码驱动分析的全面基准

摘要: 大型语言模型（LLMs）在各个领域变得日益关键，特别是在处理复杂数据类型方面。这包括结构化数据处理，如ChartQA和ChatGPT-Ada所示，以及在视觉问答（VQA）中看到的多模态非结构化数据处理。这些领域吸引了工业界和学术界的广泛关注。尽管如此，针对这些不同数据处理场景仍然缺乏统一的评估方法。为此，我们引入了BabelBench，一个创新的基准框架，用于评估LLMs在处理多模态多结构数据和代码执行方面的能力。BabelBench包含一个包含247个精心策划问题的数据集，这些问题挑战模型进行感知、常识推理、逻辑推理等任务。除了多模态理解、结构化数据处理以及代码生成的基本能力之外，这些任务还要求具有探索、规划、推理和调试等高级能力。我们在BabelBench上的实验结果表明，即使像ChatGPT 4这样的尖端模型也有很大的改进空间。我们综合分析得出的见解为未来研究提供了有价值的指导。基准数据可在https://github.com/FFD8FFE/babelbench 找到。

更新时间: 2024-10-01 15:11:24

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2410.00773v1

Outlier Gradient Analysis: Efficiently Improving Deep Learning Model Performance via Hessian-Free Influence Functions

A core data-centric learning challenge is the identification of training samples that are detrimental to model performance. Influence functions serve as a prominent tool for this task and offer a robust framework for assessing training data influence on model predictions. Despite their widespread use, their high computational cost associated with calculating the inverse of the Hessian matrix pose constraints, particularly when analyzing large-sized deep models. In this paper, we establish a bridge between identifying detrimental training samples via influence functions and outlier gradient detection. This transformation not only presents a straightforward and Hessian-free formulation but also provides insights into the role of the gradient in sample impact. Through systematic empirical evaluations, we first validate the hypothesis of our proposed outlier gradient analysis approach on synthetic datasets. We then demonstrate its effectiveness in detecting mislabeled samples in vision models and selecting data samples for improving performance of natural language processing transformer models. We also extend its use to influential sample identification for fine-tuning Large Language Models.

Updated: 2024-10-01 15:07:09

标题: 异常值梯度分析：通过无Hessian影响函数有效提升深度学习模型性能

摘要: 核心数据中心学习挑战之一是识别对模型性能有害的训练样本。影响函数作为一种突出的工具，为评估训练数据对模型预测的影响提供了一个强大的框架。尽管它们被广泛使用，但与计算海森矩阵的逆相关的高计算成本会带来约束，特别是在分析大型深度模型时。在本文中，我们建立了通过影响函数识别有害训练样本和异常梯度检测之间的桥梁。这种转变不仅提出了一个简单且无海森矩阵的公式，还揭示了梯度在样本影响中的作用。通过系统的实证评估，我们首先在合成数据集上验证了我们提出的异常梯度分析方法的假设。然后，我们展示了它在视觉模型中检测错误标记样本和选择数据样本以改进自然语言处理变压器模型性能的有效性。我们还将其扩展到对大型语言模型进行微调时的重要样本识别。

更新时间: 2024-10-01 15:07:09

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.03869v3

OmniHands: Towards Robust 4D Hand Mesh Recovery via A Versatile Transformer

In this paper, we introduce OmniHands, a universal approach to recovering interactive hand meshes and their relative movement from monocular or multi-view inputs. Our approach addresses two major limitations of previous methods: lacking a unified solution for handling various hand image inputs and neglecting the positional relationship of two hands within images. To overcome these challenges, we develop a universal architecture with novel tokenization and contextual feature fusion strategies, capable of adapting to a variety of tasks. Specifically, we propose a Relation-aware Two-Hand Tokenization (RAT) method to embed positional relation information into the hand tokens. In this way, our network can handle both single-hand and two-hand inputs and explicitly leverage relative hand positions, facilitating the reconstruction of intricate hand interactions in real-world scenarios. As such tokenization indicates the relative relationship of two hands, it also supports more effective feature fusion. To this end, we further develop a 4D Interaction Reasoning (FIR) module to fuse hand tokens in 4D with attention and decode them into 3D hand meshes and relative temporal movements. The efficacy of our approach is validated on several benchmark datasets. The results on in-the-wild videos and real-world scenarios demonstrate the superior performances of our approach for interactive hand reconstruction. More video results can be found on the project page: https://OmniHand.github.io.

Updated: 2024-10-01 15:04:23

标题: OmniHands：通过多功能变换器实现稳健的4D手部网格恢复

摘要: 在本文中，我们介绍了OmniHands，一种通用方法，用于从单目或多视角输入中恢复交互式手部网格及其相对运动。我们的方法解决了先前方法的两个主要局限性：缺乏处理各种手部图像输入的统一解决方案以及忽视图像中两只手的位置关系。为了克服这些挑战，我们开发了一种具有新型标记化和上下文特征融合策略的通用架构，能够适应各种任务。具体而言，我们提出了一种关系感知的双手标记化（RAT）方法，将位置关系信息嵌入手部标记中。通过这种方式，我们的网络可以处理单手和双手输入，并明确利用相对手部位置，有助于在现实场景中重建复杂的手部交互。由于这种标记化指示了两只手的相对关系，它还支持更有效的特征融合。为此，我们进一步开发了一个4D交互推理（FIR）模块，用注意力融合手部标记，并将它们解码为3D手部网格和相对时间运动。我们的方法的有效性已在几个基准数据集上得到验证。在野外视频和现实场景中的结果展示了我们的方法在交互式手部重建方面的卓越性能。更多视频结果可在项目页面上找到：https://OmniHand.github.io。

更新时间: 2024-10-01 15:04:23

领域: cs.CV,cs.AI,cs.GR

下载: http://arxiv.org/abs/2405.20330v3

Using Steganography and Watermarking For Medical Image Integrity

Medical imaging has kept up with the digital age. Medical images such as x-rays are no longer keep on film or; even made with film. Rather, they are digital. In addition, they are transmitted for reasons of consultation and telehealth as well as archived. Transmission and retrieval of these images presents an integrity issue, with a high level of integrity being needed. Very small artifacts in a digital medical image can have significant importance, making or changing a diagnosis. It is imperative that the integrity of a medical image, especially in a Region of Interest be identifiable and preserved. Watermarking and steganography are used for the purposes of authenticating images, especially for copyright purposes. These techniques can be applied to medical images. However, these techniques can interfere with the integrity of the picture. While such distortion may be acceptable in other domains, in the medical domain this distortion is not acceptable. High accuracy is imperative for diagnosis. This paper discusses the techniques used, their advantages and shortcomings as well as methods of overcoming obstacles to integrity.

Updated: 2024-10-01 15:03:42

标题: 使用隐写术和水印技术保障医学图像的完整性

摘要: 医学影像学已经跟上了数字时代的步伐。医学影像，如X光片，不再存储在胶片上，甚至不再用胶片制作。相反，它们是数字化的。此外，它们被传输用于会诊和远程医疗，并进行存档。这些图像的传输和检索存在完整性问题，需要高水平的完整性。数字医学图像中的微小伪影可能具有重要意义，可能会制定或改变诊断。医学图像的完整性，尤其是在感兴趣区域内的完整性，必须可识别和保留。数字水印和隐写术用于验证图像，尤其是出于版权目的。这些技术可以应用于医学图像。然而，这些技术可能会干扰图像的完整性。虽然这种失真在其他领域可能是可以接受的，在医学领域中，这种失真是不可接受的。对于诊断，高准确性至关重要。本文讨论了使用的技术、它们的优点和缺点，以及克服完整性障碍的方法。

更新时间: 2024-10-01 15:03:42

领域: cs.CR,cs.GR,E.3; H.3; I.3; J.2

下载: http://arxiv.org/abs/2410.09071v1

OLAPH: Improving Factuality in Biomedical Long-form Question Answering

In the medical domain, numerous scenarios necessitate the long-form generation ability of large language models (LLMs). Specifically, when addressing patients' questions, it is essential that the model's response conveys factual claims, highlighting the need for an automated method to evaluate those claims. Thus, we introduce MedLFQA, a benchmark dataset reconstructed using long-form question-answering datasets related to the biomedical domain. We use MedLFQA to facilitate a cost-effective automatic evaluations of factuality. We also propose OLAPH, a simple and novel framework that utilizes cost-effective and multifaceted automatic evaluation to construct a synthetic preference set and answers questions in our preferred manner. Our framework leads us to train LLMs step-by-step to reduce hallucinations and include crucial medical claims. We highlight that, even on evaluation metrics not used during training, LLMs trained with our OLAPH framework demonstrate significant performance improvement in factuality. Our findings reveal that a 7B LLM trained with our OLAPH framework can provide long answers comparable to the medical experts' answers in terms of factuality. We believe that our work could shed light on gauging the long-text generation ability of LLMs in the medical domain. Our code and datasets are available.

Updated: 2024-10-01 15:03:14

标题: OLAPH：改善生物医学长篇问题回答中的真实性

摘要: 在医学领域，许多情景需要大型语言模型（LLMs）的长篇生成能力。具体来说，在回答患者问题时，模型的回应传达事实性主张至关重要，这突显了需要一种自动方法来评估这些主张的必要性。因此，我们介绍了MedLFQA，这是一个使用与生物医学领域相关的长篇问答数据集重建的基准数据集。我们使用MedLFQA来促进一种成本效益的事实性自动评估。我们还提出了OLAPH，这是一个简单而新颖的框架，利用成本效益和多方面的自动评估来构建一个合成的偏好集，并以我们偏好的方式回答问题。我们的框架引导我们逐步训练LLMs以减少幻觉并包含关键的医学主张。我们强调，即使在培训过程中未使用的评估指标上，使用我们的OLAPH框架训练的LLMs在事实性方面表现出显著的性能提升。我们的研究结果显示，使用我们的OLAPH框架训练的7B LLM能够提供与医学专家答案相比在事实性方面相当的长篇回答。我们相信我们的工作可以帮助评估LLMs在医学领域的长文本生成能力。我们的代码和数据集可供使用。

更新时间: 2024-10-01 15:03:14

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2405.12701v2

HOLA-Drone: Hypergraphic Open-ended Learning for Zero-Shot Multi-Drone Cooperative Pursuit

Zero-shot coordination (ZSC) is a significant challenge in multi-agent collaboration, aiming to develop agents that can coordinate with unseen partners they have not encountered before. Recent cutting-edge ZSC methods have primarily focused on two-player video games such as OverCooked!2 and Hanabi. In this paper, we extend the scope of ZSC research to the multi-drone cooperative pursuit scenario, exploring how to construct a drone agent capable of coordinating with multiple unseen partners to capture multiple evaders. We propose a novel Hypergraphic Open-ended Learning Algorithm (HOLA-Drone) that continuously adapts the learning objective based on our hypergraphic-form game modeling, aiming to improve cooperative abilities with multiple unknown drone teammates. To empirically verify the effectiveness of HOLA-Drone, we build two different unseen drone teammate pools to evaluate their performance in coordination with various unseen partners. The experimental results demonstrate that HOLA-Drone outperforms the baseline methods in coordination with unseen drone teammates. Furthermore, real-world experiments validate the feasibility of HOLA-Drone in physical systems. Videos can be found on the project homepage~\url{https://sites.google.com/view/hola-drone}.

Updated: 2024-10-01 14:46:42

标题: HOLA-Drone：零感知多无人机协同追逐的超图开放式学习

摘要: 零射击协调（ZSC）是多智能体协作中的一个重要挑战，旨在开发能够与之前未曾遇见的伙伴协调的智能体。最近的前沿ZSC方法主要集中在两人游戏如《煮糊涂了！2》和《花火》上。在本文中，我们将ZSC研究范围扩展到多无人机协作追逐场景，探讨如何构建一个能够与多个未知伙伴协调以捕捉多个逃犯的无人机智能体。我们提出了一种新颖的超图开放式学习算法（HOLA-Drone），根据我们的超图形式游戏建模不断调整学习目标，旨在提高与多个未知无人机队友的协作能力。为了从经验上验证HOLA-Drone的有效性，我们构建了两个不同的未知无人机队友池，评估它们与各种未知伙伴的协调表现。实验结果表明，HOLA-Drone在与未知无人机队友的协调中优于基准方法。此外，现实世界的实验证实了HOLA-Drone在物理系统中的可行性。视频可以在项目主页上找到\url{https://sites.google.com/view/hola-drone}。

更新时间: 2024-10-01 14:46:42

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2409.08767v2

Evidence Is All You Need: Ordering Imaging Studies via Language Model Alignment with the ACR Appropriateness Criteria

Diagnostic imaging studies are an increasingly important component of the workup and management of acutely presenting patients. However, ordering appropriate imaging studies according to evidence-based medical guidelines is a challenging task with a high degree of variability between healthcare providers. To address this issue, recent work has investigated if generative AI and large language models can be leveraged to help clinicians order relevant imaging studies for patients. However, it is challenging to ensure that these tools are correctly aligned with medical guidelines, such as the American College of Radiology's Appropriateness Criteria (ACR AC). In this study, we introduce a framework to intelligently leverage language models by recommending imaging studies for patient cases that are aligned with evidence-based guidelines. We make available a novel dataset of patient "one-liner" scenarios to power our experiments, and optimize state-of-the-art language models to achieve an accuracy on par with clinicians in image ordering. Finally, we demonstrate that our language model-based pipeline can be used as intelligent assistants by clinicians to support image ordering workflows and improve the accuracy of imaging study ordering according to the ACR AC. Our work demonstrates and validates a strategy to leverage AI-based software to improve trustworthy clinical decision making in alignment with expert evidence-based guidelines.

Updated: 2024-10-01 14:44:52

标题: 证据就是你需要的一切：通过语言模型与ACR适宜性标准对齐来订购成像研究

摘要: 诊断成像研究越来越成为急诊患者筛查和管理的重要组成部分。然而，根据基于证据的医学指南订购适当的成像研究是一项具有挑战性的任务，不同医疗保健提供者之间存在较高程度的变异性。为解决这一问题，最近的研究调查了生成式人工智能和大型语言模型是否可以利用来帮助临床医生为患者订购相关的成像研究。然而，确保这些工具正确符合医学指南（如美国放射学会的适宜性标准）是具有挑战性的。在本研究中，我们介绍了一个框架，通过推荐与基于证据的指南一致的患者病例进行成像研究。我们提供了一个新颖的患者“一句话”情景数据集来支持我们的实验，并优化了最先进的语言模型，使其在成像订购方面的准确性与临床医生相当。最后，我们展示了我们基于语言模型的管道可以被用作智能助手，帮助临床医生支持成像研究工作流程，并根据ACR AC提高成像研究订购的准确性。我们的工作展示并验证了一种利用基于人工智能的软件来改善与专家基于证据的指南一致的可信临床决策制定的策略。

更新时间: 2024-10-01 14:44:52

领域: cs.LG,cs.CL,cs.CY

下载: http://arxiv.org/abs/2409.19177v2

FELRec: Efficient Handling of Item Cold-Start With Dynamic Representation Learning in Recommender Systems

Recommender systems suffer from the cold-start problem whenever a new user joins the platform or a new item is added to the catalog. To address item cold-start, we propose to replace the embedding layer in sequential recommenders with a dynamic storage that has no learnable weights and can keep an arbitrary number of representations. In this paper, we present FELRec, a large embedding network that refines the existing representations of users and items in a recursive manner, as new information becomes available. In contrast to similar approaches, our model represents new users and items without side information and time-consuming finetuning, instead it runs a single forward pass over a sequence of existing representations. During item cold-start, our method outperforms similar method by 29.50%-47.45%. Further, our proposed model generalizes well to previously unseen datasets in zero-shot settings. The source code is publicly available at https://github.com/kweimann/FELRec .

Updated: 2024-10-01 14:39:12

标题: FELRec：在推荐系统中通过动态表示学习高效处理物品冷启动

摘要: 推荐系统在新用户加入平台或新项目添加到目录时会遇到冷启动问题。为了解决项目冷启动问题，我们提出用一个没有可学习权重且可以保留任意数量表示的动态存储器替换顺序推荐系统中的嵌入层。在本文中，我们提出了FELRec，一个大型嵌入网络，以递归方式优化用户和项目的现有表示，随着新信息的出现。与类似方法不同，我们的模型表示新用户和项目而不需要附加信息和耗时的微调，而是在现有表示序列上运行单个前向传递。在项目冷启动期间，我们的方法的表现比类似方法提高了29.50%-47.45%。此外，我们提出的模型在零样本设置下很好地泛化到以前未见的数据集。源代码可以在https://github.com/kweimann/FELRec 上公开获取。

更新时间: 2024-10-01 14:39:12

领域: cs.IR,cs.LG

下载: http://arxiv.org/abs/2210.16928v2

Ladder Bottom-up Convolutional Bidirectional Variational Autoencoder for Image Translation of Dotted Arabic Expiration Dates

This paper proposes an approach of Ladder Bottom-up Convolutional Bidirectional Variational Autoencoder (LCBVAE) architecture for the encoder and decoder, which is trained on the image translation of the dotted Arabic expiration dates by reconstructing the Arabic dotted expiration dates into filled-in expiration dates. We employed a customized and adapted version of Convolutional Recurrent Neural Network CRNN model to meet our specific requirements and enhance its performance in our context, and then trained the custom CRNN model with the filled-in images from the year of 2019 to 2027 to extract the expiration dates and assess the model performance of LCBVAE on the expiration date recognition. The pipeline of (LCBVAE+CRNN) can be then integrated into an automated sorting systems for extracting the expiry dates and sorting the products accordingly during the manufacture stage. Additionally, it can overcome the manual entry of expiration dates that can be time-consuming and inefficient at the merchants. Due to the lack of the availability of the dotted Arabic expiration date images, we created an Arabic dot-matrix True Type Font (TTF) for the generation of the synthetic images. We trained the model with unrealistic synthetic dates of 60,000 images and performed the testing on a realistic synthetic date of 3000 images from the year of 2019 to 2027, represented as yyyy/mm/dd. In our study, we demonstrated the significance of latent bottleneck layer with improving the generalization when the size is increased up to 1024 in downstream transfer learning tasks as for image translation. The proposed approach achieved an accuracy of 97% on the image translation with using the LCBVAE architecture that can be generalized for any downstream learning tasks as for image translation and reconstruction.

Updated: 2024-10-01 14:35:59

标题: 梯形底部上卷积双向变分自动编码器用于点状阿拉伯文过期日期图像翻译

摘要: 本文提出了一种梯形自底向上卷积双向变分自动编码器（LCBVAE）架构的方法，用于编码器和解码器，通过对点状阿拉伯到期日期的图像翻译进行训练，将阿拉伯点状到期日期重构为填充的到期日期。我们采用了一个定制和适应的卷积循环神经网络CRNN模型，以满足我们的特定要求并增强其在我们的环境中的性能，然后使用2019年至2027年的填充图像对定制CRNN模型进行训练，提取到期日期并评估LCBVAE在到期日期识别上的模型性能。 (LCBVAE+CRNN)的流程可以集成到自动分类系统中，用于在制造阶段提取到期日期并相应地对产品进行分类。此外，它可以克服商家手动输入到期日期的耗时和低效问题。由于缺乏点状阿拉伯到期日期图像的可用性，我们创建了一个用于生成合成图像的阿拉伯点阵True Type Font（TTF）。我们训练了模型，使用60,000张不真实的合成日期图像进行训练，并在2019年至2027年的3000张真实合成日期图像上进行测试，表示为yyyy/mm/dd。在我们的研究中，我们展示了潜在瓶颈层在增加到1024时，对改善下游迁移学习任务（如图像翻译）的泛化的重要性。提出的方法在使用LCBVAE架构进行图像翻译时实现了97%的准确率，可以推广到任何下游学习任务，如图像翻译和重建。

更新时间: 2024-10-01 14:35:59

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2310.14069v2

MobileMEF: Fast and Efficient Method for Multi-Exposure Fusion

Recent advances in camera design and imaging technology have enabled the capture of high-quality images using smartphones. However, due to the limited dynamic range of digital cameras, the quality of photographs captured in environments with highly imbalanced lighting often results in poor-quality images. To address this issue, most devices capture multi-exposure frames and then use some multi-exposure fusion method to merge those frames into a final fused image. Nevertheless, most traditional and current deep learning approaches are unsuitable for real-time applications on mobile devices due to their heavy computational and memory requirements. We propose a new method for multi-exposure fusion based on an encoder-decoder deep learning architecture with efficient building blocks tailored for mobile devices. This efficient design makes our model capable of processing 4K resolution images in less than 2 seconds on mid-range smartphones. Our method outperforms state-of-the-art techniques regarding full-reference quality measures and computational efficiency (runtime and memory usage), making it ideal for real-time applications on hardware-constrained devices. Our code is available at: https://github.com/LucasKirsten/MobileMEF.

Updated: 2024-10-01 14:26:16

标题: MobileMEF: 快速高效的多曝光融合方法

摘要: 最近摄像机设计和成像技术的进步使得使用智能手机能够捕捉高质量的图像。然而，由于数字相机的动态范围有限，在光线高度不平衡的环境中拍摄的照片质量通常会导致图像质量较差。为了解决这个问题，大多数设备捕捉多曝光帧，然后使用一些多曝光融合方法将这些帧合并成最终的融合图像。然而，大多数传统和当前的深度学习方法由于计算和内存要求较高，不适合在移动设备上进行实时应用。我们提出了一种基于编码器-解码器深度学习架构的多曝光融合新方法，该方法采用了专为移动设备设计的高效构建块。这种高效设计使得我们的模型能够在中档智能手机上在不到2秒的时间内处理4K分辨率图像。我们的方法在全参考质量度量和计算效率（运行时间和内存使用）方面胜过现有技术，使其非常适合在硬件受限设备上的实时应用。我们的代码可在以下链接找到：https://github.com/LucasKirsten/MobileMEF。

更新时间: 2024-10-01 14:26:16

领域: eess.IV,cs.CV,cs.LG

下载: http://arxiv.org/abs/2408.07932v2

LTLf Synthesis on First-Order Action Theories

Golog is an expressive high-level agent language that includes nondeterministic operators which allow to leave some of the decisions to be made only at execution time. This so-called program realization is typically implemented by means of search, or in an incremental online fashion. In this paper, we consider the more realistic case where parts of the non-determinism are under the control of the environment. Program realization then becomes a synthesis problem, where a successful realization executes the program and satisfies the temporal goal for all possible environment actions. We consider Golog programs in combination with an expressive class of first-order action theories that allow for an unbounded number of objects and non-local effects, together with a temporal goal specified in a first-order extension of LTLf. We solve the synthesis problem by constructing a game arena that captures all possible executions of the program while tracking the satisfaction of the temporal goal and then solving the resulting two-player game. We evaluate the approach in two domains, showing the general feasibility of the approach.

Updated: 2024-10-01 14:15:14

标题: 基于一阶动作理论的LTLf合成

摘要: Golog是一种表达力强的高级代理语言，包括允许将一些决策留给执行时才做出的非确定性运算符。这种所谓的程序实现通常通过搜索或增量在线方式来实现。在本文中，我们考虑了更现实的情况，即非确定性的部分受环境控制。程序实现则变成了一个综合问题，其中成功的实现执行程序并满足所有可能环境操作的时间目标。我们考虑了与一类表达力强的一阶动作理论结合的Golog程序，该理论允许无限数量的对象和非局部效果，以及在LTLf的一阶扩展中指定的时间目标。我们通过构建捕捉程序所有可能执行并跟踪时间目标满足情况的游戏竞技场来解决综合问题，然后解决结果为两人游戏。我们在两个领域评估了这种方法，展示了该方法的一般可行性。

更新时间: 2024-10-01 14:15:14

领域: cs.AI,cs.LO

下载: http://arxiv.org/abs/2410.00726v1

Enhancing GANs with Contrastive Learning-Based Multistage Progressive Finetuning SNN and RL-Based External Optimization

The application of deep learning in cancer research, particularly in early diagnosis, case understanding, and treatment strategy design, emphasizes the need for high-quality data. Generative AI, especially Generative Adversarial Networks (GANs), has emerged as a leading solution to challenges like class imbalance, robust learning, and model training, while addressing issues stemming from patient privacy and the scarcity of real data. Despite their promise, GANs face several challenges, both inherent and specific to histopathology data. Inherent issues include training imbalance, mode collapse, linear learning from insufficient discriminator feedback, and hard boundary convergence due to stringent feedback. Histopathology data presents a unique challenge with its complex representation, high spatial resolution, and multiscale features. To address these challenges, we propose a framework consisting of two components. First, we introduce a contrastive learning-based Multistage Progressive Finetuning Siamese Neural Network (MFT-SNN) for assessing the similarity between histopathology patches. Second, we implement a Reinforcement Learning-based External Optimizer (RL-EO) within the GAN training loop, serving as a reward signal generator. The modified discriminator loss function incorporates a weighted reward, guiding the GAN to maximize this reward while minimizing loss. This approach offers an external optimization guide to the discriminator, preventing generator overfitting and ensuring smooth convergence. Our proposed solution has been benchmarked against state-of-the-art (SOTA) GANs and a Denoising Diffusion Probabilistic model, outperforming previous SOTA across various metrics, including FID score, KID score, Perceptual Path Length, and downstream classification tasks.

Updated: 2024-10-01 14:14:32

标题: 利用对比学习的多阶段渐进微调SNN和基于RL的外部优化增强GANs

摘要: 深度学习在癌症研究中的应用，特别是在早期诊断、病例理解和治疗策略设计方面，强调了高质量数据的需求。生成式人工智能，特别是生成对抗网络（GANs），已成为解决诸如类别不平衡、稳健学习和模型训练等挑战的主要解决方案，同时解决了源自患者隐私和真实数据稀缺性的问题。尽管它们具有潜力，但GANs面临着一些挑战，既有固有的问题，也有特定于组织病理学数据的问题。固有问题包括训练不平衡、模式坍塌、线性学习不足的鉴别器反馈和由于严格反馈而导致的硬边界收敛。组织病理学数据以其复杂的表示、高空间分辨率和多尺度特征呈现了独特的挑战。为了解决这些挑战，我们提出了一个由两个组件组成的框架。首先，我们引入了基于对比学习的多阶段渐进微调连体神经网络（MFT-SNN），用于评估组织病理学块之间的相似性。其次，我们在GAN训练循环中实现了基于强化学习的外部优化器（RL-EO），作为奖励信号生成器。修改后的鉴别器损失函数结合了一个加权奖励，引导GAN最大化这个奖励同时最小化损失。这种方法为鉴别器提供了外部优化指导，防止生成器过拟合并确保平稳收敛。我们提出的解决方案已经与最先进的（SOTA）GANs和去噪扩散概率模型进行了基准测试，在各种指标上表现优于以往的SOTA，包括FID分数、KID分数、感知路径长度和下游分类任务。

更新时间: 2024-10-01 14:14:32

领域: eess.IV,cs.AI,cs.CV,cs.LG

下载: http://arxiv.org/abs/2409.20340v2

Low-Energy On-Device Personalization for MCUs

Microcontroller Units (MCUs) are ideal platforms for edge applications due to their low cost and energy consumption, and are widely used in various applications, including personalized machine learning tasks, where customized models can enhance the task adaptation. However, existing approaches for local on-device personalization mostly support simple ML architectures or require complex local pre-training/training, leading to high energy consumption and negating the low-energy advantage of MCUs. In this paper, we introduce $MicroT$, an efficient and low-energy MCU personalization approach. $MicroT$ includes a robust, general, but tiny feature extractor, developed through self-supervised knowledge distillation, which trains a task-specific head to enable independent on-device personalization with minimal energy and computational requirements. MicroT implements an MCU-optimized early-exit inference mechanism called stage-decision to further reduce energy costs. This mechanism allows for user-configurable exit criteria (stage-decision ratio) to adaptively balance energy cost with model performance. We evaluated MicroT using two models, three datasets, and two MCU boards. $MicroT$ outperforms traditional transfer learning (TTL) and two SOTA approaches by 2.12 - 11.60% across two models and three datasets. Targeting widely used energy-aware edge devices, MicroT's on-device training requires no additional complex operations, halving the energy cost compared to SOTA approaches by up to 2.28X while keeping SRAM usage below 1MB. During local inference, MicroT reduces energy cost by 14.17% compared to TTL across two boards and two datasets, highlighting its suitability for long-term use on energy-aware resource-constrained MCUs.

Updated: 2024-10-01 14:08:10

标题: 低能耗的MCU设备个性化

摘要: 微控制器单元（MCUs）是边缘应用的理想平台，因为它们具有低成本和低能耗，并广泛用于各种应用，包括个性化的机器学习任务，其中定制模型可以增强任务适应性。然而，现有的用于本地设备个性化的方法大多支持简单的ML架构或需要复杂的本地预训练/训练，导致能耗高，抵消了MCUs的低能耗优势。在本文中，我们介绍了$MicroT$，一种高效且低能耗的MCU个性化方法。$MicroT$包括一个强大、通用但微小的特征提取器，通过自监督知识蒸馏开发，训练一个任务特定的头部，以实现独立的本地设备个性化，能够以最小的能量和计算要求。MicroT实现了一个经过MCU优化的早期退出推理机制，称为阶段决策，以进一步降低能量成本。这个机制允许用户配置退出标准（阶段决策比率），以自适应地平衡能耗和模型性能。我们使用两个模型、三个数据集和两个MCU板评估了MicroT。在两个模型和三个数据集中，$MicroT$相比传统迁移学习（TTL）和两种最先进的方法表现优异，提高了2.12％至11.60％。针对广泛使用的能量感知边缘设备，MicroT的本地设备训练不需要额外的复杂操作，与SOTA方法相比，能耗降低了最高达2.28倍，同时将SRAM使用量保持在1MB以下。在本地推理过程中，MicroT将能耗降低了14.17％，相较于TTL，在两个板和两个数据集中，突出了其适用于能量感知资源受限MCU的长期使用。

更新时间: 2024-10-01 14:08:10

领域: cs.LG,cs.AR

下载: http://arxiv.org/abs/2403.08040v4

Contrastive Abstraction for Reinforcement Learning

Learning agents with reinforcement learning is difficult when dealing with long trajectories that involve a large number of states. To address these learning problems effectively, the number of states can be reduced by abstract representations that cluster states. In principle, deep reinforcement learning can find abstract states, but end-to-end learning is unstable. We propose contrastive abstraction learning to find abstract states, where we assume that successive states in a trajectory belong to the same abstract state. Such abstract states may be basic locations, achieved subgoals, inventory, or health conditions. Contrastive abstraction learning first constructs clusters of state representations by contrastive learning and then applies modern Hopfield networks to determine the abstract states. The first phase of contrastive abstraction learning is self-supervised learning, where contrastive learning forces states with sequential proximity to have similar representations. The second phase uses modern Hopfield networks to map similar state representations to the same fixed point, i.e.\ to an abstract state. The level of abstraction can be adjusted by determining the number of fixed points of the modern Hopfield network. Furthermore, \textit{contrastive abstraction learning} does not require rewards and facilitates efficient reinforcement learning for a wide range of downstream tasks. Our experiments demonstrate the effectiveness of contrastive abstraction learning for reinforcement learning.

Updated: 2024-10-01 13:56:09

标题: 增强学习的对比抽象

摘要: 使用强化学习的学习代理在处理涉及大量状态的长轨迹时是困难的。为了有效解决这些学习问题，可以通过聚类状态的抽象表示来减少状态的数量。原则上，深度强化学习可以找到抽象状态，但端到端学习是不稳定的。我们提出对比抽象学习来找到抽象状态，假设轨迹中连续的状态属于同一抽象状态。这样的抽象状态可能是基本位置、实现的子目标、库存或健康状况。对比抽象学习首先通过对比学习构建状态表示的簇，然后应用现代Hopfield网络来确定抽象状态。对比抽象学习的第一阶段是自监督学习，对比学习迫使具有顺序接近性的状态具有相似的表示。第二阶段使用现代Hopfield网络将相似的状态表示映射到相同的固定点，即抽象状态。通过确定现代Hopfield网络的固定点数量可以调整抽象级别。此外，对比抽象学习不需要奖励，并且有助于广泛范围的下游任务的高效强化学习。我们的实验证明了对比抽象学习在强化学习中的有效性。

更新时间: 2024-10-01 13:56:09

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.00704v1

Creative Problem Solving in Large Language and Vision Models -- What Would it Take?

We advocate for a strong integration of Computational Creativity (CC) with research in large language and vision models (LLVMs) to address a key limitation of these models, i.e., creative problem solving. We present preliminary experiments showing how CC principles can be applied to address this limitation. Our goal is to foster discussions on creative problem solving in LLVMs and CC at prestigious ML venues. Our code is available at: https://github.com/lnairGT/creative-problem-solving-LLMs

Updated: 2024-10-01 13:46:04

标题: 大型语言和视觉模型中的创造性问题解决 -- 需要什么？

摘要: 我们主张将计算创造力（CC）与大型语言和视觉模型（LLVMs）的研究强烈整合，以解决这些模型的一个关键局限，即创造性问题解决能力。我们提出初步实验证明了CC原则如何可以应用来解决这一限制。我们的目标是促进在ML领域的著名会议上对LLVMs和CC中创造性问题解决的讨论。我们的代码可在以下链接找到：https://github.com/lnairGT/creative-problem-solving-LLMs

更新时间: 2024-10-01 13:46:04

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.01453v3

Beyond Minimax Rates in Group Distributionally Robust Optimization via a Novel Notion of Sparsity

The minimax sample complexity of group distributionally robust optimization (GDRO) has been determined up to a $\log(K)$ factor, for $K$ the number of groups. In this work, we venture beyond the minimax perspective via a novel notion of sparsity that we dub $(\lambda, \beta)$-sparsity. In short, this condition means that at any parameter $\theta$, there is a set of at most $\beta$ groups whose risks at $\theta$ all are at least $\lambda$ larger than the risks of the other groups. To find an $\epsilon$-optimal $\theta$, we show via a novel algorithm and analysis that the $\epsilon$-dependent term in the sample complexity can swap a linear dependence on $K$ for a linear dependence on the potentially much smaller $\beta$. This improvement leverages recent progress in sleeping bandits, showing a fundamental connection between the two-player zero-sum game optimization framework for GDRO and per-action regret bounds in sleeping bandits. The aforementioned result assumes having a particular $\lambda$ as input. Perhaps surprisingly, we next show an adaptive algorithm which, up to log factors, gets sample complexity that adapts to the best $(\lambda, \beta)$-sparsity condition that holds. Finally, for a particular input $\lambda$, we also show how to get a dimension-free sample complexity result.

Updated: 2024-10-01 13:45:55

标题: 在群体分布鲁棒优化中超越极小化速率：通过一种新的稀疏概念

摘要: 群分布鲁棒优化（GDRO）的极小最大样本复杂性已经确定了一个$\log(K)$因子，其中$K$是群组的数量。在这项工作中，我们通过一个新颖的稀疏性概念超越极小最大视角，我们将其称为$(\lambda, \beta)$-稀疏性。简而言之，这个条件意味着在任何参数$\theta$处，存在一个至多包含$\beta$个群组的集合，这些群组在$\theta$处的风险都至少比其他群组的风险大$\lambda$。为了找到一个$\epsilon$-最优的$\theta$，我们通过一种新颖的算法和分析表明，样本复杂性中的$\epsilon$相关项可以将对$K$的线性依赖替换为对可能小得多的$\beta$的线性依赖。这种改进利用了最近在睡眠赌徒方面的进展，展示了GDRO的双人零和博弈优化框架与睡眠赌徒中每次行动遗憾上限之间的基本联系。前述结果假设具有特定的$\lambda$作为输入。也许令人惊讶的是，我们接下来展示了一种自适应算法，该算法在对数因子的范围内获得了适应于最佳$(\lambda, \beta)$-稀疏性条件的样本复杂性。最后，对于特定输入的$\lambda$，我们还展示了如何获得一个无维度的样本复杂性结果。

更新时间: 2024-10-01 13:45:55

领域: cs.LG,cs.AI,math.OC

下载: http://arxiv.org/abs/2410.00690v1

Classifier-free graph diffusion for molecular property targeting

This work focuses on the task of property targeting: that is, generating molecules conditioned on target chemical properties to expedite candidate screening for novel drug and materials development. DiGress is a recent diffusion model for molecular graphs whose distinctive feature is allowing property targeting through classifier-based (CB) guidance. While CB guidance may work to generate molecular-like graphs, we hint at the fact that its assumptions apply poorly to the chemical domain. Based on this insight we propose a classifier-free DiGress (FreeGress), which works by directly injecting the conditioning information into the training process. CF guidance is convenient given its less stringent assumptions and since it does not require to train an auxiliary property regressor, thus halving the number of trainable parameters in the model. We empirically show that our model yields up to 79% improvement in Mean Absolute Error with respect to DiGress on property targeting tasks on QM9 and ZINC-250k benchmarks. As an additional contribution, we propose a simple yet powerful approach to improve chemical validity of generated samples, based on the observation that certain chemical properties such as molecular weight correlate with the number of atoms in molecules.

Updated: 2024-10-01 13:45:04

标题: 无分类器的分子性能定位图扩散

摘要: 这项工作侧重于属性定向任务：即根据目标化学性质生成分子，以加快新药物和材料开发的候选者筛选过程。DiGress是一种最近的分子图扩散模型，其独特特征是通过基于分类器的指导实现属性定向。虽然基于分类器的指导可以用于生成类似分子的图，但我们暗示其假设在化学领域中表现不佳。基于这一洞察力，我们提出了一种无分类器的DiGress（FreeGress），它通过直接将条件信息注入训练过程来工作。无分类器指导是方便的，因为它的假设较不严格，而且不需要训练辅助属性回归器，从而将模型中可训练参数数量减半。我们在QM9和ZINC-250k基准测试中实验证明，我们的模型在属性定向任务上相对于DiGress的平均绝对误差可以提高高达79%。作为额外的贡献，我们提出了一种简单而强大的方法来提高生成样本的化学有效性，这种方法基于观察到的分子量等某些化学性质与分子中的原子数相关。

更新时间: 2024-10-01 13:45:04

领域: cs.LG,q-bio.BM

下载: http://arxiv.org/abs/2312.17397v2

Efficient Technical Term Translation: A Knowledge Distillation Approach for Parenthetical Terminology Translation

This paper addresses the challenge of accurately translating technical terms, which are crucial for clear communication in specialized fields. We introduce the Parenthetical Terminology Translation (PTT) task, designed to mitigate potential inaccuracies by displaying the original term in parentheses alongside its translation. To implement this approach, we generated a representative PTT dataset using a collaborative approach with large language models and applied knowledge distillation to fine-tune traditional Neural Machine Translation (NMT) models and small-sized Large Language Models (sLMs). Additionally, we developed a novel evaluation metric to assess both overall translation accuracy and the correct parenthetical presentation of terms. Our findings indicate that sLMs did not consistently outperform NMT models, with fine-tuning proving more effective than few-shot prompting, particularly in models with continued pre-training in the target language. These insights contribute to the advancement of more reliable terminology translation methodologies.

Updated: 2024-10-01 13:40:28

标题: 高效的技术术语翻译：一种括号术语翻译的知识蒸馏方法

摘要: 这篇论文探讨了准确翻译技术术语的挑战，这对于专业领域中清晰沟通至关重要。我们引入了括号术语翻译（PTT）任务，旨在通过在翻译旁边显示原始术语的括号来减少潜在的不准确性。为了实施这一方法，我们利用大型语言模型和应用知识蒸馏的协作方法生成了代表性的PTT数据集，并对传统的神经机器翻译（NMT）模型和小型大语言模型（sLMs）进行微调。此外，我们开发了一种新颖的评估指标，用于评估整体翻译准确性和术语正确的括号呈现。我们的研究结果表明，sLMs并未始终优于NMT模型，微调比少量提示更有效，特别是在目标语言持续预训练的模型中。这些发现有助于推动更可靠的术语翻译方法的进步。

更新时间: 2024-10-01 13:40:28

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.00683v1

Advanced Arabic Alphabet Sign Language Recognition Using Transfer Learning and Transformer Models

This paper presents an Arabic Alphabet Sign Language recognition approach, using deep learning methods in conjunction with transfer learning and transformer-based models. We study the performance of the different variants on two publicly available datasets, namely ArSL2018 and AASL. This task will make full use of state-of-the-art CNN architectures like ResNet50, MobileNetV2, and EfficientNetB7, and the latest transformer models such as Google ViT and Microsoft Swin Transformer. These pre-trained models have been fine-tuned on the above datasets in an attempt to capture some unique features of Arabic sign language motions. Experimental results present evidence that the suggested methodology can receive a high recognition accuracy, by up to 99.6\% and 99.43\% on ArSL2018 and AASL, respectively. That is far beyond the previously reported state-of-the-art approaches. This performance opens up even more avenues for communication that may be more accessible to Arabic-speaking deaf and hard-of-hearing, and thus encourages an inclusive society.

Updated: 2024-10-01 13:39:26

标题: 使用迁移学习和Transformer模型的高级阿拉伯字母手语识别

摘要: 本文介绍了一种阿拉伯字母手语识别方法，利用深度学习方法结合迁移学习和基于转换器的模型。我们研究了不同变体在两个公开数据集上的表现，即ArSL2018和AASL。这项任务将充分利用ResNet50、MobileNetV2和EfficientNetB7等最先进的CNN架构，以及Google ViT和Microsoft Swin Transformer等最新的转换器模型。这些预训练模型已经在上述数据集上进行微调，以捕捉一些阿拉伯手语动作的独特特征。实验结果表明，建议的方法可以实现高达99.6%和99.43%的识别准确率，分别在ArSL2018和AASL上。这远远超过了先前报道的最先进方法。这种表现为更多阿拉伯语言聋哑和听力困难的人打开了更多交流途径，从而促进了一个包容的社会。

更新时间: 2024-10-01 13:39:26

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.00681v1

Learning Confidence Bounds for Classification with Imbalanced Data

Class imbalance poses a significant challenge in classification tasks, where traditional approaches often lead to biased models and unreliable predictions. Undersampling and oversampling techniques have been commonly employed to address this issue, yet they suffer from inherent limitations stemming from their simplistic approach such as loss of information and additional biases respectively. In this paper, we propose a novel framework that leverages learning theory and concentration inequalities to overcome the shortcomings of traditional solutions. We focus on understanding the uncertainty in a class-dependent manner, as captured by confidence bounds that we directly embed into the learning process. By incorporating class-dependent estimates, our method can effectively adapt to the varying degrees of imbalance across different classes, resulting in more robust and reliable classification outcomes. We empirically show how our framework provides a promising direction for handling imbalanced data in classification tasks, offering practitioners a valuable tool for building more accurate and trustworthy models.

Updated: 2024-10-01 13:35:15

标题: 学习不平衡数据分类的置信区间

摘要: 类别不平衡在分类任务中构成了重要挑战，传统方法通常会导致偏见模型和不可靠的预测。欠采样和过采样技术通常被用来解决这个问题，但它们都存在固有的局限性，源自它们的简单方法，比如信息丢失和额外偏见。在本文中，我们提出了一个新颖的框架，利用学习理论和集中不等式来克服传统解决方案的缺点。我们着重于以类别相关的方式理解不确定性，这由我们直接嵌入学习过程中的置信区间所捕获。通过结合类别相关的估计，我们的方法可以有效地适应不同类别之间的不平衡程度变化，从而产生更加健壮和可靠的分类结果。我们通过实证分析展示了我们的框架如何为处理分类任务中的不平衡数据提供了一个有前途的方向，为从业者提供了一个建立更准确和可信任模型的宝贵工具。

更新时间: 2024-10-01 13:35:15

领域: cs.LG

下载: http://arxiv.org/abs/2407.11878v2

Counterfactual Explanations for Medical Image Classification and Regression using Diffusion Autoencoder

Counterfactual explanations (CEs) aim to enhance the interpretability of machine learning models by illustrating how alterations in input features would affect the resulting predictions. Common CE approaches require an additional model and are typically constrained to binary counterfactuals. In contrast, we propose a novel method that operates directly on the latent space of a generative model, specifically a Diffusion Autoencoder (DAE). This approach offers inherent interpretability by enabling the generation of CEs and the continuous visualization of the model's internal representation across decision boundaries. Our method leverages the DAE's ability to encode images into a semantically rich latent space in an unsupervised manner, eliminating the need for labeled data or separate feature extraction models. We show that these latent representations are helpful for medical condition classification and the ordinal regression of severity pathologies, such as vertebral compression fractures (VCF) and diabetic retinopathy (DR). Beyond binary CEs, our method supports the visualization of ordinal CEs using a linear model, providing deeper insights into the model's decision-making process and enhancing interpretability. Experiments across various medical imaging datasets demonstrate the method's advantages in interpretability and versatility. The linear manifold of the DAE's latent space allows for meaningful interpolation and manipulation, making it a powerful tool for exploring medical image properties. Our code is available at https://doi.org/10.5281/zenodo.13859266.

Updated: 2024-10-01 13:34:36

标题: 使用扩散自动编码器对医学图像分类和回归进行因果解释

摘要: 反事实解释（CEs）旨在通过说明输入特征的变化如何影响结果预测来增强机器学习模型的可解释性。常见的CE方法需要额外的模型，并且通常受限于二进制反事实。相比之下，我们提出了一种新颖的方法，直接在生成模型的潜在空间上操作，特别是扩散自动编码器（DAE）。这种方法通过允许生成CE并在决策边界上连续可视化模型的内部表示，提供了固有的可解释性。我们的方法利用DAE以无监督的方式将图像编码到语义丰富的潜在空间中的能力，消除了对标记数据或单独特征提取模型的需求。我们表明这些潜在表示对于医疗状况分类和严重病理的有序回归，如椎体压缩骨折（VCF）和糖尿病视网膜病变（DR），是有帮助的。除了二进制CE，我们的方法还支持使用线性模型可视化有序CE，提供更深入的洞察模型的决策过程，并增强可解释性。通过各种医学影像数据集上的实验，证明了该方法在可解释性和多功能性上的优势。DAE潜在空间的线性流形允许有意义的插值和操作，使其成为探索医学图像特性的强大工具。我们的代码可在https://doi.org/10.5281/zenodo.13859266 上获取。

更新时间: 2024-10-01 13:34:36

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2408.01571v2

User-Guided Verification of Security Protocols via Sound Animation

Current formal verification of security protocols relies on specialized researchers and complex tools, inaccessible to protocol designers who informally evaluate their work with emulators. This paper addresses this gap by embedding symbolic analysis into the design process. Our approach implements the Dolev-Yao attack model using a variant of CSP based on Interaction Trees (ITrees) to compile protocols into animators -- executable programs that designers can use for debugging and inspection. To guarantee the soundness of our compilation, we mechanised our approach in the theorem prover Isabelle/HOL. As traditionally done with symbolic tools, we refer to the Diffie-Hellman key exchange and the Needham-Schroeder public-key protocol (and Lowe's patched variant). We demonstrate how our animator can easily reveal the mechanics of attacks and verify corrections. This work facilitates security integration at the design level and supports further security property analysis and software-engineered integrations.

Updated: 2024-10-01 13:34:35

标题: 用户引导的安全协议验证通过声音动画

摘要: 目前安全协议的形式验证依赖于专门的研究人员和复杂的工具，这些工具对于协议设计者来说是不可及的，他们通常使用模拟器来非正式地评估他们的工作。本文通过将符号分析嵌入到设计过程中来弥补这一差距。我们的方法实现了使用基于交互树（ITrees）的CSP变体来编译协议到动画程序——设计者可以用于调试和检查的可执行程序中的Dolev-Yao攻击模型。为了保证我们编译的正确性，我们在定理证明器Isabelle/HOL中将我们的方法机械化。与传统的符号工具一样，我们参考Diffie-Hellman密钥交换和Needham-Schroeder公钥协议（以及Lowe的修补变体）。我们展示了我们的动画程序如何轻松揭示攻击的机制并验证修正。这项工作促进了安全性在设计层面的集成，并支持进一步的安全性属性分析和软件工程集成。

更新时间: 2024-10-01 13:34:35

领域: cs.CR,cs.LO

下载: http://arxiv.org/abs/2410.00676v1

Gradient-Free Training of Recurrent Neural Networks using Random Perturbations

Recurrent neural networks (RNNs) hold immense potential for computations due to their Turing completeness and sequential processing capabilities, yet existing methods for their training encounter efficiency challenges. Backpropagation through time (BPTT), the prevailing method, extends the backpropagation (BP) algorithm by unrolling the RNN over time. However, this approach suffers from significant drawbacks, including the need to interleave forward and backward phases and store exact gradient information. Furthermore, BPTT has been shown to struggle to propagate gradient information for long sequences, leading to vanishing gradients. An alternative strategy to using gradient-based methods like BPTT involves stochastically approximating gradients through perturbation-based methods. This learning approach is exceptionally simple, necessitating only forward passes in the network and a global reinforcement signal as feedback. Despite its simplicity, the random nature of its updates typically leads to inefficient optimization, limiting its effectiveness in training neural networks. In this study, we present a new approach to perturbation-based learning in RNNs whose performance is competitive with BPTT, while maintaining the inherent advantages over gradient-based learning. To this end, we extend the recently introduced activity-based node perturbation (ANP) method to operate in the time domain, leading to more efficient learning and generalization. We subsequently conduct a range of experiments to validate our approach. Our results show similar performance, convergence time and scalability compared to BPTT, strongly outperforming standard node and weight perturbation methods. These findings suggest that perturbation-based learning methods offer a versatile alternative to gradient-based methods for training RNNs which can be ideally suited for neuromorphic computing applications

Updated: 2024-10-01 13:33:09

标题: 无梯度训练递归神经网络使用随机扰动

摘要: 递归神经网络（RNNs）由于其图灵完备性和顺序处理能力而具有巨大的计算潜力，然而现有的训练方法面临效率挑战。通过时间反向传播（BPTT）是主要方法，通过展开RNN随时间进行反向传播（BP）算法。然而，这种方法存在显著缺点，包括需要交替进行前向和后向阶段以及存储准确的梯度信息。此外，已经证明BPTT在传播长序列的梯度信息时存在困难，导致梯度消失。一种替代使用像BPTT这样的基于梯度的方法的策略是通过基于扰动的方法随机近似梯度。这种学习方法异常简单，只需要网络中的前向传递和全局强化信号作为反馈。尽管其简单性，其更新的随机性通常导致优化效率低下，限制了其在训练神经网络中的有效性。在本研究中，我们提出了一种新的基于扰动的RNN学习方法，其性能与BPTT竞争力强，同时保持了对基于梯度的学习的固有优势。为此，我们将最近引入的基于活动的节点扰动（ANP）方法扩展到时间域中，从而实现更高效的学习和泛化。随后我们进行了一系列实验来验证我们的方法。我们的结果显示与BPTT相比，具有类似的性能、收敛时间和可扩展性，明显优于标准的节点和权重扰动方法。这些发现表明，基于扰动的学习方法为训练RNN提供了一种多功能的替代方案，可以理想地适用于神经形态计算应用。

更新时间: 2024-10-01 13:33:09

领域: cs.LG

下载: http://arxiv.org/abs/2405.08967v3

A transformer-based deep reinforcement learning approach to spatial navigation in a partially observable Morris Water Maze

Navigation is a fundamental cognitive skill extensively studied in neuroscientific experiments and has lately gained substantial interest in artificial intelligence research. Recreating the task solved by rodents in the well-established Morris Water Maze (MWM) experiment, this work applies a transformer-based architecture using deep reinforcement learning -- an approach previously unexplored in this context -- to navigate a 2D version of the maze. Specifically, the agent leverages a decoder-only transformer architecture serving as a deep Q-network performing effective decision making in the partially observable environment. We demonstrate that the proposed architecture enables the agent to efficiently learn spatial navigation strategies, overcoming challenges associated with a limited field of vision, corresponding to the visual information available to a rodent in the MWM. Demonstrating the potential of transformer-based models for enhancing navigation performance in partially observable environments, this work suggests promising avenues for future research in artificial agents whose behavior resembles that of biological agents. Finally, the flexibility of the transformer architecture in supporting varying input sequence lengths opens opportunities for gaining increased understanding of the artificial agent's inner representation of the environment.

Updated: 2024-10-01 13:22:56

标题: 一个基于变压器的深度强化学习方法用于部分可观察的Morris水迷宫中的空间导航

摘要: 导航是一种基本的认知技能，在神经科学实验中得到了广泛研究，最近在人工智能研究中引起了相当大的兴趣。本文重建了老鼠在著名的莫里斯水迷宫（MWM）实验中解决的任务，应用了基于变压器的架构和深度强化学习，这是以前在这个领域尚未探索的方法，来导航迷宫的二维版本。具体来说，代理利用一个仅解码器的变压器架构，作为一个深度Q网络，在部分可观察环境中进行有效的决策。我们证明了所提出的架构使代理能够高效地学习空间导航策略，克服了与受限视野相关的挑战，对应于老鼠在MWM中可用的视觉信息。展示了基于变压器模型在部分可观测环境中增强导航性能的潜力，这项工作为未来研究提供了有前途的途径，即人工代理的行为类似于生物代理。最后，变压器架构在支持不同输入序列长度方面的灵活性为增加对人工代理对环境的内部表征的理解提供了机会。

更新时间: 2024-10-01 13:22:56

领域: cs.RO,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.12820v1

HUMAP: Hierarchical Uniform Manifold Approximation and Projection

Dimensionality reduction (DR) techniques help analysts to understand patterns in high-dimensional spaces. These techniques, often represented by scatter plots, are employed in diverse science domains and facilitate similarity analysis among clusters and data samples. For datasets containing many granularities or when analysis follows the information visualization mantra, hierarchical DR techniques are the most suitable approach since they present major structures beforehand and details on demand. This work presents HUMAP, a novel hierarchical dimensionality reduction technique designed to be flexible on preserving local and global structures and preserve the mental map throughout hierarchical exploration. We provide empirical evidence of our technique's superiority compared with current hierarchical approaches and show a case study applying HUMAP for dataset labelling.

Updated: 2024-10-01 13:22:32

标题: HUMAP：分层统一流形逼近和投影

摘要: 降维技术（DR）帮助分析员理解高维空间中的模式。这些技术通常以散点图表示，在各种科学领域中被应用，促进簇和数据样本之间的相似性分析。对于包含许多粒度或遵循信息可视化原则的分析，层次化降维技术是最合适的方法，因为它们提前展示主要结构并根据需求提供细节。本文介绍了HUMAP，一种新颖的层次化降维技术，旨在灵活地保留局部和全局结构，并通过层次探索保持心理地图。我们提供了实证证据证明我们的技术相对于当前的层次化方法的优越性，并展示了一个应用HUMAP进行数据集标注的案例研究。

更新时间: 2024-10-01 13:22:32

领域: cs.LG,cs.GR

下载: http://arxiv.org/abs/2106.07718v4

Enhancing the analysis of murine neonatal ultrasonic vocalizations: Development, evaluation, and application of different mathematical models

Rodents employ a broad spectrum of ultrasonic vocalizations (USVs) for social communication. As these vocalizations offer valuable insights into affective states, social interactions, and developmental stages of animals, various deep learning approaches have aimed to automate both the quantitative (detection) and qualitative (classification) analysis of USVs. Here, we present the first systematic evaluation of different types of neural networks for USV classification. We assessed various feedforward networks, including a custom-built, fully-connected network and convolutional neural network, different residual neural networks (ResNets), an EfficientNet, and a Vision Transformer (ViT). Paired with a refined, entropy-based detection algorithm (achieving recall of 94.9% and precision of 99.3%), the best architecture (achieving 86.79% accuracy) was integrated into a fully automated pipeline capable of analyzing extensive USV datasets with high reliability. Additionally, users can specify an individual minimum accuracy threshold based on their research needs. In this semi-automated setup, the pipeline selectively classifies calls with high pseudo-probability, leaving the rest for manual inspection. Our study focuses exclusively on neonatal USVs. As part of an ongoing phenotyping study, our pipeline has proven to be a valuable tool for identifying key differences in USVs produced by mice with autism-like behaviors.

Updated: 2024-10-01 13:18:54

标题: 增强小鼠新生儿超声声音分析：不同数学模型的开发、评估和应用

摘要: 啮齿动物利用广泛的超声波声音（USVs）进行社交沟通。由于这些声音提供了有关情感状态、社交互动和动物发展阶段的宝贵见解，因此各种深度学习方法旨在自动化USVs的定量（检测）和定性（分类）分析。在这里，我们首次系统评估了不同类型的神经网络用于USV分类。我们评估了各种前馈网络，包括自定义构建的全连接网络和卷积神经网络，不同的残差神经网络（ResNets），EfficientNet和Vision Transformer（ViT）。结合一个精细的基于熵的检测算法（实现了94.9%的召回率和99.3%的精度），最佳架构（达到86.79%的准确度）被集成到一个完全自动化的管道中，能够以高可靠性分析大量的USV数据集。此外，用户可以根据其研究需要指定个体最低准确度阈值。在这种半自动化设置中，该管道选择性地对具有高伪概率的呼叫进行分类，将其余部分留给手动检查。我们的研究专注于新生USVs。作为正在进行的表型研究的一部分，我们的管道已被证明是识别患有类似自闭症行为的小鼠产生的USVs的关键差异的有价值工具。

更新时间: 2024-10-01 13:18:54

领域: cs.SD,cs.LG,eess.AS

下载: http://arxiv.org/abs/2405.12957v3

Vicious Classifiers: Assessing Inference-time Data Reconstruction Risk in Edge Computing

Privacy-preserving inference in edge computing paradigms encourages the users of machine-learning services to locally run a model on their private input and only share the models outputs for a target task with the server. We study how a vicious server can reconstruct the input data by observing only the models outputs while keeping the target accuracy very close to that of a honest server by jointly training a target model (to run at users' side) and an attack model for data reconstruction (to secretly use at servers' side). We present a new measure to assess the inference-time reconstruction risk. Evaluations on six benchmark datasets show the model's input can be approximately reconstructed from the outputs of a single inference. We propose a primary defense mechanism to distinguish vicious versus honest classifiers at inference time. By studying such a risk associated with emerging ML services our work has implications for enhancing privacy in edge computing. We discuss open challenges and directions for future studies and release our code as a benchmark for the community at https://github.com/mmalekzadeh/vicious-classifiers .

Updated: 2024-10-01 13:18:41

标题: 恶意分类器：评估边缘计算中推理时数据重建风险

摘要: 隐私保护推理在边缘计算范式中鼓励机器学习服务的用户在他们的私人输入上本地运行模型，并仅与服务器共享目标任务的模型输出。我们研究了一个恶意服务器如何通过观察模型输出而重建输入数据，同时保持目标准确度非常接近诚实服务器的准确度，方法是联合训练目标模型（在用户端运行）和攻击模型（用于在服务器端秘密使用进行数据重建）。我们提出了一种新的度量来评估推理时间的重建风险。对六个基准数据集的评估显示，可以从单次推理的输出中大致重建模型的输入。我们提出了一种主要的防御机制，用于在推理时间区分恶意与诚实的分类器。通过研究与新兴ML服务相关的风险，我们的工作对增强边缘计算中的隐私具有影响。我们讨论未来研究的挑战和方向，并将我们的代码发布为社区的基准，网址为https://github.com/mmalekzadeh/vicious-classifiers。

更新时间: 2024-10-01 13:18:41

领域: cs.LG,cs.CR,cs.IT,math.IT

下载: http://arxiv.org/abs/2212.04223v3

Enhancing Fairness through Reweighting: A Path to Attain the Sufficiency Rule

We introduce an innovative approach to enhancing the empirical risk minimization (ERM) process in model training through a refined reweighting scheme of the training data to enhance fairness. This scheme aims to uphold the sufficiency rule in fairness by ensuring that optimal predictors maintain consistency across diverse sub-groups. We employ a bilevel formulation to address this challenge, wherein we explore sample reweighting strategies. Unlike conventional methods that hinge on model size, our formulation bases generalization complexity on the space of sample weights. We discretize the weights to improve training speed. Empirical validation of our method showcases its effectiveness and robustness, revealing a consistent improvement in the balance between prediction performance and fairness metrics across various experiments.

Updated: 2024-10-01 13:18:35

标题: 通过重新加权提高公平性：实现足够规则的途径

摘要: 我们介绍了一种创新的方法，通过对训练数据进行精细重新加权来增强模型训练中的经验风险最小化（ERM）过程，以增强公平性。该方案旨在通过确保最优预测器在不同子群体中保持一致性，来维护公平性的充分性规则。我们采用一个双层公式来解决这一挑战，其中我们探索样本重新加权策略。与依赖模型大小的传统方法不同，我们的公式将泛化复杂性基于样本权重空间。我们对权重进行离散化以提高训练速度。我们的方法的经验验证展示了其有效性和稳健性，揭示了在各种实验中预测性能和公平性指标之间平衡的持续改善。

更新时间: 2024-10-01 13:18:35

领域: cs.LG,cs.CY

下载: http://arxiv.org/abs/2408.14126v2

Merging LoRAs like Playing LEGO: Pushing the Modularity of LoRA to Extremes Through Rank-Wise Clustering

Low-Rank Adaptation (LoRA) has emerged as a popular technique for fine-tuning large language models (LLMs) to various domains due to its modular design and widespread availability on platforms like Huggingface. This modularity has sparked interest in combining multiple LoRAs to enhance LLM capabilities. However, existing methods for LoRA composition primarily focus on task-specific adaptations that require additional training, and current model merging techniques often fail to fully leverage LoRA's modular nature, leading to parameter interference and performance degradation. In this paper, we investigate the feasibility of disassembling and reassembling multiple LoRAs at a finer granularity, analogous to assembling LEGO blocks. We introduce the concept of Minimal Semantic Units (MSUs), where the parameters corresponding to each rank in LoRA function as independent units. These MSUs demonstrate permutation invariance and concatenation-summation equivalence properties, enabling flexible combinations to create new LoRAs. Building on these insights, we propose the LoRA-LEGO framework. This framework conducts rank-wise parameter clustering by grouping MSUs from different LoRAs into $k$ clusters. The centroid of each cluster serves as a representative MSU, enabling the assembly of a merged LoRA with an adjusted rank of $k$. Additionally, we apply a dual reweighting strategy to optimize the scale of the merged LoRA. Experiments across various benchmarks demonstrate that our method outperforms existing approaches in LoRA merging.

Updated: 2024-10-01 13:16:45

标题: 将LoRA合并如同玩乐高积木：通过排名聚类将LoRA的模块化推向极致

摘要: 低秩适应（LoRA）已经成为一种流行的技术，用于微调大型语言模型（LLMs）以适应各种领域，这是由于其模块化设计和在像Huggingface这样的平台上广泛可用。这种模块化性引起了将多个LoRA组合以增强LLM功能的兴趣。然而，现有的LoRA组合方法主要集中在需要额外训练的任务特定适应上，当前的模型融合技术通常未能充分利用LoRA的模块化特性，导致参数干扰和性能下降。在本文中，我们研究了以类似于组装乐高积木的方式将多个LoRA进行拆解和重新组装的可行性。我们引入了最小语义单元（MSUs）的概念，其中LoRA中每个秩对应的参数作为独立单元。这些MSUs表现出置换不变性和连接-求和等价性属性，使得能够灵活组合以创建新的LoRAs。基于这些见解，我们提出了LoRA-LEGO框架。该框架通过将来自不同LoRAs的MSUs分组为$k$个簇来进行秩-wise参数聚类。每个簇的质心作为代表性MSU，使得能够组装具有$k$个调整秩的合并LoRA。此外，我们应用了双重重新加权策略来优化合并LoRA的尺度。在各种基准测试中的实验证明，我们的方法在LoRA融合方面优于现有方法。

更新时间: 2024-10-01 13:16:45

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2409.16167v2

Integrating PETs into Software Applications: A Game-Based Learning Approach

The absence of data protection measures in software applications leads to data breaches, threatening end-user privacy and causing instabilities in organisations that developed those software. Privacy Enhancing Technologies (PETs) emerge as promising safeguards against data breaches. PETs minimise threats to personal data while enabling software to extract valuable insights from them. However, software developers often lack the adequate knowledge and awareness to develop PETs integrated software. This issue is exacerbated by insufficient PETs related learning approaches customised for software developers. Therefore, we propose "PETs-101", a novel game-based learning framework that motivates developers to integrate PETs into software. By doing so, it aims to improve developers' privacy-preserving software development behaviour rather than simply delivering the learning content on PETs. In future, the proposed framework will be empirically investigated and used as a foundation for developing an educational gaming intervention that trains developers to put PETs into practice.

Updated: 2024-10-01 13:15:46

标题: 将PETs整合到软件应用程序中：基于游戏的学习方法

摘要: 软件应用中缺乏数据保护措施会导致数据泄露，威胁终端用户隐私，并导致开发这些软件的组织不稳定。隐私增强技术（PETs）作为数据泄露的有希望的防护措施出现。PETs在最小化个人数据威胁的同时，使软件能够从中提取有价值的洞见。然而，软件开发人员通常缺乏开发集成PETs的充分知识和意识。这一问题受到缺乏定制给软件开发人员的PETs相关学习方法的加剧。因此，我们提出了“PETs-101”，一个新颖的基于游戏的学习框架，激励开发者将PETs集成到软件中。通过这样做，它旨在提高开发者的隐私保护软件开发行为，而不仅仅是传递有关PETs的学习内容。未来，所提出的框架将进行实证调查，并作为开发教育游戏干预的基础，培训开发者将PETs付诸实践。

更新时间: 2024-10-01 13:15:46

领域: cs.CR

下载: http://arxiv.org/abs/2410.00661v1

Multimodal Coherent Explanation Generation of Robot Failures

The explainability of a robot's actions is crucial to its acceptance in social spaces. Explaining why a robot fails to complete a given task is particularly important for non-expert users to be aware of the robot's capabilities and limitations. So far, research on explaining robot failures has only considered generating textual explanations, even though several studies have shown the benefits of multimodal ones. However, a simple combination of multiple modalities may lead to semantic incoherence between the information across different modalities - a problem that is not well-studied. An incoherent multimodal explanation can be difficult to understand, and it may even become inconsistent with what the robot and the human observe and how they perform reasoning with the observations. Such inconsistencies may lead to wrong conclusions about the robot's capabilities. In this paper, we introduce an approach to generate coherent multimodal explanations by checking the logical coherence of explanations from different modalities, followed by refinements as required. We propose a classification approach for coherence assessment, where we evaluate if an explanation logically follows another. Our experiments suggest that fine-tuning a neural network that was pre-trained to recognize textual entailment, performs well for coherence assessment of multimodal explanations. Code & data: https://pradippramanick.github.io/coherent-explain/.

Updated: 2024-10-01 13:15:38

标题: 机器人故障的多模态连贯解释生成

摘要: 机器人行为的可解释性对其在社交空间中的接受至关重要。解释机器人为何未能完成给定任务对于非专业用户来说尤为重要，以便了解机器人的能力和局限性。到目前为止，关于解释机器人失败的研究只考虑生成文本解释，尽管一些研究表明多模态解释的好处。然而，简单地结合多种模态可能会导致不同模态之间信息的语义不一致-这是一个尚未得到很好研究的问题。不一致的多模态解释可能难以理解，甚至可能与机器人和人类观察到的以及他们如何对观察进行推理的情况不一致。这种不一致可能会导致对机器人能力的错误结论。在本文中，我们介绍了一种通过检查不同模态的解释的逻辑一致性，并根据需要进行细化的方法来生成连贯的多模态解释。我们提出了一种用于一致性评估的分类方法，通过评估一个解释是否在逻辑上跟随另一个解释。我们的实验表明，对经过预训练以识别文本蕴涵的神经网络进行微调，可以很好地用于评估多模态解释的一致性。代码和数据可在https://pradippramanick.github.io/coherent-explain/找到。

更新时间: 2024-10-01 13:15:38

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2410.00659v1

Explainable Multi-Stakeholder Job Recommender Systems

Public opinion on recommender systems has become increasingly wary in recent years. In line with this trend, lawmakers have also started to become more critical of such systems, resulting in the introduction of new laws focusing on aspects such as privacy, fairness, and explainability for recommender systems and AI at large. These concepts are especially crucial in high-risk domains such as recruitment. In recruitment specifically, decisions carry substantial weight, as the outcomes can significantly impact individuals' careers and companies' success. Additionally, there is a need for a multi-stakeholder approach, as these systems are used by job seekers, recruiters, and companies simultaneously, each with its own requirements and expectations. In this paper, I summarize my current research on the topic of explainable, multi-stakeholder job recommender systems and set out a number of future research directions.

Updated: 2024-10-01 13:12:30

标题: 可解释的多利益相关者工作推荐系统

摘要: 近年来，公众对推荐系统的看法变得越来越谨慎。与这一趋势相一致，立法者也开始对这类系统持更加批判的态度，导致出台了针对隐私、公平性和可解释性等方面的新法律，重点针对推荐系统和人工智能的整体。这些概念在招聘等高风险领域尤为关键。在招聘中，决策具有重大影响，因为结果可以显著影响个人的职业生涯和公司的成功。此外，需要采取多利益相关者的方法，因为这些系统同时被求职者、招聘人员和公司使用，每个群体都有自己的需求和期望。在本文中，我总结了我目前关于可解释性、多利益相关者的工作推荐系统的研究，并提出了一些未来研究方向。

更新时间: 2024-10-01 13:12:30

领域: cs.HC,cs.AI

下载: http://arxiv.org/abs/2410.00654v1

BMFT: Achieving Fairness via Bias-based Weight Masking Fine-tuning

Developing models with robust group fairness properties is paramount, particularly in ethically sensitive domains such as medical diagnosis. Recent approaches to achieving fairness in machine learning require a substantial amount of training data and depend on model retraining, which may not be practical in real-world scenarios. To mitigate these challenges, we propose Bias-based Weight Masking Fine-Tuning (BMFT), a novel post-processing method that enhances the fairness of a trained model in significantly fewer epochs without requiring access to the original training data. BMFT produces a mask over model parameters, which efficiently identifies the weights contributing the most towards biased predictions. Furthermore, we propose a two-step debiasing strategy, wherein the feature extractor undergoes initial fine-tuning on the identified bias-influenced weights, succeeded by a fine-tuning phase on a reinitialised classification layer to uphold discriminative performance. Extensive experiments across four dermatological datasets and two sensitive attributes demonstrate that BMFT outperforms existing state-of-the-art (SOTA) techniques in both diagnostic accuracy and fairness metrics. Our findings underscore the efficacy and robustness of BMFT in advancing fairness across various out-of-distribution (OOD) settings. Our code is available at: https://github.com/vios-s/BMFT

Updated: 2024-10-01 13:10:40

标题: BMFT：通过基于偏差的权重遮罩微调实现公平性

摘要: 在伦理敏感领域，如医学诊断中，开发具有强大群体公平性属性的模型至关重要。最近在机器学习中实现公平性的方法需要大量的训练数据，并且依赖于模型重新训练，这在实际情况下可能并不实际。为了缓解这些挑战，我们提出了一种名为Bias-based Weight Masking Fine-Tuning (BMFT)的新颖后处理方法，可以在较少的时期内显著增强训练模型的公平性，而无需访问原始训练数据。BMFT会在模型参数上产生一个掩码，有效地识别出对有偏见预测贡献最大的权重。此外，我们提出了一个两步去偏差策略，其中特征提取器在识别到有偏见影响的权重上进行初始微调，然后在重新初始化的分类层上进行微调，以保持辨别性能。通过对四个皮肤科数据集和两个敏感属性的广泛实验表明，BMFT在诊断准确性和公平性指标上胜过现有的最先进技术。我们的研究结果强调了BMFT在促进在各种分布之外 (OOD) 设置中的公平性方面的功效和稳健性。我们的代码可以在以下网址找到：https://github.com/vios-s/BMFT

更新时间: 2024-10-01 13:10:40

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2408.06890v2

LUMA: A Benchmark Dataset for Learning from Uncertain and Multimodal Data

Multimodal Deep Learning enhances decision-making by integrating diverse information sources, such as texts, images, audio, and videos. To develop trustworthy multimodal approaches, it is essential to understand how uncertainty impacts these models. We propose LUMA, a unique benchmark dataset, featuring audio, image, and textual data from 50 classes, for learning from uncertain and multimodal data. It extends the well-known CIFAR 10/100 dataset with audio samples extracted from three audio corpora, and text data generated using the Gemma-7B Large Language Model (LLM). The LUMA dataset enables the controlled injection of varying types and degrees of uncertainty to achieve and tailor specific experiments and benchmarking initiatives. LUMA is also available as a Python package including the functions for generating multiple variants of the dataset with controlling the diversity of the data, the amount of noise for each modality, and adding out-of-distribution samples. A baseline pre-trained model is also provided alongside three uncertainty quantification methods: Monte-Carlo Dropout, Deep Ensemble, and Reliable Conflictive Multi-View Learning. This comprehensive dataset and its benchmarking tools are intended to promote and support the development, evaluation, and benchmarking of trustworthy and robust multimodal deep learning approaches. We anticipate that the LUMA dataset will help the ICLR community to design more trustworthy and robust machine learning approaches for safety critical applications.

Updated: 2024-10-01 13:07:02

标题: LUMA：一份用于学习不确定和多模态数据的基准数据集

摘要: 多模态深度学习通过整合各种信息源（如文本、图像、音频和视频）增强了决策能力。为了开发可信赖的多模态方法，了解不确定性如何影响这些模型至关重要。我们提出了LUMA，一个独特的基准数据集，其中包含来自50个类别的音频、图像和文本数据，用于从不确定和多模态数据中学习。它通过从三个音频语料库提取音频样本和使用Gemma-7B大型语言模型（LLM）生成文本数据，扩展了著名的CIFAR 10/100数据集。LUMA数据集使得可以控制注入不同类型和程度的不确定性，以实现和定制特定的实验和基准倡议。LUMA还作为一个Python包提供，包括用于生成数据集的多个变体的函数，控制数据的多样性、每种模态的噪声量以及添加超出分布范围的样本。此外，还提供了一个基准预训练模型，以及三种不确定性量化方法：蒙特卡罗辍学、深度集合和可靠冲突多视图学习。这个全面的数据集及其基准工具旨在促进和支持可信赖和健壮的多模态深度学习方法的开发、评估和基准化。我们预计，LUMA数据集将帮助ICLR社区设计更可信赖和健壮的机器学习方法，用于安全关键应用。

更新时间: 2024-10-01 13:07:02

领域: cs.LG,cs.AI,cs.CL,cs.CV

下载: http://arxiv.org/abs/2406.09864v2

On the Maximum Distance Sublattice Problem and Closest Vector Problem

In this paper, we introduce the Maximum Distance Sublattice Problem (MDSP). We observed that the problem of solving an instance of the Closest Vector Problem (CVP) in a lattice $\mathcal{L}$ is the same as solving an instance of MDSP in the dual lattice of $\mathcal{L}$. We give an alternate reduction between the CVP and MDSP. This alternate reduction does not use the concept of dual lattice.

Updated: 2024-10-01 13:03:59

标题: 关于最大距离子格问题和最近向量问题

摘要: 在本文中，我们介绍了最大距离子晶格问题（MDSP）。我们观察到，在一个晶格$\mathcal{L}$中解决最近向量问题（CVP）的问题与在$\mathcal{L}$的对偶晶格中解决MDSP的问题是相同的。我们给出了CVP和MDSP之间的另一种减少关系。这种替代性减少不使用对偶晶格的概念。

更新时间: 2024-10-01 13:03:59

领域: cs.CC,cs.CR,cs.DS

下载: http://arxiv.org/abs/1811.03019v2

LASMP: Language Aided Subset Sampling Based Motion Planner

This paper presents the Language Aided Subset Sampling Based Motion Planner (LASMP), a system that helps mobile robots plan their movements by using natural language instructions. LASMP uses a modified version of the Rapidly Exploring Random Tree (RRT) method, which is guided by user-provided commands processed through a language model (RoBERTa). The system improves efficiency by focusing on specific areas of the robot's workspace based on these instructions, making it faster and less resource-intensive. Compared to traditional RRT methods, LASMP reduces the number of nodes needed by 55% and cuts random sample queries by 80%, while still generating safe, collision-free paths. Tested in both simulated and real-world environments, LASMP has shown better performance in handling complex indoor scenarios. The results highlight the potential of combining language processing with motion planning to make robot navigation more efficient.

Updated: 2024-10-01 13:03:15

标题: LASMP：基于语言辅助子集抽样的运动规划器

摘要: 本文介绍了基于语言辅助的子集抽样运动规划器（LASMP），这是一个帮助移动机器人通过使用自然语言指令来规划移动的系统。LASMP使用了经过修改的快速探索随机树（RRT）方法，该方法受用户提供的命令引导，通过语言模型（RoBERTa）进行处理。该系统通过根据这些指令集中关注机器人工作空间的特定区域来提高效率，从而使其更快速、资源消耗更少。与传统的RRT方法相比，LASMP减少了所需节点数量的55%，减少了80%的随机样本查询，同时仍能生成安全、无碰撞的路径。在模拟和真实环境中进行测试，LASMP在处理复杂的室内场景方面表现出更好的性能。结果突显了将语言处理与运动规划相结合以使机器人导航更加高效的潜力。

更新时间: 2024-10-01 13:03:15

领域: cs.RO,cs.AI,cs.HC,cs.LG

下载: http://arxiv.org/abs/2410.00649v1

Exploring Utility in a Real-World Warehouse Optimization Problem: Formulation Based on Quantum Annealers and Preliminary Results

In the current NISQ-era, one of the major challenges faced by researchers and practitioners lies in figuring out how to combine quantum and classical computing in the most efficient and innovative way. In this paper, we present a mechanism coined as Quantum Initialization for Warehouse Optimization Problem that resorts to D-Wave's Quantum Annealer. The module has been specifically designed to be embedded into already existing classical software dedicated to the optimization of a real-world industrial problem. We preliminary tested the implemented mechanism through a two-phase experiment against the classical version of the software.

Updated: 2024-10-01 13:02:24

标题: 在一个真实仓库优化问题中探索效用：基于量子退火器的公式化和初步结果

摘要: 在当前的NISQ时代，研究人员和实践者面临的主要挑战之一是如何以最有效和创新的方式将量子计算和经典计算结合起来。在本文中，我们提出了一种被称为仓库优化问题的量子初始化机制，该机制利用了D-Wave的量子退火器。该模块专门设计用于嵌入到已经存在的经典软件中，以优化一个现实世界的工业问题。我们通过两阶段实验对实施的机制进行了初步测试，与软件的经典版本进行比较。

更新时间: 2024-10-01 13:02:24

领域: cs.ET,cs.AI

下载: http://arxiv.org/abs/2409.09706v2

Backdoor Attacks for LLMs with Weak-To-Strong Knowledge Distillation

Despite being widely applied due to their exceptional capabilities, Large Language Models (LLMs) have been proven to be vulnerable to backdoor attacks. These attacks introduce targeted vulnerabilities into LLMs by poisoning training samples and full-parameter fine-tuning. However, this kind of backdoor attack is limited since they require significant computational resources, especially as the size of LLMs increases. Besides, parameter-efficient fine-tuning (PEFT) offers an alternative but the restricted parameter updating may impede the alignment of triggers with target labels. In this study, we first verify that backdoor attacks with PEFT may encounter challenges in achieving feasible performance. To address these issues and improve the effectiveness of backdoor attacks with PEFT, we propose a novel backdoor attack algorithm from weak to strong based on feature alignment-enhanced knowledge distillation (W2SAttack). Specifically, we poison small-scale language models through full-parameter fine-tuning to serve as the teacher model. The teacher model then covertly transfers the backdoor to the large-scale student model through feature alignment-enhanced knowledge distillation, which employs PEFT. Theoretical analysis reveals that W2SAttack has the potential to augment the effectiveness of backdoor attacks. We demonstrate the superior performance of W2SAttack on classification tasks across four language models, four backdoor attack algorithms, and two different architectures of teacher models. Experimental results indicate success rates close to 100% for backdoor attacks targeting PEFT.

Updated: 2024-10-01 13:01:40

标题: 使用弱到强知识蒸馏的LLM后门攻击

摘要: 尽管由于其出色的能力而被广泛应用，但大型语言模型（LLMs）已被证明容易受到后门攻击的影响。这些攻击通过在LLMs中植入有针对性的漏洞，通过污染训练样本和完整参数微调来实现。然而，这种后门攻击是有限的，因为它们需要大量的计算资源，特别是随着LLMs的规模增加。此外，参数高效微调（PEFT）提供了一种替代方法，但是受限的参数更新可能会阻碍触发器与目标标签的对齐。在这项研究中，我们首先验证了使用PEFT的后门攻击可能在实现可行性表现方面遇到挑战。为了解决这些问题并提高使用PEFT的后门攻击的效果，我们提出了一种基于特征对齐增强知识蒸馏的从弱到强的后门攻击算法（W2SAttack）。具体来说，我们通过完整参数微调对小规模语言模型进行污染，以作为教师模型。教师模型然后通过特征对齐增强知识蒸馏将后门悄悄转移到大规模学生模型中，该方法采用PEFT。理论分析表明，W2SAttack有潜力增强后门攻击的效果。我们展示了W2SAttack在四种语言模型、四种后门攻击算法和两种不同教师模型架构的分类任务中的优越性能。实验结果表明，针对PEFT的后门攻击成功率接近100%。

更新时间: 2024-10-01 13:01:40

领域: cs.CR,cs.AI,cs.CL

下载: http://arxiv.org/abs/2409.17946v2

SDC-HSDD-NDSA: Structure Detecting Cluster by Hierarchical Secondary Directed Differential with Normalized Density and Self-Adaption

Density-based clustering could be the most popular clustering algorithm since it can identify clusters of arbitrary shape as long as they are separated by low-density regions. However, a high-density region that is not separated by low-density ones might also have different structures belonging to multiple clusters. As far as we know, all previous density-based clustering algorithms fail to detect such structures. In this paper, we provide a novel density-based clustering scheme that can not only detect clusters separated by low-density regions but also detect structures in high-density regions not separated by low-density ones. The algorithm employs secondary directed differential, hierarchy, normalized density, as well as the self-adaption coefficient, and thus is called Structure Detecting Cluster by Hierarchical Secondary Directed Differential with Normalized Density and Self-Adaption, dubbed by SDC-HSDD-NDSA. The algorithm is run on several datasets to verify its effectiveness, robustness, as well as granularity independence, and results demonstrate that it has the ability that previous ones do not have. The Python code is on https://github.com/Hao-B-Shu/SDC-HSDD-NDSA.

Updated: 2024-10-01 12:45:01

标题: SDC-HSDD-NDSA：具有标准化密度和自适应性的分层次次级定向差异结构检测聚类

摘要: 基于密度的聚类可能是最受欢迎的聚类算法，因为它可以识别任意形状的簇，只要它们被低密度区域分隔开。然而，一个未被低密度区域分隔的高密度区域可能也具有属于多个簇的不同结构。据我们所知，所有先前的基于密度的聚类算法都无法检测这种结构。在本文中，我们提供了一种新颖的基于密度的聚类方案，不仅可以检测被低密度区域分隔开的簇，还可以检测未被低密度区域分隔开的高密度区域中的结构。该算法采用二次定向差分、层次结构、归一化密度以及自适应系数，因此被称为基于层次二次定向差分及归一化密度和自适应的结构检测聚类算法（SDC-HSDD-NDSA）。该算法在多个数据集上运行，验证了其有效性、鲁棒性以及粒度独立性，并结果表明其具有先前算法所没有的能力。Python 代码可以在 https://github.com/Hao-B-Shu/SDC-HSDD-NDSA 找到。

更新时间: 2024-10-01 12:45:01

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2307.00677v3

On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability

Recent advancements in Large Language Models (LLMs) have showcased their ability to perform complex reasoning tasks, but their effectiveness in planning remains underexplored. In this study, we evaluate the planning capabilities of OpenAI's o1 models across a variety of benchmark tasks, focusing on three key aspects: feasibility, optimality, and generalizability. Through empirical evaluations on constraint-heavy tasks (e.g., $\textit{Barman}$, $\textit{Tyreworld}$) and spatially complex environments (e.g., $\textit{Termes}$, $\textit{Floortile}$), we highlight o1-preview's strengths in self-evaluation and constraint-following, while also identifying bottlenecks in decision-making and memory management, particularly in tasks requiring robust spatial reasoning. Our results reveal that o1-preview outperforms GPT-4 in adhering to task constraints and managing state transitions in structured environments. However, the model often generates suboptimal solutions with redundant actions and struggles to generalize effectively in spatially complex tasks. This pilot study provides foundational insights into the planning limitations of LLMs, offering key directions for future research on improving memory management, decision-making, and generalization in LLM-based planning.

Updated: 2024-10-01 12:43:09

标题: 关于OpenAI的o1模型的规划能力：可行性、最优性和泛化性

摘要: 最近大型语言模型（LLMs）的最新进展展示了它们在执行复杂推理任务方面的能力，但它们在规划方面的有效性仍未得到充分探讨。在本研究中，我们评估了OpenAI的o1模型在各种基准任务中的规划能力，重点关注三个关键方面：可行性、最优性和泛化能力。通过对约束重的任务（例如$\textit{Barman}$、$\textit{Tyreworld}$）和空间复杂环境（例如$\textit{Termes}$、$\textit{Floortile}$）的实证评估，我们突出了o1-preview在自我评估和遵循约束方面的优势，同时也确定了在需要强大空间推理的任务中决策和内存管理的瓶颈。我们的结果显示，o1-preview在遵守任务约束和在结构化环境中管理状态转换方面优于GPT-4。然而，该模型经常生成具有冗余动作的次优解，并在空间复杂任务中难以有效泛化。这项初步研究为LLMs的规划限制提供了基础见解，为未来在改进基于LLM的规划中的内存管理、决策制定和泛化方面的研究提供了关键方向。

更新时间: 2024-10-01 12:43:09

领域: cs.AI,cs.LG,cs.RO

下载: http://arxiv.org/abs/2409.19924v2

Model-independent variable selection via the rule-based variable priority

While achieving high prediction accuracy is a fundamental goal in machine learning, an equally important task is finding a small number of features with high explanatory power. One popular selection technique is permutation importance, which assesses a variable's impact by measuring the change in prediction error after permuting the variable. However, this can be problematic due to the need to create artificial data, a problem shared by other methods as well. Another problem is that variable selection methods can be limited by being model-specific. We introduce a new model-independent approach, Variable Priority (VarPro), which works by utilizing rules without the need to generate artificial data or evaluate prediction error. The method is relatively easy to use, requiring only the calculation of sample averages of simple statistics, and can be applied to many data settings, including regression, classification, and survival. We investigate the asymptotic properties of VarPro and show, among other things, that VarPro has a consistent filtering property for noise variables. Empirical studies using synthetic and real-world data show the method achieves a balanced performance and compares favorably to many state-of-the-art procedures currently used for variable selection.

Updated: 2024-10-01 12:42:24

标题: 基于规则的变量优先级模型无关变量选择

摘要: 在机器学习中，高预测准确性是一个基本目标，同样重要的任务是找到具有高解释力的少数特征。一种流行的选择技术是排列重要性，通过在对变量进行置换后测量预测误差的变化来评估变量的影响。然而，这可能存在问题，因为需要创建人工数据，其他方法也存在这个问题。另一个问题是变量选择方法可能受限于特定模型。我们引入了一种新的模型无关方法，Variable Priority（VarPro），通过利用规则而无需生成人工数据或评估预测误差来工作。该方法相对易于使用，只需要计算简单统计量的样本平均值，并可应用于许多数据设置，包括回归、分类和生存。我们研究了VarPro的渐近性质，并展示了VarPro对噪声变量具有一致的过滤特性等结果。使用合成和真实世界数据的实证研究表明，该方法实现了平衡的性能，并与目前用于变量选择的许多最先进的程序相比表现优越。

更新时间: 2024-10-01 12:42:24

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2409.09003v3

Statistical signatures of abstraction in deep neural networks

We study how abstract representations emerge in a Deep Belief Network (DBN) trained on benchmark datasets. Our analysis targets the principles of learning in the early stages of information processing, starting from the "primordial soup" of the under-sampling regime. As the data is processed by deeper and deeper layers, features are detected and removed, transferring more and more "context-invariant" information to deeper layers. We show that the representation approaches an universal model -- the Hierarchical Feature Model (HFM) -- determined by the principle of maximal relevance. Relevance quantifies the uncertainty on the model of the data, thus suggesting that "meaning" -- i.e. syntactic information -- is that part of the data which is not yet captured by a model. Our analysis shows that shallow layers are well described by pairwise Ising models, which provide a representation of the data in terms of generic, low order features. We also show that plasticity increases with depth, in a similar way as it does in the brain. These findings suggest that DBNs are capable of extracting a hierarchy of features from the data which is consistent with the principle of maximal relevance.

Updated: 2024-10-01 12:39:15

标题: 深度神经网络中抽象化的统计特征

摘要: 我们研究了在基准数据集上训练的深度信念网络（DBN）中抽象表示是如何产生的。我们的分析着眼于信息处理早期阶段的学习原则，从欠采样制度的“原始汤”开始。随着数据被更深层次处理，特征被检测并移除，将更多“上下文不变”的信息传递给更深层次。我们展示了表示逼近了一种通用模型——分层特征模型（HFM），由最大相关性原则确定。相关性量化了数据模型的不确定性，因此表明“意义”——即句法信息——是尚未被模型捕捉的数据部分。我们的分析表明，浅层由成对的伊辛模型描述得很好，这些模型以通用的、低阶特征来表示数据。我们还展示了可塑性随深度增加而增加，类似于大脑的情况。这些发现表明，DBN能够从数据中提取一层层特征的层次结构，与最大相关性原则一致。

更新时间: 2024-10-01 12:39:15

领域: cs.LG,cond-mat.dis-nn,physics.data-an,stat.ML

下载: http://arxiv.org/abs/2407.01656v2

Universal Vulnerabilities in Large Language Models: Backdoor Attacks for In-context Learning

In-context learning, a paradigm bridging the gap between pre-training and fine-tuning, has demonstrated high efficacy in several NLP tasks, especially in few-shot settings. Despite being widely applied, in-context learning is vulnerable to malicious attacks. In this work, we raise security concerns regarding this paradigm. Our studies demonstrate that an attacker can manipulate the behavior of large language models by poisoning the demonstration context, without the need for fine-tuning the model. Specifically, we design a new backdoor attack method, named ICLAttack, to target large language models based on in-context learning. Our method encompasses two types of attacks: poisoning demonstration examples and poisoning demonstration prompts, which can make models behave in alignment with predefined intentions. ICLAttack does not require additional fine-tuning to implant a backdoor, thus preserving the model's generality. Furthermore, the poisoned examples are correctly labeled, enhancing the natural stealth of our attack method. Extensive experimental results across several language models, ranging in size from 1.3B to 180B parameters, demonstrate the effectiveness of our attack method, exemplified by a high average attack success rate of 95.0% across the three datasets on OPT models.

Updated: 2024-10-01 12:38:03

标题: 大型语言模型的通用漏洞：针对上下文学习的后门攻击

摘要: 在上下文学习中，这一范式弥合了预训练和微调之间的差距，在几个NLP任务中表现出高效性，特别是在少样本设置中。尽管被广泛应用，但上下文学习容易受到恶意攻击。在本研究中，我们提出了关于这一范式的安全问题。我们的研究表明，攻击者可以通过操纵演示上下文来影响大型语言模型的行为，而无需微调模型。具体来说，我们设计了一种新的后门攻击方法，名为ICLAttack，针对基于上下文学习的大型语言模型。我们的方法包括两种攻击类型：毒化演示示例和毒化演示提示，这可以使模型按照预定义的意图行事。ICLAttack不需要额外的微调来植入后门，从而保留了模型的普适性。此外，毒化的示例被正确标记，增强了我们攻击方法的自然隐蔽性。跨越了几个语言模型的广泛实验结果，这些模型的参数规模从13亿到180亿不等，展示了我们攻击方法的有效性，以OPT模型上三个数据集的高平均攻击成功率为例，达到了95.0%。

更新时间: 2024-10-01 12:38:03

领域: cs.CL,cs.AI,cs.CR

下载: http://arxiv.org/abs/2401.05949v5

Measuring Orthogonality in Representations of Generative Models

In unsupervised representation learning, models aim to distill essential features from high-dimensional data into lower-dimensional learned representations, guided by inductive biases. Understanding the characteristics that make a good representation remains a topic of ongoing research. Disentanglement of independent generative processes has long been credited with producing high-quality representations. However, focusing solely on representations that adhere to the stringent requirements of most disentanglement metrics, may result in overlooking many high-quality representations, well suited for various downstream tasks. These metrics often demand that generative factors be encoded in distinct, single dimensions aligned with the canonical basis of the representation space. Motivated by these observations, we propose two novel metrics: Importance-Weighted Orthogonality (IWO) and Importance-Weighted Rank (IWR). These metrics evaluate the mutual orthogonality and rank of generative factor subspaces. Throughout extensive experiments on common downstream tasks, over several benchmark datasets and models, IWO and IWR consistently show stronger correlations with downstream task performance than traditional disentanglement metrics. Our findings suggest that representation quality is closer related to the orthogonality of independent generative processes rather than their disentanglement, offering a new direction for evaluating and improving unsupervised learning models.

Updated: 2024-10-01 12:26:24

标题: 衡量生成模型表示中的正交性

摘要: 在无监督表示学习中，模型旨在通过感性偏差，将高维数据中的关键特征提炼成较低维度的学习表示。理解良好表示的特征仍然是一个持续研究的话题。解开独立生成过程的纷乱因素长期以来被认为能产生高质量的表示。然而，仅关注符合大多数解缠度指标严格要求的表示，可能会忽视许多适合各种下游任务的高质量表示。这些指标经常要求生成因素被编码在与表示空间的规范基对齐的不同的单个维度中。在这些观察的基础上，我们提出了两个新颖的指标：重要性加权正交性（IWO）和重要性加权秩（IWR）。这些指标评估了生成因子子空间的互相正交性和秩。通过对常见下游任务、多个基准数据集和模型进行广泛实验，IWO和IWR一贯展现出比传统解缠度指标更强的与下游任务性能相关性。我们的研究结果表明，表示质量更接近于独立生成过程的正交性，而不是它们的解缠度，为评估和改进无监督学习模型提供了新的方向。

更新时间: 2024-10-01 12:26:24

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2407.03728v2

Cafca: High-quality Novel View Synthesis of Expressive Faces from Casual Few-shot Captures

Volumetric modeling and neural radiance field representations have revolutionized 3D face capture and photorealistic novel view synthesis. However, these methods often require hundreds of multi-view input images and are thus inapplicable to cases with less than a handful of inputs. We present a novel volumetric prior on human faces that allows for high-fidelity expressive face modeling from as few as three input views captured in the wild. Our key insight is that an implicit prior trained on synthetic data alone can generalize to extremely challenging real-world identities and expressions and render novel views with fine idiosyncratic details like wrinkles and eyelashes. We leverage a 3D Morphable Face Model to synthesize a large training set, rendering each identity with different expressions, hair, clothing, and other assets. We then train a conditional Neural Radiance Field prior on this synthetic dataset and, at inference time, fine-tune the model on a very sparse set of real images of a single subject. On average, the fine-tuning requires only three inputs to cross the synthetic-to-real domain gap. The resulting personalized 3D model reconstructs strong idiosyncratic facial expressions and outperforms the state-of-the-art in high-quality novel view synthesis of faces from sparse inputs in terms of perceptual and photo-metric quality.

Updated: 2024-10-01 12:24:50

标题: Cafca：来自偶然少拍摄的表情面孔的高质量新视图合成

摘要: 体积建模和神经辐射场表示已经彻底改变了3D人脸捕捉和逼真的新视角合成。然而，这些方法通常需要数百张多视角输入图像，因此无法应用于少量输入的情况。我们提出了一种新颖的人脸体积先验，允许从野外捕捉的仅三个输入视角实现高保真度的表现力人脸建模。我们的关键见解是，仅在合成数据上训练的隐式先验可以推广到极具挑战性的真实世界身份和表情，并呈现出细微的特征细节，如皱纹和睫毛。我们利用3D可塑人脸模型合成一个大型训练集，为每个身份呈现不同的表情、发型、服装和其他资产。然后，在这个合成数据集上训练一个条件神经辐射场先验，并在推断时间，对单个主题的非常稀疏的真实图像集进行微调。平均而言，微调只需要三个输入来跨越从合成到真实的领域差距。结果，个性化的3D模型重建了强烈的个性化面部表情，并在感知和光度质量方面优于高质量新视角合成中的最新技术。

更新时间: 2024-10-01 12:24:50

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2410.00630v1

BlendScape: Enabling End-User Customization of Video-Conferencing Environments through Generative AI

Today's video-conferencing tools support a rich range of professional and social activities, but their generic meeting environments cannot be dynamically adapted to align with distributed collaborators' needs. To enable end-user customization, we developed BlendScape, a rendering and composition system for video-conferencing participants to tailor environments to their meeting context by leveraging AI image generation techniques. BlendScape supports flexible representations of task spaces by blending users' physical or digital backgrounds into unified environments and implements multimodal interaction techniques to steer the generation. Through an exploratory study with 15 end-users, we investigated whether and how they would find value in using generative AI to customize video-conferencing environments. Participants envisioned using a system like BlendScape to facilitate collaborative activities in the future, but required further controls to mitigate distracting or unrealistic visual elements. We implemented scenarios to demonstrate BlendScape's expressiveness for supporting environment design strategies from prior work and propose composition techniques to improve the quality of environments.

Updated: 2024-10-01 12:07:57

标题: 混合景观：通过生成人工智能实现视频会议环境的最终用户定制

摘要: 当今的视频会议工具支持丰富的专业和社交活动，但它们的通用会议环境无法动态地适应分布式合作者的需求。为了实现最终用户定制，我们开发了BlendScape，这是一个渲染和合成系统，供视频会议参与者利用AI图像生成技术调整环境以适应他们的会议背景。BlendScape通过将用户的物理或数字背景融合到统一环境中，支持任务空间的灵活表示，并实施多模态交互技术来引导生成过程。通过与15名最终用户的探索性研究，我们调查了他们是否以及如何在使用生成式AI来定制视频会议环境中发现价值。参与者设想使用像BlendScape这样的系统来促进未来的协作活动，但需要进一步的控制来减轻分散注意力或不切实际的视觉元素。我们实施了场景来展示BlendScape支持环境设计策略的表现力，并提出合成技术来改善环境的质量。

更新时间: 2024-10-01 12:07:57

领域: cs.HC,cs.AI

下载: http://arxiv.org/abs/2403.13947v2

Famba-V: Fast Vision Mamba with Cross-Layer Token Fusion

Mamba and Vision Mamba (Vim) models have shown their potential as an alternative to methods based on Transformer architecture. This work introduces Fast Mamba for Vision (Famba-V), a cross-layer token fusion technique to enhance the training efficiency of Vim models. The key idea of Famba-V is to identify and fuse similar tokens across different Vim layers based on a suit of cross-layer strategies instead of simply applying token fusion uniformly across all the layers that existing works propose. We evaluate the performance of Famba-V on CIFAR-100. Our results show that Famba-V is able to enhance the training efficiency of Vim models by reducing both training time and peak memory usage during training. Moreover, the proposed cross-layer strategies allow Famba-V to deliver superior accuracy-efficiency trade-offs. These results all together demonstrate Famba-V as a promising efficiency enhancement technique for Vim models.

Updated: 2024-10-01 12:03:49

标题: Famba-V: 基于跨层令牌融合的快速视觉曼巴

摘要: Mamba 和 Vision Mamba (Vim) 模型已经展示出作为基于 Transformer 架构的方法的替代潜力。本文介绍了快速 Mamba for Vision (Famba-V)，一种跨层令牌融合技术，以增强 Vim 模型的训练效率。Famba-V 的关键思想是基于一套跨层策略识别和融合不同 Vim 层中相似的令牌，而不是简单地在所有层中均匀应用令牌融合，这是现有工作提出的。我们在 CIFAR-100 上评估了 Famba-V 的性能。我们的结果显示，Famba-V 能够通过减少训练时间和训练期间的峰值内存使用量来增强 Vim 模型的训练效率。此外，所提出的跨层策略使 Famba-V 能够提供更优越的准确性和效率权衡。所有这些结果共同证明，Famba-V 是一种有前途的 Vim 模型效率增强技术。

更新时间: 2024-10-01 12:03:49

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2409.09808v2

Is Tokenization Needed for Masked Particle Modelling?

In this work, we significantly enhance masked particle modeling (MPM), a self-supervised learning scheme for constructing highly expressive representations of unordered sets relevant to developing foundation models for high-energy physics. In MPM, a model is trained to recover the missing elements of a set, a learning objective that requires no labels and can be applied directly to experimental data. We achieve significant performance improvements over previous work on MPM by addressing inefficiencies in the implementation and incorporating a more powerful decoder. We compare several pre-training tasks and introduce new reconstruction methods that utilize conditional generative models without data tokenization or discretization. We show that these new methods outperform the tokenized learning objective from the original MPM on a new test bed for foundation models for jets, which includes using a wide variety of downstream tasks relevant to jet physics, such as classification, secondary vertex finding, and track identification.

Updated: 2024-10-01 11:40:11

标题: 需要为掩模粒子建模进行标记化吗？

摘要: 在这项工作中，我们显著增强了掩模粒子建模（MPM），这是一种自监督学习方案，用于构建与高能物理基础模型相关的无序集合的高度表达式表示。在MPM中，模型被训练来恢复集合中缺失的元素，这是一个不需要标签并且可以直接应用于实验数据的学习目标。通过解决实现中的低效率问题并整合更强大的解码器，我们在MPM的先前工作基础上取得了显著的性能改进。我们比较了几个预训练任务，并引入了利用条件生成模型的新重建方法，而不需要数据标记化或离散化。我们展示了这些新方法在关于喷流基础模型的新测试平台上优于原始MPM的标记化学习目标，该平台包括使用与喷流物理相关的各种下游任务，如分类、次级顶点查找和轨道识别。

更新时间: 2024-10-01 11:40:11

领域: hep-ph,cs.LG

下载: http://arxiv.org/abs/2409.12589v2

Towards Symbolic XAI -- Explanation Through Human Understandable Logical Relationships Between Features

Explainable Artificial Intelligence (XAI) plays a crucial role in fostering transparency and trust in AI systems, where traditional XAI approaches typically offer one level of abstraction for explanations, often in the form of heatmaps highlighting single or multiple input features. However, we ask whether abstract reasoning or problem-solving strategies of a model may also be relevant, as these align more closely with how humans approach solutions to problems. We propose a framework, called Symbolic XAI, that attributes relevance to symbolic queries expressing logical relationships between input features, thereby capturing the abstract reasoning behind a model's predictions. The methodology is built upon a simple yet general multi-order decomposition of model predictions. This decomposition can be specified using higher-order propagation-based relevance methods, such as GNN-LRP, or perturbation-based explanation methods commonly used in XAI. The effectiveness of our framework is demonstrated in the domains of natural language processing (NLP), vision, and quantum chemistry (QC), where abstract symbolic domain knowledge is abundant and of significant interest to users. The Symbolic XAI framework provides an understanding of the model's decision-making process that is both flexible for customization by the user and human-readable through logical formulas.

Updated: 2024-10-01 11:35:49

标题: 朝向符号XAI - 通过特征之间人类可理解的逻辑关系解释

摘要: 可解释人工智能（XAI）在促进人工智能系统的透明度和信任方面发挥着至关重要的作用，传统的XAI方法通常提供解释的一个抽象层次，通常以突出单个或多个输入特征的热图的形式呈现。然而，我们是否可以询问模型的抽象推理或问题解决策略是否也相关，因为这些更符合人类解决问题的方式。我们提出了一种称为符号XAI的框架，该框架将属性归因于表达输入特征之间逻辑关系的符号查询，从而捕捉模型预测背后的抽象推理。该方法建立在模型预测的简单但通用的多阶分解之上。该分解可以使用基于高阶传播的相关性方法（如GNN-LRP）或常用于XAI的扰动解释方法来指定。我们的框架在自然语言处理（NLP）、视觉和量子化学（QC）领域展示了其有效性，其中丰富的抽象符号领域知识对用户具有重要意义。符号XAI框架提供了对模型决策过程的理解，既可由用户自定义，又可通过逻辑公式以人类可读的方式呈现。

更新时间: 2024-10-01 11:35:49

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2408.17198v2

FLeNS: Federated Learning with Enhanced Nesterov-Newton Sketch

Federated learning faces a critical challenge in balancing communication efficiency with rapid convergence, especially for second-order methods. While Newton-type algorithms achieve linear convergence in communication rounds, transmitting full Hessian matrices is often impractical due to quadratic complexity. We introduce Federated Learning with Enhanced Nesterov-Newton Sketch (FLeNS), a novel method that harnesses both the acceleration capabilities of Nesterov's method and the dimensionality reduction benefits of Hessian sketching. FLeNS approximates the centralized Newton's method without relying on the exact Hessian, significantly reducing communication overhead. By combining Nesterov's acceleration with adaptive Hessian sketching, FLeNS preserves crucial second-order information while preserving the rapid convergence characteristics. Our theoretical analysis, grounded in statistical learning, demonstrates that FLeNS achieves super-linear convergence rates in communication rounds - a notable advancement in federated optimization. We provide rigorous convergence guarantees and characterize tradeoffs between acceleration, sketch size, and convergence speed. Extensive empirical evaluation validates our theoretical findings, showcasing FLeNS's state-of-the-art performance with reduced communication requirements, particularly in privacy-sensitive and edge-computing scenarios. The code is available at https://github.com/sunnyinAI/FLeNS

Updated: 2024-10-01 11:20:53

标题: FLeNS：具有增强Nesterov-Newton Sketch的联邦学习

摘要: 联邦学习在平衡通信效率和快速收敛方面面临着一个关键挑战，特别是对于二阶方法。虽然牛顿型算法在通信轮数上实现了线性收敛，但由于二次复杂度，传输完整的Hessian矩阵通常是不切实际的。我们引入了具有增强Nesterov-Newton Sketch（FLeNS）的联邦学习方法，这是一种利用Nesterov方法的加速能力和Hessian sketching的降维优势的新方法。FLeNS在不依赖于精确Hessian的情况下近似中心化Newton方法，显著减少了通信开销。通过将Nesterov的加速与自适应Hessian sketching结合起来，FLeNS保留了关键的二阶信息，同时保持了快速收敛特性。我们的理论分析基于统计学习，证明了FLeNS在通信轮数上实现了超线性收敛速度-这是在联邦优化中的一个显著进步。我们提供了严格的收敛保证，并表征了加速、图形大小和收敛速度之间的权衡。广泛的实证评估验证了我们的理论发现，展示了FLeNS在减少通信需求方面的最新性能，特别是在隐私敏感和边缘计算场景中。代码可在https://github.com/sunnyinAI/FLeNS 上找到。

更新时间: 2024-10-01 11:20:53

领域: cs.LG,cs.CV,cs.DC,math.OC,I.2.6; C.1.4; D.1.3; I.5.1; H.3.4

下载: http://arxiv.org/abs/2409.15216v2

GERA: Geometric Embedding for Efficient Point Registration Analysis

Point cloud registration aims to provide estimated transformations to align point clouds, which plays a crucial role in pose estimation of various navigation systems, such as surgical guidance systems and autonomous vehicles. Despite the impressive performance of recent models on benchmark datasets, many rely on complex modules like KPConv and Transformers, which impose significant computational and memory demands. These requirements hinder their practical application, particularly in resource-constrained environments such as mobile robotics. In this paper, we propose a novel point cloud registration network that leverages a pure MLP architecture, constructing geometric information offline. This approach eliminates the computational and memory burdens associated with traditional complex feature extractors and significantly reduces inference time and resource consumption. Our method is the first to replace 3D coordinate inputs with offline-constructed geometric encoding, improving generalization and stability, as demonstrated by Maximum Mean Discrepancy (MMD) comparisons. This efficient and accurate geometric representation marks a significant advancement in point cloud analysis, particularly for applications requiring fast and reliability.

Updated: 2024-10-01 11:19:56

标题: GERA：几何嵌入用于高效的点配准分析

摘要: 点云配准旨在提供估计的变换以对齐点云，这在各种导航系统的姿势估计中扮演着至关重要的角色，例如外科引导系统和自主车辆。尽管最近模型在基准数据集上表现出色，但许多模型仰赖复杂的模块，如KPConv和Transformers，这些模块对计算和内存需求较高。这些要求阻碍了它们在实际应用中的应用，特别是在资源受限的环境中，如移动机器人技术。在本文中，我们提出了一种新颖的点云配准网络，利用纯MLP架构，离线构建几何信息。这种方法消除了与传统复杂特征提取器相关的计算和内存负担，并显著减少了推理时间和资源消耗。我们的方法是首个将3D坐标输入替换为离线构建的几何编码的方法，通过最大均值差异（MMD）比较证明了其改善了泛化性和稳定性。这种高效和准确的几何表示标志着点云分析的重要进展，特别适用于需要快速和可靠性的应用。

更新时间: 2024-10-01 11:19:56

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2410.00589v1

CompassDock: Comprehensive Accurate Assessment Approach for Deep Learning-Based Molecular Docking in Inference and Fine-Tuning

Datasets used for molecular docking, such as PDBBind, contain technical variability - they are noisy. Although the origins of the noise have been discussed, a comprehensive analysis of the physical, chemical, and bioactivity characteristics of the datasets is still lacking. To address this gap, we introduce the Comprehensive Accurate Assessment (Compass). Compass integrates two key components: PoseCheck, which examines ligand strain energy, protein-ligand steric clashes, and interactions, and AA-Score, a new empirical scoring function for calculating binding affinity energy. Together, these form a unified workflow that assesses both the physical/chemical properties and bioactivity favorability of ligands and protein-ligand interactions. Our analysis of the PDBBind dataset using Compass reveals substantial noise in the ground truth data. Additionally, we propose CompassDock, which incorporates the Compass module with DiffDock, the state-of-the-art deep learning-based molecular docking method, to enable accurate assessment of docked ligands during inference. Finally, we present a new paradigm for enhancing molecular docking model performance by fine-tuning with Compass Scores, which encompass binding affinity energy, strain energy, and the number of steric clashes identified by Compass. Our results show that, while fine-tuning without Compass improves the percentage of docked poses with RMSD < 2{\AA}, it leads to a decrease in physical/chemical and bioactivity favorability. In contrast, fine-tuning with Compass shows a limited improvement in RMSD < 2{\AA} but enhances the physical/chemical and bioactivity favorability of the ligand conformation. The source code is available publicly at https://github.com/BIMSBbioinfo/CompassDock.

Updated: 2024-10-01 11:14:40

标题: CompassDock: 基于深度学习的分子对接推断和微调的全面准确评估方法

摘要: 用于分子对接的数据集，例如PDBBind，包含技术变异性 - 它们存在噪音。尽管噪音的来源已经讨论过，但对数据集的物理、化学和生物活性特性的全面分析仍然缺乏。为了填补这一空白，我们引入了Comprehensive Accurate Assessment（Compass）。Compass集成了两个关键组件：PoseCheck，用于检查配体应变能、蛋白质-配体的立体位阻和相互作用，以及AA-Score，一种用于计算结合亲和能量的新的经验评分函数。这两者共同形成了一个统一的工作流程，评估了配体和蛋白质-配体相互作用的物理/化学特性和生物活性偏好。我们使用Compass对PDBBind数据集进行的分析揭示了地面真实数据中存在大量噪音。此外，我们提出了CompassDock，它将Compass模块与DiffDock结合，后者是基于最先进的深度学习的分子对接方法，以便在推理过程中准确评估对接的配体。最后，我们提出了一种通过使用Compass Scores进行微调来提高分子对接模型性能的新范式，这些分数包括结合亲和能量、应变能以及由Compass识别的立体位阻的数量。我们的结果表明，虽然没有使用Compass进行微调可以提高RMSD < 2Å的对接姿态的百分比，但会导致物理/化学和生物活性偏好的降低。相反，使用Compass进行微调虽然对RMSD < 2Å的改进有限，但增强了配体构象的物理/化学和生物活性偏好。源代码可以公开访问https://github.com/BIMSBbioinfo/CompassDock。

更新时间: 2024-10-01 11:14:40

领域: cs.LG,q-bio.QM

下载: http://arxiv.org/abs/2406.06841v2

A Generative Approach to Control Complex Physical Systems

Controlling the evolution of complex physical systems is a fundamental task across science and engineering. Classical techniques suffer from limited applicability or huge computational costs. On the other hand, recent deep learning and reinforcement learning-based approaches often struggle to optimize long-term control sequences under the constraints of system dynamics. In this work, we introduce Diffusion Physical systems Control (DiffPhyCon), a new class of method to address the physical systems control problem. DiffPhyCon excels by simultaneously minimizing both the learned generative energy function and the predefined control objectives across the entire trajectory and control sequence. Thus, it can explore globally and plan near-optimal control sequences. Moreover, we enhance DiffPhyCon with prior reweighting, enabling the discovery of control sequences that significantly deviate from the training distribution. We test our method on three tasks: 1D Burgers' equation, 2D jellyfish movement control, and 2D high-dimensional smoke control, where our generated jellyfish dataset is released as a benchmark for complex physical system control research. Our method outperforms widely applied classical approaches and state-of-the-art deep learning and reinforcement learning methods. Notably, DiffPhyCon unveils an intriguing fast-close-slow-open pattern observed in the jellyfish, aligning with established findings in the field of fluid dynamics. The project website, jellyfish dataset, and code can be found at https://github.com/AI4Science-WestlakeU/diffphycon.

Updated: 2024-10-01 11:10:59

标题: 控制复杂物理系统的生成式方法

摘要: 控制复杂物理系统的演变是跨科学和工程领域的基本任务。传统技术存在适用性有限或计算成本巨大的问题。另一方面，最近基于深度学习和强化学习的方法通常难以在系统动力学约束下优化长期控制序列。在这项工作中，我们介绍了扩散物理系统控制（DiffPhyCon），这是一种解决物理系统控制问题的新方法。DiffPhyCon通过同时最小化学习生成能量函数和整个轨迹和控制序列上的预定义控制目标，表现出色。因此，它可以全局探索并规划接近最优的控制序列。此外，我们通过先验重新加权增强了DiffPhyCon，使其能够发现与训练分布显著偏离的控制序列。我们在三个任务上测试了我们的方法：1D Burgers方程，2D海蜇运动控制和2D高维烟雾控制，我们生成的海蜇数据集被发布为复杂物理系统控制研究的基准。我们的方法胜过广泛应用的传统方法和最先进的深度学习和强化学习方法。值得注意的是，DiffPhyCon揭示了海蜇中观察到的一个有趣的快速关闭-缓慢打开模式，与流体动力学领域的已知发现相一致。项目网站、海蜇数据集和代码可以在https://github.com/AI4Science-WestlakeU/diffphycon找到。

更新时间: 2024-10-01 11:10:59

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2407.06494v2

Enhancing Image Classification in Small and Unbalanced Datasets through Synthetic Data Augmentation

Accurate and robust medical image classification is a challenging task, especially in application domains where available annotated datasets are small and present high imbalance between target classes. Considering that data acquisition is not always feasible, especially for underrepresented classes, our approach introduces a novel synthetic augmentation strategy using class-specific Variational Autoencoders (VAEs) and latent space interpolation to improve discrimination capabilities. By generating realistic, varied synthetic data that fills feature space gaps, we address issues of data scarcity and class imbalance. The method presented in this paper relies on the interpolation of latent representations within each class, thus enriching the training set and improving the model's generalizability and diagnostic accuracy. The proposed strategy was tested in a small dataset of 321 images created to train and validate an automatic method for assessing the quality of cleanliness of esophagogastroduodenoscopy images. By combining real and synthetic data, an increase of over 18\% in the accuracy of the most challenging underrepresented class was observed. The proposed strategy not only benefited the underrepresented class but also led to a general improvement in other metrics, including a 6\% increase in global accuracy and precision.

Updated: 2024-10-01 11:08:24

标题: 通过合成数据增强在小型和不平衡数据集中增强图像分类

摘要: 准确且稳健的医学图像分类是一项具有挑战性的任务，特别是在可用的注释数据集较小且目标类之间存在高度不平衡的应用领域。考虑到数据获取并非总是可行，特别是对于少数类别来说，我们的方法引入了一种新颖的合成增强策略，利用特定类别的变分自动编码器（VAEs）和潜在空间插值来提高判别能力。通过生成真实且多样化的合成数据来填补特征空间中的空缺，我们解决了数据稀缺和类别不平衡的问题。本文提出的方法依赖于在每个类别内插入潜在表示，从而丰富训练集并提高模型的泛化能力和诊断准确性。所提出的策略在一个小型数据集上进行了测试，该数据集包含321张图像，用于训练和验证一种用于评估食管胃十二指肠镜图像清洁度的自动方法。通过结合真实数据和合成数据，观察到最具挑战性的少数类别准确度增加了超过18\%。该策略不仅造福于少数类别，还导致其他指标的整体改善，包括全局准确度和精确度增加了6\%。

更新时间: 2024-10-01 11:08:24

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2409.10286v2

Interactive Explainable Anomaly Detection for Industrial Settings

Being able to recognise defects in industrial objects is a key element of quality assurance in production lines. Our research focuses on visual anomaly detection in RGB images. Although Convolutional Neural Networks (CNNs) achieve high accuracies in this task, end users in industrial environments receive the model's decisions without additional explanations. Therefore, it is of interest to enrich the model's outputs with further explanations to increase confidence in the model and speed up anomaly detection. In our work, we focus on (1) CNN-based classification models and (2) the further development of a model-agnostic explanation algorithm for black-box classifiers. Additionally, (3) we demonstrate how we can establish an interactive interface that allows users to further correct the model's output. We present our NearCAIPI Interaction Framework, which improves AI through user interaction, and show how this approach increases the system's trustworthiness. We also illustrate how NearCAIPI can integrate human feedback into an interactive process chain.

Updated: 2024-10-01 11:06:38

标题: 工业环境下的交互式可解释异常检测

摘要: 能够识别工业物体中的缺陷是生产线质量保证的关键要素。我们的研究重点是在RGB图像中进行视觉异常检测。尽管卷积神经网络（CNNs）在这一任务中取得了很高的准确性，但工业环境中的最终用户接收到模型的决策时并未附带额外的解释。因此，有必要丰富模型的输出以增加对模型的信心并加速异常检测。在我们的工作中，我们专注于（1）基于CNN的分类模型和（2）为黑盒分类器进一步开发模型不可知的解释算法。此外，（3）我们展示了如何建立一个交互界面，允许用户进一步纠正模型的输出。我们提出了我们的NearCAIPI交互框架，通过用户交互改进人工智能，并展示了这种方法如何增加系统的可信度。我们还阐明了NearCAIPI如何将人类反馈集成到交互式流程链中。

更新时间: 2024-10-01 11:06:38

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2410.12817v1

Scalable Data Assimilation with Message Passing

Data assimilation is a core component of numerical weather prediction systems. The large quantity of data processed during assimilation requires the computation to be distributed across increasingly many compute nodes, yet existing approaches suffer from synchronisation overhead in this setting. In this paper, we exploit the formulation of data assimilation as a Bayesian inference problem and apply a message-passing algorithm to solve the spatial inference problem. Since message passing is inherently based on local computations, this approach lends itself to parallel and distributed computation. In combination with a GPU-accelerated implementation, we can scale the algorithm to very large grid sizes while retaining good accuracy and compute and memory requirements.

Updated: 2024-10-01 11:01:37

标题: 使用消息传递实现可扩展的数据同化

摘要: 数据同化是数值天气预报系统的核心组成部分。在同化过程中处理的大量数据需要计算分布在越来越多的计算节点上，然而现有的方法在这种情况下存在同步开销。在本文中，我们利用数据同化作为贝叶斯推断问题的表述，并应用消息传递算法来解决空间推断问题。由于消息传递本质上是基于局部计算的，这种方法适用于并行和分布式计算。结合GPU加速实现，我们可以将算法扩展到非常大的网格尺寸，同时保持良好的准确性和计算及内存需求。

更新时间: 2024-10-01 11:01:37

领域: cs.LG,cs.DC,stat.AP

下载: http://arxiv.org/abs/2404.12968v2

Cheap Talking Algorithms

We simulate behaviour of two independent reinforcement learning algorithms playing the Crawford and Sobel (1982) game of strategic information transmission. We adopt memoryless algorithms to capture learning in a static game where a large population interacts anonymously. We show that sender and receiver converge to Nash equilibrium play. The level of informativeness of the sender's cheap talk decreases as the bias increases and, at intermediate level of the bias, it matches the level predicted by the Pareto optimal equilibrium or by the second best one. Conclusions are robust to alternative specifications of the learning hyperparameters and of the game.

Updated: 2024-10-01 10:46:26

标题: 廉价的对话算法

摘要: 我们模拟了两个独立的强化学习算法在克劳福德和索贝尔（1982）的战略信息传递游戏中的行为。我们采用无记忆算法来捕捉在一个大型人口匿名交互的静态游戏中的学习过程。我们展示了发送方和接收方会收敛到纳什均衡策略。发送方的廉价交流的信息性水平随偏见增加而降低，在偏见的中间水平，它与帕累托最优均衡或次优均衡的水平相匹配。结论对于学习超参数和游戏的替代规范是稳健的。

更新时间: 2024-10-01 10:46:26

领域: econ.TH,cs.AI

下载: http://arxiv.org/abs/2310.07867v6

See Detail Say Clear: Towards Brain CT Report Generation via Pathological Clue-driven Representation Learning

Brain CT report generation is significant to aid physicians in diagnosing cranial diseases. Recent studies concentrate on handling the consistency between visual and textual pathological features to improve the coherence of report. However, there exist some challenges: 1) Redundant visual representing: Massive irrelevant areas in 3D scans distract models from representing salient visual contexts. 2) Shifted semantic representing: Limited medical corpus causes difficulties for models to transfer the learned textual representations to generative layers. This study introduces a Pathological Clue-driven Representation Learning (PCRL) model to build cross-modal representations based on pathological clues and naturally adapt them for accurate report generation. Specifically, we construct pathological clues from perspectives of segmented regions, pathological entities, and report themes, to fully grasp visual pathological patterns and learn cross-modal feature representations. To adapt the representations for the text generation task, we bridge the gap between representation learning and report generation by using a unified large language model (LLM) with task-tailored instructions. These crafted instructions enable the LLM to be flexibly fine-tuned across tasks and smoothly transfer the semantic representation for report generation. Experiments demonstrate that our method outperforms previous methods and achieves SoTA performance. Our code is available at "https://github.com/Chauncey-Jheng/PCRL-MRG".

Updated: 2024-10-01 10:42:32

标题: 看细节说清楚：基于病理线索驱动的表示学习实现脑CT报告生成

摘要: 脑CT报告生成对于帮助医生诊断颅内疾病至关重要。最近的研究集中于处理视觉和文本病理特征之间的一致性，以提高报告的连贯性。然而，存在一些挑战：1）冗余的视觉表示：3D扫描中的大量无关区域会使模型分散注意力，无法准确表示显著的视觉背景。2）移位的语义表示：有限的医学语料库使模型难以将学习的文本表示转移到生成层。本研究引入了一种基于病理线索驱动的表示学习（PCRL）模型，基于病理线索构建跨模态表示，并自然地使其适应准确的报告生成。具体来说，我们从分割区域、病理实体和报告主题的角度构建病理线索，充分把握视觉病理模式并学习跨模态特征表示。为了使表示适应文本生成任务，我们通过使用统一的大型语言模型（LLM）与定制的任务说明来弥合表示学习和报告生成之间的差距。这些精心设计的说明使LLM能够灵活地在任务之间进行微调，并顺利地将语义表示转移到报告生成中。实验证明，我们的方法优于先前的方法，并实现了最优表现。我们的代码可以在"https://github.com/Chauncey-Jheng/PCRL-MRG"上找到。

更新时间: 2024-10-01 10:42:32

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2409.19676v2

Asynchronous Approximate Agreement with Quadratic Communication

We consider an asynchronous network of $n$ message-sending parties, up to $t$ of which are byzantine. We study approximate agreement, where the parties obtain approximately equal outputs in the convex hull of their inputs. In their seminal work, Abraham, Amit and Dolev [OPODIS '04] achieve this with the optimal resilience $t < \frac{n}{3}$ with a protocol where each party reliably broadcasts its input every iteration. This takes $\Theta(n^2)$ messages per reliable broadcast, or $\Theta(n^3)$ messages per iteration. In this work, we present optimally resilient asynchronous approximate agreement protocols where we forgo reliable broadcast to require communication proportional to $n^2$ instead of $n^3$. We begin with a protocol for $\omega$-dimensional barycentric agreement with $\mathcal{O}(\omega n^2)$ small messages that does not use reliable broadcast. Then, we achieve edge agreement in a tree of diameter $D$ with $\lceil \log_2 D \rceil$ iterations of a multivalued graded consensus variant. This results in a $\mathcal{O}(\log\frac{1}{\varepsilon})$-round protocol for $\varepsilon$-agreement in $[0, 1]$ with $\mathcal{O}(n^2\log\frac{1}{\varepsilon})$ messages and $\mathcal{O}(n^2\log\frac{1}{\varepsilon}\log\log\frac{1}{\varepsilon})$ bits of communication, improving over the state of the art which matches this complexity only when the inputs are all either $0$ or $1$. Finally, we extend our edge agreement protocol for edge agreement in $\mathbb{Z}$ and thus $\varepsilon$-agreement in $\mathbb{R}$ with quadratic communication, in $\mathcal{O}(\log\frac{M}{\varepsilon})$ rounds where $M$ is the maximum honest input magnitude.

Updated: 2024-10-01 10:36:52

标题: 异步近似一致性与二次通信

摘要: 我们考虑了一个由$n$个发送消息的参与方组成的异步网络，其中最多有$t$个拜占庭参与方。我们研究了近似协议，其中参与方在其输入的凸包中获得近似相等的输出。在Abraham、Amit和Dolev在其具有开创性意义的工作中[OPODIS '04]，他们通过一个协议实现了最佳弹性$t < \frac{n}{3}$，其中每个参与方在每次迭代中可靠地广播其输入。这需要$\Theta(n^2)$个消息来进行可靠广播，或者每次迭代需要$\Theta(n^3)$个消息。在这项工作中，我们提出了一种具有最佳弹性的异步近似协议，我们放弃了可靠广播以要求通信量与$n^2$成比例，而不是$n^3$。我们首先提出了一个$\omega$维重心协议，其中包含$\mathcal{O}(\omega n^2)$个小消息，不使用可靠广播。然后，我们在直径为$D$的树中通过多值评分共识变体的$\lceil \log_2 D \rceil$次迭代实现了边协议。这导致了一个$\mathcal{O}(\log\frac{1}{\varepsilon})$轮的协议，用于在$[0, 1]$中获得$\varepsilon$-协议，其中包含$\mathcal{O}(n^2\log\frac{1}{\varepsilon})$个消息和$\mathcal{O}(n^2\log\frac{1}{\varepsilon}\log\log\frac{1}{\varepsilon})$比特的通信，改进了目前的技术水平，只有在输入全部为$0$或$1$时才能匹配这种复杂性。最后，我们将我们的边协议扩展到$\mathbb{Z}$中的边协议，从而在$\mathbb{R}$中实现$\varepsilon$-协议，通信量为二次，需要$\mathcal{O}(\log\frac{M}{\varepsilon})$轮，其中$M$是最大诚实输入幅度。

更新时间: 2024-10-01 10:36:52

领域: cs.DC,cs.CR

下载: http://arxiv.org/abs/2408.05495v2

AMR-Evol: Adaptive Modular Response Evolution Elicits Better Knowledge Distillation for Large Language Models in Code Generation

The impressive performance of proprietary LLMs like GPT4 in code generation has led to a trend to replicate these capabilities in open-source models through knowledge distillation (e.g. Code Evol-Instruct). However, these efforts often neglect the crucial aspect of response quality, relying heavily on teacher models for direct response distillation. This paradigm, especially for complex instructions, can degrade the quality of synthesized data, compromising the knowledge distillation process. To this end, our study introduces the Adaptive Modular Response Evolution (AMR-Evol) framework, which employs a two-stage process to refine response distillation. The first stage, modular decomposition, breaks down the direct response into more manageable sub-modules. The second stage, adaptive response evolution, automatically evolves the response with the related function modules. Our experiments with three popular code benchmarks (HumanEval, MBPP, and EvalPlus) attest to the superiority of the AMR-Evol framework over baseline response distillation methods. By comparing with the open-source Code LLMs trained on a similar scale of data, we observed performance enhancements: more than +3.0 points on HumanEval-Plus and +1.0 points on MBPP-Plus, which underscores the effectiveness of our framework. Our codes are available at https://github.com/ChiYeungLaw/AMR-Evol.

Updated: 2024-10-01 10:12:38

标题: AMR-Evol：自适应模块响应演化引发了更好的知识蒸馏，用于代码生成中的大型语言模型

摘要: 专有LLM（如GPT4）在代码生成方面的出色表现已经导致了一种趋势，即通过知识蒸馏（例如Code Evol-Instruct）在开源模型中复制这些能力。然而，这些努力经常忽视了响应质量这一关键方面，过分依赖教师模型进行直接响应蒸馏。特别是对于复杂的指令，这种范式可能会降低合成数据的质量，从而损害知识蒸馏过程。为此，我们的研究引入了自适应模块响应进化（AMR-Evol）框架，采用两阶段过程来改进响应蒸馏。第一阶段，模块化分解，将直接响应分解成更易管理的子模块。第二阶段，自适应响应进化，自动演化出相关的功能模块的响应。我们在三个流行的代码基准测试（HumanEval、MBPP和EvalPlus）上的实验证明了AMR-Evol框架优于基线响应蒸馏方法。通过与在类似数据规模上训练的开源代码LLM进行比较，我们观察到性能的提升：在HumanEval-Plus上超过+3.0分，在MBPP-Plus上超过+1.0分，这凸显了我们框架的有效性。我们的代码可以在https://github.com/ChiYeungLaw/AMR-Evol找到。

更新时间: 2024-10-01 10:12:38

领域: cs.CL,cs.AI,cs.SE

下载: http://arxiv.org/abs/2410.00558v1

Bone: Block Affine Transformation as Parameter Efficient Fine-tuning Methods for Large Language Models

Low-Rank Adaptation (LoRA) has achieved remarkable training results by freezing the original weights and training only low-rank matrices, establishing itself as the predominant fine-tuning method for LLMs. In pursuit of performance closer to full-parameter training, a series of LoRA variants have emerged, such as LoRA+, PISSA, Olora, and LoRA-GA. However, these improvements complicate the initial setup of model training and increase initialization time. More importantly, they overlook the internal interactions of the original weight information. To address these issues, we introduce a novel theory, ``Weight Guide'' aimed at continuously guiding trainable matrices through the original weights during training to enhance the utilization of weight information. Based on this theory, we designed a new PEFT technique called Bone (\textbf{B}l\textbf{o}ck Affi\textbf{ne}), which not only enhances the utilization of original weight information but also emphasizes the internal connections between weights, leading to faster convergence and better data fitting. Experimental comparisons across two different LLM architectures (LLaMA2, RWKV6) and various parameter scales demonstrate that the Bone structure can achieve rapid convergence and superior data fitting without the need for complex initialization. For example, when fine-tuning LLaMA2-7B on the MetaMathQA dataset and validating on GSM8k and math benchmarks, Bone achieved fine-tuning scores of 49.36 and 8.8, respectively, outperforming PISSA by 5.84\% and 1.96\%.

Updated: 2024-10-01 10:00:49

标题: 骨头：将块仿射变换作为大型语言模型参数高效微调的方法

摘要: 低秩适应（LoRA）通过冻结原始权重并仅训练低秩矩阵，取得了显著的训练结果，成为LLMs的主要微调方法。为了实现更接近全参数训练的性能，出现了一系列LoRA变种，如LoRA+、PISSA、Olora和LoRA-GA。然而，这些改进使模型训练的初始设置变得复杂，并增加了初始化时间。更重要的是，它们忽视了原始权重信息的内部相互作用。为了解决这些问题，我们提出了一个新的理论，“Weight Guide”，旨在通过训练过程中持续引导可训练矩阵通过原始权重，以增强权重信息的利用。基于这一理论，我们设计了一种新的PEFT技术，称为Bone（块仿射），它不仅增强了原始权重信息的利用，还强调了权重之间的内部连接，从而实现更快的收敛和更好的数据拟合。通过对两种不同LLM体系结构（LLaMA2、RWKV6）和各种参数规模的实验比较表明，Bone结构可以实现快速收敛和优越的数据拟合，而无需复杂的初始化。例如，当在MetaMathQA数据集上微调LLaMA2-7B，并在GSM8k和数学基准测试上进行验证时，Bone分别取得了49.36和8.8的微调得分，优于PISSA 5.84%和1.96%。

更新时间: 2024-10-01 10:00:49

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2409.15371v2

Cubic power functions with optimal second-order differential uniformity

We discuss the second-order differential uniformity of vectorial Boolean functions. The closely related notion of second-order zero differential uniformity has recently been studied in connection to resistance to the boomerang attack. We prove that monomial functions with univariate form $x^d$ where $d=2^{2k}+2^k+1$ and $\gcd(k,n)=1$ have optimal second-order differential uniformity. Computational results suggest that, up to affine equivalence, these might be the only optimal cubic power functions. We begin work towards generalising such conditions to all monomial functions of algebraic degree 3. We also discuss further questions arising from computational results.

Updated: 2024-10-01 09:46:02

标题: 带有最佳二阶微分均匀性的立方幂函数

摘要: 我们讨论了向量布尔函数的二阶微分均匀性。最近研究了与反弹攻击抵抗相关的二阶零微分均匀性概念。我们证明，具有单变量形式$x^d$的单项式函数，其中$d=2^{2k}+2^k+1$且$\gcd(k,n)=1$，具有最佳的二阶微分均匀性。计算结果表明，这些函数可能是唯一的最佳三次幂函数。我们开始着手将这些条件推广到所有代数度为3的单项式函数。我们还讨论了计算结果引发的进一步问题。

更新时间: 2024-10-01 09:46:02

领域: cs.IT,cs.CR,math.IT,math.NT,94D10 (Primary) 11T06, 94A60 (Secondary)

下载: http://arxiv.org/abs/2409.03467v2

Finding Shared Decodable Concepts and their Negations in the Brain

Prior work has offered evidence for functional localization in the brain; different anatomical regions preferentially activate for certain types of visual input. For example, the fusiform face area preferentially activates for visual stimuli that include a face. However, the spectrum of visual semantics is extensive, and only a few semantically-tuned patches of cortex have so far been identified in the human brain. Using a multimodal (natural language and image) neural network architecture (CLIP) we train a highly accurate contrastive model that maps brain responses during naturalistic image viewing to CLIP embeddings. We then use a novel adaptation of the DBSCAN clustering algorithm to cluster the parameters of these participant-specific contrastive models. This reveals what we call Shared Decodable Concepts (SDCs): clusters in CLIP space that are decodable from common sets of voxels across multiple participants. Examining the images most and least associated with each SDC cluster gives us additional insight into the semantic properties of each SDC. We note SDCs for previously reported visual features (e.g. orientation tuning in early visual cortex) as well as visual semantic concepts such as faces, places and bodies. In cases where our method finds multiple clusters for a visuo-semantic concept, the least associated images allow us to dissociate between confounding factors. For example, we discovered two clusters of food images, one driven by color, the other by shape. We also uncover previously unreported areas such as regions of extrastriate body area (EBA) tuned for legs/hands and sensitivity to numerosity in right intraparietal sulcus, and more. Thus, our contrastive-learning methodology better characterizes new and existing visuo-semantic representations in the brain by leveraging multimodal neural network representations and a novel adaptation of clustering algorithms.

Updated: 2024-10-01 09:43:43

标题: 在大脑中找到共享可解码概念及其否定

摘要: 以前的研究为大脑功能定位提供了证据；不同的解剖区域倾向于为特定类型的视觉输入激活。例如，颞下回面部区域偏好于激活包含面部的视觉刺激。然而，视觉语义的范围很广，目前在人类大脑中仅发现了少数语义调谐皮层区域。我们使用一个多模态（自然语言和图像）神经网络架构（CLIP），训练一个高度准确的对比模型，将自然图像观看期间的大脑反应映射到CLIP嵌入中。然后，我们使用一种新的DBSCAN聚类算法的改编来聚类这些参与者特定对比模型的参数。这揭示了我们所称的共享可解码概念（SDCs）：在多个参与者之间的共同体积集合中可以从CLIP空间中解码的聚类。检查与每个SDC聚类最相关和最不相关的图像，可以让我们进一步了解每个SDC的语义属性。我们注意到以前报道的视觉特征的SDCs（例如早期视觉皮层的方向调谐）以及视觉语义概念，如面部、地点和身体。在我们的方法发现一个视觉-语义概念的多个聚类的情况下，最不相关的图像可以让我们区分混淆因素。例如，我们发现两个食物图像的聚类，一个由颜色驱动，另一个由形状驱动。我们还发现以前未报告的区域，如针对腿/手的外后视体区域（EBA）以及对右侧顶顶沟数量的敏感性，等等。因此，我们的对比学习方法通过利用多模态神经网络表征和一种新颖的聚类算法的改编，更好地表征了大脑中新的和现有的视觉-语义表示。

更新时间: 2024-10-01 09:43:43

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2405.17663v2

Differentially Private Active Learning: Balancing Effective Data Selection and Privacy

Active learning (AL) is a widely used technique for optimizing data labeling in machine learning by iteratively selecting, labeling, and training on the most informative data. However, its integration with formal privacy-preserving methods, particularly differential privacy (DP), remains largely underexplored. While some works have explored differentially private AL for specialized scenarios like online learning, the fundamental challenge of combining AL with DP in standard learning settings has remained unaddressed, severely limiting AL's applicability in privacy-sensitive domains. This work addresses this gap by introducing differentially private active learning (DP-AL) for standard learning settings. We demonstrate that naively integrating DP-SGD training into AL presents substantial challenges in privacy budget allocation and data utilization. To overcome these challenges, we propose step amplification, which leverages individual sampling probabilities in batch creation to maximize data point participation in training steps, thus optimizing data utilization. Additionally, we investigate the effectiveness of various acquisition functions for data selection under privacy constraints, revealing that many commonly used functions become impractical. Our experiments on vision and natural language processing tasks show that DP-AL can improve performance for specific datasets and model architectures. However, our findings also highlight the limitations of AL in privacy-constrained environments, emphasizing the trade-offs between privacy, model accuracy, and data selection accuracy.

Updated: 2024-10-01 09:34:06

标题: 差分隐私主动学习：平衡有效数据选择和隐私

摘要: 主动学习（AL）是一种广泛使用的技术，通过迭代选择、标记和训练最具信息量的数据来优化机器学习中的数据标记。然而，它与正式隐私保护方法，特别是差分隐私（DP）的整合仍然大多未被探索。虽然一些作品已经探讨了针对专门场景如在线学习的差分私有AL，但将AL与DP在标准学习设置中结合的基本挑战仍未得到解决，严重限制了AL在隐私敏感领域的适用性。本文通过引入差分私有主动学习（DP-AL）来填补这一空白。我们证明了将DP-SGD训练天然地整合到AL中在隐私预算分配和数据利用方面存在重大挑战。为了克服这些挑战，我们提出了步骤放大，利用批量创建中的个体抽样概率来最大化数据点参与训练步骤，从而优化数据利用。此外，我们研究了在隐私约束下各种获取函数对数据选择的有效性，揭示了许多常用函数变得不切实际。我们在视觉和自然语言处理任务上的实验表明，DP-AL可以改善特定数据集和模型架构的性能。然而，我们的研究结果也突出了AL在隐私受限环境中的局限性，强调了隐私、模型准确性和数据选择准确性之间的权衡。

更新时间: 2024-10-01 09:34:06

领域: cs.LG,cs.CR

下载: http://arxiv.org/abs/2410.00542v1

Tax Policy Handbook for Crypto Assets

The Financial system has witnessed rapid technological changes. The rise of Bitcoin and other crypto assets based on Distributed Ledger Technology mark a fundamental change in the way people transact and transmit value over a decentralized network, spread across geographies. This has created regulatory and tax policy blind spots, as governments and tax administrations take time to understand and provide policy responses to this innovative, revolutionary, and fast-paced technology. Due to the breakneck speed of innovation in blockchain technology and advent of Decentralized Finance, Decentralized Autonomous Organizations and the Metaverse, it is unlikely that the policy interventions and guidance by regulatory authorities or tax administrations would be ahead or in sync with the pace of innovation. This paper tries to explain the principles on which crypto assets function, their underlying technology and relates them to the tax issues and taxable events which arise within this ecosystem. It also provides instances of tax and regulatory policy responses already in effect in various jurisdictions, including the recent changes in reporting standards by the FATF and the OECD. This paper tries to explain the rationale behind existing laws and policies and the challenges in their implementation. It also attempts to present a ballpark estimate of tax potential of this asset class and suggests creation of global public digital infrastructure that can address issues related to pseudonymity and extra-territoriality. The paper analyses both direct and indirect taxation issues related to crypto assets and discusses more recent aspects like proof-of-stake and maximal extractable value in greater detail.

Updated: 2024-10-01 09:26:28

标题: 加密资产税收政策手册

摘要: 金融系统已经见证了快速的技术变革。比特币和其他基于分布式账本技术的加密资产的兴起标志着人们在分散网络上进行交易和传输价值的方式发生了根本性变化，这种网络遍布各个地理区域。这已经造成监管和税收政策的盲点，因为政府和税务管理部门需要时间来理解并对这种创新、革命性和快节奏的技术提供政策响应。由于区块链技术和去中心化金融、去中心化自治组织和元宇宙的出现，创新速度极快，监管机构或税务管理部门提供的政策干预和指导不太可能超前或与创新步伐同步。本文试图解释加密资产运作的原则，它们的基础技术并将其与在这一生态系统中出现的税收问题和应税事件联系起来。它还提供了已在各个司法管辖区生效的税收和监管政策响应的例子，包括FATF和OECD最近对报告标准的变化。本文试图解释现行法律和政策背后的理由以及实施中的挑战。它还试图提出对这一资产类别的税收潜力的球场估计，并建议创建一个可以解决涉及匿名性和超领地性问题的全球公共数字基础设施。本文分析了与加密资产相关的直接和间接税收问题，并更详细地讨论了最近的诸如权益证明和最大可提取价值等方面。

更新时间: 2024-10-01 09:26:28

领域: q-fin.GN,cs.CR

下载: http://arxiv.org/abs/2403.15074v3

Arges: Spatio-Temporal Transformer for Ulcerative Colitis Severity Assessment in Endoscopy Videos

Accurate assessment of disease severity from endoscopy videos in ulcerative colitis (UC) is crucial for evaluating drug efficacy in clinical trials. Severity is often measured by the Mayo Endoscopic Subscore (MES) and Ulcerative Colitis Endoscopic Index of Severity (UCEIS) score. However, expert MES/UCEIS annotation is time-consuming and susceptible to inter-rater variability, factors addressable by automation. Automation attempts with frame-level labels face challenges in fully-supervised solutions due to the prevalence of video-level labels in clinical trials. CNN-based weakly-supervised models (WSL) with end-to-end (e2e) training lack generalization to new disease scores and ignore spatio-temporal information crucial for accurate scoring. To address these limitations, we propose "Arges", a deep learning framework that utilizes a transformer with positional encoding to incorporate spatio-temporal information from frame features to estimate disease severity scores in endoscopy video. Extracted features are derived from a foundation model (ArgesFM), pre-trained on a large diverse dataset from multiple clinical trials (61M frames, 3927 videos). We evaluate four UC disease severity scores, including MES and three UCEIS component scores. Test set evaluation indicates significant improvements, with F1 scores increasing by 4.1% for MES and 18.8%, 6.6%, 3.8% for the three UCEIS component scores compared to state-of-the-art methods. Prospective validation on previously unseen clinical trial data further demonstrates the model's successful generalization.

Updated: 2024-10-01 09:23:14

标题: Arges：用于溃疡性结肠炎严重程度评估的内窥镜视频时空变换器

摘要: 在溃疡性结肠炎（UC）内窥镜视频中准确评估疾病严重程度对于评估临床试验中药物疗效至关重要。严重程度通常通过Mayo内镜子得分（MES）和溃疡性结肠炎内镜指数（UCEIS）评分来衡量。然而，专家MES/UCEIS注释耗时且容易受到评分者之间的可变性影响，这些因素可以通过自动化解决。由于临床试验中普遍使用视频级别标签，基于帧级标签的自动化尝试在完全监督的解决方案中面临挑战。基于CNN的弱监督模型（WSL）通过端到端（e2e）训练缺乏对新疾病评分的泛化性，并忽略了对准确评分至关重要的时空信息。为了解决这些限制，我们提出了“Arges”，这是一个深度学习框架，利用了带有位置编码的变压器，从帧特征中融入时空信息，以估计内窥镜视频中的疾病严重程度得分。提取的特征来自一个基础模型（ArgesFM），该模型在来自多个临床试验的大型多样化数据集（6100万帧，3927个视频）上进行了预训练。我们评估了四个UC疾病严重程度评分，包括MES和三个UCEIS组分得分。测试集评估显示显著提高，MES的F1得分提高了4.1%，三个UCEIS组分得分分别提高了18.8%，6.6%和3.8%，比起最先进的方法。对于以前未见过的临床试验数据的前瞻性验证进一步证明了模型的成功泛化性。

更新时间: 2024-10-01 09:23:14

领域: eess.IV,cs.AI,cs.CV,cs.LG

下载: http://arxiv.org/abs/2410.00536v1

TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices

Large model inference is shifting from cloud to edge due to concerns about the privacy of user interaction data. However, edge devices often struggle with limited computing power, memory, and bandwidth, requiring collaboration across multiple devices to run and speed up LLM inference. Pipeline parallelism, the mainstream solution, is inefficient for single-user scenarios, while tensor parallelism struggles with frequent communications. In this paper, we argue that tensor parallelism can be more effective than pipeline on low-resource devices, and present a compute- and memory-efficient tensor parallel inference system, named TPI-LLM, to serve 70B-scale models. TPI-LLM keeps sensitive raw data local in the users' devices and introduces a sliding window memory scheduler to dynamically manage layer weights during inference, with disk I/O latency overlapped with the computation and communication. This allows larger models to run smoothly on memory-limited devices. We analyze the communication bottleneck and find that link latency, not bandwidth, emerges as the main issue, so a star-based allreduce algorithm is implemented. Through extensive experiments on both emulated and real testbeds, TPI-LLM demonstrated over 80% less time-to-first-token and token latency compared to Accelerate, and over 90% compared to Transformers and Galaxy, while cutting the peak memory footprint of Llama 2-70B by 90%, requiring only 3.1 GB of memory for 70B-scale models.

Updated: 2024-10-01 09:18:56

标题: TPI-LLM：在低资源边缘设备上高效地为规模达70B的LLMs提供服务

摘要: 大型模型推断由于对用户交互数据隐私的担忧，正从云端转移到边缘。然而，边缘设备通常面临计算能力、内存和带宽有限的困难，需要跨多个设备进行协作以运行和加速LLM推断。管道并行是主流解决方案，但对于单用户场景效率低，而张量并行则在频繁通信方面存在困难。在本文中，我们认为张量并行在低资源设备上可能比管道更有效，并提出了一种名为TPI-LLM的计算和内存高效的张量并行推断系统，可为70亿规模的模型提供服务。TPI-LLM在用户设备上保留敏感原始数据，并引入滑动窗口内存调度器来动态管理推断期间的层权重，将磁盘I/O延迟与计算和通信重叠。这使得更大的模型可以在内存有限的设备上平稳运行。我们分析了通信瓶颈，并发现链接延迟而不是带宽成为主要问题，因此实施了基于星形的全局归约算法。通过对模拟和真实测试平台的广泛实验，TPI-LLM相比于Accelerate，首次令牌时间和令牌延迟降低了80%以上，相比于Transformers和Galaxy，降低了90%以上，同时将Llama 2-70B的峰值内存占用减少了90%，仅需要3.1GB的内存来运行70亿规模的模型。

更新时间: 2024-10-01 09:18:56

领域: cs.DC,cs.AI,68T50,I.2.11

下载: http://arxiv.org/abs/2410.00531v1

Separation and Collapse of Equilibria Inequalities on AND-OR Trees without Shape Constraints

Herein, we investigate the zero-error randomized complexity, which is the least cost against the worst input, of AND-OR tree computation by imposing various restrictions on the algorithm to find the Boolean value of the root of that tree and no restrictions on the tree shape. When a tree satisfies a certain condition regarding its symmetry, directional algorithms proposed by Saks and Wigderson (1986), special randomized algorithms, are known to achieve the randomized complexity. Furthermore, there is a known example of a tree that is so unbalanced that no directional algorithm achieves the randomized complexity (Vereshchagin 1998). In this study, we aim to identify where deviations arise between the general randomized Boolean decision tree and its special case, directional algorithms. In this paper, we show that for any AND-OR tree, randomized depth-first algorithms, which form a broader class compared with directional algorithms, have the same equilibrium as that of the directional algorithms. Thus, we get the collapse result on equilibria inequalities that holds for an arbitrary AND-OR tree. This implies that there exists a case where even depth-first algorithms cannot be the fastest, leading to the separation result on equilibria inequality. Additionally, a new algorithm is introduced as a key concept for proof of the separation result.

Updated: 2024-10-01 09:11:53

标题: AND-OR树上的不等式分离和坍塌，无形状约束

摘要: 在这里，我们研究了零误随机复杂度，即通过对算法施加各种限制来找到树根的布尔值的AND-OR树计算的最小成本。当树满足对称性的某种条件时，Saks和Wigderson（1986）提出的定向算法，即特殊的随机算法，已知可以实现零误随机复杂度。此外，已知存在一种树的例子，该树不平衡以至于没有定向算法可以达到零误随机复杂度（Vereshchagin 1998）。在这项研究中，我们的目标是确定一般随机布尔决策树和其特殊情况定向算法之间的偏差出现在哪里。本文中，我们展示了对于任何AND-OR树，随机深度优先算法（与定向算法相比为更广泛的类别）具有与定向算法相同的均衡。因此，我们得到了关于均衡不等式的崩溃结果，该结果适用于任意AND-OR树。这意味着存在一种情况，即使深度优先算法也不能是最快的，从而导致均衡不等式的分离结果。此外，引入了一种新算法作为分离结果证明的关键概念。

更新时间: 2024-10-01 09:11:53

领域: cs.AI,68T20, 68Q17, 03D15, 91A60,I.2.8; F.2.2

下载: http://arxiv.org/abs/2405.20138v2

The Uniqueness of LLaMA3-70B Series with Per-Channel Quantization

We have observed a distinctive quantization-related behavior in the LLaMA3/3.1-70B models that is absent in both the LLaMA2-70B and LLaMA3/3.1/3.2-1B/3B/8B/405B models. Quantization is a crucial technique for deploying large language models (LLMs) efficiently. The impact of W8A8 post-training quantization on model accuracy, especially on the recently released LLaMA3/3.1 model series, remains contentious. In this paper, we explore three key questions: What makes the LLaMA3-70B model series uniquely vulnerable to quantization? Why is this the case? And how can the issue be addressed? We empirically investigate multiple LLMs featured on an open LLM leaderboard, discovering that the LLaMA3-70B model series have a unique accuracy degradation behavior with W8A8 per-channel post-training quantization. In contrast, other model series such as LLaMA2, LLaMA3/3.1-8B, LLaMA3.2, Qwen, Mixtral, Mistral, Phi-3, and Falcon demonstrate robust performance with W8A8. Contrary to previous assertions attributing degradation to the large dynamic range of activations, our findings indicate that the weight distribution of the LLaMA3-70B is the primary factor behind the vulnerability. By meticulously analyzing the distinct characteristics of weight distributions across Transformer blocks, we propose two solutions that make different tradeoffs in hardware/software overhead. First, we propose a mixed strategy where less than 3\% of the layers employ finer per-group W8A8 quantization granularity. Second, we introduce a bi-smoothing strategy that balances quantization errors between weights and activations while maintaining per-channel quantization throughout. Experimental results demonstrate that both strategies effectively preserve the accuracy of the entire LLaMA3-70B model series under W8A8 quantization, achieving performance on par with their FP16 counterparts.

Updated: 2024-10-01 09:05:45

标题: LLaMA3-70B系列与逐通道量化的独特性

摘要: 我们观察到LLaMA3/3.1-70B模型中存在一种与量化相关的独特行为，而在LLaMA2-70B和LLaMA3/3.1/3.2-1B/3B/8B/405B模型中不存在。量化是部署大型语言模型（LLMs）的关键技术。W8A8后训练量化对模型准确性的影响，特别是最近发布的LLaMA3/3.1模型系列，仍存在争议。在本文中，我们探讨了三个关键问题：什么让LLaMA3-70B模型系列在量化方面独特脆弱？为什么会这样？问题如何解决？我们在一个开放的LLM排行榜上对多个LLM进行了实证研究，发现LLaMA3-70B模型系列在W8A8每通道后训练量化中具有独特的准确性降级行为。相比之下，其他模型系列如LLaMA2、LLaMA3/3.1-8B、LLaMA3.2、Qwen、Mixtral、Mistral、Phi-3和Falcon在W8A8下表现稳健。与之前认为降级是由于激活的大动态范围相冲突的说法相反，我们的研究结果表明，LLaMA3-70B的权重分布是脆弱性的主要因素。通过精细分析Transformer块之间权重分布的独特特征，我们提出了两种在硬件/软件开销方面进行不同权衡的解决方案。首先，我们提出了一种混合策略，其中不到3%的层采用更细的每组W8A8量化粒度。其次，我们引入了一种双平滑策略，平衡了权重和激活之间的量化误差，同时保持了全通道量化。实验结果表明，这两种策略有效地保持了整个LLaMA3-70B模型系列在W8A8量化下的准确性，实现了与其FP16对应物相当的性能。

更新时间: 2024-10-01 09:05:45

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2408.15301v2

Exploring the Learning Capabilities of Language Models using LEVERWORLDS

Learning a model of a stochastic setting often involves learning both general structure rules and specific properties of the instance. This paper investigates the interplay between learning the general and the specific in various learning methods, with emphasis on sample efficiency. We design a framework called {\sc LeverWorlds}, which allows the generation of simple physics-inspired worlds that follow a similar generative process with different distributions, and their instances can be expressed in natural language. These worlds allow for controlled experiments to assess the sample complexity of different learning methods. We experiment with classic learning algorithms as well as Transformer language models, both with fine-tuning and In-Context Learning (ICL). Our general finding is that (1) Transformers generally succeed in the task; but (2) they are considerably less sample efficient than classic methods that make stronger assumptions about the structure, such as Maximum Likelihood Estimation and Logistic Regression. This finding is in tension with the recent tendency to use Transformers as general-purpose estimators. We propose an approach that leverages the ICL capabilities of contemporary language models to apply simple algorithms for this type of data. Our experiments show that models currently struggle with the task but show promising potential.

Updated: 2024-10-01 09:02:13

标题: 使用LEVERWORLDS探索语言模型的学习能力

摘要: 学习随机环境的模型通常涉及学习一般结构规则和实例的特定属性。本文研究了在各种学习方法中学习一般和特定之间的相互作用，重点在于样本效率。我们设计了一个名为{\sc LeverWorlds}的框架，允许生成遵循类似生成过程但具有不同分布的简单物理启发式世界，它们的实例可以用自然语言表达。这些世界允许进行受控实验来评估不同学习方法的样本复杂度。我们对经典学习算法进行了实验，还进行了Transformer语言模型的实验，包括微调和上下文学习（ICL）。我们的一般发现是，（1）Transformers通常成功完成任务；但是（2）它们的样本效率明显低于对结构做出更强假设的经典方法，如最大似然估计和逻辑回归。这一发现与最近使用Transformers作为通用估计器的趋势相矛盾。我们提出了一种利用当代语言模型的ICL能力来应用简单算法处理这种类型数据的方法。我们的实验表明，目前模型在这项任务上存在困难，但显示出有希望的潜力。

更新时间: 2024-10-01 09:02:13

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.00519v1

Future You: A Conversation with an AI-Generated Future Self Reduces Anxiety, Negative Emotions, and Increases Future Self-Continuity

We introduce "Future You," an interactive, brief, single-session, digital chat intervention designed to improve future self-continuity--the degree of connection an individual feels with a temporally distant future self--a characteristic that is positively related to mental health and wellbeing. Our system allows users to chat with a relatable yet AI-powered virtual version of their future selves that is tuned to their future goals and personal qualities. To make the conversation realistic, the system generates a "synthetic memory"--a unique backstory for each user--that creates a throughline between the user's present age (between 18-30) and their life at age 60. The "Future You" character also adopts the persona of an age-progressed image of the user's present self. After a brief interaction with the "Future You" character, users reported decreased anxiety, and increased future self-continuity. This is the first study successfully demonstrating the use of personalized AI-generated characters to improve users' future self-continuity and wellbeing.

Updated: 2024-10-01 09:00:57

标题: 未来的你：与由人工智能生成的未来自己对话可减轻焦虑，消除负面情绪，并增强未来自我的连续性

摘要: 我们介绍了“未来自己”，这是一种互动的、简短的、单次的数字聊天干预，旨在提高未来自我连续性——个体与时间上遥远未来自我的联系程度——这一特征与心理健康和福祉呈正相关。我们的系统允许用户与一个可关联但又经过人工智能调整的虚拟未来自我的版本进行交流，该版本根据用户的未来目标和个人品质进行调整。为了使对话更加真实，系统生成了“合成记忆”——为每个用户创造了一个独特的背景故事——从而在用户当前年龄（18-30岁之间）与他们60岁时的生活之间建立起联系。 “未来自己”人物还采用了用户当前自己的年龄进展图像的人格。与“未来自己”人物进行简短互动后，用户报告称焦虑减少，未来自我的连续性增加。这是第一项成功证明使用个性化的AI生成人物来提高用户未来自我的连续性和福祉的研究。

更新时间: 2024-10-01 09:00:57

领域: cs.HC,cs.AI

下载: http://arxiv.org/abs/2405.12514v4

Human-Robot Collaborative Minimum Time Search through Sub-priors in Ant Colony Optimization

Human-Robot Collaboration (HRC) has evolved into a highly promising issue owing to the latest breakthroughs in Artificial Intelligence (AI) and Human-Robot Interaction (HRI), among other reasons. This emerging growth increases the need to design multi-agent algorithms that can manage also human preferences. This paper presents an extension of the Ant Colony Optimization (ACO) meta-heuristic to solve the Minimum Time Search (MTS) task, in the case where humans and robots perform an object searching task together. The proposed model consists of two main blocks. The first one is a convolutional neural network (CNN) that provides the prior probabilities about where an object may be from a segmented image. The second one is the Sub-prior MTS-ACO algorithm (SP-MTS-ACO), which takes as inputs the prior probabilities and the particular search preferences of the agents in different sub-priors to generate search plans for all agents. The model has been tested in real experiments for the joint search of an object through a Vizanti web-based visualization in a tablet computer. The designed interface allows the communication between a human and our humanoid robot named IVO. The obtained results show an improvement in the search perception of the users without loss of efficiency.

Updated: 2024-10-01 08:57:28

标题: 人机协作下基于蚁群优化算法的最短搜索路径研究

摘要: 人机协作（HRC）由于人工智能（AI）和人机交互（HRI）等领域的最新突破而发展成为一个极具前景的问题。这种新兴增长增加了设计能够管理人类偏好的多智能体算法的需求。本文提出了一种将蚁群优化（ACO）元启发式扩展为解决最小时间搜索（MTS）任务的模型，其中人类和机器人一起执行物体搜索任务。所提出的模型由两个主要模块组成。第一个是一个卷积神经网络（CNN），提供了有关从分割图像中物体可能位于何处的先验概率。第二个是子先验MTS-ACO算法（SP-MTS-ACO），其将先验概率和不同子先验中代理的特定搜索偏好作为输入，为所有代理生成搜索计划。该模型已在平板电脑上通过Vizanti基于Web的可视化进行了实际实验，用于共同搜索物体。设计的界面允许人类与名为IVO的我们的仿人机器人进行通信。获得的结果显示了用户在搜索感知上的改进，而没有效率损失。

更新时间: 2024-10-01 08:57:28

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2410.00517v1

Enhancing Sentinel-2 Image Resolution: Evaluating Advanced Techniques based on Convolutional and Generative Neural Networks

This paper investigates the enhancement of spatial resolution in Sentinel-2 bands that contain spectral information using advanced super-resolution techniques by a factor of 2. State-of-the-art CNN models are compared with enhanced GAN approaches in terms of quality and feasibility. Therefore, a representative dataset comprising Sentinel-2 low-resolution images and corresponding high-resolution aerial orthophotos is required. Literature study reveals no feasible dataset for the land type of interest (forests), for which reason an adequate dataset had to be generated in addition, accounting for accurate alignment and image source optimization. The results reveal that while CNN-based approaches produce satisfactory outcomes, they tend to yield blurry images. In contrast, GAN-based models not only provide clear and detailed images, but also demonstrate superior performance in terms of quantitative assessment, underlying the potential of the framework beyond the specific land type investigated.

Updated: 2024-10-01 08:56:46

标题: 提高Sentinel-2图像分辨率：基于卷积和生成神经网络的先进技术评估

摘要: 本文研究了使用先进的超分辨率技术将Sentinel-2波段中包含光谱信息的空间分辨率提高2倍。在质量和可行性方面，将最先进的CNN模型与增强的GAN方法进行比较。因此，需要一个代表性数据集，包括Sentinel-2低分辨率图像和相应的高分辨率航空正射影像。文献研究表明，对于感兴趣的土地类型（森林），目前没有可行的数据集，因此需要额外生成一个适当的数据集，考虑到准确的对齐和图像源优化。结果显示，虽然基于CNN的方法产生了令人满意的结果，但它们往往会产生模糊的图像。相比之下，基于GAN的模型不仅提供清晰和详细的图像，而且在定量评估方面表现出优越的性能，突显了该框架在研究具体土地类型之外的潜力。

更新时间: 2024-10-01 08:56:46

领域: eess.IV,cs.AI,cs.CV

下载: http://arxiv.org/abs/2410.00516v1

Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis

Building general-purpose robots that operate seamlessly in any environment, with any object, and utilizing various skills to complete diverse tasks has been a long-standing goal in Artificial Intelligence. However, as a community, we have been constraining most robotic systems by designing them for specific tasks, training them on specific datasets, and deploying them within specific environments. These systems require extensively-labeled data and task-specific models. When deployed in real-world scenarios, such systems face several generalization issues and struggle to remain robust to distribution shifts. Motivated by the impressive open-set performance and content generation capabilities of web-scale, large-capacity pre-trained models (i.e., foundation models) in research fields such as Natural Language Processing (NLP) and Computer Vision (CV), we devote this survey to exploring (i) how these existing foundation models from NLP and CV can be applied to the field of general-purpose robotics, and also exploring (ii) what a robotics-specific foundation model would look like. We begin by providing a generalized formulation of how foundation models are used in robotics, and the fundamental barriers to making generalist robots universally applicable. Next, we establish a taxonomy to discuss current work exploring ways to leverage existing foundation models for robotics and develop ones catered to robotics. Finally, we discuss key challenges and promising future directions in using foundation models for enabling general-purpose robotic systems. We encourage readers to view our living GitHub repository 2 of resources, including papers reviewed in this survey, as well as related projects and repositories for developing foundation models for robotics.

Updated: 2024-10-01 08:54:53

标题: 朝向通用机器人：通过基础模型的调查和元分析

摘要: 构建可以在任何环境中无缝运行的通用机器人，与任何物体进行交互，并利用各种技能完成多样化任务一直是人工智能领域的长期目标。然而，作为一个社区，我们一直通过为特定任务设计机器人系统、在特定数据集上训练它们，并在特定环境中部署它们来限制大多数机器人系统。这些系统需要大量标记的数据和特定任务的模型。当这些系统在现实场景中部署时，它们面临着一些泛化问题，并且难以保持对分布变化的稳健性。受到在自然语言处理（NLP）和计算机视觉（CV）等研究领域中基于web规模、大容量预训练模型（即基础模型）的出色开放性表现和内容生成能力的启发，我们致力于探索（i）这些现有基础模型如何应用于通用机器人领域，以及探索（ii）一个机器人特定的基础模型会是什么样子。我们首先提供了基础模型在机器人领域中的使用方式的一般公式化，以及使通用机器人普遍适用的基本障碍。接下来，我们建立一个分类法，讨论当前的工作探索如何利用现有基础模型为机器人开发并发展专门的基础模型。最后，我们讨论使用基础模型实现通用机器人系统的关键挑战和有前景的未来方向。我们鼓励读者查看我们的资源库，包括本调查中审查的论文以及相关项目和存储库，用于开发机器人的基础模型。

更新时间: 2024-10-01 08:54:53

领域: cs.RO,cs.AI,cs.CV,cs.LG

下载: http://arxiv.org/abs/2312.08782v3

Cross-lingual Back-Parsing: Utterance Synthesis from Meaning Representation for Zero-Resource Semantic Parsing

Recent efforts have aimed to utilize multilingual pretrained language models (mPLMs) to extend semantic parsing (SP) across multiple languages without requiring extensive annotations. However, achieving zero-shot cross-lingual transfer for SP remains challenging, leading to a performance gap between source and target languages. In this study, we propose Cross-Lingual Back-Parsing (CBP), a novel data augmentation methodology designed to enhance cross-lingual transfer for SP. Leveraging the representation geometry of the mPLMs, CBP synthesizes target language utterances from source meaning representations. Our methodology effectively performs cross-lingual data augmentation in challenging zero-resource settings, by utilizing only labeled data in the source language and monolingual corpora. Extensive experiments on two cross-language SP benchmarks (Mschema2QA and Xspider) demonstrate that CBP brings substantial gains in the target language. Further analysis of the synthesized utterances shows that our method successfully generates target language utterances with high slot value alignment rates while preserving semantic integrity. Our codes and data are publicly available at https://github.com/deokhk/CBP.

Updated: 2024-10-01 08:53:38

标题: 跨语言反解析：从意义表示合成话语以进行无资源语义解析

摘要: 最近的研究旨在利用多语言预训练语言模型（mPLMs）扩展跨多语言的语义解析（SP），而无需大量注释。然而，实现SP的零次跨语言转移仍然具有挑战性，导致源语言和目标语言之间的性能差距。在本研究中，我们提出了一种名为跨语言反向解析（CBP）的新型数据增强方法，旨在增强SP的跨语言转移。利用mPLMs的表征几何形状，CBP从源含义表示中合成目标语言话语。我们的方法在具有挑战性的零资源环境中有效地进行跨语言数据增强，仅利用源语言和单语语料库中的标记数据。在两个跨语言SP基准测试（Mschema2QA和Xspider）上进行的大量实验表明，CBP在目标语言中带来了实质性收益。对合成话语的进一步分析显示，我们的方法成功生成高槽值对齐率的目标语言话语，同时保留语义完整性。我们的代码和数据可在https://github.com/deokhk/CBP 上公开获取。

更新时间: 2024-10-01 08:53:38

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.00513v1

Pre-training with Synthetic Patterns for Audio

In this paper, we propose to pre-train audio encoders using synthetic patterns instead of real audio data. Our proposed framework consists of two key elements. The first one is Masked Autoencoder (MAE), a self-supervised learning framework that learns from reconstructing data from randomly masked counterparts. MAEs tend to focus on low-level information such as visual patterns and regularities within data. Therefore, it is unimportant what is portrayed in the input, whether it be images, audio mel-spectrograms, or even synthetic patterns. This leads to the second key element, which is synthetic data. Synthetic data, unlike real audio, is free from privacy and licensing infringement issues. By combining MAEs and synthetic patterns, our framework enables the model to learn generalized feature representations without real data, while addressing the issues related to real audio. To evaluate the efficacy of our framework, we conduct extensive experiments across a total of 13 audio tasks and 17 synthetic datasets. The experiments provide insights into which types of synthetic patterns are effective for audio. Our results demonstrate that our framework achieves performance comparable to models pre-trained on AudioSet-2M and partially outperforms image-based pre-training methods.

Updated: 2024-10-01 08:52:35

标题: 使用合成模式进行音频的预训练

摘要: 在本文中，我们提出使用合成模式而不是真实音频数据来预训练音频编码器。我们提出的框架包括两个关键元素。第一个是遮蔽自动编码器（MAE），这是一个自监督学习框架，从随机遮蔽对应数据的重构中学习。 MAE倾向于关注低级信息，如视觉模式和数据内的规律性。因此，输入中所描绘的内容（无论是图像、音频mel-光谱图，甚至是合成模式）并不重要。这导致了第二个关键元素，即合成数据。合成数据与真实音频不同，不受隐私和许可侵权问题的影响。通过结合MAE和合成模式，我们的框架使模型能够学习广义特征表示而无需真实数据，同时解决与真实音频相关的问题。为了评估我们框架的有效性，我们在总共13个音频任务和17个合成数据集上进行了广泛实验。实验提供了哪种类型的合成模式对音频有效的见解。我们的结果表明，我们的框架实现了与在AudioSet-2M上预训练的模型相当的性能，并部分优于基于图像的预训练方法。

更新时间: 2024-10-01 08:52:35

领域: eess.AS,cs.AI,cs.CV

下载: http://arxiv.org/abs/2410.00511v1

Read Over the Lines: Attacking LLMs and Toxicity Detection Systems with ASCII Art to Mask Profanity

We introduce a novel family of adversarial attacks that exploit the inability of language models to interpret ASCII art. To evaluate these attacks, we propose the ToxASCII benchmark and develop two custom ASCII art fonts: one leveraging special tokens and another using text-filled letter shapes. Our attacks achieve a perfect 1.0 Attack Success Rate across ten models, including OpenAI's o1-preview and LLaMA 3.1. Warning: this paper contains examples of toxic language used for research purposes.

Updated: 2024-10-01 08:50:01

标题: 阅读文字之间的线索：使用ASCII艺术攻击LLMs和毒性检测系统，以掩盖粗话

摘要: 我们引入了一种新颖的对抗攻击家族，利用语言模型无法解释ASCII艺术的特点。为了评估这些攻击，我们提出了ToxASCII基准测试，并开发了两种自定义ASCII艺术字体：一种利用特殊标记，另一种使用填充文本的字母形状。我们的攻击在十个模型中均实现了完美的1.0攻击成功率，包括OpenAI的o1-preview和LLaMA 3.1。警告：本文包含了研究目的的有毒语言示例。

更新时间: 2024-10-01 08:50:01

领域: cs.CL,cs.AI,cs.CR

下载: http://arxiv.org/abs/2409.18708v3

Fine-Tuning and Deploying Large Language Models Over Edges: Issues and Approaches

Since the invention of GPT2--1.5B in 2019, large language models (LLMs) have transitioned from specialized models to versatile foundation models. The LLMs exhibit impressive zero-shot ability, however, require fine-tuning on local datasets and significant resources for deployment. Traditional fine-tuning techniques with the first-order optimizers require substantial GPU memory that exceeds mainstream hardware capability. Therefore, memory-efficient methods are motivated to be investigated. Model compression techniques can reduce energy consumption, operational costs, and environmental impact so that to support sustainable artificial intelligence advancements. Additionally, large-scale foundation models have expanded to create images, audio, videos, and multi-modal contents, further emphasizing the need for efficient deployment. Therefore, we are motivated to present a comprehensive overview of the prevalent memory-efficient fine-tuning methods over the network edge. We also review the state-of-the-art literatures on model compression to provide a vision on deploying LLMs over the network edge.

Updated: 2024-10-01 08:48:34

标题: Fein-Tuning和在边缘部署大型语言模型：问题和方法

摘要: 自2019年GPT2--1.5B的发明以来，大型语言模型（LLMs）已经从专门的模型转变为多功能的基础模型。LLMs展示了令人印象深刻的零-shot能力，然而，需要在本地数据集上进行微调，并需要大量资源进行部署。传统的微调技术与一阶优化器需要大量的GPU内存，超出了主流硬件的能力。因此，有动机去研究内存高效的方法。模型压缩技术可以降低能源消耗、运营成本和环境影响，以支持可持续的人工智能进步。此外，大规模的基础模型已经扩展到创建图像、音频、视频和多模态内容，进一步强调了高效部署的需求。因此，我们有动机提供对网络边缘上流行的内存高效微调方法的全面概述。我们还回顾了关于模型压缩的最新文献，以提供关于在网络边缘部署LLMs的愿景。

更新时间: 2024-10-01 08:48:34

领域: cs.AI

下载: http://arxiv.org/abs/2408.10691v2

Mitigating Training Imbalance in LLM Fine-Tuning via Selective Parameter Merging

Supervised fine-tuning (SFT) is crucial for adapting Large Language Models (LLMs) to specific tasks. In this work, we demonstrate that the order of training data can lead to significant training imbalances, potentially resulting in performance degradation. Consequently, we propose to mitigate this imbalance by merging SFT models fine-tuned with different data orders, thereby enhancing the overall effectiveness of SFT. Additionally, we introduce a novel technique, "parameter-selection merging," which outperforms traditional weighted-average methods on five datasets. Further, through analysis and ablation studies, we validate the effectiveness of our method and identify the sources of performance improvements.

Updated: 2024-10-01 08:44:31

标题: 通过选择性参数合并在LLM微调中减轻训练不平衡

摘要: Supervised fine-tuning (SFT)对于调整大型语言模型（LLMs）以适应特定任务至关重要。在这项工作中，我们证明训练数据的顺序可能导致显著的训练不平衡，可能导致性能下降。因此，我们提出通过合并以不同数据顺序进行微调的SFT模型来缓解这种不平衡，从而增强SFT的整体有效性。此外，我们引入了一个新颖的技术“参数选择合并”，在五个数据集上优于传统的加权平均方法。通过分析和消融研究，我们验证了我们方法的有效性，并确定性能改进的来源。

更新时间: 2024-10-01 08:44:31

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.03743v1

Effective Intrusion Detection for UAV Communications using Autoencoder-based Feature Extraction and Machine Learning Approach

This paper proposes a novel intrusion detection method for unmanned aerial vehicles (UAV) in the presence of recent actual UAV intrusion dataset. In particular, in the first stage of our method, we design an autoencoder architecture for effectively extracting important features, which are then fed into various machine learning models in the second stage for detecting and classifying attack types. To the best of our knowledge, this is the first attempt to propose such the autoencoder-based machine learning intrusion detection method for UAVs using actual dataset, while most of existing works only consider either simulated datasets or datasets irrelevant to UAV communications. Our experiment results show that the proposed method outperforms the baselines such as feature selection schemes in both binary and multi-class classification tasks.

Updated: 2024-10-01 08:44:23

标题: 无人机通信的有效入侵检测：基于自编码器特征提取和机器学习方法

摘要: 本文提出了一种针对最近实际无人机侵入数据集的无人机入侵检测方法。具体而言，在我们的方法的第一阶段，我们设计了一个自动编码器架构，用于有效提取重要特征，然后在第二阶段将这些特征输入到各种机器学习模型中，以便检测和分类攻击类型。据我们所知，这是第一次尝试使用实际数据集提出基于自动编码器的机器学习入侵检测方法，而大多数现有作品只考虑模拟数据集或与无人机通信无关的数据集。我们的实验结果表明，所提出的方法在二元分类和多类分类任务中优于基线方法，比如特征选择方案。

更新时间: 2024-10-01 08:44:23

领域: cs.RO,cs.AI,cs.LG,eess.SP

下载: http://arxiv.org/abs/2410.02827v1

Mutatis Mutandis: Revisiting the Comparator in Discrimination Testing

Testing for discrimination consists of deriving a profile, known as the comparator, similar to the profile making the discrimination claim, known as the complainant, and comparing the outcomes of these two profiles. An important aspect for establishing discrimination is evidence, often obtained via discrimination testing tools that implement the complainant-comparator pair. In this work, we revisit the role of the comparator in discrimination testing. We argue for the causal modeling nature of deriving the comparator, and introduce a two-kinds classification for the comparator: the ceteris paribus (CP), and mutatis mutandis (MM) comparators. The CP comparator is the standard one among discrimination testing, representing an idealized comparison as it aims for having a complainant-comparator pair that only differs on membership to the protected attribute. As an alternative to it, we define the MM comparator, which requires that the comparator represents what would have been of the complainant without the effects of the protected attribute on the non-protected attributes. The complainant-comparator pair, in that case, may also be dissimilar in terms of all attributes. We illustrate these two comparators and their impact on discrimination testing using a real illustrative example. Importantly, we position generative models and, overall, machine learning methods as useful tools for constructing the MM comparator and, in turn, achieving more complex and realistic comparisons when testing for discrimination.

Updated: 2024-10-01 08:40:17

标题: 变换的变化：重新审视辨别测试中的比较器

摘要: 测试歧视包括制定一个类似于提出歧视主张的个人（投诉人）的档案，被称为比较者，并比较这两个档案的结果。建立歧视的一个重要方面是证据，通常通过实施投诉人-比较者对的歧视测试工具获得。在这项工作中，我们重新审视了歧视测试中比较者的作用。我们主张将比较者推导为因果建模的性质，并为比较者引入了两种分类：齐其余条件（CP）和变更其余条件（MM）比较者。CP比较者是歧视测试中的标准之一，代表理想化的比较，因为它旨在使投诉人-比较者对仅在受保护属性的成员资格上有所不同。作为其替代方案，我们定义了MM比较者，它要求比较者代表没有受保护属性对非受保护属性的影响会是什么样子的投诉人。在这种情况下，投诉人-比较者对在所有属性方面可能也是不相似的。我们使用一个真实的例子说明了这两种比较者及其对歧视测试的影响。重要的是，我们将生成模型和总体上将机器学习方法定位为构建MM比较者的有用工具，并进而在测试歧视时实现更复杂和更现实的比较。

更新时间: 2024-10-01 08:40:17

领域: cs.LG

下载: http://arxiv.org/abs/2405.13693v2

Multi-Target Cross-Lingual Summarization: a novel task and a language-neutral approach

Cross-lingual summarization aims to bridge language barriers by summarizing documents in different languages. However, ensuring semantic coherence across languages is an overlooked challenge and can be critical in several contexts. To fill this gap, we introduce multi-target cross-lingual summarization as the task of summarizing a document into multiple target languages while ensuring that the produced summaries are semantically similar. We propose a principled re-ranking approach to this problem and a multi-criteria evaluation protocol to assess semantic coherence across target languages, marking a first step that will hopefully stimulate further research on this problem.

Updated: 2024-10-01 08:33:57

标题: 多目标跨语言摘要：一项新颖的任务和一种语言中立的方法

摘要: 跨语言摘要旨在通过总结不同语言的文件来弥合语言障碍。然而，确保跨语言之间的语义连贯性是一个被忽视的挑战，在几个背景下可能至关重要。为了填补这一空白，我们引入了多目标跨语言摘要作为将文档总结为多种目标语言的任务，同时确保生成的摘要在语义上相似。我们提出了一种基于原则的重新排序方法来解决这个问题，并提出了一个多标准评估协议来评估跨目标语言的语义连贯性，这是一个希望能激发进一步研究这个问题的第一步。

更新时间: 2024-10-01 08:33:57

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.00502v1

Unlabeled Debiasing in Downstream Tasks via Class-wise Low Variance Regularization

Language models frequently inherit societal biases from their training data. Numerous techniques have been proposed to mitigate these biases during both the pre-training and fine-tuning stages. However, fine-tuning a pre-trained debiased language model on a downstream task can reintroduce biases into the model. Additionally, existing debiasing methods for downstream tasks either (i) require labels of protected attributes (e.g., age, race, or political views) that are often not available or (ii) rely on indicators of bias, which restricts their applicability to gender debiasing since they rely on gender-specific words. To address this, we introduce a novel debiasing regularization technique based on the class-wise variance of embeddings. Crucially, our method does not require attribute labels and targets any attribute, thus addressing the shortcomings of existing debiasing methods. Our experiments on encoder language models and three datasets demonstrate that our method outperforms existing strong debiasing baselines that rely on target attribute labels while maintaining performance on the target task.

Updated: 2024-10-01 08:30:13

标题: 通过类别低方差正则化在下游任务中的无标签去偏见

摘要: 语言模型经常从其训练数据中继承社会偏见。已经提出了许多技术来减轻这些偏见，包括在预训练和微调阶段。然而，在下游任务上微调预训练的去偏见语言模型可能会重新引入偏见。此外，现有的针对下游任务的去偏见方法要么需要受保护属性的标签（例如年龄、种族或政治观点），这些标签通常不可用，要么依赖于偏见指标，这限制了它们在性别去偏见方面的适用性，因为它们依赖于特定于性别的词语。为了解决这个问题，我们提出了一种基于嵌入类别方差的新型去偏见正则化技术。关键是，我们的方法不需要属性标签，并且可以针对任何属性，因此解决了现有去偏见方法的缺点。我们在编码器语言模型和三个数据集上的实验证明，我们的方法在维持目标任务性能的同时，优于依赖目标属性标签的现有强去偏见基线。

更新时间: 2024-10-01 08:30:13

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2409.19541v2

SpeedUpNet: A Plug-and-Play Adapter Network for Accelerating Text-to-Image Diffusion Models

Text-to-image diffusion models (SD) exhibit significant advancements while requiring extensive computational resources. Existing acceleration methods usually require extensive training and are not universally applicable. LCM-LoRA, trainable once for diverse models, offers universality but rarely considers ensuring the consistency of generated content before and after acceleration. This paper proposes SpeedUpNet (SUN), an innovative acceleration module, to address the challenges of universality and consistency. Exploiting the role of cross-attention layers in U-Net for SD models, we introduce an adapter specifically designed for these layers, quantifying the offset in image generation caused by negative prompts relative to positive prompts. This learned offset demonstrates stability across a range of models, enhancing SUN's universality. To improve output consistency, we propose a Multi-Step Consistency (MSC) loss, which stabilizes the offset and ensures fidelity in accelerated content. Experiments on SD v1.5 show that SUN leads to an overall speedup of more than 10 times compared to the baseline 25-step DPM-solver++, and offers two extra advantages: (1) training-free integration into various fine-tuned Stable-Diffusion models and (2) state-of-the-art FIDs of the generated data set before and after acceleration guided by random combinations of positive and negative prompts. Code is available: https://williechai.github.io/speedup-plugin-for-stable-diffusions.github.io.

Updated: 2024-10-01 08:30:05

标题: SpeedUpNet：用于加速文本到图像扩散模型的即插即用适配器网络

摘要: 文本到图像扩散模型（SD）在需要大量计算资源的同时取得了显著进展。现有的加速方法通常需要大量训练，并非普遍适用。LCM-LoRA可一次训练适用于各种模型，具有普遍性，但很少考虑在加速之前和之后生成内容的一致性。本文提出了SpeedUpNet（SUN），一种创新的加速模块，旨在解决普遍性和一致性方面的挑战。利用U-Net中的交叉注意力层在SD模型中的作用，我们引入了专门为这些层设计的适配器，量化了由于相对于正面提示而导致的图像生成偏移。这种学习到的偏移在一系列模型中表现稳定，增强了SUN的普遍性。为了提高输出一致性，我们提出了一个多步一致性（MSC）损失，该损失稳定了偏移并确保加速内容的忠实度。在SD v1.5上的实验表明，与基准的25步DPM-solver++相比，SUN使整体加速超过10倍，并提供两个额外优势：（1）无需训练即可集成到各种微调的稳态扩散模型中，以及（2）通过随机组合正面和负面提示指导加速前后生成的数据集的最新FID。代码可在以下链接找到：https://williechai.github.io/speedup-plugin-for-stable-diffusions.github.io.

更新时间: 2024-10-01 08:30:05

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2312.08887v4

Shaking Up VLMs: Comparing Transformers and Structured State Space Models for Vision & Language Modeling

This study explores replacing Transformers in Visual Language Models (VLMs) with Mamba, a recent structured state space model (SSM) that demonstrates promising performance in sequence modeling. We test models up to 3B parameters under controlled conditions, showing that Mamba-based VLMs outperforms Transformers-based VLMs in captioning, question answering, and reading comprehension. However, we find that Transformers achieve greater performance in visual grounding and the performance gap widens with scale. We explore two hypotheses to explain this phenomenon: 1) the effect of task-agnostic visual encoding on the updates of the hidden states, and 2) the difficulty in performing visual grounding from the perspective of in-context multimodal retrieval. Our results indicate that a task-aware encoding yields minimal performance gains on grounding, however, Transformers significantly outperform Mamba at in-context multimodal retrieval. Overall, Mamba shows promising performance on tasks where the correct output relies on a summary of the image but struggles when retrieval of explicit information from the context is required.

Updated: 2024-10-01 08:29:53

标题: 颠覆VLMs：比较Transformer和结构化状态空间模型在视觉与语言建模中的应用

摘要: 这项研究探讨了在视觉语言模型（VLMs）中用最近展示了在序列建模中表现出色的结构化状态空间模型（SSM）Mamba替代Transformers。我们在受控条件下测试了多达30亿参数的模型，结果显示基于Mamba的VLM在字幕、问答和阅读理解方面优于基于Transformers的VLM。然而，我们发现Transformers在视觉定位方面表现更好，且随着规模的扩大，性能差距也在扩大。我们探讨了两种假设来解释这一现象：1）任务不可知的视觉编码对隐藏状态的更新的影响，和2）从上下文多模式检索的视角进行视觉定位的困难。我们的结果表明，任务感知编码对于定位的性能提升微乎其微，然而，在上下文多模式检索方面，Transformers明显优于Mamba。总体而言，Mamba在依赖于图像摘要的正确输出的任务上表现出色，但在需要从上下文中检索显式信息时表现较差。

更新时间: 2024-10-01 08:29:53

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2409.05395v2

Individual mapping of large polymorphic shrubs in high mountains using satellite images and deep learning

Monitoring the distribution and size of long-living large shrubs, such as junipers, is crucial for assessing the long-term impacts of global change on high-mountain ecosystems. While deep learning models have shown remarkable success in object segmentation, adapting these models to detect shrub species with polymorphic nature remains challenging. In this research, we release a large dataset of individual shrub delineations on freely available satellite imagery and use an instance segmentation model to map all junipers over the treeline for an entire biosphere reserve (Sierra Nevada, Spain). To optimize performance, we introduced a novel dual data construction approach: using photo-interpreted (PI) data for model development and fieldwork (FW) data for validation. To account for the polymorphic nature of junipers during model evaluation, we developed a soft version of the Intersection over Union metric. Finally, we assessed the uncertainty of the resulting map in terms of canopy cover and density of shrubs per size class. Our model achieved an F1-score in shrub delineation of 87.87% on the PI data and 76.86% on the FW data. The R2 and RMSE of the observed versus predicted relationship were 0.63 and 6.67% for canopy cover, and 0.90 and 20.62 for shrub density. The greater density of larger shrubs in lower altitudes and smaller shrubs in higher altitudes observed in the model outputs was also present in the PI and FW data, suggesting an altitudinal uplift in the optimal performance of the species. This study demonstrates that deep learning applied on freely available high-resolution satellite imagery is useful to detect medium to large shrubs of high ecological value at the regional scale, which could be expanded to other high-mountains worldwide and to historical and forthcoming imagery.

Updated: 2024-10-01 08:25:14

标题: 利用卫星图像和深度学习技术对高山地区大型多态灌木进行个体化映射

摘要: 监测长寿命大灌木的分布和大小，例如刺柏，对评估全球变化对高山生态系统的长期影响至关重要。虽然深度学习模型在目标分割方面取得了显著成功，但将这些模型调整为检测具有多态性的灌木物种仍然具有挑战性。在这项研究中，我们发布了一个大型数据集，其中包含在免费提供的卫星影像上个体灌木的描绘，并使用实例分割模型来绘制西班牙内华达山脉（Sierra Nevada）整个生物圈保护区的树线上所有刺柏。为了优化性能，我们引入了一种新颖的双数据构建方法：使用用于模型开发的照片解译（PI）数据和用于验证的野外调查（FW）数据。为了在模型评估期间考虑刺柏的多态性，我们开发了交并比度量的软版本。最后，我们评估了地图的不确定性，以树冠覆盖率和每个大小类别的灌木密度为指标。我们的模型在PI数据上的灌木描绘F1分数为87.87％，在FW数据上为76.86％。观察与预测关系的R2和RMSE分别为树冠覆盖率0.63和6.67％，灌木密度为0.90和20.62。模型输出中观察到的低海拔地区较大灌木的更高密度和高海拔地区较小灌木的更高密度也存在于PI和FW数据中，表明物种的最佳性能在海拔上升。这项研究表明，在免费提供的高分辨率卫星影像上应用深度学习可用于检测具有高生态价值的中大型灌木在区域尺度上，这可以扩展到全球其他高山地区以及历史和即将到来的影像中。

更新时间: 2024-10-01 08:25:14

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2401.17985v2

Learning Adaptive Hydrodynamic Models Using Neural ODEs in Complex Conditions

Reinforcement learning-based quadruped robots excel across various terrains but still lack the ability to swim in water due to the complex underwater environment. This paper presents the development and evaluation of a data-driven hydrodynamic model for amphibious quadruped robots, aiming to enhance their adaptive capabilities in complex and dynamic underwater environments. The proposed model leverages Neural Ordinary Differential Equations (ODEs) combined with attention mechanisms to accurately process and interpret real-time sensor data. The model enables the quadruped robots to understand and predict complex environmental patterns, facilitating robust decision-making strategies. We harness real-time sensor data, capturing various environmental and internal state parameters to train and evaluate our model. A significant focus of our evaluation involves testing the quadruped robot's performance across different hydrodynamic conditions and assessing its capabilities at varying speeds and fluid dynamic conditions. The outcomes suggest that the model can effectively learn and adapt to varying conditions, enabling the prediction of force states and enhancing autonomous robotic behaviors in various practical scenarios.

Updated: 2024-10-01 08:18:36

标题: 在复杂条件下使用神经ODE学习自适应流体动力学模型

摘要: 基于强化学习的四足机器人在各种地形上表现出色，但由于复杂的水下环境，它们仍然缺乏在水中游泳的能力。本文介绍了一种为两栖四足机器人开发和评估的数据驱动流体动力学模型，旨在增强它们在复杂和动态的水下环境中的适应能力。所提出的模型利用神经常微分方程（ODEs）结合注意机制来准确处理和解释实时传感器数据。该模型使四足机器人能够理解和预测复杂的环境模式，促进稳健的决策策略。我们利用实时传感器数据，捕捉各种环境和内部状态参数，对我们的模型进行训练和评估。我们评估的一个重点是测试四足机器人在不同流体动力学条件下的性能，并评估其在不同速度和流体动态条件下的能力。结果表明，该模型能够有效学习和适应不同条件，使得能够预测力状态并增强在各种实际场景中的自主机器人行为。

更新时间: 2024-10-01 08:18:36

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2410.00490v1

Securing Voice Authentication Applications Against Targeted Data Poisoning

Deep neural network-based voice authentication systems are promising biometric verification techniques that uniquely identify biological characteristics to verify a user. However, they are particularly susceptible to targeted data poisoning attacks, where attackers replace legitimate users' utterances with their own. We propose an enhanced framework using realworld datasets considering realistic attack scenarios. The results show that the proposed approach is robust, providing accurate authentications even when only a small fraction (5% of the dataset) is poisoned.

Updated: 2024-10-01 08:16:52

标题: 保护语音认证应用免受针对性数据中毒的影响

摘要: 基于深度神经网络的语音认证系统是一种有前途的生物特征验证技术，可以唯一识别生物特征以验证用户身份。然而，它们特别容易受到有针对性的数据毒化攻击的影响，攻击者会用自己的声音替换合法用户的话语。我们提出了一个增强框架，使用真实世界的数据集考虑了现实攻击场景。结果显示，提出的方法是稳健的，即使只有数据集的一小部分（5%）被毒化，也能提供准确的认证。

更新时间: 2024-10-01 08:16:52

领域: cs.CR

下载: http://arxiv.org/abs/2406.17277v2

Stochastic Direct Search Method for Blind Resource Allocation

Motivated by programmatic advertising optimization, we consider the task of sequentially allocating budget across a set of resources. At every time step, a feasible allocation is chosen and only a corresponding random return is observed. The goal is to maximize the cumulative expected sum of returns. This is a realistic model for budget allocation across subdivisions of marketing campaigns, with the objective of maximizing the number of conversions. We study direct search (also known as pattern search) methods for linearly constrained and derivative-free optimization in the presence of noise, which apply in particular to sequential budget allocation. These algorithms, which do not rely on hierarchical partitioning of the resource space, are easy to implement; they respect the operational constraints of resource allocation by avoiding evaluation outside of the feasible domain; and they are also compatible with warm start by being (approximate) descent algorithms. However, they have not yet been analyzed from the perspective of cumulative regret. We show that direct search methods achieves finite regret in the deterministic and unconstrained case. In the presence of evaluation noise and linear constraints, we propose a simple extension of direct search that achieves a regret upper-bound of the order of $T^{2/3}$. We also propose an accelerated version of the algorithm, relying on repeated sequential testing, that significantly improves the practical behavior of the approach.

Updated: 2024-10-01 08:15:02

标题: "盲资源分配的随机直接搜索方法"

摘要: 受程序化广告优化的启发，我们考虑跨资源集合连续分配预算的任务。在每个时间步长，选择一个可行的分配方案，并仅观察到相应的随机回报。目标是最大化累积期望回报总和。这是一个现实的模型，用于跨营销活动细分的预算分配，其目标是最大化转化次数。我们研究了在线约束和无导数优化的直接搜索（也称为模式搜索）方法，在存在噪音的情况下适用于连续预算分配。这些算法不依赖资源空间的分层划分，易于实现；通过避免在可行域之外进行评估，尊重资源分配的操作约束；并且通过是（近似）下降算法，与热启动兼容。然而，它们尚未从累积遗憾的角度进行分析。我们证明直接搜索方法在确定性和无约束情况下实现有限遗憾。在评估噪音和线性约束存在的情况下，我们提出了直接搜索的简单扩展，实现了$T^{2/3}$数量级的遗憾上限。我们还提出了一种加速版本的算法，依赖于重复的顺序测试，显著改进了方法的实际行为。

更新时间: 2024-10-01 08:15:02

领域: cs.AI,math.ST,stat.TH

下载: http://arxiv.org/abs/2210.05222v2

MCGM: Mask Conditional Text-to-Image Generative Model

Recent advancements in generative models have revolutionized the field of artificial intelligence, enabling the creation of highly-realistic and detailed images. In this study, we propose a novel Mask Conditional Text-to-Image Generative Model (MCGM) that leverages the power of conditional diffusion models to generate pictures with specific poses. Our model builds upon the success of the Break-a-scene [1] model in generating new scenes using a single image with multiple subjects and incorporates a mask embedding injection that allows the conditioning of the generation process. By introducing this additional level of control, MCGM offers a flexible and intuitive approach for generating specific poses for one or more subjects learned from a single image, empowering users to influence the output based on their requirements. Through extensive experimentation and evaluation, we demonstrate the effectiveness of our proposed model in generating high-quality images that meet predefined mask conditions and improving the current Break-a-scene generative model.

Updated: 2024-10-01 08:13:47

标题: MCGM：掩码条件文本到图像生成模型

摘要: 最近生成模型的进展已经彻底改变了人工智能领域，使得能够创造高度逼真和详细的图像成为可能。在这项研究中，我们提出了一种新颖的Mask Conditional Text-to-Image 生成模型（MCGM），利用条件扩散模型的能力生成具有特定姿势的图片。我们的模型建立在 Break-a-scene [1] 模型成功生成新场景的基础上，使用单个图像中的多个主体，并引入了一个允许生成过程进行条件化的掩模嵌入注入。通过引入这一额外的控制层，MCGM 提供了一种灵活和直观的方法，用于从单个图像学习生成一个或多个主体的特定姿势，使用户能够根据自己的需求影响输出。通过广泛的实验和评估，我们展示了我们提出的模型在生成符合预定义掩模条件的高质量图像方面的有效性，并改进了当前的 Break-a-scene 生成模型。

更新时间: 2024-10-01 08:13:47

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2410.00483v1

Multi-Designated Detector Watermarking for Language Models

In this paper, we initiate the study of \emph{multi-designated detector watermarking (MDDW)} for large language models (LLMs). This technique allows model providers to generate watermarked outputs from LLMs with two key properties: (i) only specific, possibly multiple, designated detectors can identify the watermarks, and (ii) there is no perceptible degradation in the output quality for ordinary users. We formalize the security definitions for MDDW and present a framework for constructing MDDW for any LLM using multi-designated verifier signatures (MDVS). Recognizing the significant economic value of LLM outputs, we introduce claimability as an optional security feature for MDDW, enabling model providers to assert ownership of LLM outputs within designated-detector settings. To support claimable MDDW, we propose a generic transformation converting any MDVS to a claimable MDVS. Our implementation of the MDDW scheme highlights its advanced functionalities and flexibility over existing methods, with satisfactory performance metrics.

Updated: 2024-10-01 08:08:42

标题: 多指定检测器水印技术用于语言模型

摘要: 在本文中，我们开始研究大型语言模型（LLMs）的\emph{多指定检测器水印（MDDW）}。该技术允许模型提供者从LLMs生成带水印的输出，具有两个关键属性：（i）只有特定的、可能是多个的指定检测器能够识别水印，（ii）对于普通用户来说，输出质量没有可察觉的降级。我们为MDDW形式化了安全定义，并提出了使用多指定验证者签名（MDVS）构建任何LLM的MDDW的框架。鉴于LLM输出的重要经济价值，我们引入了claimability作为MDDW的可选安全特性，使模型提供者能够在指定检测器设置中主张对LLM输出的所有权。为了支持可主张的MDDW，我们提出了一种通用转换，将任何MDVS转换为可主张的MDVS。我们的MDDW方案实现突显了其对现有方法的先进功能和灵活性，具有令人满意的性能指标。

更新时间: 2024-10-01 08:08:42

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2409.17518v2

Removing the need for ground truth UWB data collection: self-supervised ranging error correction using deep reinforcement learning

Indoor positioning using UWB technology has gained interest due to its centimeter-level accuracy potential. However, multipath effects and non-line-of-sight conditions cause ranging errors between anchors and tags. Existing approaches for mitigating these ranging errors rely on collecting large labeled datasets, making them impractical for real-world deployments. This paper proposes a novel self-supervised deep reinforcement learning approach that does not require labeled ground truth data. A reinforcement learning agent uses the channel impulse response as a state and predicts corrections to minimize the error between corrected and estimated ranges. The agent learns, self-supervised, by iteratively improving corrections that are generated by combining the predictability of trajectories with filtering and smoothening. Experiments on real-world UWB measurements demonstrate comparable performance to state-of-the-art supervised methods, overcoming data dependency and lack of generalizability limitations. This makes self-supervised deep reinforcement learning a promising solution for practical and scalable UWB-ranging error correction.

Updated: 2024-10-01 08:05:23

标题: 摘要：利用深度强化学习进行自监督测距误差校正，消除超宽带数据采集中对地面真实数据的需求

摘要: 利用UWB技术进行室内定位因其具有厘米级精度潜力而备受关注。然而，多径效应和非视距条件导致锚点和标签之间的测距误差。现有的减轻这些测距误差的方法依赖于收集大量的标记数据集，使它们在实际部署中不切实际。本文提出了一种新颖的自监督深度强化学习方法，不需要标记的地面真实数据。一个强化学习代理使用通道脉冲响应作为状态，并预测校正以最小化校正和估计范围之间的误差。该代理通过结合轨迹的可预测性与过滤和平滑生成的校正来迭代改进学习，自监督学习。在真实世界的UWB测量实验中，表明与最先进的监督方法性能相当，克服了数据依赖性和缺乏普适性的限制。这使得自监督深度强化学习成为可行和可扩展的UWB测距误差校正的有希望的解决方案。

更新时间: 2024-10-01 08:05:23

领域: eess.SP,cs.LG

下载: http://arxiv.org/abs/2403.19262v2

A POD-TANN approach for the multiscale modeling of materials and macroelement derivation in geomechanics

This paper introduces a novel approach that combines Proper Orthogonal Decomposition (POD) with Thermodynamics-based Artificial Neural Networks (TANN) to capture the macroscopic behavior of complex inelastic systems and derive macroelements in geomechanics. The methodology leverages POD to extract macroscopic Internal State Variables from microscopic state information, thereby enriching the macroscopic state description used to train an energy potential network within the TANN framework. The thermodynamic consistency provided by TANN, combined with the hierarchical nature of POD, allows to reproduce complex, non-linear inelastic material behaviors as well as macroscopic geomechanical systems responses. The approach is validated through applications of increasing complexity, demonstrating its capability to reproduce high-fidelity simulation data. The applications proposed include the homogenization of continuous inelastic representative unit cells and the derivation of a macroelement for a geotechnical system involving a monopile in a clay layer subjected to horizontal loading. Eventually, the projection operators directly obtained via POD, are exploit to easily reconstruct the microscopic fields. The results indicate that the POD-TANN approach not only offers accuracy in reproducing the studied constitutive responses, but also reduces computational costs, making it a practical tool for the multiscale modeling of heterogeneous inelastic geomechanical systems.

Updated: 2024-10-01 07:52:54

标题: 一种用于材料多尺度建模和地质力学中宏观元素推导的POD-TANN方法

摘要: 本文介绍了一种结合Proper Orthogonal Decomposition（POD）与基于热力学的人工神经网络（TANN）的新方法，用于捕捉复杂非弹性系统的宏观行为并推导地质力学中的宏观元素。该方法利用POD从微观状态信息中提取宏观内部状态变量，从而丰富了用于训练TANN框架内能量势网络的宏观状态描述。TANN提供的热力学一致性，结合POD的分层性质，允许重现复杂、非线性的非弹性材料行为以及宏观地质力学系统的响应。该方法通过应用逐渐增加复杂性来进行验证，证明了其能够复现高保真度的仿真数据。提出的应用包括连续非弹性代表性单元的均质化以及对一个受水平载荷的粘土层中单桩的地质系统推导宏观元素。最终，通过POD直接获取的投影算子被利用来轻松重建微观场。结果表明，POD-TANN方法不仅在复现研究的本构响应方面提供了准确性，而且降低了计算成本，使其成为多尺度建模异质非弹性地质力学系统的实用工具。

更新时间: 2024-10-01 07:52:54

领域: cs.LG

下载: http://arxiv.org/abs/2408.07165v3

Segment-Anything Models Achieve Zero-shot Robustness in Autonomous Driving

Semantic segmentation is a significant perception task in autonomous driving. It suffers from the risks of adversarial examples. In the past few years, deep learning has gradually transitioned from convolutional neural network (CNN) models with a relatively small number of parameters to foundation models with a huge number of parameters. The segment-anything model (SAM) is a generalized image segmentation framework that is capable of handling various types of images and is able to recognize and segment arbitrary objects in an image without the need to train on a specific object. It is a unified model that can handle diverse downstream tasks, including semantic segmentation, object detection, and tracking. In the task of semantic segmentation for autonomous driving, it is significant to study the zero-shot adversarial robustness of SAM. Therefore, we deliver a systematic empirical study on the robustness of SAM without additional training. Based on the experimental results, the zero-shot adversarial robustness of the SAM under the black-box corruptions and white-box adversarial attacks is acceptable, even without the need for additional training. The finding of this study is insightful in that the gigantic model parameters and huge amounts of training data lead to the phenomenon of emergence, which builds a guarantee of adversarial robustness. SAM is a vision foundation model that can be regarded as an early prototype of an artificial general intelligence (AGI) pipeline. In such a pipeline, a unified model can handle diverse tasks. Therefore, this research not only inspects the impact of vision foundation models on safe autonomous driving but also provides a perspective on developing trustworthy AGI. The code is available at: https://github.com/momo1986/robust_sam_iv.

Updated: 2024-10-01 07:50:41

标题: Segment-Anything模型在自动驾驶中实现零-shot鲁棒性

摘要: 语义分割是自动驾驶中的重要感知任务。它受到对抗性示例风险的困扰。在过去几年中，深度学习逐渐从具有相对较少参数的卷积神经网络（CNN）模型过渡到具有大量参数的基础模型。段-任何模型（SAM）是一个通用的图像分割框架，能够处理各种类型的图像，并能够识别和分割图像中的任意对象，而无需在特定对象上进行训练。它是一个统一模型，可以处理多样的下游任务，包括语义分割、物体检测和跟踪。在自动驾驶的语义分割任务中，研究SAM的零样本对抗鲁棒性是非常重要的。因此，我们对SAM的鲁棒性进行了系统的实证研究，而无需额外的训练。根据实验结果，在黑盒污染和白盒对抗攻击下，SAM的零样本对抗鲁棒性是可接受的，甚至无需额外训练。本研究的发现具有启发性，即巨大的模型参数和庞大的训练数据导致出现现象，从而构建了对抗鲁棒性的保证。SAM是一个视觉基础模型，可以被视为人工通用智能（AGI）管道的早期原型。在这样的管道中，一个统一模型可以处理多样的任务。因此，这项研究不仅检验了视觉基础模型对安全自动驾驶的影响，还提供了发展可信AGI的视角。代码可在以下链接找到：https://github.com/momo1986/robust_sam_iv。

更新时间: 2024-10-01 07:50:41

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2408.09839v2

A TextGCN-Based Decoding Approach for Improving Remote Sensing Image Captioning

Remote sensing images are highly valued for their ability to address complex real-world issues such as risk management, security, and meteorology. However, manually captioning these images is challenging and requires specialized knowledge across various domains. This letter presents an approach for automatically describing (captioning) remote sensing images. We propose a novel encoder-decoder setup that deploys a Text Graph Convolutional Network (TextGCN) and multi-layer LSTMs. The embeddings generated by TextGCN enhance the decoder's understanding by capturing the semantic relationships among words at both the sentence and corpus levels. Furthermore, we advance our approach with a comparison-based beam search method to ensure fairness in the search strategy for generating the final caption. We present an extensive evaluation of our approach against various other state-of-the-art encoder-decoder frameworks. We evaluated our method across three datasets using seven metrics: BLEU-1 to BLEU-4, METEOR, ROUGE-L, and CIDEr. The results demonstrate that our approach significantly outperforms other state-of-the-art encoder-decoder methods.

Updated: 2024-10-01 07:46:04

标题: 一种基于TextGCN的解码方法用于改进遥感图像字幕生成

摘要: 遥感图像因其能够解决复杂的现实世界问题，如风险管理、安全和气象学，而备受重视。然而，手动为这些图像加上标题是具有挑战性的，需要跨越各个领域的专业知识。本文介绍了一种自动描述（加标题）遥感图像的方法。我们提出了一种新颖的编码器-解码器设置，其中使用了文本图卷积网络（TextGCN）和多层LSTM。由TextGCN生成的嵌入通过捕捉句子和语料库级别的单词之间的语义关系，增强了解码器的理解能力。此外，我们通过一种基于比较的波束搜索方法来进一步推进我们的方法，以确保在生成最终标题的搜索策略中公平性。我们通过与其他各种最先进的编码器-解码器框架进行广泛评估来展示我们的方法。我们使用BLEU-1到BLEU-4、METEOR、ROUGE-L和CIDEr等七个指标在三个数据集上评估了我们的方法。结果表明，我们的方法明显优于其他最先进的编码器-解码器方法。

更新时间: 2024-10-01 07:46:04

领域: cs.LG

下载: http://arxiv.org/abs/2409.18467v2

Transductive Active Learning: Theory and Applications

We generalize active learning to address real-world settings with concrete prediction targets where sampling is restricted to an accessible region of the domain, while prediction targets may lie outside this region. We analyze a family of decision rules that sample adaptively to minimize uncertainty about prediction targets. We are the first to show, under general regularity assumptions, that such decision rules converge uniformly to the smallest possible uncertainty obtainable from the accessible data. We demonstrate their strong sample efficiency in two key applications: Active few-shot fine-tuning of large neural networks and safe Bayesian optimization, where they improve significantly upon the state-of-the-art.

Updated: 2024-10-01 07:45:38

标题: 跨领域主动学习：理论与应用

摘要: 我们将主动学习推广到解决具体预测目标的实际情境，其中抽样受限于领域的可访问区域，而预测目标可能位于该区域之外。我们分析了一类决策规则，这些规则根据最小化关于预测目标的不确定性来自适应地进行抽样。我们首次在一般正则性假设下展示，这些决策规则会一致地收敛到从可访问数据中获得的最小可能不确定性。我们展示了它们在两个关键应用中的强大样本效率：大型神经网络的主动少样本微调和安全贝叶斯优化，在这些应用中它们明显优于目前的最新技术。

更新时间: 2024-10-01 07:45:38

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2402.15898v4

Model-based Preference Optimization in Abstractive Summarization without Human Feedback

In abstractive summarization, the challenge of producing concise and accurate summaries arises from the vast amount of information contained in the source document. Consequently, although Large Language Models (LLMs) can generate fluent text, they often introduce inaccuracies by hallucinating content not found in the original source. While supervised fine-tuning methods that maximize likelihood contribute to this issue, they do not consistently enhance the faithfulness of the summaries. Preference-based optimization methods, such as Direct Preference Optimization (DPO), can further refine the model to align with human preferences. However, these methods still heavily depend on costly human feedback. In this work, we introduce a novel and straightforward approach called Model-based Preference Optimization (MPO) to fine-tune LLMs for improved summarization abilities without any human feedback. By leveraging the model's inherent summarization capabilities, we create a preference dataset that is fully generated by the model using different decoding strategies. Our experiments on standard summarization datasets and various metrics demonstrate that our proposed MPO significantly enhances the quality of generated summaries without relying on human feedback.

Updated: 2024-10-01 07:29:06

标题: 基于模型的偏好优化在抽象总结中无需人类反馈

摘要: 在抽象总结中，产生简洁准确摘要的挑战源自源文档中包含的大量信息。因此，尽管大型语言模型（LLMs）可以生成流畅的文本，但它们经常通过产生原始来源中未发现的内容来引入不准确性。虽然最大化可能性的监督微调方法对这个问题有所帮助，但它们并不始终增强摘要的忠实度。基于偏好的优化方法，如直接偏好优化（DPO），可以进一步改进模型以与人类偏好保持一致。但是，这些方法仍然严重依赖于昂贵的人类反馈。在这项工作中，我们引入了一种新颖且简单的方法，称为基于模型的偏好优化（MPO），以微调LLMs以提高摘要能力，而无需任何人类反馈。通过利用模型固有的摘要能力，我们创建了一个偏好数据集，该数据集完全由模型使用不同的解码策略生成。我们在标准摘要数据集和各种指标上的实验表明，我们提出的MPO显著提高了生成摘要的质量，而不依赖于人类反馈。

更新时间: 2024-10-01 07:29:06

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2409.18618v2

Generalized Learning of Coefficients in Spectral Graph Convolutional Networks

Spectral Graph Convolutional Networks (GCNs) have gained popularity in graph machine learning applications due, in part, to their flexibility in specification of network propagation rules. These propagation rules are often constructed as polynomial filters whose coefficients are learned using label information during training. In contrast to learned polynomial filters, explicit filter functions are useful in capturing relationships between network topology and distribution of labels across the network. A number of algorithms incorporating either approach have been proposed; however the relationship between filter functions and polynomial approximations is not fully resolved. This is largely due to the ill-conditioned nature of the linear systems that must be solved to derive polynomial approximations of filter functions. To address this challenge, we propose a novel Arnoldi orthonormalization-based algorithm, along with a unifying approach, called G-Arnoldi-GCN that can efficiently and effectively approximate a given filter function with a polynomial. We evaluate G-Arnoldi-GCN in the context of multi-class node classification across ten datasets with diverse topological characteristics. Our experiments show that G-Arnoldi-GCN consistently outperforms state-of-the-art methods when suitable filter functions are employed. Overall, G-Arnoldi-GCN opens important new directions in graph machine learning by enabling the explicit design and application of diverse filter functions. Code link: https://github.com/mustafaCoskunAgu/GArnoldi-GCN

Updated: 2024-10-01 07:28:39

标题: 谱图卷积网络中系数的广义学习

摘要: 谱图卷积网络（GCNs）在图机器学习应用中变得越来越受欢迎，部分原因是它们在指定网络传播规则方面的灵活性。这些传播规则通常被构建为多项式滤波器，其系数在训练过程中使用标签信息进行学习。与学习的多项式滤波器相比，显式滤波函数对捕捉网络拓扑和标签在网络中的分布之间的关系很有用。已经提出了许多结合这两种方法的算法；然而，滤波函数和多项式逼近之间的关系尚未完全解决。这在很大程度上是由于线性系统的奇异性质，必须通过解决这些系统来推导滤波函数的多项式逼近。为了解决这一挑战，我们提出了一种基于Arnoldi正交化的新算法，以及一种统一的方法，称为G-Arnoldi-GCN，可以高效有效地近似给定的滤波函数为多项式。我们在多类节点分类的背景下评估了G-Arnoldi-GCN，在具有不同拓扑特征的十个数据集上进行了实验。我们的实验表明，当适当的滤波函数被使用时，G-Arnoldi-GCN始终优于最先进的方法。总的来说，G-Arnoldi-GCN通过实现多样化滤波函数的显式设计和应用，为图机器学习开辟了重要的新方向。代码链接：https://github.com/mustafaCoskunAgu/GArnoldi-GCN

更新时间: 2024-10-01 07:28:39

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2409.04813v2

A Scheduling-Aware Defense Against Prefetching-Based Side-Channel Attacks

Modern computer processors use microarchitectural optimization mechanisms to improve performance. As a downside, such optimizations are prone to introducing side-channel vulnerabilities. Speculative loading of memory, called prefetching, is common in real-world CPUs and may cause such side-channel vulnerabilities: Prior work has shown that it can be exploited to bypass process isolation and leak secrets, such as keys used in RSA, AES, and ECDH implementations. However, to this date, no effective and efficient countermeasure has been presented that secures software on systems with affected prefetchers. In this work, we answer the question: How can a process defend against prefetch-based side channels? We first systematize prefetching-based side-channel vulnerabilities presented in academic literature so far. Next, we design and implement PreFence, a scheduling-aware defense against these side channels that allows processes to disable the prefetcher temporarily during security-critical operations. We implement our countermeasure for an x86_64 and an ARM processor; it can be adapted to any platform that allows to disable the prefetcher. We evaluate our defense and find that our solution reliably stops prefetch leakage. Our countermeasure causes negligible performance impact while no security-relevant code is executed, and its worst case performance is comparable to completely turning off the prefetcher. The expected average performance impact depends on the security-relevant code in the application and can be negligible as we demonstrate with a simple web server application. We expect our countermeasure could widely be integrated in commodity OS, and even be extended to signal generally security-relevant code to the kernel to allow coordinated application of countermeasures.

Updated: 2024-10-01 07:12:23

标题: 一种针对基于预取的侧信道攻击的调度感知防御措施

摘要: 现代计算机处理器使用微架构优化机制来提高性能。然而，这种优化容易引入侧信道漏洞。内存的预取，也称为预取，是现实世界中的CPU中常见的操作，可能会导致这种侧信道漏洞：先前的研究表明，可以利用它来绕过进程隔离并泄漏诸如RSA、AES和ECDH实现中使用的密钥等机密信息。然而，到目前为止，尚未提出有效和高效的对策来保护受影响的预取器系统上的软件。在这项工作中，我们回答了一个问题：进程如何抵御基于预取的侧信道？我们首先系统化了学术文献中迄今为止提出的基于预取的侧信道漏洞。接下来，我们设计并实现了PreFence，这是一种针对这些侧信道的调度感知防御措施，允许进程在安全关键操作期间暂时禁用预取器。我们为x86_64和ARM处理器实现了我们的对策；它可以适应任何允许禁用预取器的平台。我们评估了我们的防御措施，并发现我们的解决方案可靠地阻止了预取泄漏。我们的对策在不执行安全相关代码时几乎没有性能影响，其最坏情况性能与完全关闭预取器相当。预期的平均性能影响取决于应用程序中的安全相关代码，并且可以忽略不计，正如我们在一个简单的Web服务器应用程序中演示的那样。我们期望我们的对策可以广泛集成到普通操作系统中，甚至可以扩展到向内核发出信号以允许协调应用对策的普遍安全相关代码。

更新时间: 2024-10-01 07:12:23

领域: cs.CR

下载: http://arxiv.org/abs/2410.00452v1

UniSumEval: Towards Unified, Fine-Grained, Multi-Dimensional Summarization Evaluation for LLMs

Existing benchmarks for summarization quality evaluation often lack diverse input scenarios, focus on narrowly defined dimensions (e.g., faithfulness), and struggle with subjective and coarse-grained annotation schemes. To address these shortcomings, we create UniSumEval benchmark, which extends the range of input context (e.g., domain, length) and provides fine-grained, multi-dimensional annotations. We use AI assistance in data creation, identifying potentially hallucinogenic input texts, and also helping human annotators reduce the difficulty of fine-grained annotation tasks. With UniSumEval, we benchmark nine latest language models as summarizers, offering insights into their performance across varying input contexts and evaluation dimensions. Furthermore, we conduct a thorough comparison of SOTA automated summary evaluators. Our benchmark data will be available at https://github.com/DISL-Lab/UniSumEval-v1.0.

Updated: 2024-10-01 07:11:44

标题: UniSumEval: 为LLMs实现统一、细粒度、多维度摘要评估

摘要: 现有的用于总结质量评估的基准往往缺乏多样化的输入场景，侧重于狭义定义的维度（例如，忠实度），并且难以处理主观和粗粒度的注释方案。为了解决这些缺点，我们创建了UniSumEval基准，扩展了输入上下文的范围（例如，领域，长度），并提供了细粒度的、多维度的注释。我们在数据创建中使用人工智能辅助，识别潜在的臆想性输入文本，并帮助人类注释者减少细粒度注释任务的难度。通过UniSumEval，我们将九个最新的语言模型作为总结器进行基准测试，提供它们在不同输入上下文和评估维度下的表现见解。此外，我们进行了对SOTA自动摘要评估器的彻底比较。我们的基准数据将在https://github.com/DISL-Lab/UniSumEval-v1.0上提供。

更新时间: 2024-10-01 07:11:44

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2409.19898v2

Recursive deep learning framework for forecasting the decadal world economic outlook

The gross domestic product (GDP) is the most widely used indicator in macroeconomics and the main tool for measuring a country's economic output. Due to the diversity and complexity of the world economy, a wide range of models have been used, but there are challenges in making decadal GDP forecasts given unexpected changes such as emergence of catastrophic world events including pandemics and wars. Deep learning models are well suited for modelling temporal sequences and time series forecasting. In this paper, we develop a deep learning framework to forecast the GDP growth rate of the world economy over a decade. We use the Penn World Table as the data source featuring 13 countries prior to the COVID-19 pandemic, such as Australia, China, India, and the United States. We present a recursive deep learning framework to predict the GDP growth rate in the next ten years. We test prominent deep learning models and compare their results with traditional econometric models for selected developed and developing countries. Our decadal forecasts reveal that that most of the developed countries would experience economic growth slowdown, stagnation and even recession within five years (2020-2024). Furthermore, our model forecasts show that only China, France, and India would experience stable GDP growth.

Updated: 2024-10-01 07:10:39

标题: 递归深度学习框架用于预测十年世界经济展望

摘要: 国内生产总值（GDP）是宏观经济学中最广泛使用的指标，也是衡量一个国家经济产出的主要工具。由于世界经济的多样性和复杂性，使用了各种模型，但在进行十年GDP预测时存在挑战，因为出现了意想不到的变化，如灾难性的世界事件，包括大流行和战争。深度学习模型非常适合对时间序列和时间序列进行建模。在本文中，我们开发了一个深度学习框架，用于预测未来十年世界经济的GDP增长率。我们使用宾夕法尼亚世界表作为数据来源，涵盖了COVID-19大流行前的13个国家，如澳大利亚、中国、印度和美国。我们提出了一个递归深度学习框架，预测未来十年的GDP增长率。我们测试了知名的深度学习模型，并将它们的结果与传统的计量经济模型进行比较，针对选定的发达国家和发展中国家。我们的十年预测显示，大多数发达国家在未来五年（2020-2024）将经历经济增长放缓、停滞甚至衰退。此外，我们的模型预测显示，只有中国、法国和印度将经历稳定的GDP增长。

更新时间: 2024-10-01 07:10:39

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2301.10874v2

Obliviate: Neutralizing Task-agnostic Backdoors within the Parameter-efficient Fine-tuning Paradigm

Parameter-efficient fine-tuning (PEFT) has become a key training strategy for large language models. However, its reliance on fewer trainable parameters poses security risks, such as task-agnostic backdoors. Despite their severe impact on a wide range of tasks, there is no practical defense solution available that effectively counters task-agnostic backdoors within the context of PEFT. In this study, we introduce Obliviate, a PEFT-integrable backdoor defense. We develop two techniques aimed at amplifying benign neurons within PEFT layers and penalizing the influence of trigger tokens. Our evaluations across three major PEFT architectures show that our method can significantly reduce the attack success rate of the state-of-the-art task-agnostic backdoors (83.6%$\downarrow$). Furthermore, our method exhibits robust defense capabilities against both task-specific backdoors and adaptive attacks. Source code will be obtained at https://github.com/obliviateARR/Obliviate.

Updated: 2024-10-01 07:10:02

标题: Obliviate: 在参数高效微调范式中中和与任务无关的后门

摘要: 参数高效微调（PEFT）已经成为大型语言模型的关键训练策略。然而，它对较少可训练参数的依赖存在安全风险，例如任务不可知的后门。尽管后门对各种任务都造成严重影响，但目前还没有实际有效的防御解决方案可以有效地对抗PEFT背景下的任务不可知后门。在这项研究中，我们介绍了一种名为Obliviate的PEFT可整合后门防御方法。我们开发了两种技术，旨在增强PEFT层内的良性神经元，并惩罚触发令牌的影响。我们在三种主要的PEFT架构上进行评估，结果显示我们的方法可以显著降低最先进的任务不可知后门的攻击成功率（83.6%下降）。此外，我们的方法表现出对任务特定后门和自适应攻击的强大防御能力。源代码可在https://github.com/obliviateARR/Obliviate获取。

更新时间: 2024-10-01 07:10:02

领域: cs.CL,cs.AI,cs.CR,cs.LG

下载: http://arxiv.org/abs/2409.14119v2

Unsupervised Concept Drift Detection based on Parallel Activations of Neural Network

Practical applications of artificial intelligence increasingly often have to deal with the streaming properties of real data, which, considering the time factor, are subject to phenomena such as periodicity and more or less chaotic degeneration - resulting directly in the concept drifts. The modern concept drift detectors almost always assume immediate access to labels, which due to their cost, limited availability and possible delay has been shown to be unrealistic. This work proposes an unsupervised Parallel Activations Drift Detector, utilizing the outputs of an untrained neural network, presenting its key design elements, intuitions about processing properties, and a pool of computer experiments demonstrating its competitiveness with state-of-the-art methods.

Updated: 2024-10-01 07:04:55

标题: 基于神经网络并行激活的无监督概念漂移检测

摘要: 人工智能的实际应用越来越频繁地需要处理真实数据的流式特性，考虑到时间因素，这些数据受到周期性和更或多或少混乱退化等现象的影响，直接导致概念漂移。现代概念漂移检测器几乎总是假设可以立即访问标签，但由于其成本、有限的可用性和可能的延迟，这被证明是不现实的。本研究提出了一种无监督的并行激活漂移检测器，利用未经训练的神经网络的输出，介绍了其关键设计元素、处理属性的直觉，以及一系列计算机实验，展示了它与最先进方法的竞争力。

更新时间: 2024-10-01 07:04:55

领域: cs.LG

下载: http://arxiv.org/abs/2404.07776v2

Identifiable Shared Component Analysis of Unpaired Multimodal Mixtures

A core task in multi-modal learning is to integrate information from multiple feature spaces (e.g., text and audio), offering modality-invariant essential representations of data. Recent research showed that, classical tools such as {\it canonical correlation analysis} (CCA) provably identify the shared components up to minor ambiguities, when samples in each modality are generated from a linear mixture of shared and private components. Such identifiability results were obtained under the condition that the cross-modality samples are aligned/paired according to their shared information. This work takes a step further, investigating shared component identifiability from multi-modal linear mixtures where cross-modality samples are unaligned. A distribution divergence minimization-based loss is proposed, under which a suite of sufficient conditions ensuring identifiability of the shared components are derived. Our conditions are based on cross-modality distribution discrepancy characterization and density-preserving transform removal, which are much milder than existing studies relying on independent component analysis. More relaxed conditions are also provided via adding reasonable structural constraints, motivated by available side information in various applications. The identifiability claims are thoroughly validated using synthetic and real-world data.

Updated: 2024-10-01 07:04:04

标题: 无对应翻译，可以尝试：无配对多模态混合物的可识别共享成分分析

摘要: 多模态学习中的核心任务是整合来自多个特征空间的信息（例如文本和音频），提供数据的模态不变的基本表示。最近的研究表明，经典工具如典型相关分析（CCA）可以确定共享组件，尽管存在一些小的模糊性，当每个模态中的样本是从共享和私有组件的线性混合生成时。这种可识别性的结果是在交叉模态样本根据它们的共享信息对齐/配对的条件下获得的。这项工作更进一步，研究了跨模态样本未对齐的多模态线性混合中的共享组件可识别性。提出了基于分布差异最小化的损失，根据这个损失提出了一系列确保共享组件可识别性的充分条件。我们的条件基于跨模态分布差异的表征和保密变换的移除，这比现有依赖于独立成分分析的研究要温和得多。通过添加合理的结构约束，还提供了更宽松的条件，这些约束受到各种应用中可用的侧面信息的启发。通过使用合成和真实世界数据对可识别性要求进行了彻底验证。

更新时间: 2024-10-01 07:04:04

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2409.19422v2

Redefining Data Pairing for Motion Retargeting Leveraging a Human Body Prior

We propose MR HuBo(Motion Retargeting leveraging a HUman BOdy prior), a cost-effective and convenient method to collect high-quality upper body paired <robot, human> pose data, which is essential for data-driven motion retargeting methods. Unlike existing approaches which collect <robot, human> pose data by converting human MoCap poses into robot poses, our method goes in reverse. We first sample diverse random robot poses, and then convert them into human poses. However, since random robot poses can result in extreme and infeasible human poses, we propose an additional technique to sort out extreme poses by exploiting a human body prior trained from a large amount of human pose data. Our data collection method can be used for any humanoid robots, if one designs or optimizes the system's hyperparameters which include a size scale factor and the joint angle ranges for sampling. In addition to this data collection method, we also present a two-stage motion retargeting neural network that can be trained via supervised learning on a large amount of paired data. Compared to other learning-based methods trained via unsupervised learning, we found that our deep neural network trained with ample high-quality paired data achieved notable performance. Our experiments also show that our data filtering method yields better retargeting results than training the model with raw and noisy data. Our code and video results are available on https://sites.google.com/view/mr-hubo/

Updated: 2024-10-01 06:42:29

标题: 重新定义数据配对以利用人体先验的动作重新定位

摘要: 我们提出了MR HuBo（利用人体先验知识进行动作重定向）的方法，这是一种经济高效和方便的方法，用于收集高质量的上半身配对<机器人，人类>姿势数据，这对于基于数据的动作重定向方法至关重要。与现有方法不同，它们通过将人体MoCap姿势转换为机器人姿势来收集<机器人，人类>姿势数据，我们的方法则相反。我们首先采样多样化的随机机器人姿势，然后将它们转换为人类姿势。然而，由于随机机器人姿势可能导致极端和不可行的人类姿势，我们提出了一种额外的技术，通过利用从大量人体姿势数据训练出的人体先验知识来筛选极端姿势。我们的数据收集方法可用于任何人形机器人，如果设计或优化系统的超参数，包括尺度因子和用于采样的关节角范围。除了这种数据收集方法，我们还提出了一个两阶段的动作重定向神经网络，可以通过在大量配对数据上进行监督学习来训练。与通过无监督学习训练的其他学习方法相比，我们发现，通过充足高质量的配对数据训练的深度神经网络表现出显著的性能。我们的实验还表明，我们的数据过滤方法比使用原始和嘈杂数据训练模型产生更好的重定向结果。我们的代码和视频结果可在https://sites.google.com/view/mr-hubo/上找到。

更新时间: 2024-10-01 06:42:29

领域: cs.RO,cs.AI,cs.LG

下载: http://arxiv.org/abs/2409.13208v3

ReXplain: Translating Radiology into Patient-Friendly Video Reports

Radiology reports often remain incomprehensible to patients, undermining patient-centered care. We present ReXplain (Radiology eXplanation), an innovative AI-driven system that generates patient-friendly video reports for radiology findings. ReXplain uniquely integrates a large language model for text simplification, an image segmentation model for anatomical region identification, and an avatar generation tool, producing comprehensive explanations with plain language, highlighted imagery, and 3D organ renderings. Our proof-of-concept study with five board-certified radiologists indicates that ReXplain could accurately deliver radiological information and effectively simulate one-on-one consultations. This work demonstrates a new paradigm in AI-assisted medical communication, potentially improving patient engagement and satisfaction in radiology care, and opens new avenues for research in multimodal medical communication.

Updated: 2024-10-01 06:41:18

标题: ReXplain：将放射学翻译成病人友好的视频报告

摘要: 放射学报告通常对患者来说难以理解，削弱了以患者为中心的护理。我们提出了ReXplain（Radiology eXplanation），这是一个创新的人工智能驱动系统，用于生成放射学检查结果的患者友好视频报告。ReXplain独特地整合了一个大型语言模型用于文本简化，一个图像分割模型用于解剖区域识别，以及一个化身生成工具，生成具有简单语言、突出图像和三维器官渲染的全面解释。我们与五名获得认证的放射科医师进行的概念验证研究表明，ReXplain能够准确传递放射学信息，并有效模拟一对一咨询。这项工作展示了AI辅助医疗沟通中的新范式，可能提高放射学护理中患者参与和满意度，并为多模态医疗沟通研究开辟了新途径。

更新时间: 2024-10-01 06:41:18

领域: cs.AI,eess.IV

下载: http://arxiv.org/abs/2410.00441v1

Identifying Knowledge Editing Types in Large Language Models

Knowledge editing has emerged as an efficient technology for updating the knowledge of large language models (LLMs), attracting increasing attention in recent years. However, there is a lack of effective measures to prevent the malicious misuse of this technology, which could lead to harmful edits in LLMs. These malicious modifications could cause LLMs to generate toxic content, misleading users into inappropriate actions. In front of this risk, we introduce a new task, Knowledge Editing Type Identification (KETI), aimed at identifying different types of edits in LLMs, thereby providing timely alerts to users when encountering illicit edits. As part of this task, we propose KETIBench, which includes five types of harmful edits covering most popular toxic types, as well as one benign factual edit. We develop four classical classification models and three BERT-based models as baseline identifiers for both open-source and closed-source LLMs. Our experimental results, across 42 trials involving two models and three knowledge editing methods, demonstrate that all seven baseline identifiers achieve decent identification performance, highlighting the feasibility of identifying malicious edits in LLMs. Additional analyses reveal that the performance of the identifiers is independent of the reliability of the knowledge editing methods and exhibits cross-domain generalization, enabling the identification of edits from unknown sources. All data and code are available in https://github.com/xpq-tech/KETI. Warning: This paper contains examples of toxic text.

Updated: 2024-10-01 06:35:24

标题: 在大型语言模型中识别知识编辑类型

摘要: 知识编辑已经成为更新大型语言模型（LLMs）知识的有效技术，近年来吸引了越来越多的关注。然而，目前缺乏有效措施来防止这项技术的恶意滥用，这可能导致LLMs中的有害编辑。这些恶意修改可能导致LLMs生成有毒内容，误导用户采取不当行为。面对这一风险，我们引入了一个新的任务，即知识编辑类型识别（KETI），旨在识别LLMs中不同类型的编辑，从而在遇到非法编辑时向用户提供及时警报。作为这一任务的一部分，我们提出了KETIBench，其中包括五种涵盖大多数流行有害类型的有害编辑类型，以及一种良性事实编辑。我们为开源和闭源LLMs开发了四种经典分类模型和三种基于BERT的模型作为基线标识符。我们的实验结果，涉及两种模型和三种知识编辑方法的42次试验，表明所有七个基线标识符均取得了不错的识别性能，突出了在LLMs中识别恶意编辑的可行性。额外的分析显示，标识符的性能与知识编辑方法的可靠性无关，并展示了跨领域的泛化能力，使其能够识别来自未知来源的编辑。所有数据和代码均可在https://github.com/xpq-tech/KETI 上获得。警告：本文包含有害文本示例。

更新时间: 2024-10-01 06:35:24

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2409.19663v2

Towards Democratization of Subspeciality Medical Expertise

The scarcity of subspecialist medical expertise, particularly in rare, complex and life-threatening diseases, poses a significant challenge for healthcare delivery. This issue is particularly acute in cardiology where timely, accurate management determines outcomes. We explored the potential of AMIE (Articulate Medical Intelligence Explorer), a large language model (LLM)-based experimental AI system optimized for diagnostic dialogue, to potentially augment and support clinical decision-making in this challenging context. We curated a real-world dataset of 204 complex cases from a subspecialist cardiology practice, including results for electrocardiograms, echocardiograms, cardiac MRI, genetic tests, and cardiopulmonary stress tests. We developed a ten-domain evaluation rubric used by subspecialists to evaluate the quality of diagnosis and clinical management plans produced by general cardiologists or AMIE, the latter enhanced with web-search and self-critique capabilities. AMIE was rated superior to general cardiologists for 5 of the 10 domains (with preference ranging from 9% to 20%), and equivalent for the rest. Access to AMIE's response improved cardiologists' overall response quality in 63.7% of cases while lowering quality in just 3.4%. Cardiologists' responses with access to AMIE were superior to cardiologist responses without access to AMIE for all 10 domains. Qualitative examinations suggest AMIE and general cardiologist could complement each other, with AMIE thorough and sensitive, while general cardiologist concise and specific. Overall, our results suggest that specialized medical LLMs have the potential to augment general cardiologists' capabilities by bridging gaps in subspecialty expertise, though further research and validation are essential for wide clinical utility.

Updated: 2024-10-01 06:34:31

标题: 走向亚专业医学专长的民主化

摘要: 稀缺的专科医学专业知识，特别是在罕见、复杂和危及生命的疾病中，给卫生保健提供带来了重大挑战。这个问题在心脏病学领域尤为严重，及时准确的管理决定了结果。我们探索了AMIE（Articulate Medical Intelligence Explorer）的潜力，这是一个基于大型语言模型（LLM）的实验性人工智能系统，针对诊断对话进行了优化，以潜在地增强和支持这一具有挑战性的临床决策环境。我们从专科心脏病学实践中策划了一个包含204个复杂病例的真实数据集，其中包括心电图、超声心动图、心脏核磁共振、基因检测和心肺压力测试的结果。我们制定了一个由专科医生使用的十个领域评估标准，用于评估一般心脏病学家或AMIE产生的诊断和临床管理计划的质量，后者具有网络搜索和自我批评能力。在10个领域中，AMIE在5个领域上被评为优于一般心脏病学家（偏好范围为9%至20%），在其余领域上被评为等同。使用AMIE的回应在63.7%的病例中提高了心脏病学家的整体响应质量，而降低了仅有3.4%的质量。与没有使用AMIE的心脏病学家相比，拥有AMIE的心脏病学家的回应在所有10个领域上都更优秀。定性研究表明，AMIE和一般心脏病学家可以相互补充，AMIE全面而敏感，而一般心脏病学家简明而具体。总的来说，我们的结果表明，专业的医学LLM有潜力通过弥合专科医学知识的差距来增强一般心脏病学家的能力，尽管进一步的研究和验证对于广泛的临床应用至关重要。

更新时间: 2024-10-01 06:34:31

领域: cs.HC,cs.AI

下载: http://arxiv.org/abs/2410.03741v1

PrivTuner with Homomorphic Encryption and LoRA: A P3EFT Scheme for Privacy-Preserving Parameter-Efficient Fine-Tuning of AI Foundation Models

AI foundation models have recently demonstrated impressive capabilities across a wide range of tasks. Fine-tuning (FT) is a method of customizing a pre-trained AI foundation model by further training it on a smaller, targeted dataset. In this paper, we initiate the study of the Privacy-Preserving Parameter-Efficient FT (P3EFT) framework, which can be viewed as the intersection of Parameter-Efficient FT (PEFT) and Privacy-Preserving FT (PPFT). PEFT modifies only a small subset of the model's parameters to achieve FT (i.e., adapting a pre-trained model to a specific dataset), while PPFT uses privacy-preserving technologies to protect the confidentiality of the model during the FT process. There have been many studies on PEFT or PPFT but very few on their fusion, which motivates our work on P3EFT to achieve both parameter efficiency and model privacy. To exemplify our P3EFT, we present the PrivTuner scheme, which incorporates Fully Homomorphic Encryption (FHE) enabled privacy protection into LoRA (short for ``Low-Rank Adapter''). Intuitively speaking, PrivTuner allows the model owner and the external data owners to collaboratively implement PEFT with encrypted data. After describing PrivTuner in detail, we further investigate its energy consumption and privacy protection. Then, we consider a PrivTuner system over wireless communications and formulate a joint optimization problem to adaptively minimize energy while maximizing privacy protection, with the optimization variables including FDMA bandwidth allocation, wireless transmission power, computational resource allocation, and privacy protection. A resource allocation algorithm is devised to solve the problem. Experiments demonstrate that our algorithm can significantly reduce energy consumption while adapting to different privacy requirements.

Updated: 2024-10-01 06:30:06

标题: PrivTuner: 采用同态加密和LoRA技术的隐私保护参数高效调整AI基础模型的P3EFT方案

摘要: 人工智能基础模型最近展示出在各种任务中令人印象深刻的能力。微调（FT）是一种通过在较小的、针对性的数据集上进一步训练来定制预训练的人工智能基础模型的方法。本文首次研究了保护隐私的参数高效微调（P3EFT）框架，可以看作是参数高效微调（PEFT）和隐私保护微调（PPFT）的交集。PEFT仅修改模型的参数的一个小子集以实现微调（即，将预训练模型调整到特定数据集），而PPFT使用隐私保护技术在微调过程中保护模型的机密性。关于PEFT或PPFT已经有很多研究，但很少有关于它们融合的研究，这促使我们进行P3EFT工作，以实现参数效率和模型隐私。为了举例说明我们的P3EFT，我们提出了PrivTuner方案，它将完全同态加密（FHE）功能集成到LoRA（简称“低秩适配器”）中以实现隐私保护。直观地说，PrivTuner允许模型所有者和外部数据所有者共同使用加密数据实施PEFT。在详细描述PrivTuner后，我们进一步研究了其能量消耗和隐私保护。然后，我们考虑了无线通信上的PrivTuner系统，并制定了一个联合优化问题，以自适应地最小化能量同时最大化隐私保护，优化变量包括FDMA带宽分配、无线传输功率、计算资源分配和隐私保护。我们设计了一个资源分配算法来解决这个问题。实验表明我们的算法可以显著减少能量消耗，并适应不同的隐私要求。

更新时间: 2024-10-01 06:30:06

领域: cs.CR

下载: http://arxiv.org/abs/2410.00433v1

Unsupervised Meta-Learning via In-Context Learning

Unsupervised meta-learning aims to learn feature representations from unsupervised datasets that can transfer to downstream tasks with limited labeled data. In this paper, we propose a novel approach to unsupervised meta-learning that leverages the generalization abilities of in-context learning observed in transformer architectures. Our method reframes meta-learning as a sequence modeling problem, enabling the transformer encoder to learn task context from support images and utilize it to predict query images. At the core of our approach lies the creation of diverse tasks generated using a combination of data augmentations and a mixing strategy that challenges the model during training while fostering generalization to unseen tasks at test time. Experimental results on benchmark datasets showcase the superiority of our approach over existing unsupervised meta-learning baselines, establishing it as the new state-of-the-art in the field. Remarkably, our method achieves competitive results with supervised and self-supervised approaches, underscoring the efficacy of the model in leveraging generalization over memorization.

Updated: 2024-10-01 06:29:08

标题: 无监督元学习通过上下文学习

摘要: 无监督元学习旨在从无监督数据集中学习特征表示，可以将其转移到具有有限标记数据的下游任务中。在本文中，我们提出了一种新颖的无监督元学习方法，利用了在Transformer架构中观察到的上下文学习的泛化能力。我们的方法将元学习重新构建为一个序列建模问题，使Transformer编码器能够从支持图像中学习任务上下文，并利用它来预测查询图像。我们方法的核心在于通过数据增强和混合策略生成多样化的任务，挑战模型在训练过程中，同时促进对未知任务的泛化。在基准数据集上的实验结果展示了我们方法优于现有无监督元学习基线的优越性，将其确立为该领域的新领先技术。值得注意的是，我们的方法在竞争结果方面与监督和自监督方法相当，强调了该模型利用泛化而不是记忆的有效性。

更新时间: 2024-10-01 06:29:08

领域: cs.LG

下载: http://arxiv.org/abs/2405.16124v2

Scalable Multi-Task Transfer Learning for Molecular Property Prediction

Molecules have a number of distinct properties whose importance and application vary. Often, in reality, labels for some properties are hard to achieve despite their practical importance. A common solution to such data scarcity is to use models of good generalization with transfer learning. This involves domain experts for designing source and target tasks whose features are shared. However, this approach has limitations: i). Difficulty in accurate design of source-target task pairs due to the large number of tasks, and ii). corresponding computational burden verifying many trials and errors of transfer learning design, thereby iii). constraining the potential of foundation modeling of multi-task molecular property prediction. We address the limitations of the manual design of transfer learning via data-driven bi-level optimization. The proposed method enables scalable multi-task transfer learning for molecular property prediction by automatically obtaining the optimal transfer ratios. Empirically, the proposed method improved the prediction performance of 40 molecular properties and accelerated training convergence.

Updated: 2024-10-01 06:28:14

标题: 可扩展的多任务迁移学习用于分子性质预测

摘要: 分子具有许多不同的性质，其重要性和应用各不相同。通常情况下，尽管这些性质在实践中很重要，但有些性质的标签很难获取。在面对这种数据稀缺情况时，常见的解决方案是使用具有迁移学习的良好泛化能力的模型。这涉及到领域专家设计共享特征的源任务和目标任务。然而，这种方法存在一些局限性：一是由于任务数量庞大，很难准确设计源-目标任务对；二是验证许多迁移学习设计的试验和错误会导致计算负担增加；从而限制了多任务分子性质预测基础建模的潜力。我们通过数据驱动的双层优化解决了迁移学习手动设计的局限性。所提出的方法通过自动获取最佳迁移比例，实现了可扩展的多任务迁移学习，用于分子性质预测。从经验上看，所提出的方法提高了40种分子性质的预测性能，并加快了训练收敛速度。

更新时间: 2024-10-01 06:28:14

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.00432v1

ManiSkill3: GPU Parallelized Robotics Simulation and Rendering for Generalizable Embodied AI

Simulation has enabled unprecedented compute-scalable approaches to robot learning. However, many existing simulation frameworks typically support a narrow range of scenes/tasks and lack features critical for scaling generalizable robotics and sim2real. We introduce and open source ManiSkill3, the fastest state-visual GPU parallelized robotics simulator with contact-rich physics targeting generalizable manipulation. ManiSkill3 supports GPU parallelization of many aspects including simulation+rendering, heterogeneous simulation, pointclouds/voxels visual input, and more. Simulation with rendering on ManiSkill3 can run 10-1000x faster with 2-3x less GPU memory usage than other platforms, achieving up to 30,000+ FPS in benchmarked environments due to minimal python/pytorch overhead in the system, simulation on the GPU, and the use of the SAPIEN parallel rendering system. Tasks that used to take hours to train can now take minutes. We further provide the most comprehensive range of GPU parallelized environments/tasks spanning 12 distinct domains including but not limited to mobile manipulation for tasks such as drawing, humanoids, and dextrous manipulation in realistic scenes designed by artists or real-world digital twins. In addition, millions of demonstration frames are provided from motion planning, RL, and teleoperation. ManiSkill3 also provides a comprehensive set of baselines that span popular RL and learning-from-demonstrations algorithms.

Updated: 2024-10-01 06:10:39

标题: ManiSkill3：用于通用化体验智能AI的GPU并行化机器人模拟和渲染

摘要: 模拟技术使得机器人学习变得更加可扩展。然而，许多现有的模拟框架通常只支持有限范围的场景/任务，并且缺乏对于扩展通用机器人和模拟到真实世界的关键功能。我们介绍并开源了ManiSkill3，这是最快的状态-视觉GPU并行化机器人模拟器，具有丰富的接触物理效果，旨在实现通用的操纵能力。ManiSkill3支持GPU并行化的许多方面，包括模拟+渲染、异构模拟、点云/体素视觉输入等。在ManiSkill3上进行模拟和渲染可以比其他平台快10-1000倍，GPU内存使用量少2-3倍，在基准环境中达到每秒30,000帧以上的运行速度，这得益于系统中最小化的python/pytorch开销、GPU上的模拟和使用SAPIEN并行渲染系统。之前需要几小时训练的任务现在只需要几分钟。我们还提供了最全面的GPU并行化环境/任务范围，涵盖12个不同领域，包括移动操作（如绘画）、仿生机器人和艺术家设计或真实世界数字孪生中的灵巧操作等任务。此外，我们提供了来自运动规划、RL和远程操作的数百万个演示帧。ManiSkill3还提供了一套涵盖流行RL和从演示学习算法的全面基线。

更新时间: 2024-10-01 06:10:39

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2410.00425v1

On the Counting of Involutory MDS Matrices

The optimal branch number of MDS matrices has established their importance in designing diffusion layers for various block ciphers and hash functions. As a result, numerous matrix structures, including Hadamard and circulant matrices, have been proposed for constructing MDS matrices. Also, in the literature, significant attention is typically given to identifying MDS candidates with optimal implementations or proposing new constructions across different orders. However, this paper takes a different approach by not emphasizing efficiency issues or introducing new constructions. Instead, its primary objective is to enumerate Hadamard MDS and involutory Hadamard MDS matrices of order $4$ within the field $\mathbb{F}_{2^r}$. Specifically, it provides an explicit formula for the count of both Hadamard MDS and involutory Hadamard MDS matrices of order $4$ over $\mathbb{F}_{2^r}$. Additionally, it derives the count of Hadamard Near-MDS (NMDS) and involutory Hadamard NMDS matrices, each with exactly one zero in each row, of order $4$ over $\mathbb{F}_{2^r}$. Furthermore, the paper discusses some circulant-like matrices for constructing NMDS matrices and proves that when $n$ is even, any $2n \times 2n$ Type-II circulant-like matrix can never be an NMDS matrix. While it is known that NMDS matrices may be singular, this paper establishes that singular Hadamard matrices can never be NMDS matrices. Moreover, it proves that there exist exactly two orthogonal Type-I circulant-like matrices of order $4$ over $\mathbb{F}_{2^r}$.

Updated: 2024-10-01 06:08:45

标题: 关于对反演的MDS矩阵进行计数

摘要: MDS矩阵的最佳分支数已经确定了它们在设计各种分组密码和哈希函数的扩散层中的重要性。因此，已经提出了许多矩阵结构，包括Hadamard矩阵和循环矩阵，用于构建MDS矩阵。此外，在文献中，通常会特别关注识别具有最佳实现或在不同阶数之间提出新构造的MDS候选者。然而，本文采取了不强调效率问题或引入新构造的不同方法。相反，它的主要目标是列举在域$\mathbb{F}_{2^r}$中的阶数为$4$的Hadamard MDS和可逆Hadamard MDS矩阵。具体而言，它提供了一个明确的公式，用于计算$\mathbb{F}_{2^r}$上阶数为$4$的Hadamard MDS和可逆Hadamard MDS矩阵的数量。此外，它推导了阶数为$4$的$\mathbb{F}_{2^r}$上具有每行恰好一个零的Hadamard近MDS（NMDS）和可逆Hadamard NMDS矩阵的数量。此外，本文讨论了用于构造NMDS矩阵的类循环矩阵，并证明了当$n$为偶数时，任何$2n \times 2n$的Type-II类循环矩阵永远不可能是NMDS矩阵。尽管已知NMDS矩阵可能是奇异的，但本文确定奇异的Hadamard矩阵永远不可能是NMDS矩阵。此外，它证明了在$\mathbb{F}_{2^r}$上存在着两个正交的阶数为$4$的Type-I类循环矩阵。

更新时间: 2024-10-01 06:08:45

领域: cs.CR,cs.IT,math.IT

下载: http://arxiv.org/abs/2310.00090v3

MDA: An Interpretable Multi-Modal Fusion with Missing Modalities and Intrinsic Noise

Multi-modal fusion is crucial in medical data research, enabling a comprehensive understanding of diseases and improving diagnostic performance by combining diverse modalities. However, multi-modal fusion faces challenges, including capturing interactions between modalities, addressing missing modalities, handling erroneous modal information, and ensuring interpretability. Many existing researchers tend to design different solutions for these problems, often overlooking the commonalities among them. This paper proposes a novel multi-modal fusion framework that achieves adaptive adjustment over the weights of each modality by introducing the Modal-Domain Attention (MDA). It aims to facilitate the fusion of multi-modal information while allowing for the inclusion of missing modalities or intrinsic noise, thereby enhancing the representation of multi-modal data. We provide visualizations of accuracy changes and MDA weights by observing the process of modal fusion, offering a comprehensive analysis of its interpretability. Extensive experiments on various gastrointestinal disease benchmarks, the proposed MDA maintains high accuracy even in the presence of missing modalities and intrinsic noise. One thing worth mentioning is that the visualization of MDA is highly consistent with the conclusions of existing clinical studies on the dependence of different diseases on various modalities. Code and dataset will be made available.

Updated: 2024-10-01 06:08:00

标题: MDA：一种可解释的多模态融合方法，适用于缺失模态及内在噪声

摘要: 多模态融合在医学数据研究中至关重要，可以通过结合多种模态实现对疾病的全面理解，并提高诊断性能。然而，多模态融合面临挑战，包括捕捉模态之间的交互作用，处理缺失模态，处理错误的模态信息，并确保可解释性。许多现有研究人员倾向于针对这些问题设计不同的解决方案，经常忽视它们之间的共同点。本文提出了一个新颖的多模态融合框架，通过引入模态域注意力（MDA）实现对每种模态权重的自适应调整。它旨在促进多模态信息的融合，同时允许包含缺失模态或内在噪声，从而增强多模态数据的表示。我们通过观察模态融合过程，提供了准确性变化和MDA权重的可视化，对其可解释性进行了全面分析。在各种胃肠疾病基准测试上进行了大量实验，提出的MDA即使在缺失模态和内在噪声存在的情况下也能保持高准确性。值得一提的是，MDA的可视化与现有临床研究关于不同疾病对各种模态依赖性的结论高度一致。代码和数据集将提供。

更新时间: 2024-10-01 06:08:00

领域: cs.LG,cs.CV,I.5.2; I.2.7; I.2.10; J.3

下载: http://arxiv.org/abs/2406.10569v2

FLEX: Expert-level False-Less EXecution Metric for Reliable Text-to-SQL Benchmark

Text-to-SQL technology has become crucial for translating natural language into SQL queries in various industries, enabling non-technical users to perform complex data operations. The need for accurate evaluation methods has increased as these systems have grown more sophisticated. However, we found that the Execution Accuracy (EX), the most promising evaluation metric, still shows a substantial portion of false positives and negatives compared to human evaluation. Thus, this paper introduces FLEX (False-Less EXecution), a novel approach to evaluating text-to-SQL systems using large language models (LLMs) to emulate human expert-level evaluation of SQL queries. Our method shows significantly higher agreement with human expert judgments, improving Cohen's kappa from 61 to 78.17. Re-evaluating top-performing models on the Spider and BIRD benchmarks using FLEX reveals substantial shifts in performance rankings, with an average performance decrease of 3.15 due to false positive corrections and an increase of 6.07 from addressing false negatives. This work contributes to a more accurate and nuanced evaluation of text-to-SQL systems, potentially reshaping our understanding of state-of-the-art performance in this field.

Updated: 2024-10-01 05:55:33

标题: FLEX: 可靠的文本到SQL基准测试的专家级低误差执行度量

摘要: 文本到SQL技术已经成为在各个行业将自然语言转化为SQL查询的关键技术，使非技术用户能够执行复杂的数据操作。随着这些系统变得越来越复杂，对准确评估方法的需求也在增加。然而，我们发现，执行准确度（EX）这一最有前景的评估指标，与人工评估相比仍然存在相当大比例的假阳性和假阴性。因此，本文引入了FLEX（False-Less EXecution），一种使用大型语言模型（LLMs）来模拟人类专家级对SQL查询进行评估的新方法。我们的方法与人类专家判断显示出更高的一致性，将Cohen's kappa从61提高到78.17。使用FLEX重新评估Spider和BIRD基准上表现最好的模型，显示出性能排名出现了显著变化，由于纠正了假阳性而导致平均性能下降了3.15，而通过处理假阴性而增加了6.07。这项工作有助于更准确和细致地评估文本到SQL系统，可能重塑我们对这一领域最先进性能的理解。

更新时间: 2024-10-01 05:55:33

领域: cs.CL,cs.IR,cs.LG

下载: http://arxiv.org/abs/2409.19014v2

Posterior-Mean Rectified Flow: Towards Minimum MSE Photo-Realistic Image Restoration

Photo-realistic image restoration algorithms are typically evaluated by distortion measures (e.g., PSNR, SSIM) and by perceptual quality measures (e.g., FID, NIQE), where the desire is to attain the lowest possible distortion without compromising on perceptual quality. To achieve this goal, current methods typically attempt to sample from the posterior distribution, or to optimize a weighted sum of a distortion loss (e.g., MSE) and a perceptual quality loss (e.g., GAN). Unlike previous works, this paper is concerned specifically with the optimal estimator that minimizes the MSE under a constraint of perfect perceptual index, namely where the distribution of the reconstructed images is equal to that of the ground-truth ones. A recent theoretical result shows that such an estimator can be constructed by optimally transporting the posterior mean prediction (MMSE estimate) to the distribution of the ground-truth images. Inspired by this result, we introduce Posterior-Mean Rectified Flow (PMRF), a simple yet highly effective algorithm that approximates this optimal estimator. In particular, PMRF first predicts the posterior mean, and then transports the result to a high-quality image using a rectified flow model that approximates the desired optimal transport map. We investigate the theoretical utility of PMRF and demonstrate that it consistently outperforms previous methods on a variety of image restoration tasks.

Updated: 2024-10-01 05:54:07

标题: 后验均值矫正流：朝向最小均方误差的照片逼真图像恢复

摘要: 照片逼真的图像恢复算法通常通过失真度量（例如PSNR、SSIM）和感知质量度量（例如FID、NIQE）进行评估，希望在不影响感知质量的情况下实现最低可能的失真。为了实现这一目标，当前的方法通常尝试从后验分布中进行取样，或者优化失真损失（例如MSE）和感知质量损失（例如GAN）的加权和。与以往的研究不同，本文专注于在完美感知指数约束下最小化均方误差的最优估计器，即重建图像的分布等于地面真实图像的分布。最近的理论结果表明，这样的估计器可以通过将后验均值预测（MMSE估计）最优地传输到地面真实图像的分布来构建。受此结果启发，我们引入后验均值校正流（PMRF），这是一个简单但非常有效的算法，可以近似这个最优估计器。具体来说，PMRF首先预测后验均值，然后使用逼近所需最优传输映射的校正流模型将结果传输到高质量图像。我们研究了PMRF的理论实用性，并证明它在各种图像恢复任务中始终优于以前的方法。

更新时间: 2024-10-01 05:54:07

领域: eess.IV,cs.AI,cs.CV,eess.SP

下载: http://arxiv.org/abs/2410.00418v1

A Foundation Model for Zero-shot Logical Query Reasoning

Complex logical query answering (CLQA) in knowledge graphs (KGs) goes beyond simple KG completion and aims at answering compositional queries comprised of multiple projections and logical operations. Existing CLQA methods that learn parameters bound to certain entity or relation vocabularies can only be applied to the graph they are trained on which requires substantial training time before being deployed on a new graph. Here we present UltraQuery, the first foundation model for inductive reasoning that can zero-shot answer logical queries on any KG. The core idea of UltraQuery is to derive both projections and logical operations as vocabulary-independent functions which generalize to new entities and relations in any KG. With the projection operation initialized from a pre-trained inductive KG reasoning model, UltraQuery can solve CLQA on any KG after finetuning on a single dataset. Experimenting on 23 datasets, UltraQuery in the zero-shot inference mode shows competitive or better query answering performance than best available baselines and sets a new state of the art on 15 of them.

Updated: 2024-10-01 05:52:11

标题: 零样本逻辑查询推理的基础模型

摘要: 在知识图谱（KGs）中进行复杂逻辑查询回答（CLQA）超越简单的KG完成，旨在回答由多个投影和逻辑操作组成的组合查询。现有的学习参数绑定到某些实体或关系词汇的CLQA方法只能应用于它们在训练的图上，这需要大量的训练时间才能在新图上部署。在这里，我们提出了UltraQuery，这是第一个用于归纳推理的基础模型，可以零-shot回答任何KG上的逻辑查询。UltraQuery的核心思想是将投影和逻辑操作推导为与词汇无关的函数，这些函数可以泛化到任何KG中的新实体和关系。通过从预训练的归纳KG推理模型初始化投影操作，UltraQuery可以在单个数据集微调后解决任何KG上的CLQA。在23个数据集上进行实验，UltraQuery在零-shot推理模式下显示出与最佳基线相当或更好的查询回答性能，并在其中的15个数据集上创造了新的技术水平。

更新时间: 2024-10-01 05:52:11

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2404.07198v2

CPL: Critical Plan Step Learning Boosts LLM Generalization in Reasoning Tasks

Post-training, particularly reinforcement learning (RL) using self-play-generated data, has become a new learning paradigm for large language models (LLMs). However, scaling RL to develop a general reasoner remains a research challenge, as existing methods focus on task-specific reasoning without adequately addressing generalization across a broader range of tasks. Moreover, unlike traditional RL with limited action space, LLMs operate in an infinite space, making it crucial to search for valuable and diverse strategies to solve problems effectively. To address this, we propose searching within the action space on high-level abstract plans to enhance model generalization and introduce Critical Plan Step Learning (CPL), comprising: 1) searching on plan, using Monte Carlo Tree Search (MCTS) to explore diverse plan steps in multi-step reasoning tasks, and 2) learning critical plan steps through Step-level Advantage Preference Optimization (Step-APO), which integrates advantage estimates for step preference obtained via MCTS into Direct Preference Optimization (DPO). This combination helps the model effectively learn critical plan steps, enhancing both reasoning capabilities and generalization. Experimental results demonstrate that our method, trained exclusively on GSM8K and MATH, not only significantly improves performance on GSM8K (+10.5%) and MATH (+6.5%), but also enhances out-of-domain reasoning benchmarks, such as HumanEval (+12.2%), GPQA (+8.6%), ARC-C (+4.0%), MMLU-STEM (+2.2%), and BBH (+1.8%).

Updated: 2024-10-01 05:42:12

标题: CPL：关键计划步骤学习提升在推理任务中的LLM泛化

摘要: 在训练后，尤其是使用自我生成数据的强化学习（RL），已经成为大型语言模型（LLMs）的一种新的学习范式。然而，将RL扩展到开发通用推理器仍然是一个研究挑战，因为现有方法侧重于特定任务的推理，而没有充分解决跨更广泛任务范围的泛化问题。此外，与具有有限行动空间的传统RL不同，LLMs在一个无限空间中运行，因此寻找解决问题的有价值和多样化策略至关重要。为了解决这个问题，我们提出在高级抽象计划的行动空间内进行搜索，以增强模型的泛化能力，并引入关键计划步骤学习（CPL），包括：1）在计划上搜索，使用蒙特卡洛树搜索（MCTS）探索多步推理任务中的多样化计划步骤，以及2）通过步骤级优势偏好优化（Step-APO）学习关键计划步骤，该方法将通过MCTS获得的步骤偏好的优势估计整合到直接偏好优化（DPO）中。这种组合有助于模型有效学习关键计划步骤，增强推理能力和泛化能力。实验结果表明，我们的方法仅在GSM8K和MATH上进行训练，不仅显着提高了在GSM8K（+10.5%）和MATH（+6.5%）上的性能，还增强了领域外推理基准，如HumanEval（+12.2%）、GPQA（+8.6%）、ARC-C（+4.0%）、MMLU-STEM（+2.2%）和BBH（+1.8%）。

更新时间: 2024-10-01 05:42:12

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2409.08642v2

Federated Instruction Tuning of LLMs with Domain Coverage Augmentation

Federated Domain-specific Instruction Tuning (FedDIT) utilizes limited cross-client private data alongside server-side public data for instruction augmentation, ultimately enhancing model performance within specific domains. While the factors affecting FedDIT remain unclear and existing instruction augmentation methods mainly focus on the centralized setting without considering the distributed environment. Our experiments reveal that the cross-client domain coverage, rather than data heterogeneity, drives model performance in FedDIT. In response, we propose FedDCA, which optimizes domain coverage through greedy client center selection and retrieval-based augmentation. To alleviate client-side computational burdens, FedDCA$^*$ uses heterogeneous encoders with server-side feature alignment. Extensive experiments across four distinct domains (code, medical, financial, and mathematical) substantiate the effectiveness of both methods. Additionally, we investigate privacy preservation against memory extraction attacks utilizing varying amounts of public data. Results show no significant correlation between the volume of public data and the privacy-preserving capability. However, as the fine-tuning round increases, the risk of privacy leakage reduces or converges.

Updated: 2024-10-01 05:37:07

标题: 使用领域覆盖增强的LLMs的联邦指令调优

摘要: Federated Domain-specific Instruction Tuning (FedDIT)利用有限的跨客户端私有数据以及服务器端公共数据进行指令增强，最终提高特定领域内模型的性能。尽管影响FedDIT的因素尚不清楚，并且现有的指令增强方法主要集中在中心化设置，而没有考虑到分布式环境。我们的实验表明，跨客户端领域覆盖面，而不是数据异质性，推动了FedDIT模型的性能。为此，我们提出了FedDCA，通过贪婪客户中心选择和基于检索的增强来优化领域覆盖面。为了减轻客户端的计算负担，FedDCA$^*$使用具有服务器端特征对齐的异构编码器。对四个不同领域（代码、医疗、金融和数学）进行了大量实验，证实了两种方法的有效性。此外，我们还研究了针对内存提取攻击的隐私保护，利用不同量的公共数据。结果显示公共数据的数量与隐私保护能力之间没有显著相关性。然而，随着微调轮次的增加，隐私泄漏的风险减少或收敛。

更新时间: 2024-10-01 05:37:07

领域: cs.LG,cs.CL,cs.DC

下载: http://arxiv.org/abs/2409.20135v2

LinkThief: Combining Generalized Structure Knowledge with Node Similarity for Link Stealing Attack against GNN

Graph neural networks(GNNs) have a wide range of applications in multimedia.Recent studies have shown that Graph neural networks(GNNs) are vulnerable to link stealing attacks,which infers the existence of edges in the target GNN's training graph.Existing attacks are usually based on the assumption that links exist between two nodes that share similar posteriors;however,they fail to focus on links that do not hold under this assumption.To this end,we propose LinkThief,an improved link stealing attack that combines generalized structure knowledge with node similarity,in a scenario where the attackers' background knowledge contains partially leaked target graph and shadow graph.Specifically,to equip the attack model with insights into the link structure spanning both the shadow graph and the target graph,we introduce the idea of creating a Shadow-Target Bridge Graph and extracting edge subgraph structure features from it.Through theoretical analysis from the perspective of privacy theft,we first explore how to implement the aforementioned ideas.Building upon the findings,we design the Bridge Graph Generator to construct the Shadow-Target Bridge Graph.Then,the subgraph around the link is sampled by the Edge Subgraph Preparation Module.Finally,the Edge Structure Feature Extractor is designed to obtain generalized structure knowledge,which is combined with node similarity to form the features provided to the attack model.Extensive experiments validate the correctness of theoretical analysis and demonstrate that LinkThief still effectively steals links without extra assumptions.

Updated: 2024-10-01 05:34:03

标题: LinkThief：将广义结构知识与节点相似性相结合，对GNN进行链接窃取攻击

摘要: 图神经网络（GNNs）在多媒体领域有着广泛的应用。最近的研究表明，图神经网络（GNNs）容易受到链接窃取攻击的影响，该攻击可以推断目标GNN训练图中存在的边。现有的攻击通常基于一个假设，即在具有相似后验概率的两个节点之间存在链接；然而，它们未能关注在这个假设下不成立的链接。为此，我们提出了LinkThief，一种改进的链接窃取攻击方法，结合通用结构知识和节点相似性，在攻击者的背景知识中包含部分泄露的目标图和阴影图的情况下。具体来说，为了使攻击模型了解横跨阴影图和目标图的链接结构，我们引入了创建Shadow-Target Bridge Graph的想法，并从中提取边子图结构特征。通过从隐私窃取的角度进行理论分析，我们首先探讨如何实现上述想法。基于这些发现，我们设计了Bridge Graph Generator来构建Shadow-Target Bridge Graph。然后，通过Edge Subgraph Preparation Module对链接周围的子图进行抽样。最后，设计了Edge Structure Feature Extractor来获取通用结构知识，将其与节点相似性结合形成提供给攻击模型的特征。大量实验证实了理论分析的正确性，并表明LinkThief仍然能够有效地窃取链接，而无需额外的假设。

更新时间: 2024-10-01 05:34:03

领域: cs.CR,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.02826v1

Weak-to-Strong Reasoning

When large language models (LLMs) exceed human-level capabilities, it becomes increasingly challenging to provide full-scale and accurate supervision for these models. Weak-to-strong learning, which leverages a less capable model to unlock the latent abilities of a stronger model, proves valuable in this context. Yet, the efficacy of this approach for complex reasoning tasks is still untested. Furthermore, tackling reasoning tasks under the weak-to-strong setting currently lacks efficient methods to avoid blindly imitating the weak supervisor including its errors. In this paper, we introduce a progressive learning framework that enables the strong model to autonomously refine its training data, without requiring input from either a more advanced model or human-annotated data. This framework begins with supervised fine-tuning on a selective small but high-quality dataset, followed by preference optimization on contrastive samples identified by the strong model itself. Extensive experiments on the GSM8K and MATH datasets demonstrate that our method significantly enhances the reasoning capabilities of Llama2-70b using three separate weak models. This method is further validated in a forward-looking experimental setup, where Llama3-8b-instruct effectively supervises Llama3-70b on the highly challenging OlympicArena dataset. This work paves the way for a more scalable and sophisticated strategy to enhance AI reasoning powers. All relevant code and resources are available in \url{https://github.com/GAIR-NLP/weak-to-strong-reasoning}.

Updated: 2024-10-01 05:28:54

标题: 弱到强的推理

摘要: 当大型语言模型（LLMs）超越人类水平能力时，为这些模型提供全面和准确的监督变得越来越具有挑战性。弱到强的学习，利用一个能力较弱的模型来释放一个更强大模型的潜在能力，在这种情况下证明是有价值的。然而，这种方法对于复杂推理任务的有效性仍未经过测试。此外，在弱到强设置下处理推理任务目前缺乏有效的方法来避免盲目模仿弱监督者，包括其错误。在本文中，我们介绍了一个渐进学习框架，使得强模型能够自主地改进其训练数据，而无需依赖于更先进的模型或人工注释数据。该框架从对一个选择性的小型但高质量数据集进行监督微调开始，然后通过强模型自身识别的对比样本进行优化。在GSM8K和MATH数据集上进行了大量实验，证明我们的方法使用三个不同的弱模型显著增强了Llama2-70b的推理能力。该方法在一个前瞻性实验设置中进一步得到验证，在这个设置中，Llama3-8b-instruct有效地监督了Llama3-70b在极具挑战性的OlympicArena数据集上。这项工作为提高AI推理能力打开了一条更加可扩展和复杂的策略之路。所有相关代码和资源都可以在\url{https://github.com/GAIR-NLP/weak-to-strong-reasoning}上找到。

更新时间: 2024-10-01 05:28:54

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2407.13647v2

Accelerating Communication in Deep Learning Recommendation Model Training with Dual-Level Adaptive Lossy Compression

DLRM is a state-of-the-art recommendation system model that has gained widespread adoption across various industry applications. The large size of DLRM models, however, necessitates the use of multiple devices/GPUs for efficient training. A significant bottleneck in this process is the time-consuming all-to-all communication required to collect embedding data from all devices. To mitigate this, we introduce a method that employs error-bounded lossy compression to reduce the communication data size and accelerate DLRM training. We develop a novel error-bounded lossy compression algorithm, informed by an in-depth analysis of embedding data features, to achieve high compression ratios. Moreover, we introduce a dual-level adaptive strategy for error-bound adjustment, spanning both table-wise and iteration-wise aspects, to balance the compression benefits with the potential impacts on accuracy. We further optimize our compressor for PyTorch tensors on GPUs, minimizing compression overhead. Evaluation shows that our method achieves a 1.38$\times$ training speedup with a minimal accuracy impact.

Updated: 2024-10-01 05:20:59

标题: 使用双层自适应有损压缩加速深度学习推荐模型训练中的通信

摘要: DLRM是一种最先进的推荐系统模型，在各种行业应用中得到了广泛采用。然而，DLRM模型的大尺寸需要使用多个设备/GPUs进行高效训练。在这个过程中一个重要的瓶颈是需要进行耗时的全对全通信来收集来自所有设备的嵌入数据。为了缓解这个问题，我们引入了一种方法，利用误差有界的有损压缩来减少通信数据大小并加速DLRM训练。我们开发了一种新颖的误差有界的有损压缩算法，通过深入分析嵌入数据特征来实现高压缩比。此外，我们引入了一个双层自适应策略来调整误差界限，涵盖表格级和迭代级方面，以平衡压缩带来的好处和对准确性的潜在影响。我们进一步优化了针对GPU上的PyTorch张量的压缩器，最小化了压缩开销。评估表明，我们的方法在最小准确性影响的情况下实现了1.38倍的训练加速。

更新时间: 2024-10-01 05:20:59

领域: cs.LG,cs.DC

下载: http://arxiv.org/abs/2407.04272v5

GlycanML: A Multi-Task and Multi-Structure Benchmark for Glycan Machine Learning

Glycans are basic biomolecules and perform essential functions within living organisms. The rapid increase of functional glycan data provides a good opportunity for machine learning solutions to glycan understanding. However, there still lacks a standard machine learning benchmark for glycan property and function prediction. In this work, we fill this blank by building a comprehensive benchmark for Glycan Machine Learning (GlycanML). The GlycanML benchmark consists of diverse types of tasks including glycan taxonomy prediction, glycan immunogenicity prediction, glycosylation type prediction, and protein-glycan interaction prediction. Glycans can be represented by both sequences and graphs in GlycanML, which enables us to extensively evaluate sequence-based models and graph neural networks (GNNs) on benchmark tasks. Furthermore, by concurrently performing eight glycan taxonomy prediction tasks, we introduce the GlycanML-MTL testbed for multi-task learning (MTL) algorithms. Also, we evaluate how taxonomy prediction can boost other three function prediction tasks by MTL. Experimental results show the superiority of modeling glycans with multi-relational GNNs, and suitable MTL methods can further boost model performance. We provide all datasets and source codes at https://github.com/GlycanML/GlycanML and maintain a leaderboard at https://GlycanML.github.io/project

Updated: 2024-10-01 05:14:15

标题: GlycanML：糖机器学习的多任务和多结构基准

摘要: 糖类是基本的生物分子，在生物体内发挥着重要的功能。功能性糖类数据的快速增加为机器学习解决糖类理解提供了良好的机会。然而，目前仍缺乏糖类性质和功能预测的标准机器学习基准。在这项工作中，我们通过构建全面的糖类机器学习基准（GlycanML），填补了这一空白。GlycanML基准包括多种类型的任务，包括糖类分类预测、糖类免疫原性预测、糖基化类型预测和蛋白质-糖类相互作用预测。在GlycanML中，糖类可以用序列和图表示，这使我们能够广泛评估基于序列的模型和图神经网络（GNNs）在基准任务上的表现。此外，通过同时执行八个糖类分类预测任务，我们引入了用于多任务学习（MTL）算法的GlycanML-MTL测试平台。此外，我们评估了分类预测如何通过MTL提升其他三个功能预测任务的效果。实验结果显示，用多关系GNNs建模糖类具有优越性，并且适当的MTL方法可以进一步提升模型性能。我们在https://github.com/GlycanML/GlycanML提供所有数据集和源代码，并在https://GlycanML.github.io/project上维护排行榜。

更新时间: 2024-10-01 05:14:15

领域: cs.LG

下载: http://arxiv.org/abs/2405.16206v3

TikGuard: A Deep Learning Transformer-Based Solution for Detecting Unsuitable TikTok Content for Kids

The rise of short-form videos on platforms like TikTok has brought new challenges in safeguarding young viewers from inappropriate content. Traditional moderation methods often fall short in handling the vast and rapidly changing landscape of user-generated videos, increasing the risk of children encountering harmful material. This paper introduces TikGuard, a transformer-based deep learning approach aimed at detecting and flagging content unsuitable for children on TikTok. By using a specially curated dataset, TikHarm, and leveraging advanced video classification techniques, TikGuard achieves an accuracy of 86.7%, showing a notable improvement over existing methods in similar contexts. While direct comparisons are limited by the uniqueness of the TikHarm dataset, TikGuard's performance highlights its potential in enhancing content moderation, contributing to a safer online experience for minors. This study underscores the effectiveness of transformer models in video classification and sets a foundation for future research in this area.

Updated: 2024-10-01 05:00:05

标题: TikGuard：一种基于深度学习Transformer的解决方案，用于检测不适合儿童观看的TikTok内容

摘要: 短视频平台如TikTok上短视频的兴起给保护年轻观众免受不当内容的挑战带来了新的挑战。传统的审查方法往往无法处理用户生成的视频的广泛和迅速变化的景观，增加了儿童遭遇有害材料的风险。本文介绍了TikGuard，这是一种基于变压器的深度学习方法，旨在检测和标记TikTok上不适合儿童观看的内容。通过使用一个经过特别策划的数据集TikHarm，并利用先进的视频分类技术，TikGuard实现了86.7%的准确率，显示出与类似情境下现有方法相比显著的改进。尽管由于TikHarm数据集的独特性，直接比较有限，但TikGuard的表现突显了其在增强内容审查方面的潜力，为未成年人提供更安全的在线体验。这项研究强调了变压器模型在视频分类中的有效性，并为该领域的未来研究奠定了基础。

更新时间: 2024-10-01 05:00:05

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2410.00403v1

Federated Learning with Reduced Information Leakage and Computation

Federated learning (FL) is a distributed learning paradigm that allows multiple decentralized clients to collaboratively learn a common model without sharing local data. Although local data is not exposed directly, privacy concerns nonetheless exist as clients' sensitive information can be inferred from intermediate computations. Moreover, such information leakage accumulates substantially over time as the same data is repeatedly used during the iterative learning process. As a result, it can be particularly difficult to balance the privacy-accuracy trade-off when designing privacy-preserving FL algorithms. This paper introduces Upcycled-FL, a simple yet effective strategy that applies first-order approximation at every even round of model update. Under this strategy, half of the FL updates incur no information leakage and require much less computational and transmission costs. We first conduct the theoretical analysis on the convergence (rate) of Upcycled-FL and then apply two perturbation mechanisms to preserve privacy. Extensive experiments on both synthetic and real-world data show that the Upcycled-FL strategy can be adapted to many existing FL frameworks and consistently improve the privacy-accuracy trade-off.

Updated: 2024-10-01 04:44:29

标题: 具有减少信息泄露和计算量的联邦学习

摘要: 联邦学习（FL）是一种分布式学习范式，允许多个分散的客户端协作学习一个共同的模型，而不共享本地数据。尽管本地数据没有直接暴露，但隐私问题仍然存在，因为客户端的敏感信息可以从中间计算中推断出来。此外，这种信息泄漏随着时间的推移会大幅累积，因为相同的数据在迭代学习过程中被反复使用。因此，在设计保护隐私的FL算法时，很难平衡隐私和准确性之间的权衡。本文介绍了Upcycled-FL，这是一种简单而有效的策略，它在每个偶数轮的模型更新中应用一阶逼近。根据这种策略，一半的FL更新不会泄露信息，而且需要更少的计算和传输成本。我们首先对Upcycled-FL的收敛（速率）进行了理论分析，然后应用两种扰动机制来保护隐私。对合成数据和真实数据进行的大量实验表明，Upcycled-FL策略可以适应许多现有的FL框架，并持续改善隐私和准确性之间的权衡。

更新时间: 2024-10-01 04:44:29

领域: cs.LG

下载: http://arxiv.org/abs/2310.06341v2

How Far Are We from Intelligent Visual Deductive Reasoning?

Vision-Language Models (VLMs) have recently demonstrated incredible strides on diverse vision language tasks. We dig into vision-based deductive reasoning, a more sophisticated but less explored realm, and find previously unexposed blindspots in the current SOTA VLMs. Specifically, we leverage Raven's Progressive Matrices (RPMs), to assess VLMs' abilities to perform multi-hop relational and deductive reasoning relying solely on visual clues. We perform comprehensive evaluations of several popular VLMs employing standard strategies such as in-context learning, self-consistency, and Chain-of-thoughts (CoT) on three diverse datasets, including the Mensa IQ test, IntelligenceTest, and RAVEN. The results reveal that despite the impressive capabilities of LLMs in text-based reasoning, we are still far from achieving comparable proficiency in visual deductive reasoning. We found that certain standard strategies that are effective when applied to LLMs do not seamlessly translate to the challenges presented by visual reasoning tasks. A detailed analysis reveals that VLMs struggle to solve these tasks mainly because they are unable to perceive and comprehend multiple, confounding abstract patterns in RPM examples.

Updated: 2024-10-01 04:41:53

标题: 我们离智能视觉演绎推理还有多远？

摘要: 视觉语言模型（VLMs）最近在各种视觉语言任务上取得了巨大进展。我们深入研究了基于视觉的演绎推理，这是一个更复杂但不太被探索的领域，并发现了当前最先进的VLMs中以前未曝光的盲点。具体来说，我们利用Raven的渐进矩阵（RPMs）来评估VLMs仅依靠视觉线索进行多跳关系和演绎推理的能力。我们对几种流行的VLMs进行了全面评估，采用标准策略，如上下文学习、自洽性和思维链（CoT），在三个不同的数据集上进行评估，包括Mensa智商测试、智力测试和RAVEN。结果显示，尽管LLMs在基于文本的推理方面具有令人印象深刻的能力，但在视觉演绎推理方面我们仍然远远落后。我们发现，一些对LLMs有效的标准策略在转化到视觉推理任务所面临的挑战时并不完全适用。详细分析显示，VLMs难以解决这些任务主要是因为它们无法感知和理解RPM示例中的多个、混杂的抽象模式。

更新时间: 2024-10-01 04:41:53

领域: cs.AI,cs.CL,cs.CV

下载: http://arxiv.org/abs/2403.04732v3

Identifying Spurious Correlations using Counterfactual Alignment

Models driven by spurious correlations often yield poor generalization performance. We propose the counterfactual (CF) alignment method to detect and quantify spurious correlations of black box classifiers. Our methodology is based on counterfactual images generated with respect to one classifier being input into other classifiers to see if they also induce changes in the outputs of these classifiers. The relationship between these responses can be quantified and used to identify specific instances where a spurious correlation exists. This is validated by observing intuitive trends in a face-attribute face-attribute and waterbird classifiers, as well as by fabricating spurious correlations and detecting their presence, both visually and quantitatively. Furthermore, utilizing the CF alignment method, we demonstrate that we can evaluate robust optimization methods (GroupDRO, JTT, and FLAC) by detecting a reduction in spurious correlations.

Updated: 2024-10-01 04:39:14

标题: 使用反事实对齐方法识别虚假相关性

摘要: 由虚假相关性驱动的模型通常产生较差的泛化性能。我们提出了反事实（CF）对齐方法，用于检测和量化黑盒分类器的虚假相关性。我们的方法基于针对一个分类器生成的反事实图像，输入到其他分类器中，以查看它们是否也会引起这些分类器输出的变化。这些响应之间的关系可以被量化并用于识别存在虚假相关性的特定实例。通过观察面部属性和水鸟分类器中直观趋势的验证，以及制造虚假相关性并通过视觉和定量方法检测它们的存在，验证了这一点。此外，利用CF对齐方法，我们展示了可以通过检测虚假相关性的减少来评估强化优化方法（GroupDRO，JTT和FLAC）。

更新时间: 2024-10-01 04:39:14

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2312.02186v2

TRANSAGENT: An LLM-Based Multi-Agent System for Code Translation

Code translation converts code from one programming language to another while maintaining its original functionality, which is crucial for software migration, system refactoring, and cross-platform development. Traditional rule-based methods rely on manually-written rules, which can be time-consuming and often result in less readable code. To overcome this, learning-based methods have been developed, leveraging parallel data to train models for automated code translation. More recently, the advance of Large Language Models (LLMs) further boosts learning-based code translation. Although promising, LLM-translated program still suffers from diverse quality issues (e.g., syntax errors and semantic errors). In particular, it can be challenging for LLMs to self-debug these errors when simply provided with the corresponding error messages. In this work, we propose a novel LLM-based multi-agent system TRANSAGENT, which enhances LLM-based code translation by fixing the syntax errors and semantic errors with the synergy between four LLM-based agents, including Initial Code Translator, Syntax Error Fixer, Code Aligner, and Semantic Error Fixer. The main insight of TRANSAGENT is to first localize the error code block in the target program based on the execution alignment between the target and source program, which can narrow down the fixing space and thus lower down the fixing difficulties. To evaluate TRANSAGENT, we first construct a new benchmark from recent programming tasks to mitigate the potential data leakage issue. On our benchmark, TRANSAGENT outperforms the latest LLM-based code translation technique UniTrans in both translation effectiveness and efficiency; additionally, our evaluation on different LLMs show the generalization of TRANSAGENT and our ablation study shows the contribution of each agent.

Updated: 2024-10-01 04:35:05

标题: TRANSAGENT：基于LLM的用于代码翻译的多智能体系统

摘要: 代码翻译将代码从一种编程语言转换为另一种，同时保持其原始功能，这对于软件迁移、系统重构和跨平台开发至关重要。传统的基于规则的方法依赖于手工编写的规则，这可能耗时且经常导致不易阅读的代码。为了克服这一问题，已经开发了基于学习的方法，利用并行数据训练模型进行自动化代码翻译。最近，大型语言模型（LLMs）的进步进一步推动了基于学习的代码翻译。尽管有所希望，LLM翻译的程序仍然存在各种质量问题（例如，语法错误和语义错误）。特别是，LLMs在仅提供相应错误消息的情况下自我调试这些错误可能具有挑战性。在这项工作中，我们提出了一种新颖的基于LLM的多代理系统TRANSAGENT，通过四个基于LLM的代理之间的协同作用，包括初始代码翻译器、语法错误修复程序、代码对齐程序和语义错误修复程序，增强了基于LLM的代码翻译。TRANSAGENT的主要见解是根据目标程序和源程序之间的执行对齐来首先定位目标程序中的错误代码块，这可以缩小修复空间，从而降低修复难度。为了评估TRANSAGENT，我们首先从最近的编程任务中构建了一个新的基准，以减轻潜在的数据泄漏问题。在我们的基准测试中，TRANSAGENT在翻译效果和效率方面均优于最新的基于LLM的代码翻译技术UniTrans；此外，我们对不同LLMs的评估显示了TRANSAGENT的泛化能力，我们的消融研究显示了每个代理的贡献。

更新时间: 2024-10-01 04:35:05

领域: cs.SE,cs.AI

下载: http://arxiv.org/abs/2409.19894v2

FedDTG:Federated Data-Free Knowledge Distillation via Three-Player Generative Adversarial Networks

While existing federated learning approaches primarily focus on aggregating local models to construct a global model, in realistic settings, some clients may be reluctant to share their private models due to the inclusion of privacy-sensitive information. Knowledge distillation, which can extract model knowledge without accessing model parameters, is well-suited for this federated scenario. However, most distillation methods in federated learning (federated distillation) require a proxy dataset, which is difficult to obtain in the real world. Therefore, in this paper, we introduce a distributed three-player Generative Adversarial Network (GAN) to implement data-free mutual distillation and propose an effective method called FedDTG. We confirmed that the fake samples generated by GAN can make federated distillation more efficient and robust. Additionally, the distillation process between clients can deliver good individual client performance while simultaneously acquiring global knowledge and protecting data privacy. Our extensive experiments on benchmark vision datasets demonstrate that our method outperforms other federated distillation algorithms in terms of generalization.

Updated: 2024-10-01 04:34:34

标题: FedDTG：通过三方生成对抗网络实现联合无数据知识蒸馏

摘要: 现有的联邦学习方法主要集中在聚合本地模型以构建全局模型，然而在现实环境中，一些客户可能因包含隐私敏感信息而不愿分享他们的私有模型。知识蒸馏可以在不访问模型参数的情况下提取模型知识，非常适用于这种联邦学习场景。然而，在联邦学习中大多数蒸馏方法（联合蒸馏）需要一个代理数据集，这在现实世界中很难获得。因此，在本文中，我们引入了一个分布式的三方生成对抗网络（GAN）来实现无数据相互蒸馏，并提出了一种名为FedDTG的有效方法。我们确认GAN生成的虚假样本可以使联邦蒸馏更高效和鲁棒。此外，客户之间的蒸馏过程可以提供良好的个体客户性能，同时获取全局知识并保护数据隐私。我们在基准视觉数据集上进行了大量实验，证实我们的方法在泛化方面优于其他联邦蒸馏算法。

更新时间: 2024-10-01 04:34:34

领域: cs.LG

下载: http://arxiv.org/abs/2201.03169v4

Revisiting Essential and Nonessential Settings of Evidential Deep Learning

Evidential Deep Learning (EDL) is an emerging method for uncertainty estimation that provides reliable predictive uncertainty in a single forward pass, attracting significant attention. Grounded in subjective logic, EDL derives Dirichlet concentration parameters from neural networks to construct a Dirichlet probability density function (PDF), modeling the distribution of class probabilities. Despite its success, EDL incorporates several nonessential settings: In model construction, (1) a commonly ignored prior weight parameter is fixed to the number of classes, while its value actually impacts the balance between the proportion of evidence and its magnitude in deriving predictive scores. In model optimization, (2) the empirical risk features a variance-minimizing optimization term that biases the PDF towards a Dirac delta function, potentially exacerbating overconfidence. (3) Additionally, the structural risk typically includes a KL-divergence-minimizing regularization, whose optimization direction extends beyond the intended purpose and contradicts common sense, diminishing the information carried by the evidence magnitude. Therefore, we propose Re-EDL, a simplified yet more effective variant of EDL, by relaxing the nonessential settings and retaining the essential one, namely, the adoption of projected probability from subjective logic. Specifically, Re-EDL treats the prior weight as an adjustable hyperparameter rather than a fixed scalar, and directly optimizes the expectation of the Dirichlet PDF provided by deprecating both the variance-minimizing optimization term and the divergence regularization term. Extensive experiments and state-of-the-art performance validate the effectiveness of our method. The source code is available at https://github.com/MengyuanChen21/Re-EDL.

Updated: 2024-10-01 04:27:07

标题: 重温证据深度学习的基本和非基本设置

摘要: 证据深度学习（EDL）是一种新兴的不确定性估计方法，提供可靠的预测不确定性，仅需进行一次前向传递即可获得，受到了广泛关注。基于主观逻辑，EDL从神经网络中导出Dirichlet集中参数，构建Dirichlet概率密度函数（PDF），建模分类概率的分布。尽管取得了成功，EDL包含了几个非必要的设置：在模型构建中，（1）一个常被忽视的先验权重参数被固定为类的数量，而其实际值影响了证据比例与其大小在推导预测分数中的平衡。在模型优化中，（2）经验风险特征一个方差最小化的优化项，使得PDF偏向于Dirac delta函数，可能加剧自信心过强。（3）此外，结构风险通常包括一个KL散度最小化的正则化，其优化方向超出了预期目的，违背常识，降低了证据量所携带的信息。因此，我们提出了Re-EDL，这是EDL的一个简化但更有效的变体，通过放宽非必要设置并保留必要设置——即采用主观逻辑中的投影概率。具体来说，Re-EDL将先验权重视为可调节的超参数，而不是固定的标量，并直接优化由于废弃方差最小化优化项和散度正则项而提供的Dirichlet PDF的期望值。大量实验和最先进的性能验证了我们方法的有效性。源代码可在https://github.com/MengyuanChen21/Re-EDL找到。

更新时间: 2024-10-01 04:27:07

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.00393v1

Off-Path TCP Hijacking in Wi-Fi Networks: A Packet-Size Side Channel Attack

In this paper, we unveil a fundamental side channel in Wi-Fi networks, specifically the observable frame size, which can be exploited by attackers to conduct TCP hijacking attacks. Despite the various security mechanisms (e.g., WEP and WPA2/WPA3) implemented to safeguard Wi-Fi networks, our study reveals that an off path attacker can still extract sufficient information from the frame size side channel to hijack the victim's TCP connection. Our side channel attack is based on two significant findings: (i) response packets (e.g., ACK and RST) generated by TCP receivers vary in size, and (ii) the encrypted frames containing these response packets have consistent and distinguishable sizes. By observing the size of the victim's encrypted frames, the attacker can detect and hijack the victim's TCP connections. We validate the effectiveness of this side channel attack through two case studies, i.e., SSH DoS and web traffic manipulation. Precisely, our attack can terminate the victim's SSH session in 19 seconds and inject malicious data into the victim's web traffic within 28 seconds. Furthermore, we conduct extensive measurements to evaluate the impact of our attack on real-world Wi-Fi networks. We test 30 popular wireless routers from 9 well-known vendors, and none of these routers can protect victims from our attack. Besides, we implement our attack in 80 real-world Wi-Fi networks and successfully hijack the victim's TCP connections in 75 (93.75%) evaluated Wi-Fi networks. We have responsibly disclosed the vulnerability to the Wi-Fi Alliance and proposed several mitigation strategies to address this issue.

Updated: 2024-10-01 04:22:34

标题: Wi-Fi网络中的Off-Path TCP劫持：一种数据包大小侧信道攻击

摘要: 在这篇论文中，我们揭示了Wi-Fi网络中的一个基本侧信道，即可观察到的帧大小，攻击者可以利用这一侧信道进行TCP劫持攻击。尽管已经实施了各种安全机制（例如WEP和WPA2/WPA3）来保护Wi-Fi网络，但我们的研究表明，一个非正常路径的攻击者仍然可以从帧大小侧信道中提取足够的信息来劫持受害者的TCP连接。我们的侧信道攻击基于两个重要发现：（i）TCP接收器生成的响应数据包（例如ACK和RST）大小不同，（ii）包含这些响应数据包的加密帧具有一致且可区分的大小。通过观察受害者的加密帧大小，攻击者可以检测和劫持受害者的TCP连接。我们通过两个案例研究验证了这种侧信道攻击的有效性，即SSH DoS和Web流量操纵。具体来说，我们的攻击可以在19秒内终止受害者的SSH会话，并在28秒内向受害者的Web流量中注入恶意数据。此外，我们进行了大量的测量来评估我们的攻击对真实世界Wi-Fi网络的影响。我们测试了来自9个知名供应商的30个热门无线路由器，这些路由器中没有一个可以保护受害者免受我们的攻击。此外，我们在80个真实世界Wi-Fi网络中实施我们的攻击，并成功地在75（93.75％）个评估的Wi-Fi网络中劫持受害者的TCP连接。我们已经负责向Wi-Fi联盟披露了这一漏洞，并提出了几种缓解策略来解决这个问题。

更新时间: 2024-10-01 04:22:34

领域: cs.NI,cs.CR

下载: http://arxiv.org/abs/2402.12716v5

Boosting the Capabilities of Compact Models in Low-Data Contexts with Large Language Models and Retrieval-Augmented Generation

The data and compute requirements of current language modeling technology pose challenges for the processing and analysis of low-resource languages. Declarative linguistic knowledge has the potential to partially bridge this data scarcity gap by providing models with useful inductive bias in the form of language-specific rules. In this paper, we propose a retrieval augmented generation (RAG) framework backed by a large language model (LLM) to correct the output of a smaller model for the linguistic task of morphological glossing. We leverage linguistic information to make up for the lack of data and trainable parameters, while allowing for inputs from written descriptive grammars interpreted and distilled through an LLM. The results demonstrate that significant leaps in performance and efficiency are possible with the right combination of: a) linguistic inputs in the form of grammars, b) the interpretive power of LLMs, and c) the trainability of smaller token classification networks. We show that a compact, RAG-supported model is highly effective in data-scarce settings, achieving a new state-of-the-art for this task and our target languages. Our work also offers documentary linguists a more reliable and more usable tool for morphological glossing by providing well-reasoned explanations and confidence scores for each output.

Updated: 2024-10-01 04:20:14

标题: 利用大型语言模型和检索增强生成技术提升低数据情境下紧凑模型的能力

摘要: 当前语言建模技术的数据和计算需求对于处理和分析低资源语言提出了挑战。声明性语言知识有可能部分弥补这种数据稀缺差距，通过提供具有语言特定规则形式的归纳偏见来为模型提供帮助。在本文中，我们提出了一个基于大型语言模型（LLM）的检索增强生成（RAG）框架，用于纠正一个较小模型在形态学标注这一语言任务中的输出。我们利用语言信息弥补数据和可训练参数的缺失，同时允许从书面描述性语法中获得输入，并通过LLM进行解释和提炼。结果表明，在正确结合以下因素的情况下，性能和效率可以有显著提升：a）以语法形式的语言输入，b）LLM的解释能力，c）较小令牌分类网络的可训练性。我们展示了在数据稀缺环境中，一个紧凑的、受RAG支持的模型在这一任务和我们的目标语言中取得了新的最佳表现。我们的工作还为文献语言学家提供了一种更可靠且更易用的形态学标注工具，通过为每个输出提供合理解释和置信度评分。

更新时间: 2024-10-01 04:20:14

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.00387v1

Evaluating Image Hallucination in Text-to-Image Generation with Question-Answering

Despite the impressive success of text-to-image (TTI) generation models, existing studies overlook the issue of whether these models accurately convey factual information. In this paper, we focus on the problem of image hallucination, where images created by generation models fail to faithfully depict factual content. To address this, we introduce I-HallA (Image Hallucination evaluation with Question Answering), a novel automated evaluation metric that measures the factuality of generated images through visual question answering (VQA). We also introduce I-HallA v1.0, a curated benchmark dataset for this purpose. As part of this process, we develop a pipeline that generates high-quality question-answer pairs using multiple GPT-4 Omni-based agents, with human judgments to ensure accuracy. Our evaluation protocols measure image hallucination by testing if images from existing text-to-image models can correctly respond to these questions. The I-HallA v1.0 dataset comprises 1.2K diverse image-text pairs across nine categories with 1,000 rigorously curated questions covering various compositional challenges. We evaluate five text-to-image models using I-HallA and reveal that these state-of-the-art models often fail to accurately convey factual information. Moreover, we validate the reliability of our metric by demonstrating a strong Spearman correlation (rho=0.95) with human judgments. We believe our benchmark dataset and metric can serve as a foundation for developing factually accurate text-to-image generation models.

Updated: 2024-10-01 04:19:55

标题: 使用问答评估文本到图像生成中的图像幻觉

摘要: 尽管文本到图像（TTI）生成模型取得了令人印象深刻的成功，但现有研究忽视了这些模型是否能准确传达事实信息的问题。在本文中，我们专注于图像幻觉的问题，即由生成模型创建的图像未能忠实地描绘事实内容。为了解决这个问题，我们引入了I-HallA（使用问答评估图像幻觉），这是一种通过视觉问答（VQA）测量生成图像事实性的新颖自动评估指标。我们还引入了I-HallA v1.0，这是一个专门为此目的而精心策划的基准数据集。作为这一过程的一部分，我们开发了一个流程，使用多个基于GPT-4 Omni的代理生成高质量的问题-答案对，并通过人类判断确保准确性。我们的评估协议通过测试现有文本到图像模型的图像是否能正确回答这些问题来衡量图像幻觉。I-HallA v1.0数据集包括涵盖九个类别的1.2K多样化图像文本对，其中包含1,000个严格策划的问题，涵盖各种构图挑战。我们使用I-HallA评估了五种文本到图像模型，并揭示这些最先进模型经常未能准确传达事实信息。此外，我们通过展示与人类判断之间的强大Spearman相关性（rho=0.95）验证了我们指标的可靠性。我们相信我们的基准数据集和指标可以为开发事实准确的文本到图像生成模型奠定基础。

更新时间: 2024-10-01 04:19:55

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2409.12784v3

Generative Precipitation Downscaling using Score-based Diffusion with Wasserstein Regularization

Understanding local risks from extreme rainfall, such as flooding, requires both long records (to sample rare events) and high-resolution products (to assess localized hazards). Unfortunately, there is a dearth of long-record and high-resolution products that can be used to understand local risk and precipitation science. In this paper, we present a novel generative diffusion model that downscales (super-resolves) globally available Climate Prediction Center (CPC) gauge-based precipitation products and ERA5 reanalysis data to generate kilometer-scale precipitation estimates. Downscaling gauge-based precipitation from 55 km to 1 km while recovering extreme rainfall signals poses significant challenges. To enforce our model (named WassDiff) to produce well-calibrated precipitation intensity values, we introduce a Wasserstein Distance Regularization (WDR) term for the score-matching training objective in the diffusion denoising process. We show that WDR greatly enhances the model's ability to capture extreme values compared to diffusion without WDR. Extensive evaluation shows that WassDiff has better reconstruction accuracy and bias scores than conventional score-based diffusion models. Case studies of extreme weather phenomena, like tropical storms and cold fronts, demonstrate WassDiff's ability to produce appropriate spatial patterns while capturing extremes. Such downscaling capability enables the generation of extensive km-scale precipitation datasets from existing historical global gauge records and current gauge measurements in areas without high-resolution radar.

Updated: 2024-10-01 04:12:40

标题: 使用基于分数的扩散和Wasserstein正则化进行降尺度的生成性降水

摘要: 理解极端降雨风险（如洪水）需要长期记录（以采样罕见事件）和高分辨率产品（以评估局部危险）。不幸的是，缺乏可用于理解本地风险和降水科学的长期记录和高分辨率产品。在本文中，我们提出了一种新颖的生成扩散模型，将全球可用的气候预测中心（CPC）基于降水计量的产品和ERA5再分析数据下缩放（超分辨率）以生成公里尺度的降水估算。将基于计量的降水从55公里下缩放到1公里，同时恢复极端降雨信号面临重大挑战。为了强制我们的模型（命名为WassDiff）产生良好校准的降水强度数值，我们在扩散去噪过程中引入了Wasserstein距离正则化（WDR）项用于得分匹配训练目标。我们展示了WDR相对于没有WDR的扩散能极大增强模型捕获极端值的能力。广泛的评估显示，WassDiff比传统基于得分的扩散模型具有更好的重建精度和偏差分数。对极端天气现象的案例研究，如热带风暴和冷锋，展示了WassDiff产生适当空间模式并捕捉极端值的能力。这种下缩放能力使得可以从现有历史全球计量记录和没有高分辨率雷达的区域的当前计量测量生成广泛的公里尺度降水数据集。

更新时间: 2024-10-01 04:12:40

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.00381v1

Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs

In human-written articles, we often leverage the subtleties of text style, such as bold and italics, to guide the attention of readers. These textual emphases are vital for the readers to grasp the conveyed information. When interacting with large language models (LLMs), we have a similar need -- steering the model to pay closer attention to user-specified information, e.g., an instruction. Existing methods, however, are constrained to process plain text and do not support such a mechanism. This motivates us to introduce PASTA -- Post-hoc Attention STeering Approach, a method that allows LLMs to read text with user-specified emphasis marks. To this end, PASTA identifies a small subset of attention heads and applies precise attention reweighting on them, directing the model attention to user-specified parts. Like prompting, PASTA is applied at inference time and does not require changing any model parameters. Experiments demonstrate that PASTA can substantially enhance an LLM's ability to follow user instructions or integrate new knowledge from user inputs, leading to a significant performance improvement on a variety of tasks, e.g., an average accuracy improvement of 22% for LLAMA-7B. Our code is publicly available at https://github.com/QingruZhang/PASTA .

Updated: 2024-10-01 04:10:34

标题: 告诉你的模型在哪里关注：LLMs的事后注意力引导

摘要: 在人类撰写的文章中，我们经常利用文本样式的微妙之处，例如粗体和斜体，来引导读者的注意力。这些文本强调对于读者掌握传达的信息至关重要。当与大型语言模型（LLMs）进行交互时，我们有类似的需求--引导模型更加关注用户指定的信息，例如指令。然而，现有方法受限于处理纯文本，并不支持这样的机制。这促使我们引入PASTA--后期注意力引导方法，一种允许LLMs读取带有用户指定强调标记的文本的方法。为此，PASTA识别出一小部分注意力头，并在它们上应用精确的注意力重新加权，将模型的注意力引导到用户指定的部分。与提示类似，PASTA是在推理时间应用的，并不需要改变任何模型参数。实验表明，PASTA可以显著增强LLMs遵循用户指令或集成来自用户输入的新知识的能力，从而在各种任务上实现显著的性能提升，例如对于LLAMA-7B的平均准确性提升了22%。我们的代码可以在https://github.com/QingruZhang/PASTA 上公开获取。

更新时间: 2024-10-01 04:10:34

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2311.02262v2

CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset

X-ray image-based medical report generation (MRG) is a pivotal area in artificial intelligence which can significantly reduce diagnostic burdens and patient wait times. Despite significant progress, we believe that the task has reached a bottleneck due to the limited benchmark datasets and the existing large models' insufficient capability enhancements in this specialized domain. Specifically, the recently released CheXpert Plus dataset lacks comparative evaluation algorithms and their results, providing only the dataset itself. This situation makes the training, evaluation, and comparison of subsequent algorithms challenging. Thus, we conduct a comprehensive benchmarking of existing mainstream X-ray report generation models and large language models (LLMs), on the CheXpert Plus dataset. We believe that the proposed benchmark can provide a solid comparative basis for subsequent algorithms and serve as a guide for researchers to quickly grasp the state-of-the-art models in this field. More importantly, we propose a large model for the X-ray image report generation using a multi-stage pre-training strategy, including self-supervised autoregressive generation and Xray-report contrastive learning, and supervised fine-tuning. Extensive experimental results indicate that the autoregressive pre-training based on Mamba effectively encodes X-ray images, and the image-text contrastive pre-training further aligns the feature spaces, achieving better experimental results. Source code can be found on \url{https://github.com/Event-AHU/Medical_Image_Analysis}.

Updated: 2024-10-01 04:07:01

标题: CXPMRG-Bench：CheXpert Plus数据集上X射线医学报告生成的预训练和基准测试

摘要: 基于X射线图像的医疗报告生成（MRG）是人工智能领域的一个关键领域，可以显著减少诊断负担和患者等待时间。尽管取得了显著进展，但我们认为由于有限的基准数据集和现有大型模型在这一专业领域的能力增强不足，这一任务已经达到了瓶颈。具体来说，最近发布的CheXpert Plus数据集缺乏比较评估算法及其结果，仅提供数据集本身。这种情况使得后续算法的训练、评估和比较变得具有挑战性。因此，我们在CheXpert Plus数据集上对现有主流X射线报告生成模型和大型语言模型（LLMs）进行了全面的基准测试。我们认为所提出的基准测试可以为后续算法提供坚实的比较基础，并作为研究人员快速掌握该领域最新模型的指南。更重要的是，我们提出了一种用于X射线图像报告生成的大型模型，采用多阶段预训练策略，包括自监督自回归生成和X射线报告对比学习，以及监督微调。广泛的实验结果表明，基于Mamba的自回归预训练有效地编码X射线图像，并且图像文本对比预训练进一步对齐特征空间，实现更好的实验结果。源代码可在\url{https://github.com/Event-AHU/Medical_Image_Analysis}上找到。

更新时间: 2024-10-01 04:07:01

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.00379v1

Optimizing and Evaluating Enterprise Retrieval-Augmented Generation (RAG): A Content Design Perspective

Retrieval-augmented generation (RAG) is a popular technique for using large language models (LLMs) to build customer-support, question-answering solutions. In this paper, we share our team's practical experience building and maintaining enterprise-scale RAG solutions that answer users' questions about our software based on product documentation. Our experience has not always matched the most common patterns in the RAG literature. This paper focuses on solution strategies that are modular and model-agnostic. For example, our experience over the past few years - using different search methods and LLMs, and many knowledge base collections - has been that simple changes to the way we create knowledge base content can have a huge impact on our RAG solutions' success. In this paper, we also discuss how we monitor and evaluate results. Common RAG benchmark evaluation techniques have not been useful for evaluating responses to novel user questions, so we have found a flexible, "human in the lead" approach is required.

Updated: 2024-10-01 03:54:45

标题: 优化和评估企业检索增强生成（RAG）：内容设计视角

摘要: 检索增强生成（RAG）是一种流行的技术，用于利用大型语言模型（LLMs）构建客户支持、问题解答解决方案。本文分享了我们团队在建立和维护基于产品文档回答用户关于软件问题的企业规模RAG解决方案方面的实践经验。我们的经验并不总是符合RAG文献中最常见的模式。本文侧重于模块化和模型无关的解决方案策略。例如，我们在过去几年中使用不同的搜索方法和LLMs以及许多知识库集合的经验表明，对于我们的RAG解决方案成功，对知识库内容创建方式的简单更改可能会产生巨大影响。在本文中，我们还讨论了如何监控和评估结果。常见的RAG基准评估技术并没有对评估对新用户问题的回答有用，因此我们发现需要采用灵活的“人为主导”的方法。

更新时间: 2024-10-01 03:54:45

领域: cs.IR,cs.AI

下载: http://arxiv.org/abs/2410.12812v1

Robust Traffic Forecasting against Spatial Shift over Years

Recent advancements in Spatiotemporal Graph Neural Networks (ST-GNNs) and Transformers have demonstrated promising potential for traffic forecasting by effectively capturing both temporal and spatial correlations. The generalization ability of spatiotemporal models has received considerable attention in recent scholarly discourse. However, no substantive datasets specifically addressing traffic out-of-distribution (OOD) scenarios have been proposed. Existing ST-OOD methods are either constrained to testing on extant data or necessitate manual modifications to the dataset. Consequently, the generalization capacity of current spatiotemporal models in OOD scenarios remains largely underexplored. In this paper, we investigate state-of-the-art models using newly proposed traffic OOD benchmarks and, surprisingly, find that these models experience a significant decline in performance. Through meticulous analysis, we attribute this decline to the models' inability to adapt to previously unobserved spatial relationships. To address this challenge, we propose a novel Mixture of Experts (MoE) framework, which learns a set of graph generators (i.e., graphons) during training and adaptively combines them to generate new graphs based on novel environmental conditions to handle spatial distribution shifts during testing. We further extend this concept to the Transformer architecture, achieving substantial improvements. Our method is both parsimonious and efficacious, and can be seamlessly integrated into any spatiotemporal model, outperforming current state-of-the-art approaches in addressing spatial dynamics.

Updated: 2024-10-01 03:49:29

标题: 抗击多年来的空间漂移的稳健交通预测

摘要: 最近在时空图神经网络（ST-GNNs）和Transformer方面的进展表明，通过有效捕捉时间和空间相关性，这些方法在交通预测方面展现出了很大的潜力。时空模型的泛化能力在近期的学术讨论中受到了相当大的关注。然而，目前尚未提出针对交通场景中的分布外（OOD）情况的实质性数据集。现有的ST-OOD方法要么受限于对现有数据的测试，要么需要对数据集进行手动修改。因此，当前时空模型在OOD场景中的泛化能力仍然大部分未被探索。本文通过使用新提出的交通OOD基准来研究最先进的模型，并令人惊讶地发现这些模型在性能上出现了显著下降。通过细致的分析，我们将这种下降归因于模型无法适应先前未观察到的空间关系。为了解决这一挑战，我们提出了一种新颖的专家混合（MoE）框架，在训练过程中学习一组图生成器（即图形），并在测试过程中根据新的环境条件自适应地将它们组合起来生成新的图来处理空间分布的变化。我们进一步将这个概念扩展到Transformer架构，取得了显著的改进。我们的方法既简洁又有效，并且可以无缝地集成到任何时空模型中，优于当前解决空间动态问题的最先进方法。

更新时间: 2024-10-01 03:49:29

领域: cs.LG,cs.AI,cs.DB,stat.ML

下载: http://arxiv.org/abs/2410.00373v1

Cloud-based XAI Services for Assessing Open Repository Models Under Adversarial Attacks

The opacity of AI models necessitates both validation and evaluation before their integration into services. To investigate these models, explainable AI (XAI) employs methods that elucidate the relationship between input features and output predictions. The operations of XAI extend beyond the execution of a single algorithm, involving a series of activities that include preprocessing data, adjusting XAI to align with model parameters, invoking the model to generate predictions, and summarizing the XAI results. Adversarial attacks are well-known threats that aim to mislead AI models. The assessment complexity, especially for XAI, increases when open-source AI models are subject to adversarial attacks, due to various combinations. To automate the numerous entities and tasks involved in XAI-based assessments, we propose a cloud-based service framework that encapsulates computing components as microservices and organizes assessment tasks into pipelines. The current XAI tools are not inherently service-oriented. This framework also integrates open XAI tool libraries as part of the pipeline composition. We demonstrate the application of XAI services for assessing five quality attributes of AI models: (1) computational cost, (2) performance, (3) robustness, (4) explanation deviation, and (5) explanation resilience across computer vision and tabular cases. The service framework generates aggregated analysis that showcases the quality attributes for more than a hundred combination scenarios.

Updated: 2024-10-01 03:41:26

标题: 基于云的XAI服务，用于评估遭受对抗攻击的开放存储库模型

摘要: 人工智能模型的不透明性需要在将其整合到服务之前进行验证和评估。为了调查这些模型，可解释的人工智能（XAI）采用方法阐明输入特征和输出预测之间的关系。XAI的操作不仅限于执行单个算法，还涉及一系列活动，包括数据预处理、调整XAI以与模型参数对齐、调用模型生成预测以及总结XAI结果。对AI模型进行误导的对抗性攻击是众所周知的威胁。评估复杂性，特别是对XAI而言，当开源AI模型受到各种组合的对抗性攻击时会增加。为了自动化涉及XAI评估的众多实体和任务，我们提出了一个基于云的服务框架，将计算组件封装为微服务，并将评估任务组织成管道。目前的XAI工具并非天然面向服务。该框架还将开源XAI工具库整合为管道组成的一部分。我们演示了XAI服务应用于评估AI模型的五个质量属性：（1）计算成本、（2）性能、（3）鲁棒性、（4）解释偏差和（5）解释弹性，在计算机视觉和表格案例中。该服务框架生成了汇总分析，展示了一百多种组合场景的质量属性。

更新时间: 2024-10-01 03:41:26

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2401.12261v4

Block-Attention for Efficient RAG

We introduce Block-Attention, an attention mechanism designed to address the increased inference latency and cost in Retrieval-Augmented Generation (RAG) scenarios. Traditional approaches often encode the entire context. Instead, Block-Attention divides retrieved documents into discrete blocks, with each block independently calculating key-value (KV) states except for the final block. In RAG scenarios, by defining each passage as a block, Block-Attention enables us to reuse the KV states of passages that have been seen before, thereby significantly reducing the latency and the computation overhead during inference. The implementation of Block-Attention involves block segmentation, position re-encoding, and fine-tuning the LLM to adapt to the Block-Attention mechanism. Experiments on four RAG benchmarks demonstrate that after block fine-tuning, the Block-Attention model achieves performance comparable to self-attention models (68.4\% vs 67.9\% on Llama3) or even superior performance (62.8\% vs 59.6\% on Mistral). Notably, Block-Attention significantly reduces the time to first token (TTFT) and floating point operations (FLOPs) to a very low level. It only takes 45 ms to output the first token for an input sequence with a total length of 32K. Compared to the self-attention models, the time consumption and corresponding FLOPs are reduced by 98.7\% and 99.8\%, respectively.

Updated: 2024-10-01 03:40:08

标题: 块级关注以提高RAG效率

摘要: 我们介绍了Block-Attention，这是一种专门设计用来解决检索增强生成（RAG）场景中增加的推理延迟和成本的注意力机制。传统方法通常对整个上下文进行编码。相反，Block-Attention将已检索的文档分成离散的块，每个块都独立计算关键值（KV）状态，除了最后一个块。在RAG场景中，通过将每个段落定义为一个块，Block-Attention使我们能够重复使用之前已经查看过的段落的KV状态，从而显著减少推理过程中的延迟和计算开销。Block-Attention的实现涉及块分割、位置重新编码以及对LLM进行微调以适应Block-Attention机制。对四个RAG基准的实验表明，在块微调之后，Block-Attention模型实现了与自注意力模型相当的性能（Llama3上为68.4％vs 67.9％）甚至更高的性能（Mistral上为62.8％vs 59.6％）。值得注意的是，Block-Attention显著降低了第一个标记的时间（TTFT）和浮点操作（FLOPs）到一个非常低的水平。对于总长度为32K的输入序列，仅需要45毫秒输出第一个标记。与自注意力模型相比，时间消耗和相应的FLOPs分别减少了98.7％和99.8％。

更新时间: 2024-10-01 03:40:08

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2409.15355v3

A SSM is Polymerized from Multivariate Time Series

For multivariate time series (MTS) tasks, previous state space models (SSMs) followed the modeling paradigm of Transformer-based methods. However, none of them explicitly model the complex dependencies of MTS: the Channel Dependency variations with Time (CDT). In view of this, we delve into the derivation of SSM, which involves approximating continuously updated functions by orthogonal function basis. We then develop Poly-Mamba, a novel method for MTS forecasting. Its core concept is to expand the original orthogonal function basis space into a multivariate orthogonal function space containing variable mixing terms, and make a projection on this space so as to explicitly describe the CDT by weighted coefficients. In Poly-Mamba, we propose the Multivariate Orthogonal Polynomial Approximation (MOPA) as a simplified implementation of this concept. For the simple linear relationship between channels, we propose Linear Channel Mixing (LCM) and generate CDT patterns adaptively for different channels through a proposed Order Combining method. Experiments on six real-world datasets demonstrate that Poly-Mamba outperforms the SOTA methods, especially when dealing with datasets having a large number of channels and complex correlations. The codes and log files will be released at: https://github.com/Joeland4/Poly-Mamba.

Updated: 2024-10-01 03:32:24

标题: 一种基于多变量时间序列的SSM聚合方法

摘要: 对于多变量时间序列（MTS）任务，先前的状态空间模型（SSMs）遵循了基于Transformer的方法的建模范式。然而，它们中没有一个明确地建模了MTS的复杂依赖关系：通道依赖性随时间的变化（CDT）。鉴于此，我们深入研究了SSM的推导过程，其中涉及通过正交函数基逼近持续更新的函数。然后，我们开发了Poly-Mamba，一种用于MTS预测的新方法。其核心概念是将原始正交函数基空间扩展为包含可变混合项的多变量正交函数空间，并在这个空间上进行投影，以明确描述CDT的加权系数。在Poly-Mamba中，我们提出了多变量正交多项式近似（MOPA）作为这一概念的简化实现。针对通道之间的简单线性关系，我们提出了线性通道混合（LCM），并通过提出的顺序组合方法自适应地为不同通道生成CDT模式。对六个真实世界数据集的实验表明，Poly-Mamba比SOTA方法表现更好，特别是在处理具有大量通道和复杂相关性的数据集时。代码和日志文件将在以下网址发布：https://github.com/Joeland4/Poly-Mamba。

更新时间: 2024-10-01 03:32:24

领域: cs.LG

下载: http://arxiv.org/abs/2409.20310v2

Easydiagnos: a framework for accurate feature selection for automatic diagnosis in smart healthcare

The rapid advancements in artificial intelligence (AI) have revolutionized smart healthcare, driving innovations in wearable technologies, continuous monitoring devices, and intelligent diagnostic systems. However, security, explainability, robustness, and performance optimization challenges remain critical barriers to widespread adoption in clinical environments. This research presents an innovative algorithmic method using the Adaptive Feature Evaluator (AFE) algorithm to improve feature selection in healthcare datasets and overcome problems. AFE integrating Genetic Algorithms (GA), Explainable Artificial Intelligence (XAI), and Permutation Combination Techniques (PCT), the algorithm optimizes Clinical Decision Support Systems (CDSS), thereby enhancing predictive accuracy and interpretability. The proposed method is validated across three diverse healthcare datasets using six distinct machine learning algorithms, demonstrating its robustness and superiority over conventional feature selection techniques. The results underscore the transformative potential of AFE in smart healthcare, enabling personalized and transparent patient care. Notably, the AFE algorithm, when combined with a Multi-layer Perceptron (MLP), achieved an accuracy of up to 98.5%, highlighting its capability to improve clinical decision-making processes in real-world healthcare applications.

Updated: 2024-10-01 03:28:56

标题: Easydiagnos：智能医疗中准确特征选择的框架

摘要: 人工智能（AI）的快速发展已经彻底改变了智能医疗，推动了可穿戴技术、连续监测设备和智能诊断系统的创新。然而，在临床环境中，安全性、可解释性、稳健性和性能优化等挑战仍然是广泛应用的关键障碍。本研究提出了一种创新的算法方法，使用自适应特征评估器（AFE）算法改善医疗数据集中的特征选择，并克服问题。AFE将遗传算法（GA）、可解释人工智能（XAI）和排列组合技术（PCT）集成在一起，该算法优化临床决策支持系统（CDSS），从而提高预测准确性和可解释性。提出的方法在三个不同的医疗数据集上通过六种不同的机器学习算法进行验证，证明了其对传统特征选择技术的稳健性和优越性。结果强调了AFE在智能医疗中的变革潜力，实现了个性化和透明的患者护理。值得注意的是，当AFE算法与多层感知器（MLP）结合时，准确率可达98.5%，突显了其在现实世界医疗应用中改善临床决策过程的能力。

更新时间: 2024-10-01 03:28:56

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2410.00366v1

FedPT: Federated Proxy-Tuning of Large Language Models on Resource-Constrained Edge Devices

Despite demonstrating superior performance across a variety of linguistic tasks, pre-trained large language models (LMs) often require fine-tuning on specific datasets to effectively address different downstream tasks. However, fine-tuning these LMs for downstream tasks necessitates collecting data from individuals, which raises significant privacy concerns. Federated learning (FL) has emerged as the de facto solution, enabling collaborative model training without sharing raw data. While promising, federated fine-tuning of large LMs faces significant challenges, including restricted access to model parameters and high computation, communication, and memory overhead. To address these challenges, this paper introduces \textbf{Fed}erated \textbf{P}roxy-\textbf{T}uning (FedPT), a novel framework for federated fine-tuning of black-box large LMs, requiring access only to their predictions over the output vocabulary instead of their parameters. Specifically, devices in FedPT first collaboratively tune a smaller LM, and then the server combines the knowledge learned by the tuned small LM with the knowledge learned by the larger pre-trained LM to construct a large proxy-tuned LM that can reach the performance of directly tuned large LMs. The experimental results demonstrate that FedPT can significantly reduce computation, communication, and memory overhead while maintaining competitive performance compared to directly federated fine-tuning of large LMs. FedPT offers a promising solution for efficient, privacy-preserving fine-tuning of large LMs on resource-constrained devices, broadening the accessibility and applicability of state-of-the-art large LMs.

Updated: 2024-10-01 03:20:39

标题: FedPT: 在资源受限的边缘设备上对大型语言模型进行联合代理调优

摘要: 尽管预训练的大型语言模型（LMs）在各种语言任务中表现出优越的性能，但通常需要在特定数据集上进行微调，以有效地解决不同的下游任务。然而，为了下游任务微调这些LMs，需要收集个人数据，这引发了重大的隐私问题。联邦学习（FL）已成为事实上的解决方案，实现了协作模型训练，而无需共享原始数据。尽管有希望，大型LMs的联邦微调面临着重大挑战，包括对模型参数的访问受限以及高计算、通信和内存开销。为了解决这些挑战，本文介绍了FedPT（联邦代理微调）这一新颖的框架，用于对黑匣子大型LMs进行联邦微调，只需访问它们对输出词汇的预测，而不是它们的参数。具体来说，FedPT中的设备首先协作微调一个较小的LM，然后服务器将调整后的小LM学到的知识与较大的预训练LM学到的知识结合起来，构建一个可以达到直接微调大型LMs性能的大型代理微调LM。实验结果表明，与直接联邦微调大型LMs相比，FedPT可以显著减少计算、通信和内存开销，同时保持竞争性能。FedPT为在资源受限设备上高效、隐私保护的对大型LMs进行微调提供了有希望的解决方案，拓宽了最先进大型LMs的可访问性和适用性。

更新时间: 2024-10-01 03:20:39

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.00362v1

Self-controller: Controlling LLMs with Multi-round Step-by-step Self-awareness

The applications of large language models (LLMs) have been widely spread across all domains. However, the basic abilities such as the controllability of LLMs are still limited. To address this, we propose "Self-controller", a novel agentic framework bringing self-awareness into LLMs' reasoning logic. The core idea of this work is to maintain states based on the LLM's response, letting the LLM become self-aware of current status and think step by step in a multi-round chain-of-thought paradigm. Our experiment on the state of textual length has shown the controllability and effectiveness of the Self-controller. We further implement a binary search algorithm to accelerate the generation process based on the linearity and monotonicity of the textual length state. Another advantage of the Self-controller comes with DeepSeek's Context Caching technology, which significantly saves computational token consumption when a cluster of conversations shares the same prefix of context. Theoretically, we prove that in this scenario the extra time complexity is $O(c \log n)$. Results of the back-of-the-envelope estimation suggest that the token consumption of our method is no more than twice as much as that of the trivial single-round generation. Furthermore, our ablation study on word constraints demonstrates the Self-controller's consistent controllability across all foundation models.

Updated: 2024-10-01 03:14:12

标题: 自我控制器：用多轮逐步自我感知控制LLMs

摘要: 大型语言模型（LLMs）的应用已经广泛传播到各个领域。然而，LLMs的基本能力，如可控性仍然有限。为了解决这个问题，我们提出了一种新颖的主动框架“自我控制器”，将自我意识引入LLMs的推理逻辑中。这项工作的核心思想是基于LLMs的响应维护状态，让LLMs意识到当前状态，并按照多轮思维链的范式逐步思考。我们对文本长度状态进行的实验显示了自我控制器的可控性和有效性。我们进一步实现了基于文本长度状态的线性和单调性的二分搜索算法，加速生成过程。自我控制器的另一个优势在于DeepSeek的上下文缓存技术，当一组对话共享相同的上下文前缀时，可以显著节省计算令牌消耗。在理论上，我们证明在这种情况下额外的时间复杂度为$O(c \log n)$。草稿估算的结果表明，我们方法的令牌消耗量不会超过单轮生成的两倍。此外，我们对单词约束的消融研究表明自我控制器在所有基础模型上都具有一致的可控性。

更新时间: 2024-10-01 03:14:12

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.00359v1

Privacy Evaluation Benchmarks for NLP Models

By inducing privacy attacks on NLP models, attackers can obtain sensitive information such as training data and model parameters, etc. Although researchers have studied, in-depth, several kinds of attacks in NLP models, they are non-systematic analyses. It lacks a comprehensive understanding of the impact caused by the attacks. For example, we must consider which scenarios can apply to which attacks, what the common factors are that affect the performance of different attacks, the nature of the relationships between different attacks, and the influence of various datasets and models on the effectiveness of the attacks, etc. Therefore, we need a benchmark to holistically assess the privacy risks faced by NLP models. In this paper, we present a privacy attack and defense evaluation benchmark in the field of NLP, which includes the conventional/small models and large language models (LLMs). This benchmark supports a variety of models, datasets, and protocols, along with standardized modules for comprehensive evaluation of attacks and defense strategies. Based on the above framework, we present a study on the association between auxiliary data from different domains and the strength of privacy attacks. And we provide an improved attack method in this scenario with the help of Knowledge Distillation (KD). Furthermore, we propose a chained framework for privacy attacks. Allowing a practitioner to chain multiple attacks to achieve a higher-level attack objective. Based on this, we provide some defense and enhanced attack strategies. The code for reproducing the results can be found at https://github.com/user2311717757/nlp_doctor.

Updated: 2024-10-01 03:12:35

标题: NLP模型的隐私评估基准

摘要: 通过对NLP模型进行隐私攻击，攻击者可以获取敏感信息，如训练数据和模型参数等。尽管研究人员深入研究了NLP模型中几种攻击，但它们并非系统性分析。缺乏对攻击造成影响的全面理解。例如，我们必须考虑哪些情景适用于哪些攻击，影响不同攻击性能的共同因素是什么，不同攻击之间的关系性质是什么，以及各种数据集和模型对攻击效果的影响等。因此，我们需要一个基准来全面评估NLP模型面临的隐私风险。在本文中，我们提出了一个NLP领域中的隐私攻击和防御评估基准，其中包括传统/小模型和大语言模型（LLMs）。这个基准支持各种模型、数据集和协议，包括用于全面评估攻击和防御策略的标准化模块。基于上述框架，我们对不同领域的辅助数据与隐私攻击强度之间的关联进行了研究。我们在这种情景中提出了一种改进的攻击方法，利用知识蒸馏（KD）的帮助。此外，我们提出了一个用于隐私攻击的链式框架。允许从业者链接多个攻击以实现更高级别的攻击目标。基于此，我们提供了一些防御和增强攻击策略。可以在https://github.com/user2311717757/nlp_doctor找到复现结果的代码。

更新时间: 2024-10-01 03:12:35

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2409.15868v3

Fair Ordering in Replicated Systems via Streaming Social Choice

Prior work studies the question of ``fairly'' ordering transactions in a replicated state machine. Each of $n$ replicas receives transactions in a possibly different order, and the system must aggregate the observed orderings into a single order. We argue that this problem is best viewed through the lens of social choice theory, in which (in the preference aggregation problem) rankings on candidates are aggregated into an election result. Two features make this problem novel. First, the number of transactions is unbounded, and an ordering must be defined over a countably infinite set. And second, decisions must be made quickly, with only partial information. Additionally, some faulty replicas might alter their reported observations; their influence on the output should be bounded and well understood. Prior work studies a ``$\gamma$-batch-order-fairness'' property, which divides an ordering into contiguous batches. If a $\gamma$ fraction of replicas receive $\tau$ before $\tau^\prime$, then $\tau^\prime$ cannot be in an earlier batch than $\tau$. We strengthen this definition to require that batches have minimal size ($\gamma$-batch-order-fairness can be vacuously satisfied by large batches) while accounting for the possibility of faulty replicas. This social choice lens enables an ordering protocol with strictly stronger fairness and liveness properties than prior work. We study the Ranked Pairs method. Analysis of how missing information moves through the algorithm allows our streaming version to know when it can output a transaction. Deliberate construction of a tiebreaking rule ensures our algorithm outputs a transaction after a bounded time (in a synchronous network). Prior work relies on a fixed choice of $\gamma$ and bound on the number of faulty replicas $f$, but our algorithm satisfies our definition for every $\frac{1}{2}<\gamma\leq 1$ simultaneously and for any $f$.

Updated: 2024-10-01 02:58:56

标题: 通过流式社会选择在复制系统中实现公平排序

摘要: 先前的工作研究了在复制状态机中“公平”地排序交易的问题。每个$n$个副本可能按不同顺序接收交易，系统必须将观察到的顺序聚合成一个单一的顺序。我们认为最好通过社会选择理论的视角来看待这个问题，在这个理论中（在偏好聚合问题中）对候选人的排名被聚合成选举结果。这个问题有两个新颖的特点。首先，交易数量是无限的，必须在可数无限集合上定义顺序。其次，必须快速做出决策，只有部分信息。此外，一些有缺陷的副本可能改变其报告的观察结果；它们对输出的影响应该是有界且被充分理解的。先前的工作研究了“$\gamma$-批量顺序公平性”属性，它将一个排序分成连续的批次。如果$\gamma$分数的副本在$\tau$之前收到$\tau^\prime$，那么$\tau^\prime$不能在比$\tau$更早的批次中。我们加强了这个定义，要求批次具有最小的大小（$\gamma$-批量顺序公平性可以被大批次满足），同时考虑了有缺陷副本的可能性。这种社会选择的视角使得一个排序协议具有比先前工作更强的公平性和活性特性。我们研究了排名对方法。通过分析信息缺失如何通过算法传递，使得我们的流式版本知道何时可以输出一个交易。经过深思熟虑的决策规则的设计确保我们的算法在有界时间后输出一个交易（在同步网络中）。先前的工作依赖于固定的$\gamma$选择和有缺陷副本数量$f$的限制，但我们的算法同时满足了我们的定义对于每个$\frac{1}{2}<\gamma\leq 1$和任意$f$。

更新时间: 2024-10-01 02:58:56

领域: cs.CR

下载: http://arxiv.org/abs/2304.02730v4

Exploring Semantic Clustering in Deep Reinforcement Learning for Video Games

In this paper, we investigate the semantic clustering properties of deep reinforcement learning (DRL) for video games, enriching our understanding of the internal dynamics of DRL and advancing its interpretability. In this context, semantic clustering refers to the inherent capacity of neural networks to internally group video inputs based on semantic similarity. To achieve this, we propose a novel DRL architecture that integrates a semantic clustering module featuring both feature dimensionality reduction and online clustering. This module seamlessly integrates into the DRL training pipeline, addressing instability issues observed in previous t-SNE-based analysis methods and eliminating the necessity for extensive manual annotation of semantic analysis. Through experiments, we validate the effectiveness of the proposed module and the semantic clustering properties in DRL for video games. Additionally, based on these properties, we introduce new analytical methods to help understand the hierarchical structure of policies and the semantic distribution within the feature space.

Updated: 2024-10-01 02:54:41

标题: 探索深度强化学习在视频游戏中的语义聚类

摘要: 在本文中，我们研究了深度强化学习（DRL）在视频游戏中的语义聚类特性，丰富了我们对DRL内部动态的理解，推动了其可解释性的发展。在这种情况下，语义聚类是指神经网络根据语义相似性在内部对视频输入进行分组的固有能力。为了实现这一目标，我们提出了一种新颖的DRL架构，它集成了一个特征维度缩减和在线聚类的语义聚类模块。该模块无缝地集成到DRL训练流程中，解决了先前基于t-SNE的分析方法中观察到的不稳定性问题，并消除了对语义分析的广泛手动标注的必要性。通过实验证实了所提出模块和DRL在视频游戏中的语义聚类特性的有效性。此外，基于这些特性，我们引入了新的分析方法，以帮助理解策略的层次结构和特征空间内的语义分布。

更新时间: 2024-10-01 02:54:41

领域: cs.AI

下载: http://arxiv.org/abs/2409.17411v3

ParFormer: A Vision Transformer with Parallel Mixer and Sparse Channel Attention Patch Embedding

Convolutional Neural Networks (CNNs) and Transformers have achieved remarkable success in computer vision tasks. However, their deep architectures often lead to high computational redundancy, making them less suitable for resource-constrained environments, such as edge devices. This paper introduces ParFormer, a novel vision transformer that addresses this challenge by incorporating a Parallel Mixer and a Sparse Channel Attention Patch Embedding (SCAPE). By combining convolutional and attention mechanisms, ParFormer improves feature extraction. This makes spatial feature extraction more efficient and cuts down on unnecessary computation. The SCAPE module further reduces computational redundancy while preserving essential feature information during down-sampling. Experimental results on the ImageNet-1K dataset show that ParFormer-T achieves 78.9\% Top-1 accuracy with a high throughput on a GPU that outperforms other small models with 2.56$\times$ higher throughput than MobileViT-S, 0.24\% faster than FasterNet-T2, and 1.79$\times$ higher than EdgeNeXt-S. For edge device deployment, ParFormer-T excels with a throughput of 278.1 images/sec, which is 1.38 $\times$ higher than EdgeNeXt-S and 2.36$\times$ higher than MobileViT-S, making it highly suitable for real-time applications in resource-constrained settings. The larger variant, ParFormer-L, reaches 83.5\% Top-1 accuracy, offering a balanced trade-off between accuracy and efficiency, surpassing many state-of-the-art models. In COCO object detection, ParFormer-M achieves 40.7 AP for object detection and 37.6 AP for instance segmentation, surpassing models like ResNet-50, PVT-S and PoolFormer-S24 with significantly higher efficiency. These results validate ParFormer as a highly efficient and scalable model for both high-performance and resource-constrained scenarios, making it an ideal solution for edge-based AI applications.

Updated: 2024-10-01 02:48:20

标题: ParFormer：具有并行混合器和稀疏通道注意力补丁嵌入的视觉Transformer

摘要: 卷积神经网络（CNNs）和Transformer在计算机视觉任务中取得了显著的成功。然而，它们深层的架构通常导致高度的计算冗余，使它们在资源受限的环境（如边缘设备）中不太适用。本文介绍了ParFormer，一种新颖的视觉Transformer，通过引入并行混合器和稀疏通道注意力补丁嵌入（SCAPE）来应对这一挑战。通过结合卷积和注意机制，ParFormer改进了特征提取，使空间特征提取更加高效，并减少了不必要的计算。SCAPE模块进一步减少了计算冗余，同时在下采样过程中保留了重要的特征信息。在ImageNet-1K数据集上的实验结果显示，ParFormer-T在GPU上实现了78.9\%的Top-1准确率，具有较高的吞吐量，优于其他小型模型，比MobileViT-S高2.56倍，比FasterNet-T2快0.24\%，比EdgeNeXt-S高1.79倍。对于边缘设备部署，ParFormer-T在每秒278.1张图像的吞吐量上表现出色，比EdgeNeXt-S高1.38倍，比MobileViT-S高2.36倍，非常适合在资源受限的环境中进行实时应用。更大的变体，ParFormer-L，达到83.5\%的Top-1准确率，在准确性和效率之间取得了平衡，超过了许多最先进的模型。在COCO目标检测中，ParFormer-M实现了40.7的目标检测AP和37.6的实例分割AP，超过了ResNet-50，PVT-S和PoolFormer-S24等模型，具有显著更高的效率。这些结果验证了ParFormer作为一种高效和可扩展的模型，适用于高性能和资源受限的场景，是边缘AI应用的理想解决方案。

更新时间: 2024-10-01 02:48:20

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2403.15004v2

Almost Sure Convergence of Average Reward Temporal Difference Learning

Tabular average reward Temporal Difference (TD) learning is perhaps the simplest and the most fundamental policy evaluation algorithm in average reward reinforcement learning. After at least 25 years since its discovery, we are finally able to provide a long-awaited almost sure convergence analysis. Namely, we are the first to prove that, under very mild conditions, tabular average reward TD converges almost surely to a sample path dependent fixed point. Key to this success is a new general stochastic approximation result concerning nonexpansive mappings with Markovian and additive noise, built on recent advances in stochastic Krasnoselskii-Mann iterations.

Updated: 2024-10-01 02:33:09

标题: 平均奖励时序差分学习的几乎必然收敛

摘要: 表格平均奖励时间差分（TD）学习可能是平均奖励强化学习中最简单、最基本的策略评估算法。自其发现以来至少已经过去了25年，我们终于能够提供一个期待已久的几乎肯定收敛分析。换句话说，我们是第一个证明，在非常温和的条件下，表格平均奖励TD几乎肯定收敛于样本路径相关的固定点的人。这一成功的关键在于一个新的关于非扩张映射的一般随机逼近结果，其中包含有马尔可夫和加性噪声，这是基于最近在随机Krasnoselskii-Mann迭代中的进展。

更新时间: 2024-10-01 02:33:09

领域: cs.LG,cs.AI,math.OC,stat.ML

下载: http://arxiv.org/abs/2409.19546v2

Grammar Induction from Visual, Speech and Text

Grammar Induction could benefit from rich heterogeneous signals, such as text, vision, and acoustics. In the process, features from distinct modalities essentially serve complementary roles to each other. With such intuition, this work introduces a novel \emph{unsupervised visual-audio-text grammar induction} task (named \textbf{VAT-GI}), to induce the constituent grammar trees from parallel images, text, and speech inputs. Inspired by the fact that language grammar natively exists beyond the texts, we argue that the text has not to be the predominant modality in grammar induction. Thus we further introduce a \emph{textless} setting of VAT-GI, wherein the task solely relies on visual and auditory inputs. To approach the task, we propose a visual-audio-text inside-outside recursive autoencoder (\textbf{VaTiora}) framework, which leverages rich modal-specific and complementary features for effective grammar parsing. Besides, a more challenging benchmark data is constructed to assess the generalization ability of VAT-GI system. Experiments on two benchmark datasets demonstrate that our proposed VaTiora system is more effective in incorporating the various multimodal signals, and also presents new state-of-the-art performance of VAT-GI.

Updated: 2024-10-01 02:24:18

标题: 从视觉、语音和文本中识别语法

摘要: 语法归纳可以从丰富的异构信号中受益，例如文本、视觉和声学。在这个过程中，来自不同模态的特征基本上起着互补的作用。基于这样的直觉，这项工作引入了一项新颖的\emph{无监督视觉-音频-文本语法归纳}任务（称为\textbf{VAT-GI}），从平行图像、文本和语音输入中诱导组成语法树。受到语言语法本质上存在于文本之外的事实的启发，我们认为文本不必是语法归纳中主导的模态。因此，我们进一步引入了一个\emph{无文本}设置的VAT-GI，其中任务仅依赖于视觉和听觉输入。为了解决这个任务，我们提出了一个视觉-音频-文本内外递归自动编码器（\textbf{VaTiora}）框架，利用丰富的模态特定和互补特征来进行有效的语法解析。此外，构建了一个更具挑战性的基准数据集，用于评估VAT-GI系统的泛化能力。在两个基准数据集上的实验表明，我们提出的VaTiora系统在整合各种多模态信号方面更有效，并且还展示了VAT-GI的新的最先进性能。

更新时间: 2024-10-01 02:24:18

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.03739v1

Preserving Generalization of Language models in Few-shot Continual Relation Extraction

Few-shot Continual Relations Extraction (FCRE) is an emerging and dynamic area of study where models can sequentially integrate knowledge from new relations with limited labeled data while circumventing catastrophic forgetting and preserving prior knowledge from pre-trained backbones. In this work, we introduce a novel method that leverages often-discarded language model heads. By employing these components via a mutual information maximization strategy, our approach helps maintain prior knowledge from the pre-trained backbone and strategically aligns the primary classification head, thereby enhancing model performance. Furthermore, we explore the potential of Large Language Models (LLMs), renowned for their wealth of knowledge, in addressing FCRE challenges. Our comprehensive experimental results underscore the efficacy of the proposed method and offer valuable insights for future work.

Updated: 2024-10-01 02:22:34

标题: 在少样本持续关系抽取中保留语言模型的泛化效果

摘要: Few-shot Continual Relations Extraction (FCRE)是一个新兴且动态的研究领域，其中模型可以顺序地整合来自新关系的知识，同时避免灾难性遗忘，并保留来自预训练骨干的先前知识。在这项工作中，我们提出了一种利用通常被丢弃的语言模型头部的新方法。通过采用这些组件并通过互信息最大化策略，我们的方法有助于保持来自预先训练骨干的先前知识，并战略性地对齐主分类头部，从而提升模型性能。此外，我们探讨了大型语言模型（LLMs）在应对FCRE挑战中的潜力。我们的全面实验结果强调了所提出方法的有效性，并为未来的工作提供了宝贵的见解。

更新时间: 2024-10-01 02:22:34

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2410.00334v1

Vision Language Models Know Law of Conservation without Understanding More-or-Less

Conservation is a critical milestone of cognitive development considered to be supported by both the understanding of quantitative concepts and the reversibility of mental operations. To assess whether this critical component of human intelligence has emerged in Vision Language Models, we leverage the ConserveBench from CogDevelop2K, a data-intensive cognitive experiment benchmark for assaying the developmental trajectory of machine intelligence. The battery includes over 350 questions across four dimensions of physical quantities: volume, solid quantity, length, and number. The former two involve only transformational tasks, whereas the latter two also involve non-transformational tasks assessing the understanding of quantitative concepts alone. Surprisingly, we find that while VLMs are generally capable of conserving, they tend to fail at non-transformational tasks which success is typically considered to be entailed by the ability to conserve. This implies that the law of conservation, at least in concrete domains, may exist without corresponding conceptual understanding of quantity.

Updated: 2024-10-01 02:15:49

标题: 视觉语言模型在不完全理解的情况下知道守恒定律

摘要: 保守是认为认知发展的关键里程碑，被认为受到数量概念的理解和心理操作的可逆性的支持。为了评估这一人类智能的关键组成部分是否已经在视觉语言模型中出现，我们利用CogDevelop2K的ConserveBench，这是一个用于评估机器智能发展轨迹的数据密集型认知实验基准。该测试包括超过350个问题，涵盖四个物理量的维度：体积、固体数量、长度和数字。前两者仅涉及转换任务，而后两者还涉及评估仅涉及数量概念理解的非转换任务。令人惊讶的是，我们发现虽然VLMs通常能够保守，但它们倾向于在通常认为与保守能力有关的非转换任务上失败。这意味着，至少在具体领域中，保守定律可能存在，而没有相应的数量概念理解。

更新时间: 2024-10-01 02:15:49

领域: cs.AI,q-bio.NC

下载: http://arxiv.org/abs/2410.00332v1

Extending class group action attacks via sesquilinear pairings

We introduce a new tool for the study of isogeny-based cryptography, namely pairings which are sesquilinear (conjugate linear) with respect to the $\mathcal{O}$-module structure of an elliptic curve with CM by an imaginary quadratic order $\mathcal{O}$. We use these pairings to study the security of problems based on the class group action on collections of oriented ordinary or supersingular elliptic curves. This extends work of both (Castryck, Houben, Merz, Mula, Buuren, Vercauteren, 2023) and (De Feo, Fouotsa, Panny, 2024).

Updated: 2024-10-01 02:13:48

标题: 通过半线性配对扩展类群作用攻击

摘要: 我们引入了一种用于研究同构基密码学的新工具，即对于具有虚二次秩$\mathcal{O}$的椭圆曲线的$\mathcal{O}$-模结构而言是半线性（共轭线性）的配对。我们使用这些配对来研究基于类群在定向普通或超奇异椭圆曲线集合上的作用的问题的安全性。这扩展了(Castryck, Houben, Merz, Mula, Buuren, Vercauteren, 2023)和(De Feo, Fouotsa, Panny, 2024)的工作。

更新时间: 2024-10-01 02:13:48

领域: math.NT,cs.CR,11R65, 14H52, 94A60

下载: http://arxiv.org/abs/2406.10440v2

ReactZyme: A Benchmark for Enzyme-Reaction Prediction

Enzymes, with their specific catalyzed reactions, are necessary for all aspects of life, enabling diverse biological processes and adaptations. Predicting enzyme functions is essential for understanding biological pathways, guiding drug development, enhancing bioproduct yields, and facilitating evolutionary studies. Addressing the inherent complexities, we introduce a new approach to annotating enzymes based on their catalyzed reactions. This method provides detailed insights into specific reactions and is adaptable to newly discovered reactions, diverging from traditional classifications by protein family or expert-derived reaction classes. We employ machine learning algorithms to analyze enzyme reaction datasets, delivering a much more refined view on the functionality of enzymes. Our evaluation leverages the largest enzyme-reaction dataset to date, derived from the SwissProt and Rhea databases with entries up to January 8, 2024. We frame the enzyme-reaction prediction as a retrieval problem, aiming to rank enzymes by their catalytic ability for specific reactions. With our model, we can recruit proteins for novel reactions and predict reactions in novel proteins, facilitating enzyme discovery and function annotation (https://github.com/WillHua127/ReactZyme).

Updated: 2024-10-01 02:12:41

标题: ReactZyme：一个用于酶反应预测的基准

摘要: 酶以其特定的催化反应对于生命的各个方面都是必不可少的，使得多样化的生物过程和适应能力成为可能。预测酶的功能对于理解生物途径、指导药物开发、增加生物产品产量以及促进进化研究至关重要。为了解决固有的复杂性，我们引入了一种基于其催化反应的酶注释新方法。这种方法提供了关于特定反应的详细见解，并可适用于新发现的反应，与传统按蛋白质家族或专家衍生的反应类别分类有所不同。我们采用机器学习算法分析酶反应数据集，提供了更加精细的酶功能视图。我们的评估利用到目前为止最大的酶-反应数据集，来源于SwissProt和Rhea数据库，截至2024年1月8日的条目。我们将酶-反应预测框架化为一个检索问题，旨在根据其对特定反应的催化能力对酶进行排名。借助我们的模型，我们可以为新型反应招募蛋白质，并预测新型蛋白质中的反应，促进酶的发现和功能注释。

更新时间: 2024-10-01 02:12:41

领域: cs.LG,cs.AI,cs.CE,q-bio.QM

下载: http://arxiv.org/abs/2408.13659v3

EnzymeFlow: Generating Reaction-specific Enzyme Catalytic Pockets through Flow Matching and Co-Evolutionary Dynamics

Enzyme design is a critical area in biotechnology, with applications ranging from drug development to synthetic biology. Traditional methods for enzyme function prediction or protein binding pocket design often fall short in capturing the dynamic and complex nature of enzyme-substrate interactions, particularly in catalytic processes. To address the challenges, we introduce EnzymeFlow, a generative model that employs flow matching with hierarchical pre-training and enzyme-reaction co-evolution to generate catalytic pockets for specific substrates and catalytic reactions. Additionally, we introduce a large-scale, curated, and validated dataset of enzyme-reaction pairs, specifically designed for the catalytic pocket generation task, comprising a total of $328,192$ pairs. By incorporating evolutionary dynamics and reaction-specific adaptations, EnzymeFlow becomes a powerful model for designing enzyme pockets, which is capable of catalyzing a wide range of biochemical reactions. Experiments on the new dataset demonstrate the model's effectiveness in designing high-quality, functional enzyme catalytic pockets, paving the way for advancements in enzyme engineering and synthetic biology. We provide EnzymeFlow code at https://github.com/WillHua127/EnzymeFlow with notebook demonstration at https://github.com/WillHua127/EnzymeFlow/blob/main/enzymeflow_demo.ipynb.

Updated: 2024-10-01 02:04:01

标题: EnzymeFlow: 通过流匹配和共同进化动力学生成特定反应的酶催化口袋

摘要: 酶设计是生物技术中的一个关键领域，应用范围从药物开发到合成生物学。传统的酶功能预测或蛋白质结合口袋设计方法通常难以捕捉酶-底物相互作用的动态和复杂性，特别是在催化过程中。为了解决这些挑战，我们引入了EnzymeFlow，这是一个利用流匹配与分层预训练以及酶-反应共同进化的生成模型，用于为特定底物和催化反应生成催化口袋。此外，我们还引入了一个大规模的、经过筛选和验证的酶-反应对数据集，专门设计用于催化口袋生成任务，共包含328,192对。通过融入演化动态和反应特异性适应性，EnzymeFlow成为一个强大的模型，用于设计酶口袋，能够催化各种生物化学反应。对新数据集的实验表明了该模型在设计高质量、功能性酶催化口袋方面的有效性，为酶工程和合成生物学的进步铺平了道路。我们在https://github.com/WillHua127/EnzymeFlow 提供EnzymeFlow代码，以及在https://github.com/WillHua127/EnzymeFlow/blob/main/enzymeflow_demo.ipynb上的笔记本演示。

更新时间: 2024-10-01 02:04:01

领域: cs.LG,cs.AI,cs.CE,q-bio.QM

下载: http://arxiv.org/abs/2410.00327v1

Provably Efficient Exploration in Inverse Constrained Reinforcement Learning

To obtain the optimal constraints in complex environments, Inverse Constrained Reinforcement Learning (ICRL) seeks to recover these constraints from expert demonstrations in a data-driven manner. Existing ICRL algorithms collect training samples from an interactive environment. However, the efficacy and efficiency of these sampling strategies remain unknown. To bridge this gap, we introduce a strategic exploration framework with guaranteed efficiency. Specifically, we define a feasible constraint set for ICRL problems and investigate how expert policy and environmental dynamics influence the optimality of constraints. Motivated by our findings, we propose two exploratory algorithms to achieve efficient constraint inference via 1) dynamically reducing the bounded aggregate error of cost estimation and 2) strategically constraining the exploration policy. Both algorithms are theoretically grounded with tractable sample complexity. We empirically demonstrate the performance of our algorithms under various environments.

Updated: 2024-10-01 02:00:50

标题: 可证明有效的反向受限强化学习中的探索

摘要: 为了在复杂环境中获得最佳约束条件，逆向约束强化学习（ICRL）试图以数据驱动的方式从专家示范中恢复这些约束条件。现有的ICRL算法从交互式环境中收集训练样本。然而，这些采样策略的有效性和效率仍然未知。为了弥合这一差距，我们引入了一个拥有保证效率的战略探索框架。具体而言，我们为ICRL问题定义了一个可行的约束集，并调查了专家策略和环境动态如何影响约束的最优性。受到我们发现的启发，我们提出了两种探索算法来通过动态减少成本估计的有界聚合误差和策略性地约束探索策略来实现有效的约束推断。这两种算法在可处理的样本复杂度下有理论基础。我们在各种环境中通过实验证明了我们算法的性能。

更新时间: 2024-10-01 02:00:50

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2409.15963v3

Vision Language Models See What You Want but not What You See

Knowing others' intentions and taking others' perspectives are two core components of human intelligence that are typically considered to be instantiations of theory-of-mind. Infiltrating machines with these abilities is an important step towards building human-level artificial intelligence. Recently, Li et al. built CogDevelop2K, a data-intensive cognitive experiment benchmark to assess the developmental trajectory of machine intelligence. Here, to investigate intentionality understanding and perspective-taking in Vision Language Models, we leverage the IntentBench and PerspectBench of CogDevelop2K, which contains over 300 cognitive experiments grounded in real-world scenarios and classic cognitive tasks, respectively. Surprisingly, we find VLMs achieving high performance on intentionality understanding but lower performance on perspective-taking. This challenges the common belief in cognitive science literature that perspective-taking at the corresponding modality is necessary for intentionality understanding.

Updated: 2024-10-01 01:52:01

标题: 视觉语言模型看到你想要看的，但不是你看到的

摘要: 了解他人意图和考虑他人观点是人类智力的两个核心组成部分，通常被认为是心灵理论的实例。赋予机器这些能力是构建人类水平人工智能的重要一步。最近，Li等人构建了CogDevelop2K，这是一个数据密集型的认知实验基准，用于评估机器智能的发展轨迹。在这里，为了研究视觉语言模型中的意图理解和观点采取，我们利用了CogDevelop2K的IntentBench和PerspectBench，其中包含超过300个基于真实场景和经典认知任务的认知实验。令人惊讶的是，我们发现VLMs在意图理解方面表现出色，但在观点采取方面表现较低。这挑战了认知科学文献中的普遍信念，即在相应模态下进行观点采取对于意图理解是必要的。

更新时间: 2024-10-01 01:52:01

领域: cs.AI

下载: http://arxiv.org/abs/2410.00324v1

Unveiling Factual Recall Behaviors of Large Language Models through Knowledge Neurons

In this paper, we investigate whether Large Language Models (LLMs) actively recall or retrieve their internal repositories of factual knowledge when faced with reasoning tasks. Through an analysis of LLMs' internal factual recall at each reasoning step via Knowledge Neurons, we reveal that LLMs fail to harness the critical factual associations under certain circumstances. Instead, they tend to opt for alternative, shortcut-like pathways to answer reasoning questions. By manually manipulating the recall process of parametric knowledge in LLMs, we demonstrate that enhancing this recall process directly improves reasoning performance whereas suppressing it leads to notable degradation. Furthermore, we assess the effect of Chain-of-Thought (CoT) prompting, a powerful technique for addressing complex reasoning tasks. Our findings indicate that CoT can intensify the recall of factual knowledge by encouraging LLMs to engage in orderly and reliable reasoning. Furthermore, we explored how contextual conflicts affect the retrieval of facts during the reasoning process to gain a comprehensive understanding of the factual recall behaviors of LLMs. Code and data will be available soon.

Updated: 2024-10-01 01:48:58

标题: 通过知识神经元揭示大型语言模型的事实回忆行为

摘要: 在这篇论文中，我们调查了大型语言模型（LLMs）在面对推理任务时是否积极地回忆或检索其内部的事实知识库。通过对LLMs在每个推理步骤中的内部事实回忆进行分析，我们揭示了LLMs在某些情况下未能利用关键的事实关联。相反，它们倾向于选择替代的、类似快捷方式的路径来回答推理问题。通过手动操纵LLMs中参数化知识的回忆过程，我们证明了增强这一回忆过程会直接提高推理性能，而抑制它会导致显著的降级。此外，我们评估了“思维链”（CoT）提示技术对解决复杂推理任务的影响。我们的发现表明，通过鼓励LLMs参与有序和可靠的推理，CoT可以加强事实知识的回忆。此外，我们探讨了上下文冲突如何影响推理过程中事实的检索，以全面了解LLMs的事实回忆行为。代码和数据即将提供。

更新时间: 2024-10-01 01:48:58

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2408.03247v3

Probing Mechanical Reasoning in Large Vision Language Models

Mechanical reasoning is a fundamental ability that sets human intelligence apart from other animal intelligence. Mechanical reasoning allows us to design tools, build bridges and canals, and construct houses which set the foundation of human civilization. Embedding machines with such ability is an important step towards building human-level artificial intelligence. Recently, Li et al. built CogDevelop2K, a data-intensive cognitive experiment benchmark for assaying the developmental trajectory of machine intelligence (Li et al., 2024). Here, to investigate mechanical reasoning in Vision Language Models, we leverage the MechBench of CogDevelop2K, which contains approximately 150 cognitive experiments, to test understanding of mechanical system stability, gears and pulley systems, seesaw-like systems and leverage principle, inertia and motion, and other fluid-related systems in Large Vision Language Models. We observe diverse yet consistent behaviors over these aspects in VLMs.

Updated: 2024-10-01 01:33:10

标题: 探究大型视觉语言模型中的机械推理

摘要: 机械推理是一种基本能力，使人类智能与其他动物的智能有所区别。机械推理使我们能够设计工具，建造桥梁和运河，并建造房屋，这为人类文明奠定了基础。赋予机器这种能力是迈向构建人类水平人工智能的重要一步。最近，Li等人建立了CogDevelop2K，一个用于测量机器智能发展轨迹的数据密集型认知实验基准（Li等人，2024年）。在这里，我们利用CogDevelop2K的MechBench来研究视觉语言模型中的机械推理，其中包含约150个认知实验，以测试大型视觉语言模型对机械系统稳定性、齿轮和滑轮系统、平衡木系统和杠杆原理、惯性和运动以及其他与流体相关的系统的理解。我们观察到在VLMs中这些方面的行为多样且一致。

更新时间: 2024-10-01 01:33:10

领域: cs.AI,q-bio.NC

下载: http://arxiv.org/abs/2410.00318v1

EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control

While recent advances in Text-to-Speech (TTS) technology produce natural and expressive speech, they lack the option for users to select emotion and control intensity. We propose EmoKnob, a framework that allows fine-grained emotion control in speech synthesis with few-shot demonstrative samples of arbitrary emotion. Our framework leverages the expressive speaker representation space made possible by recent advances in foundation voice cloning models. Based on the few-shot capability of our emotion control framework, we propose two methods to apply emotion control on emotions described by open-ended text, enabling an intuitive interface for controlling a diverse array of nuanced emotions. To facilitate a more systematic emotional speech synthesis field, we introduce a set of evaluation metrics designed to rigorously assess the faithfulness and recognizability of emotion control frameworks. Through objective and subjective evaluations, we show that our emotion control framework effectively embeds emotions into speech and surpasses emotion expressiveness of commercial TTS services.

Updated: 2024-10-01 01:29:54

标题: EmoKnob：通过细粒度情绪控制增强语音克隆

摘要: 最近的文本到语音（TTS）技术取得了进展，能够产生自然和富有表现力的语音，但缺乏用户选择情感和控制强度的选项。我们提出了EmoKnob，这是一个框架，允许在语音合成中对情感进行精细控制，只需少量示范样本即可表现任意情感。我们的框架利用了最近基于声纹克隆模型的表达性说话者表示空间，基于我们情感控制框架的少样本能力，我们提出了两种方法来对开放性文本描述的情感进行控制，为控制各种微妙情感提供直观界面。为了促进更系统的情感语音合成领域，我们引入了一组评估指标，旨在严格评估情感控制框架的忠实度和可识别性。通过客观和主观评估，我们展示了我们的情感控制框架有效地将情感嵌入语音，并超越了商用TTS服务的情感表达能力。

更新时间: 2024-10-01 01:29:54

领域: cs.CL,cs.AI,cs.HC,cs.SD,eess.AS

下载: http://arxiv.org/abs/2410.00316v1

Contrastive Representation Learning for Predicting Solar Flares from Extremely Imbalanced Multivariate Time Series Data

Major solar flares are abrupt surges in the Sun's magnetic flux, presenting significant risks to technological infrastructure. In view of this, effectively predicting major flares from solar active region magnetic field data through machine learning methods becomes highly important in space weather research. Magnetic field data can be represented in multivariate time series modality where the data displays an extreme class imbalance due to the rarity of major flare events. In time series classification-based flare prediction, the use of contrastive representation learning methods has been relatively limited. In this paper, we introduce CONTREX, a novel contrastive representation learning approach for multivariate time series data, addressing challenges of temporal dependencies and extreme class imbalance. Our method involves extracting dynamic features from the multivariate time series instances, deriving two extremes from positive and negative class feature vectors that provide maximum separation capability, and training a sequence representation embedding module with the original multivariate time series data guided by our novel contrastive reconstruction loss to generate embeddings aligned with the extreme points. These embeddings capture essential time series characteristics and enhance discriminative power. Our approach shows promising solar flare prediction results on the Space Weather Analytics for Solar Flares (SWAN-SF) multivariate time series benchmark dataset against baseline methods.

Updated: 2024-10-01 01:20:47

标题: 对极度不平衡的多变量时间序列数据进行对比表示学习，用于预测太阳耀斑

摘要: 太阳耀斑是太阳磁通的突然激增，对技术基础设施构成重大风险。因此，通过机器学习方法有效预测太阳活动区磁场数据中的主要耀斑在空间天气研究中变得非常重要。磁场数据可以以多变量时间序列模态表示，由于主要耀斑事件的罕见性，数据显示出极端类别不平衡。在基于时间序列分类的耀斑预测中，对比表示学习方法的使用相对有限。在本文中，我们介绍了CONTREX，一种新颖的对比表示学习方法，用于多变量时间序列数据，解决了时间依赖性和极端类别不平衡的挑战。我们的方法涉及从多变量时间序列实例中提取动态特征，从正类和负类特征向量中得出两个极端，这些极端提供了最大的分离能力，并使用我们的新颖对比重构损失来引导原始多变量时间序列数据训练序列表示嵌入模块，生成与极端点对齐的嵌入。这些嵌入捕捉了重要的时间序列特征，并增强了区分能力。我们的方法在太阳耀斑空间天气分析(SWAN-SF)多变量时间序列基准数据集上展示了有希望的太阳耀斑预测结果，与基准方法相比。

更新时间: 2024-10-01 01:20:47

领域: astro-ph.SR,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.00312v1

Ask, Pose, Unite: Scaling Data Acquisition for Close Interactions with Vision Language Models

Social dynamics in close human interactions pose significant challenges for Human Mesh Estimation (HME), particularly due to the complexity of physical contacts and the scarcity of training data. Addressing these challenges, we introduce a novel data generation method that utilizes Large Vision Language Models (LVLMs) to annotate contact maps which guide test-time optimization to produce paired image and pseudo-ground truth meshes. This methodology not only alleviates the annotation burden but also enables the assembly of a comprehensive dataset specifically tailored for close interactions in HME. Our Ask Pose Unite (APU) dataset, comprising over 6.2k human mesh pairs in contact covering diverse interaction types, is curated from images depicting naturalistic person-to-person scenes. We empirically show that using our dataset to train a diffusion-based contact prior, used as guidance during optimization, improves mesh estimation on unseen interactions. Our work addresses longstanding challenges of data scarcity for close interactions in HME enhancing the field's capabilities of handling complex interaction scenarios.

Updated: 2024-10-01 01:14:24

标题: 请问，提出，团结：为视觉语言模型的近距离交互扩展数据采集

摘要: 在密切的人际互动中，社交动态给人体网格估计（HME）带来了重大挑战，主要是由于物理接触的复杂性和训练数据的稀缺性。为了解决这些挑战，我们引入了一种新颖的数据生成方法，利用大型视觉语言模型（LVLMs）来注释接触图，指导测试时优化生成配对的图像和伪地面真实网格。这种方法不仅减轻了注释负担，还能够组装一个专门针对HME中密切互动的综合数据集。我们的Ask Pose Unite（APU）数据集包括超过6.2k对接触的人体网格，涵盖各种不同类型的互动，从描绘自然人与人场景的图像中策划而来。我们实证表明，使用我们的数据集训练基于扩散的接触先验，在优化过程中作为指导，可以改善对未知互动的网格估计。我们的工作解决了HME中密切互动数据稀缺的长期挑战，增强了该领域处理复杂互动场景的能力。

更新时间: 2024-10-01 01:14:24

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2410.00309v1

Smoothed Energy Guidance: Guiding Diffusion Models with Reduced Energy Curvature of Attention

Conditional diffusion models have shown remarkable success in visual content generation, producing high-quality samples across various domains, largely due to classifier-free guidance (CFG). Recent attempts to extend guidance to unconditional models have relied on heuristic techniques, resulting in suboptimal generation quality and unintended effects. In this work, we propose Smoothed Energy Guidance (SEG), a novel training- and condition-free approach that leverages the energy-based perspective of the self-attention mechanism to enhance image generation. By defining the energy of self-attention, we introduce a method to reduce the curvature of the energy landscape of attention and use the output as the unconditional prediction. Practically, we control the curvature of the energy landscape by adjusting the Gaussian kernel parameter while keeping the guidance scale parameter fixed. Additionally, we present a query blurring method that is equivalent to blurring the entire attention weights without incurring quadratic complexity in the number of tokens. In our experiments, SEG achieves a Pareto improvement in both quality and the reduction of side effects. The code is available at https://github.com/SusungHong/SEG-SDXL.

Updated: 2024-10-01 01:04:58

标题: 平滑能量指导：通过减少注意力的能量曲率来引导扩散模型

摘要: 条件扩散模型在视觉内容生成方面取得了显著成功，跨越各个领域生成高质量样本，这在很大程度上归功于无分类器引导（CFG）。最近尝试将引导扩展到无条件模型的做法依赖于启发式技术，导致生成质量次优和产生意外影响。在这项工作中，我们提出了平滑能量引导（SEG），这是一种新颖的无需训练和条件的方法，利用自注意机制的能量视角来增强图像生成。通过定义自注意力的能量，我们引入了一种方法来减少注意力的能量景观的曲率，并将输出用作无条件预测。在实践中，我们通过调整高斯核参数来控制能量景观的曲率，同时保持引导比例参数固定。此外，我们提出了一种查询模糊方法，相当于模糊整个注意力权重，而不会在令牌数量上产生二次复杂度。在我们的实验中，SEG在质量和副作用减少方面实现了帕累托改进。代码可在https://github.com/SusungHong/SEG-SDXL 上找到。

更新时间: 2024-10-01 01:04:58

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2408.00760v2

VLMGuard: Defending VLMs against Malicious Prompts via Unlabeled Data

Vision-language models (VLMs) are essential for contextual understanding of both visual and textual information. However, their vulnerability to adversarially manipulated inputs presents significant risks, leading to compromised outputs and raising concerns about the reliability in VLM-integrated applications. Detecting these malicious prompts is thus crucial for maintaining trust in VLM generations. A major challenge in developing a safeguarding prompt classifier is the lack of a large amount of labeled benign and malicious data. To address the issue, we introduce VLMGuard, a novel learning framework that leverages the unlabeled user prompts in the wild for malicious prompt detection. These unlabeled prompts, which naturally arise when VLMs are deployed in the open world, consist of both benign and malicious information. To harness the unlabeled data, we present an automated maliciousness estimation score for distinguishing between benign and malicious samples within this unlabeled mixture, thereby enabling the training of a binary prompt classifier on top. Notably, our framework does not require extra human annotations, offering strong flexibility and practicality for real-world applications. Extensive experiment shows VLMGuard achieves superior detection results, significantly outperforming state-of-the-art methods. Disclaimer: This paper may contain offensive examples; reader discretion is advised.

Updated: 2024-10-01 00:37:29

标题: VLMGuard：通过未标记数据防御VLMs免受恶意提示

摘要: 视觉-语言模型（VLMs）对于理解视觉和文本信息的上下文至关重要。然而，它们对于敌对操纵输入的脆弱性存在重大风险，导致输出受损，并引发对VLM整合应用可靠性的担忧。因此，检测这些恶意提示对于维护对VLM生成的信任至关重要。在开发保护提示分类器时面临的一项主要挑战是缺乏大量标记的良性和恶意数据。为了解决这个问题，我们引入了VLMGuard，这是一个新颖的学习框架，利用野外未标记的用户提示来检测恶意提示。这些未标记的提示在VLMs部署在开放世界时自然产生，包含良性和恶意信息。为了利用未标记数据，我们提出了一个自动的恶意估计分数，用于区分在这个未标记混合中的良性和恶意样本，从而实现在其上训练二元提示分类器。值得注意的是，我们的框架不需要额外的人工注释，为真实世界的应用提供了强大的灵活性和实用性。大量实验证明，VLMGuard实现了优越的检测结果，明显优于现有技术方法。免责声明：本文可能包含冒犯性例子；请谨慎阅读。

更新时间: 2024-10-01 00:37:29

领域: cs.LG,cs.CR

下载: http://arxiv.org/abs/2410.00296v1

ERASMO: Leveraging Large Language Models for Enhanced Clustering Segmentation

Cluster analysis plays a crucial role in various domains and applications, such as customer segmentation in marketing. These contexts often involve multimodal data, including both tabular and textual datasets, making it challenging to represent hidden patterns for obtaining meaningful clusters. This study introduces ERASMO, a framework designed to fine-tune a pretrained language model on textually encoded tabular data and generate embeddings from the fine-tuned model. ERASMO employs a textual converter to transform tabular data into a textual format, enabling the language model to process and understand the data more effectively. Additionally, ERASMO produces contextually rich and structurally representative embeddings through techniques such as random feature sequence shuffling and number verbalization. Extensive experimental evaluations were conducted using multiple datasets and baseline approaches. Our results demonstrate that ERASMO fully leverages the specific context of each tabular dataset, leading to more precise and nuanced embeddings for accurate clustering. This approach enhances clustering performance by capturing complex relationship patterns within diverse tabular data.

Updated: 2024-10-01 00:37:16

标题: ERASMO：利用大型语言模型增强聚类分割

摘要: 聚类分析在各个领域和应用中起着至关重要的作用，例如在营销中的客户细分。这些情境通常涉及多模态数据，包括表格和文本数据集，使得表示隐藏模式以获取有意义的聚类变得具有挑战性。本研究介绍了ERASMO，一个旨在对文本编码的表格数据进行微调的预训练语言模型，并从微调模型生成嵌入的框架。ERASMO采用文本转换器将表格数据转换为文本格式，使语言模型能够更有效地处理和理解数据。此外，ERASMO通过随机特征序列洗牌和数字语言化等技术生成具有上下文丰富和结构代表性的嵌入。使用多个数据集和基准方法进行了广泛的实验评估。我们的结果表明，ERASMO充分利用了每个表格数据集的特定上下文，从而产生更精确和微妙的嵌入，以实现准确的聚类。这种方法通过捕捉多样化表格数据中的复杂关系模式来增强聚类性能。

更新时间: 2024-10-01 00:37:16

领域: cs.CL,cs.AI,68T50 (Natural language processing), 68T01 (General topics in artificial intelligence)

下载: http://arxiv.org/abs/2410.03738v1

Automatic Quantification of Serial PET/CT Images for Pediatric Hodgkin Lymphoma Patients Using a Longitudinally-Aware Segmentation Network

$\textbf{Purpose}$: Automatic quantification of longitudinal changes in PET scans for lymphoma patients has proven challenging, as residual disease in interim-therapy scans is often subtle and difficult to detect. Our goal was to develop a longitudinally-aware segmentation network (LAS-Net) that can quantify serial PET/CT images for pediatric Hodgkin lymphoma patients. $\textbf{Materials and Methods}$: This retrospective study included baseline (PET1) and interim (PET2) PET/CT images from 297 patients enrolled in two Children's Oncology Group clinical trials (AHOD1331 and AHOD0831). LAS-Net incorporates longitudinal cross-attention, allowing relevant features from PET1 to inform the analysis of PET2. Model performance was evaluated using Dice coefficients for PET1 and detection F1 scores for PET2. Additionally, we extracted and compared quantitative PET metrics, including metabolic tumor volume (MTV) and total lesion glycolysis (TLG) in PET1, as well as qPET and $\Delta$SUVmax in PET2, against physician measurements. We quantified their agreement using Spearman's $\rho$ correlations and employed bootstrap resampling for statistical analysis. $\textbf{Results}$: LAS-Net detected residual lymphoma in PET2 with an F1 score of 0.606 (precision/recall: 0.615/0.600), outperforming all comparator methods (P<0.01). For baseline segmentation, LAS-Net achieved a mean Dice score of 0.772. In PET quantification, LAS-Net's measurements of qPET, $\Delta$SUVmax, MTV and TLG were strongly correlated with physician measurements, with Spearman's $\rho$ of 0.78, 0.80, 0.93 and 0.96, respectively. The performance remained high, with a slight decrease, in an external testing cohort. $\textbf{Conclusion}$: LAS-Net demonstrated significant improvements in quantifying PET metrics across serial scans, highlighting the value of longitudinal awareness in evaluating multi-time-point imaging datasets.

Updated: 2024-10-01 00:14:32

标题: 使用具有纵向感知分割网络对小儿霍奇金淋巴瘤患者的连续PET/CT图像进行自动量化

摘要: $\textbf{目的}$：自动量化淋巴瘤患者PET扫描的纵向变化一直是具有挑战性的，因为中期治疗扫描中的残留疾病通常是微妙且难以检测的。我们的目标是开发一个具有纵向意识的分割网络（LAS-Net），可以为小儿霍奇金淋巴瘤患者量化连续的PET/CT图像。 $\textbf{材料和方法}$：这项回顾性研究包括来自两项儿童肿瘤学组临床试验（AHOD1331和AHOD0831）的297名患者的基线（PET1）和中期（PET2）PET/CT图像。LAS-Net结合了纵向交叉关注，允许PET1中的相关特征影响PET2的分析。模型性能使用PET1的Dice系数和PET2的检测F1分数进行评估。此外，我们提取并比较PET1中的代谢性肿瘤体积（MTV）和总病灶代谢（TLG），以及PET2中的qPET和ΔSUVmax，与医师测量进行比较。我们使用Spearman的ρ相关性量化它们的一致性，并使用自举重采样进行统计分析。 $\textbf{结果}$：LAS-Net在PET2中以0.606的F1分数（精确度/召回率：0.615/0.600）检测到残留的淋巴瘤，优于所有比较方法（P<0.01）。对于基线分割，LAS-Net实现了平均Dice得分为0.772。在PET定量方面，LAS-Net的qPET、ΔSUVmax、MTV和TLG测量与医师测量强相关，分别为0.78、0.80、0.93和0.96的Spearman's ρ。性能在外部测试队列中保持高水平，略有下降。 $\textbf{结论}$：LAS-Net在量化连续扫描的PET指标方面表现出显著改进，突显了纵向意识在评估多时间点成像数据集中的价值。

更新时间: 2024-10-01 00:14:32

领域: cs.CV,cs.AI,physics.med-ph

下载: http://arxiv.org/abs/2404.08611v2

Using Contrastive Learning with Generative Similarity to Learn Spaces that Capture Human Inductive Biases

Humans rely on strong inductive biases to learn from few examples and abstract useful information from sensory data. Instilling such biases in machine learning models has been shown to improve their performance on various benchmarks including few-shot learning, robustness, and alignment. However, finding effective training procedures to achieve that goal can be challenging as psychologically-rich training data such as human similarity judgments are expensive to scale, and Bayesian models of human inductive biases are often intractable for complex, realistic domains. Here, we address this challenge by introducing a Bayesian notion of generative similarity whereby two datapoints are considered similar if they are likely to have been sampled from the same distribution. This measure can be applied to complex generative processes, including probabilistic programs. We show that generative similarity can be used to define a contrastive learning objective even when its exact form is intractable, enabling learning of spatial embeddings that express specific inductive biases. We demonstrate the utility of our approach by showing that it can be used to capture human inductive biases for geometric shapes, distinguish different abstract drawing styles that are parameterized by probabilistic programs, and capture abstract high-level categories that enable generalization.

Updated: 2024-10-01 00:14:07

标题: 利用对比学习和生成式相似性来学习捕捉人类归纳偏好的空间

摘要: 人类依赖强大的归纳偏见从少量示例中学习，并从感觉数据中抽象出有用信息。已经证明将这些偏见灌输到机器学习模型中可以提高它们在各种基准测试中的性能，包括少样本学习、鲁棒性和对齐性。然而，要找到有效的训练程序来实现这一目标可能具有挑战性，因为心理丰富的训练数据，如人类相似性判断，很难扩展，而人类归纳偏见的贝叶斯模型在复杂、现实领域中通常难以解决。在这里，我们通过引入生成相似性的贝叶斯概念来解决这一挑战，其中两个数据点被认为相似，如果它们很可能是从相同的分布中抽样的。这个度量可以应用于复杂的生成过程，包括概率程序。我们展示生成相似性可以用来定义对比学习目标，即使其确切形式难以处理，也能够实现表达特定归纳偏见的空间嵌入的学习。我们通过展示我们的方法的实用性来证明我们的方法，它可以用来捕捉人类对几何形状的归纳偏见，区分由概率程序参数化的不同抽象绘图风格，并捕捉能够实现泛化的抽象高级类别。

更新时间: 2024-10-01 00:14:07

领域: cs.LG,cs.AI,q-bio.NC

下载: http://arxiv.org/abs/2405.19420v2

Closed-Form Interpretation of Neural Network Classifiers with Symbolic Gradients

I introduce a unified framework for finding a closed-form interpretation of any single neuron in an artificial neural network. Using this framework I demonstrate how to interpret neural network classifiers to reveal closed-form expressions of the concepts encoded in their decision boundaries. In contrast to neural network-based regression, for classification, it is in general impossible to express the neural network in the form of a symbolic equation even if the neural network itself bases its classification on a quantity that can be written as a closed-form equation. The interpretation framework is based on embedding trained neural networks into an equivalence class of functions that encode the same concept. I interpret these neural networks by finding an intersection between the equivalence class and human-readable equations defined by a symbolic search space. The approach is not limited to classifiers or full neural networks and can be applied to arbitrary neurons in hidden layers or latent spaces.

Updated: 2024-10-01 00:11:48

标题: 神经网络分类器具有符号梯度的闭式解释

摘要: 我介绍了一个统一的框架，用于找到人工神经网络中任何单个神经元的封闭形式解释。使用这一框架，我展示了如何解释神经网络分类器，以揭示其决策边界中编码的概念的封闭形式表达。与基于神经网络的回归相比，对于分类问题，通常无法将神经网络表达为符号方程的形式，即使神经网络本身基于可以写成封闭形式方程的数量进行分类。解释框架基于将训练好的神经网络嵌入到编码相同概念的函数等价类中。我通过找到等价类和由符号搜索空间定义的人类可读方程之间的交集来解释这些神经网络。这种方法不仅局限于分类器或完整的神经网络，还可以应用于隐藏层或潜在空间中的任意神经元。

更新时间: 2024-10-01 00:11:48

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2401.04978v2