More is Better in Modern Machine Learning: when Infinite Overparameterization is Optimal and Overfitting is Obligatory
In our era of enormous neural networks, empirical progress has been driven by the philosophy that more is better. Recent deep learning practice has found repeatedly that larger model size, more data, and more computation (resulting in lower training loss) improve performance. In this paper, we give theoretical backing to these empirical observations by showing that these three properties hold in random feature (RF) regression, a class of models equivalent to shallow networks with only the last layer trained. Concretely, we first show that the test risk of RF regression decreases monotonically with both the number of features and the number of samples, provided the ridge penalty is tuned optimally. In particular, this implies that infinite-width RF architectures are preferable to those of any finite width. We then proceed to demonstrate that, for a large class of tasks characterized by power-law eigenstructure, training to near-zero training loss is obligatory: near-optimal performance can only be achieved when the training error is much smaller than the test error. Grounding our theory in real-world data, we find empirically that standard computer vision tasks with convolutional neural tangent kernels clearly fall into this class. Taken together, our results tell a simple, testable story of the benefits of overparameterization, overfitting, and more data in random feature models.
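As a loose illustration of the model class this abstract studies, the sketch below (not the paper's code; the toy task, seed, and widths are ours) fits random-feature ridge regression: a frozen random first layer with only the closed-form ridge solution trained on top.

```python
import numpy as np

rng = np.random.default_rng(0)

def rf_ridge_fit(X, y, n_features, ridge, rng):
    """Random-feature ridge regression: random first layer, trained last layer."""
    d = X.shape[1]
    W = rng.normal(size=(d, n_features)) / np.sqrt(d)  # frozen random weights
    Phi = np.maximum(X @ W, 0.0)                       # ReLU random features
    # Closed-form ridge solution for the trained last layer.
    A = Phi.T @ Phi + ridge * np.eye(n_features)
    beta = np.linalg.solve(A, Phi.T @ y)
    return W, beta

def rf_ridge_predict(W, beta, X):
    return np.maximum(X @ W, 0.0) @ beta

# Toy task: a noisy linear target y = <x, v>.
d, n = 10, 200
v = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ v + 0.1 * rng.normal(size=n)
Xte = rng.normal(size=(1000, d))
yte = Xte @ v

risks = []
for k in [20, 200, 2000]:  # increasing width
    W, beta = rf_ridge_fit(X, y, k, ridge=1.0, rng=rng)
    risks.append(np.mean((rf_ridge_predict(W, beta, Xte) - yte) ** 2))
print(risks)
```

On this toy task the widest model attains the lowest test risk, consistent with the paper's claim that (with a tuned ridge) more features do not hurt; the ridge value here is fixed rather than optimally tuned.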
Updated: 2024-05-15 23:46:41
Subjects: cs.LG,stat.ML
Fully Distributed Fog Load Balancing with Multi-Agent Reinforcement Learning
Real-time Internet of Things (IoT) applications require real-time support to handle the ever-growing demand for computing resources to process IoT workloads. Fog Computing provides high availability of such resources in a distributed manner. However, these resources must be efficiently managed to distribute unpredictable traffic demands among heterogeneous Fog resources. This paper proposes a fully distributed load-balancing solution with Multi-Agent Reinforcement Learning (MARL) that intelligently distributes IoT workloads to optimize the waiting time while providing fair resource utilization in the Fog network. These agents use transfer learning for life-long self-adaptation to dynamic changes in the environment. By leveraging distributed decision-making, MARL agents effectively minimize the waiting time compared to a single centralized agent solution and other baselines, reducing end-to-end execution delay. Beyond the performance gain, a fully distributed solution allows for a global-scale implementation where agents can work independently in small collaboration regions, leveraging nearby local resources. Furthermore, we analyze the impact of a realistic frequency of observing the state of the environment, in contrast to the common but unrealistic assumption in the literature that observations are readily available in real time for every required action. The findings highlight the trade-off between realism and performance when an interval-based gossip multicasting protocol is used instead of assuming real-time observation availability for every generated workload.
Updated: 2024-05-15 23:44:06
Subjects: cs.AI,cs.DC,cs.LG,cs.MA
A survey on fairness of large language models in e-commerce: progress, application, and challenge
This survey explores the fairness of large language models (LLMs) in e-commerce, examining their progress, applications, and the challenges they face. LLMs have become pivotal in the e-commerce domain, offering innovative solutions and enhancing customer experiences. This work presents a comprehensive survey on the applications and challenges of LLMs in e-commerce. The paper begins by introducing the key principles underlying the use of LLMs in e-commerce, detailing the processes of pretraining, fine-tuning, and prompting that tailor these models to specific needs. It then explores the varied applications of LLMs in e-commerce, including product reviews, where they synthesize and analyze customer feedback; product recommendations, where they leverage consumer data to suggest relevant items; product information translation, enhancing global accessibility; and product question and answer sections, where they automate customer support. The paper critically addresses the fairness challenges in e-commerce, highlighting how biases in training data and algorithms can lead to unfair outcomes, such as reinforcing stereotypes or discriminating against certain groups. These issues not only undermine consumer trust, but also raise ethical and legal concerns. Finally, the work outlines future research directions, emphasizing the need for more equitable and transparent LLMs in e-commerce. It advocates for ongoing efforts to mitigate biases and improve the fairness of these systems, ensuring they serve diverse global markets effectively and ethically. Through this comprehensive analysis, the survey provides a holistic view of the current landscape of LLMs in e-commerce, offering insights into their potential and limitations, and guiding future endeavors in creating fairer and more inclusive e-commerce environments.
Updated: 2024-05-15 23:25:19
Subjects: cs.CL,cs.AI,cs.CY
Exploring the Complexity of Deep Neural Networks through Functional Equivalence
We investigate the complexity of deep neural networks through the lens of functional equivalence, which posits that different parameterizations can yield the same network function. Leveraging the equivalence property, we present a novel bound on the covering number for deep neural networks, which reveals that the complexity of neural networks can be reduced. Additionally, we demonstrate that functional equivalence benefits optimization, as overparameterized networks tend to be easier to train since increasing network width leads to a diminishing volume of the effective parameter space. These findings can offer valuable insights into the phenomenon of overparameterization and have implications for understanding generalization and optimization in deep learning.
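The functional-equivalence property this abstract builds on can be checked numerically. The sketch below (our own toy demonstration, not the paper's code) shows two standard equivalences for a shallow ReLU network: permuting hidden units, and rescaling a unit's input weights by c > 0 while dividing its output weight by c.

```python
import numpy as np

rng = np.random.default_rng(1)

def two_layer_net(x, W1, b1, w2):
    """f(x) = w2 . relu(W1 x + b1): a shallow ReLU network."""
    return np.maximum(W1 @ x + b1, 0.0) @ w2

d, h = 5, 8
W1 = rng.normal(size=(h, d))
b1 = rng.normal(size=h)
w2 = rng.normal(size=h)
x = rng.normal(size=d)

# Equivalence 1: permuting hidden units does not change the function.
perm = rng.permutation(h)
f_perm = two_layer_net(x, W1[perm], b1[perm], w2[perm])

# Equivalence 2: ReLU positive homogeneity -- scale a unit's incoming
# weights by c > 0 and its outgoing weight by 1/c.
c = np.full(h, 3.0)
f_scale = two_layer_net(x, W1 * c[:, None], b1 * c, w2 / c)

f = two_layer_net(x, W1, b1, w2)
print(f, f_perm, f_scale)  # all three values agree
```

Each equivalence class collapses many parameterizations onto one function, which is the intuition behind the reduced covering number in the abstract.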
Updated: 2024-05-15 23:13:02
Subjects: cs.LG,math.ST,stat.TH
HyperSense: Hyperdimensional Intelligent Sensing for Energy-Efficient Sparse Data Processing
We introduce HyperSense, a co-designed hardware and software system that efficiently controls the data generation rate of Analog-to-Digital Converter (ADC) modules based on object-presence predictions in sensor data. Addressing challenges posed by escalating sensor quantities and data rates, HyperSense reduces redundant digital data using an energy-efficient low-precision ADC, diminishing machine learning system costs. Leveraging neurally-inspired HyperDimensional Computing (HDC), HyperSense analyzes real-time raw low-precision sensor data, offering advantages in handling noise, memory-centricity, and real-time learning. Our proposed HyperSense model combines high-performance software for object detection with real-time hardware prediction, introducing the novel concept of Intelligent Sensor Control. Comprehensive software and hardware evaluations demonstrate our solution's superior performance, evidenced by the highest Area Under the Curve (AUC) and sharpest Receiver Operating Characteristic (ROC) curve among lightweight models. On the hardware side, our FPGA-based domain-specific accelerator tailored for HyperSense achieves a 5.6x speedup compared to YOLOv4 on an NVIDIA Jetson Orin while showing up to 92.1% energy savings compared to the conventional system. These results underscore HyperSense's effectiveness and efficiency, positioning it as a promising solution for intelligent sensing and real-time data processing across diverse applications.
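For readers unfamiliar with the HDC primitives the abstract relies on, the sketch below (a generic textbook-style illustration with hypothetical feature/value names, not HyperSense itself) shows the two core operations: binding (element-wise multiplication) and bundling (summation) of bipolar hypervectors, with cosine similarity for comparison.

```python
import numpy as np

rng = np.random.default_rng(2)
D = 10_000  # hypervector dimensionality

def random_hv():
    """A random bipolar hypervector with entries in {-1, +1}."""
    return rng.choice([-1, 1], size=D)

# One random hypervector per feature name and per discrete value.
feat_hv = {name: random_hv() for name in ["x", "y", "z"]}
val_hv = {v: random_hv() for v in range(4)}

def encode(record):
    """Bind each (feature, value) pair, bundle the pairs, then binarize."""
    bundle = np.zeros(D)
    for name, v in record.items():
        bundle += feat_hv[name] * val_hv[v]  # bind
    return np.sign(bundle)

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

a = encode({"x": 0, "y": 1, "z": 2})
b = encode({"x": 0, "y": 1, "z": 3})  # differs from a in one field
c = encode({"x": 3, "y": 2, "z": 0})  # differs from a in every field
print(cos(a, b), cos(a, c))  # a is much closer to b than to c
```

Similarity degrades gracefully with the number of differing fields, which is what gives HDC its robustness to noise in raw sensor data.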
Updated: 2024-05-15 22:50:10
Subjects: cs.AR,cs.AI
Differentially-Private Hierarchical Federated Learning
While federated learning (FL) eliminates the transmission of raw data over a network, it is still vulnerable to privacy breaches from the communicated model parameters. In this work, we propose Hierarchical Federated Learning with Hierarchical Differential Privacy (H²FDP), a DP-enhanced FL methodology for jointly optimizing privacy and performance in hierarchical networks. Building upon recent proposals for Hierarchical Differential Privacy (HDP), one of the key concepts of H²FDP is adapting DP noise injection at different layers of an established FL hierarchy -- edge devices, edge servers, and cloud servers -- according to the trust models within particular subnetworks. We conduct a comprehensive analysis of the convergence behavior of H²FDP, revealing conditions on parameter tuning under which the training process converges sublinearly to a finite stationarity gap that depends on the network hierarchy, trust model, and target privacy level. Leveraging these relationships, we develop an adaptive control algorithm for H²FDP that tunes properties of local model training to minimize communication energy, latency, and the stationarity gap while striving to maintain a sub-linear convergence rate and meet desired privacy criteria. Subsequent numerical evaluations demonstrate that H²FDP obtains substantial improvements in these metrics over baselines for different privacy budgets, and validate the impact of different system configurations.
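The core idea of trust-dependent noise injection can be sketched with the classic Gaussian mechanism. The snippet below is our own simplified illustration (the tier names, epsilon values, and clipping setup are hypothetical, not taken from the paper): each tier of the hierarchy is assigned its own privacy budget, and a smaller epsilon forces a larger noise scale on the clipped model update.

```python
import numpy as np

rng = np.random.default_rng(3)

def gaussian_sigma(eps, delta, sensitivity):
    """Classic Gaussian-mechanism noise scale for (eps, delta)-DP."""
    return sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps

def clip(g, C):
    """Clip an update to L2 norm at most C (bounds the sensitivity)."""
    return g * min(1.0, C / np.linalg.norm(g))

# Hypothetical trust tiers: a less-trusted tier gets a smaller epsilon and
# therefore more noise injected into the model updates it handles.
tier_eps = {"device": 1.0, "edge_server": 2.0, "cloud": 4.0}
C, delta = 1.0, 1e-5

update = rng.normal(size=100)
errs = {}
for tier, eps in tier_eps.items():
    sigma = gaussian_sigma(eps, delta, sensitivity=C)
    noisy = clip(update, C) + rng.normal(scale=sigma, size=update.shape)
    errs[tier] = float(np.linalg.norm(noisy - clip(update, C)))

print({t: round(e, 1) for t, e in errs.items()})  # smaller eps -> larger perturbation
```

This is the privacy-utility trade-off that the paper's convergence analysis quantifies: tighter budgets at low-trust layers enlarge the noise, which in turn enlarges the stationarity gap.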
Updated: 2024-05-15 22:44:00
Subjects: cs.LG,cs.CR,cs.DC
Leveraging Machine Learning for Accurate IoT Device Identification in Dynamic Wireless Contexts
Identifying IoT devices is crucial for network monitoring, security enforcement, and inventory tracking. However, most existing identification methods rely on deep packet inspection, which raises privacy concerns and adds computational complexity. More importantly, existing works overlook the impact of wireless channel dynamics on the accuracy of layer-2 features, thereby limiting their effectiveness in real-world scenarios. In this work, we define and use the latency of specific probe-response packet exchanges, referred to as "device latency," as the main feature for device identification. Additionally, we reveal the critical impact of wireless channel dynamics on the accuracy of device identification based on device latency. Specifically, this work introduces "accumulation score" as a novel approach to capturing fine-grained channel dynamics and their impact on device latency when training machine learning models. We implement the proposed methods and measure the accuracy and overhead of device identification in real-world scenarios. The results confirm that by incorporating the accumulation score for balanced data collection and training machine learning algorithms, we achieve an F1 score of over 97% for device identification, even amidst wireless channel dynamics, a significant improvement over the 75% F1 score achieved by disregarding the impact of channel dynamics on data collection and device latency.
Updated: 2024-05-15 22:34:52
Subjects: cs.NI,cs.AI,cs.LG,cs.OS
DP-RuL: Differentially-Private Rule Learning for Clinical Decision Support Systems
Serious privacy concerns arise with the use of patient data in rule-based clinical decision support systems (CDSS). The goal of a privacy-preserving CDSS is to learn a population ruleset from individual clients' local rulesets, while protecting the potentially sensitive information contained in the rulesets. We present the first work focused on this problem and develop a framework for learning population rulesets with local differential privacy (LDP), suitable for use within a distributed CDSS and other distributed settings. Our rule discovery protocol uses a Monte-Carlo Tree Search (MCTS) method integrated with LDP to search a rule grammar in a structured way and find rule structures clients are likely to have. Randomized response queries are sent to clients to determine promising paths to search within the rule grammar. In addition, we introduce an adaptive budget allocation method which dynamically determines how much privacy loss budget to use at each query, resulting in better privacy-utility trade-offs. We evaluate our approach using three clinical datasets and find that we are able to learn population rulesets with high coverage (breadth of rules) and clinical utility even at low privacy loss budgets.
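The randomized-response queries mentioned in the abstract are a standard LDP primitive. The sketch below is a generic illustration under our own toy parameters (the "does your ruleset contain rule R?" framing and the numbers are hypothetical): each client flips its true yes/no answer with a probability set by epsilon, and the server debiases the aggregate.

```python
import numpy as np

rng = np.random.default_rng(4)

def randomized_response(truth: bool, eps: float) -> bool:
    """epsilon-LDP randomized response: answer truthfully w.p. e^eps/(e^eps+1)."""
    p_true = np.exp(eps) / (np.exp(eps) + 1.0)
    return truth if rng.random() < p_true else not truth

def debias(yes_fraction: float, eps: float) -> float:
    """Unbiased estimate of the true 'yes' rate from the noisy answers."""
    p = np.exp(eps) / (np.exp(eps) + 1.0)
    return (yes_fraction - (1.0 - p)) / (2.0 * p - 1.0)

# Simulate clients answering "does your local ruleset contain rule R?"
true_rate, n, eps = 0.3, 20_000, 1.0
answers = [randomized_response(rng.random() < true_rate, eps) for _ in range(n)]
est = debias(np.mean(answers), eps)
print(round(est, 3))  # close to the true rate of 0.3
```

No individual answer reveals a client's ruleset with certainty, yet the population-level frequency is recoverable; the paper's MCTS protocol uses such aggregate signals to decide which branches of the rule grammar to explore, and its adaptive budget allocation varies eps across queries.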
Updated: 2024-05-15 22:31:29
Subjects: cs.CR
Spectral Editing of Activations for Large Language Model Alignment
Large language models (LLMs) often exhibit undesirable behaviours, such as generating untruthful or biased content. Editing their internal representations has been shown to be effective in mitigating such behaviours on top of the existing alignment methods. We propose a novel inference-time editing method, namely spectral editing of activations (SEA), to project the input representations into directions with maximal covariance with the positive demonstrations (e.g., truthful) while minimising covariance with the negative demonstrations (e.g., hallucinated). We also extend our method to non-linear editing using feature functions. We run extensive experiments on benchmarks concerning truthfulness and bias with six open-source LLMs of different sizes and model families. The results demonstrate the superiority of SEA in effectiveness, generalisation to similar tasks, as well as inference and data efficiency. We also show that SEA editing only has a limited negative impact on other model capabilities.
Updated: 2024-05-15 22:28:23
Subjects: cs.CL,cs.AI,cs.LG
SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge
Learning commonsense reasoning from visual contexts and scenes in the real world is a crucial step toward advanced artificial intelligence. However, existing video reasoning benchmarks are still inadequate since they were mainly designed for factual or situated reasoning and rarely involve broader knowledge in the real world. Our work aims to delve deeper into reasoning evaluations, specifically within dynamic, open-world, and structured context knowledge. We propose a new benchmark (SOK-Bench), consisting of 44K questions and 10K situations with instance-level annotations depicted in the videos. The reasoning process is required to understand and apply situated knowledge and general knowledge for problem-solving. To create such a dataset, we propose an automatic and scalable generation method to generate question-answer pairs, knowledge graphs, and rationales by instructing the combinations of LLMs and MLLMs. Concretely, we first extract observable situated entities, relations, and processes from videos for situated knowledge and then extend to open-world knowledge beyond the visible content. The task generation is facilitated through multiple dialogues as iterations and subsequently corrected and refined by our designed self-promptings and demonstrations. With a corpus of both explicit situated facts and implicit commonsense, we generate associated question-answer pairs and reasoning processes, finally followed by manual reviews for quality assurance. We evaluated recent mainstream large vision-language models on the benchmark and found several insightful conclusions. For more information, please refer to our benchmark at www.bobbywu.com/SOKBench.
Updated: 2024-05-15 21:55:31
Subjects: cs.CV,cs.AI,cs.CL
STAR: A Benchmark for Situated Reasoning in Real-World Videos
Reasoning in the real world is not divorced from situations. How to capture the present knowledge from surrounding situations and perform reasoning accordingly is crucial and challenging for machine intelligence. This paper introduces a new benchmark that evaluates the situated reasoning ability via situation abstraction and logic-grounded question answering for real-world videos, called Situated Reasoning in Real-World Videos (STAR Benchmark). This benchmark is built upon the real-world videos associated with human actions or interactions, which are naturally dynamic, compositional, and logical. The dataset includes four types of questions, including interaction, sequence, prediction, and feasibility. We represent the situations in real-world videos by hyper-graphs connecting extracted atomic entities and relations (e.g., actions, persons, objects, and relationships). Besides visual perception, situated reasoning also requires structured situation comprehension and logical reasoning. Questions and answers are procedurally generated. The answering logic of each question is represented by a functional program based on a situation hyper-graph. We compare various existing video reasoning models and find that they all struggle on this challenging situated reasoning task. We further propose a diagnostic neuro-symbolic model that can disentangle visual perception, situation abstraction, language understanding, and functional reasoning to understand the challenges of this benchmark.
Updated: 2024-05-15 21:53:54
Subjects: cs.AI,cs.CL,cs.CV
TALC: Time-Aligned Captions for Multi-Scene Text-to-Video Generation
Recent advances in diffusion-based generative modeling have led to the development of text-to-video (T2V) models that can generate high-quality videos conditioned on a text prompt. Most of these T2V models often produce single-scene video clips that depict an entity performing a particular action (e.g., `a red panda climbing a tree'). However, it is pertinent to generate multi-scene videos since they are ubiquitous in the real-world (e.g., `a red panda climbing a tree' followed by `the red panda sleeps on the top of the tree'). To generate multi-scene videos from the pretrained T2V model, we introduce Time-Aligned Captions (TALC) framework. Specifically, we enhance the text-conditioning mechanism in the T2V architecture to recognize the temporal alignment between the video scenes and scene descriptions. For instance, we condition the visual features of the earlier and later scenes of the generated video with the representations of the first scene description (e.g., `a red panda climbing a tree') and second scene description (e.g., `the red panda sleeps on the top of the tree'), respectively. As a result, we show that the T2V model can generate multi-scene videos that adhere to the multi-scene text descriptions and be visually consistent (e.g., entity and background). Further, we finetune the pretrained T2V model with multi-scene video-text data using the TALC framework. We show that the TALC-finetuned model outperforms the baseline methods by 15.5 points in the overall score, which averages visual consistency and text adherence using human evaluation. The project website is https://talc-mst2v.github.io/.
Updated: 2024-05-15 21:44:31
Subjects: cs.CV,cs.AI,cs.LG
No More Mumbles: Enhancing Robot Intelligibility through Speech Adaptation
Spoken language interaction is at the heart of interpersonal communication, and people flexibly adapt their speech to different individuals and environments. It is surprising that robots, and by extension other digital devices, are not equipped to adapt their speech and instead rely on fixed speech parameters, which often hinder comprehension by the user. We conducted a speech comprehension study involving 39 participants who were exposed to different environmental and contextual conditions. During the experiment, the robot articulated words using different vocal parameters, and the participants were tasked with both recognising the spoken words and rating their subjective impression of the robot's speech. The experiment's primary outcome shows that spaces with good acoustic quality positively correlate with intelligibility and user experience. However, increasing the distance between the user and the robot degraded the user experience, while distracting background sounds significantly reduced speech recognition accuracy and user satisfaction. We next built an adaptive voice for the robot. For this, the robot needs to know how difficult it is for a user to understand spoken language in a particular setting. We present a prediction model that rates how annoying the ambient acoustic environment is and, consequently, how hard it is to understand someone in this setting. Then, we develop a convolutional neural network model to adapt the robot's speech parameters to different users and spaces, while taking into account the influence of ambient acoustics on intelligibility. Finally, we present an evaluation with 27 users, demonstrating superior intelligibility and user experience with adaptive voice parameters compared to a fixed voice.
Updated: 2024-05-15 21:28:55
Subjects: cs.RO,cs.AI,stat.CO
Pulse Shape Simulation and Discrimination using Machine-Learning Techniques
An essential metric for the quality of a particle-identification experiment is its statistical power to discriminate between signal and background. Pulse shape discrimination (PSD) is a basic method for this purpose in many nuclear, high-energy, and rare-event search experiments where scintillation detectors are used. Conventional techniques exploit the difference between the decay times of pulses from signal and background events, or between pulse signals caused by different types of radiation quanta, to achieve good discrimination. However, such techniques are efficient only when the total light emission is sufficient to obtain a proper pulse profile. This is only possible when an adequate amount of energy is deposited by the recoil of electrons or nuclei of the scintillator material caused by the particle incident on the detector. But rare-event search experiments, such as the direct search for dark matter, do not always satisfy these conditions. Hence, it becomes imperative to have a method that can deliver very efficient discrimination in these scenarios. Neural-network-based machine-learning algorithms have been used for classification problems in many areas of physics, especially in high-energy experiments, and have given better results compared to conventional techniques. We present the results of our investigations of two network-based methods, viz. the Dense Neural Network and the Recurrent Neural Network, for pulse shape discrimination and compare them with conventional methods.
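The conventional decay-time technique the abstract contrasts against can be illustrated with a toy simulation (our own sketch; the decay constants, photon counts, and tail window are hypothetical, not the paper's detector model): pulses are drawn as Poisson samples of a two-exponential profile, and a classic charge-comparison (tail-to-total) discriminant separates the two event classes.

```python
import numpy as np

rng = np.random.default_rng(6)
t = np.arange(0, 200.0, 1.0)  # time bins (hypothetical ns)

def pulse(tau_fast, tau_slow, slow_frac, n_photons):
    """Toy scintillation pulse: Poisson-sampled two-exponential profile."""
    shape = (1 - slow_frac) * np.exp(-t / tau_fast) + slow_frac * np.exp(-t / tau_slow)
    rate = n_photons * shape / shape.sum()
    return rng.poisson(rate)

def tail_to_total(p, tail_start=30):
    """Classic charge-comparison PSD discriminant."""
    return p[tail_start:].sum() / max(p.sum(), 1)

# Background-like events carry a larger slow component than signal-like ones.
sig = [tail_to_total(pulse(7.0, 80.0, 0.1, 2000)) for _ in range(300)]
bkg = [tail_to_total(pulse(7.0, 80.0, 0.4, 2000)) for _ in range(300)]
print(np.mean(sig), np.mean(bkg))  # the two classes separate cleanly
```

At 2000 photons per pulse the discriminant separates well; shrink `n_photons` toward the low-light regime the abstract describes and the two distributions overlap, which is where the neural-network approaches are meant to help.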
Updated: 2024-05-15 21:26:24
Subjects: physics.ins-det,cs.LG
Point2SSM++: Self-Supervised Learning of Anatomical Shape Models from Point Clouds
Correspondence-based statistical shape modeling (SSM) stands as a powerful technology for morphometric analysis in clinical research. SSM facilitates population-level characterization and quantification of anatomical shapes such as bones and organs, aiding in pathology and disease diagnostics and treatment planning. Despite its potential, SSM remains under-utilized in medical research due to the significant overhead associated with automatic construction methods, which demand complete, aligned shape surface representations. Additionally, optimization-based techniques rely on bias-inducing assumptions or templates and have prolonged inference times as the entire cohort is simultaneously optimized. To overcome these challenges, we introduce Point2SSM++, a principled, self-supervised deep learning approach that directly learns correspondence points from point cloud representations of anatomical shapes. Point2SSM++ is robust to misaligned and inconsistent input, providing SSM that accurately samples individual shape surfaces while effectively capturing population-level statistics. Additionally, we present principled extensions of Point2SSM++ that adapt it to dynamic spatiotemporal and multi-anatomy use cases, demonstrating the broad versatility of the framework. Through extensive validation across diverse anatomies, evaluation metrics, and clinically relevant downstream tasks, we demonstrate Point2SSM++'s superiority over existing state-of-the-art deep learning models and traditional approaches. Point2SSM++ substantially enhances the feasibility of SSM generation and significantly broadens its array of potential clinical applications.
Updated: 2024-05-15 21:13:54
Subjects: cs.CV,cs.LG
Permissible Knowledge Pooling
Information pooling has been extensively formalised across various logical frameworks in distributed systems, characterized by diverse information-sharing patterns. These approaches generally adopt an intersection perspective, aggregating all possible information, regardless of whether it is known or unknown to the agents. In contrast, this work adopts a unique stance, emphasising that sharing knowledge means distributing what is known, rather than what remains uncertain. This paper introduces new modal logics for knowledge pooling and sharing, ranging from a novel language of knowledge pooling to a dynamic mechanism for knowledge sharing. It also outlines their axiomatizations and discusses a potential framework for permissible knowledge pooling.
Updated: 2024-05-15 21:05:03
Categories: cs.LO,cs.AI
CaloFlow for CaloChallenge Dataset 1
CaloFlow is a new and promising approach to fast calorimeter simulation based on normalizing flows. Applying CaloFlow to the photon and charged pion Geant4 showers of Dataset 1 of the Fast Calorimeter Simulation Challenge 2022, we show how it can produce high-fidelity samples with a sampling time that is several orders of magnitude faster than Geant4. We demonstrate the fidelity of the samples using calorimeter shower images, histograms of high-level features, and aggregate metrics such as a classifier trained to distinguish CaloFlow from Geant4 samples.
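The speed claim rests on the change-of-variables structure of normalizing flows: sampling is a single invertible forward pass, and the exact likelihood comes from a Jacobian correction. A minimal single-transform sketch of that idea (a 1-D affine map; the values of `mu` and `sigma` are illustrative assumptions, not CaloFlow's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
# Invertible affine "flow": x = mu + sigma * z with base noise z ~ N(0, 1).
# Sampling is one forward pass; log p(x) = log N(z; 0, 1) - log|sigma|
# by the change-of-variables formula.
mu, sigma = 2.0, 0.5  # illustrative parameters

def sample(n):
    return mu + sigma * rng.normal(size=n)

def log_prob(x):
    z = (x - mu) / sigma  # inverse pass
    return -0.5 * (z**2 + np.log(2 * np.pi)) - np.log(sigma)

xs = sample(100_000)
# Samples should match the target N(mu, sigma^2) statistics
assert abs(xs.mean() - mu) < 0.05 and abs(xs.std() - sigma) < 0.05
```

Real flows stack many such learned invertible transforms, but the sampling-cost and exact-likelihood properties are the same.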
Updated: 2024-05-15 20:56:03
Categories: physics.ins-det,cs.LG,hep-ex,hep-ph,physics.data-an
Guardians of the Quantum GAN
Quantum Generative Adversarial Networks (qGANs) are at the forefront of image-generating quantum machine learning models. To accommodate the growing demand for Noisy Intermediate-Scale Quantum (NISQ) devices to train and infer quantum machine learning models, the number of third-party vendors offering quantum hardware as a service is expected to rise. This expansion introduces the risk of untrusted vendors potentially stealing proprietary information from the quantum machine learning models. To address this concern, we propose a novel watermarking technique that exploits the noise signature embedded during the training phase of qGANs as a non-invasive watermark. The watermark is identifiable in the images generated by the qGAN, allowing us to trace the specific quantum hardware used during training and hence providing strong proof of ownership. To further enhance robustness, we propose training qGANs on a sequence of multiple quantum hardware platforms, embedding a complex watermark comprising the noise signatures of all the training hardware that is difficult for adversaries to replicate. We also develop a machine learning classifier to extract this watermark robustly, thereby identifying the training hardware (or the suite of hardware) from the images generated by the qGAN and validating the authenticity of the model. We note that the watermark signature is robust against inference on hardware different from that used for training. We obtain watermark extraction accuracies of 100% and ~90% when training the qGAN on individual and multiple quantum hardware setups (with inference on different hardware), respectively. Since parameter evolution during training is strongly modulated by quantum noise, the proposed watermark can be extended to other quantum machine learning models as well.
Updated: 2024-05-15 20:55:24
Categories: quant-ph,cs.AR,cs.CR,cs.LG
Modeling User Preferences via Brain-Computer Interfacing
Present Brain-Computer Interfacing (BCI) technology allows inference and detection of cognitive and affective states, but fairly little has been done to study scenarios in which such information can facilitate new applications that rely on modeling human cognition. One state that can be quantified from various physiological signals is attention. Estimates of human attention can be used to reveal preferences and novel dimensions of user experience. Previous approaches have tackled these incredibly challenging tasks using a variety of behavioral signals, from dwell-time to click-through data, and computational models of visual correspondence to these behavioral signals. However, behavioral signals are only rough estimations of the real underlying attention and affective preferences of the users. Indeed, users may attend to some content simply because it is salient, but not because it is really interesting, or simply because it is outrageous. With this paper, we put forward a research agenda and example work using BCI to infer users' preferences, their attentional correlates towards visual content, and their associations with affective experience. Subsequently, we link these to relevant applications, such as information retrieval, personalized steering of generative models, and crowdsourcing population estimates of affective experiences.
Updated: 2024-05-15 20:41:46
Categories: cs.HC,cs.AI
Generalized Holographic Reduced Representations
Deep learning has achieved remarkable success in recent years. Central to its success is its ability to learn representations that preserve task-relevant structure. However, massive energy, compute, and data costs are required to learn general representations. This paper explores Hyperdimensional Computing (HDC), a computationally and data-efficient brain-inspired alternative. HDC acts as a bridge between connectionist and symbolic approaches to artificial intelligence (AI), allowing explicit specification of representational structure as in symbolic approaches while retaining the flexibility of connectionist approaches. However, HDC's simplicity poses challenges for encoding complex compositional structures, especially in its binding operation. To address this, we propose Generalized Holographic Reduced Representations (GHRR), an extension of Fourier Holographic Reduced Representations (FHRR), a specific HDC implementation. GHRR introduces a flexible, non-commutative binding operation, enabling improved encoding of complex data structures while preserving HDC's desirable properties of robustness and transparency. In this work, we introduce the GHRR framework, prove its theoretical properties and its adherence to HDC properties, explore its kernel and binding characteristics, and perform empirical experiments showcasing its flexible non-commutativity, enhanced decoding accuracy for compositional structures, and improved memorization capacity compared to FHRR.
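For orientation, FHRR (the model GHRR extends) represents data as hypervectors of unit-magnitude complex phasors and binds them by elementwise multiplication, which is commutative; GHRR's contribution is a non-commutative generalization, which this sketch does not reproduce. A minimal FHRR illustration (dimensions and helper names are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 1024  # hypervector dimensionality (illustrative)

def random_hv():
    # FHRR hypervector: unit-magnitude complex phasors with random phases
    return np.exp(1j * rng.uniform(-np.pi, np.pi, d))

def bind(a, b):    # elementwise product; commutative in FHRR
    return a * b

def unbind(c, a):  # multiply by the conjugate to recover the partner
    return c * np.conj(a)

def sim(a, b):     # cosine-like similarity, near 1 for matching vectors
    return float(np.real(np.mean(a * np.conj(b))))

x, y = random_hv(), random_hv()
z = bind(x, y)
assert sim(unbind(z, x), y) > 0.99          # exact recovery of the partner
assert abs(sim(z, x)) < 0.1                 # bound vector dissimilar to inputs
assert sim(bind(x, y), bind(y, x)) > 0.99   # FHRR binding is commutative
```

The last assertion is precisely the property GHRR relaxes, so that ordered compositional structures can be encoded.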
Updated: 2024-05-15 20:37:48
Categories: cs.LG,cs.AI,cs.SC
Red-Teaming for Generative AI: Silver Bullet or Security Theater?
In response to rising concerns surrounding the safety, security, and trustworthiness of Generative AI (GenAI) models, practitioners and regulators alike have pointed to AI red-teaming as a key component of their strategies for identifying and mitigating these risks. However, despite AI red-teaming's central role in policy discussions and corporate messaging, significant questions remain about what precisely it means, what role it can play in regulation, and how it relates to conventional red-teaming practices as originally conceived in the field of cybersecurity. In this work, we identify recent cases of red-teaming activities in the AI industry and conduct an extensive survey of relevant research literature to characterize the scope, structure, and criteria for AI red-teaming practices. Our analysis reveals that prior methods and practices of AI red-teaming diverge along several axes, including the purpose of the activity (which is often vague), the artifact under evaluation, the setting in which the activity is conducted (e.g., actors, resources, and methods), and the resulting decisions it informs (e.g., reporting, disclosure, and mitigation). In light of our findings, we argue that while red-teaming may be a valuable big-tent idea for characterizing GenAI harm mitigations, and that industry may effectively apply red-teaming and other strategies behind closed doors to safeguard AI, gestures towards red-teaming (based on public definitions) as a panacea for every possible risk verge on security theater. To move toward a more robust toolbox of evaluations for generative AI, we synthesize our recommendations into a question bank meant to guide and scaffold future AI red-teaming practices.
Updated: 2024-05-15 20:31:20
Categories: cs.CY,cs.HC,cs.LG
Benchmark Early and Red Team Often: A Framework for Assessing and Managing Dual-Use Hazards of AI Foundation Models
A concern about cutting-edge or "frontier" AI foundation models is that an adversary may use the models for preparing chemical, biological, radiological, nuclear, (CBRN), cyber, or other attacks. At least two methods can identify foundation models with potential dual-use capability; each has advantages and disadvantages: A. Open benchmarks (based on openly available questions and answers), which are low-cost but accuracy-limited by the need to omit security-sensitive details; and B. Closed red team evaluations (based on private evaluation by CBRN and cyber experts), which are higher-cost but can achieve higher accuracy by incorporating sensitive details. We propose a research and risk-management approach using a combination of methods including both open benchmarks and closed red team evaluations, in a way that leverages advantages of both methods. We recommend that one or more groups of researchers with sufficient resources and access to a range of near-frontier and frontier foundation models run a set of foundation models through dual-use capability evaluation benchmarks and red team evaluations, then analyze the resulting sets of models' scores on benchmark and red team evaluations to see how correlated those are. If, as we expect, there is substantial correlation between the dual-use potential benchmark scores and the red team evaluation scores, then implications include the following: The open benchmarks should be used frequently during foundation model development as a quick, low-cost measure of a model's dual-use potential; and if a particular model gets a high score on the dual-use potential benchmark, then more in-depth red team assessments of that model's dual-use capability should be performed. We also discuss limitations and mitigations for our approach, e.g., if model developers try to game benchmarks by including a version of benchmark test data in a model's training data.
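The proposed analysis reduces to checking how correlated benchmark scores and red-team scores are across a set of models, then using the cheap benchmark as a screen. A toy sketch with synthetic, purely illustrative scores (no real model data):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical scores for 20 models (illustrative numbers only):
benchmark = rng.uniform(0, 1, 20)                     # open-benchmark scores
red_team = 0.8 * benchmark + rng.normal(0, 0.1, 20)   # noisy related signal

# Correlation between the two evaluation types
r = np.corrcoef(benchmark, red_team)[0, 1]

# If the correlation is substantial, escalate only high benchmark scorers
# (here the top 20%) to the costlier in-depth red-team evaluation.
flagged = benchmark > np.quantile(benchmark, 0.8)
assert r > 0.5            # substantial correlation in this synthetic example
assert flagged.sum() == 4  # 20% of 20 models escalated
```

This is only the statistical skeleton of the recommendation; the substance lies in building the benchmark and red-team evaluations themselves.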
Updated: 2024-05-15 20:28:15
Categories: cs.CR,cs.AI,cs.CY,cs.LG
From Local to Global Order: A Theory of Neural Synaptic Balance
We develop a theory of neural synaptic balance and how it can emerge or be enforced in neural networks. For a given additive cost function $R$ (regularizer), a neuron is said to be in balance if the total cost of its input weights is equal to the total cost of its output weights. The basic example is provided by feedforward networks of ReLU units trained with $L_2$ regularizers, which exhibit balance after proper training. The theory explains this phenomenon and extends it in several directions. The first direction is the extension to bilinear and other activation functions. The second direction is the extension to more general regularizers, including all $L_p$ ($p>0$) regularizers. The third direction is the extension to non-layered architectures, recurrent architectures, convolutional architectures, as well as architectures with mixed activation functions. The theory is based on two local neuronal operations: scaling which is commutative, and balancing which is not commutative. Finally, and most importantly, given any initial set of weights, when local balancing operations are applied to each neuron in a stochastic manner, global order always emerges through the convergence of the stochastic balancing algorithm to the same unique set of balanced weights. The reason for this convergence is the existence of an underlying strictly convex optimization problem where the relevant variables are constrained to a linear, only architecture-dependent, manifold. The theory is corroborated through various simulations carried out on benchmark data sets. Scaling and balancing operations are entirely local and thus physically plausible in biological and neuromorphic networks.
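The two local operations admit a compact single-neuron check: for a ReLU unit, scaling the input weights by $\lambda > 0$ and the output weight by $1/\lambda$ preserves the function (positive homogeneity of ReLU), and with an $L_2$ cost choosing $\lambda = \sqrt{|a| / \lVert w \rVert}$ puts the neuron in balance. A toy sketch of that one-neuron case (the paper's stochastic, network-level balancing algorithm is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(1)
relu = lambda z: np.maximum(z, 0.0)

# One hidden ReLU neuron: f(x) = a * relu(w @ x)
w = rng.normal(size=5)   # input weights
a = rng.normal()         # output weight
x = rng.normal(size=5)   # arbitrary input

def f(w, a, x):
    return a * relu(w @ x)

# Scaling: (w, a) -> (lam * w, a / lam) leaves f unchanged for lam > 0.
# Balancing (L2 cost): pick lam so that ||lam * w||^2 == (a / lam)^2.
lam = np.sqrt(abs(a) / np.linalg.norm(w))
w_bal, a_bal = lam * w, a / lam

assert np.isclose(f(w, a, x), f(w_bal, a_bal, x))  # function preserved
assert np.isclose(np.sum(w_bal**2), a_bal**2)      # neuron now in balance
```

The theory's main result concerns what happens when such balancing steps are applied stochastically across a whole network: the iterates converge to a unique globally balanced weight set.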
Updated: 2024-05-15 20:27:56
Categories: cs.NE,cs.AI,cs.LG
Randomized Confidence Bounds for Stochastic Partial Monitoring
The partial monitoring (PM) framework provides a theoretical formulation of sequential learning problems with incomplete feedback. On each round, a learning agent plays an action while the environment simultaneously chooses an outcome. The agent then observes a feedback signal that is only partially informative about the (unobserved) outcome. The agent leverages the received feedback signals to select actions that minimize the (unobserved) cumulative loss. In contextual PM, the outcomes depend on some side information that is observable by the agent before selecting the action on each round. In this paper, we consider the contextual and non-contextual PM settings with stochastic outcomes. We introduce a new class of PM strategies based on the randomization of deterministic confidence bounds. We also extend regret guarantees to settings where existing stochastic strategies are not applicable. Our experiments show that the proposed RandCBP and RandCBPside* strategies have favorable performance against state-of-the-art baselines in multiple PM games. To advocate for the adoption of the PM framework, we design a use case on the real-world problem of monitoring the error rate of any deployed classification system.
Updated: 2024-05-15 20:10:05
Categories: cs.LG
On-device Online Learning and Semantic Management of TinyML Systems
Recent advances in Tiny Machine Learning (TinyML) empower low-footprint embedded devices for real-time on-device Machine Learning. While many acknowledge the potential benefits of TinyML, its practical implementation presents unique challenges. This study aims to bridge the gap between prototyping single TinyML models and developing reliable TinyML systems in production: (1) Embedded devices operate in dynamically changing conditions. Existing TinyML solutions primarily focus on inference, with models trained offline on powerful machines and deployed as static objects. However, static models may underperform in the real world due to evolving input data distributions. We propose online learning to enable training on constrained devices, adapting local models towards the latest field conditions. (2) Nevertheless, current on-device learning methods struggle with heterogeneous deployment conditions and the scarcity of labeled data when applied across numerous devices. We introduce federated meta-learning incorporating online learning to enhance model generalization, facilitating rapid learning. This approach ensures optimal performance among distributed devices by knowledge sharing. (3) Moreover, TinyML's pivotal advantage is widespread adoption. Embedded devices and TinyML models prioritize extreme efficiency, leading to diverse characteristics ranging from memory and sensors to model architectures. Given their diversity and non-standardized representations, managing these resources becomes challenging as TinyML systems scale up. We present semantic management for the joint management of models and devices at scale. We demonstrate our methods through a basic regression example and then assess them in three real-world TinyML applications: handwritten character image classification, keyword audio classification, and smart building presence detection, confirming our approaches' effectiveness.
Updated: 2024-05-15 20:09:26
Categories: cs.LG,cs.AI,cs.DB,cs.DC
DC4L: Distribution Shift Recovery via Data-Driven Control for Deep Learning Models
Deep neural networks have repeatedly been shown to be non-robust to the uncertainties of the real world, even to naturally occurring ones. A vast majority of current approaches have focused on data-augmentation methods to expand the range of perturbations that the classifier is exposed to while training. A relatively unexplored avenue that is equally promising involves sanitizing an image as a preprocessing step, depending on the nature of perturbation. In this paper, we propose to use control for learned models to recover from distribution shifts online. Specifically, our method applies a sequence of semantic-preserving transformations to bring the shifted data closer in distribution to the training set, as measured by the Wasserstein distance. Our approach is to 1) formulate the problem of distribution shift recovery as a Markov decision process, which we solve using reinforcement learning, 2) identify a minimum condition on the data for our method to be applied, which we check online using a binary classifier, and 3) employ dimensionality reduction through orthonormal projection to aid in our estimates of the Wasserstein distance. We provide theoretical evidence that orthonormal projection preserves characteristics of the data at the distributional level. We apply our distribution shift recovery approach to the ImageNet-C benchmark for distribution shifts, demonstrating an improvement in average accuracy of up to 14.21% across a variety of state-of-the-art ImageNet classifiers. We further show that our method generalizes to composites of shifts from the ImageNet-C benchmark, achieving improvements in average accuracy of up to 9.81%. Finally, we test our method on CIFAR-100-C and report improvements of up to 8.25%.
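Two of the method's ingredients, orthonormal projection and Wasserstein-distance estimation, can be sketched in a few lines. The dimensions, the Gaussian toy data, and the averaged per-coordinate 1-D Wasserstein-1 estimator below are illustrative assumptions, not the paper's exact estimator:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 100, 10, 500  # ambient dim, projected dim, sample size (illustrative)

# Orthonormal projection via QR of a random Gaussian matrix:
# the k columns of Q are orthonormal, so Q.T @ Q = I_k.
Q, _ = np.linalg.qr(rng.normal(size=(d, k)))

train = rng.normal(size=(n, d))                 # stand-in for the training set
fresh = rng.normal(size=(n, d))                 # fresh in-distribution sample
shifted = rng.normal(loc=0.5, size=(n, d))      # toy distribution shift

def w1(a, b):
    # Empirical 1-D Wasserstein-1 per projected coordinate (sorted samples),
    # averaged over coordinates -- a cheap proxy for distributional distance.
    return np.mean(np.abs(np.sort(a, axis=0) - np.sort(b, axis=0)))

pa = train @ Q
assert np.allclose(Q.T @ Q, np.eye(k), atol=1e-8)     # projection is orthonormal
assert w1(pa, shifted @ Q) > w1(pa, fresh @ Q)        # shift is detectable
```

The point of the projection is that the distance estimate becomes tractable in the reduced space while, per the paper's theoretical argument, distributional characteristics are preserved.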
Updated: 2024-05-15 19:56:59
Categories: cs.LG,cs.CV
Simulating Policy Impacts: Developing a Generative Scenario Writing Method to Evaluate the Perceived Effects of Regulation
The rapid advancement of AI technologies yields numerous future impacts on individuals and society. Policy-makers are therefore tasked to react quickly and establish policies that mitigate those impacts. However, anticipating the effectiveness of policies is a difficult task, as some impacts might only be observable in the future and respective policies might not be applicable to the future development of AI. In this work we develop a method for using large language models (LLMs) to evaluate the efficacy of a given piece of policy at mitigating specified negative impacts. We do so by using GPT-4 to generate scenarios both pre- and post-introduction of policy and translating these vivid stories into metrics based on human perceptions of impacts. We leverage an already established taxonomy of impacts of generative AI in the media environment to generate a set of scenario pairs both mitigated and non-mitigated by the transparency legislation of Article 50 of the EU AI Act. We then run a user study (n=234) to evaluate these scenarios across four risk-assessment dimensions: severity, plausibility, magnitude, and specificity to vulnerable populations. We find that this transparency legislation is perceived to be effective at mitigating harms in areas such as labor and well-being, but largely ineffective in areas such as social cohesion and security. Through this case study on generative AI harms we demonstrate the efficacy of our method as a tool to iterate on the effectiveness of policy on mitigating various negative impacts. We expect this method to be useful to researchers or other stakeholders who want to brainstorm the potential utility of different pieces of policy or other mitigation strategies.
Updated: 2024-05-15 19:44:54
Categories: cs.CL,cs.AI
Towards Aligned Layout Generation via Diffusion Model with Aesthetic Constraints
Controllable layout generation refers to the process of creating a plausible visual arrangement of elements within a graphic design (e.g., document and web designs) with constraints representing design intentions. Although recent diffusion-based models have achieved state-of-the-art FID scores, they tend to exhibit more pronounced misalignment compared to earlier transformer-based models. In this work, we propose the $\textbf{LA}$yout $\textbf{C}$onstraint diffusion mod$\textbf{E}$l (LACE), a unified model to handle a broad range of layout generation tasks, such as arranging elements with specified attributes and refining or completing a coarse layout design. The model is based on continuous diffusion models. Compared with existing methods that use discrete diffusion models, continuous state-space design can enable the incorporation of differentiable aesthetic constraint functions in training. For conditional generation, we introduce conditions via masked input. Extensive experiment results show that LACE produces high-quality layouts and outperforms existing state-of-the-art baselines.
Updated: 2024-05-15 19:32:58
Categories: cs.CV,cs.LG
LoRA Learns Less and Forgets Less
Low-Rank Adaptation (LoRA) is a widely-used parameter-efficient finetuning method for large language models. LoRA saves memory by training only low rank perturbations to selected weight matrices. In this work, we compare the performance of LoRA and full finetuning on two target domains, programming and mathematics. We consider both the instruction finetuning ($\approx$100K prompt-response pairs) and continued pretraining ($\approx$10B unstructured tokens) data regimes. Our results show that, in most settings, LoRA substantially underperforms full finetuning. Nevertheless, LoRA exhibits a desirable form of regularization: it better maintains the base model's performance on tasks outside the target domain. We show that LoRA provides stronger regularization compared to common techniques such as weight decay and dropout; it also helps maintain more diverse generations. We show that full finetuning learns perturbations with a rank that is 10-100X greater than typical LoRA configurations, possibly explaining some of the reported gaps. We conclude by proposing best practices for finetuning with LoRA.
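The "low rank perturbations" are the standard LoRA parameterization $W + \frac{\alpha}{r} BA$ with $B$ zero-initialized, so training starts exactly from the frozen base model. A minimal forward-pass sketch (the sizes and scaling constant are illustrative, not the paper's configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 64, 32, 4   # layer sizes and LoRA rank (illustrative)
alpha = 8.0                  # LoRA scaling hyperparameter (illustrative)

W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable down-projection, small init
B = np.zeros((d_out, r))               # trainable up-projection, zero init

def lora_forward(x):
    # Base layer plus rank-r perturbation (alpha / r) * B @ A
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
assert np.allclose(lora_forward(x), W @ x)  # zero-init B: starts as base model
assert np.linalg.matrix_rank(B @ A) <= r    # perturbation rank bounded by r
```

The paper's observation is that full finetuning learns perturbations of effective rank 10-100X larger than such configurations, which helps explain both the performance gap and the regularization effect.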
Updated: 2024-05-15 19:27:45
Categories: cs.LG,cs.AI,cs.CL
Improved Baselines with Visual Instruction Tuning
Large multimodal models (LMM) have recently shown encouraging progress with visual instruction tuning. In this note, we show that the fully-connected vision-language cross-modal connector in LLaVA is surprisingly powerful and data-efficient. With simple modifications to LLaVA, namely, using CLIP-ViT-L-336px with an MLP projection and adding academic-task-oriented VQA data with simple response formatting prompts, we establish stronger baselines that achieve state-of-the-art across 11 benchmarks. Our final 13B checkpoint uses merely 1.2M publicly available data, and finishes full training in ~1 day on a single 8-A100 node. We hope this can make state-of-the-art LMM research more accessible. Code and model will be publicly available.
Updated: 2024-05-15 19:22:44
Categories: cs.CV,cs.AI,cs.CL,cs.LG
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning
The White House Executive Order on Artificial Intelligence highlights the risks of large language models (LLMs) empowering malicious actors in developing biological, cyber, and chemical weapons. To measure these risks of malicious use, government institutions and major AI labs are developing evaluations for hazardous capabilities in LLMs. However, current evaluations are private, preventing further research into mitigating risk. Furthermore, they focus on only a few, highly specific pathways for malicious use. To fill these gaps, we publicly release the Weapons of Mass Destruction Proxy (WMDP) benchmark, a dataset of 3,668 multiple-choice questions that serve as a proxy measurement of hazardous knowledge in biosecurity, cybersecurity, and chemical security. WMDP was developed by a consortium of academics and technical consultants, and was stringently filtered to eliminate sensitive information prior to public release. WMDP serves two roles: first, as an evaluation for hazardous knowledge in LLMs, and second, as a benchmark for unlearning methods to remove such hazardous knowledge. To guide progress on unlearning, we develop RMU, a state-of-the-art unlearning method based on controlling model representations. RMU reduces model performance on WMDP while maintaining general capabilities in areas such as biology and computer science, suggesting that unlearning may be a concrete path towards reducing malicious use from LLMs. We release our benchmark and code publicly at https://wmdp.ai
Updated: 2024-05-15 19:16:09
Categories: cs.LG,cs.AI,cs.CL,cs.CY
Large-Scale Security Analysis of Real-World Backend Deployments Speaking IoT-Focused Protocols
Internet-of-Things devices, ranging from smart home assistants to health devices, are pervasive: forecasts estimate their number to reach 29 billion by 2030. Understanding the security of their machine-to-machine communication is crucial. Prior work focused on identifying devices' vulnerabilities or proposed protocol-specific solutions. Instead, in this paper, we investigate the security of backends speaking Internet-of-Things (IoT) protocols at scale, that is, the backbone of the entire IoT ecosystem. We focus on three real-world protocols used by IoT for our large-scale analysis: MQTT, CoAP, and XMPP. We gather a dataset of over 337,000 backends, augment it with geographical and provider data, and perform non-invasive active measurements to investigate three major security threats: information leakage, weak authentication, and denial of service. Our results provide quantitative evidence of a problematic immaturity in the IoT security ecosystem. Among other issues, we find that 9.44% of backends expose information, 30.38% of CoAP-speaking backends are vulnerable to denial-of-service attacks, and 99.84% of MQTT-speaking and XMPP-speaking backends use insecure transport protocols (only 0.16% adopt TLS, of which 70.93% adopt a vulnerable version).
Updated: 2024-05-15 19:04:30
Subjects: cs.CR,cs.NI
Fast Two-Time-Scale Stochastic Gradient Method with Applications in Reinforcement Learning
Two-time-scale optimization is a framework introduced in Zeng et al. (2024) that abstracts a range of policy evaluation and policy optimization problems in reinforcement learning (RL). Akin to bi-level optimization under a particular type of stochastic oracle, the two-time-scale optimization framework has an upper level objective whose gradient evaluation depends on the solution of a lower level problem, which is to find the root of a strongly monotone operator. In this work, we propose a new method for solving two-time-scale optimization that achieves significantly faster convergence than prior art. The key idea of our approach is to leverage an averaging step to improve the estimates of the operators in both lower and upper levels before using them to update the decision variables. These additional averaging steps eliminate the direct coupling between the main variables, enabling the accelerated performance of our algorithm. We characterize the finite-time convergence rates of the proposed algorithm under various conditions of the underlying objective function, including strong convexity, convexity, the Polyak-Lojasiewicz condition, and general non-convexity. These rates significantly improve over the best-known complexity of the standard two-time-scale stochastic approximation algorithm. When applied to RL, we show how the proposed algorithm specializes to novel online sample-based methods that surpass or match the performance of the existing state of the art. Finally, we support our theoretical results with numerical simulations in RL.
Updated: 2024-05-15 19:03:08
Subjects: math.OC,cs.LG
Calorimeter shower superresolution
Calorimeter shower simulation is a major bottleneck in the Large Hadron Collider computational pipeline. There have been recent efforts to employ deep-generative surrogate models to overcome this challenge. However, many of the best-performing models have training and generation times that do not scale well to high-dimensional calorimeter showers. In this work, we introduce SuperCalo, a flow-based superresolution model, and demonstrate that high-dimensional fine-grained calorimeter showers can be quickly upsampled from coarse-grained showers. This novel approach presents a way to reduce the computational cost, memory requirements, and generation time associated with fast calorimeter simulation models. Additionally, we show that the showers upsampled by SuperCalo possess a high degree of variation. This allows a large number of high-dimensional calorimeter showers to be upsampled with high fidelity from far fewer coarse showers, which results in an additional reduction in generation time.
Updated: 2024-05-15 18:53:45
Subjects: physics.ins-det,cs.LG,hep-ex,hep-ph,physics.data-an
Detecting Continuous Integration Skip : A Reinforcement Learning-based Approach
The software industry is experiencing a surge in the adoption of Continuous Integration (CI) practices, both in commercial and open-source environments. CI practices facilitate the seamless integration of code changes by employing automated building and testing processes. Some frameworks, such as Travis CI and GitHub Actions, have significantly contributed to simplifying and enhancing the CI process, rendering it more accessible and efficient for development teams. Despite the availability of these CI tools, developers continue to encounter difficulties in accurately flagging commits as either suitable for CI execution or as candidates for skipping, especially for large projects with many dependencies. Inaccurate flagging of commits can lead to resource-intensive test and build processes, as even minor commits may inadvertently trigger the Continuous Integration process. The problem of detecting CI-skip commits can be modeled as a binary classification task where we decide either to build a commit or to skip it. This study proposes a novel solution that leverages Deep Reinforcement Learning techniques to construct an optimal Decision Tree classifier that addresses the imbalanced nature of the data. We evaluate our solution by running within- and cross-project validation benchmarks on a diverse range of open-source projects hosted on GitHub, which showcase superior results when compared with existing state-of-the-art methods.
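The binary-classification framing can be sketched in a few lines. The features, thresholds, and data below are invented purely for illustration; the paper learns a full decision tree via deep reinforcement learning, not the class-weighted stump used here to stand in for handling imbalance:

```python
import numpy as np

# Toy framing of CI-skip detection as binary classification:
# label 1 = skip CI, 0 = build. Hypothetical features per commit:
# [num_files_changed, lines_changed, doc_only (0/1)]
X = np.array([
    [1,   2, 1], [1,   5, 1], [2,   8, 1], [1,   1, 1],   # doc-only tweaks
    [4, 120, 0], [7, 300, 0], [3,  90, 0], [9, 500, 0],
    [5, 200, 0], [6, 250, 0], [2,  60, 0], [8, 400, 0],
])
y = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0])  # imbalanced: 4 vs 8

# Class weights inversely proportional to class frequency, so the rare
# "skip" class is not drowned out by the majority "build" class.
w = np.where(y == 1, len(y) / (2 * (y == 1).sum()), len(y) / (2 * (y == 0).sum()))

def fit_stump(X, y, w):
    """Depth-1 decision tree minimizing weighted misclassification."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            pred = (X[:, j] <= t).astype(int)   # predict "skip" below threshold
            err = w[pred != y].sum()
            if best is None or err < best[0]:
                best = (err, j, t)
    return best[1], best[2]

j, t = fit_stump(X, y, w)
predict = lambda x: int(x[j] <= t)
print(j, t, predict([1, 3, 1]), predict([6, 240, 0]))
```

On this toy data the stump separates small documentation commits (skip) from large code commits (build); a real system would need richer features and the stronger classifier the paper constructs.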
Updated: 2024-05-15 18:48:57
Subjects: cs.SE,cs.AI
Fairness Without Demographics in Human-Centered Federated Learning
Federated learning (FL) enables collaborative model training while preserving data privacy, making it suitable for decentralized human-centered AI applications. However, a significant research gap remains in ensuring fairness in these systems. Current fairness strategies in FL require knowledge of bias-creating/sensitive attributes, clashing with FL's privacy principles. Moreover, in human-centered datasets, sensitive attributes may remain latent. To tackle these challenges, we present a novel bias mitigation approach inspired by "Fairness without Demographics" in machine learning. The presented approach achieves fairness without needing knowledge of sensitive attributes by minimizing the top eigenvalue of the Hessian matrix during training, ensuring equitable loss landscapes across FL participants. Notably, we introduce a novel FL aggregation scheme that promotes participating models based on error rates and loss landscape curvature attributes, fostering fairness across the FL system. This work represents the first approach to attaining "Fairness without Demographics" in human-centered FL. Through comprehensive evaluation, our approach demonstrates effectiveness in balancing fairness and efficacy across various real-world applications, FL setups, and scenarios involving single and multiple bias-inducing factors, representing a significant advancement in human-centered FL.
Updated: 2024-05-15 18:40:42
Subjects: cs.LG,cs.AI,cs.DC
MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing
Text-guided image editing is widely needed in daily life, ranging from personal use to professional applications such as Photoshop. However, existing methods are either zero-shot or trained on an automatically synthesized dataset, which contains a high volume of noise. Thus, they still require lots of manual tuning to produce desirable outcomes in practice. To address this issue, we introduce MagicBrush (https://osu-nlp-group.github.io/MagicBrush/), the first large-scale, manually annotated dataset for instruction-guided real image editing that covers diverse scenarios: single-turn, multi-turn, mask-provided, and mask-free editing. MagicBrush comprises over 10K manually annotated triplets (source image, instruction, target image), which supports training large-scale text-guided image editing models. We fine-tune InstructPix2Pix on MagicBrush and show that the new model can produce much better images according to human evaluation. We further conduct extensive experiments to evaluate current image editing baselines from multiple dimensions including quantitative, qualitative, and human evaluations. The results reveal the challenging nature of our dataset and the gap between current baselines and real-world editing needs.
Updated: 2024-05-15 18:20:28
Subjects: cs.CV,cs.AI,cs.CL
A Mathematical Theory for Learning Semantic Languages by Abstract Learners
Recent advances in Large Language Models (LLMs) have demonstrated the emergence of capabilities (learned skills) when the number of system parameters and the size of training data surpass certain thresholds. The exact mechanisms behind such phenomena are not fully understood and remain a topic of active research. Inspired by the skill-text bipartite graph model proposed by Arora and Goyal for modeling semantic languages, we develop a mathematical theory to explain the emergence of learned skills, taking the learning (or training) process into account. Our approach models the learning process for skills in the skill-text bipartite graph as an iterative decoding process in Low-Density Parity Check (LDPC) codes and Irregular Repetition Slotted ALOHA (IRSA). Using density evolution analysis, we demonstrate the emergence of learned skills when the ratio of the number of training texts to the number of skills exceeds a certain threshold. Our analysis also yields a scaling law for testing errors relative to this ratio. Upon completion of the training, the association of learned skills can also be acquired to form a skill association graph. We use site percolation analysis to derive the conditions for the existence of a giant component in the skill association graph. Our analysis can also be extended to the setting with a hierarchy of skills, where a fine-tuned model is built upon a foundation model. It is also applicable to the setting with multiple classes of skills and texts. As an important application, we propose a method for semantic compression and discuss its connections to semantic communication.
Updated: 2024-05-15 18:05:54
Subjects: cs.CL,cs.IT,cs.LG,math.IT
Hoaxpedia: A Unified Wikipedia Hoax Articles Dataset
Hoaxes are a recognised form of disinformation created deliberately, with potentially serious implications for the credibility of reference knowledge resources such as Wikipedia. What makes detecting Wikipedia hoaxes hard is that they are often written according to the official style guidelines. In this work, we first provide a systematic analysis of the similarities and discrepancies between legitimate and hoax Wikipedia articles, and introduce Hoaxpedia, a collection of 311 hoax articles (from existing literature as well as official Wikipedia lists) alongside semantically similar real articles. We report results of binary classification experiments in the task of predicting whether a Wikipedia article is real or hoax, and analyze several settings as well as a range of language models. Our results suggest that detecting deceitful content in Wikipedia based on content alone, despite not having been explored much in the past, is a promising direction.
Updated: 2024-05-15 17:56:25
Subjects: cs.CL,cs.AI,cs.LG
Spectral complexity of deep neural networks
It is well-known that randomly initialized, push-forward, fully-connected neural networks weakly converge to isotropic Gaussian processes, in the limit where the width of all layers goes to infinity. In this paper, we propose to use the angular power spectrum of the limiting field to characterize the complexity of the network architecture. In particular, we define sequences of random variables associated with the angular power spectrum, and provide a full characterization of the network complexity in terms of the asymptotic distribution of these sequences as the depth diverges. On this basis, we classify neural networks as low-disorder, sparse, or high-disorder; we show how this classification highlights a number of distinct features for standard activation functions, and in particular, sparsity properties of ReLU networks. Our theoretical results are also validated by numerical simulations.
Updated: 2024-05-15 17:55:05
Subjects: stat.ML,cs.LG,math.PR,68T07, 60G60, 33C55, 62M15
SSUMamba: Spatial-Spectral Selective State Space Model for Hyperspectral Image Denoising
Denoising hyperspectral images (HSIs) is a crucial preprocessing procedure due to the noise originating from intra-imaging mechanisms and environmental factors. Utilizing domain-specific knowledge of HSIs, such as spectral correlation, spatial self-similarity, and spatial-spectral correlation, is essential for deep learning-based denoising. Existing methods are often constrained by running time, space complexity, and computational complexity, employing strategies that explore these priors separately. While these strategies can avoid some redundant information, they inevitably overlook broader and more underlying long-range spatial-spectral information that positively impacts image restoration. This paper proposes a Spatial-Spectral Selective State Space Model-based U-shaped network, termed Spatial-Spectral U-Mamba (SSUMamba), for hyperspectral image denoising. We can obtain complete global spatial-spectral correlation within a module thanks to the linear space complexity in State Space Model (SSM) computations. We introduce a Spatial-Spectral Alternating Scan (SSAS) strategy for HSIs, which helps model the information flow in multiple directions in 3-D HSIs. Experimental results demonstrate that our method outperforms compared methods. The source code is available at https://github.com/lronkitty/SSUMamba.
Updated: 2024-05-15 17:53:48
Subjects: eess.IV,cs.CV,cs.LG
Prospects of Privacy Advantage in Quantum Machine Learning
Ensuring data privacy in machine learning models is critical, particularly in distributed settings where model gradients are typically shared among multiple parties to allow collaborative learning. Motivated by the increasing success of recovering input data from the gradients of classical models, this study addresses a central question: How hard is it to recover the input data from the gradients of quantum machine learning models? Focusing on variational quantum circuits (VQC) as learning models, we uncover the crucial role played by the dynamical Lie algebra (DLA) of the VQC ansatz in determining privacy vulnerabilities. While the DLA has previously been linked to the classical simulatability and trainability of VQC models, this work, for the first time, establishes its connection to the privacy of VQC models. In particular, we show that properties conducive to the trainability of VQCs, such as a polynomial-sized DLA, also facilitate the extraction of detailed snapshots of the input. We term this a weak privacy breach, as the snapshots enable training VQC models for distinct learning tasks without direct access to the original input. Further, we investigate the conditions for a strong privacy breach where the original input data can be recovered from these snapshots by classical or quantum-assisted polynomial time methods. We establish conditions on the encoding map such as classical simulatability, overlap with DLA basis, and its Fourier frequency characteristics that enable such a privacy breach of VQC models. Our findings thus play a crucial role in detailing the prospects of quantum privacy advantage by guiding the requirements for designing quantum machine learning models that balance trainability with robust privacy protection.
Updated: 2024-05-15 17:46:34
Subjects: quant-ph,cs.LG
Wasserstein Gradient Boosting: A General Framework with Applications to Posterior Regression
Gradient boosting is a sequential ensemble method that fits a new base learner to the gradient of the remaining loss at each step. We propose a novel family of gradient boosting, Wasserstein gradient boosting, which fits a new base learner to an exactly or approximately available Wasserstein gradient of a loss functional on the space of probability distributions. Wasserstein gradient boosting returns a set of particles that approximates a target probability distribution assigned at each input. In probabilistic prediction, a parametric probability distribution is often specified on the space of output variables, and a point estimate of the output-distribution parameter is produced for each input by a model. Our main application of Wasserstein gradient boosting is a novel distributional estimate of the output-distribution parameter, which approximates the posterior distribution over the output-distribution parameter determined pointwise at each data point. We empirically demonstrate the superior performance of the probabilistic prediction by Wasserstein gradient boosting in comparison with various existing methods.
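For intuition, classical gradient boosting — the Euclidean special case that Wasserstein gradient boosting generalizes — fits each base learner to the negative functional gradient of the loss, which for squared error is simply the residuals. A minimal numpy sketch with stump learners (an illustration of the classical recipe, not the paper's Wasserstein method):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy regression data: a noisy sine wave.
X = np.linspace(0, 1, 200)
y = np.sin(2 * np.pi * X) + 0.1 * rng.normal(size=X.size)

def fit_stump(x, r):
    """Best single-threshold split minimizing squared error against residuals r."""
    best = None
    for t in x[1:-1]:
        left, right = r[x <= t], r[x > t]
        sse = ((left - left.mean())**2).sum() + ((right - right.mean())**2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, lv, rv = best
    return lambda q: np.where(q <= t, lv, rv)

def boost(x, y, rounds=50, lr=0.3):
    F = np.zeros_like(y)
    stumps = []
    for _ in range(rounds):
        r = y - F                 # negative gradient of 0.5 * (y - F)^2
        h = fit_stump(x, r)       # fit new base learner to the gradient
        stumps.append(h)
        F += lr * h(x)
    return lambda q: lr * sum(h(q) for h in stumps)

model = boost(X, y)
mse = np.mean((model(X) - y) ** 2)
print(mse)   # should fall well below the variance of y
```

Wasserstein gradient boosting replaces the Euclidean residual step with a Wasserstein gradient of a loss functional over probability distributions, returning particle sets rather than point predictions.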
Updated: 2024-05-15 17:45:59
Subjects: stat.ME,cs.LG,stat.ML
Restoring balance: principled under/oversampling of data for optimal classification
Class imbalance in real-world data poses a common bottleneck for machine learning tasks, since achieving good generalization on under-represented examples is often challenging. Mitigation strategies, such as under- or oversampling the data depending on their abundances, are routinely proposed and tested empirically, but how they should adapt to the data statistics remains poorly understood. In this work, we determine exact analytical expressions of the generalization curves in the high-dimensional regime for linear classifiers (Support Vector Machines). We also provide a sharp prediction of the effects of under/oversampling strategies depending on class imbalance, the first and second moments of the data, and the metrics of performance considered. We show that mixed strategies involving under- and oversampling of data lead to performance improvement. Through numerical experiments, we show the relevance of our theoretical predictions on real datasets, on deeper architectures, and with sampling strategies based on unsupervised probabilistic models.
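A minimal version of such a mixed strategy — undersample the majority class and oversample the minority class toward a common target count — might look like the sketch below. This is an illustrative recipe with an arbitrary "meet in the middle" target; the paper derives the optimal mix analytically from the data statistics:

```python
import numpy as np

rng = np.random.default_rng(0)

def rebalance(X, y, rng):
    """Mixed under/oversampling to equal class counts (toy recipe)."""
    idx_min = np.flatnonzero(y == 1)
    idx_maj = np.flatnonzero(y == 0)
    n = (len(idx_min) + len(idx_maj)) // 2          # meet in the middle
    keep_maj = rng.choice(idx_maj, size=n, replace=False)   # undersample majority
    boost_min = rng.choice(idx_min, size=n, replace=True)   # oversample minority
    idx = np.concatenate([keep_maj, boost_min])
    return X[idx], y[idx]

X = rng.normal(size=(100, 3))
y = np.array([1] * 10 + [0] * 90)                   # 10% minority class
Xb, yb = rebalance(X, y, rng)
print(len(yb), (yb == 1).sum(), (yb == 0).sum())    # 100 50 50
```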
Updated: 2024-05-15 17:45:34
Subjects: cond-mat.dis-nn,cs.LG
Energy-Efficient Sleep Mode Optimization of 5G mmWave Networks Using Deep Contextual MAB
Millimeter-wave (mmWave) networks, integral to 5G communication, offer a vast spectrum that addresses the issue of spectrum scarcity and enhances peak rate and capacity. However, their dense deployment, necessary to counteract propagation losses, leads to high power consumption. An effective strategy to reduce this energy consumption in mobile networks is the sleep mode optimization (SMO) of base stations (BSs). In this paper, we propose a novel SMO approach for mmWave BSs in a 3D urban environment. This approach, which incorporates a neural network (NN) based contextual multi-armed bandit (C-MAB) with an epsilon decay algorithm, accommodates the dynamic and diverse traffic of user equipment (UE) by clustering the UEs in their respective tracking areas (TAs). Our strategy includes beamforming, which helps reduce energy consumption from the UE side, while SMO minimizes energy use from the BS perspective. We extended our investigation to include Random, Epsilon Greedy, Upper Confidence Bound (UCB), and Load Based sleep mode (SM) strategies. We compared the performance of our proposed C-MAB based SM algorithm with those of All On and other alternative approaches. Simulation results show that our proposed method outperforms all other SM strategies in terms of the $10^{th}$ percentile of user rate and average throughput while demonstrating comparable average throughput to the All On approach. Importantly, it outperforms all approaches in terms of energy efficiency (EE).
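The epsilon-decay exploration schedule at the heart of the C-MAB can be sketched with a plain (non-contextual) epsilon-greedy bandit. The arms and reward means below are toy stand-ins for sleep-mode configurations, not the paper's network model:

```python
import random

def run_bandit(true_means, steps=5000, eps0=1.0, decay=0.999, seed=0):
    """Epsilon-greedy bandit with multiplicative epsilon decay."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    values = [0.0] * n_arms          # running mean reward per arm
    eps = eps0
    for _ in range(steps):
        if rng.random() < eps:
            arm = rng.randrange(n_arms)                        # explore
        else:
            arm = max(range(n_arms), key=values.__getitem__)   # exploit
        reward = true_means[arm] + rng.gauss(0, 0.1)
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]    # incremental mean
        eps *= decay                                           # epsilon decay
    return counts, values

counts, values = run_bandit([0.2, 0.5, 0.8])
print(counts.index(max(counts)))   # the best arm should dominate
```

The paper's approach additionally conditions the arm choice on context (clustered UE tracking areas) through a neural network; the decay schedule plays the same role of shifting from exploration to exploitation over time.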
Updated: 2024-05-15 17:37:28
Subjects: eess.SP,cs.AI
Improved classical shadows from local symmetries in the Schur basis
We study the sample complexity of the classical shadows task: what is the fewest number of copies of an unknown state you need to measure to predict expected values with respect to some class of observables? Large joint measurements are likely required in order to minimize sample complexity, but previous joint measurement protocols only work when the unknown state is pure. We present the first joint measurement protocol for classical shadows whose sample complexity scales with the rank of the unknown state. In particular we prove $\mathcal O(\sqrt{rB}/\epsilon^2)$ samples suffice, where $r$ is the rank of the state, $B$ is a bound on the squared Frobenius norm of the observables, and $\epsilon$ is the target accuracy. In the low-rank regime, this is a nearly quadratic advantage over traditional approaches that use single-copy measurements. We present several intermediate results that may be of independent interest: a solution to a new formulation of classical shadows that captures functions of non-identical input states; a generalization of a ``nice'' Schur basis used for optimal qubit purification and quantum majority vote; and a measurement strategy that allows us to use local symmetries in the Schur basis to avoid intractable Weingarten calculations in the analysis.
Updated: 2024-05-15 17:33:10
Subjects: quant-ph,cs.DS,cs.IT,cs.LG,math.IT
Comparative Analysis of Predicting Subsequent Steps in Hénon Map
This paper explores the prediction of subsequent steps in the Hénon map using various machine learning techniques. The Hénon map, well known for its chaotic behaviour, finds applications in various fields including cryptography, image encryption, and pattern recognition. Machine learning methods, particularly deep learning, are increasingly essential for understanding and predicting chaotic phenomena. This study evaluates the performance of different machine learning models, including Random Forest, Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM) networks, Support Vector Machines (SVM), and Feed Forward Neural Networks (FNN), in predicting the evolution of the Hénon map. Results indicate that the LSTM network demonstrates superior predictive accuracy, particularly in extreme event prediction. Furthermore, a comparison between LSTM and FNN models reveals the LSTM's advantage, especially for longer prediction horizons and larger datasets. This research underscores the significance of machine learning in elucidating chaotic dynamics and highlights the importance of model selection and dataset size in forecasting subsequent steps in chaotic systems.
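Because the Hénon map x_{t+1} = 1 - a*x_t^2 + y_t, y_{t+1} = b*x_t is exactly linear in the features (x_t, x_t^2, y_t), a plain least-squares fit recovers the one-step map almost perfectly from trajectory data — a useful sanity-check baseline (illustrative only; it is not one of the models compared in the paper):

```python
import numpy as np

def henon(n, a=1.4, b=0.3, x0=0.1, y0=0.1):
    """Generate n points of the classic Hénon map."""
    xs, ys = [x0], [y0]
    for _ in range(n - 1):
        x, y = xs[-1], ys[-1]
        xs.append(1 - a * x**2 + y)
        ys.append(b * x)
    return np.array(xs), np.array(ys)

xs, ys = henon(1000)
# Features: (x_t, x_t^2, y_t, 1); target: x_{t+1}. The map is exactly
# linear in these features, so least squares recovers it to precision.
A = np.column_stack([xs[:-1], xs[:-1]**2, ys[:-1], np.ones(len(xs) - 1)])
coef, *_ = np.linalg.lstsq(A, xs[1:], rcond=None)
pred = A @ coef
print(np.max(np.abs(pred - xs[1:])))   # near machine precision
print(np.round(coef, 3))               # ≈ [0, -1.4, 1, 1]
```

The hard problem the paper studies is multi-step forecasting, where tiny one-step errors compound exponentially along the chaotic orbit — which is where model choice (e.g. LSTM vs FNN) starts to matter.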
Updated: 2024-05-15 17:32:31
Subjects: cs.LG,nlin.CD
ContourCraft: Learning to Resolve Intersections in Neural Multi-Garment Simulations
Learning-based approaches to cloth simulation have started to show their potential in recent years. However, handling collisions and intersections in neural simulations remains a largely unsolved problem. In this work, we present ContourCraft, a learning-based solution for handling intersections in neural cloth simulations. Unlike conventional approaches that critically rely on intersection-free inputs, ContourCraft robustly recovers from intersections introduced through missed collisions, self-penetrating bodies, or errors in manually designed multi-layer outfits. The technical core of ContourCraft is a novel intersection contour loss that penalizes interpenetrations and encourages rapid resolution thereof. We integrate our intersection loss with a collision-avoiding repulsion objective into a neural cloth simulation method based on graph neural networks (GNNs). We demonstrate our method's ability across a challenging set of diverse multi-layer outfits under dynamic human motions. Our extensive analysis indicates that ContourCraft significantly improves collision handling for learned simulation and produces visually compelling results.
Updated: 2024-05-15 17:25:59
Subjects: cs.GR,cs.LG
Towards a fully declarative neuro-symbolic language
Neuro-symbolic systems (NeSy), which claim to combine the best of both learning and reasoning capabilities of artificial intelligence, are missing a core property of reasoning systems: Declarativeness. The lack of declarativeness is caused by the functional nature of neural predicates inherited from neural networks. We propose and implement a general framework for fully declarative neural predicates, which hence extends to fully declarative NeSy frameworks. We first show that the declarative extension preserves the learning and reasoning capabilities while being able to answer arbitrary queries while only being trained on a single query type.
Updated: 2024-05-15 17:24:34
Subjects: cs.AI
Elements of World Knowledge (EWOK): A cognition-inspired framework for evaluating basic world knowledge in language models
The ability to build and leverage world models is essential for a general-purpose AI agent. Testing such capabilities is hard, in part because the building blocks of world models are ill-defined. We present Elements of World Knowledge (EWOK), a framework for evaluating world modeling in language models by testing their ability to use knowledge of a concept to match a target text with a plausible/implausible context. EWOK targets specific concepts from multiple knowledge domains known to be vital for world modeling in humans. Domains range from social interactions (help/hinder) to spatial relations (left/right). Both contexts and targets are minimal pairs. Objects, agents, and locations in the items can be flexibly filled in, enabling easy generation of multiple controlled datasets. We then introduce EWOK-CORE-1.0, a dataset of 4,374 items covering 11 world knowledge domains. We evaluate 20 open-weights large language models (1.3B--70B parameters) across a battery of evaluation paradigms, along with a human norming study comprising 12,480 measurements. The overall performance of all tested models is worse than human performance, with results varying drastically across domains. These data highlight simple cases where even large models fail and present rich avenues for targeted research on LLM world modeling capabilities.
Updated: 2024-05-15 17:19:42
Subjects: cs.CL,cs.AI,cs.LG
Generalization Bounds for Causal Regression: Insights, Guarantees and Sensitivity Analysis
Many algorithms have been recently proposed for causal machine learning. Yet, there is little to no theory on their quality, especially considering finite samples. In this work, we propose a theory based on generalization bounds that provides such guarantees. By introducing a novel change-of-measure inequality, we are able to tightly bound the model loss in terms of the deviation of the treatment propensities over the population, which we show can be empirically limited. Our theory is fully rigorous and holds even in the face of hidden confounding and violations of positivity. We demonstrate our bounds on semi-synthetic and real data, showcasing their remarkable tightness and practical utility.
Updated: 2024-05-15 17:17:27
Subjects: stat.ML,cs.LG
A Reinforcement Learning Approach to Dairy Farm Battery Management using Q Learning
Dairy farming consumes a significant amount of energy, making it an energy-intensive sector within agriculture. Integrating renewable energy generation into dairy farming could help address this challenge, and effective battery management is important for that integration. Managing battery charging and discharging poses significant challenges because of fluctuations in electrical consumption, the intermittent nature of renewable generation, and fluctuations in energy prices. Artificial Intelligence (AI) has the potential to significantly improve the use of renewable energy in dairy farming; however, there is limited research in this particular domain. This research considers Ireland as a case study, as the country works towards its 2030 energy strategy centered on the utilization of renewable sources. This study proposes a Q-learning-based algorithm for scheduling battery charging and discharging in a dairy farm setting, and explores the effect of the proposed algorithm by adding wind generation data and considering additional case studies. The proposed algorithm reduces the cost of electricity imported from the grid by 13.41% and peak demand by 2%, with the cost reduction reaching 24.49% when wind generation is utilized. These results underline how effective reinforcement learning can be at managing batteries in the dairy farming sector.
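The scheduling idea above can be sketched with tabular Q-learning. Everything in this snippet is an illustrative assumption, not the paper's environment: the state is (hour, battery state-of-charge), the tariff is a toy two-tier price, and demand is a constant.

```python
# Minimal tabular Q-learning sketch for battery charge/discharge scheduling.
# State, actions, tariff, and demand are illustrative assumptions only.
import random

ACTIONS = ["charge", "discharge", "idle"]

def simulate_step(hour, soc, action, price):
    """Toy environment: reward is the negative cost of grid import."""
    demand = 1.0  # kWh consumed this hour (illustrative constant)
    if action == "charge" and soc < 4:
        soc += 1
        grid = demand + 1.0             # buy extra energy to charge
    elif action == "discharge" and soc > 0:
        soc -= 1
        grid = max(demand - 1.0, 0.0)   # battery offsets demand
    else:
        grid = demand
    return soc, -price[hour] * grid

def train(episodes=500, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    rng = random.Random(seed)
    price = [0.10 if h < 17 else 0.30 for h in range(24)]  # peak tariff after 17:00
    q = {}  # (hour, soc) -> list of action-values
    for _ in range(episodes):
        soc = 2
        for hour in range(24):
            key = (hour, soc)
            q.setdefault(key, [0.0] * len(ACTIONS))
            if rng.random() < eps:                      # epsilon-greedy exploration
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[key][i])
            soc2, r = simulate_step(hour, soc, ACTIONS[a], price)
            nxt = max(q.get((hour + 1, soc2), [0.0] * len(ACTIONS))) if hour < 23 else 0.0
            q[key][a] += alpha * (r + gamma * nxt - q[key][a])  # TD update
            soc = soc2
    return q

q = train()
```

After training, the greedy policy derived from `q` tends to charge during cheap off-peak hours and discharge at peak, which is the qualitative behavior the abstract's algorithm exploits.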
Updated: 2024-05-15 17:11:35
Subjects: cs.LG,cs.AI
Tackling Distribution Shifts in Task-Oriented Communication with Information Bottleneck
Task-oriented communication aims to extract and transmit task-relevant information to significantly reduce the communication overhead and transmission latency. However, unpredictable distribution shifts between training and test data, including domain shift and semantic shift, can dramatically undermine system performance. To tackle these challenges, it is crucial to ensure that the encoded features can generalize to domain-shifted data and detect semantic-shifted data, while remaining compact for transmission. In this paper, we propose a novel approach based on the information bottleneck (IB) principle and the invariant risk minimization (IRM) framework. The proposed method aims to extract compact and informative features that enable effective domain-shift generalization and accurate semantic-shift detection without any knowledge of the test data during training. Specifically, we propose an invariant feature encoding approach based on the IB principle and IRM framework for domain-shift generalization, which aims to find the causal relationship between the input data and the task result by minimizing the complexity and domain dependence of the encoded feature. Furthermore, we enhance the task-oriented communication with a label-dependent feature encoding approach for semantic-shift detection, which achieves joint gains in IB optimization and detection performance. To avoid the intractable computation of the IB-based objective, we leverage a variational approximation to derive a tractable upper bound for optimization. Extensive simulation results on image classification tasks demonstrate that the proposed scheme outperforms state-of-the-art approaches and achieves a better rate-distortion tradeoff.
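For reference, the standard variational route to a tractable IB objective (the deep variational IB form of Alemi et al.; the paper's exact bound may differ) reads:

```latex
% IB Lagrangian: compress away task-irrelevant information I(X;Z) while
% preserving task-relevant information I(Z;Y), traded off by \beta.
% For any encoder p(z \mid x), variational decoder q(y \mid z), and
% variational prior r(z), with H(Y) a constant in the encoder:
\beta\, I(X;Z) - I(Z;Y)
\;\le\;
\mathbb{E}_{p(x,y)\,p(z \mid x)}\!\big[-\log q(y \mid z)\big]
+ \beta\, \mathbb{E}_{p(x)}\!\big[\mathrm{KL}\big(p(z \mid x)\,\|\,r(z)\big)\big]
- H(Y)
```

Minimizing the right-hand side over the encoder, decoder, and prior is what makes the otherwise intractable mutual-information terms optimizable with standard gradient methods.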
Updated: 2024-05-15 17:07:55
Subjects: eess.SP,cs.IT,cs.LG,math.IT
Modeling Bilingual Sentence Processing: Evaluating RNN and Transformer Architectures for Cross-Language Structural Priming
This study evaluates the performance of Recurrent Neural Network (RNN) and Transformer architectures in replicating cross-language structural priming, a key indicator of abstract grammatical representations in human language processing. Focusing on Chinese-English priming, which involves two typologically distinct languages, we examine how these models handle the robust phenomenon of structural priming, where exposure to a particular sentence structure increases the likelihood of selecting a similar structure subsequently. Additionally, we utilize large language models (LLMs) to measure the cross-lingual structural priming effect. Our findings indicate that Transformers outperform RNNs in generating primed sentence structures, challenging the conventional belief that human sentence processing primarily involves recurrent and immediate processing, and suggesting a role for cue-based retrieval mechanisms. Overall, this work contributes to our understanding of how computational models may reflect human cognitive processes in multilingual contexts.
Updated: 2024-05-15 17:01:02
Subjects: cs.CL,cs.LG
QueryNER: Segmentation of E-commerce Queries
We present QueryNER, a manually-annotated dataset and accompanying model for e-commerce query segmentation. Prior work in sequence labeling for e-commerce has largely addressed aspect-value extraction which focuses on extracting portions of a product title or query for narrowly defined aspects. Our work instead focuses on the goal of dividing a query into meaningful chunks with broadly applicable types. We report baseline tagging results and conduct experiments comparing token and entity dropping for null and low recall query recovery. Challenging test sets are created using automatic transformations and show how simple data augmentation techniques can make the models more robust to noise. We make the QueryNER dataset publicly available.
Updated: 2024-05-15 16:58:35
Subjects: cs.CL,cs.AI
Importance of realism in procedurally-generated synthetic images for deep learning: case studies in maize and canola
Artificial neural networks are often used to identify features of crop plants. However, training their models requires many annotated images, which can be expensive and time-consuming to acquire. Procedural models of plants, such as those developed with Lindenmayer systems (L-systems), can produce visually realistic simulations, and hence images of plant simulations where annotations are implicitly known. These synthetic images can either augment or completely replace real images when training neural networks for phenotyping tasks. In this paper, we systematically vary the amounts of real and synthetic images used for training in both maize and canola to better understand when synthetic images generated from L-systems can help prediction on real images. This work also explores the degree to which realism in the synthetic images improves prediction. We built five variants of a procedural canola model by tuning their realism during calibration, and the deep learning results show how drastically prediction improves as the synthetic canola images are made more realistic. Furthermore, we show how neural network predictions can be used to help calibrate the L-systems themselves, creating a feedback loop.
Updated: 2024-05-15 16:55:03
Subjects: cs.CV,cs.LG
ParaNames 1.0: Creating an Entity Name Corpus for 400+ Languages using Wikidata
We introduce ParaNames, a massively multilingual parallel name resource consisting of 140 million names spanning over 400 languages. Names are provided for 16.8 million entities, and each entity is mapped from a complex type hierarchy to a standard type (PER/LOC/ORG). Using Wikidata as a source, we create the largest resource of this type to date. We describe our approach to filtering and standardizing the data to provide the best quality possible. ParaNames is useful for multilingual language processing, both in defining tasks for name translation/transliteration and as supplementary data for tasks such as named entity recognition and linking. We demonstrate the usefulness of ParaNames on two tasks. First, we perform canonical name translation between English and 17 other languages. Second, we use it as a gazetteer for multilingual named entity recognition, obtaining performance improvements on all 10 languages evaluated.
Updated: 2024-05-15 16:44:54
Subjects: cs.CL,cs.AI
Constrained Learning for Causal Inference and Semiparametric Statistics
Causal estimation (e.g. of the average treatment effect) requires estimating complex nuisance parameters (e.g. outcome models). To adjust for errors in nuisance parameter estimation, we present a novel correction method that solves for the best plug-in estimator under the constraint that the first-order error of the estimator with respect to the nuisance parameter estimate is zero. Our constrained learning framework provides a unifying perspective to prominent first-order correction approaches including debiasing (a.k.a. augmented inverse probability weighting) and targeting (a.k.a. targeted maximum likelihood estimation). Our semiparametric inference approach, which we call the "C-Learner", can be implemented with modern machine learning methods such as neural networks and tree ensembles, and enjoys standard guarantees like semiparametric efficiency and double robustness. Empirically, we demonstrate our approach on several datasets, including those with text features that require fine-tuning language models. We observe the C-Learner matches or outperforms other asymptotically optimal estimators, with better performance in settings with less estimated overlap.
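The debiasing baseline the abstract unifies, augmented inverse probability weighting (AIPW), can be sketched concretely. This is the textbook AIPW estimator, not the C-Learner itself; the data-generating process is synthetic, and the nuisances are oracles purely to illustrate the formula (in practice they would be cross-fitted machine learning estimates).

```python
# Standard AIPW (debiased) estimator of the average treatment effect:
#   ATE_hat = mean[ (mu1 - mu0) + T(Y - mu1)/e - (1 - T)(Y - mu0)/(1 - e) ]
# Synthetic data with known true ATE = 2.0; oracle nuisances (assumption).
import math
import random

def aipw_ate(t, y, mu0, mu1, e):
    n = len(y)
    total = 0.0
    for i in range(n):
        total += (mu1[i] - mu0[i]                      # outcome-model term
                  + t[i] * (y[i] - mu1[i]) / e[i]      # treated correction
                  - (1 - t[i]) * (y[i] - mu0[i]) / (1 - e[i]))  # control correction
    return total / n

rng = random.Random(0)
n = 20000
x = [rng.gauss(0, 1) for _ in range(n)]
e = [1 / (1 + math.exp(-xi)) for xi in x]               # propensity depends on x
t = [1 if rng.random() < ei else 0 for ei in e]
y = [2.0 * ti + xi + rng.gauss(0, 1) for ti, xi in zip(t, x)]  # true ATE = 2.0

mu0 = x                               # oracle E[Y | X, T=0]
mu1 = [xi + 2.0 for xi in x]          # oracle E[Y | X, T=1]
ate = aipw_ate(t, y, mu0, mu1, e)
```

The correction terms make the estimate first-order insensitive to nuisance error, which is exactly the zero-first-order-error constraint the C-Learner builds into the plug-in estimator instead.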
Updated: 2024-05-15 16:38:28
Subjects: stat.ML,cs.LG
MGSER-SAM: Memory-Guided Soft Experience Replay with Sharpness-Aware Optimization for Enhanced Continual Learning
Deep neural networks suffer from the catastrophic forgetting problem in the field of continual learning (CL). To address this challenge, we propose MGSER-SAM, a novel memory replay-based algorithm specifically engineered to enhance the generalization capabilities of CL models. We first integrate the SAM optimizer, a component designed to optimize for flatness, which fits seamlessly into well-known experience replay frameworks such as ER and DER++. Then, MGSER-SAM distinctively addresses the complex challenge of reconciling conflicts in weight perturbation directions between ongoing tasks and previously stored memories, which is underexplored with the SAM optimizer. This is accomplished by the strategic integration of soft logits and the alignment of memory gradient directions, where the regularization terms facilitate the concurrent minimization of the various training loss terms integral to the CL process. Through rigorous experimental analysis conducted across multiple benchmarks, MGSER-SAM demonstrates a consistent ability to outperform existing baselines in all three CL scenarios. Compared to the representative memory replay-based baselines ER and DER++, MGSER-SAM not only improves testing accuracy by $24.4\%$ and $17.6\%$ respectively, but also achieves the lowest forgetting on each benchmark.
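The flatness-seeking SAM component the method builds on can be illustrated with one step in one dimension. The loss, learning rate, and perturbation radius `rho` here are illustrative, not the paper's settings.

```python
# Toy sketch of one Sharpness-Aware Minimization (SAM) step in 1-D:
# ascend to the worst-case weights within radius rho, then descend using
# the gradient taken at that perturbed point.
def grad(f, w, h=1e-6):
    # central finite difference; adequate for a 1-D sketch
    return (f(w + h) - f(w - h)) / (2 * h)

def sam_step(f, w, lr=0.05, rho=0.4):
    g = grad(f, w)
    eps = rho if g >= 0 else -rho   # rho * g / |g| reduces to sign(g) in 1-D
    g_adv = grad(f, w + eps)        # gradient at the adversarially perturbed weights
    return w - lr * g_adv           # sharpness-aware descent step

loss = lambda w: w * w
w1 = sam_step(loss, 1.0)            # plain SGD from w=1 would give 1 - 0.05*2 = 0.90
```

Because the descent direction is evaluated at `w + eps`, sharp minima (where the perturbed gradient blows up) are penalized relative to flat ones; MGSER-SAM layers memory-gradient alignment on top of this step.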
Updated: 2024-05-15 16:37:09
Subjects: cs.LG
Automatic Programming: Large Language Models and Beyond
Automatic programming has seen increasing popularity due to the emergence of tools like GitHub Copilot which rely on Large Language Models (LLMs). At the same time, automatically generated code faces challenges during deployment due to concerns around quality and trust. In this article, we study automated coding in a general sense, examining concerns around code quality, security, and the related issue of programmer responsibility. These are key issues for organizations deciding whether to use automatically generated code. We discuss how advances in software engineering, such as program repair and analysis, can enable automatic programming. We conclude with a forward-looking view, focusing on the programming environment of the near future, where programmers may need to switch to different roles to fully utilize the power of automatic programming. Automated repair of automatically generated programs from LLMs can help produce higher-assurance code, along with evidence of that assurance.
Updated: 2024-05-15 16:33:57
Subjects: cs.SE,cs.AI,cs.LG
MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models
Recent advances in text-to-music generation models have opened new avenues in musical creativity. However, music generation usually involves iterative refinements, and how to edit the generated music remains a significant challenge. This paper introduces a novel approach to the editing of music generated by such models, enabling the modification of specific attributes, such as genre, mood and instrument, while maintaining other aspects unchanged. Our method transforms text editing to \textit{latent space manipulation} while adding an extra constraint to enforce consistency. It seamlessly integrates with existing pretrained text-to-music diffusion models without requiring additional training. Experimental results demonstrate superior performance over both zero-shot and certain supervised baselines in style and timbre transfer evaluations. Additionally, we showcase the practical applicability of our approach in real-world music editing scenarios.
Updated: 2024-05-15 16:30:23
Subjects: cs.SD,cs.AI,cs.MM,eess.AS
DemOpts: Fairness corrections in COVID-19 case prediction models
COVID-19 forecasting models have been used to inform decision making around resource allocation and intervention decisions, e.g., hospital beds or stay-at-home orders. State-of-the-art deep learning models often use multimodal data, such as mobility or socio-demographic data, to enhance COVID-19 case prediction. Nevertheless, related work has revealed under-reporting bias in COVID-19 cases as well as sampling bias in mobility data for certain minority racial and ethnic groups, which could in turn affect the fairness of COVID-19 predictions along race labels. In this paper, we show that state-of-the-art deep learning models output mean prediction errors that differ significantly across racial and ethnic groups, which could in turn support unfair policy decisions. We also propose a novel de-biasing method, DemOpts, to increase the fairness of deep learning based forecasting models trained on potentially biased datasets. Our results show that DemOpts achieves better error parity than other state-of-the-art de-biasing approaches, effectively reducing the differences in mean error distributions across more racial and ethnic groups.
Updated: 2024-05-15 16:22:46
Subjects: cs.LG,cs.CY
Harmonizing Human Insights and AI Precision: Hand in Hand for Advancing Knowledge Graph Task
Knowledge graph embedding (KGE) has caught significant interest for its effectiveness in knowledge graph completion (KGC), specifically link prediction (LP), with recent KGE models cracking the LP benchmarks. Despite the rapidly growing literature, insufficient attention has been paid to the cooperation between humans and AI on KG. However, humans' capability to analyze graphs conceptually may further improve the efficacy of KGE models with semantic information. To this effect, we carefully designed a human-AI team (HAIT) system dubbed KG-HAIT, which harnesses the human insights on KG by leveraging fully human-designed ad-hoc dynamic programming (DP) on KG to produce human insightful feature (HIF) vectors that capture the subgraph structural feature and semantic similarities. By integrating HIF vectors into the training of KGE models, notable improvements are observed across various benchmarks and metrics, accompanied by accelerated model convergence. Our results underscore the effectiveness of human-designed DP in the task of LP, emphasizing the pivotal role of collaboration between humans and AI on KG. We open avenues for further exploration and innovation through KG-HAIT, paving the way towards more effective and insightful KG analysis techniques.
Updated: 2024-05-15 16:16:37
Subjects: cs.LG,cs.AI
Double Machine Learning for Static Panel Models with Fixed Effects
Recent advances in causal inference have seen the development of methods which make use of the predictive power of machine learning algorithms. In this paper, we use double machine learning (DML) (Chernozhukov et al., 2018) to approximate high-dimensional and non-linear nuisance functions of the confounders in order to make inferences about the effects of policy interventions from panel data. We propose new estimators by adapting correlated random effects, within-group, and first-difference estimation for linear models to an extension of Robinson (1988)'s partially linear regression model for static panel data models with individual fixed effects and unspecified non-linear confounder effects. Using Monte Carlo simulations, we compare the relative performance of different machine learning algorithms and find that conventional least squares estimators perform well when the data generating process is mildly non-linear and smooth, but that DML delivers substantial gains in bias reduction when the true effect of the regressors is non-linear and discontinuous. However, inference based on individual learners can be badly biased. Finally, we provide an illustrative example of DML for observational panel data showing the impact of the introduction of the minimum wage on voting behavior in the UK.
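The Robinson-style DML recipe behind these estimators can be sketched end-to-end: cross-fit nuisance estimates of E[D|X] and E[Y|X], residualize, and regress residual on residual. The bin-mean learner below is a deliberately crude stand-in for the machine learning nuisance estimators, and the data-generating process is synthetic with true effect 0.5.

```python
# Cross-fitted DML sketch for the partially linear model y = theta*d + g(x) + u,
# d = m(x) + v, following the residual-on-residual (Robinson) form.
# The bin-mean "learner" and the synthetic DGP are illustrative assumptions.
import math
import random

def fit_bin_regressor(x, y, bins=20):
    """Crude nonparametric learner: mean of y within equal-width bins of x."""
    lo, hi = min(x), max(x)
    sums, cnts = [0.0] * bins, [0] * bins
    def b(xi):
        return min(bins - 1, max(0, int((xi - lo) / (hi - lo + 1e-12) * bins)))
    for xi, yi in zip(x, y):
        sums[b(xi)] += yi
        cnts[b(xi)] += 1
    overall = sum(y) / len(y)
    means = [sums[i] / cnts[i] if cnts[i] else overall for i in range(bins)]
    return lambda xi: means[b(xi)]

def dml_plr(x, d, y, folds=2):
    """Cross-fitted estimate of theta via residual-on-residual regression."""
    n = len(y)
    num = den = 0.0
    for k in range(folds):
        test = [i for i in range(n) if i % folds == k]
        train = [i for i in range(n) if i % folds != k]
        m_hat = fit_bin_regressor([x[i] for i in train], [d[i] for i in train])
        g_hat = fit_bin_regressor([x[i] for i in train], [y[i] for i in train])
        for i in test:
            dv = d[i] - m_hat(x[i])   # residualized treatment
            yv = y[i] - g_hat(x[i])   # residualized outcome
            num += dv * yv
            den += dv * dv
    return num / den

rng = random.Random(1)
n = 8000
x = [rng.uniform(-2.0, 2.0) for _ in range(n)]
d = [math.sin(xi) + rng.gauss(0, 1) for xi in x]                    # nonlinear m(x)
y = [0.5 * di + xi * xi + rng.gauss(0, 1) for di, xi in zip(d, x)]  # true theta = 0.5
theta = dml_plr(x, d, y)
```

Cross-fitting (estimating nuisances on the fold an observation is not in) is what lets flexible learners be plugged in without overfitting bias contaminating the estimate of theta.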
Updated: 2024-05-15 16:15:31
Subjects: econ.EM,cs.LG,stat.ML
AirIMU: Learning Uncertainty Propagation for Inertial Odometry
Inertial odometry (IO) using strap-down inertial measurement units (IMUs) is critical in many robotic applications where precise orientation and position tracking are essential. Prior kinematic motion model-based IO methods often use a simplified linearized IMU noise model and thus usually encounter difficulties in modeling non-deterministic errors arising from environmental disturbances and mechanical defects. In contrast, data-driven IO methods struggle to accurately model the sensor motions, often leading to generalizability and interoperability issues. To address these challenges, we present AirIMU, a hybrid approach to estimate the uncertainty, especially the non-deterministic errors, by data-driven methods and increase the generalization abilities using model-based methods. We demonstrate the adaptability of AirIMU using a full spectrum of IMUs, from low-cost automotive grades to high-end navigation grades. We also validate its effectiveness on various platforms, including hand-held devices, vehicles, and a helicopter that covers a trajectory of 262 kilometers. In the ablation study, we validate the effectiveness of our learned uncertainty in an IMU-GPS pose graph optimization experiment, achieving a 31.6\% improvement in accuracy. Experiments demonstrate that jointly training the IMU noise correction and uncertainty estimation synergistically benefits both tasks.
Updated: 2024-05-15 16:14:00
Subjects: cs.RO,cs.AI
Towards Evaluating the Robustness of Automatic Speech Recognition Systems via Audio Style Transfer
In light of the widespread application of Automatic Speech Recognition (ASR) systems, their security concerns have received much more attention than ever before, primarily due to the susceptibility of Deep Neural Networks. Previous studies have illustrated that surreptitiously crafted adversarial perturbations enable the manipulation of speech recognition systems, resulting in the production of malicious commands. These attack methods mostly require adding noise perturbations under $\ell_p$ norm constraints, inevitably leaving behind artifacts of manual modifications. Recent research has alleviated this limitation by manipulating style vectors to synthesize adversarial examples based on Text-to-Speech (TTS) synthesis audio. However, style modifications based on optimization objectives significantly reduce the controllability and editability of audio styles. In this paper, we propose an attack on ASR systems based on user-customized style transfer. We first test the effect of the Style Transfer Attack (STA), which combines style transfer and adversarial attack in sequential order. Then, as an improvement, we propose an iterative Style Code Attack (SCA) to maintain audio quality. Experimental results show that our method can meet the need for user-customized styles and achieve an attack success rate of 82%, while preserving sound naturalness, as confirmed by our user study.
Updated: 2024-05-15 16:05:24
Subjects: cs.SD,cs.CR,cs.LG,eess.AS
Flashback: Enhancing Proposer-Builder Design with Future-Block Auctions in Proof-of-Stake Ethereum
Maximal extractable value (MEV), in which block proposers unethically gain profits by manipulating the order in which transactions are included within a block, is a key challenge facing blockchains such as Ethereum today. Left unchecked, MEV can lead to a centralization of stake distribution, ultimately compromising the security of blockchain consensus. To preserve proposer decentralization (and hence security) of the blockchain, Ethereum has advocated for proposer-builder separation (PBS), in which the functionality of transaction ordering is separated from proposers and assigned to separate entities called builders. Builders accept transaction bundles from searchers, who compete to find the most profitable bundles. Builders then bid completed blocks to proposers, who accept the most profitable blocks for publication. The auction mechanisms used between searchers, builders, and proposers are crucial to the overall health of the blockchain. In this paper, we consider PBS design in Ethereum as a game between searchers, builders, and proposers. A key novelty in our design is the inclusion of future block proposers within the game model, since all proposers of an epoch are decided ahead of time in proof-of-stake (PoS) Ethereum. Our analysis shows the existence of alternative auction mechanisms that result in a better (more profitable) equilibrium for players compared to the state of the art. Experimental evaluations based on synthetic and real-world data traces corroborate the analysis. Our results highlight that a rethinking of auction mechanism designs is necessary in PoS Ethereum to prevent disruption.
Updated: 2024-05-15 15:58:21
Subjects: cs.CR
SWAT: A System-Wide Approach to Tunable Leakage Mitigation in Encrypted Data Stores
Numerous studies have underscored the significant privacy risks associated with various leakage patterns in encrypted data stores. While many solutions have been proposed to mitigate these leakages, they either (1) incur substantial overheads, (2) focus on specific subsets of leakage patterns, or (3) apply the same security notion across various workloads, thereby impeding the attainment of fine-tuned privacy-efficiency trade-offs. In light of various detrimental leakage patterns, this paper starts with an investigation into which specific leakage patterns require our focus in the contexts of key-value, range-query, and dynamic workloads, respectively. Subsequently, we introduce new security notions tailored to the specific privacy requirements of these workloads. Accordingly, we propose and instantiate SWAT, an efficient construction that progressively enables these workloads, while provably mitigating system-wide leakage via a suite of algorithms with tunable privacy-efficiency trade-offs. We conducted extensive experiments and compiled a detailed result analysis, showing the efficiency of our solution. SWAT is about an order of magnitude slower than an encryption-only data store that reveals various leakage patterns, and two orders of magnitude faster than a trivial zero-leakage solution. Meanwhile, the performance of SWAT remains highly competitive compared to other designs that mitigate specific types of leakage.
Updated: 2024-05-15 15:55:13
Field: cs.CR
Fourier Boundary Features Network with Wider Catchers for Glass Segmentation
Glass largely blurs the boundary between the real world and its reflection. Its distinctive transmittance and reflectance properties confound semantic tasks in machine vision. How to delineate the boundary created by glass, and how to avoid over-capturing features as false-positive information in deep structures, therefore matter for constraining the segmentation of reflective surfaces and penetrating glass. We propose the Fourier Boundary Features Network with Wider Catchers (FBWC), which may be the first attempt to use sufficiently wide, horizontal, shallow branches, without vertical deepening, to guide fine-grained segmentation boundaries through primary glass semantic information. Specifically, we design Wider Coarse-Catchers (WCC) to anchor large-area segmentation and reduce excessive extraction from a structural perspective. We embed fine-grained features via Cross Transpose Attention (CTA), which is introduced to avoid incomplete areas within the boundary caused by reflection noise. To excavate glass features and balance high- and low-layer context, we propose a learnable Fourier Convolution Controller (FCC) that robustly regulates information integration. The proposed method has been validated on three public glass segmentation datasets. Experimental results show that it yields better segmentation performance than state-of-the-art (SOTA) methods in glass image segmentation.
Updated: 2024-05-15 15:52:27
Field: cs.CV,cs.AI
Kuramoto Oscillators and Swarms on Manifolds for Geometry Informed Machine Learning
We propose the idea of using Kuramoto models (including their higher-dimensional generalizations) for machine learning over non-Euclidean data sets. These models are systems of matrix ODEs describing collective motions (swarming dynamics) of abstract particles (generalized oscillators) on spheres, homogeneous spaces and Lie groups. Such models have been studied extensively since the beginning of the 21st century in both statistical physics and control theory. They provide a suitable framework for encoding maps between various manifolds and are capable of learning over spherical and hyperbolic geometries. In addition, they can learn coupled actions of transformation groups (such as special orthogonal, unitary and Lorentz groups). Furthermore, we overview families of probability distributions that provide appropriate statistical models for probabilistic modeling and inference in Geometric Deep Learning. We argue in favor of using the statistical models that arise from different Kuramoto models in the continuum limit of particles. The most convenient families of probability distributions are those that are invariant under the actions of certain symmetry groups.
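The classic first-order Kuramoto model that these matrix ODEs generalize can be sketched in a few lines. The function names, coupling strength, and initial phases below are illustrative assumptions, not code from the paper:

```python
import math

def kuramoto_step(theta, omega, K, dt):
    """One explicit-Euler step of dtheta_i/dt = omega_i + (K/N) sum_j sin(theta_j - theta_i)."""
    n = len(theta)
    new = []
    for i in range(n):
        coupling = (K / n) * sum(math.sin(theta[j] - theta[i]) for j in range(n))
        new.append(theta[i] + dt * (omega[i] + coupling))
    return new

def order_parameter(theta):
    """|r| in r e^{i psi} = (1/N) sum_j e^{i theta_j}; 1 means full synchrony."""
    n = len(theta)
    re = sum(math.cos(t) for t in theta) / n
    im = sum(math.sin(t) for t in theta) / n
    return math.hypot(re, im)

# Identical natural frequencies and strong coupling: phases synchronize.
theta = [0.0, 2.0, 4.0, 5.5]
omega = [1.0] * 4
for _ in range(2000):
    theta = kuramoto_step(theta, omega, K=2.0, dt=0.01)
```

The order parameter is the standard diagnostic of swarming/synchronization dynamics on the circle; the higher-dimensional generalizations in the abstract replace the phases by points on spheres or Lie groups.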
Updated: 2024-05-15 15:48:11
Field: cs.LG,math-ph,math.MP,nlin.AO
Adversarial Consistency and the Uniqueness of the Adversarial Bayes Classifier
Adversarial training is a common technique for learning robust classifiers. Prior work showed that convex surrogate losses are not statistically consistent in the adversarial context -- or in other words, a minimizing sequence of the adversarial surrogate risk will not necessarily minimize the adversarial classification error. We connect the consistency of adversarial surrogate losses to properties of minimizers to the adversarial classification risk, known as \emph{adversarial Bayes classifiers}. Specifically, under reasonable distributional assumptions, a convex loss is statistically consistent for adversarial learning iff the adversarial Bayes classifier satisfies a certain notion of uniqueness.
Updated: 2024-05-15 15:43:18
Field: cs.LG,math.ST,stat.ML,stat.TH
Inexact Unlearning Needs More Careful Evaluations to Avoid a False Sense of Privacy
The high cost of model training makes it increasingly desirable to develop techniques for unlearning. These techniques seek to remove the influence of a training example without having to retrain the model from scratch. Intuitively, once a model has unlearned, an adversary that interacts with the model should no longer be able to tell whether the unlearned example was included in the model's training set or not. In the privacy literature, this is known as membership inference. In this work, we discuss adaptations of Membership Inference Attacks (MIAs) to the setting of unlearning (leading to their ``U-MIA'' counterparts). We propose a categorization of existing U-MIAs into ``population U-MIAs'', where the same attacker is instantiated for all examples, and ``per-example U-MIAs'', where a dedicated attacker is instantiated for each example. We show that the latter category, wherein the attacker tailors its membership prediction to each example under attack, is significantly stronger. Indeed, our results show that the commonly used U-MIAs in the unlearning literature overestimate the privacy protection afforded by existing unlearning techniques on both vision and language models. Our investigation reveals a large variance in the vulnerability of different examples to per-example U-MIAs. In fact, several unlearning algorithms lead to a reduced vulnerability for some, but not all, examples that we wish to unlearn, at the expense of increasing it for other examples. Notably, we find that the privacy protection for the remaining training examples may worsen as a consequence of unlearning. We also discuss the fundamental difficulty of equally protecting all examples using existing unlearning schemes, due to the different rates at which examples are unlearned. We demonstrate that naive attempts at tailoring unlearning stopping criteria to different examples fail to alleviate these issues.
Updated: 2024-05-15 15:41:57
Field: cs.LG,cs.CR
A Resource Model For Neural Scaling Law
Neural scaling laws characterize how model performance improves as the model size scales up. Inspired by empirical observations, we introduce a resource model of neural scaling. A task is usually composite and hence can be decomposed into many subtasks, which compete for resources (measured by the number of neurons allocated to subtasks). On toy problems, we empirically find that: (1) The loss of a subtask is inversely proportional to its allocated neurons. (2) When multiple subtasks are present in a composite task, the resources acquired by each subtask uniformly grow as models get larger, keeping the ratios of acquired resources constant. We hypothesize these findings to be generally true and build a model to predict neural scaling laws for general composite tasks, which successfully replicates the neural scaling law of Chinchilla models reported in arXiv:2203.15556. We believe that the notion of resource used in this paper will be a useful tool for characterizing and diagnosing neural networks.
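Findings (1) and (2) together imply a simple power law in model size: if subtask $k$ receives a constant fraction $f_k$ of $N$ neurons and its loss is $c_k / n_k$, the composite loss scales as $1/N$. A toy sketch (coefficients and fractions are illustrative assumptions, not the paper's fitted values):

```python
def composite_loss(N, coeffs, fractions):
    """Each subtask's loss is c_k / n_k with n_k = f_k * N neurons allocated;
    constant fractions f_k yield an overall 1/N power law for the total loss."""
    return sum(c / (f * N) for c, f in zip(coeffs, fractions))

coeffs = [1.0, 2.0, 0.5]      # per-subtask difficulty constants (hypothetical)
fractions = [0.5, 0.3, 0.2]   # constant resource shares across model sizes
L_small = composite_loss(1000, coeffs, fractions)
L_large = composite_loss(2000, coeffs, fractions)
```

Doubling $N$ halves the composite loss under this model, which is the kind of clean scaling relation the resource model is designed to predict.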
Updated: 2024-05-15 15:39:38
Field: cs.LG,cs.AI,cs.NE
Desk-AId: Humanitarian Aid Desk Assessment with Geospatial AI for Predicting Landmine Areas
The process of clearing areas, namely demining, starts by assessing and prioritizing potential hazardous areas (i.e., desk assessment), which then undergo thorough investigation by experts, who confirm the risk and proceed with mine clearance operations. This paper presents Desk-AId, which supports the desk assessment phase by estimating landmine risks using geospatial data and socioeconomic information. Desk-AId uses a Geospatial AI approach specialized to landmines. The approach includes mixed data sampling strategies and context enrichment by historical conflicts and key multi-domain facilities (e.g., buildings, roads, health sites). The proposed system addresses the issue of having ground truth only for confirmed hazardous areas by implementing a new hard-negative data sampling strategy, where negative points are sampled in the vicinity of hazardous areas. Experiments validate Desk-AId in two domains for landmine risk assessment: 1) country-wide and 2) uncharted study areas. The proposed approach increases estimation accuracy up to 92% for different classification models such as RandomForest (RF), Feedforward Neural Networks (FNN), and Graph Neural Networks (GNN).
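The hard-negative strategy described above can be sketched as sampling negative points in an annulus around confirmed hazards. The helper name, radii, and seeding below are assumptions for illustration, not the paper's implementation:

```python
import math
import random

def sample_hard_negatives(hazards, n, r_min, r_max, seed=0):
    """Sample n negative training points in the annulus [r_min, r_max]
    around randomly chosen confirmed hazardous locations (x, y)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        cx, cy = rng.choice(hazards)
        r = rng.uniform(r_min, r_max)       # distance from the hazard
        a = rng.uniform(0.0, 2.0 * math.pi) # direction
        out.append((cx + r * math.cos(a), cy + r * math.sin(a)))
    return out
```

Sampling negatives near hazard boundaries, rather than uniformly over the map, forces the classifier to learn the local features that separate hazardous from safe ground.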
Updated: 2024-05-15 15:39:35
Field: cs.CY,cs.AI
Tailoring Instructions to Student's Learning Levels Boosts Knowledge Distillation
It has been commonly observed that a teacher model with superior performance does not necessarily result in a stronger student, highlighting a discrepancy between current teacher training practices and effective knowledge transfer. In order to enhance the guidance of the teacher training process, we introduce the concept of distillation influence to determine the impact of distillation from each training sample on the student's generalization ability. In this paper, we propose Learning Good Teacher Matters (LGTM), an efficient training technique for incorporating distillation influence into the teacher's learning process. By prioritizing samples that are likely to enhance the student's generalization ability, our LGTM outperforms 10 common knowledge distillation baselines on 6 text classification tasks in the GLUE benchmark.
Updated: 2024-05-15 15:32:27
Field: cs.CL,cs.LG
Facilitating Opinion Diversity through Hybrid NLP Approaches
Modern democracies face a critical issue of declining citizen participation in decision-making. Online discussion forums are an important avenue for enhancing citizen participation. This thesis proposal 1) identifies the challenges involved in facilitating large-scale online discussions with Natural Language Processing (NLP), 2) suggests solutions to these challenges by incorporating hybrid human-AI technologies, and 3) investigates what these technologies can reveal about individual perspectives in online discussions. We propose a three-layered hierarchy for representing perspectives that can be obtained by a mixture of human intelligence and large language models. We illustrate how these representations can draw insights into the diversity of perspectives and allow us to investigate interactions in online discussions.
Updated: 2024-05-15 15:30:17
Field: cs.CL,cs.AI
Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You
Text-to-image generation models have recently achieved astonishing results in image quality, flexibility, and text alignment, and are consequently employed in a fast-growing number of applications. Through improvements in multilingual abilities, a larger community now has access to this technology. However, our results show that multilingual models suffer from significant gender biases just as monolingual models do. Furthermore, the natural expectation that multilingual models will provide similar results across languages does not hold up. Instead, there are important differences between languages. We propose a novel benchmark, MAGBIG, intended to foster research on gender bias in multilingual models. We use MAGBIG to investigate the effect of multilingualism on gender bias in T2I models. To this end, we construct multilingual prompts requesting portraits of people with a certain occupation or trait. Our results show that not only do models exhibit strong gender biases but they also behave differently across languages. Furthermore, we investigate prompt engineering strategies, such as indirect, neutral formulations, to mitigate these biases. Unfortunately, these approaches have limited success and result in worse text-to-image alignment. Consequently, we call for more research into diverse representations across languages in image generators, as well as into steerability to address biased model behavior.
Updated: 2024-05-15 15:29:00
Field: cs.CL,cs.CY,cs.LG
Not My Voice! A Taxonomy of Ethical and Safety Harms of Speech Generators
The rapid and wide-scale adoption of AI to generate human speech poses a range of significant ethical and safety risks to society that need to be addressed. For example, a growing number of speech generation incidents are associated with swatting attacks in the United States, where anonymous perpetrators create synthetic voices that call police officers to close down schools and hospitals, or to violently gain access to innocent citizens' homes. Incidents like this demonstrate that multimodal generative AI risks and harms do not exist in isolation, but arise from the interactions of multiple stakeholders and technical AI systems. In this paper we analyse speech generation incidents to study how patterns of specific harms arise. We find that specific harms can be categorised according to the exposure of affected individuals, that is to say whether they are a subject of, interact with, suffer due to, or are excluded from speech generation systems. Similarly, specific harms are also a consequence of the motives of the creators and deployers of the systems. Based on these insights we propose a conceptual framework for modelling pathways to ethical and safety harms of AI, which we use to develop a taxonomy of harms of speech generators. Our relational approach captures the complexity of risks and harms in sociotechnical AI systems, and yields a taxonomy that can support appropriate policy interventions and decision making for the responsible development and release of speech generation models.
Updated: 2024-05-15 15:26:42
Field: cs.CL,cs.AI,cs.CY,eess.AS
Kernel Limit of Recurrent Neural Networks Trained on Ergodic Data Sequences
Mathematical methods are developed to characterize the asymptotics of recurrent neural networks (RNN) as the number of hidden units, data samples in the sequence, hidden state updates, and training steps simultaneously grow to infinity. In the case of an RNN with a simplified weight matrix, we prove the convergence of the RNN to the solution of an infinite-dimensional ODE coupled with the fixed point of a random algebraic equation. The analysis requires addressing several challenges which are unique to RNNs. In typical mean-field applications (e.g., feedforward neural networks), discrete updates are of magnitude $\mathcal{O}(\frac{1}{N})$ and the number of updates is $\mathcal{O}(N)$. Therefore, the system can be represented as an Euler approximation of an appropriate ODE/PDE, which it will converge to as $N \rightarrow \infty$. However, the RNN hidden layer updates are $\mathcal{O}(1)$. Therefore, RNNs cannot be represented as a discretization of an ODE/PDE and standard mean-field techniques cannot be applied. Instead, we develop a fixed point analysis for the evolution of the RNN memory states, with convergence estimates in terms of the number of update steps and the number of hidden units. The RNN hidden layer is studied as a function in a Sobolev space, whose evolution is governed by the data sequence (a Markov chain), the parameter updates, and its dependence on the RNN hidden layer at the previous time step. Due to the strong correlation between updates, a Poisson equation must be used to bound the fluctuations of the RNN around its limit equation. These mathematical methods give rise to the neural tangent kernel (NTK) limits for RNNs trained on data sequences as the number of data samples and size of the neural network grow to infinity.
Updated: 2024-05-15 15:21:17
Field: cs.LG,math.PR,stat.ML,68T07 (Primary), 68T05, 60J20 (Secondary)
Improving Label Error Detection and Elimination with Uncertainty Quantification
Identifying and handling label errors can significantly enhance the accuracy of supervised machine learning models. Recent approaches for identifying label errors demonstrate that a low self-confidence of models with respect to a certain label represents a good indicator of an erroneous label. However, latest work has built on softmax probabilities to measure self-confidence. In this paper, we argue that -- as softmax probabilities do not reflect a model's predictive uncertainty accurately -- label error detection requires more sophisticated measures of model uncertainty. Therefore, we develop a range of novel, model-agnostic algorithms for Uncertainty Quantification-Based Label Error Detection (UQ-LED), which combine the techniques of confident learning (CL), Monte Carlo Dropout (MCD), model uncertainty measures (e.g., entropy), and ensemble learning to enhance label error detection. We comprehensively evaluate our algorithms on four image classification benchmark datasets in two stages. In the first stage, we demonstrate that our UQ-LED algorithms outperform state-of-the-art confident learning in identifying label errors. In the second stage, we show that removing all identified errors from the training data based on our approach results in higher accuracies than training on all available labeled data. Importantly, besides our contributions to the detection of label errors, we particularly propose a novel approach to generate realistic, class-dependent label errors synthetically. Overall, our study demonstrates that selectively cleaning datasets with UQ-LED algorithms leads to more accurate classifications than using larger, noisier datasets.
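One way the described combination of Monte Carlo Dropout and entropy could look in code is sketched below. This is a hedged toy illustration of the uncertainty signal, not the paper's UQ-LED algorithms, which additionally combine confident learning and ensembling:

```python
import math

def mean_prediction(mc_probs):
    """Average the per-forward-pass probability vectors from MC Dropout."""
    k = len(mc_probs[0])
    return [sum(p[i] for p in mc_probs) / len(mc_probs) for i in range(k)]

def predictive_entropy(mean):
    """Entropy of the averaged prediction: a model-uncertainty measure."""
    return -sum(p * math.log(p) for p in mean if p > 0.0)

def flag_label_errors(samples, threshold):
    """samples: list of (mc_probs, given_label). Flag an example when the
    model is uncertain (high entropy) or its prediction disagrees with the
    given label -- both are indicators of a possibly erroneous label."""
    flagged = []
    for idx, (mc_probs, label) in enumerate(samples):
        mean = mean_prediction(mc_probs)
        pred = max(range(len(mean)), key=mean.__getitem__)
        if predictive_entropy(mean) > threshold or pred != label:
            flagged.append(idx)
    return flagged
```

The point of replacing raw softmax confidence with an MC-averaged entropy is that a single softmax pass can be confidently wrong, whereas disagreement across dropout passes surfaces as high entropy.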
Updated: 2024-05-15 15:17:52
Field: cs.LG,cs.AI
Integrating Large Language Models in Causal Discovery: A Statistical Causal Approach
In practical statistical causal discovery (SCD), embedding domain expert knowledge as constraints into the algorithm is widely accepted as significant for creating consistent, meaningful causal models, despite the recognized challenges in systematic acquisition of the background knowledge. To overcome these challenges, this paper proposes a novel methodology for causal inference, in which SCD methods and knowledge-based causal inference (KBCI) with a large language model (LLM) are synthesized through ``statistical causal prompting (SCP)'' for LLMs and prior knowledge augmentation for SCD. Experiments have revealed that GPT-4 can bring the output of LLM-KBCI, and the SCD result with prior knowledge from LLM-KBCI, close to the ground truth, and that the SCD result can be further improved if GPT-4 undergoes SCP. Furthermore, using an unpublished real-world dataset, we have demonstrated that the background knowledge provided by the LLM can improve SCD on this dataset, even though the dataset was never included in the LLM's training data. The proposed approach can thus address challenges such as dataset biases and limitations, illustrating the potential of LLMs to improve data-driven causal inference across diverse scientific domains.
Updated: 2024-05-15 15:16:19
Field: cs.LG,cs.AI,stat.ME,stat.ML
Invariant Risk Minimization Is A Total Variation Model
Invariant risk minimization (IRM) is an emerging approach to generalizing invariant features to different environments in machine learning. While most related works focus on new IRM settings or new application scenarios, the mathematical essence of IRM remains to be properly explained. We verify that IRM is essentially a total variation model based on the $L^2$ norm (TV-$\ell_2$) of the learning risk with respect to the classifier variable. Moreover, we propose a novel IRM framework based on the TV-$\ell_1$ model. It not only expands the classes of functions that can be used as the learning risk, but also has robust performance in denoising and invariant feature preservation based on the coarea formula. We also illustrate some requirements for IRM-TV-$\ell_1$ to achieve out-of-distribution generalization. Experimental results show that the proposed framework achieves competitive performance in several benchmark machine learning scenarios.
Updated: 2024-05-15 15:14:18
Field: cs.LG
On the Correspondence of Non-flat Assumption-based Argumentation and Logic Programming with Negation as Failure in the Head
The relation between (a fragment of) assumption-based argumentation (ABA) and logic programs (LPs) under stable model semantics is well-studied. However, for obtaining this relation, the ABA framework needs to be restricted to being flat, i.e., a fragment where the (defeasible) assumptions can never be entailed, only assumed to be true or false. Here, we remove this restriction and show a correspondence between non-flat ABA and LPs with negation as failure in their head. We then extend this result to so-called set-stable ABA semantics, originally defined for the fragment of non-flat ABA called bipolar ABA. We showcase how to define set-stable semantics for LPs with negation as failure in their head and show the correspondence to set-stable ABA semantics.
Updated: 2024-05-15 15:10:03
Field: cs.AI
Intelligent Tutor: Leveraging ChatGPT and Microsoft Copilot Studio to Deliver a Generative AI Student Support and Feedback System within Teams
This study explores the integration of the ChatGPT API with GPT-4 model and Microsoft Copilot Studio on the Microsoft Teams platform to develop an intelligent tutoring system. Designed to provide instant support to students, the system dynamically adjusts educational content in response to the learners' progress and feedback. Utilizing advancements in natural language processing and machine learning, it interprets student inquiries, offers tailored feedback, and facilitates the educational journey. Initial implementation highlights the system's potential in boosting students' motivation and engagement, while equipping educators with critical insights into the learning process, thus promoting tailored educational experiences and enhancing instructional effectiveness.
Updated: 2024-05-15 15:09:41
Field: cs.CL,cs.AI,cs.CY
Distinguishing Tor From Other Encrypted Network Traffic Through Character Analysis
For journalists reporting from a totalitarian regime, whistleblowers, and resistance fighters, the anonymous use of cloud services on the Internet can be vital for survival. The Tor network provides a free and widely used anonymization service for everyone. However, there are different approaches to distinguishing Tor from non-Tor encrypted network traffic, most recently based only on the (relative) frequencies of hex digits in a single encrypted payload packet. While conventional data traffic is usually encrypted once, Tor traffic is encrypted at least three times due to the structure and principle of the Tor network; we have examined to what extent this number of encryption layers contributes to being able to distinguish Tor from non-Tor encrypted data traffic.
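The hex-digit frequency feature mentioned above can be sketched as follows. This is a toy illustration of the feature extraction, not the paper's classifier:

```python
from collections import Counter

def hex_digit_frequencies(payload: bytes):
    """Relative frequency of the 16 hex digits in one payload's hex encoding."""
    hexed = payload.hex()
    counts = Counter(hexed)
    n = len(hexed)
    return [counts.get(d, 0) / n for d in "0123456789abcdef"]

def chi2_vs_uniform(freqs, n):
    """Chi-square statistic of observed digit counts against the uniform
    distribution expected of well-encrypted (random-looking) payloads."""
    expected = n / 16.0
    return sum((f * n - expected) ** 2 / expected for f in freqs)
```

Well-encrypted payloads should look uniform in this feature space; the question the paper studies is whether the subtle statistical differences induced by layering encryptions (as Tor does) remain detectable.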
Updated: 2024-05-15 15:07:31
Field: cs.CR,cs.LG,cs.NI
Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes
Pre-trained large language models (LLMs) need fine-tuning to improve their responsiveness to natural language instructions. Federated learning offers a way to fine-tune LLMs using the abundant data on end devices without compromising data privacy. Most existing federated fine-tuning methods for LLMs rely on parameter-efficient fine-tuning techniques, which may not reach the performance height possible with full-parameter tuning. However, federated full-parameter tuning of LLMs is a non-trivial problem due to the immense communication cost. This work introduces FedKSeed that employs zeroth-order optimization with a finite set of random seeds. It significantly reduces transmission requirements between the server and clients to just a few random seeds and scalar gradients, amounting to only a few thousand bytes, making federated full-parameter tuning of billion-sized LLMs possible on devices. Building on it, we develop a strategy enabling probability-differentiated seed sampling, prioritizing perturbations with greater impact on model accuracy. Experiments across six scenarios with various LLMs, datasets and data partitions demonstrate that our approach outperforms existing federated LLM fine-tuning methods in both communication efficiency and new task generalization.
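The seed-based communication idea can be illustrated with a two-point zeroth-order estimator: because client and server share the PRNG seed, only the seed and one scalar cross the wire, never a dense parameter-sized vector. This is a simplified sketch (single perturbation direction per round, plain lists instead of model tensors), not FedKSeed itself:

```python
import random

def perturbation(seed, dim):
    """Reconstruct the shared random direction z from a seed alone."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def client_zo_grad(weights, loss_fn, seed, eps):
    """Two-point zeroth-order estimate of the directional derivative along
    the seed-defined direction; only (seed, scalar) needs transmitting."""
    z = perturbation(seed, len(weights))
    lp = loss_fn([w + eps * zi for w, zi in zip(weights, z)])
    lm = loss_fn([w - eps * zi for w, zi in zip(weights, z)])
    return (lp - lm) / (2.0 * eps)

def server_apply(weights, seed, scalar, lr):
    """The server regenerates the same direction from the seed and applies
    the scalar projected gradient -- no dense vectors on the wire."""
    z = perturbation(seed, len(weights))
    return [w - lr * scalar * zi for w, zi in zip(weights, z)]
```

For a quadratic loss the two-point estimate equals the exact directional derivative, which makes the communication savings easy to check end to end.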
Updated: 2024-05-15 14:59:38
Field: cs.LG,cs.DC
Federated Learning and Differential Privacy Techniques on Multi-hospital Population-scale Electrocardiogram Data
This research paper explores ways to apply Federated Learning (FL) and Differential Privacy (DP) techniques to population-scale Electrocardiogram (ECG) data. The study learns a multi-label ECG classification model using FL and DP based on 1,565,849 ECG tracings from 7 hospitals in Alberta, Canada. The FL approach allowed collaborative model training without sharing raw data between hospitals while building robust ECG classification models for diagnosing various cardiac conditions. These accurate ECG classification models can facilitate the diagnoses while preserving patient confidentiality using FL and DP techniques. Our results show that the performance achieved using our implementation of the FL approach is comparable to that of the pooled approach, where the model is trained on the aggregated data from all hospitals. Furthermore, our findings suggest that hospitals with limited ECGs for training can benefit from adopting the FL model compared to single-site training. In addition, this study showcases the trade-off between model performance and data privacy by employing DP during model training. Our code is available at https://github.com/vikhyatt/Hospital-FL-DP.
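A minimal sketch of one DP-FedAvg-style round (clip each client update, add Gaussian noise, average, apply) shows how the FL and DP pieces fit together. This is an illustration of the general technique under stated simplifications, not the paper's pipeline, and it omits the privacy accounting that maps sigma to an (epsilon, delta) guarantee:

```python
import random

def dp_fedavg_round(global_w, client_updates, clip, sigma, lr, seed=0):
    """One round: clip each client update to L2 norm `clip`, add Gaussian
    noise scaled by sigma * clip (the DP mechanism), average, and apply."""
    rng = random.Random(seed)
    dim = len(global_w)
    agg = [0.0] * dim
    for g in client_updates:
        norm = sum(x * x for x in g) ** 0.5
        scale = min(1.0, clip / norm) if norm > 0 else 1.0
        for i in range(dim):
            agg[i] += g[i] * scale + rng.gauss(0.0, sigma * clip)
    return [w - lr * a / len(client_updates) for w, a in zip(global_w, agg)]
```

Raw ECG data never leaves a hospital in this scheme; only clipped, noised updates are aggregated, which is exactly the performance-versus-privacy trade-off the study quantifies.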
Updated: 2024-05-15 14:52:58
Categories: eess.SP,cs.CR,cs.LG
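The abstract does not name the aggregation rule; collaborative training of this kind typically builds on a FedAvg-style weighted parameter average, where each hospital's update is weighted by its local dataset size. A minimal sketch of that aggregation step (the hospital sizes and weight vectors below are made up):

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """One FedAvg aggregation step: dataset-size-weighted average of client parameters."""
    sizes = np.asarray(client_sizes, dtype=float)
    coeffs = sizes / sizes.sum()              # weight each client by its data share
    return sum(c * w for c, w in zip(coeffs, client_weights))

# Three hypothetical hospitals with different amounts of local ECG data.
weights = [np.array([1.0, 1.0]), np.array([2.0, 0.0]), np.array([0.0, 4.0])]
sizes = [100, 300, 100]
global_w = fedavg(weights, sizes)
```

Only these parameter vectors leave each site; the raw ECG tracings stay local, which is what enables the privacy-preserving collaboration described above.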
Stationarity without mean reversion in improper Gaussian processes
The behavior of Gaussian process (GP) regression depends on the choice of covariance function. Stationary covariance functions are preferred in machine learning applications. However, (non-periodic) stationary covariance functions are always mean reverting and can therefore exhibit pathological behavior when applied to data that does not relax to a fixed global mean value. In this paper we show that it is possible to use improper GP priors with infinite variance to define processes that are stationary but not mean reverting. To this aim, we make use of non-positive kernels that can only be defined in this limit regime. The resulting posterior distributions can be computed analytically and involve a simple correction of the usual formulas. The main contribution of the paper is the introduction of a large family of smooth non-reverting covariance functions that closely resemble the kernels commonly used in the GP literature (e.g. squared exponential and Mat\'ern class). By analyzing both synthetic and real data, we demonstrate that these non-positive kernels solve some known pathologies of mean-reverting GP regression while retaining most of the favorable properties of ordinary smooth stationary kernels.
Updated: 2024-05-15 14:52:50
Categories: stat.ML,cs.LG
Encrypted Container File: Design and Implementation of a Hybrid-Encrypted Multi-Recipient File Structure
Modern software engineering trends towards Cloud-native software development by international teams of developers. Cloud-based version management services, such as GitHub, are used for the source code and other artifacts created during the development process. However, using such a service usually means that every developer has access to all data stored on the platform. In particular, if the developers belong to different companies or organizations, it would be desirable for sensitive files to be encrypted such that they can only be decrypted by a previously defined group of people. In this paper, we examine currently available tools that address this problem but have certain shortcomings. We then present our own solution to this problem, Encrypted Container Files (ECF), which eliminates the deficiencies found in the other tools.
Updated: 2024-05-15 14:51:46
Categories: cs.DC,cs.CR,cs.SE
$O_2$ is a multiple context-free grammar: an implementation-, formalisation-friendly proof
Classifying formal languages according to the expressiveness of the grammars able to generate them is a fundamental problem in computational linguistics and, therefore, in the theory of computation. Furthermore, such analysis can give insight into the classification of abstract algebraic structures such as groups, for example through the correspondence given by the word problem. While many such classification problems remain open, others have been settled. Recently, it was proved that $n$-balanced languages (i.e., those whose strings contain the same number of occurrences of letters $a_i$ and $A_i$ with $1\leq i \leq n$) can be generated by multiple context-free grammars (MCFGs), one of several slight extensions of context-free grammars added to the classical Chomsky hierarchy to make the mentioned classification more precise. This paper analyses the existing proofs from the computational and proof-theoretical points of view, systematically studying whether each proof can lead to a verified (i.e., checked by a proof assistant) algorithm parsing balanced languages via MCFGs. We conclude that none of the existing proofs is realistically suitable for this practical goal, and proceed to provide a radically new, elementary, extremely short proof for the crucial case $n \leq 2$. A comparative analysis with respect to the existing proofs is finally performed to justify why the proposed proof is a substantial step towards concretely obtaining a verified parsing algorithm for $O_2$.
Updated: 2024-05-15 14:51:11
Categories: cs.FL,cs.AI,cs.LO,math.LO
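The $n$-balanced condition itself is easy to state computationally: a string is in $O_2$ exactly when each letter $a_i$ occurs as often as its counterpart $A_i$. A small membership check, using `a/A` and `b/B` as the two letter pairs in line with the abstract's notation (the function name is ours):

```python
from collections import Counter

def in_o2(s, pairs=(("a", "A"), ("b", "B"))):
    """Membership test for the 2-balanced language O_2:
    each letter must occur as often as its paired counterpart."""
    counts = Counter(s)
    return all(counts[x] == counts[y] for x, y in pairs)
```

Membership is trivial to *decide*; the paper's difficulty lies elsewhere, in producing a verified MCFG-based *parser* for the language.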
Matching domain experts by training from scratch on domain knowledge
Recently, large language models (LLMs) have outperformed human experts in predicting the results of neuroscience experiments (Luo et al., 2024). What is the basis for this performance? One possibility is that statistical patterns in that specific scientific literature, as opposed to emergent reasoning abilities arising from broader training, underlie LLMs' performance. To evaluate this possibility, we trained a relatively small 124M-parameter GPT-2 model, via next-word prediction, on 1.3 billion tokens of domain-specific knowledge. Despite being orders of magnitude smaller than larger LLMs trained on trillions of tokens, the small models achieved expert-level performance in predicting neuroscience results. Small models trained on the neuroscience literature succeeded both when trained from scratch using a tokenizer specifically trained on neuroscience text and when the neuroscience literature was used to finetune a pretrained GPT-2. Our results indicate that expert-level performance may be attained by even small LLMs through domain-specific, auto-regressive training approaches.
Updated: 2024-05-15 14:50:51
Categories: q-bio.NC,cs.AI,cs.CL
SA-FedLora: Adaptive Parameter Allocation for Efficient Federated Learning with LoRA Tuning
Fine-tuning large-scale pre-trained models via transfer learning is an emerging and important paradigm for a wide range of downstream tasks, with performance heavily reliant on extensive data. Federated learning (FL), as a distributed framework, provides a secure solution to train models on local datasets while safeguarding raw sensitive data. However, FL networks encounter high communication costs due to the massive parameters of large-scale pre-trained models, necessitating parameter-efficient methods. Notably, parameter-efficient fine-tuning, such as Low-Rank Adaptation (LoRA), has shown remarkable success in fine-tuning pre-trained models. However, prior research indicates that a fixed parameter budget may be prone to overfitting or slower convergence. To address this challenge, we propose a Simulated Annealing-based Federated Learning with LoRA tuning (SA-FedLoRA) approach that reduces the number of trainable parameters over the course of training. Specifically, SA-FedLoRA comprises two stages: initiating and annealing. (1) In the initiating stage, we implement a parameter regularization approach during the early rounds of aggregation, aiming to mitigate client drift and accelerate convergence for the subsequent tuning. (2) In the annealing stage, we allocate a higher parameter budget during the early 'heating' phase and then gradually shrink the budget until the 'cooling' phase. This strategy not only facilitates convergence to the global optimum but also reduces communication costs. Experimental results demonstrate that SA-FedLoRA is an efficient FL method, achieving superior performance to FedAvg and significantly reducing communication parameters by up to 93.62%.
Updated: 2024-05-15 14:50:46
Categories: cs.LG,cs.DC
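The abstract does not give the exact annealing schedule; one hypothetical way to realize a "heating then cooling" parameter budget is a plateau at the maximum budget followed by a smooth decay to a floor. A sketch under that assumption (the schedule shape and all constants are illustrative, not the paper's):

```python
import math

def annealed_budget(round_idx, total_rounds, max_budget, min_budget, heat_frac=0.3):
    """Hypothetical budget schedule: hold a high parameter budget during the early
    'heating' phase, then cosine-decay toward a low budget by the 'cooling' phase."""
    heat_rounds = int(heat_frac * total_rounds)
    if round_idx < heat_rounds:
        return max_budget                     # heating: spend the full budget
    # cooling: cosine decay from max_budget down to min_budget
    t = (round_idx - heat_rounds) / max(1, total_rounds - heat_rounds - 1)
    return min_budget + 0.5 * (max_budget - min_budget) * (1 + math.cos(math.pi * t))

budgets = [annealed_budget(r, 100, 64, 4) for r in range(100)]
```

A shrinking budget directly shrinks the per-round communication payload, consistent with the savings the abstract reports.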
LLM Voting: Human Choices and AI Collective Decision Making
This paper investigates the voting behaviors of Large Language Models (LLMs), specifically GPT-4 and LLaMA-2, their biases, and how they align with human voting patterns. Our methodology involved using a dataset from a human voting experiment to establish a baseline for human preferences and a corresponding experiment with LLM agents. We observed that the methods used for voting input and the presentation of choices influence LLM voting behavior. We discovered that varying the persona can reduce some of these biases and enhance alignment with human choices. While the Chain-of-Thought approach did not improve prediction accuracy, it has potential for AI explainability in the voting process. We also identified a trade-off between preference diversity and alignment accuracy in LLMs, influenced by different temperature settings. Our findings indicate that LLMs may lead to less diverse collective outcomes and biased assumptions when used in voting scenarios, emphasizing the importance of cautious integration of LLMs into democratic processes.
Updated: 2024-05-15 14:50:37
Categories: cs.CL,cs.AI,cs.CY,cs.LG,econ.GN,q-fin.EC,68T05, 91B14, 91C20,I.2.7; J.4; K.4.1
Robust Lagrangian and Adversarial Policy Gradient for Robust Constrained Markov Decision Processes
The robust constrained Markov decision process (RCMDP) is a recent task-modelling framework for reinforcement learning that incorporates behavioural constraints and that provides robustness to errors in the transition dynamics model through the use of an uncertainty set. Simulating RCMDPs requires computing the worst-case dynamics based on value estimates for each state, an approach which has previously been used in the Robust Constrained Policy Gradient (RCPG). Highlighting potential downsides of RCPG such as not robustifying the full constrained objective and the lack of incremental learning, this paper introduces two algorithms, called RCPG with Robust Lagrangian and Adversarial RCPG. RCPG with Robust Lagrangian modifies RCPG by taking the worst-case dynamics based on the Lagrangian rather than either the value or the constraint. Adversarial RCPG also formulates the worst-case dynamics based on the Lagrangian but learns this directly and incrementally as an adversarial policy through gradient descent rather than indirectly and abruptly through constrained optimisation on a sorted value list. A theoretical analysis first derives the Lagrangian policy gradient for the policy optimisation of both proposed algorithms and then the adversarial policy gradient to learn the adversary for Adversarial RCPG. Empirical experiments injecting perturbations in inventory management and safe navigation tasks demonstrate the competitive performance of both algorithms compared to traditional RCPG variants as well as non-robust and non-constrained ablations. In particular, Adversarial RCPG ranks among the top two performing algorithms on all tests.
Updated: 2024-05-15 14:46:34
Categories: cs.LG,cs.AI,cs.NE
Parameter-Efficient Instruction Tuning of Large Language Models For Extreme Financial Numeral Labelling
We study the problem of automatically annotating relevant numerals (GAAP metrics) occurring in financial documents with their corresponding XBRL tags. Different from prior works, we investigate the feasibility of solving this extreme classification problem using a generative paradigm through instruction tuning of Large Language Models (LLMs). To this end, we leverage metric metadata information to frame our target outputs while proposing a parameter-efficient solution for the task using LoRA. We perform experiments on two recently released financial numeric labeling datasets. Our proposed model, FLAN-FinXC, achieves new state-of-the-art performance on both datasets, outperforming several strong baselines. We explain the better scores of our proposed model by demonstrating its capability on zero-shot tags as well as the least frequently occurring tags. Also, even when we fail to predict the XBRL tags correctly, our generated output has substantial overlap with the ground truth in the majority of cases.
Updated: 2024-05-15 14:43:23
Categories: cs.CL,cs.CE,cs.LG
Sourcerer: Sample-based Maximum Entropy Source Distribution Estimation
Scientific modeling applications often require estimating a distribution of parameters consistent with a dataset of observations - an inference task also known as source distribution estimation. This problem can be ill-posed, however, since many different source distributions might produce the same distribution of data-consistent simulations. To make a principled choice among many equally valid sources, we propose an approach which targets the maximum entropy distribution, i.e., prioritizes retaining as much uncertainty as possible. Our method is purely sample-based - leveraging the Sliced-Wasserstein distance to measure the discrepancy between the dataset and simulations - and thus suitable for simulators with intractable likelihoods. We benchmark our method on several tasks, and show that it can recover source distributions with substantially higher entropy than recent source estimation methods, without sacrificing the fidelity of the simulations. Finally, to demonstrate the utility of our approach, we infer source distributions for parameters of the Hodgkin-Huxley model from experimental datasets with thousands of single-neuron measurements. In summary, we propose a principled method for inferring source distributions of scientific simulator parameters while retaining as much uncertainty as possible.
Updated: 2024-05-15 14:32:14
Categories: cs.LG
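The sample-based discrepancy at the heart of the method can be sketched directly: project both point clouds onto random directions, sort the projections, and average the resulting 1-D transport costs. A minimal Monte-Carlo version of the sliced 1-Wasserstein distance (projection count and toy data are illustrative; this version assumes equal-sized clouds):

```python
import numpy as np

def sliced_wasserstein(x, y, n_proj=128, seed=0):
    """Monte-Carlo sliced 1-Wasserstein distance between two equal-sized point clouds."""
    rng = np.random.default_rng(seed)
    thetas = rng.standard_normal((n_proj, x.shape[1]))
    thetas /= np.linalg.norm(thetas, axis=1, keepdims=True)  # random unit directions
    px = np.sort(x @ thetas.T, axis=0)   # sorted 1-D projections = empirical quantiles
    py = np.sort(y @ thetas.T, axis=0)
    return float(np.mean(np.abs(px - py)))

a = np.random.default_rng(1).normal(size=(500, 2))
b = np.random.default_rng(2).normal(size=(500, 2))
c = a + 3.0                              # the same cloud, shifted
```

Because the distance only needs samples from the simulator, no likelihood evaluation is required, which is what makes it suitable for the intractable-likelihood setting described above.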
On the Saturation Effect of Kernel Ridge Regression
The saturation effect refers to the phenomenon that kernel ridge regression (KRR) fails to achieve the information-theoretic lower bound when the smoothness of the underlying ground-truth function exceeds a certain level. The saturation effect has been widely observed in practice, and a saturation lower bound for KRR has been conjectured for decades. In this paper, we provide a proof of this long-standing conjecture.
Updated: 2024-05-15 14:15:09
Categories: stat.ML,cs.LG
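For reference, the estimator under study: KRR solves $(K + n\lambda I)\alpha = y$ on the training Gram matrix $K$ and predicts $f(x) = \sum_i \alpha_i k(x, x_i)$. A minimal sketch with an RBF kernel on 1-D inputs (the bandwidth and ridge values are illustrative):

```python
import numpy as np

def krr_fit_predict(x_train, y_train, x_test, lam=1e-6, bandwidth=0.2):
    """Kernel ridge regression with an RBF kernel: solve (K + n*lam*I) alpha = y,
    then predict with the cross-kernel against the training points."""
    def rbf(a, b):
        return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * bandwidth ** 2))
    n = len(x_train)
    alpha = np.linalg.solve(rbf(x_train, x_train) + n * lam * np.eye(n), y_train)
    return rbf(x_test, x_train) @ alpha

x = np.linspace(0.0, 1.0, 40)
y = np.sin(2 * np.pi * x)
pred = krr_fit_predict(x, y, x)
```

The saturation phenomenon concerns the convergence *rate* of this estimator as $n$ grows for very smooth targets, which a toy fit like this cannot exhibit; the snippet only fixes the estimator the result is about.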
Aggregate Representation Measure for Predictive Model Reusability
In this paper, we propose a predictive quantifier to estimate the retraining cost of a trained model under distribution shifts. The proposed Aggregated Representation Measure (ARM) quantifies the change in the model's representation from the old to the new data distribution. It provides, before actually retraining the model, a single concise index of the resources (epochs, energy, and carbon emissions) required for the retraining. This enables reuse of a model at a much lower cost than training a new model from scratch. The experimental results indicate that ARM reasonably predicts retraining costs for varying noise intensities and enables comparisons among multiple model architectures to determine the most cost-effective and sustainable option.
Updated: 2024-05-15 14:14:34
Categories: cs.LG,cs.AI,cs.CV,cs.CY
RAGFormer: Learning Semantic Attributes and Topological Structure for Fraud Detection
Fraud detection remains a challenging task due to the complex and deceptive nature of fraudulent activities. Current approaches primarily concentrate on learning only one perspective of the graph: either the topological structure of the graph or the attributes of individual nodes. However, we conduct empirical studies to reveal that these two types of features, while nearly orthogonal, are each independently effective. As a result, previous methods cannot fully capture the comprehensive characteristics of the fraud graph. To address this dilemma, we present a novel framework called Relation-Aware GNN with transFormer (RAGFormer) which simultaneously embeds both semantic and topological features into a target node. The simple yet effective network consists of a semantic encoder, a topology encoder, and an attention fusion module. The semantic encoder utilizes a Transformer to learn semantic features and node interactions across different relations. We introduce a Relation-Aware GNN as the topology encoder to learn topological features and node interactions within each relation. These two complementary features are interleaved through an attention fusion module, so that prediction is supported by both orthogonal feature types. Extensive experiments on two popular public datasets demonstrate that RAGFormer achieves state-of-the-art performance. The significant improvement of RAGFormer on an industrial credit card fraud detection dataset further validates the applicability of our method in real-world business scenarios.
Updated: 2024-05-15 14:13:52
Categories: cs.LG,cs.AI
The Unfairness of $\varepsilon$-Fairness
Fairness in decision-making processes is often quantified using probabilistic metrics. However, these metrics may not fully capture the real-world consequences of unfairness. In this article, we adopt a utility-based approach to more accurately measure the real-world impacts of a decision-making process. In particular, we show that if the concept of $\varepsilon$-fairness is employed, it can possibly lead to outcomes that are maximally unfair in the real-world context. Additionally, we address the common issue of unavailable data on false negatives by proposing a reduced setting that still captures essential fairness considerations. We illustrate our findings with two real-world examples: college admissions and credit risk assessment. Our analysis reveals that while traditional probability-based evaluations might suggest fairness, a utility-based approach uncovers the actions necessary to truly achieve equality. For instance, in the college admission case, we find that enhancing completion rates is crucial for ensuring fairness. In summary, this paper highlights the importance of considering the real-world context when evaluating fairness.
Updated: 2024-05-15 14:13:35
Categories: cs.LG,econ.TH,q-fin.MF,stat.ML
Vision-Based Neurosurgical Guidance: Unsupervised Localization and Camera-Pose Prediction
Localizing oneself during endoscopic procedures can be problematic due to the lack of distinguishable textures and landmarks, as well as limitations of the endoscopic device, such as a limited field of view and challenging lighting conditions. Expert knowledge shaped by years of experience is required for localization within the human body during endoscopic procedures. In this work, we present a deep learning method based on anatomy recognition that constructs a surgical path in an unsupervised manner from surgical videos, modelling relative location and variations due to different viewing angles. At inference time, the model can map an unseen video's frames onto the path and estimate the viewing angle, aiming to provide guidance, for instance, to reach a particular destination. We test the method on a dataset consisting of surgical videos of transsphenoidal adenomectomies, as well as on a synthetic dataset. An online tool that lets researchers upload their surgical videos to obtain anatomy detections, together with the weights of the trained YOLOv7 model, is available at: https://surgicalvision.bmic.ethz.ch.
Updated: 2024-05-15 14:09:11
Categories: cs.CV,cs.AI
Properties that allow or prohibit transferability of adversarial attacks among quantized networks
Deep Neural Networks (DNNs) are known to be vulnerable to adversarial examples. Further, these adversarial examples are found to be transferable from the source network in which they are crafted to a black-box target network. As the trend of using deep learning on embedded devices grows, it becomes relevant to study the transferability properties of adversarial examples among compressed networks. In this paper, we consider quantization as a network compression technique and evaluate the performance of transfer-based attacks when the source and target networks are quantized at different bitwidths. We explore how algorithm specific properties affect transferability by considering various adversarial example generation algorithms. Furthermore, we examine transferability in a more realistic scenario where the source and target networks may differ in bitwidth and other model-related properties like capacity and architecture. We find that although quantization reduces transferability, certain attack types demonstrate an ability to enhance it. Additionally, the average transferability of adversarial examples among quantized versions of a network can be used to estimate the transferability to quantized target networks with varying capacity and architecture.
Updated: 2024-05-15 14:06:28
Categories: cs.LG,cs.AI
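For context, "quantized at different bitwidths" usually refers to something like symmetric per-tensor uniform quantization of the weights; a minimal simulated quantizer (the exact scheme in the paper may differ):

```python
import numpy as np

def quantize_uniform(w, bits):
    """Symmetric per-tensor uniform quantization followed by de-quantization."""
    levels = 2 ** (bits - 1) - 1            # e.g. 127 for 8 bits, 1 for 2 bits
    scale = np.max(np.abs(w)) / levels      # map the largest magnitude to the top level
    q = np.clip(np.round(w / scale), -levels, levels)
    return q * scale                        # de-quantized weights at reduced precision

w = np.array([-0.9, -0.31, 0.02, 0.5, 0.88])
w8 = quantize_uniform(w, 8)
w2 = quantize_uniform(w, 2)
```

The lower the bitwidth, the larger the rounding error introduced into the weights, which is one mechanism by which quantization perturbs the decision boundaries that adversarial examples exploit.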
Unbiased Learning to Rank Meets Reality: Lessons from Baidu's Large-Scale Search Dataset
Unbiased learning-to-rank (ULTR) is a well-established framework for learning from user clicks, which are often biased by the ranker collecting the data. While theoretically justified and extensively tested in simulation, ULTR techniques lack empirical validation, especially on modern search engines. The Baidu-ULTR dataset released for the WSDM Cup 2023, collected from Baidu's search engine, offers a rare opportunity to assess the real-world performance of prominent ULTR techniques. Despite multiple submissions during the WSDM Cup 2023 and the subsequent NTCIR ULTRE-2 task, it remains unclear whether the observed improvements stem from applying ULTR or other learning techniques. In this work, we revisit and extend the available experiments on the Baidu-ULTR dataset. We find that standard unbiased learning-to-rank techniques robustly improve click predictions but struggle to consistently improve ranking performance, especially considering the stark differences obtained by choice of ranking loss and query-document features. Our experiments reveal that gains in click prediction do not necessarily translate to enhanced ranking performance on expert relevance annotations, implying that conclusions strongly depend on how success is measured in this benchmark.
Updated: 2024-05-15 14:04:20
Categories: cs.IR,cs.AI
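The canonical ULTR ingredient referenced here is inverse propensity scoring (IPS): clicks are reweighted by the examination probability of their rank, which removes position bias in expectation. A small simulation under a position-based click model (all probabilities are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
true_rel = 0.6   # P(item at this rank is relevant)
prop = 0.3       # examination (propensity) probability at this rank

examined = rng.random(n) < prop
relevant = rng.random(n) < true_rel
clicks = examined & relevant        # position-based model: click = examined AND relevant

naive = clicks.mean()               # biased: converges to prop * true_rel
ips = (clicks / prop).mean()        # IPS estimate: unbiased for true_rel
```

The paper's point is precisely that such debiased click estimates, while robustly better at click prediction, do not automatically translate into better rankings against expert relevance labels.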
Learning Reward for Robot Skills Using Large Language Models via Self-Alignment
Learning reward functions remains the bottleneck in equipping a robot with a broad repertoire of skills. Large Language Models (LLMs) contain valuable task-related knowledge that can potentially aid the learning of reward functions. However, the proposed reward function can be imprecise and thus ineffective, requiring further grounding with environment information. We propose a method to learn rewards more efficiently in the absence of human supervision. Our approach consists of two components: we first use the LLM to propose features and a parameterization of the reward, then update the parameters through an iterative self-alignment process. In particular, the process minimizes the ranking inconsistency between the LLM and the learnt reward functions based on execution feedback. The method was validated on 9 tasks across 2 simulation environments. It demonstrates a consistent improvement in training efficacy and efficiency, while consuming significantly fewer GPT tokens than the alternative mutation-based method.
Updated: 2024-05-15 13:59:19
Categories: cs.RO,cs.AI
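The self-alignment signal can be made concrete as a pairwise ranking-inconsistency measure between the LLM's preference ordering over rollouts and the learnt reward's scores. A sketch of one natural choice (the paper's exact objective may differ; all names and numbers below are illustrative):

```python
def ranking_inconsistency(scores_a, scores_b):
    """Fraction of item pairs ordered differently by two scoring functions."""
    n = len(scores_a)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    disagreements = sum(
        (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j]) < 0
        for i, j in pairs
    )
    return disagreements / len(pairs)

llm_pref = [3.0, 2.0, 1.0]       # hypothetical LLM preference scores over 3 rollouts
reward_est = [3.0, 1.0, 2.0]     # learnt reward's scores for the same rollouts
```

Driving this quantity toward zero aligns the learnt reward's ordering of executions with the LLM's judgments, without requiring a human in the loop.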
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
Pretrained large language models (LLMs) exhibit exceptional general language processing capabilities but come with significant demands on memory and computational resources. As a powerful compression technology, binarization can reduce model weights to a mere 1 bit, lowering the expensive computation and memory requirements. However, existing quantization techniques fall short of maintaining LLM performance under ultra-low bit-widths. In response to this challenge, we present BiLLM, a groundbreaking 1-bit post-training quantization scheme tailored for pretrained LLMs. Based on the weight distribution of LLMs, BiLLM first identifies and structurally selects salient weights, and minimizes the compression loss through an effective binary residual approximation strategy. Moreover, considering the bell-shaped distribution of the non-salient weights, we propose an optimal splitting search to group and binarize them accurately. BiLLM achieves, for the first time, high-accuracy inference (e.g. 8.41 perplexity on LLaMA2-70B) with only 1.08-bit weights across various LLM families and evaluation metrics, outperforming SOTA quantization methods for LLMs by significant margins. Moreover, BiLLM enables the binarization of an LLM with 7 billion weights within 0.5 hours on a single GPU, demonstrating satisfactory time efficiency. Our code is available at https://github.com/Aaronhuang-778/BiLLM.
Updated: 2024-05-15 13:55:12
Categories: cs.LG,cs.AI,cs.CL
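The "binary residual approximation" idea can be sketched in a few lines. The following is a hypothetical toy, not the authors' implementation: a weight group is binarized as alpha * sign(w) with alpha the mean absolute weight, and the leftover residual is binarized a second time, which is the extra precision BiLLM reserves for salient weights.

```python
# Hypothetical sketch of two-stage residual binarization (toy, not BiLLM's code).

def binarize(ws):
    """One binarization pass: scale = mean(|w|), codes = sign(w)."""
    scale = sum(abs(w) for w in ws) / len(ws)
    return [scale if w >= 0 else -scale for w in ws]

def residual_binarize(ws):
    """Binarize, then binarize the leftover residual and add the two passes."""
    first = binarize(ws)
    residual = [w - f for w, f in zip(ws, first)]
    second = binarize(residual)
    return [f + s for f, s in zip(first, second)]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

weights = [0.9, -0.1, 0.4, -0.7, 0.05, -0.3]
one_pass = binarize(weights)
two_pass = residual_binarize(weights)
print(mse(weights, one_pass), mse(weights, two_pass))
```

The second pass roughly halves the reconstruction error on this toy group, illustrating why spending a residual bit only on salient weights keeps the average bit-width near 1.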
A vector quantized masked autoencoder for audiovisual speech emotion recognition
The limited availability of labeled data is a major challenge in audiovisual speech emotion recognition (SER). Self-supervised learning approaches have recently been proposed to mitigate the need for labeled data in various applications. This paper proposes the VQ-MAE-AV model, a vector quantized masked autoencoder (MAE) designed for audiovisual speech self-supervised representation learning and applied to SER. Unlike previous approaches, the proposed method employs a self-supervised paradigm based on discrete audio and visual speech representations learned by vector quantized variational autoencoders. A multimodal MAE with self- or cross-attention mechanisms is proposed to fuse the audio and visual speech modalities and to learn local and global representations of the audiovisual speech sequence, which are then used for an SER downstream task. Experimental results show that the proposed approach, which is pre-trained on the VoxCeleb2 database and fine-tuned on standard emotional audiovisual speech datasets, outperforms the state-of-the-art audiovisual SER methods. Extensive ablation experiments are also provided to assess the contribution of the different model components.
Updated: 2024-05-15 13:54:49
Categories: cs.SD,cs.LG,cs.MM,eess.AS
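The discrete representations mentioned above come from the vector-quantization step of a VQ-VAE. As a self-contained illustration (a toy nearest-neighbour lookup, not the paper's trained model), each continuous frame embedding is replaced by the index of its closest codebook entry, yielding the discrete tokens the masked autoencoder is trained on:

```python
# Toy vector-quantization step: map each embedding to its nearest codebook index.

def quantize(vec, codebook):
    """Return the index of the nearest codebook vector under squared L2 distance."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sqdist(vec, codebook[k]))

codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # invented 2-D codebook
frames = [[0.1, -0.2], [0.9, 0.2], [0.4, 0.8], [1.1, 0.9]]    # invented frame embeddings
tokens = [quantize(f, codebook) for f in frames]
print(tokens)  # each frame becomes one discrete token id
```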
When AI Eats Itself: On the Caveats of Data Pollution in the Era of Generative AI
Generative artificial intelligence (AI) technologies and large models are producing realistic outputs across various domains, such as images, text, speech, and music. Creating these advanced generative models requires significant resources, particularly large and high-quality datasets. To minimize training expenses, many algorithm developers use data created by the models themselves as a cost-effective training solution. However, not all synthetic data effectively improve model performance, necessitating a strategic balance in the use of real versus synthetic data to optimize outcomes. Currently, the previously well-controlled integration of real and synthetic data is becoming uncontrollable. The widespread and unregulated dissemination of synthetic data online leads to the contamination of datasets traditionally compiled through web scraping, now mixed with unlabeled synthetic data. This trend portends a future where generative AI systems may increasingly rely blindly on consuming self-generated data, raising concerns about model performance and ethical issues. What will happen if generative AI continuously consumes itself without discernment? What measures can we take to mitigate the potential adverse effects? There is a significant gap in the scientific literature regarding the impact of synthetic data use in generative AI, particularly in terms of the fusion of multimodal information. To address this research gap, this review investigates the consequences of integrating synthetic data blindly on training generative AI on both image and text modalities and explores strategies to mitigate these effects. The goal is to offer a comprehensive view of synthetic data's role, advocating for a balanced approach to its use and exploring practices that promote the sustainable development of generative AI technologies in the era of large models.
Updated: 2024-05-15 13:50:23
Categories: cs.LG,cs.AI
Learning functions on symmetric matrices and point clouds via lightweight invariant features
In this work, we present a mathematical formulation for machine learning of (1) functions on symmetric matrices that are invariant with respect to the action of permutations by conjugation, and (2) functions on point clouds that are invariant with respect to rotations, reflections, and permutations of the points. To achieve this, we construct $O(n^2)$ invariant features derived from generators for the field of rational functions on $n\times n$ symmetric matrices that are invariant under joint permutations of rows and columns. We show that these invariant features can separate all distinct orbits of symmetric matrices except for a measure zero set; such features can be used to universally approximate invariant functions on almost all weighted graphs. For point clouds in a fixed dimension, we prove that the number of invariant features can be reduced, generically without losing expressivity, to $O(n)$, where $n$ is the number of points. We combine these invariant features with DeepSets to learn functions on symmetric matrices and point clouds with varying sizes. We empirically demonstrate the feasibility of our approach on molecule property regression and point cloud distance prediction.
Updated: 2024-05-15 13:48:54
Categories: cs.LG,math.AC,68P01, 13A50
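A minimal example of the kind of invariance at play (using power traces, a classical invariant family, not the paper's generator set): the features tr(A), tr(A^2), tr(A^3) of a symmetric matrix are unchanged under joint permutation of rows and columns, i.e., under conjugation by a permutation matrix.

```python
# Permutation-conjugation-invariant features of a symmetric matrix: power traces.
import itertools

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace_powers(a, kmax=3):
    """Return [tr(A), tr(A^2), ..., tr(A^kmax)]."""
    n = len(a)
    feats = []
    p = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(kmax):
        p = matmul(p, a)
        feats.append(sum(p[i][i] for i in range(n)))
    return feats

def permute(a, perm):
    """Apply the same permutation to rows and columns (conjugation by P)."""
    n = len(a)
    return [[a[perm[i]][perm[j]] for j in range(n)] for i in range(n)]

A = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 1.0]]
base = trace_powers(A)
for perm in itertools.permutations(range(3)):
    assert trace_powers(permute(A, perm)) == base
print(base)
```

Power traces alone are weaker than the paper's $O(n^2)$ generator-derived features (they cannot separate all orbits), but they show the invariance property the feature construction must satisfy.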
Large Language Model Bias Mitigation from the Perspective of Knowledge Editing
Existing debiasing methods inevitably make unreasonable or undesired predictions, as they are designed and evaluated to achieve parity across different social groups while leaving aside individual facts, resulting in modified existing knowledge. In this paper, we first establish a new bias mitigation benchmark, BiasKE, leveraging existing and additionally constructed datasets, which systematically assesses debiasing performance with complementary metrics on fairness, specificity, and generalization. Meanwhile, we propose a novel debiasing method, Fairness Stamp (FAST), which enables editable fairness through fine-grained calibration of individual biased knowledge. Comprehensive experiments demonstrate that FAST surpasses state-of-the-art baselines with remarkable debiasing performance while not hampering the overall model capability for knowledge preservation, highlighting the prospect of fine-grained debiasing strategies for editable fairness in LLMs.
Updated: 2024-05-15 13:44:13
Categories: cs.CL,cs.AI
Enhancing Maritime Trajectory Forecasting via H3 Index and Causal Language Modelling (CLM)
The prediction of ship trajectories is a growing field of study in artificial intelligence. Traditional methods rely on LSTM or GRU networks, and even Transformer architectures, for the prediction of spatio-temporal series. This study proposes a viable alternative for predicting these trajectories using only GNSS positions, treating this spatio-temporal problem as a natural language processing problem. The latitude/longitude coordinates of AIS messages are transformed into cell identifiers using the H3 index. Thanks to the pseudo-octal representation, it becomes easier for language models to learn the spatial hierarchy of the H3 index. The method is compared with a classical Kalman filter, widely used in the maritime domain, and introduces the Fr\'echet distance as the main evaluation metric. We show that it is possible to predict ship trajectories quite precisely up to 8 hours ahead with 30 minutes of context, and demonstrate that this alternative works well enough to predict trajectories worldwide.
Updated: 2024-05-15 13:43:07
Categories: cs.LG,cs.AI,stat.ME
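The real indexing comes from Uber's H3 library (hexagonal cells). As a self-contained illustration of the underlying idea, turning a coordinate into a hierarchical cell token whose prefix encodes coarse location, here is a toy quadtree encoder (hypothetical, much simpler than H3's hexagonal hierarchy):

```python
# Toy hierarchical cell encoder: nearby points share a long token prefix,
# so a language model can read spatial hierarchy directly off the string.

def quad_cell(lat, lon, resolution=8):
    """Encode (lat, lon) as a string of quadrant digits, coarse to fine."""
    lat_lo, lat_hi, lon_lo, lon_hi = -90.0, 90.0, -180.0, 180.0
    digits = []
    for _ in range(resolution):
        lat_mid = (lat_lo + lat_hi) / 2
        lon_mid = (lon_lo + lon_hi) / 2
        d = 0
        if lat >= lat_mid:
            d += 2
            lat_lo = lat_mid
        else:
            lat_hi = lat_mid
        if lon >= lon_mid:
            d += 1
            lon_lo = lon_mid
        else:
            lon_hi = lon_mid
        digits.append(str(d))
    return "".join(digits)

brittany = quad_cell(48.39, -4.49)   # off the Brittany coast
nearby = quad_cell(49.0, -4.0)       # ~80 km away: shares a long prefix
sydney = quad_cell(-33.87, 151.21)   # different hemisphere: differs immediately
print(brittany, nearby, sydney)
```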
A Survey of Large Language Models in Medicine: Progress, Application, and Challenge
Large language models (LLMs), such as ChatGPT, have received substantial attention due to their capabilities for understanding and generating human language. While there has been a burgeoning trend in research focusing on the employment of LLMs in supporting different medical tasks (e.g., enhancing clinical diagnostics and providing medical education), a review of these efforts, particularly their development, practical applications, and outcomes in medicine, remains scarce. Therefore, this review aims to provide a detailed overview of the development and deployment of LLMs in medicine, including the challenges and opportunities they face. In terms of development, we provide a detailed introduction to the principles of existing medical LLMs, including their basic model structures, number of parameters, and the sources and scales of data used for model development. It serves as a guide for practitioners in developing medical LLMs tailored to their specific needs. In terms of deployment, we offer a comparison of the performance of different LLMs across various medical tasks, and further compare them with state-of-the-art lightweight models, aiming to provide an understanding of the advantages and limitations of LLMs in medicine. Overall, in this review, we address the following questions: 1) What are the practices for developing medical LLMs? 2) How to measure the medical task performance of LLMs in a medical setting? 3) How have medical LLMs been employed in real-world practice? 4) What challenges arise from the use of medical LLMs? and 5) How to more effectively develop and deploy medical LLMs? By answering these questions, this review aims to provide insights into the opportunities for LLMs in medicine and serve as a practical resource. We also maintain a regularly updated list of practical guides on medical LLMs at: https://github.com/AI-in-Health/MedLLMsPracticalGuide.
Updated: 2024-05-15 13:38:45
Categories: cs.CL,cs.AI
LLMs can learn self-restraint through iterative self-reflection
In order to be deployed safely, Large Language Models (LLMs) must be capable of dynamically adapting their behavior based on their level of knowledge and the uncertainty associated with specific topics. This adaptive behavior, which we refer to as self-restraint, is non-trivial to teach since it depends on the internal knowledge of an LLM. By default, LLMs are trained to maximize the next-token likelihood, which does not teach the model to modulate its answer based on its level of uncertainty. In order to learn self-restraint, we devise a utility function that encourages the model to produce responses only when it is confident in them. This utility function can be used to score generations of different lengths as well as abstention. To optimize this function, we introduce ReSearch, a process of ``self-reflection'' consisting of iterative self-prompting and self-evaluation. We use the ReSearch algorithm to generate synthetic data on which we finetune our models. Compared to their original versions, our resulting models generate fewer \emph{hallucinations} overall, at no additional inference cost, for both known and unknown topics, as the model learns to selectively restrain itself. In addition, our method elegantly incorporates the ability to abstain by augmenting the samples generated by the model during the search procedure with an answer expressing abstention.
Updated: 2024-05-15 13:35:43
Categories: cs.CL,cs.LG
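One hypothetical form such a utility function could take (the paper's exact scoring rule is not reproduced here): an asserted answer with probability p of being correct earns +1 if right and -c if wrong, while abstaining earns a small fixed utility, so answering only pays off above a confidence threshold.

```python
# Toy utility for confident answering vs. abstention (illustrative constants).

def expected_utility(p_correct, abstain, wrong_cost=2.0, abstain_value=0.25):
    if abstain:
        return abstain_value
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_cost

def decide(p_correct):
    """Answer only when the expected utility of answering beats abstaining."""
    answer_eu = expected_utility(p_correct, abstain=False)
    abstain_eu = expected_utility(p_correct, abstain=True)
    return "answer" if answer_eu > abstain_eu else "abstain"

print(decide(0.9), decide(0.5))
```

With these constants the implied threshold is p > 0.75: solving p - 2(1 - p) > 0.25 gives 3p > 2.25. Raising `wrong_cost` makes the model more conservative, which is the knob that trades hallucinations for abstentions.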
Content-Based Image Retrieval for Multi-Class Volumetric Radiology Images: A Benchmark Study
While content-based image retrieval (CBIR) has been extensively studied in natural image retrieval, its application to medical images presents ongoing challenges, primarily due to the 3D nature of medical images. Recent studies have shown the potential of pre-trained vision embeddings for CBIR in the context of radiology image retrieval. However, a benchmark for the retrieval of 3D volumetric medical images is still lacking, hindering the ability to objectively evaluate and compare the efficiency of proposed CBIR approaches in medical imaging. In this study, we extend previous work and establish a benchmark for region-based and multi-organ retrieval using the TotalSegmentator dataset (TS) with detailed multi-organ annotations. We benchmark embeddings derived from supervised models pre-trained on medical images against embeddings derived from unsupervised models pre-trained on non-medical images, for 29 coarse and 104 detailed anatomical structures at the volume and region levels. We adopt a late-interaction re-ranking method inspired by text matching for image retrieval, comparing it against the originally proposed method for volume and region retrieval, and achieve a retrieval recall of 1.0 for diverse anatomical regions spanning a wide range of sizes. The findings and methodologies presented in this paper provide essential insights and benchmarks for the development and evaluation of CBIR approaches in the context of medical imaging.
Updated: 2024-05-15 13:34:07
Categories: cs.CV,cs.AI,cs.IR
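The "late interaction" idea borrowed from text matching can be sketched in the ColBERT style (a hypothetical toy with invented 2-D region embeddings, not the paper's pipeline): each query region keeps only its best match among a candidate's regions, and the summed max-similarities re-rank the candidates.

```python
# Toy late-interaction (MaxSim) re-ranking over region embeddings.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def late_interaction_score(query_regions, candidate_regions):
    """Sum over query regions of the best-matching candidate region."""
    return sum(max(dot(q, c) for c in candidate_regions) for q in query_regions)

def rerank(query_regions, candidates):
    return sorted(candidates,
                  key=lambda item: late_interaction_score(query_regions, item[1]),
                  reverse=True)

query = [[1.0, 0.0], [0.0, 1.0]]
candidates = [
    ("volume_a", [[0.9, 0.1], [0.1, 0.9]]),  # covers both query regions
    ("volume_b", [[0.9, 0.1], [0.8, 0.2]]),  # covers only the first
]
print([name for name, _ in rerank(query, candidates)])
```

Because matching happens per region rather than on one pooled vector, a candidate must cover every query region to score well, which is the property that helps region-level volumetric retrieval.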
Simplicity within biological complexity
Heterogeneous, interconnected, systems-level, molecular data have become increasingly available and key in precision medicine. We need to utilize them to better stratify patients into risk groups, discover new biomarkers and targets, repurpose known and discover new drugs to personalize medical treatment. Existing methodologies are limited and a paradigm shift is needed to achieve quantitative and qualitative breakthroughs. In this perspective paper, we survey the literature and argue for the development of a comprehensive, general framework for embedding of multi-scale molecular network data that would enable their explainable exploitation in precision medicine in linear time. Network embedding methods map nodes to points in low-dimensional space, so that proximity in the learned space reflects the network's topology-function relationships. They have recently achieved unprecedented performance on hard problems of utilizing few omic data in various biomedical applications. However, research thus far has been limited to special variants of the problems and data, with the performance depending on the underlying topology-function network biology hypotheses, the biomedical applications and evaluation metrics. The availability of multi-omic data, modern graph embedding paradigms and compute power call for a creation and training of efficient, explainable and controllable models, having no potentially dangerous, unexpected behaviour, that make a qualitative breakthrough. We propose to develop a general, comprehensive embedding framework for multi-omic network data, from models to efficient and scalable software implementation, and to apply it to biomedical informatics. It will lead to a paradigm shift in computational and biomedical understanding of data and diseases that will open up ways to solving some of the major bottlenecks in precision medicine and other domains.
Updated: 2024-05-15 13:32:45
Categories: q-bio.OT,cs.AI,I.2; J.3
Learning Coarse-Grained Dynamics on Graph
We consider a Graph Neural Network (GNN) non-Markovian modeling framework to identify coarse-grained dynamical systems on graphs. Our main idea is to systematically determine the GNN architecture by inspecting how the leading term of the Mori-Zwanzig memory term depends on the coarse-grained interaction coefficients that encode the graph topology. Based on this analysis, we found that the appropriate GNN architecture that will account for $K$-hop dynamical interactions has to employ a Message Passing (MP) mechanism with at least $2K$ steps. We also deduce that the memory length required for an accurate closure model decreases as a function of the interaction strength under the assumption that the interaction strength exhibits a power law that decays as a function of the hop distance. Supporting numerical demonstrations on two examples, a heterogeneous Kuramoto oscillator model and a power system, suggest that the proposed GNN architecture can predict the coarse-grained dynamics under fixed and time-varying graph topologies.
Updated: 2024-05-15 13:25:34
Categories: math.NA,cond-mat.dis-nn,cs.LG,cs.NA
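The depth requirement has a simple intuition that a bare-bones message-passing sketch makes visible (hypothetical toy; the paper's GNN layers are learned): with local aggregation, information from one node reaches a node K hops away only after K rounds, so capturing K-hop interactions forces the message-passing depth to scale with K.

```python
# One MP round = each node adds the sum of its neighbors' features.
# Perturb node 0 on a path graph and watch the signal travel one hop per round.

def mp_round(features, adjacency):
    return [features[i] + sum(features[j] for j in adjacency[i])
            for i in range(len(features))]

adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}  # path 0-1-2-3-4
feats = [1.0, 0.0, 0.0, 0.0, 0.0]
for step in range(1, 5):
    feats = mp_round(feats, adjacency)
    reached = max(i for i, f in enumerate(feats) if f != 0.0)
    print(step, reached)  # the farthest nonzero node advances one hop per round
```

The paper's stronger claim, that $2K$ steps are needed for $K$-hop dynamical interactions, comes from the Mori-Zwanzig memory term, which composes the interaction twice; the sketch only shows the one-hop-per-round lower bound.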
AI Art is Theft: Labour, Extraction, and Exploitation, Or, On the Dangers of Stochastic Pollocks
Since the launch of applications such as DALL-E, Midjourney, and Stable Diffusion, generative artificial intelligence has been controversial as a tool for creating artwork. While some have presented longtermist worries about these technologies as harbingers of fully automated futures to come, more pressing is the impact of generative AI on creative labour in the present. Already, business leaders have begun replacing human artistic labour with AI-generated images. In response, the artistic community has launched a protest movement, which argues that AI image generation is a kind of theft. This paper analyzes, substantiates, and critiques these arguments, concluding that AI image generators involve an unethical kind of labour theft. If correct, many other AI applications also rely upon theft.
Updated: 2024-05-15 13:22:52
Categories: cs.CY,cs.AI,K.4; K.7.4; I.2
ReconBoost: Boosting Can Achieve Modality Reconcilement
This paper explores a novel multi-modal alternating learning paradigm pursuing a reconciliation between the exploitation of uni-modal features and the exploration of cross-modal interactions. This is motivated by the fact that current paradigms of multi-modal learning tend to explore multi-modal features simultaneously. The resulting gradient prohibits further exploitation of the features in the weak modality, leading to modality competition, where the dominant modality overpowers the learning process. To address this issue, we study the modality-alternating learning paradigm to achieve reconcilement. Specifically, we propose a new method called ReconBoost to update a fixed modality each time. Herein, the learning objective is dynamically adjusted with a reconcilement regularization against competition with the historical models. By choosing a KL-based reconcilement, we show that the proposed method resembles Friedman's Gradient-Boosting (GB) algorithm, where the updated learner can correct errors made by others and help enhance the overall performance. The major difference with the classic GB is that we only preserve the newest model for each modality to avoid overfitting caused by ensembling strong learners. Furthermore, we propose a memory consolidation scheme and a global rectification scheme to make this strategy more effective. Experiments over six multi-modal benchmarks speak to the efficacy of the method. We release the code at https://github.com/huacong/ReconBoost.
Updated: 2024-05-15 13:22:39
Categories: cs.CV,cs.AI,cs.LG,cs.MM
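The alternating, boosting-style update can be illustrated with a deliberately tiny stand-in (hypothetical linear "learners" replacing the per-modality networks; the names `audio`/`visual` and the data are invented): each round refits one modality's model on the residual left by the other, as in gradient boosting, and only the newest model per modality is kept.

```python
# Toy modality-alternating fit: coordinate-descent flavour of boosting.

def fit_scale(xs, residuals):
    """Least-squares scalar fit: minimize sum (r - w*x)^2 over w."""
    return sum(x * r for x, r in zip(xs, residuals)) / sum(x * x for x in xs)

audio = [1.0, 2.0, 3.0, 4.0]
visual = [1.0, -1.0, 1.0, -1.0]
target = [2.0 * a + 0.5 * v for a, v in zip(audio, visual)]  # true weights (2, 0.5)

w_audio, w_visual = 0.0, 0.0
for _ in range(10):
    # Refit the audio learner on the residual of the visual one, then swap.
    w_audio = fit_scale(audio, [t - w_visual * v for t, v in zip(target, visual)])
    w_visual = fit_scale(visual, [t - w_audio * a for t, a in zip(target, audio)])

print(round(w_audio, 3), round(w_visual, 3))
```

Each modality corrects the other's errors instead of both being pushed by a shared gradient at once, which is the reconcilement the abstract contrasts with joint multi-modal training.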
Transfer Learning in Pre-Trained Large Language Models for Malware Detection Based on System Calls
In the current cybersecurity landscape, protecting military devices such as communication and battlefield management systems against sophisticated cyber attacks is crucial. Malware exploits vulnerabilities through stealth methods, often evading traditional detection mechanisms such as software signatures. The application of ML/DL in vulnerability detection has been extensively explored in the literature. However, current ML/DL vulnerability detection methods struggle with understanding the context and intent behind complex attacks. Integrating large language models (LLMs) with system call analysis offers a promising approach to enhance malware detection. This work presents a novel framework leveraging LLMs to classify malware based on system call data. The framework uses transfer learning to adapt pre-trained LLMs for malware detection. By retraining LLMs on a dataset of benign and malicious system calls, the models are refined to detect signs of malware activity. Experiments with a dataset of over 1TB of system calls demonstrate that models with larger context sizes, such as BigBird and Longformer, achieve superior accuracy and F1-Score of approximately 0.86. The results highlight the importance of context size in improving detection rates and underscore the trade-offs between computational complexity and performance. This approach shows significant potential for real-time detection in high-stakes environments, offering a robust solution to evolving cyber threats.
Updated: 2024-05-15 13:19:43
Categories: cs.CR,cs.LG
Online Self-Supervised Deep Learning for Intrusion Detection Systems
This paper proposes a novel Self-Supervised Intrusion Detection (SSID) framework, which enables a fully online Deep Learning (DL) based Intrusion Detection System (IDS) that requires no human intervention or prior offline learning. The proposed framework analyzes and labels incoming traffic packets based only on the decisions of the IDS itself, made with an Auto-Associative Deep Random Neural Network, and on an online estimate of its statistically measured trustworthiness. The SSID framework enables the IDS to adapt rapidly to time-varying characteristics of the network traffic and eliminates the need for offline data collection. This approach avoids human errors in data labeling, as well as the human labor and computational costs of model training and data collection. The approach is experimentally evaluated on public datasets and compared with well-known machine learning and deep learning models, showing that this SSID framework is very useful and advantageous as an accurate, online-learning DL-based IDS for IoT systems.
Updated: 2024-05-15 13:15:02
Categories: cs.CR,cs.LG,cs.NI
Physics-informed generative neural networks for RF propagation prediction with application to indoor body perception
Electromagnetic (EM) body models designed to predict Radio-Frequency (RF) propagation are time-consuming methods which prevent their adoption in strict real-time computational imaging problems, such as human body localization and sensing. Physics-informed Generative Neural Network (GNN) models have been recently proposed to reproduce EM effects, namely to simulate or reconstruct missing data or samples by incorporating relevant EM principles and constraints. The paper discusses a Variational Auto-Encoder (VAE) model which is trained to reproduce the effects of human motions on the EM field and incorporate EM body diffraction principles. Proposed physics-informed generative neural network models are verified against both classical diffraction-based EM tools and full-wave EM body simulations.
Updated: 2024-05-15 13:11:52
Categories: eess.SP,cs.AI,cs.SY,eess.SY
Extracting the gamma-ray source-count distribution below the Fermi-LAT detection limit with deep learning
We reconstruct the extra-galactic gamma-ray source-count distribution, or $dN/dS$, of resolved and unresolved sources by adopting machine learning techniques. Specifically, we train a convolutional neural network on synthetic 2-dimensional sky-maps, which are built by varying parameters of underlying source-counts models and incorporate the Fermi-LAT instrumental response functions. The trained neural network is then applied to the Fermi-LAT data, from which we estimate the source count distribution down to flux levels a factor of 50 below the Fermi-LAT threshold. We perform our analysis using 14 years of data collected in the $(1,10)$ GeV energy range. The results we obtain show a source count distribution which, in the resolved regime, is in excellent agreement with the one derived from catalogued sources, and then extends as $dN/dS \sim S^{-2}$ in the unresolved regime, down to fluxes of $5 \cdot 10^{-12}$ cm$^{-2}$ s$^{-1}$. The neural network architecture and the devised methodology have the flexibility to enable future analyses to study the energy dependence of the source-count distribution.
Updated: 2024-05-15 13:11:38
Categories: astro-ph.CO,astro-ph.HE,astro-ph.IM,cs.LG
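A quick sketch of what a $dN/dS \sim S^{-2}$ source-count law means in practice (illustrative only, not the paper's pipeline): drawing fluxes by inverse transform from that power law, each decade in flux contributes roughly the same total flux, which is why the unresolved population matters for the diffuse background.

```python
# Inverse-transform "sampling" of a dN/dS ~ S^-2 law on a deterministic
# quantile grid, then a check that each flux decade carries equal total flux.
import math

def inverse_cdf(u, s_min=1e-12, s_max=1e-9):
    """Inverse CDF of an S^-2 power law truncated to [s_min, s_max]."""
    a, b = 1.0 / s_min, 1.0 / s_max
    return 1.0 / (a - u * (a - b))

n = 100000
fluxes = [inverse_cdf((i + 0.5) / n) for i in range(n)]

decade_flux = [0.0, 0.0, 0.0]   # [1e-12,1e-11), [1e-11,1e-10), [1e-10,1e-9]
for s in fluxes:
    idx = min(2, max(0, int(math.log10(s / 1e-12))))
    decade_flux[idx] += s
print([round(f / decade_flux[0], 2) for f in decade_flux])  # ratios near 1.0
```

The flux range here deliberately ends at the paper's quoted $5\cdot10^{-12}$-scale fluxes on the low end; the exact bounds are invented for the demonstration.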
Agnostic Active Learning of Single Index Models with Linear Sample Complexity
We study active learning methods for single index models of the form $F({\mathbf x}) = f(\langle {\mathbf w}, {\mathbf x}\rangle)$, where $f:\mathbb{R} \to \mathbb{R}$ and ${\mathbf x,\mathbf w} \in \mathbb{R}^d$. In addition to their theoretical interest as simple examples of non-linear neural networks, single index models have received significant recent attention due to applications in scientific machine learning, such as surrogate modeling for partial differential equations (PDEs). Such applications require sample-efficient active learning methods that are robust to adversarial noise, i.e., that work even in the challenging agnostic learning setting. We provide two main results on agnostic active learning of single index models. First, when $f$ is known and Lipschitz, we show that $\tilde{O}(d)$ samples collected via statistical leverage score sampling are sufficient to learn a near-optimal single index model. Leverage score sampling is simple to implement, efficient, and already widely used for actively learning linear models. Our result requires no assumptions on the data distribution, is optimal up to log factors, and improves quadratically on a recent ${O}(d^{2})$ bound of \cite{gajjar2023active}. Second, we show that $\tilde{O}(d)$ samples suffice even in the more difficult setting when $f$ is \emph{unknown}. Our results leverage tools from high-dimensional probability, including Dudley's inequality and dual Sudakov minoration, as well as a novel, distribution-aware discretization of the class of Lipschitz functions.
Updated: 2024-05-15 13:11:28
领域: cs.LG
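The abstract's first result hinges on statistical leverage score sampling. A minimal sketch of the technique for actively learning a plain linear model (the setting where it is already standard); the problem sizes and noise level below are illustrative, not from the paper:

```python
import numpy as np

def leverage_scores(X):
    """Statistical leverage scores: squared row norms of an orthonormal
    basis for the column span of X (computed here via a thin SVD)."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return np.sum(U**2, axis=1)  # each in [0, 1], summing to rank(X)

def leverage_sample(X, y, m, rng):
    """Sample m rows with probability proportional to leverage and reweight
    so the subsampled least-squares problem stays unbiased."""
    p = leverage_scores(X)
    p = p / p.sum()
    idx = rng.choice(len(X), size=m, replace=True, p=p)
    w = 1.0 / np.sqrt(m * p[idx])            # importance weights
    return X[idx] * w[:, None], y[idx] * w

rng = np.random.default_rng(0)
n, d = 2000, 10
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.01 * rng.standard_normal(n)

# label only m = 200 of the 2000 points, chosen by leverage
Xs, ys = leverage_sample(X, y, m=200, rng=rng)
w_hat = np.linalg.lstsq(Xs, ys, rcond=None)[0]
```

Rows with high leverage are the ones the least-squares fit is most sensitive to, so sampling them preferentially preserves the solution with far fewer labels.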
TimeX++: Learning Time-Series Explanations with Information Bottleneck
Explaining deep learning models operating on time series data is crucial in various applications that require interpretable and transparent insights from time series signals. In this work, we investigate this problem from an information-theoretic perspective and show that most existing measures of explainability may suffer from trivial solutions and distributional shift issues. To address these issues, we introduce a simple yet practical objective function for time series explainable learning. The design of the objective function builds upon the principle of the information bottleneck (IB) and modifies the IB objective function to avoid trivial solutions and distributional shift issues. We further present TimeX++, a novel explanation framework that leverages a parametric network to produce explanation-embedded instances that are both in-distribution and label-preserving. We evaluate TimeX++ on both synthetic and real-world datasets, comparing its performance against leading baselines, and validate its practical efficacy through case studies in a real-world environmental application. Quantitative and qualitative evaluations show that TimeX++ outperforms baselines across all datasets, demonstrating a substantial improvement in explanation quality for time series data. The source code is available at \url{https://github.com/zichuan-liu/TimeXplusplus}.
Updated: 2024-05-15 13:03:41
Categories: cs.LG,cs.AI
Words Blending Boxes. Obfuscating Queries in Information Retrieval using Differential Privacy
Ensuring the effectiveness of search queries while protecting user privacy remains an open issue. When an Information Retrieval System (IRS) does not protect the privacy of its users, sensitive information may be disclosed through the queries sent to the system. Recent improvements, especially in NLP, have shown the potential of using Differential Privacy to obfuscate texts while maintaining satisfactory effectiveness. However, such approaches may protect the user's privacy only from a theoretical perspective while, in practice, the real user's information need can still be inferred if perturbed terms are too semantically similar to the original ones. We overcome such limitations by proposing Word Blending Boxes, a novel differentially private mechanism for query obfuscation, which protects the words in the user queries by employing safe boxes. To measure the overall effectiveness of the proposed WBB mechanism, we measure the privacy obtained by the obfuscation process, i.e., the lexical and semantic similarity between original and obfuscated queries. Moreover, we assess the effectiveness of the privatized queries in retrieving relevant documents from the IRS. Our findings indicate that WBB can be integrated effectively into existing IRSs, offering a key to the challenge of protecting user privacy from both a theoretical and a practical point of view.
Updated: 2024-05-15 12:51:36
Categories: cs.IR,cs.CR
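The abstract contrasts WBB with mechanisms whose perturbed terms can remain too semantically close to the originals. For orientation, a toy sketch of that embedding-perturbation baseline, not the WBB mechanism itself; the vocabulary, embeddings, and noise calibration below are illustrative stand-ins:

```python
import numpy as np

def obfuscate(word, vocab, emb, epsilon, rng):
    """Metric-DP baseline: perturb the query word's embedding with
    multivariate-Laplace-style noise calibrated to epsilon, then release
    the nearest vocabulary word."""
    d = emb.shape[1]
    v = emb[vocab.index(word)]
    direction = rng.standard_normal(d)
    direction /= np.linalg.norm(direction)
    radius = rng.gamma(shape=d, scale=1.0 / epsilon)  # noise magnitude
    noisy = v + radius * direction
    return vocab[int(np.argmin(np.linalg.norm(emb - noisy, axis=1)))]

rng = np.random.default_rng(0)
vocab = ["cat", "dog", "car"]        # hypothetical three-word vocabulary
emb = 10.0 * np.eye(3)               # well-separated toy embeddings
weak_priv = obfuscate("cat", vocab, emb, epsilon=0.1, rng=rng)    # strong noise
strong_eps = obfuscate("cat", vocab, emb, epsilon=100.0, rng=rng)  # weak noise
```

With a large epsilon the noise rarely pushes the embedding past a neighbor, so the original word leaks back out, exactly the practical weakness the paper targets.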
Gradient Boosted Filters For Signal Processing
Gradient boosted decision trees have achieved remarkable success in several domains, particularly those that work with static tabular data. However, the application of gradient boosted models to signal processing is underexplored. In this work, we introduce gradient boosted filters for dynamic data, by employing Hammerstein systems in place of decision trees. We discuss the relationship of our approach to the Volterra series, providing the theoretical underpinning for its application. We demonstrate the effective generalizability of our approach with examples.
Updated: 2024-05-15 12:49:57
Categories: cs.LG
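A Hammerstein system is a static nonlinearity followed by a linear filter. A simplified sketch of boosting with such blocks, assuming monomial nonlinearities and least-squares FIR fits (an illustrative reading of the idea, not the paper's exact procedure):

```python
import numpy as np

def fit_fir(u, r, taps):
    """Least-squares causal FIR filter mapping input u to target r."""
    n = len(u)
    U = np.column_stack([np.concatenate([np.zeros(k), u[:n - k]])
                         for k in range(taps)])
    h, *_ = np.linalg.lstsq(U, r, rcond=None)
    return U @ h

def boost_hammerstein(u, y, rounds=5, taps=8, lr=0.5):
    """Gradient boosting with Hammerstein-style weak learners: each round
    greedily picks a static nonlinearity u**p, fits an FIR filter to the
    current residual, and takes a shrunken step."""
    pred = np.zeros_like(y)
    for _ in range(rounds):
        r = y - pred
        fits = [fit_fir(u**p, r, taps) for p in (1, 2, 3)]
        best = min(fits, key=lambda f: np.sum((r - f)**2))
        pred = pred + lr * best
    return pred

rng = np.random.default_rng(0)
u = rng.standard_normal(500)
y = np.convolve(u**2, [0.5, 0.3, 0.2])[:500]   # true system: square, then filter
pred = boost_hammerstein(u, y)
```

Because the toy system really is Hammerstein (square then filter), each boosting round with the correct nonlinearity shrinks the residual geometrically.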
Comparing the Efficacy of GPT-4 and Chat-GPT in Mental Health Care: A Blind Assessment of Large Language Models for Psychological Support
Background: Rapid advancements in natural language processing have led to the development of large language models with the potential to revolutionize mental health care. These models have shown promise in assisting clinicians and providing support to individuals experiencing various psychological challenges. Objective: This study aims to compare the performance of two large language models, GPT-4 and Chat-GPT, in responding to a set of 18 psychological prompts, to assess their potential applicability in mental health care settings. Methods: A blind methodology was employed, with a clinical psychologist evaluating the models' responses without knowledge of their origins. The prompts encompassed a diverse range of mental health topics, including depression, anxiety, and trauma, to ensure a comprehensive assessment. Results: The results demonstrated a significant difference in performance between the two models (p < 0.05). GPT-4 achieved an average rating of 8.29 out of 10, while Chat-GPT received an average rating of 6.52. The clinical psychologist's evaluation suggested that GPT-4 was more effective at generating clinically relevant and empathetic responses, thereby providing better support and guidance to potential users. Conclusions: This study contributes to the growing body of literature on the applicability of large language models in mental health care settings. The findings underscore the importance of continued research and development in the field to optimize these models for clinical use. Further investigation is necessary to understand the specific factors underlying the performance differences between the two models and to explore their generalizability across various populations and mental health conditions.
Updated: 2024-05-15 12:44:54
Categories: cs.CL,cs.AI,cs.HC
Wisdom of Committee: Distilling from Foundation Model to Specialized Application Model
Recent advancements in foundation models have yielded impressive performance across a wide range of tasks. Meanwhile, for specific applications, practitioners have been developing specialized application models. To enjoy the benefits of both kinds of models, one natural path is to transfer the knowledge in foundation models into specialized application models, which are generally more efficient for serving. Techniques from knowledge distillation may be applied here, where the application model learns to mimic the foundation model. However, specialized application models and foundation models have substantial gaps in capacity, employing distinct architectures, using different input features from different modalities, and being optimized on different distributions. These differences in model characteristics lead to significant challenges for distillation methods. In this work, we propose creating a teaching committee comprising both foundation model teachers and complementary teachers. Complementary teachers possess model characteristics akin to the student's, aiming to bridge the gap between the foundation model and specialized application models for a smoother knowledge transfer. Further, to accommodate the dissimilarity among the teachers in the committee, we introduce DiverseDistill, which allows the student to understand the expertise of each teacher and extract task knowledge. Our evaluations demonstrate that adding complementary teachers enhances student performance. Finally, DiverseDistill consistently outperforms baseline distillation methods, regardless of the teacher choices, resulting in significantly improved student performance.
Updated: 2024-05-15 12:42:04
Categories: cs.LG,cs.AI
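The committee idea can be sketched as a distillation loss against a weighted mixture of teacher outputs. The fixed weights below are a stand-in for DiverseDistill's learned per-teacher expertise weighting, which the abstract does not specify:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-softened softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def committee_distill_loss(student_logits, teacher_logits, weights, T=2.0):
    """Cross-entropy of the student against a weighted mixture of the
    committee's temperature-softened distributions."""
    target = sum(w * softmax(t, T) for w, t in zip(weights, teacher_logits))
    logs = np.log(softmax(student_logits, T) + 1e-12)
    return -np.mean(np.sum(target * logs, axis=-1))

teacher = np.array([[2.0, 0.0, 0.0]])
matched = committee_distill_loss(teacher, [teacher], [1.0])
mismatched = committee_distill_loss(np.array([[0.0, 2.0, 0.0]]), [teacher], [1.0])
```

The loss bottoms out at the mixture's entropy when the student reproduces the committee's soft labels, which is what drives the knowledge transfer.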
IM-RAG: Multi-Round Retrieval-Augmented Generation Through Learning Inner Monologues
Although the Retrieval-Augmented Generation (RAG) paradigms can use external knowledge to enhance and ground the outputs of Large Language Models (LLMs) to mitigate generative hallucinations and static knowledge base problems, they still suffer from limited flexibility in adopting Information Retrieval (IR) systems with varying capabilities, constrained interpretability during the multi-round retrieval process, and a lack of end-to-end optimization. To address these challenges, we propose a novel LLM-centric approach, IM-RAG, that integrates IR systems with LLMs to support multi-round RAG through learning Inner Monologues (IM, i.e., the human inner voice that narrates one's thoughts). During the IM process, the LLM serves as the core reasoning model (i.e., Reasoner) to either propose queries to collect more information via the Retriever or to provide a final answer based on the conversational context. We also introduce a Refiner that improves the outputs from the Retriever, effectively bridging the gap between the Reasoner and IR modules with varying capabilities and fostering multi-round communications. The entire IM process is optimized via Reinforcement Learning (RL) where a Progress Tracker is incorporated to provide mid-step rewards, and the answer prediction is further separately optimized via Supervised Fine-Tuning (SFT). We conduct extensive experiments with the HotPotQA dataset, a popular benchmark for retrieval-based, multi-step question-answering. The results show that our approach achieves state-of-the-art (SOTA) performance while providing high flexibility in integrating IR modules as well as strong interpretability exhibited in the learned inner monologues.
Updated: 2024-05-15 12:41:20
Categories: cs.CL,cs.AI,cs.IR
Tight Bounds for Online Convex Optimization with Adversarial Constraints
A well-studied generalization of the standard online convex optimization (OCO) is constrained online convex optimization (COCO). In COCO, on every round, a convex cost function and a convex constraint function are revealed to the learner after the action for that round is chosen. The objective is to design an online policy that simultaneously achieves a small regret while ensuring small cumulative constraint violation (CCV) against an adaptive adversary. A long-standing open question in COCO is whether an online policy can simultaneously achieve $O(\sqrt{T})$ regret and $O(\sqrt{T})$ CCV without any restrictive assumptions. For the first time, we answer this in the affirmative and show that an online policy can simultaneously achieve $O(\sqrt{T})$ regret and $\tilde{O}(\sqrt{T})$ CCV. We establish this result by effectively combining the adaptive regret bound of the AdaGrad algorithm with Lyapunov optimization, a classic tool from control theory. Surprisingly, the analysis is short and elegant.
Updated: 2024-05-15 12:37:03
Categories: cs.LG,math.OC
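The Lyapunov-optimization ingredient can be illustrated with a virtual-queue primal-dual update: a queue accumulates constraint violation and scales the constraint gradient in the primal step. This is a plain-gradient sketch on a toy fixed instance; the paper's actual policy additionally uses AdaGrad-style adaptive steps and faces adversarial, time-varying functions:

```python
import numpy as np

def constrained_ogd(grad_f, g, grad_g, x0, eta, T, radius=2.0):
    """Primal-dual online gradient descent with a Lyapunov virtual queue Q
    that penalizes accumulated constraint violation."""
    x, Q = x0.copy(), 0.0
    for _ in range(T):
        x = x - eta * (grad_f(x) + Q * grad_g(x))
        nrm = np.linalg.norm(x)
        if nrm > radius:                 # project back onto the feasible ball
            x *= radius / nrm
        Q = max(0.0, Q + g(x))           # virtual queue update
    return x, Q

# toy instance: minimize ||x - (1,1)||^2 subject to x[0] <= 0.2
b = np.array([1.0, 1.0])
x, Q = constrained_ogd(
    grad_f=lambda x: 2.0 * (x - b),
    g=lambda x: x[0] - 0.2,
    grad_g=lambda x: np.array([1.0, 0.0]),
    x0=np.zeros(2), eta=0.05, T=500)
```

The queue grows while the constraint is violated, pushing the iterates toward the constrained optimum (0.2, 1); at equilibrium the queue plays the role of a Lagrange multiplier.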
Do language models capture implied discourse meanings? An investigation with exhaustivity implicatures of Korean morphology
Markedness in natural language is often associated with non-literal meanings in discourse. Differential Object Marking (DOM) in Korean is one instance of this phenomenon, where post-positional markers are selected based on both the semantic features of the noun phrases and the discourse features that are orthogonal to the semantic features. Previous work has shown that distributional models of language recover certain semantic features of words -- do these models capture implied discourse-level meanings as well? We evaluate whether a set of large language models are capable of associating discourse meanings with different object markings in Korean. Results suggest that discourse meanings of a grammatical marker can be more challenging to encode than that of a discourse marker.
Updated: 2024-05-15 12:34:40
Categories: cs.CL,cs.AI
Attribute reduction algorithm of rough sets based on spatial optimization
Rough set theory is one of the important methods for rule acquisition and attribute reduction. The current goal of rough set attribute reduction focuses more on minimizing the number of reduced attributes but ignores the spatial similarity between the reduced and decision attributes, which may lead to problems such as an increased number of rules and limited generality. In this paper, a rough set attribute reduction algorithm based on spatial optimization is proposed. By introducing the concept of spatial similarity, the algorithm finds the reduction with the highest spatial similarity, so that the spatial similarity between the reduction and the decision attributes is higher and more concise, widely applicable rules are obtained. In addition, a comparative experiment against traditional rough set attribute reduction algorithms is designed to demonstrate the effectiveness of the proposed algorithm, which achieves significant improvements on many datasets.
Updated: 2024-05-15 12:30:19
Categories: cs.AI,I.2.4
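For contrast with the paper's spatial-similarity criterion, here is the classic dependency-based greedy reduct that such algorithms are usually compared against. The decision table and attribute names are a hypothetical toy example:

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group objects into indiscernibility classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def positive_region(rows, attrs, decision):
    """Objects whose indiscernibility class is consistent on the decision."""
    pos = set()
    for block in partition(rows, attrs):
        if len({rows[i][decision] for i in block}) == 1:
            pos.update(block)
    return pos

def greedy_reduct(rows, cond_attrs, decision):
    """Classic greedy reduction: add the attribute that grows the positive
    region most until full dependency is reached. (The paper's
    spatial-similarity criterion would replace this selection rule.)"""
    target = positive_region(rows, cond_attrs, decision)
    reduct = []
    while positive_region(rows, reduct, decision) != target:
        best = max((a for a in cond_attrs if a not in reduct),
                   key=lambda a: len(positive_region(rows, reduct + [a], decision)))
        reduct.append(best)
    return reduct

# toy decision table: d depends only on a; b and c are redundant
rows = [
    {"a": 0, "b": 0, "c": 1, "d": 0},
    {"a": 0, "b": 1, "c": 0, "d": 0},
    {"a": 1, "b": 0, "c": 0, "d": 1},
    {"a": 1, "b": 1, "c": 1, "d": 1},
]
red = greedy_reduct(rows, ["a", "b", "c"], "d")
```

Greedy reducts minimize attribute count only; the paper's point is that two reducts of equal size can induce rule sets of very different quality.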
Sensitivity Decouple Learning for Image Compression Artifacts Reduction
With the benefit of deep learning techniques, recent research has made significant progress in image compression artifacts reduction. Despite their improved performance, prevailing methods focus only on learning a mapping from the compressed image to the original one but ignore the intrinsic attributes of the given compressed images, which greatly harms the performance of downstream parsing tasks. Different from these methods, we propose to decouple the intrinsic attributes into two complementary features for artifacts reduction, i.e., compression-insensitive features to regularize the high-level semantic representations during training and compression-sensitive features to be aware of the compression degree. To achieve this, we first employ adversarial training to regularize the compressed and original encoded features for retaining high-level semantics, and we then develop the compression quality-aware feature encoder for compression-sensitive features. Based on these dual complementary features, we propose a Dual Awareness Guidance Network (DAGN) to utilize these awareness features as transformation guidance during the decoding phase. In our proposed DAGN, we develop a cross-feature fusion module to maintain the consistency of compression-insensitive features by fusing compression-insensitive features into the artifacts reduction baseline. Our method achieves an average 2.06 dB PSNR gain on BSD500, outperforming state-of-the-art methods, and only requires 29.7 ms to process one image on BSD500. Besides, the experimental results on LIVE1 and LIU4K also demonstrate the efficiency, effectiveness, and superiority of the proposed method in terms of quantitative metrics, visual quality, and downstream machine vision tasks.
Updated: 2024-05-15 12:29:35
Categories: cs.CV,cs.AI,eess.IV
Learning Generalized Medical Image Representations through Image-Graph Contrastive Pretraining
Medical image interpretation using deep learning has shown promise but often requires extensive expert-annotated datasets. To reduce this annotation burden, we develop an Image-Graph Contrastive Learning framework that pairs chest X-rays with structured report knowledge graphs automatically extracted from radiology notes. Our approach uniquely encodes the disconnected graph components via a relational graph convolution network and transformer attention. In experiments on the CheXpert dataset, this novel graph encoding strategy enabled the framework to outperform existing methods that use image-text contrastive learning in 1% linear evaluation and few-shot settings, while achieving comparable performance to radiologists. By exploiting unlabeled paired images and text, our framework demonstrates the potential of structured clinical insights to enhance contrastive learning for medical images. This work points toward reducing demands on medical experts for annotations, improving diagnostic precision, and advancing patient care through robust medical image understanding.
Updated: 2024-05-15 12:27:38
Categories: eess.IV,cs.CV,cs.LG
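Image-text contrastive pretraining of this kind typically uses a symmetric InfoNCE loss over paired embeddings; a small sketch under that assumption, with the image and graph encoders abstracted away as precomputed embedding matrices:

```python
import numpy as np

def log_softmax_rows(M):
    """Numerically stable row-wise log-softmax."""
    M = M - M.max(axis=1, keepdims=True)
    return M - np.log(np.exp(M).sum(axis=1, keepdims=True))

def info_nce(img_emb, graph_emb, temperature=0.1):
    """Symmetric InfoNCE: matched image/graph pairs sit on the diagonal of
    the similarity matrix (positives); every other pair in the batch is a
    negative."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    gra = graph_emb / np.linalg.norm(graph_emb, axis=1, keepdims=True)
    sim = img @ gra.T / temperature
    loss_i = -np.mean(np.diag(log_softmax_rows(sim)))    # image -> graph
    loss_g = -np.mean(np.diag(log_softmax_rows(sim.T)))  # graph -> image
    return 0.5 * (loss_i + loss_g)

E = np.eye(4)  # toy batch of 4 perfectly aligned embedding pairs
```

Correctly paired batches drive the loss toward zero, while misaligned pairings are penalized, which is how the unlabeled X-ray/report-graph pairs supervise the encoders.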
The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio
With the proliferation of Audio Language Model (ALM) based deepfake audio, there is an urgent need for generalized detection methods. ALM-based deepfake audio currently exhibits widespread use, high deception, and type versatility, posing a significant challenge to current audio deepfake detection (ADD) models trained solely on vocoded data. To effectively detect ALM-based deepfake audio, we focus on the mechanism of the ALM-based audio generation method, the conversion from neural codec to waveform. We initially construct the Codecfake dataset, an open-source large-scale dataset, including 2 languages, over 1M audio samples, and various test conditions, focusing on ALM-based audio detection. As a countermeasure, to achieve universal detection of deepfake audio and tackle the domain ascent bias issue of the original SAM, we propose the CSAM strategy to learn a domain-balanced and generalized minimum. In our experiments, we first demonstrate that ADD model training with the Codecfake dataset can effectively detect ALM-based audio. Furthermore, our proposed generalization countermeasure yields the lowest average Equal Error Rate (EER) of 0.616% across all test conditions compared to baseline models. The dataset and associated code are available online.
Updated: 2024-05-15 12:24:52
Categories: cs.SD,cs.AI,eess.AS
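The abstract does not spell out CSAM, so as orientation only, here is one step of the vanilla Sharpness-Aware Minimization (SAM) optimizer it is presented as extending, on a toy quadratic loss:

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One step of vanilla SAM: ascend to the worst-case point within an
    L2 ball of radius rho, then descend using the gradient taken there."""
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # adversarial perturbation
    g_adv = grad_fn(w + eps)                     # gradient at the perturbed point
    return w - lr * g_adv

# toy loss L(w) = ||w||^2, gradient 2w
w = np.array([1.0, -2.0])
for _ in range(100):
    w = sam_step(w, lambda v: 2.0 * v)
```

SAM seeks flat minima; the paper's CSAM variant reportedly modifies this to balance across codec domains rather than letting one domain dominate.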
SQL-to-Schema Enhances Schema Linking in Text-to-SQL
Sophisticated existing Text-to-SQL methods exhibit errors in various proportions, including schema-linking errors (incorrect columns, tables, or extra columns), join errors, nested errors, and group-by errors. Consequently, there is a critical need to filter out unnecessary tables and columns, directing the language model's attention to relevant tables and columns via schema linking, to reduce errors during SQL generation. Previous approaches have involved sorting tables and columns based on their relevance to the question and selecting the top-ranked ones, or directly identifying the necessary tables and columns for SQL generation. However, these methods face challenges such as lengthy model training times, high consumption of expensive GPT-4 tokens in few-shot prompts, or suboptimal performance in schema linking. Therefore, we propose an inventive two-step schema linking method: first, generate an initial SQL query by utilizing the complete database schema; then, extract the tables and columns from that initial SQL query to create a concise schema. Using CodeLlama-34B, when comparing the schemas obtained by mainstream methods with ours for SQL generation, our schema performs best. Leveraging GPT-4, our SQL generation method achieved results comparable to mainstream Text-to-SQL methods on the Spider dataset.
Updated: 2024-05-15 12:22:48
Categories: cs.DB,cs.AI
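Step two of the pipeline, extracting a concise schema from the initial SQL query, can be approximated with a naive regex pass. A real system would use a SQL parser; this sketch handles only simple queries, and the example query is hypothetical:

```python
import re

def schema_from_sql(sql):
    """Pull table names after FROM/JOIN and qualified column references
    from an initial SQL query to form a concise schema."""
    tables = set(re.findall(r'\b(?:FROM|JOIN)\s+([A-Za-z_]\w*)', sql, re.I))
    columns = set(re.findall(r'\b([A-Za-z_]\w*)\.([A-Za-z_]\w*)', sql))
    return sorted(tables), sorted(columns)

sql = "SELECT s.name, c.title FROM students s JOIN courses c ON s.cid = c.id"
tables, columns = schema_from_sql(sql)
```

The extracted tables and columns then replace the full database schema in the second-pass prompt, shrinking the context the model must attend to.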
Positional Knowledge is All You Need: Position-induced Transformer (PiT) for Operator Learning
Operator learning for Partial Differential Equations (PDEs) is rapidly emerging as a promising approach for surrogate modeling of intricate systems. Transformers with the self-attention mechanism, a powerful tool originally designed for natural language processing, have recently been adapted for operator learning. However, they confront challenges, including high computational demands and limited interpretability. This raises a critical question: Is there a more efficient attention mechanism for Transformer-based operator learning? This paper proposes the Position-induced Transformer (PiT), built on an innovative position-attention mechanism, which demonstrates significant advantages over the classical self-attention in operator learning. Position-attention draws inspiration from numerical methods for PDEs. Different from self-attention, position-attention is induced by only the spatial interrelations of sampling positions for input functions of the operators, and does not rely on the input function values themselves, thereby greatly boosting efficiency. PiT exhibits superior performance over current state-of-the-art neural operators in a variety of complex operator learning tasks across diverse PDE benchmarks. Additionally, PiT possesses an enhanced discretization convergence feature, compared to the widely-used Fourier neural operator.
Updated: 2024-05-15 12:09:24
Categories: cs.LG,cs.NA,math.NA
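The key idea, attention weights induced purely by the sampling positions, can be sketched as a softmax over negative pairwise squared distances. The distance kernel and scale below are illustrative choices, not the paper's exact parameterization:

```python
import numpy as np

def position_attention(positions, values, scale=1.0):
    """Position-attention sketch: weights depend only on pairwise distances
    between sampling positions, never on the function values, mirroring the
    locality of numerical PDE stencils."""
    diff = positions[:, None, :] - positions[None, :, :]
    logits = -np.sum(diff**2, axis=-1) / scale      # closer points attend more
    logits -= logits.max(axis=-1, keepdims=True)    # numerically stable softmax
    w = np.exp(logits)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ values

pos = np.linspace(0.0, 1.0, 5)[:, None]    # 1-D sampling grid
vals = np.arange(10.0).reshape(5, 2)       # input function, two channels
out = position_attention(pos, vals, scale=1e-4)
```

Because the weight matrix is a function of the grid alone, it can be precomputed and reused for every input function, which is the efficiency gain the abstract claims; as the scale shrinks, the operator approaches the identity.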
Safe Reinforcement Learning with Free-form Natural Language Constraints and Pre-Trained Language Models
Safe reinforcement learning (RL) agents accomplish given tasks while adhering to specific constraints. Employing constraints expressed via easily understandable human language offers considerable potential for real-world applications due to its accessibility and non-reliance on domain expertise. Previous safe RL methods with natural language constraints typically adopt a recurrent neural network, which leads to limited capabilities when dealing with various forms of human language input. Furthermore, these methods often require a ground-truth cost function, necessitating domain expertise for the conversion of language constraints into a well-defined cost function that determines constraint violation. To address these issues, we propose to use pre-trained language models (LMs) to facilitate RL agents' comprehension of natural language constraints and allow them to infer costs for safe policy learning. Through the use of pre-trained LMs and the elimination of the need for a ground-truth cost, our method enhances safe policy learning under a diverse set of human-derived free-form natural language constraints. Experiments on grid-world navigation and robot control show that the proposed method can achieve strong performance while adhering to given constraints. The usage of pre-trained LMs allows our method to comprehend complicated constraints and learn safe policies without the need for ground-truth cost at any stage of training or evaluation. Extensive ablation studies are conducted to demonstrate the efficacy of each part of our method.
Updated: 2024-05-15 12:08:21
Categories: cs.LG,cs.CL
A Survey of Generative Techniques for Spatial-Temporal Data Mining
This paper focuses on the integration of generative techniques into spatial-temporal data mining, considering the significant growth and diverse nature of spatial-temporal data. With the advancements in RNNs, CNNs, and other non-generative techniques, researchers have explored their application in capturing temporal and spatial dependencies within spatial-temporal data. However, the emergence of generative techniques such as LLMs, SSL, Seq2Seq and diffusion models has opened up new possibilities for enhancing spatial-temporal data mining further. The paper provides a comprehensive analysis of generative technique-based spatial-temporal methods and introduces a standardized framework specifically designed for the spatial-temporal data mining pipeline. By offering a detailed review and a novel taxonomy of spatial-temporal methodology utilizing generative techniques, the paper enables a deeper understanding of the various techniques employed in this field. Furthermore, the paper highlights promising future research directions, urging researchers to delve deeper into spatial-temporal data mining. It emphasizes the need to explore untapped opportunities and push the boundaries of knowledge to unlock new insights and improve the effectiveness and efficiency of spatial-temporal data mining. By integrating generative techniques and providing a standardized framework, the paper contributes to advancing the field and encourages researchers to explore the vast potential of generative techniques in spatial-temporal data mining.
Updated: 2024-05-15 12:07:43
Categories: cs.LG,cs.AI,cs.CE
"Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models
The misuse of large language models (LLMs) has drawn significant attention from the general public and LLM vendors. One particular type of adversarial prompt, known as jailbreak prompt, has emerged as the main attack vector to bypass the safeguards and elicit harmful content from LLMs. In this paper, employing our new framework JailbreakHub, we conduct a comprehensive analysis of 1,405 jailbreak prompts spanning from December 2022 to December 2023. We identify 131 jailbreak communities and discover unique characteristics of jailbreak prompts and their major attack strategies, such as prompt injection and privilege escalation. We also observe that jailbreak prompts increasingly shift from online Web communities to prompt-aggregation websites and 28 user accounts have consistently optimized jailbreak prompts over 100 days. To assess the potential harm caused by jailbreak prompts, we create a question set comprising 107,250 samples across 13 forbidden scenarios. Leveraging this dataset, our experiments on six popular LLMs show that their safeguards cannot adequately defend jailbreak prompts in all scenarios. Particularly, we identify five highly effective jailbreak prompts that achieve 0.95 attack success rates on ChatGPT (GPT-3.5) and GPT-4, and the earliest one has persisted online for over 240 days. We hope that our study can facilitate the research community and LLM vendors in promoting safer and regulated LLMs.
Updated: 2024-05-15 12:06:31
Domains: cs.CR,cs.LG
Learning Decision Policies with Instrumental Variables through Double Machine Learning
A common issue in learning decision-making policies in data-rich settings is spurious correlations in the offline dataset, which can be caused by hidden confounders. Instrumental variable (IV) regression, which utilises a key unconfounded variable known as the instrument, is a standard technique for learning causal relationships between confounded action, outcome, and context variables. Most recent IV regression algorithms use a two-stage approach, where a deep neural network (DNN) estimator learnt in the first stage is directly plugged into the second stage, in which another DNN is used to estimate the causal effect. Naively plugging the estimator can cause heavy bias in the second stage, especially when regularisation bias is present in the first stage estimator. We propose DML-IV, a non-linear IV regression method that reduces the bias in two-stage IV regressions and effectively learns high-performing policies. We derive a novel learning objective to reduce bias and design the DML-IV algorithm following the double/debiased machine learning (DML) framework. The learnt DML-IV estimator has strong convergence rate and $O(N^{-1/2})$ suboptimality guarantees that match those when the dataset is unconfounded. DML-IV outperforms state-of-the-art IV regression methods on IV regression benchmarks and learns high-performing policies in the presence of instruments.
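The paper's DML-IV method is not reproduced here, but the core problem it builds on, that naive regression is biased under hidden confounding while an instrument recovers the causal effect, can be illustrated with classical two-stage least squares (2SLS) on synthetic data; all coefficients below are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)          # instrument: affects x, but not y directly
u = rng.normal(size=n)          # hidden confounder
x = z + u + 0.1 * rng.normal(size=n)
y = 2.0 * x + 3.0 * u + 0.1 * rng.normal(size=n)   # true causal effect of x is 2

# Naive OLS is biased because x and the error term share the confounder u.
beta_ols = (x @ y) / (x @ x)

# Stage 1: regress x on the instrument z; Stage 2: regress y on the fitted x.
x_hat = ((z @ x) / (z @ z)) * z
beta_2sls = (x_hat @ y) / (x_hat @ x_hat)

print(f"OLS estimate:  {beta_ols:.2f}")   # pulled away from 2 by confounding
print(f"2SLS estimate: {beta_2sls:.2f}")  # close to the true causal effect
```

DML-IV replaces these linear stages with DNN estimators and a debiased learning objective; the sketch only shows why the instrument matters.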
Updated: 2024-05-15 12:05:18
Domains: cs.LG,stat.ML
XAI4LLM. Let Machine Learning Models and LLMs Collaborate for Enhanced In-Context Learning in Healthcare
The integration of Large Language Models (LLMs) into healthcare diagnostics offers a promising avenue for clinical decision-making. This study outlines the development of a novel method for zero-shot/few-shot in-context learning (ICL) by integrating medical domain knowledge using a multi-layered structured prompt. We also explore the efficacy of two communication styles between the user and LLMs: the Numerical Conversational (NC) style, which processes data incrementally, and the Natural Language Single-Turn (NL-ST) style, which employs long narrative prompts. Our study systematically evaluates the diagnostic accuracy and risk factors, including gender bias and false negative rates, using a dataset of 920 patient records in various few-shot scenarios. Results indicate that traditional clinical machine learning (ML) models generally outperform LLMs in zero-shot and few-shot settings. However, the performance gap narrows significantly when employing few-shot examples alongside effective explainable AI (XAI) methods as sources of domain knowledge. Moreover, with sufficient time and an increased number of examples, the conversational style (NC) nearly matches the performance of ML models. Most notably, LLMs demonstrate comparable or superior cost-sensitive accuracy relative to ML models. This research confirms that, with appropriate domain knowledge and tailored communication strategies, LLMs can significantly enhance diagnostic processes. The findings highlight the importance of optimizing the number of training examples and communication styles to improve accuracy and reduce biases in LLM applications.
Updated: 2024-05-15 11:59:41
Domains: cs.LG,cs.AI,cs.CL
A Comprehensive Survey on Data Augmentation
Data augmentation is a series of techniques that generate high-quality artificial data by manipulating existing data samples. By leveraging data augmentation techniques, AI models can achieve significantly improved applicability in tasks involving scarce or imbalanced datasets, thereby substantially enhancing AI models' generalization capabilities. Existing literature surveys only focus on a certain type of specific modality data, and categorize these methods from modality-specific and operation-centric perspectives, which lacks a consistent summary of data augmentation methods across multiple modalities and limits the comprehension of how existing data samples serve the data augmentation process. To bridge this gap, we propose a more enlightening taxonomy that encompasses data augmentation techniques for different common data modalities. Specifically, from a data-centric perspective, this survey proposes a modality-independent taxonomy by investigating how to take advantage of the intrinsic relationship between data samples, including single-wise, pair-wise, and population-wise sample data augmentation methods. Additionally, we categorize data augmentation methods across five data modalities through a unified inductive approach.
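As a concrete instance of the pair-wise category the survey describes, mixup blends two samples and their labels by convex combination; a minimal numpy sketch (an illustration, not code from the survey):

```python
import numpy as np

def mixup(x1, y1, x2, y2, lam):
    """Pair-wise augmentation: convex combination of two samples and labels."""
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

x1, y1 = np.array([1.0, 0.0]), np.array([1.0, 0.0])  # class 0, one-hot label
x2, y2 = np.array([0.0, 1.0]), np.array([0.0, 1.0])  # class 1, one-hot label
x_mix, y_mix = mixup(x1, y1, x2, y2, lam=0.7)
print(x_mix, y_mix)  # a soft sample and a soft label between the two classes
```

Single-wise methods transform one sample in isolation (e.g. noise, cropping), while population-wise methods draw on the whole data distribution (e.g. generative models).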
Updated: 2024-05-15 11:58:08
Domains: cs.LG,cs.AI
Sign of the Times: Evaluating the use of Large Language Models for Idiomaticity Detection
Despite the recent ubiquity of large language models and their high zero-shot prompted performance across a wide range of tasks, it is still not known how well they perform on tasks which require processing of potentially idiomatic language. In particular, how well do such models perform in comparison to encoder-only models fine-tuned specifically for idiomaticity tasks? In this work, we attempt to answer this question by looking at the performance of a range of LLMs (both local and software-as-a-service models) on three idiomaticity datasets: SemEval 2022 Task 2a, FLUTE, and MAGPIE. Overall, we find that whilst these models do give competitive performance, they do not match the results of fine-tuned task-specific models, even at the largest scales (e.g. for GPT-4). Nevertheless, we do see consistent performance improvements across model scale. Additionally, we investigate prompting approaches to improve performance, and discuss the practicalities of using LLMs for these tasks.
Updated: 2024-05-15 11:55:14
Domains: cs.CL,cs.AI
A multiscale and multicriteria Generative Adversarial Network to synthesize 1-dimensional turbulent fields
This article introduces a new Neural Network stochastic model to generate a 1-dimensional stochastic field with turbulent velocity statistics. Both the model architecture and the training procedure are grounded in the Kolmogorov and Obukhov statistical theories of fully developed turbulence, guaranteeing descriptions of 1) energy distribution, 2) energy cascade and 3) intermittency across scales in agreement with experimental observations. The model is a Generative Adversarial Network with multiple multiscale optimization criteria. First, we use three physics-based criteria: the variance, skewness and flatness of the increments of the generated field, which retrieve respectively the turbulent energy distribution, energy cascade and intermittency across scales. Second, the Generative Adversarial Network criterion, based on reproducing statistical distributions, is used on segments of different lengths of the generated field. Furthermore, to mimic the multiscale decompositions frequently used in turbulence studies, the model architecture is fully convolutional, with kernel sizes varying along the multiple layers of the model. To train our model we use turbulent velocity signals from grid turbulence at the Modane wind tunnel.
Updated: 2024-05-15 11:49:51
Domains: cs.LG,eess.SP,physics.flu-dyn
Frequency Principle: Fourier Analysis Sheds Light on Deep Neural Networks
We study the training process of Deep Neural Networks (DNNs) from the Fourier analysis perspective. We demonstrate a very universal Frequency Principle (F-Principle) -- DNNs often fit target functions from low to high frequencies -- on high-dimensional benchmark datasets such as MNIST/CIFAR10 and deep neural networks such as VGG16. This F-Principle of DNNs is opposite to the behavior of most conventional iterative numerical schemes (e.g., Jacobi method), which exhibit faster convergence for higher frequencies for various scientific computing problems. With a simple theory, we illustrate that this F-Principle results from the regularity of the commonly used activation functions. The F-Principle implies an implicit bias that DNNs tend to fit training data by a low-frequency function. This understanding provides an explanation of good generalization of DNNs on most real datasets and bad generalization of DNNs on parity function or randomized dataset.
Updated: 2024-05-15 11:48:11
Domains: cs.LG,stat.ML,68Q32, 68T01,I.2.6
Dual-Segment Clustering Strategy for Federated Learning in Heterogeneous Environments
Federated learning (FL) is a distributed machine learning paradigm with high efficiency and low communication load, transmitting only the parameters or gradients of the network. However, the non-independent and identically distributed (Non-IID) nature of the data has a negative impact on this paradigm. Furthermore, heterogeneity in communication quality significantly affects the accuracy of parameter transmission, degrading the performance of the FL system or even preventing its convergence. This letter proposes a dual-segment clustering (DSC) strategy, which first clusters the clients according to their heterogeneous communication conditions and then performs a second clustering by sample size and label distribution, so as to address both data and communication heterogeneity. Experimental results show that the proposed DSC strategy improves the convergence rate of FL and outperforms the classical clustering algorithm in accuracy in a heterogeneous environment.
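The letter's exact clustering criteria are only sketched in the abstract; a toy illustration of the two-stage idea, with hypothetical SNR values and label histograms standing in for communication conditions and data distributions, might look like:

```python
import numpy as np

# Hypothetical clients: (communication SNR in dB, label histogram over 3 classes)
clients = {
    "c1": (25.0, [0.8, 0.1, 0.1]),
    "c2": (24.0, [0.1, 0.8, 0.1]),
    "c3": (5.0,  [0.8, 0.1, 0.1]),
    "c4": (6.0,  [0.1, 0.1, 0.8]),
}

# Stage 1: split clients by communication quality (threshold at the median SNR).
snr_median = np.median([snr for snr, _ in clients.values()])
stage1 = {"good": [], "poor": []}
for name, (snr, _) in clients.items():
    stage1["good" if snr >= snr_median else "poor"].append(name)

# Stage 2: within each communication group, cluster by the dominant label.
stage2 = {}
for link, members in stage1.items():
    for name in members:
        dominant = int(np.argmax(clients[name][1]))
        stage2.setdefault((link, dominant), []).append(name)

print(stage2)  # clients grouped by (link quality, dominant label)
```

Real DSC would use proper clustering over sample sizes and full label distributions; the point of the sketch is only that the two sources of heterogeneity are handled in separate stages.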
Updated: 2024-05-15 11:46:47
Domains: cs.LG,cs.AI,cs.DC
Dynamic Activation Pitfalls in LLaMA Models: An Empirical Study
In this work, we systematically investigate the efficacy of dynamic activation mechanisms within the LLaMA family of language models. Despite the potential of dynamic activation methods to reduce computation and increase speed in models using the ReLU activation function, our empirical findings have uncovered several inherent pitfalls in the current dynamic activation schemes. Through extensive experiments across various dynamic activation strategies, we demonstrate that LLaMA models usually underperform when compared to their ReLU counterparts, particularly in scenarios demanding high sparsity ratio. We attribute these deficiencies to a combination of factors: 1) the inherent complexity of dynamically predicting activation heads and neurons; 2) the inadequate sparsity resulting from activation functions; 3) the insufficient preservation of information resulting from KV cache skipping. Our analysis not only sheds light on the limitations of dynamic activation in the context of large-scale LLaMA models but also proposes roadmaps for enhancing the design of future sparsity schemes.
Updated: 2024-05-15 11:42:42
Domains: cs.LG
Fair Generalized Linear Mixed Models
When using machine learning for automated prediction, it is important to account for fairness in the prediction. Fairness in machine learning aims to ensure that biases in the data and model inaccuracies do not lead to discriminatory decisions. E.g., predictions from fair machine learning models should not discriminate against sensitive variables such as sexual orientation and ethnicity. The training data is often obtained from social surveys, where the data collection process is frequently a strata sampling, e.g. due to cost restrictions. In strata samples, the assumption of independence between the observations is not fulfilled. Hence, if machine learning models do not account for the strata correlations, the results may be biased. The bias is especially high in cases where the strata assignment is correlated with the variable of interest. We present in this paper an algorithm that can handle both problems simultaneously, and we demonstrate the impact of stratified sampling on the quality of fair machine learning predictions in a reproducible simulation study.
Updated: 2024-05-15 11:42:41
Domains: cs.LG,math.OC
Polar Encoding: A Simple Baseline Approach for Classification with Missing Values
We propose polar encoding, a representation of categorical and numerical $[0,1]$-valued attributes with missing values to be used in a classification context. We argue that this is a good baseline approach, because it can be used with any classification algorithm, preserves missingness information, is very simple to apply and offers good performance. In particular, unlike the existing missing-indicator approach, it does not require imputation, ensures that missing values are equidistant from non-missing values, and lets decision tree algorithms choose how to split missing values, thereby providing a practical realisation of the "missingness incorporated in attributes" (MIA) proposal. Furthermore, we show that categorical and $[0,1]$-valued attributes can be viewed as special cases of a single attribute type, corresponding to the classical concept of barycentric coordinates, and that this offers a natural interpretation of polar encoding as a fuzzified form of one-hot encoding. With an experiment based on twenty real-life datasets with missing values, we show that, in terms of the resulting classification performance, polar encoding performs better than the state-of-the-art strategies "multiple imputation by chained equations" (MICE) and "multiple imputation with denoising autoencoders" (MIDAS) and -- depending on the classifier -- about as well or better than mean/mode imputation with missing-indicators.
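One way to realise the equidistance property the abstract describes, for the categorical case, is to encode known categories as one-hot vectors and a missing value as the zero vector, which lies at the same distance from every one-hot point. This is an illustration of the property, not necessarily the paper's exact construction:

```python
import numpy as np

def encode(value, categories):
    """One-hot for known categories; all-zeros for a missing value.

    The zero vector is at distance 1 from every one-hot vector, so missing
    values are equidistant from all non-missing ones, and decision trees can
    route the missing encoding to whichever side of a split fits best.
    """
    vec = np.zeros(len(categories))
    if value is not None:
        vec[categories.index(value)] = 1.0
    return vec

cats = ["red", "green", "blue"]
missing = encode(None, cats)
dists = [np.linalg.norm(missing - encode(c, cats)) for c in cats]
print(dists)  # all equal: missing is equidistant from every category
```

Unlike imputation, no artificial category value is invented, and the missingness information is preserved in the encoding itself.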
Updated: 2024-05-15 11:40:20
Domains: cs.LG
Dance Any Beat: Blending Beats with Visuals in Dance Video Generation
The task of generating dance from music is crucial, yet current methods, which mainly produce joint sequences, lead to outputs that lack intuitiveness and complicate data collection due to the necessity for precise joint annotations. We introduce a Dance Any Beat Diffusion model, namely DabFusion, that employs music as a conditional input to directly create dance videos from still images, utilizing conditional image-to-video generation principles. This approach pioneers the use of music as a conditioning factor in image-to-video synthesis. Our method unfolds in two stages: training an auto-encoder to predict latent optical flow between reference and driving frames, eliminating the need for joint annotation, and training a U-Net-based diffusion model to produce these latent optical flows guided by music rhythm encoded by CLAP. Although capable of producing high-quality dance videos, the baseline model struggles with rhythm alignment. We enhance the model by adding beat information, improving synchronization. We introduce a 2D motion-music alignment score (2D-MM Align) for quantitative assessment. Evaluated on the AIST++ dataset, our enhanced model shows marked improvements in 2D-MM Align score and established metrics. Video results can be found on our project page: https://DabFusion.github.io.
Updated: 2024-05-15 11:33:07
Domains: cs.CV,cs.AI,cs.MM,cs.SD,eess.AS
A Quantum of QUIC: Dissecting Cryptography with Post-Quantum Insights
QUIC is a new network protocol standardized in 2021. It was designed to replace the TCP/TLS stack and is based on UDP. The most current web standard HTTP/3 is specifically designed to use QUIC as transport protocol. QUIC claims to provide secure and fast transport with low-latency connection establishment, flow and congestion control, reliable delivery, and stream multiplexing. To achieve the security goals, QUIC enforces the usage of TLS 1.3. It uses authenticated encryption with additional data (AEAD) algorithms to not only protect the payload but also parts of the header. The handshake relies on asymmetric cryptography, which will be broken with the introduction of powerful quantum computers, making the use of post-quantum cryptography inevitable. This paper presents a detailed evaluation of the impact of cryptography on QUIC performance. The high-performance QUIC implementations LSQUIC, quiche, and MsQuic are evaluated under different aspects. We break symmetric cryptography down to the different security features. To be able to isolate the impact of cryptography, we implemented a NOOP AEAD algorithm which leaves plaintext unaltered. We show that QUIC performance increases by 10 to 20% when removing packet protection. The header protection has negligible impact on performance, especially for AES ciphers. We integrate post-quantum cryptographic algorithms into QUIC, demonstrating its feasibility without major changes to the QUIC libraries by using a TLS library that implements post-quantum algorithms. Kyber, Dilithium, and FALCON are promising candidates for post-quantum secure QUIC, as they have a low impact on the handshake duration. Algorithms like SPHINCS+ with larger key sizes or more complex calculations significantly impact the handshake duration and cause additional issues in our measurements.
Updated: 2024-05-15 11:27:28
Domains: cs.NI,cs.CR
Using Combinatorial Optimization to Design a High quality LLM Solution
We introduce a novel LLM-based solution design approach that utilizes combinatorial optimization and sampling. Specifically, a set of factors that influence the quality of the solution are identified. They typically include factors that represent prompt types, LLM input alternatives, and parameters governing the generation and design alternatives. Identifying the factors that govern the LLM solution quality enables the infusion of subject matter expert knowledge. Next, a set of interactions between the factors is defined, and combinatorial optimization is used to create a small subset $P$ that ensures all desired interactions occur in $P$. Each element $p \in P$ is then developed into an appropriate benchmark. Applying the alternative solutions to each combination $p \in P$ and evaluating the results facilitates the design of a high-quality LLM solution pipeline. The approach is especially applicable when the design and evaluation of each benchmark in $P$ is time-consuming and involves manual steps and human evaluation. Given its efficiency, the approach can also be used as a baseline to compare and validate an autoML approach that searches over the factors governing the solution.
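The combinatorial step can be illustrated with a greedy construction of a small subset $P$ of the full factorial design that covers every pairwise interaction between factor levels. The factors below are hypothetical, and this generic covering-array sketch is not the paper's algorithm:

```python
from itertools import product, combinations

factors = {  # hypothetical design factors for an LLM solution
    "prompt_type": ["zero-shot", "few-shot", "chain-of-thought"],
    "input_variant": ["raw", "summarised"],
    "temperature": [0.0, 0.7],
}
names = list(factors)
full = [dict(zip(names, combo)) for combo in product(*factors.values())]

def pairs(config):
    """All pairwise (factor, level) interactions present in one configuration."""
    items = [(f, config[f]) for f in names]
    return set(combinations(items, 2))

uncovered = set().union(*(pairs(c) for c in full))
P = []
while uncovered:  # greedily pick the configuration covering most remaining pairs
    best = max(full, key=lambda c: len(pairs(c) & uncovered))
    P.append(best)
    uncovered -= pairs(best)

print(len(P), "of", len(full), "configurations cover all pairwise interactions")
```

Each element of $P$ would then be developed into a benchmark, so the expensive manual evaluation runs over $P$ rather than the full factorial design.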
Updated: 2024-05-15 11:13:39
Domains: cs.CL,cs.AI
Does Machine Bring in Extra Bias in Learning? Approximating Fairness in Models Promptly
As various machine learning (ML) applications are deployed in the real world, concerns about discrimination hidden in ML models are growing, particularly in high-stakes domains. Existing techniques for assessing the discrimination level of ML models include commonly used group and individual fairness measures. However, these two types of fairness measures are usually hard to make compatible with each other, and even two different group fairness measures might be incompatible as well. To address this issue, we evaluate the discrimination level of classifiers from a manifold perspective and propose a "harmonic fairness measure via manifolds (HFM)" based on distances between sets. Yet the direct calculation of these distances might be too computationally expensive, reducing practical applicability. Therefore, we devise an approximation algorithm named "Approximation of distance between sets (ApproxDist)" to facilitate accurate estimation of distances, and we further demonstrate its algorithmic effectiveness under certain reasonable assumptions. Empirical results indicate that the proposed fairness measure HFM is valid and that the proposed ApproxDist is effective and efficient.
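The abstract does not define HFM or ApproxDist; as a generic illustration of what a "distance between sets" of model outputs can mean, here is a symmetric average-nearest-neighbour (Hausdorff-style) measure on synthetic 1-D data, an assumed stand-in rather than the paper's definition:

```python
import numpy as np

def set_distance(A, B):
    """Symmetric average of nearest-neighbour distances between two sets."""
    A, B = np.asarray(A, float), np.asarray(B, float)
    d_ab = np.abs(A[:, None] - B[None, :]).min(axis=1).mean()
    d_ba = np.abs(B[:, None] - A[None, :]).min(axis=1).mean()
    return max(d_ab, d_ba)

group_a = [0.0, 1.0, 2.0]   # e.g. model scores for one demographic group
group_b = [5.0, 6.0, 7.0]   # the same scores, shifted for the other group
print(set_distance(group_a, group_a))  # identical sets are at distance 0
print(set_distance(group_a, group_b))  # the shift shows up as a large distance
```

The exact pairwise computation above is quadratic in the set sizes, which motivates an approximation algorithm in the spirit of ApproxDist.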
Updated: 2024-05-15 11:07:40
Domains: cs.LG,cs.CY,68T01, 68T09, 68T20,I.2; I.2.6; I.2.0; K.4.2
Graph Neural Network based Handwritten Trajectories Recognition
Graph neural networks have been proven to be an efficient machine learning technique in real-life applications. Handwritten recognition is one such useful area, where both offline and online handwriting recognition are required. The chain code as a feature extraction technique has shown significant results in the literature, and we have been able to use chain codes with graph neural networks. To the best of our knowledge, this work presents for the first time a novel combination of handwritten trajectory features as chain codes together with graph neural networks. The handwritten trajectories for offline handwritten text have been evaluated using recovery of the drawing order, whereas online handwritten trajectories are used directly with chain codes. Our results show that the present combination surpasses previous results and minimizes the error rate in only a few epochs.
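Chain codes themselves are standard: each step between consecutive trajectory points is quantised to one of eight directions (the Freeman code). A minimal sketch of the feature extraction, not the paper's full pipeline:

```python
# Freeman 8-direction chain code: 0=E, 1=NE, 2=N, 3=NW, 4=W, 5=SW, 6=S, 7=SE
DIRECTIONS = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}

def chain_code(points):
    """Quantise the steps of a pen trajectory into Freeman direction codes."""
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx = (x1 > x0) - (x1 < x0)  # sign of the step, in {-1, 0, 1}
        dy = (y1 > y0) - (y1 < y0)
        if (dx, dy) != (0, 0):      # skip repeated points
            codes.append(DIRECTIONS[(dx, dy)])
    return codes

square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
print(chain_code(square))  # [0, 2, 4, 6]
```

In the paper's setting, such code sequences would serve as node or edge features for a graph neural network over the trajectory.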
Updated: 2024-05-15 11:00:42
Domains: cs.CV,cs.LG
NeuralCMS: A deep learning approach to study Jupiter's interior
NASA's Juno mission provided exquisite measurements of Jupiter's gravity field that together with the Galileo entry probe atmospheric measurements constrains the interior structure of the giant planet. Inferring its interior structure range remains a challenging inverse problem requiring a computationally intensive search of combinations of various planetary properties, such as the cloud-level temperature, composition, and core features, requiring the computation of ~10^9 interior models. We propose an efficient deep neural network (DNN) model to generate high-precision wide-ranged interior models based on the very accurate but computationally demanding concentric MacLaurin spheroid (CMS) method. We trained a sharing-based DNN with a large set of CMS results for a four-layer interior model of Jupiter, including a dilute core, to accurately predict the gravity moments and mass, given a combination of interior features. We evaluated the performance of the trained DNN (NeuralCMS) to inspect its predictive limitations. NeuralCMS shows very good performance in predicting the gravity moments, with errors comparable with the uncertainty due to differential rotation, and a very accurate mass prediction. This allowed us to perform a broad parameter space search by computing only ~10^4 actual CMS interior models, resulting in a large sample of plausible interior structures, and reducing the computation time by a factor of 10^5. Moreover, we used a DNN explainability algorithm to analyze the impact of the parameters setting the interior model on the predicted observables, providing information on their nonlinear relation.
Updated: 2024-05-15 10:55:16
Domains: astro-ph.EP,astro-ph.IM,cs.LG
3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations
Imitation learning provides an efficient way to teach robots dexterous skills; however, learning complex skills robustly and generalizably usually consumes large amounts of human demonstrations. To tackle this challenging problem, we present 3D Diffusion Policy (DP3), a novel visual imitation learning approach that incorporates the power of 3D visual representations into diffusion policies, a class of conditional action generative models. The core design of DP3 is the utilization of a compact 3D visual representation, extracted from sparse point clouds with an efficient point encoder. In our experiments involving 72 simulation tasks, DP3 successfully handles most tasks with just 10 demonstrations and surpasses baselines with a 24.2% relative improvement. In 4 real robot tasks, DP3 demonstrates precise control with a high success rate of 85%, given only 40 demonstrations of each task, and shows excellent generalization abilities in diverse aspects, including space, viewpoint, appearance, and instance. Interestingly, in real robot experiments, DP3 rarely violates safety requirements, in contrast to baseline methods which frequently do, necessitating human intervention. Our extensive evaluation highlights the critical importance of 3D representations in real-world robot learning. Videos, code, and data are available on https://3d-diffusion-policy.github.io .
Updated: 2024-05-15 10:47:43
Domains: cs.RO,cs.CV,cs.LG
Implicit meta-learning may lead language models to trust more reliable sources
We demonstrate that LLMs may learn indicators of document usefulness and modulate their updates accordingly. We introduce random strings ("tags") as indicators of usefulness in a synthetic fine-tuning dataset. Fine-tuning on this dataset leads to implicit meta-learning (IML): in further fine-tuning, the model updates to make more use of text that is tagged as useful. We perform a thorough empirical investigation of this phenomenon, finding (among other things) that (i) it occurs in both pretrained LLMs and those trained from scratch, as well as on a vision task, and (ii) larger models and smaller batch sizes tend to give more IML. We also use probing to examine how IML changes the way models store knowledge in their parameters. Finally, we reflect on what our results might imply about capabilities, risks, and controllability of future AI systems. Our code can be found at https://github.com/krasheninnikov/internalization.
Updated: 2024-05-15 10:47:28
Subjects: cs.LG,cs.AI
Reduce to the MACs -- Privacy Friendly Generic Probe Requests
Since the introduction of active discovery in Wi-Fi networks, users can be tracked via their probe requests. Although manufacturers typically try to conceal Media Access Control (MAC) addresses using MAC address randomisation, probe requests still contain Information Elements (IEs) that facilitate device identification. This paper introduces generic probe requests: By removing all unnecessary information from IEs, the requests become indistinguishable from one another, letting single devices disappear in the largest possible anonymity set. Conducting a comprehensive evaluation, we demonstrate that a large IE set contained within undirected probe requests does not necessarily imply fast connection establishment. Furthermore, we show that minimising IEs to nothing but Supported Rates would enable 82.55% of the devices to share the same anonymity set. Our contributions provide a significant advancement in the pursuit of robust privacy solutions for wireless networks, paving the way for more user anonymity and less surveillance in wireless communication ecosystems.
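The anonymity-set argument can be made concrete with a small sketch: devices whose probe requests carry an identical IE fingerprint collapse into one indistinguishable group. The devices and fingerprints below are hypothetical; only the grouping logic is the point:

```python
from collections import Counter

def anonymity_sets(fingerprints: dict) -> Counter:
    """Group devices by their IE fingerprint; each group of devices that
    emit byte-identical probe requests forms one anonymity set."""
    return Counter(fingerprints.values())

# Hypothetical devices: fingerprint = the set of IEs kept in probe requests.
devices = {
    "dev1": frozenset({"Supported Rates"}),
    "dev2": frozenset({"Supported Rates"}),
    "dev3": frozenset({"Supported Rates", "HT Capabilities"}),
    "dev4": frozenset({"Supported Rates"}),
}
sets_ = anonymity_sets(devices)
largest = max(sets_.values())
share = largest / len(devices)  # fraction of devices in the largest set
print(share)  # 0.75
```

Stripping IEs down to Supported Rates, as the paper proposes, grows the largest such set; the reported 82.55% is the measured analogue of `share` on real devices.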
Updated: 2024-05-15 10:18:30
Subjects: cs.CR,cs.NI
Unveiling Hallucination in Text, Image, Video, and Audio Foundation Models: A Comprehensive Review
The rapid advancement of foundation models (FMs) across language, image, audio, and video domains has shown remarkable capabilities in diverse tasks. However, the proliferation of FMs brings forth a critical challenge: the potential to generate hallucinated outputs, particularly in high-stakes applications. The tendency of foundation models to produce hallucinated content arguably represents the biggest hindrance to their widespread adoption in real-world scenarios, especially in domains where reliability and accuracy are paramount. This survey paper presents a comprehensive overview of recent developments that aim to identify and mitigate the problem of hallucination in FMs, spanning text, image, video, and audio modalities. By synthesizing recent advancements in detecting and mitigating hallucination across various modalities, the paper aims to provide valuable insights for researchers, developers, and practitioners. Essentially, it establishes a clear framework encompassing definition, taxonomy, and detection strategies for addressing hallucination in multimodal foundation models, laying the foundation for future research in this pivotal area.
Updated: 2024-05-15 10:16:25
Subjects: cs.LG,cs.AI,cs.CL,cs.CV,cs.SD,eess.AS
Perception-Inspired Graph Convolution for Music Understanding Tasks
We propose a new graph convolutional block, called MusGConv, specifically designed for the efficient processing of musical score data and motivated by general perceptual principles. It focuses on two fundamental dimensions of music, pitch and rhythm, and considers both relative and absolute representations of these components. We evaluate our approach on four different musical understanding problems: monophonic voice separation, harmonic analysis, cadence detection, and composer identification which, in abstract terms, translate to different graph learning problems, namely, node classification, link prediction, and graph classification. Our experiments demonstrate that MusGConv improves the performance on three of the aforementioned tasks while being conceptually very simple and efficient. We interpret this as evidence that it is beneficial to include perception-informed processing of fundamental musical concepts when developing graph network applications on musical score data.
Updated: 2024-05-15 10:04:44
Subjects: cs.SD,cs.AI,cs.LG,eess.AS
Word Alignment as Preference for Machine Translation
The problem of hallucination and omission, a long-standing problem in machine translation (MT), is more pronounced when a large language model (LLM) is used in MT because an LLM itself is susceptible to these phenomena. In this work, we mitigate the problem in an LLM-based MT model by guiding it to better word alignment. We first study the correlation between word alignment and the phenomena of hallucination and omission in MT. Then we propose to utilize word alignment as preference to optimize the LLM-based MT model. The preference data are constructed by selecting chosen and rejected translations from multiple MT tools. Subsequently, direct preference optimization is used to optimize the LLM-based model towards the preference signal. Given the absence of evaluators specifically designed for hallucination and omission in MT, we further propose selecting hard instances and utilizing GPT-4 to directly evaluate the performance of the models in mitigating these issues. We verify the rationality of these designed evaluation methods by experiments, followed by extensive results demonstrating the effectiveness of word alignment-based preference optimization to mitigate hallucination and omission.
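The preference-data construction (chosen and rejected translations selected from multiple MT tools) can be sketched as follows. The candidate format and the alignment scorer are assumptions for illustration, not the paper's exact pipeline:

```python
def build_preference_pairs(candidates: dict, align_score) -> list:
    """For each source sentence, pick the best- and worst-aligned candidate
    translations as the chosen/rejected pair for direct preference
    optimization (DPO)."""
    pairs = []
    for src, translations in candidates.items():
        ranked = sorted(translations, key=align_score, reverse=True)
        if len(ranked) >= 2 and align_score(ranked[0]) > align_score(ranked[-1]):
            pairs.append({"prompt": src,
                          "chosen": ranked[0],
                          "rejected": ranked[-1]})
    return pairs

# Hypothetical scorer: each candidate is (text, fraction of source words
# covered by the word alignment); low coverage suggests omission.
def align_score(cand):
    return cand[1]

cands = {"src A": [("t1", 0.9), ("t2", 0.4), ("t3", 0.7)]}
pairs = build_preference_pairs(cands, align_score)
assert pairs[0]["chosen"][0] == "t1" and pairs[0]["rejected"][0] == "t2"
```

DPO then pushes the model's likelihood toward `chosen` and away from `rejected`, which is how the alignment signal reaches the LLM.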
Updated: 2024-05-15 10:04:19
Subjects: cs.CL,cs.AI
Bridging the gap in online hate speech detection: a comparative analysis of BERT and traditional models for homophobic content identification on X/Twitter
Our study addresses a significant gap in online hate speech detection research by focusing on homophobia, an area often neglected in sentiment analysis research. Utilising advanced sentiment analysis models, particularly BERT, and traditional machine learning methods, we developed a nuanced approach to identify homophobic content on X/Twitter. This research is pivotal due to the persistent underrepresentation of homophobia in detection models. Our findings reveal that while BERT outperforms traditional methods, the choice of validation technique can impact model performance. This underscores the importance of contextual understanding in detecting nuanced hate speech. By releasing the largest open-source labelled English dataset for homophobia detection known to us, an analysis of various models' performance and our strongest BERT-based model, we aim to enhance online safety and inclusivity. Future work will extend to broader LGBTQIA+ hate speech detection, addressing the challenges of sourcing diverse datasets. Through this endeavour, we contribute to the larger effort against online hate, advocating for a more inclusive digital landscape. Our study not only offers insights into the effective detection of homophobic content by improving on previous research results, but it also lays groundwork for future advancements in hate speech analysis.
Updated: 2024-05-15 10:02:47
Subjects: cs.CL,cs.AI,cs.LG,H.5; I.2; J.5
ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models
In this paper, we present the findings of our Project ALPINE which stands for ``Autoregressive Learning for Planning In NEtworks." Project ALPINE initiates a theoretical investigation into the development of planning capabilities in Transformer-based language models through their autoregressive learning mechanisms, aiming to identify any potential limitations in their planning abilities. We abstract planning as a network path-finding task where the objective is to generate a valid path from a specified source node to a designated target node. In terms of expressiveness, we show that the Transformer is capable of executing path-finding by embedding the adjacency and reachability matrices within its weights. Our theoretical analysis of the gradient-based learning dynamics of the Transformer reveals that the Transformer is capable of learning both the adjacency matrix and a limited form of the reachability matrix. These theoretical insights are then validated through experiments, which demonstrate that the Transformer indeed learns the adjacency matrix and an incomplete reachability matrix, which aligns with the predictions made in our theoretical analysis. Additionally, when applying our methodology to a real-world planning benchmark, called Blocksworld, our observations remain consistent. Our theoretical and empirical analyses further unveil a potential limitation of the Transformer in path-finding: it cannot identify reachability relationships through transitivity, and thus would fail when path concatenation is needed to generate a path. In summary, our findings shed new light on how the internal mechanisms of autoregressive learning enable planning in networks. This study may contribute to our understanding of the general planning capabilities in other related domains.
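The transitivity limitation is easy to state concretely: a model that has captured only the adjacency matrix cannot answer reachability queries that require concatenating edges. A minimal sketch on a toy graph (not the paper's Transformer):

```python
# A tiny DAG: the path 0 -> 1 -> 2 exists, but (0, 2) is not an edge.
edges = {(0, 1), (1, 2)}
nodes = {0, 1, 2}

# "Knowledge" of a model that learned only the adjacency matrix.
adjacency = {(u, v): (u, v) in edges for u in nodes for v in nodes}

def reachable(u: int, v: int) -> bool:
    """True transitive reachability, computed by depth-first search."""
    stack, seen = [u], set()
    while stack:
        x = stack.pop()
        if x == v:
            return True
        if x in seen:
            continue
        seen.add(x)
        stack.extend(w for (s, w) in edges if s == x)
    return False

assert adjacency[(0, 1)] and adjacency[(1, 2)]
assert not adjacency[(0, 2)]   # adjacency alone says "no step from 0 to 2"
assert reachable(0, 2)         # but concatenating edges does reach 2
```

The pair (0, 2) is exactly the kind of query where, per the paper's analysis, a model holding the adjacency matrix and only an incomplete reachability matrix fails.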
Updated: 2024-05-15 09:59:37
Subjects: cs.LG,cs.AI,cs.CL
Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model
We introduce Xmodel-VLM, a cutting-edge multimodal vision language model. It is designed for efficient deployment on consumer GPU servers. Our work directly confronts a pivotal industry issue by grappling with the prohibitive service costs that hinder the broad adoption of large-scale multimodal systems. Through rigorous training, we have developed a 1B-scale language model from the ground up, employing the LLaVA paradigm for modal alignment. The result, which we call Xmodel-VLM, is a lightweight yet powerful multimodal vision language model. Extensive testing across numerous classic multimodal benchmarks has revealed that despite its smaller size and faster execution, Xmodel-VLM delivers performance comparable to that of larger models. Our model checkpoints and code are publicly available on GitHub at https://github.com/XiaoduoAILab/XmodelVLM.
Updated: 2024-05-15 09:47:59
Subjects: cs.CV,cs.AI
SOMTP: Self-Supervised Learning-Based Optimizer for MPC-Based Safe Trajectory Planning Problems in Robotics
Model Predictive Control (MPC)-based trajectory planning has been widely used in robotics, and incorporating Control Barrier Function (CBF) constraints into MPC can greatly improve its obstacle avoidance efficiency. Unfortunately, traditional optimizers are resource-consuming and slow to solve such non-convex constrained optimization problems (COPs), while learning-based methods struggle to satisfy the non-convex constraints. In this paper, we propose SOMTP, a self-supervised learning-based optimizer for CBF-MPC trajectory planning. Specifically, SOMTP first employs problem transcription to satisfy most of the constraints. It then applies the proposed differentiable SLPG correction, which moves the solution closer to the safe set and subsequently serves as the guide policy in the training process. After that, inspired by the Augmented Lagrangian Method (ALM), we propose a training algorithm integrated with guide-policy constraints that enables the optimizer network to converge to a feasible solution. Finally, experiments show that the proposed algorithm achieves better feasibility than other learning-based methods and provides solutions much faster than traditional optimizers with similar optimality.
Updated: 2024-05-15 09:38:52
Subjects: cs.RO,cs.LG
DDE-Find: Learning Delay Differential Equations from Noisy, Limited Data
Delay Differential Equations (DDEs) are a class of differential equations that can model diverse scientific phenomena. However, identifying the parameters, especially the time delay, that make a DDE's predictions match experimental results can be challenging. We introduce DDE-Find, a data-driven framework for learning a DDE's parameters, time delay, and initial condition function. DDE-Find uses an adjoint-based approach to efficiently compute the gradient of a loss function with respect to the model parameters. We motivate and rigorously prove an expression for the gradients of the loss using the adjoint. DDE-Find builds upon recent developments in learning DDEs from data and delivers the first complete framework for learning DDEs from data. Through a series of numerical experiments, we demonstrate that DDE-Find can learn DDEs from noisy, limited data.
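For readers unfamiliar with DDEs, a minimal forward solver clarifies what DDE-Find must fit: the derivative at time t depends on the state a delay τ in the past, read from a history function when t - τ < 0. The Euler "method of steps" sketch below is illustrative only and shows none of DDE-Find's adjoint machinery:

```python
import numpy as np

def solve_dde(f, history, tau: float, t_end: float, dt: float) -> np.ndarray:
    """Forward-Euler integration of x'(t) = f(x(t), x(t - tau)).

    The delayed state is read back from the trajectory computed so far,
    or from the history function for t - tau < 0.
    """
    n = int(round(t_end / dt))
    lag = int(round(tau / dt))
    x = np.empty(n + 1)
    x[0] = history(0.0)
    for k in range(n):
        delayed = x[k - lag] if k - lag >= 0 else history((k - lag) * dt)
        x[k + 1] = x[k] + dt * f(x[k], delayed)
    return x

# x'(t) = -x(t - 1) with history x(t) = 1 for t <= 0.  On [0, 1] the delayed
# term is the constant history, so the exact solution there is x(t) = 1 - t.
x = solve_dde(lambda x, xd: -xd, lambda t: 1.0, tau=1.0, t_end=1.0, dt=0.001)
assert abs(x[-1]) < 1e-6
```

DDE-Find's task is the inverse of this loop: given noisy samples of `x`, recover `f`'s parameters, `tau`, and the history function.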
Updated: 2024-05-15 09:34:20
Subjects: cs.LG
Generation of Granular-Balls for Clustering Based on the Principle of Justifiable Granularity
Efficient and robust data clustering remains a challenging task in the field of data analysis. Recent efforts have explored the integration of granular-ball (GB) computing with clustering algorithms to address this challenge, yielding promising results. However, existing methods for generating GBs often rely on single indicators to measure GB quality and employ threshold-based or greedy strategies, potentially leading to GBs that do not accurately capture the underlying data distribution. To address these limitations, this article introduces a novel GB generation method. The originality of this method lies in leveraging the principle of justifiable granularity to measure the quality of a GB for clustering tasks. To be precise, we define the coverage and specificity of a GB and introduce a comprehensive measure for assessing GB quality. Utilizing this quality measure, the method incorporates a binary tree pruning-based strategy and an anomaly detection method to determine the best combination of sub-GBs for each GB and identify abnormal GBs, respectively. Compared to previous GB generation methods, the new method maximizes the overall quality of generated GBs while ensuring alignment with the data distribution, thereby enhancing the rationality of the generated GBs. Experimental results obtained from both synthetic and publicly available datasets underscore the effectiveness of the proposed GB generation method, showcasing improvements in clustering accuracy and normalized mutual information.
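The coverage/specificity trade-off behind justifiable granularity can be sketched numerically: a ball should absorb many points (coverage) while staying tight (specificity). The functional forms below (fraction of points covered, linearly decaying specificity) are illustrative assumptions, not the paper's definitions:

```python
import numpy as np

def gb_quality(center, radius, data, r_max) -> float:
    """Illustrative quality of a granular ball: coverage * specificity.
    Coverage = fraction of points inside the ball; specificity shrinks
    linearly as the radius grows toward the data's extent r_max."""
    dists = np.linalg.norm(data - center, axis=1)
    coverage = np.mean(dists <= radius)
    specificity = 1.0 - radius / r_max
    return coverage * specificity

rng = np.random.default_rng(1)
data = rng.normal(size=(200, 2))
center = data.mean(axis=0)
r_max = np.linalg.norm(data - center, axis=1).max()

# A mid-sized ball should beat both a vanishing ball (no coverage) and
# the all-enclosing ball (no specificity) -- the balance the principle
# of justifiable granularity formalizes.
q_small = gb_quality(center, 1e-6, data, r_max)
q_big = gb_quality(center, r_max, data, r_max)
q_mid = gb_quality(center, 0.5 * r_max, data, r_max)
assert q_mid > q_small and q_mid > q_big
```

Maximizing such a composite measure, rather than thresholding a single indicator, is what lets the generated balls track the data distribution.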
Updated: 2024-05-15 09:29:58
Subjects: cs.LG
MolCRAFT: Structure-Based Drug Design in Continuous Parameter Space
Generative models for structure-based drug design (SBDD) have shown promising results in recent years. Existing works mainly focus on how to generate molecules with higher binding affinity, ignoring the feasibility prerequisites for generated 3D poses and resulting in false positives. We conduct thorough studies on key factors of ill-conformational problems when applying autoregressive methods and diffusion to SBDD, including mode collapse and hybrid continuous-discrete space. In this paper, we introduce MolCRAFT, the first SBDD model that operates in the continuous parameter space, together with a novel noise reduced sampling strategy. Empirical results show that our model consistently achieves superior performance in binding affinity with more stable 3D structure, demonstrating our ability to accurately model interatomic interactions. To our best knowledge, MolCRAFT is the first to achieve reference-level Vina Scores (-6.59 kcal/mol) with comparable molecular size, outperforming other strong baselines by a wide margin (-0.84 kcal/mol). Code is available at https://github.com/AlgoMole/MolCRAFT.
Updated: 2024-05-15 09:26:38
Subjects: q-bio.BM,cs.LG
Training Deep Learning Models with Hybrid Datasets for Robust Automatic Target Detection on real SAR images
In this work, we propose to tackle several challenges hindering the development of Automatic Target Detection (ATD) algorithms for ground targets in SAR images. To address the lack of representative training data, we propose a Deep Learning approach to train ATD models with synthetic target signatures produced with the MOCEM simulator. We define an incrustation pipeline to incorporate synthetic targets into real backgrounds. Using this hybrid dataset, we train ATD models specifically tailored to bridge the domain gap between synthetic and real data. Our approach notably relies on massive physics-based data augmentation techniques and Adversarial Training of two deep-learning detection architectures. We then test these models on several datasets, including (1) patchworks of real SAR images, (2) images with the incrustation of real targets in real backgrounds, and (3) images with the incrustation of synthetic background objects in real backgrounds. Results show that the produced hybrid datasets are exempt from image overlay bias. Our approach can reach up to 90% of Average Precision on real data while exclusively using synthetic targets for training.
Updated: 2024-05-15 09:26:24
Subjects: cs.CV,cs.AI,cs.LG,eess.SP
A first look into Utiq: Next-generation cookies at the ISP level
Online privacy has become increasingly important in recent years. While third-party cookies have been widely used for years, they have also been criticized for their potential impact on user privacy. They can be used by advertisers to track users across multiple sites, allowing them to build detailed profiles of their behavior and interests. However, nowadays, many browsers allow users to block third-party cookies, which limits their usefulness for advertisers. In this paper, we take a first look at Utiq, a new way of user tracking performed directly by the ISP, to substitute the third-party cookies used until now. We study the main properties of this new identification methodology and their adoption on the 10K most popular websites. Our results show that, although still marginal due to the restrictions imposed by the system, between 0.7% and 1.2% of websites already include Utiq as one of their user identification methods.
Updated: 2024-05-15 09:23:59
Subjects: cs.CR,cs.NI
Lens functions for exploring UMAP Projections with Domain Knowledge
Dimensionality reduction algorithms are often used to visualise high-dimensional data. Previously, studies have used prior information to enhance or suppress expected patterns in projections. In this paper, we adapt such techniques for domain knowledge guided interactive exploration. Inspired by Mapper and STAD, we present three types of lens functions for UMAP, a state-of-the-art dimensionality reduction algorithm. Lens functions enable analysts to adapt projections to their questions, revealing otherwise hidden patterns. They filter the modelled connectivity to explore the interaction between manually selected features and the data's structure, creating configurable perspectives each potentially revealing new insights. The effectiveness of the lens functions is demonstrated in two use cases and their computational cost is analysed in a synthetic benchmark. Our implementation is available in an open-source Python package: https://github.com/vda-lab/lensed_umap.
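A minimal sketch of the lens idea: filter the modelled connectivity so that only edges between points with similar lens values survive, which reshapes the projection around the analyst's chosen feature. The binning rule and toy graph below are hypothetical simplifications, not the paper's three lens types:

```python
def lens_filter(edges, lens: dict, n_bins: int, lo: float, hi: float):
    """Keep only edges whose endpoints fall in the same or an adjacent
    lens bin -- a sketch of filtering a modelled k-NN graph by a
    user-chosen lens function before computing the layout."""
    def bin_of(v):
        t = (lens[v] - lo) / (hi - lo)
        return min(int(t * n_bins), n_bins - 1)
    return [(u, v) for (u, v) in edges if abs(bin_of(u) - bin_of(v)) <= 1]

# Hypothetical graph: lens maps each node to a manually selected feature.
lens = {0: 0.1, 1: 0.15, 2: 0.9}
edges = [(0, 1), (0, 2), (1, 2)]
kept = lens_filter(edges, lens, n_bins=4, lo=0.0, hi=1.0)
assert kept == [(0, 1)]  # the edges spanning distant lens values are cut
```

Running the layout on the filtered graph then separates structures that the unconstrained projection would merge, which is the "configurable perspective" the abstract describes.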
Updated: 2024-05-15 09:23:21
Subjects: cs.LG,cs.CG,cs.HC
MINDE: Mutual Information Neural Diffusion Estimation
In this work we present a new method for the estimation of Mutual Information (MI) between random variables. Our approach is based on an original interpretation of the Girsanov theorem, which allows us to use score-based diffusion models to estimate the Kullback Leibler divergence between two densities as a difference between their score functions. As a by-product, our method also enables the estimation of the entropy of random variables. Armed with such building blocks, we present a general recipe to measure MI, which unfolds in two directions: one uses conditional diffusion process, whereas the other uses joint diffusion processes that allow simultaneous modelling of two random variables. Our results, which derive from a thorough experimental protocol over all the variants of our approach, indicate that our method is more accurate than the main alternatives from the literature, especially for challenging distributions. Furthermore, our methods pass MI self-consistency tests, including data processing and additivity under independence, which instead are a pain-point of existing methods.
Updated: 2024-05-15 09:21:41
Subjects: cs.LG,stat.ML
UniCorn: A Unified Contrastive Learning Approach for Multi-view Molecular Representation Learning
Recently, a noticeable trend has emerged in developing pre-trained foundation models in the domains of CV and NLP. For molecular pre-training, however, a universal model that applies effectively across the various categories of molecular tasks is still lacking, since existing prevalent pre-training methods are effective only for specific types of downstream tasks. Furthermore, the lack of a profound understanding of existing pre-training methods, including 2D graph masking, 2D-3D contrastive learning, and 3D denoising, hampers the advancement of molecular foundation models. In this work, we provide a unified comprehension of existing pre-training methods through the lens of contrastive learning: their differences lie in clustering different views of molecules, which we show benefits specific downstream tasks. To achieve a complete and general-purpose molecular representation, we propose a novel pre-training framework, named UniCorn, that inherits the merits of the three methods, depicting molecular views at three different levels. SOTA performance across quantum, physicochemical, and biological tasks, along with a comprehensive ablation study, validates the universality and effectiveness of UniCorn.
Updated: 2024-05-15 09:20:02
Subjects: q-bio.BM,cs.AI
Sliced-Wasserstein Estimation with Spherical Harmonics as Control Variates
The Sliced-Wasserstein (SW) distance between probability measures is defined as the average of the Wasserstein distances resulting for the associated one-dimensional projections. As a consequence, the SW distance can be written as an integral with respect to the uniform measure on the sphere and the Monte Carlo framework can be employed for calculating the SW distance. Spherical harmonics are polynomials on the sphere that form an orthonormal basis of the set of square-integrable functions on the sphere. Putting these two facts together, a new Monte Carlo method, hereby referred to as Spherical Harmonics Control Variates (SHCV), is proposed for approximating the SW distance using spherical harmonics as control variates. The resulting approach is shown to have good theoretical properties, e.g., a no-error property for Gaussian measures under a certain form of linear dependency between the variables. Moreover, an improved rate of convergence, compared to Monte Carlo, is established for general measures. The convergence analysis relies on the Lipschitz property associated to the SW integrand. Several numerical experiments demonstrate the superior performance of SHCV against state-of-the-art methods for SW distance computation.
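The plain Monte Carlo baseline that SHCV improves upon is straightforward to sketch: average one-dimensional Wasserstein distances over random directions on the sphere. The control-variate part is omitted here; this is only the vanilla estimator, for equal-size empirical measures:

```python
import numpy as np

def sliced_wasserstein(x, y, n_proj: int = 256, seed: int = 0) -> float:
    """Monte Carlo estimate of the 2-Sliced-Wasserstein distance between
    two equal-size point sets: average the squared 1-D W2 distances over
    random directions, then take the square root."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=(n_proj, x.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # uniform on sphere
    # 1-D W2 between equal-weight empirical measures: match sorted projections.
    px = np.sort(x @ theta.T, axis=0)
    py = np.sort(y @ theta.T, axis=0)
    return float(np.sqrt(np.mean((px - py) ** 2)))

rng = np.random.default_rng(2)
a = rng.normal(size=(100, 3))
assert sliced_wasserstein(a, a) < 1e-12        # identical sets: zero distance
b = a + np.array([5.0, 0.0, 0.0])              # a pure translation
assert sliced_wasserstein(a, b) > 1.0
```

SHCV keeps this integrand but subtracts spherical-harmonic control variates (whose sphere integrals are known) from it, reducing the variance of the average over `theta`.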
Updated: 2024-05-15 09:17:13
Subjects: stat.ML,cs.LG,65C05 (Primary) 65D30, 68Txx, 68Wxx (Secondary)
OccFeat: Self-supervised Occupancy Feature Prediction for Pretraining BEV Segmentation Networks
We introduce a self-supervised pretraining method, called OccFeat, for camera-only Bird's-Eye-View (BEV) segmentation networks. With OccFeat, we pretrain a BEV network via occupancy prediction and feature distillation tasks. Occupancy prediction provides a 3D geometric understanding of the scene to the model. However, the geometry learned is class-agnostic. Hence, we add semantic information to the model in the 3D space through distillation from a self-supervised pretrained image foundation model. Models pretrained with our method exhibit improved BEV semantic segmentation performance, particularly in low-data scenarios. Moreover, empirical results affirm the efficacy of integrating feature distillation with 3D occupancy prediction in our pretraining approach.
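The occupancy-prediction pretext task regresses a 3D occupancy target; such a target can be obtained by voxelizing a point cloud into a binary grid. The bounds, resolution, and NumPy implementation below are illustrative assumptions, not OccFeat's configuration:

```python
import numpy as np

def occupancy_grid(points: np.ndarray, lo: float, hi: float, res: int) -> np.ndarray:
    """Voxelize an (N, 3) point cloud into a (res, res, res) binary
    occupancy grid over the cube [lo, hi]^3 -- the kind of class-agnostic
    3D target an occupancy-prediction pretext task supervises."""
    idx = ((points - lo) / (hi - lo) * res).astype(int)
    idx = np.clip(idx, 0, res - 1)           # clamp boundary points inside
    grid = np.zeros((res, res, res), dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid

pts = np.array([[0.0, 0.0, 0.0], [0.9, 0.9, 0.9], [0.9, 0.9, 0.9]])
grid = occupancy_grid(pts, lo=0.0, hi=1.0, res=4)
assert grid.shape == (4, 4, 4)
assert grid.sum() == 2                       # duplicate points share a voxel
assert grid[0, 0, 0] and grid[3, 3, 3]
```

Such a grid says only *where* geometry is, not *what* it is; that class-agnostic gap is exactly what the feature-distillation task is added to fill.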
Updated: 2024-05-15 09:16:03
Subjects: cs.CV,cs.LG
A Unified Sequence Parallelism Approach for Long Context Generative AI
Sequence parallelism (SP), which divides the sequence dimension of input tensors across multiple computational devices, is becoming key to unlocking the long-context capabilities of generative AI models. This paper investigates the state-of-the-art SP approaches, i.e. DeepSpeed-Ulysses and Ring-Attention, and proposes a unified SP approach, which is more robust to transformer model architectures and network hardware topology. This paper compares the communication and memory cost of SP and existing parallelism, including data/tensor/zero/expert/pipeline parallelism, and discusses the best practices for designing hybrid 4D parallelism involving SP. We achieved 86% MFU on two 8xA800 nodes using SP for sequence length 208K for the LLAMA3-8B model. Our code is publicly available on \url{https://github.com/feifeibear/long-context-attention}.
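The key primitive in Ulysses-style sequence parallelism is an all-to-all that re-partitions activations from the sequence axis to the head axis, so each device can run full attention on a subset of heads; a second all-to-all restores sequence sharding afterwards. A NumPy simulation (no real communication; "devices" are list slots, and sizes are arbitrary):

```python
import numpy as np

def all_to_all(shards, n_dev: int, axis_split: int, axis_concat: int):
    """Simulated all-to-all: re-partition a tensor from one axis to another.
    shards[i] is device i's piece; returns the new per-device pieces."""
    out = []
    for j in range(n_dev):
        # Device j gathers chunk j of `axis_split` from every device and
        # concatenates the chunks back along `axis_concat`.
        pieces = [np.array_split(shards[i], n_dev, axis=axis_split)[j]
                  for i in range(n_dev)]
        out.append(np.concatenate(pieces, axis=axis_concat))
    return out

S, H, D, n_dev = 8, 4, 16, 2                       # seq, heads, dim, devices
x = np.arange(S * H * D, dtype=float).reshape(S, H, D)

seq_shards = np.array_split(x, n_dev, axis=0)      # SP: split the sequence
head_shards = all_to_all(seq_shards, n_dev, axis_split=1, axis_concat=0)

# After the exchange each device holds the FULL sequence for H/n_dev heads,
# which is what attention needs.
assert head_shards[0].shape == (S, H // n_dev, D)

# A second all-to-all restores the original sequence sharding.
back = all_to_all(head_shards, n_dev, axis_split=0, axis_concat=1)
assert all(np.array_equal(p, q) for p, q in zip(back, seq_shards))
```

Ring-Attention avoids the head-axis exchange entirely by circulating key/value blocks around the devices instead; the unified approach in the paper combines both regimes.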
Updated: 2024-05-15 09:12:55
Categories: cs.LG,cs.AI
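The basic mechanics the abstract describes, partitioning the sequence dimension of activations across devices and restoring it with a collective, can be sketched in a few lines of numpy. The two-device setup and the pointwise local op below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def shard_sequence(x, num_devices):
    """Split the sequence (first) dimension of a [seq, hidden] tensor
    into equal contiguous shards, one per device."""
    seq_len = x.shape[0]
    assert seq_len % num_devices == 0, "sequence must divide evenly"
    return np.split(x, num_devices, axis=0)

def local_then_gather(shards, local_fn):
    """Each 'device' applies a sequence-independent op to its shard
    (e.g. an MLP), then an all-gather restores the full sequence."""
    outs = [local_fn(s) for s in shards]
    return np.concatenate(outs, axis=0)

x = np.arange(8 * 4, dtype=float).reshape(8, 4)   # seq_len=8, hidden=4
shards = shard_sequence(x, num_devices=2)
y = local_then_gather(shards, lambda s: s * 2.0)  # pointwise op commutes with sharding
assert np.allclose(y, x * 2.0)
```

Attention is the step that needs cross-shard communication, which is precisely where DeepSpeed-Ulysses and Ring-Attention differ; the sketch only shows the sharding invariant itself.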
On the convergence of adaptive first order methods: proximal gradient and alternating minimization algorithms
Building upon recent works on linesearch-free adaptive proximal gradient methods, this paper proposes adaPG$^{q,r}$, a framework that unifies and extends existing results by providing larger stepsize policies and improved lower bounds. Different choices of the parameters $q$ and $r$ are discussed and the efficacy of the resulting methods is demonstrated through numerical simulations. In an attempt to better understand the underlying theory, its convergence is established in a more general setting that allows for time-varying parameters. Finally, an adaptive alternating minimization algorithm is presented by exploring the dual setting. This algorithm not only incorporates additional adaptivity, but also expands its applicability beyond standard strongly convex settings.
Updated: 2024-05-15 09:05:09
Categories: math.OC,cs.LG,65K05, 90C06, 90C25, 90C30, 90C47
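For readers new to the setting, a minimal fixed-stepsize proximal gradient iteration for the lasso, the baseline that adaPG$^{q,r}$'s adaptive stepsize policies improve upon, looks like this. The problem instance and stepsize choice are illustrative assumptions:

```python
import numpy as np

def soft_threshold(v, t):
    # prox of t * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def proximal_gradient_lasso(A, b, lam, step, iters=500):
    """Minimize 0.5*||Ax - b||^2 + lam*||x||_1 with a fixed stepsize
    (a baseline; adaPG replaces `step` with an adaptive, linesearch-free policy)."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)
        x = soft_threshold(x - step * grad, step * lam)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
x_true = np.zeros(10)
x_true[:3] = [1.0, -2.0, 0.5]
b = A @ x_true                          # noiseless synthetic observations
L = np.linalg.norm(A, 2) ** 2           # Lipschitz constant of the gradient
x_hat = proximal_gradient_lasso(A, b, lam=0.1, step=1.0 / L)
```

The fixed stepsize `1/L` requires knowing the global Lipschitz constant in advance; the appeal of adaptive variants is estimating usable (often larger) stepsizes on the fly.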
"I'm Not Sure, But...": Examining the Impact of Large Language Models' Uncertainty Expression on User Reliance and Trust
Widely deployed large language models (LLMs) can produce convincing yet incorrect outputs, potentially misleading users who may rely on them as if they were correct. To reduce such overreliance, there have been calls for LLMs to communicate their uncertainty to end users. However, there has been little empirical work examining how users perceive and act upon LLMs' expressions of uncertainty. We explore this question through a large-scale, pre-registered, human-subject experiment (N=404) in which participants answer medical questions with or without access to responses from a fictional LLM-infused search engine. Using both behavioral and self-reported measures, we examine how different natural language expressions of uncertainty impact participants' reliance, trust, and overall task performance. We find that first-person expressions (e.g., "I'm not sure, but...") decrease participants' confidence in the system and tendency to agree with the system's answers, while increasing participants' accuracy. An exploratory analysis suggests that this increase can be attributed to reduced (but not fully eliminated) overreliance on incorrect answers. While we observe similar effects for uncertainty expressed from a general perspective (e.g., "It's not clear, but..."), these effects are weaker and not statistically significant. Our findings suggest that using natural language expressions of uncertainty may be an effective approach for reducing overreliance on LLMs, but that the precise language used matters. This highlights the importance of user testing before deploying LLMs at scale.
Updated: 2024-05-15 09:04:54
Categories: cs.HC,cs.AI
QMedShield: A Novel Quantum Chaos-based Image Encryption Scheme for Secure Medical Image Storage in the Cloud
In the age of digital technology, medical images play a crucial role in the healthcare industry, aiding surgeons in making precise decisions and reducing diagnosis time. However, storing large amounts of these images in third-party cloud services raises privacy and security concerns. Many classical security mechanisms exist to protect them; however, the advent of quantum computing calls for the development of quantum-based encryption models for healthcare. Hence, we introduce a novel quantum chaos-based encryption scheme for medical images in this article. The model comprises bit-plane scrambling, a quantum logistic map, and quantum operations in the diffusion phase, and a hybrid chaotic map, DNA encoding, and computations in the confusion phase, to transform the plain medical image into a cipher medical image. The proposed scheme has been evaluated using multiple statistical measures and validated against attacks such as differential attacks on three different medical datasets. The introduced encryption model thus proves to be more attack-resistant and robust than other existing image encryption schemes, ensuring the secure storage of medical images in cloud environments.
Updated: 2024-05-15 08:56:16
Categories: cs.CR,cs.MM
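As a toy illustration of one ingredient the abstract names, bit-plane scrambling, the sketch below permutes the eight bit planes of an 8-bit image with an invertible key permutation. The quantum logistic map, DNA encoding, and the rest of the scheme are not reproduced here; the example key is an assumption for illustration only:

```python
import numpy as np

def bit_planes(img):
    """Decompose an 8-bit image into 8 binary bit planes (LSB first)."""
    return [((img >> k) & 1).astype(np.int32) for k in range(8)]

def scramble_bit_planes(img, perm):
    """Reassemble the image with its bit planes permuted by `perm`:
    output bit `dst` is taken from input bit `perm[dst]`. Invertible."""
    planes = bit_planes(img)
    out = np.zeros(img.shape, dtype=np.int32)
    for dst, src in enumerate(perm):
        out += planes[src] << dst   # distinct bit positions, so += acts as OR
    return out.astype(img.dtype)

img = np.array([[0, 255], [170, 85]], dtype=np.uint8)
perm = [7, 6, 5, 4, 3, 2, 1, 0]          # example key: reverse the bit order
enc = scramble_bit_planes(img, perm)     # 170 (10101010) becomes 85 (01010101)
inv = [perm.index(k) for k in range(8)]  # inverse permutation decrypts
dec = scramble_bit_planes(enc, inv)
assert np.array_equal(dec, img)
```

In a real scheme the permutation would be derived from a chaotic keystream rather than fixed, which is where the chaotic maps in the abstract come in.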
Advancing Explainable AI with Causal Analysis in Large-Scale Fuzzy Cognitive Maps
In the quest for accurate and interpretable AI models, eXplainable AI (XAI) has become crucial. Fuzzy Cognitive Maps (FCMs) stand out as an advanced XAI method because of their ability to synergistically combine and exploit both expert knowledge and data-driven insights, providing transparency and intrinsic interpretability. This letter introduces and investigates the "Total Causal Effect Calculation for FCMs" (TCEC-FCM) algorithm, an innovative approach that, for the first time, enables the efficient calculation of total causal effects among concepts in large-scale FCMs by leveraging binary search and graph traversal techniques, thereby overcoming the challenge of exhaustive causal path exploration that hinders existing methods. We evaluate the proposed method across various synthetic FCMs, demonstrating TCEC-FCM's superior performance over exhaustive methods and marking a significant advancement in causal effect analysis within FCMs, thus broadening their usability for modern complex XAI applications.
Updated: 2024-05-15 08:53:47
Categories: cs.AI
Transforming gradient-based techniques into interpretable methods
Explaining Convolutional Neural Networks (CNNs) through xAI techniques often poses challenges in interpretation. The inherent complexity of input features, notably pixels extracted from images, engenders complex correlations. Gradient-based methodologies, exemplified by Integrated Gradients (IG), effectively demonstrate the significance of these features. Nevertheless, converting these explanations into images frequently yields considerable noise. Here, we introduce GAD (Gradient Artificial Distancing) as a supportive framework for gradient-based techniques. Its primary objective is to accentuate influential regions by establishing distinctions between classes. The essence of GAD is to limit the scope of analysis during visualization and, consequently, reduce image noise. Empirical investigations involving occluded images have demonstrated that the regions identified through this methodology indeed play a pivotal role in facilitating class differentiation.
Updated: 2024-05-15 08:52:23
Categories: cs.CV,cs.AI,cs.LG
Cross-Input Certified Training for Universal Perturbations
Existing work in trustworthy machine learning primarily focuses on single-input adversarial perturbations. In many real-world attack scenarios, input-agnostic adversarial attacks, e.g. universal adversarial perturbations (UAPs), are much more feasible. Current certified training methods train models robust to single-input perturbations but achieve suboptimal clean and UAP accuracy, thereby limiting their applicability in practical applications. We propose a novel method, CITRUS, for certified training of networks robust against UAP attackers. We show in an extensive evaluation across different datasets, architectures, and perturbation magnitudes that our method outperforms traditional certified training methods on standard accuracy (up to 10.3\%) and achieves SOTA performance on the more practical certified UAP accuracy metric.
Updated: 2024-05-15 08:33:41
Categories: cs.LG,cs.CR
Hyperparameter Importance Analysis for Multi-Objective AutoML
Hyperparameter optimization plays a pivotal role in enhancing the predictive performance and generalization capabilities of ML models. However, in many applications, we do not only care about predictive performance but also about objectives such as inference time, memory, or energy consumption. In such multi-objective optimization (MOO) scenarios, determining the importance of hyperparameters poses a significant challenge due to the complex interplay between the conflicting objectives. In this paper, we propose the first method for assessing the importance of hyperparameters in the context of multi-objective hyperparameter optimization. Our approach leverages surrogate-based hyperparameter importance (HPI) measures, i.e., fANOVA and ablation paths, to provide insights into the impact of hyperparameters on the optimization objectives. Specifically, we compute the a-priori scalarization of the objectives and determine the importance of the hyperparameters for different objective tradeoffs. Through extensive empirical evaluations on diverse benchmark datasets with three different objectives paired with accuracy, namely time, demographic parity, and energy consumption, we demonstrate the effectiveness and robustness of our proposed method. Our findings not only offer valuable guidance for hyperparameter tuning in MOO tasks but also contribute to advancing the understanding of HPI in complex optimization scenarios.
Updated: 2024-05-15 08:32:56
Categories: cs.LG,cs.AI
Large Language Models can be Guided to Evade AI-Generated Text Detection
Large language models (LLMs) have shown remarkable performance in various tasks and have been extensively utilized by the public. However, the increasing concerns regarding the misuse of LLMs, such as plagiarism and spamming, have led to the development of multiple detectors, including fine-tuned classifiers and statistical methods. In this study, we equip LLMs with prompts, rather than relying on an external paraphraser, to evaluate the vulnerability of these detectors. We propose a novel Substitution-based In-Context example Optimization method (SICO) to automatically construct prompts for evading the detectors. SICO is cost-efficient as it requires only 40 human-written examples and a limited number of LLM inferences to generate a prompt. Moreover, once a task-specific prompt has been constructed, it can be universally used against a wide range of detectors. Extensive experiments across three real-world tasks demonstrate that SICO significantly outperforms the paraphraser baselines and enables GPT-3.5 to successfully evade six detectors, decreasing their AUC by 0.5 on average. Furthermore, a comprehensive human evaluation shows that the SICO-generated text achieves human-level readability and task completion rates, while preserving high imperceptibility. Finally, we propose an ensemble approach to enhance the robustness of detectors against SICO attack. The code is publicly available at https://github.com/ColinLu50/Evade-GPT-Detector.
Updated: 2024-05-15 08:00:09
Categories: cs.CL,cs.AI
Factual Serialization Enhancement: A Key Innovation for Chest X-ray Report Generation
The automation of writing imaging reports is a valuable tool for alleviating the workload of radiologists. Crucial steps in this process involve the cross-modal alignment between medical images and reports, as well as the retrieval of similar historical cases. However, the presence of presentation-style vocabulary (e.g., sentence structure and grammar) in reports poses challenges for cross-modal alignment. Additionally, existing methods for similar historical cases retrieval face suboptimal performance owing to the modal gap issue. In response, this paper introduces a novel method, named Factual Serialization Enhancement (FSE), for chest X-ray report generation. FSE begins with the structural entities approach to eliminate presentation-style vocabulary in reports, providing specific input for our model. Then, uni-modal features are learned through cross-modal alignment between images and factual serialization in reports. Subsequently, we present a novel approach to retrieve similar historical cases from the training set, leveraging aligned image features. These features implicitly preserve semantic similarity with their corresponding reference reports, enabling us to calculate similarity solely among aligned features. This effectively eliminates the modal gap issue for knowledge retrieval without the requirement for disease labels. Finally, the cross-modal fusion network is employed to query valuable information from these cases, enriching image features and aiding the text decoder in generating high-quality reports. Experiments on MIMIC-CXR and IU X-ray datasets from both specific and general scenarios demonstrate the superiority of FSE over state-of-the-art approaches in both natural language generation and clinical efficacy metrics.
Updated: 2024-05-15 07:56:38
Categories: eess.IV,cs.AI,cs.CV
A Semi-Automated Solution Approach Recommender for a Given Use Case: a Case Study for AI/ML in Oncology via Scopus and OpenAI
Nowadays, a literature review is a necessary task when trying to solve a given problem. However, an exhaustive literature review is very time-consuming in today's vast literature landscape. It can take weeks, even if looking only for abstracts or surveys. Moreover, choosing a method among others, and targeting searches within relevant problem and solution domains, are not easy tasks. This is especially true for young researchers or engineers starting to work in their field. Even if surveys that provide methods used to solve a specific problem already exist, an automatic way to do this for any use case is missing, especially for those who do not know the existing literature. Our proposed tool, SARBOLD-LLM, allows discovering and choosing among methods related to a given problem, providing additional information about their uses in the literature to derive decision-making insights, in only a few hours. The SARBOLD-LLM comprises three modules: (1: Scopus search) paper selection using a keyword selection scheme to query the Scopus API; (2: Scoring and method extraction) relevancy and popularity score calculation and solution method extraction from papers utilizing the OpenAI API (GPT 3.5); (3: Analyses) sensitivity analysis and post-analyses revealing trends, relevant papers, and methods. Compared to a manual ground truth using precision, recall, and F1-score metrics, the performance of SARBOLD-LLM on the AI-in-oncology case study is 0.68, 0.90, and 0.77, respectively. SARBOLD-LLM demonstrates successful outcomes across various domains, showcasing its robustness and effectiveness. The SARBOLD-LLM is aimed at engineers more than researchers, as it proposes methods and trends without adding pros and cons. It is a useful tool for selecting which methods to investigate first and comes as a complement to surveys. This can limit the global search and accumulation of knowledge for the end user. However...
Updated: 2024-05-15 07:46:58
Categories: cs.AI,cs.IR,cs.LG
Correlation Dimension of Natural Language in a Statistical Manifold
The correlation dimension of natural language is measured by applying the Grassberger-Procaccia algorithm to high-dimensional sequences produced by a large-scale language model. This method, previously studied only in a Euclidean space, is reformulated in a statistical manifold via the Fisher-Rao distance. Language exhibits a multifractal, with global self-similarity and a universal dimension around 6.5, which is smaller than those of simple discrete random sequences and larger than that of a Barabási-Albert process. Long memory is the key to producing self-similarity. Our method is applicable to any probabilistic model of real-world discrete sequences, and we show an application to music data.
Updated: 2024-05-15 07:46:01
Categories: cs.CL,cond-mat.stat-mech,cs.AI
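The Euclidean-space Grassberger-Procaccia estimator that the paper generalizes can be sketched as follows; the Fisher-Rao reformulation is not reproduced, and the sample size and radii below are illustrative assumptions:

```python
import numpy as np

def correlation_integral(points, r):
    """C(r): fraction of distinct point pairs closer than r (Euclidean)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    n = len(points)
    return (np.sum(d < r) - n) / (n * (n - 1))  # subtract self-pairs

def correlation_dimension(points, radii):
    """Grassberger-Procaccia: C(r) ~ r^D, so D is the slope of
    log C(r) versus log r over a scaling range of radii."""
    c = np.array([correlation_integral(points, r) for r in radii])
    slope, _ = np.polyfit(np.log(radii), np.log(c), 1)
    return slope

rng = np.random.default_rng(0)
pts = rng.uniform(size=(1500, 2))          # uniform in the unit square
radii = np.geomspace(0.02, 0.2, 8)
dim = correlation_dimension(pts, radii)    # close to 2 (edge effects bias it slightly low)
```

For a genuinely 2-dimensional point cloud the fitted slope comes out near 2; the paper's contribution is to run this estimate with Fisher-Rao distances between next-token distributions instead of Euclidean distances between points.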
Adapting Abstract Meaning Representation Parsing to the Clinical Narrative -- the SPRING THYME parser
This paper is dedicated to the design and evaluation of the first AMR parser tailored for clinical notes. Our objective was to facilitate the precise transformation of the clinical notes into structured AMR expressions, thereby enhancing the interpretability and usability of clinical text data at scale. Leveraging the colon cancer dataset from the Temporal Histories of Your Medical Events (THYME) corpus, we adapted a state-of-the-art AMR parser utilizing continuous training. Our approach incorporates data augmentation techniques to enhance the accuracy of AMR structure predictions. Notably, through this learning strategy, our parser achieved an impressive F1 score of 88% on the THYME corpus's colon cancer dataset. Moreover, our research delved into the efficacy of data required for domain adaptation within the realm of clinical notes, presenting domain adaptation data requirements for AMR parsing. This exploration not only underscores the parser's robust performance but also highlights its potential in facilitating a deeper understanding of clinical narratives through structured semantic representations.
Updated: 2024-05-15 07:32:43
Categories: cs.CL,cs.LG
Integrating DeepRL with Robust Low-Level Control in Robotic Manipulators for Non-Repetitive Reaching Tasks
In robotics, contemporary strategies are learning-based, characterized by a complex black-box nature and a lack of interpretability, which may pose challenges in ensuring stability and safety. To address these issues, we propose integrating a collision-free trajectory planner based on deep reinforcement learning (DRL) with a novel auto-tuning low-level control strategy, all while actively engaging in the learning phase through interactions with the environment. This approach circumvents the control performance and complexities associated with computations while addressing nonrepetitive reaching tasks in the presence of obstacles. First, a model-free DRL agent is employed to plan velocity-bounded motion for a manipulator with 'n' degrees of freedom (DoF), ensuring collision avoidance for the end-effector through joint-level reasoning. The generated reference motion is then input into a robust subsystem-based adaptive controller, which produces the necessary torques, while the cuckoo search optimization (CSO) algorithm enhances control gains to minimize the stabilization and tracking error in the steady state. This approach guarantees robustness and uniform exponential convergence in an unfamiliar environment, despite the presence of uncertainties and disturbances. Theoretical assertions are validated through the presentation of simulation outcomes.
Updated: 2024-05-15 07:31:35
Categories: cs.RO,cs.LG,cs.SY,eess.SY
An Embarrassingly Simple Approach to Enhance Transformer Performance in Genomic Selection for Crop Breeding
Genomic selection (GS), as a critical crop breeding strategy, plays a key role in enhancing food production and addressing the global hunger crisis. The predominant approaches in GS currently revolve around employing statistical methods for prediction. However, statistical methods often come with two main limitations: strong statistical priors and linear assumptions. A recent trend is to capture the non-linear relationships between markers by deep learning. However, as crop datasets are commonly long sequences with limited samples, the robustness of deep learning models, especially Transformers, remains a challenge. In this work, to unleash the unexplored potential of attention mechanism for the task of interest, we propose a simple yet effective Transformer-based framework that enables end-to-end training of the whole sequence. Via experiments on rice3k and wheat3k datasets, we show that, with simple tricks such as k-mer tokenization and random masking, Transformer can achieve overall superior performance against seminal methods on GS tasks of interest.
Updated: 2024-05-15 07:31:06
Categories: cs.LG,cs.AI
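The two tricks the abstract names, k-mer tokenization and random masking, are simple to sketch. The mask rate, mask symbol, and seed below are illustrative assumptions, not the paper's settings:

```python
import random

def kmer_tokenize(seq, k=3):
    """Split a genomic string into overlapping k-mer tokens."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def random_mask(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """Replace a random subset of tokens with a mask symbol,
    as in masked-language-model-style pretraining."""
    rng = random.Random(seed)
    return [mask_token if rng.random() < mask_prob else t for t in tokens]

tokens = kmer_tokenize("ACGTAC", k=3)   # -> ['ACG', 'CGT', 'GTA', 'TAC']
masked = random_mask(tokens, mask_prob=0.5, seed=1)
assert tokens == ["ACG", "CGT", "GTA", "TAC"]
```

Overlapping k-mers keep local marker context in each token, which is one plausible reason this tokenization helps a Transformer on long, low-sample genomic sequences.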
Optimal Multi-Distribution Learning
Multi-distribution learning (MDL), which seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions, has emerged as a unified framework in response to the evolving demand for robustness, fairness, multi-group collaboration, etc. Achieving data-efficient MDL necessitates adaptive sampling, also called on-demand sampling, throughout the learning process. However, there exist substantial gaps between the state-of-the-art upper and lower bounds on the optimal sample complexity. Focusing on a hypothesis class of Vapnik-Chervonenkis (VC) dimension $d$, we propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$ (modulo some logarithmic factor), matching the best-known lower bound. Our algorithmic ideas and theory are further extended to accommodate Rademacher classes. The proposed algorithms are oracle-efficient, accessing the hypothesis class solely through an empirical risk minimization oracle. Additionally, we establish the necessity of randomization, revealing a large sample size barrier when only deterministic hypotheses are permitted. These findings resolve three open problems presented at COLT 2023 (Problems 1, 3 and 4 of Awasthi et al., 2023).
Updated: 2024-05-15 07:29:44
Categories: cs.LG,stat.ML
Revisiting the Role of Language Priors in Vision-Language Models
Vision-language models (VLMs) are impactful in part because they can be applied to a variety of visual understanding tasks in a zero-shot fashion, without any fine-tuning. We study $\textit{generative VLMs}$ that are trained for next-word generation given an image. We explore their zero-shot performance on the illustrative task of image-text retrieval across 8 popular vision-language benchmarks. Our first observation is that they can be repurposed for discriminative tasks (such as image-text retrieval) by simply computing the match score of generating a particular text string given an image. We call this probabilistic score the $\textit{Visual Generative Pre-Training Score}$ (VisualGPTScore). While the VisualGPTScore produces near-perfect accuracy on some retrieval benchmarks, it yields poor accuracy on others. We analyze this behavior through a probabilistic lens, pointing out that some benchmarks inadvertently capture unnatural language distributions by creating adversarial but unlikely text captions. In fact, we demonstrate that even a "blind" language model that ignores any image evidence can sometimes outperform all prior art, reminiscent of similar challenges faced by the visual-question answering (VQA) community many years ago. We derive a probabilistic post-processing scheme that controls for the amount of linguistic bias in generative VLMs at test time without having to retrain or fine-tune the model. We show that the VisualGPTScore, when appropriately debiased, is a strong zero-shot baseline for vision-language understanding, oftentimes producing state-of-the-art accuracy.
Updated: 2024-05-15 07:15:05
Categories: cs.CV,cs.AI,cs.CL
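The abstract's match score is the log-probability of generating a caption given an image. The toy stand-in below fabricates a tiny conditional token distribution purely to illustrate ranking candidate captions by this generative score; nothing here reflects the actual VLM or its probabilities:

```python
import math

def visual_gpt_score(caption_tokens, next_token_prob):
    """Sum of log P(token_t | image, tokens_<t): the generative match
    score the abstract calls the VisualGPTScore (toy stand-in model)."""
    score = 0.0
    for t, tok in enumerate(caption_tokens):
        score += math.log(next_token_prob(caption_tokens[:t], tok))
    return score

# Fabricated 'VLM': conditioned on some image, 'a cat' is likelier than 'a dog'.
probs = {
    ((), "a"): 0.9,
    (("a",), "cat"): 0.8,
    (("a",), "dog"): 0.1,
}
model = lambda prefix, tok: probs.get((tuple(prefix), tok), 1e-6)

s_cat = visual_gpt_score(["a", "cat"], model)
s_dog = visual_gpt_score(["a", "dog"], model)
assert s_cat > s_dog   # retrieval: rank candidate captions by generative score
```

The paper's debiasing step then divides out a language-only prior P(text), which this sketch omits; without it, a caption can score highly just for being linguistically likely.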
PPFlow: Target-aware Peptide Design with Torsional Flow Matching
Therapeutic peptides have proven to have great pharmaceutical value and potential in recent decades. However, methods of AI-assisted peptide drug discovery are not fully explored. To fill the gap, we propose a target-aware peptide design method called \textsc{PPFlow}, based on conditional flow matching on torus manifolds, to model the internal geometries of torsion angles for the peptide structure design. Besides, we establish a protein-peptide binding dataset named PPBench2024 to fill the void of massive data for the task of structure-based peptide drug design and to allow the training of deep learning methods. Extensive experiments show that PPFlow reaches state-of-the-art performance in tasks of peptide drug generation and optimization in comparison with baseline models, and can be generalized to other tasks including docking and side-chain packing.
Updated: 2024-05-15 07:09:35
Categories: q-bio.BM,cs.AI,cs.LG
Nonparametric regression using over-parameterized shallow ReLU neural networks
It is shown that over-parameterized neural networks can achieve minimax optimal rates of convergence (up to logarithmic factors) for learning functions from certain smooth function classes, if the weights are suitably constrained or regularized. Specifically, we consider the nonparametric regression of estimating an unknown $d$-variate function by using shallow ReLU neural networks. It is assumed that the regression function is from the Hölder space with smoothness $\alpha<(d+3)/2$ or a variation space corresponding to shallow neural networks, which can be viewed as an infinitely wide neural network. In this setting, we prove that least squares estimators based on shallow neural networks with certain norm constraints on the weights are minimax optimal, if the network width is sufficiently large. As a byproduct, we derive a new size-independent bound for the local Rademacher complexity of shallow ReLU neural networks, which may be of independent interest.
Updated: 2024-05-15 07:05:06
领域: stat.ML,cs.LG,math.ST,stat.TH
Continued Pretraining for Domain Adaptation of Wav2vec2.0 in Automatic Speech Recognition for Elementary Math Classroom Settings
Creating Automatic Speech Recognition (ASR) systems that are robust and resilient to classroom conditions is paramount to the development of AI tools that aid teachers and students. In this work, we study the efficacy of continued pretraining (CPT) in adapting Wav2vec2.0 to the classroom domain. We show that CPT is a powerful tool in that regard and reduces the Word Error Rate (WER) of Wav2vec2.0-based models by upwards of 10%. More specifically, CPT improves the model's robustness to different noise conditions, microphones, and classroom conditions, as well as to classroom demographics. Our CPT models show an improved ability to generalize to demographics unseen in the labeled finetuning data.
Updated: 2024-05-15 06:59:33
Categories: cs.CL,cs.AI,eess.AS
Overcoming Domain Drift in Online Continual Learning
Online Continual Learning (OCL) empowers machine learning models to acquire new knowledge online across a sequence of tasks. However, OCL faces a significant challenge: catastrophic forgetting, wherein the model learned on previous tasks is substantially overwritten upon encountering new tasks, leading to a biased forgetting of prior knowledge. Moreover, the continual domain drift in sequential learning tasks may entail the gradual displacement of decision boundaries in the learned feature space, rendering the learned knowledge susceptible to forgetting. To address these problems, in this paper we propose a novel rehearsal strategy, termed Drift-Reducing Rehearsal (DRR), to anchor the domain of old tasks and reduce negative transfer effects. First, we propose to select more representative samples for memory, guided by centroids constructed in the data stream. Then, to keep the model from domain chaos under drift, a two-level angular cross-task Contrastive Margin Loss (CML) is proposed to encourage intra-class and intra-task compactness and increase inter-class and inter-task discrepancy. Finally, to further suppress continual domain drift, we present an optional Centroid Distillation Loss (CDL) on the rehearsal memory to anchor the knowledge in feature space for each previous task. Extensive experimental results on four benchmark datasets validate that the proposed DRR effectively mitigates continual domain drift and achieves state-of-the-art (SOTA) performance in OCL.
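The optional centroid distillation idea can be pictured with a small numeric sketch; the 2-D features, labels, and centroids below are hypothetical illustrations, not values from the paper:

```python
import numpy as np

def centroid_distillation_loss(features, labels, centroids):
    # Pull each rehearsal feature toward the stored centroid of its class
    # from the old task, anchoring the learned feature space against drift.
    diffs = features - centroids[labels]
    return np.mean(np.sum(diffs ** 2, axis=1))

# Hypothetical 2-D feature space with two old-task classes.
centroids = np.array([[0.0, 0.0], [1.0, 1.0]])
feats = np.array([[0.1, -0.1], [0.9, 1.2]])
labels = np.array([0, 1])
cdl = centroid_distillation_loss(feats, labels, centroids)  # 0.035
```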
Updated: 2024-05-15 06:57:18
Categories: cs.LG
HAAP: Vision-context Hierarchical Attention Autoregressive with Adaptive Permutation for Scene Text Recognition
Internal Language Model (LM)-based methods use permutation language modeling (PLM) to address the error-correction problem that conditional independence causes in external LM-based methods. However, randomly chosen permutations introduce fitting oscillations during model training, and the Iterative Refinement (IR) operation used to improve multimodal information decoupling introduces additional overhead. To address these issues, this paper proposes the Hierarchical Attention Autoregressive model with Adaptive Permutation (HAAP), which enhances the location-context-image interaction capability and improves autoregressive generalization with an internal LM. First, we propose Implicit Permutation Neurons (IPN) that generate adaptive attention masks to dynamically exploit token dependencies. The adaptive masks increase the diversity of the training data and prevent the model from depending on a specific order, reducing the training overhead of PLM while avoiding fitting oscillations. Second, we develop a Cross-modal Hierarchical Attention mechanism (CHA) to couple context and image features. This establishes rich positional-semantic dependencies between context and image while avoiding IR. Extensive experimental results show that the proposed HAAP achieves state-of-the-art (SOTA) performance in terms of accuracy, complexity, and latency on several datasets.
Updated: 2024-05-15 06:41:43
标题: HAAP: 带有自适应排列的视觉上下文层次化注意力自回归模型用于场景文本识别
摘要: 基于内部语言模型(LM)的方法使用置换语言建模(PLM)来解决外部LM方法中由条件独立性引起的错误校正问题。然而,人为干扰的随机置换会导致模型训练中的拟合振荡,而用于改善多模态信息解耦的迭代细化(IR)操作也会引入额外的开销。为解决这些问题,本文提出了具有自适应置换的分层注意力自回归模型(HAAP),以增强位置-上下文-图像交互能力,提高内部LM的自回归泛化能力。首先,我们提出了隐式置换神经元(IPN)来生成自适应注意力掩模,动态利用标记依赖关系。自适应掩模增加了训练数据的多样性,防止模型对特定顺序的依赖。它降低了PLM的训练开销,同时避免了训练拟合振荡。其次,我们开发了跨模态分层注意力机制(CHA)来耦合上下文和图像特征。这种处理建立了上下文和图像之间丰富的位置语义依赖关系,同时避免了IR。大量实验结果显示,所提出的HAAP在准确性、复杂性和延迟方面在多个数据集上实现了最先进的性能。
更新时间: 2024-05-15 06:41:43
领域: cs.CV,cs.AI,68T01,I.2.10
Classification by sparse generalized additive models
We consider (nonparametric) sparse (generalized) additive models (SpAM) for classification. The design of a SpAM classifier is based on minimizing the logistic loss with sparse group Lasso/Slope-type penalties on the coefficients of univariate additive components' expansions in orthonormal series (e.g., Fourier or wavelets). The resulting classifier is inherently adaptive to the unknown sparsity and smoothness. We show that under a certain sparse group restricted eigenvalue condition it is nearly minimax (up to log-factors) simultaneously across the entire range of analytic, Sobolev and Besov classes. The performance of the proposed classifier is illustrated on simulated and real-data examples.
Updated: 2024-05-15 06:33:03
Categories: math.ST,cs.LG,stat.ME,stat.TH
Easy attention: A simple attention mechanism for temporal predictions with transformers
To improve the robustness of transformer neural networks used for temporal-dynamics prediction of chaotic systems, we propose a novel attention mechanism called easy attention, which we demonstrate on time-series reconstruction and prediction. Whereas standard self-attention relies on the inner product of queries and keys, we demonstrate that the keys, queries, and softmax are not necessary for obtaining the attention scores required to capture long-term dependencies in temporal sequences. Through singular-value decomposition (SVD) of the softmax attention scores, we further observe that self-attention compresses the contributions from both queries and keys in the space spanned by the attention scores. Our proposed easy-attention method therefore treats the attention scores directly as learnable parameters. This approach produces excellent results when reconstructing and predicting the temporal dynamics of chaotic systems, exhibiting more robustness and less complexity than self-attention or the widely used long short-term memory (LSTM) network. We show the improved performance of the easy-attention method on the Lorenz system, a turbulent shear flow and a model of a nuclear reactor.
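The core idea above, replacing the query-key-softmax score computation with a directly learnable score matrix, can be sketched in a few lines; the sequence length, dimensions, and initialization below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 16, 8   # sequence length and feature dimension (hypothetical)

# Standard self-attention computes softmax(Q K^T / sqrt(d)) V. Easy attention
# instead treats the T x T score matrix itself as a learnable parameter:
# no queries, no keys, no softmax.
alpha = rng.normal(scale=0.1, size=(T, T))  # learnable attention scores
W_v = rng.normal(scale=0.1, size=(d, d))    # value projection

def easy_attention(x):
    return alpha @ (x @ W_v)

x = rng.normal(size=(T, d))
out = easy_attention(x)
```

In training, `alpha` and `W_v` would be updated by backpropagation like any other weight matrices.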
Updated: 2024-05-15 06:32:46
Categories: cs.LG
Standard Gaussian Process Can Be Excellent for High-Dimensional Bayesian Optimization
There has been a long-standing and widespread belief that Bayesian Optimization (BO) with a standard Gaussian process (GP), referred to as standard BO, is ineffective in high-dimensional optimization problems. While this belief sounds reasonable, strong empirical evidence is lacking. In this paper, we systematically investigated BO with standard GP regression across a variety of synthetic and real-world benchmark problems for high-dimensional optimization. We found that, surprisingly, when using Mat\'ern kernels and the Upper Confidence Bound (UCB), standard BO consistently achieves top-tier performance, often outperforming other BO methods specifically designed for high-dimensional optimization. Contrary to the stereotype, we found that a standard GP equipped with Mat\'ern kernels can serve as a capable surrogate for learning high-dimensional functions. Without strong structural assumptions, BO with a standard GP not only excels in high-dimensional optimization but is also robust in accommodating various structures within target functions. Furthermore, with a standard GP, promising optimization performance can be achieved via maximum a posteriori (MAP) estimation with diffuse priors or merely maximum likelihood estimation, eliminating the need for the expensive Markov chain Monte Carlo (MCMC) sampling that more complex surrogate models might require. In parallel, we also investigated and analyzed alternative popular settings for running standard BO, which, however, often fail in high-dimensional optimization; this may be linked to the few failure cases reported in the literature. We thus advocate for a re-evaluation and in-depth study of the potential of standard BO in addressing high-dimensional problems.
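A self-contained toy sketch of the recipe the abstract reports working well, a GP surrogate with a Mat\'ern-5/2 kernel and a UCB acquisition rule, is given below. The one-dimensional objective, candidate grid, jitter, and beta = 2 are illustrative assumptions, not the paper's experimental setup:

```python
import numpy as np

def matern52(X1, X2, ls=0.5):
    # Matern-5/2 kernel on scalars (the length scale is an illustrative choice).
    s = np.sqrt(5.0) * np.abs(X1[:, None] - X2[None, :]) / ls
    return (1.0 + s + s**2 / 3.0) * np.exp(-s)

def gp_posterior(Xtr, ytr, Xte, jitter=1e-6):
    K = matern52(Xtr, Xtr) + jitter * np.eye(len(Xtr))
    Ks = matern52(Xtr, Xte)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, ytr))
    v = np.linalg.solve(L, Ks)
    mu = Ks.T @ alpha
    var = np.maximum(matern52(Xte, Xte).diagonal() - np.sum(v**2, axis=0), 1e-12)
    return mu, np.sqrt(var)

f = lambda x: -(x - 0.3) ** 2                 # toy objective, maximum at 0.3
grid = np.linspace(0.0, 1.0, 201)
X = np.array([0.0, 1.0])                      # two initial design points
y = f(X)
for _ in range(10):
    mu, sd = gp_posterior(X, y, grid)
    x_next = grid[np.argmax(mu + 2.0 * sd)]   # UCB acquisition, beta = 2
    X, y = np.append(X, x_next), np.append(y, f(x_next))
best_x = X[np.argmax(y)]
```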
Updated: 2024-05-15 06:31:21
Categories: cs.LG,stat.ML
BonnBot-I Plus: A Bio-diversity Aware Precise Weed Management Robotic Platform
In this article, we focus on the critical tasks of plant protection in arable farms, addressing a modern challenge in agriculture: integrating ecological considerations into the operational strategy of precision weeding robots like BonnBot-I. This article presents recent advancements in weed management algorithms and the real-world performance of BonnBot-I at the University of Bonn's Klein-Altendorf campus. We present a novel Rolling-view observation model for BonnBot-I's weed monitoring section, which leads to an average absolute weeding-performance enhancement of $3.4\%$. Furthermore, for the first time, we show how precision weeding robots can take bio-diversity-aware concerns into account in challenging weeding scenarios. We carried out comprehensive weeding experiments in sugar-beet fields, covering both weed-only and mixed crop-weed situations, and introduced a new dataset compatible with precision weeding. Our real-field experiments revealed that our weeding approach is capable of handling diverse weed distributions, with a minimal loss of only $11.66\%$ attributable to intervention planning and $14.7\%$ to vision-system limitations, highlighting required improvements of the vision system.
Updated: 2024-05-15 06:23:59
Categories: cs.RO,cs.AI,cs.LG,cs.MA
An AI System Evaluation Framework for Advancing AI Safety: Terminology, Taxonomy, Lifecycle Mapping
The advent of advanced AI underscores the urgent need for comprehensive safety evaluations, necessitating collaboration across communities (i.e., AI, software engineering, and governance). However, divergent practices and terminologies across these communities, combined with the complexity of AI systems (of which models are only a part) and environmental affordances (e.g., access to tools), obstruct effective communication and comprehensive evaluation. This paper proposes a framework for AI system evaluation comprising three components: 1) harmonised terminology to facilitate communication across communities involved in AI safety evaluation; 2) a taxonomy identifying essential elements for AI system evaluation; 3) a mapping between the AI lifecycle, stakeholders, and requisite evaluations for an accountable AI supply chain. This framework catalyses a deeper discourse on AI system evaluation beyond model-centric approaches.
Updated: 2024-05-15 06:19:04
Categories: cs.SE,cs.AI,cs.CY,cs.LG
Efficient LLM Jailbreak via Adaptive Dense-to-sparse Constrained Optimization
Recent research indicates that large language models (LLMs) are susceptible to jailbreaking attacks that can generate harmful content. This paper introduces a novel token-level attack method, Adaptive Dense-to-Sparse Constrained Optimization (ADC), which effectively jailbreaks several open-source LLMs. Our approach relaxes the discrete jailbreak optimization into a continuous optimization and progressively increases the sparsity of the optimizing vectors. Consequently, our method effectively bridges the gap between discrete- and continuous-space optimization. Experimental results demonstrate that our method is more effective and efficient than existing token-level methods. On Harmbench, our method achieves the state-of-the-art attack success rate on seven out of eight LLMs. Code will be made available. Trigger Warning: This paper contains model behavior that can be offensive in nature.
Updated: 2024-05-15 06:11:24
Categories: cs.LG
CarDreamer: Open-Source Learning Platform for World Model based Autonomous Driving
To safely navigate intricate real-world scenarios, autonomous vehicles must be able to adapt to diverse road conditions and anticipate future events. World model (WM) based reinforcement learning (RL) has emerged as a promising approach by learning and predicting the complex dynamics of various environments. Nevertheless, to the best of our knowledge, there does not exist an accessible platform for training and testing such algorithms in sophisticated driving environments. To fill this void, we introduce CarDreamer, the first open-source learning platform designed specifically for developing WM based autonomous driving algorithms. It comprises three key components: 1) World model backbone: CarDreamer has integrated some state-of-the-art WMs, which simplifies the reproduction of RL algorithms. The backbone is decoupled from the rest and communicates using the standard Gym interface, so that users can easily integrate and test their own algorithms. 2) Built-in tasks: CarDreamer offers a comprehensive set of highly configurable driving tasks which are compatible with Gym interfaces and are equipped with empirically optimized reward functions. 3) Task development suite: This suite streamlines the creation of driving tasks, enabling easy definition of traffic flows and vehicle routes, along with automatic collection of multi-modal observation data. A visualization server allows users to trace real-time agent driving videos and performance metrics through a browser. Furthermore, we conduct extensive experiments using built-in tasks to evaluate the performance and potential of WMs in autonomous driving. Thanks to the richness and flexibility of CarDreamer, we also systematically study the impact of observation modality, observability, and sharing of vehicle intentions on AV safety and efficiency. All code and documents are accessible on https://github.com/ucd-dare/CarDreamer.
Updated: 2024-05-15 05:57:20
Categories: cs.RO,cs.AI
POWQMIX: Weighted Value Factorization with Potentially Optimal Joint Actions Recognition for Cooperative Multi-Agent Reinforcement Learning
Value function factorization methods are commonly used in cooperative multi-agent reinforcement learning, with QMIX receiving significant attention. Many QMIX-based methods introduce monotonicity constraints between the joint action value and individual action values to achieve decentralized execution. However, such constraints limit the representation capacity of value factorization, restricting the joint action values it can represent and hindering the learning of the optimal policy. To address this challenge, we propose the Potentially Optimal joint actions Weighted QMIX (POWQMIX) algorithm, which recognizes the potentially optimal joint actions and assigns higher weights to the corresponding losses of these joint actions during training. We theoretically prove that with such a weighted training approach the optimal policy is guaranteed to be recovered. Experiments in matrix games, predator-prey, and StarCraft II Multi-Agent Challenge environments demonstrate that our algorithm outperforms the state-of-the-art value-based multi-agent reinforcement learning methods.
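The weighting idea at the heart of the abstract can be sketched numerically; the Q-values, mask, and weight below are hypothetical illustrations, not the paper's implementation:

```python
import numpy as np

def powqmix_weighted_loss(q_joint, q_target, potentially_optimal, w_opt=5.0):
    # Joint actions recognised as potentially optimal get a larger weight
    # w_opt on their TD losses; all other joint actions get weight 1.
    weights = np.where(potentially_optimal, w_opt, 1.0)
    return np.mean(weights * (q_joint - q_target) ** 2)

# Three hypothetical joint actions; only the first is flagged as
# potentially optimal by the recogniser.
q_joint = np.array([1.0, 0.0, 0.5])
q_target = np.array([2.0, 0.0, 0.5])
mask = np.array([True, False, False])
loss = powqmix_weighted_loss(q_joint, q_target, mask)  # (5*1 + 0 + 0) / 3
```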
Updated: 2024-05-15 05:53:23
Categories: cs.LG,cs.AI
Motion Prediction with Gaussian Processes for Safe Human-Robot Interaction in Virtual Environments
Humans use collaborative robots as tools for accomplishing various tasks. The interaction between humans and robots happens in tight, shared workspaces. However, these machines must be safe to operate alongside humans to minimize the risk of accidental collisions. Ensuring safety imposes many constraints, such as reduced torque and velocity limits during operation, thus increasing the time needed to accomplish many tasks. However, for applications such as using collaborative robots as haptic interfaces with intermittent contacts in virtual reality, speed limitations result in poor user experiences. This research aims to improve the efficiency of a collaborative robot while improving the safety of the human user. We used Gaussian process models to predict human hand motion and developed strategies for human intention detection based on hand motion and gaze, in order to reduce task time for the robot and improve human safety in a virtual environment. We then studied the effect of prediction. Comparisons show that the prediction models improved the robot time by 3\% and safety by 17\%. When used alongside gaze, prediction with Gaussian process models improved the robot time by 2\% and safety by 13\%.
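A minimal sketch of GP-based motion prediction of the kind described above, assuming a one-dimensional hand trajectory, a squared-exponential kernel, and synthetic observations (all hypothetical choices, not the paper's setup):

```python
import numpy as np

def rbf(A, B, ls=0.3):
    # Squared-exponential kernel; the length scale is an illustrative choice.
    return np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2 / ls**2)

# Hypothetical 1-D hand positions recorded at past timestamps.
t_obs = np.array([0.0, 0.1, 0.2, 0.3, 0.4])
x_obs = np.sin(2.0 * np.pi * t_obs)   # stand-in for measured hand motion

def gp_predict(t_new, noise=1e-4):
    # GP posterior mean at a future time, given the observed trajectory.
    K = rbf(t_obs, t_obs) + noise * np.eye(len(t_obs))
    k = rbf(t_obs, np.atleast_1d(t_new))
    return (k.T @ np.linalg.solve(K, x_obs))[0]

x_next = gp_predict(0.5)  # predicted hand position one step ahead
```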
Updated: 2024-05-15 05:51:41
Categories: cs.RO,cs.AI,cs.LG,I.2.6; I.2.9; I.3.2; H.5.2
PHUDGE: Phi-3 as Scalable Judge
In this paper cum technical report, we present PHUDGE, a fine-tuned Phi-3 model that achieves SOTA results on four tasks (Feedback Test, Feedback OOD, MT Human, and Preference Test), surpassing every existing model in latency and throughput. It shows very strong correlation not only with GPT-4 but also with human annotators, on unseen data and in both absolute and relative grading tasks. We not only address the use of small LMs for cost-effective, production-grade systems, but also show that causal modelling is not only slow by nature but can sometimes hinder a model's learning capability, and should be replaced by simpler tasks whenever possible to make the overall system faster and better. We show that by following systematic ML experimentation, thoughtful data augmentation, and repurposing of the problem itself, we can beat models 10x larger even with less training data. To the best of our knowledge, we are the first to experiment with and showcase a generalised version of the Earth Mover's Distance (a.k.a. Wasserstein distance), computed as a Minkowski distance with a penalty to control loss smoothing, which can be used as a loss function in place of cross-entropy to obtain stable training and better results on grading tasks.
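For ordered grade bins, a generalised one-dimensional Earth Mover's Distance can be written as a Minkowski distance between cumulative distributions. The sketch below illustrates that idea under our own simplifying assumptions (the bin count, distributions, and order parameter are hypothetical, not the paper's exact loss):

```python
import numpy as np

def minkowski_emd(p, q, order=2.0):
    # Generalised Earth Mover's Distance over ordered grade bins, computed
    # as a Minkowski distance between the two cumulative distributions;
    # `order` acts as a tunable smoothing/penalty control.
    cdf_gap = np.cumsum(p) - np.cumsum(q)
    return np.sum(np.abs(cdf_gap) ** order) ** (1.0 / order)

# Predicted vs. target distributions over five ordered grades (hypothetical).
target = np.array([0.0, 0.0, 1.0, 0.0, 0.0])      # true grade is 3
near_miss = np.array([0.1, 0.2, 0.4, 0.2, 0.1])   # mass close to grade 3
far_miss = np.array([1.0, 0.0, 0.0, 0.0, 0.0])    # all mass on grade 1

near = minkowski_emd(near_miss, target)
far = minkowski_emd(far_miss, target)
```

Unlike cross-entropy, this loss grows with how far the predicted grades sit from the target grade, not just with the probability the target receives, which is why `far > near` here.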
Updated: 2024-05-15 05:46:01
Categories: cs.LG,cs.AI
MileBench: Benchmarking MLLMs in Long Context
Despite the advancements and impressive performance of Multimodal Large Language Models (MLLMs) on benchmarks, their effectiveness in real-world, long-context, and multi-image tasks is unclear due to the benchmarks' limited scope. Existing benchmarks often focus on single-image and short-text samples, and when assessing multi-image tasks, they either limit the image count or focus on specific tasks (e.g., time-series captioning), potentially obscuring the performance challenges of MLLMs. To address these limitations, we introduce MileBench, a pioneering benchmark designed to test the MultImodal Long-contExt capabilities of MLLMs. This benchmark comprises not only multimodal long contexts but also multiple tasks requiring both comprehension and generation. We establish two distinct evaluation sets, diagnostic and realistic, to systematically assess MLLMs' long-context adaptation capacity and their ability to complete tasks in long-context scenarios. Our experimental results, obtained from testing 22 models, revealed that while the closed-source GPT-4o outperforms the others, most open-source MLLMs struggle in long-context situations. Interestingly, the performance gap tends to widen as the number of images increases. We strongly encourage an intensification of research efforts towards enhancing MLLMs' long-context capabilities, especially in scenarios involving multiple images.
Updated: 2024-05-15 05:43:30
Categories: cs.CL,cs.AI,cs.CV,cs.LG
Minimisation of Polyak-Łojasiewicz Functions Using Random Zeroth-Order Oracles
The application of a zeroth-order scheme for minimising Polyak-\L{}ojasiewicz (PL) functions is considered. The framework is based on exploiting a random oracle to estimate the function gradient. The convergence of the algorithm to a global minimum in the unconstrained case, and to a neighbourhood of the global minimum in the constrained case, is presented along with the corresponding complexity bounds. The theoretical results are demonstrated via numerical examples.
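The scheme above can be sketched on a classic PL (but non-strongly-convex) objective. The two-point random-oracle gradient estimate, the test function, step size, and iteration count below are illustrative assumptions, not the paper's exact algorithm or constants:

```python
import numpy as np

rng = np.random.default_rng(0)

# f(x) = ||Ax - b||^2 with a rank-deficient A satisfies the PL inequality
# without being strongly convex; only function values are available.
A = np.array([[1.0, 2.0], [2.0, 4.0]])   # rank one
b = np.array([1.0, 2.0])
f = lambda x: float(np.sum((A @ x - b) ** 2))

def zo_gradient(x, delta=1e-4):
    # Two-point gradient estimate along a random Gaussian direction u:
    # (f(x + delta*u) - f(x - delta*u)) / (2*delta) approximates u^T grad f.
    u = rng.normal(size=x.shape)
    return (f(x + delta * u) - f(x - delta * u)) / (2.0 * delta) * u

x = np.array([3.0, -3.0])
f_start = f(x)
for _ in range(3000):
    x -= 0.002 * zo_gradient(x)   # step size is an illustrative choice
f_end = f(x)
```

Because the oracle never touches the gradient directly, the same loop applies whenever only function evaluations are available.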
Updated: 2024-05-15 05:43:16
Categories: math.OC,cs.LG
A Systematic Analysis on the Temporal Generalization of Language Models in Social Media
In machine learning, temporal shifts occur when there are differences between training and test splits in terms of time. For streaming data such as news or social media, models are commonly trained on a fixed corpus from a certain period of time, and they can become obsolete due to the dynamism and evolving nature of online content. This paper focuses on temporal shifts in social media and, in particular, Twitter. We propose a unified evaluation scheme to assess the performance of language models (LMs) under temporal shift on standard social media tasks. LMs are tested on five diverse social media NLP tasks under different temporal settings, which revealed two important findings: (i) the decrease in performance under temporal shift is consistent across different models for entity-focused tasks such as named entity recognition or disambiguation, and hate speech detection, but not significant in the other tasks analysed (i.e., topic and sentiment classification); and (ii) continuous pre-training on the test period does not improve the temporal adaptability of LMs.
Updated: 2024-05-15 05:41:06
Categories: cs.CL,cs.LG
Restless Bandit Problem with Rewards Generated by a Linear Gaussian Dynamical System
The stochastic multi-armed bandit problem studies decision-making under uncertainty. In the problem, the learner interacts with an environment by choosing an action at each round, where a round is an instance of an interaction. In response, the environment reveals a reward, sampled from a stochastic process, to the learner. The goal of the learner is to maximize cumulative reward. A specific variation of the stochastic multi-armed bandit problem is the restless bandit, where the reward for each action is sampled from a Markov chain. The restless bandit with a discrete state space is a well-studied problem, but to the best of our knowledge, few results exist for the continuous state-space version, which has many applications such as hyperparameter optimization. In this work, we tackle the restless bandit with continuous state space by assuming the rewards are the inner product of an action vector and a state vector generated by a linear Gaussian dynamical system. To predict the reward for each action, we propose a method that takes a linear combination of previously observed rewards to predict each action's next reward. We show that, regardless of the sequence of previous actions chosen, the reward sampled for any previously chosen action can be used for predicting another action's future reward; i.e., the reward sampled for action 1 at round $t-1$ can be used for predicting the reward for action 2 at round $t$. This is accomplished by designing a modified Kalman filter with a matrix representation that can be learned for reward prediction. Numerical evaluations are carried out on a set of linear Gaussian dynamical systems.
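The cross-arm prediction property can be seen with a plain Kalman filter on a toy linear Gaussian system. In this sketch the dynamics and noise covariances are assumed known (whereas the abstract's method learns a matrix representation), and all numbers are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hidden state follows x_{t+1} = A x_t + w_t; arm i has action vector a_i
# and yields reward a_i^T x_t plus observation noise.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
Q = 0.01 * np.eye(2)                          # process-noise covariance
actions = np.array([[1.0, 0.0], [0.0, 1.0]])  # two arms
obs_var = 0.01

x = np.array([1.0, -1.0])        # true hidden state
mu, P = np.zeros(2), np.eye(2)   # Kalman filter belief

for t in range(50):
    a = actions[t % 2]           # any action schedule works for the filter
    r = float(a @ x) + rng.normal(scale=0.1)
    # Measurement update: the scalar reward is a linear observation of the
    # shared state, so a reward from arm 1 also sharpens arm 2's prediction.
    S = float(a @ P @ a) + obs_var
    K = P @ a / S
    mu = mu + K * (r - a @ mu)
    P = P - np.outer(K, a @ P)
    # Time update with the (here, assumed-known) dynamics.
    mu, P = A @ mu, A @ P @ A.T + Q
    x = A @ x + rng.multivariate_normal(np.zeros(2), Q)

pred_next = actions @ mu         # predicted next-round reward for each arm
```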
Updated: 2024-05-15 05:33:49
Categories: stat.ML,cs.LG,cs.SY,eess.SY
Dynamic Adversarial Attacks on Autonomous Driving Systems
This paper introduces an attack mechanism to challenge the resilience of autonomous driving systems. Specifically, we manipulate the decision-making processes of an autonomous vehicle by dynamically displaying adversarial patches on a screen mounted on another moving vehicle. These patches are optimized to deceive the object detection models into misclassifying targeted objects, e.g., traffic signs. Such manipulation has significant implications for critical multi-vehicle interactions such as intersection crossing and lane changing, which are vital for safe and efficient autonomous driving systems. Particularly, we make four major contributions. First, we introduce a novel adversarial attack approach where the patch is not co-located with its target, enabling more versatile and stealthy attacks. Moreover, our method utilizes dynamic patches displayed on a screen, allowing for adaptive changes and movement, enhancing the flexibility and performance of the attack. To do so, we design a Screen Image Transformation Network (SIT-Net), which simulates environmental effects on the displayed images, narrowing the gap between simulated and real-world scenarios. Further, we integrate a positional loss term into the adversarial training process to increase the success rate of the dynamic attack. Finally, we shift the focus from merely attacking perceptual systems to influencing the decision-making algorithms of self-driving systems. Our experiments demonstrate the first successful implementation of such dynamic adversarial attacks in real-world autonomous driving scenarios, paving the way for advancements in the field of robust and secure autonomous driving.
Updated: 2024-05-15 05:24:31
Categories: cs.RO,cs.CV,cs.LG
Dual Correction Strategy for Ranking Distillation in Top-N Recommender System
Knowledge Distillation (KD), which transfers the knowledge of a well-trained large model (teacher) to a small model (student), has become an important area of research for practical deployment of recommender systems. Recently, Relaxed Ranking Distillation (RRD) has shown that distilling the ranking information in the recommendation list significantly improves performance. However, the method still has limitations in that 1) it does not fully utilize the prediction errors of the student model, which makes training inefficient, and 2) it distills only the user-side ranking information, which provides an insufficient view under sparse implicit feedback. This paper presents the Dual Correction strategy for Distillation (DCD), which transfers the ranking information from the teacher model to the student model in a more efficient manner. Most importantly, DCD uses the discrepancy between the teacher model and the student model predictions to decide which knowledge to distill. By doing so, DCD essentially provides learning guidance tailored to "correcting" what the student model has failed to accurately predict. This process is applied to transfer the ranking information from the user side as well as the item side to address sparse implicit user feedback. Our experiments show that the proposed method outperforms the state-of-the-art baselines, and ablation studies validate the effectiveness of each component.
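The discrepancy idea can be sketched in a few lines: compare teacher and student rankings and select the items the student misranks most as the knowledge to distill. This is a hypothetical illustration of the selection rule only; DCD's actual correction losses are not reproduced here.

```python
import numpy as np

def discrepancy_correction_targets(teacher_scores, student_scores, k=3):
    """Select the k items whose teacher/student rank disagreement is largest.

    Hypothetical sketch of a discrepancy-driven selection rule: items the
    student misranks most (relative to the teacher) become the distillation
    targets.
    """
    # rank 0 = highest score; argsort of argsort yields each item's rank
    t_rank = np.argsort(np.argsort(-teacher_scores))
    s_rank = np.argsort(np.argsort(-student_scores))
    discrepancy = np.abs(t_rank - s_rank)
    return np.argsort(-discrepancy)[:k]       # k most misranked item indices

teacher = np.array([0.9, 0.8, 0.1, 0.7, 0.2])
student = np.array([0.1, 0.8, 0.9, 0.7, 0.2])
print(discrepancy_correction_targets(teacher, student, k=2))  # items 0 and 2
```

Here the student has swapped the teacher's best item (0) with its worst-but-one (2), so those two items would receive the "correction" guidance.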
Updated: 2024-05-15 05:24:19
Categories: cs.IR,cs.LG
Optimizing Sensor Network Design for Multiple Coverage
Sensor placement optimization methods have been studied extensively. They can be applied to a wide range of applications, including surveillance of known environments, optimal locations for 5G towers, and placement of missile defense systems. However, few works explore the robustness and efficiency of the resulting sensor network with respect to sensor failure or adversarial attacks. This paper addresses this issue by minimizing the number of sensors needed to achieve multiple coverage of non-simply connected domains, i.e., coverage of every point by a prescribed number of sensors. We introduce a new objective function for the greedy (next-best-view) algorithm to design efficient and robust sensor networks and derive theoretical bounds on the network's optimality. We further introduce a Deep Learning model to accelerate the algorithm for near real-time computations. The Deep Learning model requires the generation of training examples. Correspondingly, we show that understanding the geometric properties of the training data set provides important insights into the performance and training process of deep learning techniques. Finally, we demonstrate that a simple parallel version of the greedy approach using a simpler objective can be highly competitive.
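A generic greedy multiple-coverage loop looks like the following: at each step, add the candidate sensor that covers the most points still seen by fewer than k sensors. This is a minimal sketch of the next-best-view style of algorithm; the paper's specific objective function and optimality bounds are not reproduced here.

```python
def greedy_multi_cover(candidate_sensors, coverage, points, k):
    """Greedily pick sensors until every point is covered by >= k sensors.

    `coverage[s]` is the set of points that sensor s can see. Generic
    sketch only: the abstract's new objective is not implemented here.
    """
    count = {p: 0 for p in points}
    chosen, remaining = [], list(candidate_sensors)

    def gain(s):  # points this sensor would push closer to k-coverage
        return sum(1 for p in coverage[s] if count[p] < k)

    while any(c < k for c in count.values()) and remaining:
        best = max(remaining, key=gain)
        if gain(best) == 0:
            break                      # no remaining sensor improves coverage
        chosen.append(best)
        remaining.remove(best)
        for p in coverage[best]:
            count[p] += 1
    return chosen

coverage = {0: {"A", "B"}, 1: {"B", "C"}, 2: {"A", "C"}, 3: {"A", "B", "C"}}
print(greedy_multi_cover([0, 1, 2, 3], coverage, ["A", "B", "C"], k=2))  # [3, 0, 1]
```

For double coverage (k=2) of three points, the greedy rule first grabs the sensor seeing everything, then fills in the remaining coverage deficits.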
Updated: 2024-05-15 05:13:20
Categories: cs.LG,cs.RO,math.OC
Optimal Clustering with Bandit Feedback
This paper considers the problem of online clustering with bandit feedback. A set of arms (or items) can be partitioned into various groups that are unknown. Within each group, the observations associated to each of the arms follow the same distribution with the same mean vector. At each time step, the agent queries or pulls an arm and obtains an independent observation from the distribution it is associated to. Subsequent pulls depend on previous ones as well as the previously obtained samples. The agent's task is to uncover the underlying partition of the arms with the least number of arm pulls and with a probability of error not exceeding a prescribed constant $\delta$. The problem proposed finds numerous applications from clustering of variants of viruses to online market segmentation. We present an instance-dependent information-theoretic lower bound on the expected sample complexity for this task, and design a computationally efficient and asymptotically optimal algorithm, namely Bandit Online Clustering (BOC). The algorithm includes a novel stopping rule for adaptive sequential testing that circumvents the need to exactly solve any NP-hard weighted clustering problem as its subroutines. We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower bound asymptotically, and significantly outperforms a non-adaptive baseline algorithm.
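The flavor of the problem can be illustrated with a deliberately non-adaptive baseline: pull every arm the same number of times, then cluster the empirical means. BOC's adaptive stopping rule and its NP-hardness workaround are the paper's contribution and are not reproduced here; this sketch only shows what "uncovering the partition from noisy pulls" means.

```python
import random

random.seed(0)

def cluster_by_gap(means, gap):
    """Group arm indices by their (sorted) empirical means: start a new
    cluster whenever consecutive means differ by more than `gap`."""
    order = sorted(range(len(means)), key=lambda i: means[i])
    clusters, current = [], [order[0]]
    for i in order[1:]:
        if means[i] - means[current[-1]] > gap:
            clusters.append(current)
            current = [i]
        else:
            current.append(i)
    clusters.append(current)
    return clusters

# Non-adaptive illustration (NOT the adaptive BOC rule): pull every arm
# n times, then cluster the empirical means with a gap heuristic.
true_means = [0.0, 0.05, 1.0, 1.1, 2.0]   # arms 0-1, 2-3, 4 form the groups
n = 200
emp = [sum(random.gauss(m, 0.1) for _ in range(n)) / n for m in true_means]
print(cluster_by_gap(emp, gap=0.4))        # [[0, 1], [2, 3], [4]]
```

An adaptive algorithm such as BOC instead concentrates pulls on the arms whose group membership is still ambiguous, which is what yields the instance-dependent sample complexity.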
Updated: 2024-05-15 05:05:50
Categories: cs.LG,cs.IT,math.IT,stat.ML
Towards Next-Generation Steganalysis: LLMs Unleash the Power of Detecting Steganography
Linguistic steganography provides convenient implementation to hide messages, particularly with the emergence of AI generation technology. The potential abuse of this technology raises security concerns within societies, calling for powerful linguistic steganalysis to detect carriers containing steganographic messages. Existing methods are limited to finding distribution differences between steganographic texts and normal texts from the aspect of symbolic statistics. However, the distribution differences of both kinds of texts are hard to build precisely, which severely limits the detection ability of the existing methods in realistic scenarios. To seek a feasible way to construct practical steganalysis in the real world, this paper proposes to employ the human-like text processing abilities of large language models (LLMs) to capture these differences from the perspective of human perception, in addition to the traditional statistical perspective. Specifically, we systematically investigate the performance of LLMs in this task by modeling it as a generative paradigm, instead of the traditional classification paradigm. Extensive experimental results reveal that generative LLMs exhibit significant advantages in linguistic steganalysis and demonstrate performance trends distinct from traditional approaches. Results also reveal that LLMs outperform existing baselines by a wide margin, and the domain-agnostic ability of LLMs makes it possible to train a generic steganalysis model (both code and trained models are openly available at https://github.com/ba0z1/Linguistic-Steganalysis-with-LLMs).
Updated: 2024-05-15 04:52:09
Categories: cs.CR
Chaos-based reinforcement learning with TD3
Chaos-based reinforcement learning (CBRL) is a method in which the agent's internal chaotic dynamics drives exploration. This approach offers a model for considering how the biological brain can create variability in its behavior and learn in an exploratory manner. At the same time, it is a learning model with the ability to automatically switch between exploration and exploitation modes and the potential to realize more sophisticated exploration that reflects what it has learned so far. However, the learning algorithms in CBRL have not been well established in previous studies and have yet to incorporate recent advances in reinforcement learning. This study introduces Twin Delayed Deep Deterministic Policy Gradient (TD3), a state-of-the-art deep reinforcement learning algorithm that handles deterministic policies and continuous action spaces, into CBRL. The validation results provide several insights. First, TD3 works as a learning algorithm for CBRL in a simple goal-reaching task. Second, CBRL agents with TD3 can autonomously suppress their exploratory behavior as learning progresses and resume exploration when the environment changes. Finally, examining the effect of the agent's chaoticity on learning shows that extremely strong chaos negatively impacts the flexible switching between exploration and exploitation.
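The three TD3 ingredients the abstract imports (target-policy smoothing, clipped double-Q learning, and delayed policy updates) can be sketched on a hypothetical 1-D toy task with linear critics. This is not CBRL itself (which would replace the Gaussian exploration noise with the agent's internal chaotic dynamics); it only shows the skeleton of the TD3 update.

```python
import numpy as np

rng = np.random.default_rng(0)

gamma, tau, policy_delay, noise_clip = 0.99, 0.005, 2, 0.5
lr = 1e-2
q1, q2 = np.zeros(2), np.zeros(2)     # twin critics: Q(s, a) ~ p[0]*s + p[1]*a
q1_t, q2_t = q1.copy(), q2.copy()     # slowly-tracking target critics
w, w_t = 0.0, 0.0                     # linear actor a = w*s, and its target
actor_updates = 0

def q(p, s, a):
    return p[0] * s + p[1] * a

for step in range(200):
    s = rng.uniform(-1, 1)
    a = w * s + rng.normal(0, 0.1)    # exploratory action
    r = -(a - s) ** 2                 # toy reward, maximized when a = s
    s2 = rng.uniform(-1, 1)

    # (1) target-policy smoothing: clipped noise on the target action
    eps = float(np.clip(rng.normal(0, 0.2), -noise_clip, noise_clip))
    a2 = w_t * s2 + eps
    # (2) clipped double-Q: bootstrap from the smaller of the twin targets
    y = r + gamma * min(q(q1_t, s2, a2), q(q2_t, s2, a2))
    for p in (q1, q2):                # one SGD step on (Q - y)^2 per critic
        p -= lr * (q(p, s, a) - y) * np.array([s, a])

    # (3) delayed policy and target-network updates
    if step % policy_delay == 0:
        w += lr * q1[1] * s           # ascend dQ1/dw = (dQ1/da)*(da/dw) = q1[1]*s
        actor_updates += 1
        q1_t += tau * (q1 - q1_t)
        q2_t += tau * (q2 - q2_t)
        w_t += tau * (w - w_t)

print(actor_updates)  # 100: one actor update every `policy_delay` critic steps
```

The min over the twin target critics damps value overestimation, and the delayed, soft target updates are what keep the deterministic-policy learning stable.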
Updated: 2024-05-15 04:47:31
Categories: cs.LG,cs.AI,cs.NE
Temporarily Restricting Solidity Smart Contract Interactions
In this work we explore ways to restrict the ability to call Solidity smart contract functions for a specified duration. We describe methods to restrict functions from being called twice in the same transaction, block, or time period. This is related to the notion of non-reentrant functions: functions that cannot be called again while a previous execution is still in progress. These methods can be used to restrict interactions with entire sets of functions of smart contracts. We are motivated to revisit this topic for two reasons. First, we note sixteen real-world smart contract exploits in 2023, resulting in over $136M USD lost or stolen, that could have been prevented by restricting function calls. As part of this survey, we dissect a new class of exploit that involves so-called read-only reentrancy: exploits that re-enter read-only functions to make smart contract state inconsistent in order to enable their exploitation. Second, while some of these approaches are simple, they may not always behave the same across different blockchains that support Solidity.
Updated: 2024-05-15 04:38:50
Categories: cs.CR
Conformalized Adaptive Forecasting of Heterogeneous Trajectories
This paper presents a new conformal method for generating simultaneous forecasting bands guaranteed to cover the entire path of a new random trajectory with sufficiently high probability. Prompted by the need for dependable uncertainty estimates in motion planning applications where the behavior of diverse objects may be more or less unpredictable, we blend different techniques from online conformal prediction of single and multiple time series, as well as ideas for addressing heteroscedasticity in regression. This solution is both principled, providing precise finite-sample guarantees, and effective, often leading to more informative predictions than prior methods.
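The core of a simultaneous (whole-path) conformal band can be sketched with split conformal prediction: score each calibration trajectory by its sup-norm residual, then take a conformal quantile of those scores as the band half-width. This is a plain split-conformal sketch; the paper's online and heteroscedasticity-adaptive machinery is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

def simultaneous_band(cal_true, cal_pred, alpha=0.1):
    """Split-conformal half-width for a band covering the WHOLE path:
    each calibration trajectory contributes one score, its sup-norm
    residual, so a single quantile bounds the entire path at once."""
    scores = np.max(np.abs(cal_true - cal_pred), axis=1)   # one score per path
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))                # conformal quantile index
    return np.sort(scores)[k - 1]

# toy data: the predictor outputs the zero path, truths are noisy paths
cal_true = rng.normal(0, 1, size=(99, 20))   # 99 calibration paths, 20 steps
cal_pred = np.zeros((99, 20))
w = simultaneous_band(cal_true, cal_pred, alpha=0.1)
print(w)  # half-width guaranteeing ~90% whole-path coverage
```

Because the score is the maximum residual over the path, a fresh trajectory stays inside the band at every time step simultaneously with probability at least 1 - alpha, which is the finite-sample guarantee the abstract refers to.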
Updated: 2024-05-15 04:38:46
Categories: stat.ML,cs.LG
Explainable AI for Ship Collision Avoidance: Decoding Decision-Making Processes and Behavioral Intentions
This study developed an explainable AI for ship collision avoidance. Initially, a critic network composed of sub-task critic networks was proposed to individually evaluate each sub-task in collision avoidance to clarify the AI decision-making processes involved. Additionally, an attempt was made to discern behavioral intentions through a Q-value analysis and an Attention mechanism. The former focused on interpreting intentions by examining the increment of the Q-value resulting from AI actions, while the latter incorporated the significance of other ships in the decision-making process for collision avoidance into the learning objective. AI's behavioral intentions in collision avoidance were visualized by combining the perceived collision danger with the degree of attention to other ships. The proposed method was evaluated through a numerical experiment. The developed AI was confirmed to be able to safely avoid collisions under various congestion levels, and AI's decision-making process was rendered comprehensible to humans. The proposed method not only facilitates the understanding of DRL-based controllers/systems in the ship collision avoidance task but also extends to any task comprising sub-tasks.
Updated: 2024-05-15 04:09:46
Categories: cs.RO,cs.AI,cs.SY,eess.SY
Asymptotically Unbiased Synthetic Control Methods by Distribution Matching
Synthetic Control Methods (SCMs) have become an essential tool for comparative case studies. The fundamental idea of SCMs is to estimate the counterfactual outcomes of a treated unit using a weighted sum of the observed outcomes of untreated units. The accuracy of the synthetic control (SC) is critical for evaluating the treatment effect of a policy intervention; therefore, the estimation of SC weights has been the focus of extensive research. In this study, we first point out that existing SCMs suffer from an endogeneity problem, the correlation between the outcomes of untreated units and the error term of the synthetic control, which yields a bias in the treatment effect estimator. We then propose a novel SCM based on density matching, assuming that the density of outcomes of the treated unit can be approximated by a weighted average of the joint density of untreated units (i.e., a mixture model). Based on this assumption, we estimate SC weights by matching the moments of treated outcomes with the weighted sum of moments of untreated outcomes. Our proposed method has three advantages over existing methods: first, our estimator is asymptotically unbiased under the assumption of the mixture model; second, due to the asymptotic unbiasedness, we can reduce the mean squared error in counterfactual predictions; third, our method generates full densities of the treatment effect, not merely expected values, which broadens the applicability of SCMs. We provide experimental results to demonstrate the effectiveness of our proposed method.
Updated: 2024-05-15 04:05:37
Categories: econ.EM,cs.LG,stat.ME
Enhancing Airline Customer Satisfaction: A Machine Learning and Causal Analysis Approach
This study explores the enhancement of customer satisfaction in the airline industry, a critical factor for retaining customers and building brand reputation, which are vital for revenue growth. Utilizing a combination of machine learning and causal inference methods, we examine the specific impact of service improvements on customer satisfaction, with a focus on the online boarding pass experience. Through detailed data analysis involving several predictive and causal models, we demonstrate that improvements in the digital aspects of customer service significantly elevate overall customer satisfaction. This paper highlights how airlines can strategically leverage these insights to make data-driven decisions that enhance customer experiences and, consequently, their market competitiveness.
Updated: 2024-05-15 04:01:47
Categories: cs.LG,stat.ME
A Unified Industrial Large Knowledge Model Framework in Smart Manufacturing
The recent emergence of large language models (LLMs) shows the potential for artificial general intelligence, revealing new opportunities in Industry 4.0 and smart manufacturing. However, a notable gap exists in applying these LLMs in industry, primarily due to their training on general knowledge rather than domain-specific knowledge. Such specialized domain knowledge is vital for effectively addressing the complex needs of industrial applications. To bridge this gap, this paper proposes an Industrial Large Knowledge Model (ILKM) framework, emphasizing its potential to revolutionize smart manufacturing. In addition, ILKMs and LLMs are compared from eight perspectives. Finally, the "6S Principle" is proposed as the guideline for ILKM development, and several potential opportunities are highlighted for ILKM deployment in smart manufacturing.
Updated: 2024-05-15 04:00:16
Categories: cs.LG,cs.AI
Graph Network Surrogate Model for Subsurface Flow Optimization
The optimization of well locations and controls is an important step in the design of subsurface flow operations such as oil production or geological CO2 storage. These optimization problems can be computationally expensive, however, as many potential candidate solutions must be evaluated. In this study, we propose a graph network surrogate model (GNSM) for optimizing well placement and controls. The GNSM transforms the flow model into a computational graph that involves an encoding-processing-decoding architecture. Separate networks are constructed to provide global predictions for the pressure and saturation state variables. Model performance is enhanced through the inclusion of the single-phase steady-state pressure solution as a feature. A multistage multistep strategy is used for training. The trained GNSM is applied to predict flow responses in a 2D unstructured model of a channelized reservoir. Results are presented for a large set of test cases, in which five injection wells and five production wells are placed randomly throughout the model, with a random control variable (bottom-hole pressure) assigned to each well. Median relative error in pressure and saturation for 300 such test cases is 1-2%. The ability of the trained GNSM to provide accurate predictions for a new (geologically similar) permeability realization is demonstrated. Finally, the trained GNSM is used to optimize well locations and controls with a differential evolution algorithm. GNSM-based optimization results are comparable to those from simulation-based optimization, with a runtime speedup of a factor of 36. Much larger speedups are expected if the method is used for robust optimization, in which each candidate solution is evaluated on multiple geological models.
Updated: 2024-05-15 03:58:12
Categories: physics.geo-ph,cs.LG
Active learning of effective Hamiltonian for super-large-scale atomic structures
The first-principles-based effective Hamiltonian scheme provides one of the most accurate modeling technique for large-scale structures, especially for ferroelectrics. However, the parameterization of the effective Hamiltonian is complicated and can be difficult for some complex systems such as high-entropy perovskites. Here, we propose a general form of effective Hamiltonian and develop an active machine learning approach to parameterize the effective Hamiltonian based on Bayesian linear regression. The parameterization is employed in molecular dynamics simulations with the prediction of energy, forces, stress and their uncertainties at each step, which decides whether first-principles calculations are executed to retrain the parameters. Structures of BaTiO$_3$, Pb(Zr$_{0.75}$Ti$_{0.25}$)O$_3$ and (Pb,Sr)TiO$_3$ system are taken as examples to show the accuracy of this approach, as compared with conventional parametrization method and experiments. This machine learning approach provides a universal and automatic way to compute the effective Hamiltonian parameters for any considered complex systems with super-large-scale (more than $10^7$ atoms) atomic structures.
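The active-learning loop described here, predict with uncertainty at each step and fall back to the expensive oracle only when the uncertainty is too large, can be sketched with conjugate Bayesian linear regression. The 2-D linear toy model and the variance threshold below are illustrative assumptions, not the paper's effective-Hamiltonian features.

```python
import numpy as np

class BayesLinReg:
    """Bayesian linear regression with conjugate Gaussian updates; the
    predictive variance is the uncertainty that triggers retraining (a
    generic sketch of the active-learning loop, not the paper's model)."""

    def __init__(self, dim, alpha=1.0, beta=25.0):
        self.S_inv = alpha * np.eye(dim)   # posterior precision
        self.Sm = np.zeros(dim)            # precision-weighted posterior mean
        self.beta = beta                   # observation-noise precision

    def update(self, x, y):
        self.S_inv += self.beta * np.outer(x, x)
        self.Sm += self.beta * y * x

    def predict(self, x):
        S = np.linalg.inv(self.S_inv)
        mean = x @ S @ self.Sm
        var = 1.0 / self.beta + x @ S @ x  # predictive variance
        return mean, var

rng = np.random.default_rng(0)
true_w = np.array([1.5, -0.7])             # hidden "first-principles" law
model = BayesLinReg(dim=2)
queries = 0
for step in range(200):
    x = rng.normal(size=2)
    mean, var = model.predict(x)
    if var > 0.05:                         # too uncertain: call the oracle
        y = true_w @ x + rng.normal(0, 0.2)
        model.update(x, y)
        queries += 1
print(queries)  # far fewer oracle calls than 200 once the posterior tightens
```

This mirrors the scheme in the abstract: in a molecular-dynamics run, each step's energy/force/stress uncertainty decides whether a first-principles calculation is executed to retrain the parameters.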
Updated: 2024-05-15 03:46:41
Categories: cond-mat.mtrl-sci,cs.LG,physics.app-ph,physics.comp-ph
Naturalistic Music Decoding from EEG Data via Latent Diffusion Models
In this article, we explore the potential of using latent diffusion models, a family of powerful generative models, for the task of reconstructing naturalistic music from electroencephalogram (EEG) recordings. Unlike simpler music with limited timbres, such as MIDI-generated tunes or monophonic pieces, the focus here is on intricate music featuring a diverse array of instruments, voices, and effects, rich in harmonics and timbre. This study represents an initial foray into achieving general music reconstruction of high-quality using non-invasive EEG data, employing an end-to-end training approach directly on raw data without the need for manual pre-processing and channel selection. We train our models on the public NMED-T dataset and perform quantitative evaluation proposing neural embedding-based metrics. We additionally perform song classification based on the generated tracks. Our work contributes to the ongoing research in neural decoding and brain-computer interfaces, offering insights into the feasibility of using EEG data for complex auditory information reconstruction.
Updated: 2024-05-15 03:26:01
Categories: cs.SD,cs.LG,eess.AS
Improving Transformers using Faithful Positional Encoding
We propose a new positional encoding method for a neural network architecture called the Transformer. Unlike the standard sinusoidal positional encoding, our approach is based on solid mathematical grounds and has a guarantee of not losing information about the positional order of the input sequence. We show that the new encoding approach systematically improves the prediction performance in the time-series classification task.
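For reference, the standard sinusoidal encoding the abstract contrasts with looks as follows; the paper's own "faithful" encoding is not specified in the abstract and is not reproduced here.

```python
import numpy as np

def sinusoidal_pe(seq_len, d_model):
    """Standard Transformer sinusoidal positional encoding: even channels
    carry sin, odd channels cos, at geometrically spaced frequencies.
    (Baseline only; not the paper's proposed encoding.)"""
    pos = np.arange(seq_len)[:, None]              # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]           # (1, d_model/2)
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = sinusoidal_pe(seq_len=50, d_model=16)
print(pe.shape)  # (50, 16)
```

The abstract's claim is that this standard map can lose positional-order information; a "faithful" encoding would come with a guarantee that distinct input orderings remain distinguishable after encoding.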
Updated: 2024-05-15 03:17:30
Categories: cs.LG
Response Matching for generating materials and molecules
Machine learning has recently emerged as a powerful tool for generating new molecular and material structures. The success of state-of-the-art models stems from their ability to incorporate physical symmetries, such as translation, rotation, and periodicity. Here, we present a novel generative method called Response Matching (RM), which leverages the fact that each stable material or molecule exists at the minimum of its potential energy surface. Consequently, any perturbation induces a response in energy and stress, driving the structure back to equilibrium. Matching to such response is closely related to score matching in diffusion models. By employing the combination of a machine learning interatomic potential and random structure search as the denoising model, RM exploits the locality of atomic interactions, and inherently respects permutation, translation, rotation, and periodic invariances. RM is the first model to handle both molecules and bulk materials under the same framework. We demonstrate the efficiency and generalization of RM across three systems: a small organic molecular dataset, stable crystals from the Materials Project, and one-shot learning on a single diamond configuration.
Updated: 2024-05-15 03:08:21
Categories: cs.LG,cond-mat.mtrl-sci,physics.comp-ph
CTS: A Consistency-Based Medical Image Segmentation Model
In medical image segmentation tasks, diffusion models have shown significant potential. However, mainstream diffusion models suffer from drawbacks such as multiple sampling times and slow prediction results. Recently, consistency models, as a standalone generative network, have resolved this issue. Compared to diffusion models, consistency models can reduce the sampling times to once, not only achieving similar generative effects but also significantly speeding up training and prediction. However, they are not suitable for image segmentation tasks, and their application in the medical imaging field has not yet been explored. Therefore, this paper applies the consistency model to medical image segmentation tasks, designing multi-scale feature signal supervision modes and loss function guidance to achieve model convergence. Experiments have verified that the CTS model can obtain better medical image segmentation results with a single sampling during the test phase.
Updated: 2024-05-15 03:07:42
Categories: cs.CV,cs.AI
Dielectric Tensor Prediction for Inorganic Materials Using Latent Information from Preferred Potential
Dielectrics are materials with widespread applications in flash memory, central processing units, photovoltaics, capacitors, etc. However, the availability of public dielectric data remains limited, hindering research and development efforts. Previously, machine learning models focused on predicting dielectric constants as scalars, overlooking the importance of dielectric tensors in understanding material properties under directional electric fields for material design and simulation. This study demonstrates the value of common equivariant structural embedding features derived from a universal neural network potential in enhancing the prediction of dielectric properties. To integrate channel information from various-rank latent features while preserving the desired SE(3) equivariance to the second-rank dielectric tensors, we design an equivariant readout decoder to predict the total, electronic, and ionic dielectric tensors individually, and compare our model with the state-of-the-art models. Finally, we evaluate our model by conducting virtual screening on thermodynamical stable structure candidates in Materials Project. The material Ba\textsubscript{2}SmTaO\textsubscript{6} with large band gaps ($E_g=3.36 \mathrm{eV}$) and dielectric constants ($\epsilon=93.81$) is successfully identified out of the 14k candidate set. The results show that our methods give good accuracy on predicting dielectric tensors of inorganic materials, emphasizing their potential in contributing to the discovery of novel dielectrics.
Updated: 2024-05-15 02:58:19
Categories: cond-mat.mtrl-sci,cs.LG
Perception Without Vision for Trajectory Prediction: Ego Vehicle Dynamics as Scene Representation for Efficient Active Learning in Autonomous Driving
This study investigates the use of trajectory and dynamic state information for efficient data curation in autonomous driving machine learning tasks. We propose methods for clustering trajectory-states and sampling strategies in an active learning framework, aiming to reduce annotation and data costs while maintaining model performance. Our approach leverages trajectory information to guide data selection, promoting diversity in the training data. We demonstrate the effectiveness of our methods on the trajectory prediction task using the nuScenes dataset, showing consistent performance gains over random sampling across different data pool sizes, and even reaching sub-baseline displacement errors at just 50% of the data cost. Our results suggest that sampling typical data initially helps overcome the ''cold start problem,'' while introducing novelty becomes more beneficial as the training pool size increases. By integrating trajectory-state-informed active learning, we demonstrate that more efficient and robust autonomous driving systems are possible and practical using low-cost data curation strategies.
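A minimal version of trajectory-state clustering plus diversity-aware sampling might look like the following. The three "trajectory-state" features and the round-robin budget rule are hypothetical stand-ins; the paper's clustering and sampling strategies are not reproduced here.

```python
import numpy as np

def kmeans(X, k, iters=20):
    """Tiny k-means with a deterministic init (k points spread evenly
    through the dataset) -- sketch only."""
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)].copy()
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def diverse_sample(labels, budget, seed=0):
    """Round-robin over clusters: one example per cluster per pass, so the
    annotation budget is spread across driving-behavior modes."""
    rng = np.random.default_rng(seed)
    by_cluster = [list(np.flatnonzero(labels == j)) for j in range(labels.max() + 1)]
    for idx in by_cluster:
        rng.shuffle(idx)
    picked = []
    while len(picked) < budget and any(by_cluster):
        for idx in by_cluster:
            if idx and len(picked) < budget:
                picked.append(int(idx.pop()))
    return picked

rng = np.random.default_rng(0)
# hypothetical trajectory-state features: [mean speed, yaw rate, acceleration]
feats = np.vstack([rng.normal(0, 0.2, size=(10, 3)),    # one behavior mode
                   rng.normal(5, 0.2, size=(10, 3))])   # a second mode
labels = kmeans(feats, k=2)
picked = diverse_sample(labels, budget=4)
print(picked)  # four indices, two drawn from each behavior cluster
```

The point of the round-robin rule is exactly the diversity argument in the abstract: a random sample can over-represent the dominant driving mode, while cluster-balanced sampling covers rarer trajectory states at the same annotation cost.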
Updated: 2024-05-15 02:54:11
Subjects: cs.LG,cs.AI,cs.CV,cs.RO
On the Shape of Brainscores for Large Language Models (LLMs)
With the rise of Large Language Models (LLMs), the novel metric "Brainscore" emerged as a means to evaluate the functional similarity between LLMs and human brain/neural systems. We set out to interpret this novel score by constructing topological features derived from human fMRI data involving 190 subjects and from 39 LLMs plus their untrained counterparts. We then trained 36 Linear Regression Models and conducted thorough statistical analyses to discern which of our constructed features are reliable and valid. Our findings reveal distinctive feature combinations conducive to interpreting existing brainscores across various brain regions of interest (ROIs) and hemispheres, thereby significantly contributing to advancing interpretable machine learning (iML) studies. The study is enriched by further discussion and analysis of existing brainscores. To our knowledge, this study represents the first attempt to comprehend the novel metric brainscore within this interdisciplinary domain.
Updated: 2024-05-15 02:46:45
Subjects: q-bio.NC,cs.AI,cs.CL,cs.LG
Computational Thought Experiments for a More Rigorous Philosophy and Science of the Mind
We offer philosophical motivations for a method we call Virtual World Cognitive Science (VW CogSci), in which researchers use virtual embodied agents that are embedded in virtual worlds to explore questions in the field of Cognitive Science. We focus on questions about mental and linguistic representation and the ways that such computational modeling can add rigor to philosophical thought experiments, as well as the terminology used in the scientific study of such representations. We find that this method forces researchers to take a god's-eye view when describing dynamical relationships between entities in minds and entities in an environment in a way that eliminates the need for problematic talk of belief and concept types, such as the belief that cats are silly, and the concept CAT, while preserving belief and concept tokens in individual cognizers' minds. We conclude with some further key advantages of VW CogSci for the scientific study of mental and linguistic representation and for Cognitive Science more broadly.
Updated: 2024-05-15 02:32:00
Subjects: cs.CL,cs.AI,q-bio.NC
Efficient Pruning of Large Language Model with Adaptive Estimation Fusion
Large language models (LLMs) have become crucial for many generative downstream tasks, making it an inevitable trend and significant challenge to deploy them efficiently on resource-constrained devices. Structured pruning is a widely used method to address this challenge. However, when dealing with the complex structure of multiple decoder layers, general methods often employ common estimation approaches for pruning. These approaches lead to a decline in accuracy for specific downstream tasks. In this paper, we introduce a simple yet efficient method that adaptively models the importance of each substructure. Meanwhile, it can adaptively fuse coarse-grained and fine-grained estimations based on the results from complex and multilayer structures. All aspects of our design seamlessly integrate into the end-to-end pruning framework. Our experimental results, compared with state-of-the-art methods on mainstream datasets, demonstrate average accuracy improvements of 1.1%, 1.02%, 2.0%, and 1.2% for LLaMa-7B, Vicuna-7B, Baichuan-7B, and Bloom-7b1, respectively.
Updated: 2024-05-15 02:20:54
Subjects: cs.CL,cs.AI,cs.LG
SMART: Towards Pre-trained Missing-Aware Model for Patient Health Status Prediction
Electronic health record (EHR) data has emerged as a valuable resource for analyzing patient health status. However, the prevalence of missing data in EHR poses significant challenges to existing methods, leading to spurious correlations and suboptimal predictions. While various imputation techniques have been developed to address this issue, they often fixate on unnecessary details and may introduce additional noise when making clinical predictions. To tackle this problem, we propose SMART, a Self-Supervised Missing-Aware RepresenTation Learning approach for patient health status prediction, which encodes missing information via elaborate attention mechanisms and learns to impute missing values through a novel self-supervised pre-training approach that reconstructs missing data representations in the latent space. By adopting missing-aware attentions and focusing on learning higher-order representations, SMART promotes better generalization and robustness to missing data. We validate the effectiveness of SMART through extensive experiments on six EHR tasks, demonstrating its superiority over state-of-the-art methods.
Updated: 2024-05-15 02:19:34
Subjects: cs.LG
CoFiI2P: Coarse-to-Fine Correspondences for Image-to-Point Cloud Registration
Image-to-point cloud (I2P) registration is a fundamental task for robots and autonomous vehicles to achieve cross-modality data fusion and localization. Existing I2P registration methods estimate correspondences at the point/pixel level, often overlooking global alignment. However, I2P matching can easily converge to a local optimum when performed without high-level guidance from global constraints. To address this issue, this paper introduces CoFiI2P, a novel I2P registration network that extracts correspondences in a coarse-to-fine manner to achieve the globally optimal solution. First, the image and point cloud data are processed through a Siamese encoder-decoder network for hierarchical feature extraction. Second, a coarse-to-fine matching module is designed to leverage these features and establish robust feature correspondences. Specifically, in the coarse matching phase, a novel I2P transformer module is employed to capture both homogeneous and heterogeneous global information from the image and point cloud data. This enables the estimation of coarse super-point/super-pixel matching pairs with discriminative descriptors. In the fine matching module, point/pixel pairs are established with the guidance of super-point/super-pixel correspondences. Finally, based on the matching pairs, the transformation matrix is estimated with the EPnP-RANSAC algorithm. Extensive experiments conducted on the KITTI dataset demonstrate that CoFiI2P achieves impressive results, with a relative rotation error (RRE) of 1.14 degrees and a relative translation error (RTE) of 0.29 meters. These results represent significant improvements of 84% in RRE and 89% in RTE over the current state-of-the-art (SOTA) method. The project page is available at \url{https://whu-usi3dv.github.io/CoFiI2P}.
Updated: 2024-05-15 02:18:23
Subjects: cs.CV,cs.AI,cs.RO
Unmasking Efficiency: Learning Salient Sparse Models in Non-IID Federated Learning
In this work, we propose Salient Sparse Federated Learning (SSFL), a streamlined approach for sparse federated learning with efficient communication. SSFL identifies a sparse subnetwork prior to training, leveraging parameter saliency scores computed separately on local client data in non-IID scenarios, and then aggregated, to determine a global mask. Only the sparse model weights are communicated each round between the clients and the server. We validate SSFL's effectiveness using standard non-IID benchmarks, noting marked improvements in the sparsity--accuracy trade-offs. Finally, we deploy our method in a real-world federated learning framework and report improvement in communication time.
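A minimal sketch of the masking step described above: each client scores parameters on its own non-IID data, the server aggregates the scores, and only the top-saliency weights enter the global mask. The averaging aggregation and the scores below are illustrative assumptions, not SSFL's exact rules:

```python
def global_saliency_mask(client_scores, sparsity):
    """Average per-client parameter saliency scores (e.g. |w * dL/dw|
    computed locally) and keep the (1 - sparsity) fraction of weights
    with the highest aggregate saliency. Returns a 0/1 mask over the
    flattened parameter vector; only masked-in weights are communicated."""
    n_clients, n_params = len(client_scores), len(client_scores[0])
    avg = [sum(s[i] for s in client_scores) / n_clients
           for i in range(n_params)]
    keep = max(1, round((1 - sparsity) * n_params))
    top = sorted(range(n_params), key=lambda i: -avg[i])[:keep]
    mask = [0] * n_params
    for i in top:
        mask[i] = 1
    return mask

# two clients, six parameters, 50% sparsity (scores are made up)
mask = global_saliency_mask([[0.9, 0.1, 0.4, 0.0, 0.7, 0.2],
                             [0.8, 0.2, 0.3, 0.1, 0.9, 0.1]], sparsity=0.5)
print(mask)  # → [1, 0, 1, 0, 1, 0]
```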
Updated: 2024-05-15 02:13:51
Subjects: cs.LG,cs.AI,cs.DC
Near-Optimal Algorithms for Constrained k-Center Clustering with Instance-level Background Knowledge
Center-based clustering has attracted significant research interest from both theory and practice. In many practical applications, input data often contain background knowledge that can be used to improve clustering results. In this work, we build on widely adopted $k$-center clustering and model its input background knowledge as must-link (ML) and cannot-link (CL) constraint sets. However, most clustering problems, including $k$-center, are inherently $\mathcal{NP}$-hard, while the more complex constrained variants are known to suffer from even more severe approximation and computational barriers that significantly limit their applicability. By employing a suite of techniques including reverse dominating sets, linear programming (LP) integral polyhedra, and LP duality, we arrive at the first efficient approximation algorithm for constrained $k$-center with the best possible ratio of 2. We also construct competitive baseline algorithms and empirically evaluate our approximation algorithm against them on a variety of real datasets. The results validate our theoretical findings and demonstrate the great advantages of our algorithm in terms of clustering cost, clustering quality, and running time.
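For context, the classic farthest-first (Gonzalez) greedy already achieves the best-possible factor of 2 for unconstrained k-center; the paper's contribution is matching that ratio while also honoring ML/CL constraints. A sketch of the unconstrained greedy, not the paper's constrained algorithm:

```python
import math
import random

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def gonzalez_k_center(points, k):
    """Farthest-first traversal (Gonzalez, 1985): repeatedly open a new
    center at the point farthest from the current centers. Guarantees a
    covering radius within a factor 2 of optimal for unconstrained
    k-center."""
    centers = [points[0]]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(dist(p, c) for c in centers))
        centers.append(farthest)
    return centers

def radius(points, centers):
    """Max distance from any point to its nearest center."""
    return max(min(dist(p, c) for c in centers) for p in points)

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(200)]
cs = gonzalez_k_center(pts, 4)
print(round(radius(pts, cs), 3))
```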
Updated: 2024-05-15 01:42:47
Subjects: cs.LG,cs.AI
Deep Learning in Earthquake Engineering: A Comprehensive Review
This article surveys the growing interest in utilizing Deep Learning (DL) as a powerful tool to address challenging problems in earthquake engineering. Despite decades of advancement in domain knowledge, issues such as uncertainty in earthquake occurrence, unpredictable seismic loads, nonlinear structural responses, and community engagement remain difficult to tackle using domain-specific methods. DL offers promising solutions by leveraging its data-driven capacity for nonlinear mapping, sequential data modeling, automatic feature extraction, dimensionality reduction, optimal decision-making, etc. However, the literature lacks a comprehensive review that systematically covers a consistent scope intersecting DL and earthquake engineering. To bridge the gap, the article first discusses methodological advances to elucidate various applicable DL techniques, such as multi-layer perceptron (MLP), convolutional neural network (CNN), recurrent neural network (RNN), generative adversarial network (GAN), autoencoder (AE), transfer learning (TL), reinforcement learning (RL), and graph neural network (GNN). A thorough research landscape is then disclosed by exploring various DL applications across different research topics, including vision-based seismic damage assessment and structural characterization, seismic demand and damage state prediction, seismic response history prediction, regional seismic risk assessment and community resilience, ground motion (GM) for engineering use, seismic response control, and the inverse problem of system/damage identification. Suitable DL techniques for each research topic are identified, emphasizing the preeminence of CNN for vision-based tasks, RNN for sequential data, RL for community resilience, and unsupervised learning for GM analysis. The article also discusses opportunities and challenges for leveraging DL in earthquake engineering research and practice.
Updated: 2024-05-15 01:22:30
Subjects: cs.LG
Chinchilla Scaling: A replication attempt
Hoffmann et al. (2022) propose three methods for estimating a compute-optimal scaling law. We attempt to replicate their third estimation procedure, which involves fitting a parametric loss function to a reconstruction of data from their plots. We find that the reported estimates are inconsistent with their first two estimation methods, fail to fit the extracted data, and come with implausibly narrow confidence intervals: intervals this narrow would require over 600,000 experiments, while they likely ran fewer than 500. In contrast, our rederivation of the scaling law using the third approach yields results that are compatible with the findings from the first two estimation procedures described by Hoffmann et al.
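For reference, the parametric form being fit is L(N, D) = E + A/N^alpha + B/D^beta, and under a FLOP budget C ≈ 6ND the compute-optimal parameter count has a closed form. The sketch below plugs in the parameter values Hoffmann et al. report for this approach (the very estimates whose fit this abstract disputes), so treat them as illustrative only:

```python
import math

# Parametric form fit in Approach 3: L(N, D) = E + A/N**alpha + B/D**beta.
# Values below are those reported by Hoffmann et al.; treat as illustrative.
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(N, D):
    return E + A / N**alpha + B / D**beta

def optimal_N(C):
    """Closed-form compute-optimal parameter count under C ~= 6*N*D:
    minimize loss(N, C/(6N)) over N."""
    a = beta / (alpha + beta)
    G = (alpha * A / (beta * B)) ** (1.0 / (alpha + beta))
    return G * (C / 6.0) ** a

# Sanity check: the closed form should match a brute-force log-grid search.
C = 1e21  # FLOP budget
best_N = min((loss(N, C / (6.0 * N)), N)
             for N in (10 ** (e / 100.0) for e in range(600, 1200)))[1]
print(f"{optimal_N(C):.3e}  {best_N:.3e}")
```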
Updated: 2024-05-15 00:57:23
Subjects: cs.AI,cs.CL
Feature-based Federated Transfer Learning: Communication Efficiency, Robustness and Privacy
In this paper, we propose feature-based federated transfer learning as a novel approach to improve communication efficiency, reducing the uplink payload by multiple orders of magnitude compared to existing approaches in federated learning and federated transfer learning. Specifically, in the proposed feature-based federated learning, we upload the extracted features and outputs instead of parameter updates. For this distributed learning model, we determine the required payload and provide comparisons with the existing schemes. Subsequently, we analyze the robustness of feature-based federated transfer learning against packet loss, data insufficiency, and quantization. Finally, we address privacy considerations by defining and analyzing label privacy leakage and feature privacy leakage, and investigating mitigating approaches. For all of the aforementioned analyses, we evaluate the performance of the proposed learning scheme via experiments on an image classification task and a natural language processing task to demonstrate its effectiveness.
Updated: 2024-05-15 00:43:19
Subjects: cs.LG,cs.MA
Cons-training tensor networks
In this study, we introduce a novel family of tensor networks, termed constrained matrix product states (MPS), designed to incorporate exactly arbitrary linear constraints into sparse block structures. These tensor networks effectively bridge the gap between U(1)-symmetric MPS and traditional, unconstrained MPS. Central to our approach is the concept of a quantum region, an extension of the quantum numbers traditionally used in symmetric tensor networks, adapted to capture any linear constraint, including the unconstrained scenario. We further develop canonical forms for these new MPS, which allow for the merging and factorization of tensor blocks according to quantum-region fusion rules. Utilizing this canonical form, we apply an unsupervised training strategy to optimize arbitrary cost functions subject to linear constraints. We use this to solve the quadratic knapsack problem and show superior performance against a leading nonlinear integer programming solver, highlighting the potential of our method in tackling complex constrained combinatorial optimization problems.
Updated: 2024-05-15 00:13:18
Subjects: math.NA,cs.LG,cs.NA,quant-ph
Improving Sequential Market Clearing via Value-oriented Renewable Energy Forecasting
Large penetration of renewable energy sources (RESs) brings huge uncertainty into the electricity markets. While existing deterministic market clearing fails to accommodate this uncertainty, the recently proposed stochastic market clearing struggles to achieve desirable market properties. In this work, we propose a value-oriented forecasting approach, which tactically determines the RES generation that enters the day-ahead market. With such a forecast, the existing deterministic market clearing framework can be maintained, and the day-ahead and real-time overall operation cost is reduced. At the training phase, the forecast model parameters are estimated to minimize the expected day-ahead and real-time overall operation costs, instead of minimizing forecast errors in a statistical sense. Theoretically, we derive the exact form of the loss function for training the forecast model that aligns with this goal. For market clearing modeled by linear programs, this loss function is piecewise linear. Additionally, we derive the analytical gradient of the loss function with respect to the forecast, which inspires an efficient training strategy. A numerical study shows that, compared to a quality-oriented forecasting approach, our forecasts bring significant overall cost reductions to deterministic market clearing.
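The piecewise-linear character of such a loss can be seen in a stylized two-settlement example: over-forecasting renewables forces expensive real-time upward redispatch, while under-forecasting spills cheap energy. The unit costs below are made-up constants, not the paper's derived loss:

```python
def operation_cost_loss(forecast, actual, c_up=3.0, c_dn=1.0):
    """Stylized day-ahead + real-time operation cost of a renewable
    generation forecast. c_up: unit cost of covering a real-time
    shortfall; c_dn: unit cost of spilling surplus energy. Both are
    illustrative constants, not the paper's derived values."""
    shortfall = max(0.0, forecast - actual)  # promised more than delivered
    surplus = max(0.0, actual - forecast)    # delivered more than promised
    return c_up * shortfall + c_dn * surplus

# The minimizer of the expected loss is a quantile of the RES distribution
# (here the c_dn/(c_up + c_dn) = 0.25 quantile), not the conditional mean;
# this is why value-oriented training differs from error-minimizing training.
print(operation_cost_loss(10.0, 8.0), operation_cost_loss(8.0, 10.0))  # 6.0 2.0
```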
Updated: 2024-05-15 00:04:08
Subjects: eess.SY,cs.LG,cs.SY