Lower Bounds for Differential Privacy Under Continual Observation and Online Threshold Queries
One of the most basic problems for studying the "price of privacy over time" is the so-called private counter problem, introduced by Dwork et al. (2010) and Chan et al. (2010). In this problem, we aim to track the number of events that occur over time, while hiding the existence of every single event. More specifically, in every time step $t\in[T]$ we learn (in an online fashion) that $\Delta_t\geq 0$ new events have occurred, and must respond with an estimate $n_t\approx\sum_{j=1}^t \Delta_j$. The privacy requirement is that all of the outputs together, across all time steps, satisfy event-level differential privacy. The main question here is how our error needs to depend on the total number of time steps $T$ and the total number of events $n$. Dwork et al. (2015) showed an upper bound of $O\left(\log(T)+\log^2(n)\right)$, and Henzinger et al. (2023) showed a lower bound of $\Omega\left(\min\{\log n, \log T\}\right)$. We show a new lower bound of $\Omega\left(\min\{n,\log T\}\right)$, which is tight w.r.t. the dependence on $T$, and is tight in the sparse case where $\log^2 n=O(\log T)$. Our lower bound has the following implications:
- We show that our lower bound extends to the "online thresholds problem", where the goal is to privately answer many "quantile queries" when these queries are presented one-by-one. This resolves an open question of Bun et al. (2017).
- Our lower bound implies, for the first time, a separation between the number of mistakes obtainable by a private online learner and a non-private online learner. This partially resolves a COLT'22 open question published by Sanyal and Ramponi.
- Our lower bound also yields the first separation between the standard model of private online learning and a recently proposed relaxed variant of it, called private online prediction.
Updated: 2024-04-17 23:59:19
Categories: cs.CR, cs.LG
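For intuition on the upper-bound side of the counter problem above, here is a minimal sketch of the classic binary-tree mechanism of Dwork et al. (2010) and Chan et al. (2010) for $\varepsilon$-DP continual counting; it is not the paper's lower-bound construction. Unit increments, the per-level budget split, and all names are illustrative assumptions.

```python
import numpy as np

def private_running_counts(deltas, epsilon, seed=0):
    """Release all prefix sums of 0/1 increments with event-level eps-DP.

    Each increment touches at most one dyadic block per level, so splitting
    the budget across levels gives eps-DP overall; every prefix is answered
    as a sum of at most `levels` noisy blocks, giving polylog(T) error.
    """
    T = len(deltas)
    levels = int(np.ceil(np.log2(max(T, 2)))) + 1
    eps_node = epsilon / levels
    rng = np.random.default_rng(seed)
    noisy = {}                                   # (start, end) block -> noisy sum
    outputs = []
    for t in range(1, T + 1):
        for lvl in range(levels):                # close dyadic blocks ending at t
            size = 2 ** lvl
            if t % size == 0:
                noisy[(t - size, t)] = (sum(deltas[t - size:t])
                                        + rng.laplace(scale=1.0 / eps_node))
        est, pos = 0.0, 0                        # prefix [0, t) from stored blocks
        for lvl in reversed(range(levels)):
            size = 2 ** lvl
            if pos + size <= t:
                est += noisy[(pos, pos + size)]
                pos += size
        outputs.append(est)
    return outputs

print(private_running_counts([1, 0, 1, 1, 0, 1, 0, 1], epsilon=1.0))
```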
Effect Size Estimation for Duration Recommendation in Online Experiments: Leveraging Hierarchical Models and Objective Utility Approaches
The selection of the assumed effect size (AES) critically determines the duration of an experiment, and hence its accuracy and efficiency. Traditionally, experimenters determine the AES based on domain knowledge. However, this method becomes impractical for online experimentation services managing numerous experiments, and a more automated approach is hence in great demand. We initiate the study of data-driven AES selection for online experimentation services by introducing two solutions. The first employs a three-layer Gaussian Mixture Model considering the heteroskedasticity across experiments, and it seeks to estimate the true expected effect size among positive experiments. The second method, grounded in utility theory, aims to determine the optimal effect size by striking a balance between the experiment's cost and the precision of decision-making. Through comparisons with baseline methods using both simulated and real data, we showcase the superior performance of the proposed approaches.
Updated: 2024-04-17 23:56:20
Categories: cs.LG, stat.ML
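To make the AES-to-duration link concrete, here is a minimal sketch of the standard two-sample power calculation that an experimentation service would invert into a recommended duration; the paper's utility-based method would then trade this cost against decision precision. All names (e.g. `daily_traffic`) are illustrative assumptions, not the authors' code.

```python
from scipy.stats import norm

def recommended_days(aes, sigma, daily_traffic, alpha=0.05, power=0.8):
    """Days needed for a two-arm test to detect `aes` with the given power."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    n_per_arm = 2 * (z * sigma / aes) ** 2   # classic two-sample size formula
    return 2 * n_per_arm / daily_traffic     # both arms draw from shared traffic

# Halving the assumed effect size roughly quadruples the recommended duration:
print(recommended_days(aes=0.02, sigma=1.0, daily_traffic=10_000))
print(recommended_days(aes=0.01, sigma=1.0, daily_traffic=10_000))
```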
TempBEV: Improving Learned BEV Encoders with Combined Image and BEV Space Temporal Aggregation
Autonomous driving requires an accurate representation of the environment. A strategy toward high accuracy is to fuse data from several sensors. Learned Bird's-Eye View (BEV) encoders can achieve this by mapping data from individual sensors into one joint latent space. For cost-efficient camera-only systems, this provides an effective mechanism to fuse data from multiple cameras with different views. Accuracy can further be improved by aggregating sensor information over time. This is especially important in monocular camera systems to account for the lack of explicit depth and velocity measurements. The effectiveness of BEV encoders thus crucially depends on the operators used to aggregate temporal information and on the latent representation spaces employed. We analyze BEV encoders proposed in the literature and compare their effectiveness, quantifying the effects of aggregation operators and latent representations. While most existing approaches aggregate temporal information either in image or in BEV latent space, our analyses and performance comparisons suggest that these latent representations exhibit complementary strengths. Therefore, we develop a novel temporal BEV encoder, TempBEV, which integrates aggregated temporal information from both latent spaces. We consider subsequent image frames as stereo through time and leverage methods from optical flow estimation for temporal stereo encoding. Empirical evaluation on the NuScenes dataset shows a significant improvement by TempBEV over the baseline for 3D object detection and BEV segmentation. The ablation uncovers a strong synergy of joint temporal aggregation in the image and BEV latent space. These results indicate the overall effectiveness of our approach and make a strong case for aggregating temporal information in both image and BEV latent spaces.
Updated: 2024-04-17 23:49:00
Categories: cs.CV, cs.AI, cs.LG, cs.RO
Developing Situational Awareness for Joint Action with Autonomous Vehicles
Unanswered questions about how human-AV interaction designers can support riders' informational needs hinder Autonomous Vehicle (AV) adoption. To achieve joint human-AV action goals - such as safe transportation, trust, or learning from an AV - sufficient situational awareness must be held by the human, AV, and human-AV system collectively. We present a systems-level framework that integrates cognitive theories of joint action and situational awareness as a means to tailor communications that meet the criteria necessary for goal success. This framework is based on four components of the shared situation: AV traits, action goals, subject-specific traits and states, and the situated driving context. AV communications should be tailored to these factors and be sensitive when they change. This framework can be useful for understanding individual, shared, and distributed human-AV situational awareness and designing for future AV communications that meet the informational needs and goals of diverse groups and in diverse driving contexts.
Updated: 2024-04-17 23:41:48
Categories: cs.HC, cs.AI, cs.RO
When are Foundation Models Effective? Understanding the Suitability for Pixel-Level Classification Using Multispectral Imagery
Foundation models, i.e., very large deep learning models, have demonstrated impressive performance in various language and vision tasks that is otherwise difficult to reach using smaller-size models. The major success of GPT-type language models is particularly exciting and raises expectations about the potential of foundation models in other domains, including satellite remote sensing. In this context, great efforts have been made to build foundation models to test their capabilities in broader applications, and examples include Prithvi by NASA-IBM, Segment-Anything-Model, ViT, etc. This leads to an important question: Are foundation models always a suitable choice for different remote sensing tasks, and when are they not? This work aims to enhance the understanding of the status and suitability of foundation models for pixel-level classification using multispectral imagery at moderate resolution, through comparisons with traditional machine learning (ML) and regular-size deep learning models. Interestingly, the results reveal that in many scenarios traditional ML models still have similar or better performance compared to foundation models, especially for tasks where texture is less useful for classification. On the other hand, deep learning models did show more promising results for tasks where labels partially depend on texture (e.g., burn scar), while the difference in performance between foundation models and deep learning models is not obvious. The results conform with our analysis: the suitability of foundation models depends on the alignment between the self-supervised learning tasks and the real downstream tasks, and the typical masked autoencoder paradigm is not necessarily suitable for many remote sensing problems.
Updated: 2024-04-17 23:30:48
Categories: cs.CV, cs.AI, cs.LG
Automated mapping of virtual environments with visual predictive coding
Humans construct internal cognitive maps of their environment directly from sensory inputs without access to a system of explicit coordinates or distance measurements. While machine learning algorithms like SLAM utilize specialized visual inference procedures to identify visual features and construct spatial maps from visual and odometry data, the general nature of cognitive maps in the brain suggests a unified mapping algorithmic strategy that can generalize to auditory, tactile, and linguistic inputs. Here, we demonstrate that predictive coding provides a natural and versatile neural network algorithm for constructing spatial maps using sensory data. We introduce a framework in which an agent navigates a virtual environment while engaging in visual predictive coding using a self-attention-equipped convolutional neural network. While learning a next image prediction task, the agent automatically constructs an internal representation of the environment that quantitatively reflects distances. The internal map enables the agent to pinpoint its location relative to landmarks using only visual information. The predictive coding network generates a vectorized encoding of the environment that supports vector navigation where individual latent space units delineate localized, overlapping neighborhoods in the environment. Broadly, our work introduces predictive coding as a unified algorithmic framework for constructing cognitive maps that can naturally extend to the mapping of auditory, sensorimotor, and linguistic inputs.
Updated: 2024-04-17 23:27:02
Categories: q-bio.NC, cs.CV, cs.LG, eess.IV
Prompt-Driven Feature Diffusion for Open-World Semi-Supervised Learning
In this paper, we present a novel approach termed Prompt-Driven Feature Diffusion (PDFD) within a semi-supervised learning framework for Open World Semi-Supervised Learning (OW-SSL). At its core, PDFD deploys an efficient feature-level diffusion model with the guidance of class-specific prompts to support discriminative feature representation learning and feature generation, tackling the challenge of the non-availability of labeled data for unseen classes in OW-SSL. In particular, PDFD utilizes class prototypes as prompts in the diffusion model, leveraging their class-discriminative and semantic generalization ability to condition and guide the diffusion process across all the seen and unseen classes. Furthermore, PDFD incorporates a class-conditional adversarial loss for diffusion model training, ensuring that the features generated via the diffusion process can be discriminatively aligned with the class-conditional features of the real data. Additionally, the class prototypes of the unseen classes are computed using only unlabeled instances with confident predictions within a semi-supervised learning framework. We conduct extensive experiments to evaluate the proposed PDFD. The empirical results show PDFD exhibits remarkable performance enhancements over many state-of-the-art existing methods.
Updated: 2024-04-17 23:10:11
Categories: cs.LG, cs.AI, cs.CV
Framework-agnostic Semantically-aware Global Reasoning for Segmentation
Recent advances in pixel-level tasks (e.g. segmentation) illustrate the benefit of long-range interactions between aggregated region-based representations that can enhance local features. However, such aggregated representations, often in the form of attention, fail to model the underlying semantics of the scene (e.g. individual objects and, by extension, their interactions). In this work, we address the issue by proposing a component that learns to project image features into latent representations and reason between them using a transformer encoder to generate contextualized and scene-consistent representations which are fused with original image features. Our design encourages the latent regions to represent semantic concepts by ensuring that the activated regions are spatially disjoint and the union of such regions corresponds to a connected object segment. The proposed semantic global reasoning (SGR) component is end-to-end trainable and can be easily added to a wide variety of backbones (CNN or transformer-based) and segmentation heads (per-pixel or mask classification) to consistently improve the segmentation results on different datasets. In addition, our latent tokens are semantically interpretable and diverse and provide a rich set of features that can be transferred to downstream tasks like object detection and segmentation, with improved performance. Furthermore, we also propose metrics to quantify the semantics of latent tokens at both the class and instance levels.
Updated: 2024-04-17 23:02:39
Categories: cs.CV, cs.LG
Enhancing Q&A with Domain-Specific Fine-Tuning and Iterative Reasoning: A Comparative Study
This paper investigates the impact of domain-specific model fine-tuning and of reasoning mechanisms on the performance of question-answering (Q&A) systems powered by large language models (LLMs) and Retrieval-Augmented Generation (RAG). Using the FinanceBench SEC financial filings dataset, we observe that, for RAG, combining a fine-tuned embedding model with a fine-tuned LLM achieves better accuracy than generic models, with relatively greater gains attributable to fine-tuned embedding models. Additionally, employing reasoning iterations on top of RAG delivers an even bigger jump in performance, enabling the Q&A systems to get closer to human-expert quality. We discuss the implications of such findings, propose a structured technical design space capturing major technical components of Q&A AI, and provide recommendations for making high-impact technical choices for such components. We plan to follow up on this work with actionable guides for AI teams and further investigations into the impact of domain-specific augmentation in RAG and into agentic AI capabilities such as advanced planning and reasoning.
Updated: 2024-04-17 23:00:03
Categories: cs.AI
NonGEMM Bench: Understanding the Performance Horizon of the Latest ML Workloads with NonGEMM Workloads
Machine Learning (ML) operators are the building blocks used to design ML models for various target applications. GEneral Matrix Multiplication (GEMM) operators are the backbone of ML models. They are notorious for being computationally expensive, requiring billions of multiply-and-accumulate operations. Therefore, significant effort has been put into studying and optimizing GEMM operators in order to speed up the execution of ML models. GPUs and accelerators are widely deployed to accelerate ML workloads by optimizing the execution of GEMM operators. Nonetheless, the performance of NonGEMM operators has not been studied as thoroughly as that of GEMMs. Therefore, this paper describes NonGEMM Bench, a benchmark to study NonGEMM operators. We first construct NonGEMM Bench using popular ML workloads from different domains, then perform case studies on GPU platforms of various grades to analyze the behavior of NonGEMM operators in GPU accelerated systems. Finally, we present some key takeaways to bridge the gap between GEMM and NonGEMM operators and to offer the community potential new optimization directions.
Updated: 2024-04-17 22:44:22
Categories: cs.AR, cs.LG, cs.PF
REQUAL-LM: Reliability and Equity through Aggregation in Large Language Models
The extensive scope of large language models (LLMs) across various domains underscores the critical importance of responsibility in their application, beyond natural language processing. In particular, the randomized nature of LLMs, coupled with inherent biases and historical stereotypes in data, raises critical concerns regarding reliability and equity. Addressing these challenges is necessary before using LLMs for applications with societal impact. Towards addressing this gap, we introduce REQUAL-LM, a novel method for finding reliable and equitable LLM outputs through aggregation. Specifically, we develop a Monte Carlo method based on repeated sampling to find a reliable output close to the mean of the underlying distribution of possible outputs. We formally define terms such as reliability and bias, and design an equity-aware aggregation to minimize harmful bias while finding a highly reliable output. REQUAL-LM does not require specialized hardware, does not impose a significant computing load, and uses LLMs as a black box. This design choice enables seamless scalability alongside the rapid advancement of LLM technologies. Our system does not require retraining the LLMs, which makes it deployment ready and easy to adapt. Our comprehensive experiments using various tasks and datasets demonstrate that REQUAL-LM effectively mitigates bias and selects a more equitable response, specifically outputs that properly represent minority groups.
Updated: 2024-04-17 22:12:41
Categories: cs.CL, cs.AI, cs.CY, cs.LG
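A minimal sketch of the repeated-sampling idea described above, with the paper's equity-aware weighting omitted: draw several candidates from the blackbox LLM, embed them, and return the candidate nearest the mean of the empirical output distribution. `sample_llm` and the embedding model are assumed stand-ins, not the authors' implementation.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

def requal_select(sample_llm, prompt, k=10):
    """Return the most 'central' of k Monte Carlo samples from the LLM."""
    candidates = [sample_llm(prompt) for _ in range(k)]
    embedder = SentenceTransformer("all-MiniLM-L6-v2")
    vecs = embedder.encode(candidates)              # (k, d) embedding matrix
    centroid = vecs.mean(axis=0)                    # mean of the output distribution
    dists = np.linalg.norm(vecs - centroid, axis=1)
    return candidates[int(dists.argmin())]          # closest-to-mean candidate
```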
3D object quality prediction for Metal Jet Printer with Multimodal thermal encoder
With the advancements in 3D printing technologies, it is extremely important that the quality and dimensional accuracy of 3D printed objects meet the customer's specifications. Various factors during metal printing affect the printed parts' quality, including the power quality, the printing stage parameters, the print part's location inside the print bed, the curing stage parameters, and the metal sintering process. With the large data gathered from HP's MetJet printing process, AI techniques can be used to analyze, learn, and effectively infer the printed part quality metrics, as well as assist in improving the print yield. In-situ thermal sensing data captured by printer-installed thermal sensors contains the part thermal signature of fusing layers. Such a part thermal signature reflects the convoluted impact of various factors. In this paper, we use a multimodal thermal encoder network with a trained encoder-decoder module to fuse data of different natures, including the video data, vectorized printer control data, and exact part thermal signatures. We explored the data fusion techniques and stages; the optimized end-to-end model architecture yields improved part quality prediction accuracy.
Updated: 2024-04-17 21:57:29
Categories: cs.LG, cs.CV
Behavior Alignment: A New Perspective of Evaluating LLM-based Conversational Recommendation Systems
Large Language Models (LLMs) have demonstrated great potential in Conversational Recommender Systems (CRS). However, the application of LLMs to CRS has exposed a notable discrepancy in behavior between LLM-based CRS and human recommenders: LLMs often appear inflexible and passive, frequently rushing to complete the recommendation task without sufficient inquiry. This behavior discrepancy can lead to decreased accuracy in recommendations and lower user satisfaction. Despite its importance, existing studies in CRS lack a study of how to measure such a behavior discrepancy. To fill this gap, we propose Behavior Alignment, a new evaluation metric to measure how well the recommendation strategies made by an LLM-based CRS are consistent with human recommenders'. Our experiment results show that the new metric is better aligned with human preferences and can better differentiate how systems perform than existing evaluation metrics. As Behavior Alignment requires explicit and costly human annotations on the recommendation strategies, we also propose a classification-based method to implicitly measure the Behavior Alignment based on the responses. The evaluation results confirm the robustness of the method.
Updated: 2024-04-17 21:56:27
Categories: cs.IR, cs.AI
Event-Based Eye Tracking. AIS 2024 Challenge Survey
This survey reviews the AIS 2024 Event-Based Eye Tracking (EET) Challenge. The task of the challenge focuses on processing eye movement recorded with event cameras and predicting the pupil center of the eye. The challenge emphasizes efficient eye tracking with event cameras to achieve good task accuracy and efficiency trade-off. During the challenge period, 38 participants registered for the Kaggle competition, and 8 teams submitted a challenge factsheet. The novel and diverse methods from the submitted factsheets are reviewed and analyzed in this survey to advance future event-based eye tracking research.
Updated: 2024-04-17 21:53:01
Categories: cs.CV, cs.AI
QGen: On the Ability to Generalize in Quantization Aware Training
Quantization lowers memory usage, computational requirements, and latency by utilizing fewer bits to represent model weights and activations. In this work, we investigate the generalization properties of quantized neural networks, a characteristic that has received little attention despite its implications for model performance. In particular, first, we develop a theoretical model for quantization in neural networks and demonstrate how quantization functions as a form of regularization. Second, motivated by recent work connecting the sharpness of the loss landscape and generalization, we derive an approximate bound for the generalization of quantized models conditioned on the amount of quantization noise. We then validate our hypothesis by experimenting with over 2000 models trained on the CIFAR-10, CIFAR-100, and ImageNet datasets, spanning convolutional and transformer-based models.
Updated: 2024-04-17 21:52:21
Categories: cs.LG, cs.CV
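For readers unfamiliar with the setup, here is a minimal sketch of the fake quantization used in quantization-aware training with a straight-through estimator: the forward pass rounds weights to a b-bit grid (the quantization noise the bound above is conditioned on), while the backward pass lets gradients through unchanged. An illustrative sketch, not the paper's exact scheme.

```python
import torch

class FakeQuant(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w, bits=8):
        # Symmetric uniform quantizer: round onto a (2^bits)-level grid.
        qmax = 2 ** (bits - 1) - 1
        scale = w.abs().max() / qmax + 1e-12
        return torch.round(w / scale).clamp(-qmax - 1, qmax) * scale

    @staticmethod
    def backward(ctx, grad_out):
        # Straight-through estimator: treat rounding as identity going backward.
        return grad_out, None

w = torch.randn(4, 4, requires_grad=True)
loss = FakeQuant.apply(w, 4).pow(2).sum()
loss.backward()            # gradients flow despite the non-differentiable round
print(w.grad.shape)
```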
Tensor-Networks-based Learning of Probabilistic Cellular Automata Dynamics
Algorithms developed to solve many-body quantum problems, like tensor networks, can turn into powerful quantum-inspired tools to tackle problems in the classical domain. In this work, we focus on matrix product operators, a prominent numerical technique to study many-body quantum systems, especially in one dimension. It has been previously shown that such a tool can be used for classification, learning of deterministic sequence-to-sequence processes and of generic quantum processes. We further develop a matrix product operator algorithm to learn probabilistic sequence-to-sequence processes and apply this algorithm to probabilistic cellular automata. This new approach can accurately learn probabilistic cellular automata processes in different conditions, even when the process is a probabilistic mixture of different chaotic rules. In addition, we find that the ability to learn these dynamics is a function of the bit-wise difference between the rules and whether one is much more likely than the other.
Updated: 2024-04-17 21:51:03
Categories: cond-mat.stat-mech, cs.LG, quant-ph
End-to-End Mesh Optimization of a Hybrid Deep Learning Black-Box PDE Solver
Deep learning has been widely applied to solve partial differential equations (PDEs) in computational fluid dynamics. Recent research proposed a PDE correction framework that leverages deep learning to correct the solution obtained by a PDE solver on a coarse mesh. However, end-to-end training of such a PDE correction model over both solver-dependent parameters such as mesh parameters and neural network parameters requires the PDE solver to support automatic differentiation through the iterative numerical process. Such a feature is not readily available in many existing solvers. In this study, we explore the feasibility of end-to-end training of a hybrid model with a black-box PDE solver and a deep learning model for fluid flow prediction. Specifically, we investigate a hybrid model that integrates a black-box PDE solver into a differentiable deep graph neural network. To train this model, we use a zeroth-order gradient estimator to differentiate the PDE solver via forward propagation. Although experiments show that the proposed approach based on zeroth-order gradient estimation underperforms the baseline that computes exact derivatives using automatic differentiation, our proposed method outperforms the baseline trained with a frozen input mesh to the solver. Moreover, with a simple warm-start on the neural network parameters, we show that models trained by these zeroth-order algorithms achieve an accelerated convergence and improved generalization performance.
Updated: 2024-04-17 21:49:45
Categories: cs.LG, cs.NA, math.NA, math.OC
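A minimal sketch of a two-point zeroth-order (SPSA-style) gradient estimate, the kind of estimator used to differentiate through a black-box solver with respect to mesh parameters; `solver_loss` stands in for the solver plus downstream loss, and all names are assumptions.

```python
import numpy as np

def zeroth_order_grad(solver_loss, theta, mu=1e-3, n_samples=8, seed=0):
    """Estimate the gradient of a black-box scalar loss via random two-point probes."""
    rng = np.random.default_rng(seed)
    grad = np.zeros_like(theta)
    for _ in range(n_samples):
        u = rng.standard_normal(theta.shape)          # random probe direction
        delta = solver_loss(theta + mu * u) - solver_loss(theta - mu * u)
        grad += (delta / (2 * mu)) * u                # directional derivative times u
    return grad / n_samples

theta = np.array([1.0, -2.0])
print(zeroth_order_grad(lambda x: float((x ** 2).sum()), theta))  # approx [2, -4]
```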
The Code the World Depends On: A First Look at Technology Makers' Open Source Software Dependencies
Open-source software (OSS) supply chain security has become a topic of concern for organizations. Patching an OSS vulnerability can require updating other dependent software products in addition to the original package. However, the landscape of OSS dependencies is not well explored: we do not know what packages are most critical to patch, hindering efforts to improve OSS security where it is most needed. There is thus a need to understand OSS usage in major software and device makers' products. Our work takes a first step toward closing this knowledge gap. We investigate published OSS dependency information for 108 major software and device makers, cataloging how available and how detailed this information is and identifying the OSS packages that appear the most frequently in our data.
Updated: 2024-04-17 21:44:38
Categories: cs.SE, cs.CR
Protected QR Code-based Anti-counterfeit System for Pharmaceutical Manufacturing
Pharmaceutical manufacturing faces critical challenges due to the global threat of counterfeit drugs. This paper proposes a new approach based on protected QR codes that secures unique product information to safeguard the pharmaceutical supply chain. The proposed solution integrates secure QR code generation and encrypted data transmission to establish a comprehensive anti-counterfeit ecosystem. The protected QR codes encapsulate product information that cannot be read by traditional QR code scanners, which protects the information against replication and tampering. The system is developed with scalability in mind and can be easily implemented without introducing any additional modification to the traditional supply chain.
Updated: 2024-04-17 21:43:28
Categories: cs.CR
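A minimal sketch of the general idea under assumed library choices (the `cryptography` package's Fernet and the `qrcode` package; not the paper's scheme): encrypt the product record before encoding it, so an ordinary scanner recovers only opaque ciphertext while the manufacturer's verifier can decrypt it.

```python
import qrcode
from cryptography.fernet import Fernet

key = Fernet.generate_key()                  # held by the verifier, never printed
record = b"batch=B1234;expiry=2026-01;serial=000042"
token = Fernet(key).encrypt(record)          # authenticated symmetric encryption

qrcode.make(token.decode()).save("protected_qr.png")  # QR holds ciphertext only

# Verifier side: scan the code, then decrypt and authenticate with the key.
print(Fernet(key).decrypt(token))
```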
Stochastic Optimal Control Matching
Stochastic optimal control, which has the goal of driving the behavior of noisy systems, is broadly applicable in science, engineering and artificial intelligence. Our work introduces Stochastic Optimal Control Matching (SOCM), a novel Iterative Diffusion Optimization (IDO) technique for stochastic optimal control that stems from the same philosophy as the conditional score matching loss for diffusion models. That is, the control is learned via a least squares problem by trying to fit a matching vector field. The training loss, which is closely connected to the cross-entropy loss, is optimized with respect to both the control function and a family of reparameterization matrices which appear in the matching vector field. The optimization with respect to the reparameterization matrices aims at minimizing the variance of the matching vector field. Experimentally, our algorithm achieves lower error than all the existing IDO techniques for stochastic optimal control for three out of four control problems, in some cases by an order of magnitude. The key idea underlying SOCM is the path-wise reparameterization trick, a novel technique that may be of independent interest. Code at https://github.com/facebookresearch/SOC-matching
Updated: 2024-04-17 21:39:34
Categories: math.OC, cs.LG, cs.NA, math.NA, math.PR, stat.ML
Predictive Model Development to Identify Failed Healing in Patients after Non-Union Fracture Surgery
Bone non-union is among the most severe complications associated with trauma surgery, occurring in 10-30% of cases after long bone fractures. Treating non-unions requires a high level of surgical expertise and often involves multiple revision surgeries, sometimes even leading to amputation. Thus, more accurate prognosis is crucial for patient well-being. Recent advances in machine learning (ML) hold promise for developing models to predict non-union healing, even when working with smaller datasets, a commonly encountered challenge in clinical domains. To demonstrate the effectiveness of ML in identifying candidates at risk of failed non-union healing, we applied three ML models (logistic regression, support vector machine, and XGBoost) to the clinical dataset TRUFFLE, which includes 797 patients with long bone non-union. The models provided prediction results with 70% sensitivity, and the specificities of 66% (XGBoost), 49% (support vector machine), and 43% (logistic regression). These findings offer valuable clinical insights because they enable early identification of patients at risk of failed non-union healing after the initial surgical revision treatment protocol.
Updated: 2024-04-17 21:36:33
Categories: cs.LG, J.3; I.5.4
Improved Generalization Bounds for Communication Efficient Federated Learning
This paper focuses on reducing the communication cost of federated learning by exploring generalization bounds and representation learning. We first characterize a tighter generalization bound for one-round federated learning based on local clients' generalizations and heterogeneity of data distribution (non-iid scenario). We also characterize a generalization bound in R-round federated learning and its relation to the number of local updates (local stochastic gradient descents (SGDs)). Then, based on our generalization bound analysis and our representation learning interpretation of this analysis, we show for the first time that less frequent aggregations, hence more local updates, for the representation extractor (usually corresponding to the initial layers) lead to the creation of more generalizable models, particularly for non-iid scenarios. We design a novel Federated Learning with Adaptive Local Steps (FedALS) algorithm based on our generalization bound and representation learning analysis. FedALS employs varying aggregation frequencies for different parts of the model, so it reduces the communication cost. The paper concludes with experimental results showing the effectiveness of FedALS.
Updated: 2024-04-17 21:17:48
Categories: cs.LG, cs.AI
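A minimal sketch of the adaptive-aggregation idea, assuming the model splits into a representation extractor and a head: the extractor is averaged only every `tau_rep` rounds, so it accumulates more local updates between aggregations. Illustrative pseudo-structure, not the authors' implementation.

```python
import numpy as np

def fedals_round(r, client_models, tau_rep=4):
    """client_models: list of dicts {'rep': ndarray, 'head': ndarray}."""
    new_head = np.mean([m["head"] for m in client_models], axis=0)
    new_rep = (np.mean([m["rep"] for m in client_models], axis=0)
               if r % tau_rep == 0 else None)   # extractor syncs every tau_rep rounds
    for m in client_models:
        m["head"] = new_head.copy()             # head syncs every round
        if new_rep is not None:
            m["rep"] = new_rep.copy()
    return client_models                        # clients then run local SGD steps
```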
Boomerang: Local sampling on image manifolds using diffusion models
The inference stage of diffusion models can be seen as running a reverse-time diffusion stochastic differential equation, where samples from a Gaussian latent distribution are transformed into samples from a target distribution that usually reside on a low-dimensional manifold, e.g., an image manifold. The intermediate values between the initial latent space and the image manifold can be interpreted as noisy images, with the amount of noise determined by the forward diffusion process noise schedule. We utilize this interpretation to present Boomerang, an approach for local sampling of image manifolds. As implied by its name, Boomerang local sampling involves adding noise to an input image, moving it closer to the latent space, and then mapping it back to the image manifold through a partial reverse diffusion process. Thus, Boomerang generates images on the manifold that are "similar," but nonidentical, to the original input image. We can control the proximity of the generated images to the original by adjusting the amount of noise added. Furthermore, due to the stochastic nature of the reverse diffusion process in Boomerang, the generated images display a certain degree of stochasticity, allowing us to obtain local samples from the manifold without encountering any duplicates. Boomerang offers the flexibility to work seamlessly with any pretrained diffusion model, such as Stable Diffusion, without necessitating any adjustments to the reverse diffusion process. We present three applications for Boomerang. First, we provide a framework for constructing privacy-preserving datasets having controllable degrees of anonymity. Second, we show that using Boomerang for data augmentation increases generalization performance and outperforms state-of-the-art synthetic data augmentation. Lastly, we introduce a perceptual image enhancement framework, which enables resolution enhancement.
Updated: 2024-04-17 21:16:56
Categories: cs.CV, cs.LG, stat.ML
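The add-noise-then-partially-denoise loop described above is close to what off-the-shelf image-to-image diffusion pipelines expose, so a rough approximation can be sketched with Hugging Face diffusers, where `strength` plays the role of the amount of forward noise added. An illustration under that assumption, not the authors' code.

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

src = Image.open("input.png").convert("RGB").resize((512, 512))  # any RGB image
# Small strength -> a local sample near the input image on the manifold;
# larger strength -> travel further toward the latent space before returning.
local_sample = pipe(prompt="", image=src, strength=0.3).images[0]
local_sample.save("boomerang_like_sample.png")
```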
Virtual Foundry Graphnet for Metal Sintering Deformation Prediction
Metal sintering is a necessary step for metal injection molded parts and for binder jetting, such as HP's metal 3D printer. The metal sintering process introduces large deformation varying from 25 to 50% depending on the green part porosity. In this paper, we use a graph-based deep learning approach to predict the part deformation, which can speed up the deformation simulation substantially at the voxel level. Running a well-trained metal sintering inferencing engine takes only seconds to obtain the final sintering deformation value. The tested accuracy on an example complex geometry achieves 0.7 um mean deviation for a 63 mm test part.
Updated: 2024-04-17 21:11:12
Categories: cs.LG
Incremental Bootstrapping and Classification of Structured Scenes in a Fuzzy Ontology
We foresee robots that bootstrap knowledge representations and use them for classifying relevant situations and making decisions based on future observations. Particularly for assistive robots, the bootstrapping mechanism might be supervised by humans who should not repeat a training phase several times and should be able to refine the taught representation. We consider robots that bootstrap structured representations to classify some intelligible categories. Such a structure should be incrementally bootstrapped, i.e., without invalidating the identified category models when a new additional category is considered. To tackle this scenario, we presented the Scene Identification and Tagging (SIT) algorithm, which bootstraps structured knowledge representation in a crisp OWL-DL ontology. Over time, SIT bootstraps a graph representing scenes, sub-scenes and similar scenes. Then, SIT can classify new scenes within the bootstrapped graph through logic-based reasoning. However, SIT has issues with sensory data because its crisp implementation is not robust to perception noise. This paper presents a reformulation of SIT within the fuzzy domain, which exploits a fuzzy DL ontology to overcome the robustness issues. By comparing the performances of the fuzzy and crisp implementations of SIT, we show that fuzzy SIT is robust, preserves the properties of its crisp formulation, and enhances the bootstrapped representations. On the contrary, the fuzzy implementation of SIT leads to less intelligible knowledge representations than the one bootstrapped in the crisp domain.
Updated: 2024-04-17 20:51:13
Categories: cs.AI, cs.HC, cs.LO, cs.RO, 68T40 (Primary) 68T30, 68T27, 68T37, 03B52 (Secondary), I.2.4; I.2.6; I.2.3; I.2.9; I.2.10
Meta-Decomposition: Dynamic Segmentation Approach Selection in IoT-based Activity Recognition
Internet of Things (IoT) devices generate heterogeneous data over time, and relying solely on individual data points is inadequate for accurate analysis. Segmentation is a common preprocessing step in many IoT applications, including IoT-based activity recognition, aiming to address the limitations of individual events and streamline the process. However, this step introduces at least two families of uncontrollable biases. The first is caused by the changes made by the segmentation process to the initial problem space, such as dividing the input data into 60-second windows. The second category of biases results from the segmentation process itself, including the fixation of the segmentation method and its parameters. To address these biases, we propose to redefine the segmentation problem as a special case of a decomposition problem, including three key components: a decomposer, resolutions, and a composer. The inclusion of the composer task in the segmentation process facilitates an assessment of the relationship between the original problem and the problem after the segmentation. Therefore, it leads to an improvement in the evaluation process and, consequently, in the selection of the appropriate segmentation method. Then, we formally introduce our novel meta-decomposition or learning-to-decompose approach. It reduces the segmentation biases by considering the segmentation as a hyperparameter to be optimized by the outer learning problem. Therefore, meta-decomposition improves the overall system performance by dynamically selecting the appropriate segmentation method without introducing the mentioned biases. Extensive experiments on four real-world datasets demonstrate the effectiveness of our proposal.
Updated: 2024-04-17 20:50:28
Categories: cs.AI
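A minimal sketch of treating segmentation as a hyperparameter of an outer learning problem, with a fixed-window decomposer and the window length as its resolution; `train_and_eval` is an assumed stand-in for the inner activity recognizer and its validation score.

```python
def meta_decompose(stream, labels, train_and_eval, window_sizes=(30, 60, 120)):
    """Outer loop: pick the segmentation that maximizes inner validation score."""
    def segment(xs, w):                       # the decomposer: fixed-size windows
        return [xs[i:i + w] for i in range(0, len(xs) - w + 1, w)]
    scored = [(train_and_eval(segment(stream, w), labels, w), w)
              for w in window_sizes]          # one inner run per candidate resolution
    best_score, best_w = max(scored)
    return best_w, best_score
```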
Tight bounds on Pauli channel learning without entanglement
Quantum entanglement is a crucial resource for learning properties from nature, but a precise characterization of its advantage can be challenging. In this work, we consider learning algorithms without entanglement to be those that only utilize states, measurements, and operations that are separable between the main system of interest and an ancillary system. Interestingly, we show that these algorithms are equivalent to those that apply quantum circuits on the main system interleaved with mid-circuit measurements and classical feedforward. Within this setting, we prove a tight lower bound for Pauli channel learning without entanglement that closes the gap between the best-known upper and lower bound. In particular, we show that $\Theta(2^n\varepsilon^{-2})$ rounds of measurements are required to estimate each eigenvalue of an $n$-qubit Pauli channel to $\varepsilon$ error with high probability when learning without entanglement. In contrast, a learning algorithm with entanglement only needs $\Theta(\varepsilon^{-2})$ copies of the Pauli channel. The tight lower bound strengthens the foundation for an experimental demonstration of entanglement-enhanced advantages for Pauli noise characterization.
Updated: 2024-04-17 20:41:34
Categories: quant-ph, cs.IT, cs.LG, math.IT
Learning with 3D rotations, a hitchhiker's guide to SO(3)
Many settings in machine learning require the selection of a rotation representation. However, choosing a suitable representation from the many available options is challenging. This paper acts as a survey and guide through rotation representations. We walk through their properties that harm or benefit deep learning with gradient-based optimization. By consolidating insights from rotation-based learning, we provide a comprehensive overview of learning functions with rotation representations. We provide guidance on selecting representations based on whether rotations are in the model's input or output and whether the data primarily comprises small angles.
Updated: 2024-04-17 20:37:29
Categories: cs.LG, cs.CV, cs.RO
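As a worked example of one representation such a guide covers, below is the continuous 6D representation of Zhou et al. (2019), mapped onto SO(3) by Gram-Schmidt; unlike Euler angles, this map has no discontinuities, which tends to suit gradient-based optimization. A generic illustration, not code from the paper.

```python
import numpy as np

def sixd_to_rotation(x):
    """Map a 6D vector (two raw 3D columns) to a rotation matrix in SO(3)."""
    a, b = x[:3], x[3:]
    r1 = a / np.linalg.norm(a)
    b = b - np.dot(r1, b) * r1              # Gram-Schmidt: strip the r1 component
    r2 = b / np.linalg.norm(b)
    r3 = np.cross(r1, r2)                   # right-handed third column
    return np.stack([r1, r2, r3], axis=1)

R = sixd_to_rotation(np.array([1.0, 0.2, 0.0, 0.0, 1.0, 0.3]))
print(np.allclose(R.T @ R, np.eye(3)), np.isclose(np.linalg.det(R), 1.0))
```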
Missed Connections: Lateral Thinking Puzzles for Large Language Models
The Connections puzzle published each day by the New York Times tasks players with dividing a bank of sixteen words into four groups of four words that each relate to a common theme. Solving the puzzle requires both common linguistic knowledge (i.e. definitions and typical usage) as well as, in many cases, lateral or abstract thinking. This is because the four categories ascend in complexity, with the most challenging category often requiring thinking about words in uncommon ways or as parts of larger phrases. We investigate the capacity for automated AI systems to play Connections and explore the game's potential as an automated benchmark for abstract reasoning and a way to measure the semantic information encoded by data-driven linguistic systems. In particular, we study both a sentence-embedding baseline and modern large language models (LLMs). We report their accuracy on the task, measure the impacts of chain-of-thought prompting, and discuss their failure modes. Overall, we find that the Connections task is challenging yet feasible, and a strong test-bed for future work.
Updated: 2024-04-17 20:31:05
Categories: cs.CL, cs.AI
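A minimal sketch of a sentence-embedding baseline of the kind mentioned above: embed the sixteen words and greedily peel off the tightest group of four by cosine similarity. A naive heuristic for illustration; the model name and grouping rule are assumptions, not the paper's exact baseline.

```python
import numpy as np
from itertools import combinations
from sentence_transformers import SentenceTransformer

def greedy_groups(words):
    """Partition 16 words into 4 groups of 4 by greedy embedding similarity."""
    model = SentenceTransformer("all-MiniLM-L6-v2")
    vecs = model.encode(words, normalize_embeddings=True)
    sim = vecs @ vecs.T                      # pairwise cosine similarities
    remaining, groups = list(range(len(words))), []
    while remaining:
        best = max(combinations(remaining, 4),
                   key=lambda g: sum(sim[i][j] for i, j in combinations(g, 2)))
        groups.append([words[i] for i in best])
        remaining = [i for i in remaining if i not in best]
    return groups
```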
A Hybrid ANN-SNN Architecture for Low-Power and Low-Latency Visual Perception
Spiking Neural Networks (SNN) are a class of bio-inspired neural networks that promise to bring low-power and low-latency inference to edge devices through asynchronous and sparse processing. However, being temporal models, SNNs depend heavily on expressive states to generate predictions on par with classical artificial neural networks (ANNs). These states converge only after long transient periods, and quickly decay without input data, leading to higher latency, power consumption, and lower accuracy. This work addresses this issue by initializing the state with an auxiliary ANN running at a low rate. The SNN then uses the state to generate predictions with high temporal resolution until the next initialization phase. Our hybrid ANN-SNN model thus combines the best of both worlds: It does not suffer from long state transients and state decay thanks to the ANN, and can generate predictions with high temporal resolution, low latency, and low power thanks to the SNN. We show for the task of event-based 2D and 3D human pose estimation that our method consumes 88% less power with only a 4% decrease in performance compared to its fully ANN counterparts when run at the same inference rate. Moreover, when compared to SNNs, our method achieves a 74% lower error. This research thus provides a new understanding of how ANNs and SNNs can be used to maximize their respective benefits.
Updated: 2024-04-17 20:18:45
Categories: cs.CV, cs.AI
GEOBIND: Binding Text, Image, and Audio through Satellite Images
In remote sensing, we are interested in modeling various modalities for some geographic location. Several works have focused on learning the relationship between a location and type of landscape, habitability, audio, textual descriptions, etc. Recently, a common way to approach these problems is to train a deep-learning model that uses satellite images to infer some unique characteristics of the location. In this work, we present a deep-learning model, GeoBind, that can infer about multiple modalities, specifically text, image, and audio, from satellite imagery of a location. To do this, we use satellite images as the binding element and contrastively align all other modalities to the satellite image data. Our training results in a joint embedding space with multiple types of data: satellite image, ground-level image, audio, and text. Furthermore, our approach does not require a single complex dataset that contains all the modalities mentioned above. Rather it only requires multiple satellite-image paired data. While we only align three modalities in this paper, we present a general framework that can be used to create an embedding space with any number of modalities by using satellite images as the binding element. Our results show that, unlike traditional unimodal models, GeoBind is versatile and can reason about multiple modalities for a given satellite image input.
Updated: 2024-04-17 20:13:37
Categories: cs.AI
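A minimal sketch of the binding scheme described above, assuming stand-in encoders: a symmetric InfoNCE loss that contrastively aligns another modality's embeddings to paired satellite-image embeddings (the CLIP/ImageBind-style recipe).

```python
import torch
import torch.nn.functional as F

def binding_loss(sat_emb, other_emb, temperature=0.07):
    """sat_emb, other_emb: (batch, dim) embeddings of co-located pairs."""
    sat = F.normalize(sat_emb, dim=-1)
    oth = F.normalize(other_emb, dim=-1)
    logits = sat @ oth.t() / temperature             # (batch, batch) similarities
    targets = torch.arange(sat.size(0), device=logits.device)
    # The i-th satellite patch and the i-th ground image/audio/text are positives.
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.t(), targets)) / 2
```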
A Survey on Semantic Modeling for Building Energy Management
Buildings account for a substantial portion of global energy consumption. Reducing buildings' energy usage primarily involves obtaining data from building systems and environment, which are instrumental in assessing and optimizing the building's performance. However, as devices from various manufacturers represent their data in unique ways, this disparity introduces challenges for semantic interoperability and creates obstacles in developing scalable building applications. This survey explores the leading semantic modeling techniques deployed for energy management in buildings. Furthermore, it aims to offer tangible use cases for applying semantic models, shedding light on the pivotal concepts and limitations intrinsic to each model. Our findings will assist researchers in discerning the appropriate circumstances and methodologies for employing these models in various use cases.
Updated: 2024-04-17 20:10:43
标题: 一份关于建筑能源管理语义建模的调查
摘要: 建筑物占据全球能源消耗的相当大部分。减少建筑物能源消耗主要涉及从建筑系统和环境获取数据,这些数据对评估和优化建筑性能至关重要。然而,由于来自各种制造商的设备以独特的方式表示其数据,这种差异引入了语义互操作性方面的挑战,并在开发可扩展的建筑应用程序方面产生障碍。本调查探讨了在建筑能源管理中部署的主要语义建模技术。此外,它旨在提供应用语义模型的切实用例,阐明每个模型固有的关键概念和限制。我们的发现将帮助研究人员辨别在不同用例中使用这些模型的适当情况和方法。
更新时间: 2024-04-17 20:10:43
领域: cs.AI,I.2.4; C.m; H.m
Blume-Capel model analysis with microcanonical population annealing method
We present a modification of the Rose-Machta algorithm (Phys. Rev. E 100 (2019) 063304) and estimate the density of states for a two-dimensional Blume-Capel model, simulating $10^5$ replicas in parallel for each set of parameters. We perform a finite-size analysis of the specific heat and Binder cumulant, determine the critical temperature along the critical line, and evaluate the critical exponents. The results obtained are in good agreement with those obtained previously using various methods -- Markov Chain Monte Carlo simulation, Wang-Landau simulation, transfer matrix, and series expansion. The simulation results clearly illustrate the typical behavior of specific heat along the critical lines and through the tricritical point.
Updated: 2024-04-17 20:06:35
标题: 使用微正则群体退火方法对Blume-Capel模型进行分析
摘要: 我们提出了Rose-Machta算法的一种修改(Phys. Rev. E 100 (2019) 063304),并估计了二维Blume-Capel模型的态密度,对每组参数并行模拟$10^5$个副本。我们对比热和Binder累积量进行有限尺寸分析,确定了沿临界线的临界温度,并评估了临界指数。所得结果与先前使用马尔可夫链蒙特卡洛模拟、Wang-Landau模拟、转移矩阵和级数展开等方法获得的结果吻合良好。模拟结果清楚地展示了比热沿临界线以及穿过三临界点的典型行为。
更新时间: 2024-04-17 20:06:35
领域: cond-mat.stat-mech,cs.AI
Sphere Neural-Networks for Rational Reasoning
The success of Large Language Models (LLMs), e.g., ChatGPT, is witnessed by their planetary popularity, their capability of human-like question-answering, and also by their steadily improved reasoning performance. However, it remains unclear whether LLMs reason. It is an open problem how traditional neural networks can be qualitatively extended to go beyond the statistical paradigm and achieve high-level cognition. Here, we present a minimalist qualitative extension by generalising computational building blocks from vectors to spheres. We propose Sphere Neural Networks (SphNNs) for human-like reasoning through model construction and inspection, and develop SphNN for syllogistic reasoning, a microcosm of human rationality. Instead of training data, SphNN uses a neuro-symbolic transition map of neighbourhood spatial relations to guide transformations from the current sphere configuration towards the target. SphNN is the first neural model that can determine the validity of long-chained syllogistic reasoning in one epoch by constructing sphere configurations as Euler diagrams, with the worst computational complexity of O(N^2). SphNN can evolve into various types of reasoning, such as spatio-temporal reasoning, logical reasoning with negation and disjunction, event reasoning, neuro-symbolic reasoning, and humour understanding (the highest level of cognition). All these suggest a new kind of Herbert A. Simon's scissors with two neural blades. SphNNs will tremendously enhance interdisciplinary collaborations to develop the two neural blades and realise deterministic neural reasoning and human-bounded rationality and elevate LLMs to reliable psychological AI. This work suggests that the non-zero radii of spheres are the missing components that prevent traditional deep-learning systems from reaching the realm of rational reasoning and cause LLMs to be trapped in the swamp of hallucination.
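The role of non-zero radii can be illustrated with the qualitative relations between two spheres that Euler diagrams rely on. The plain-Python sketch below (not the paper's learned transformation maps) shows how containment and disjointness encode "all A are B" and "no A is B".

```python
from dataclasses import dataclass
import math

@dataclass
class Sphere:
    center: tuple   # e.g., (x, y)
    radius: float

def relation(a: Sphere, b: Sphere) -> str:
    """Qualitative neighbourhood relation between two spheres:
    containment models 'all A are B', disjointness models 'no A is B',
    and overlap leaves the relation undetermined."""
    d = math.dist(a.center, b.center)
    if d + a.radius <= b.radius:
        return "a inside b"
    if d + b.radius <= a.radius:
        return "b inside a"
    if d >= a.radius + b.radius:
        return "disjoint"
    return "overlap"

# 'All men are mortal' and 'Socrates is a man' hold in this configuration,
# and 'Socrates is mortal' follows from the geometry alone.
men = Sphere((0.0, 0.0), 2.0)
mortal = Sphere((0.0, 0.0), 5.0)
socrates = Sphere((0.5, 0.0), 0.2)
print(relation(men, mortal), relation(socrates, men))   # a inside b  a inside b
```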
Updated: 2024-04-17 20:02:20
标题: 球形神经网络用于理性推理
摘要: 大语言模型(LLM)的成功,如ChatGPT等,体现在它们的全球流行、类人问题回答能力以及其不断改善的推理性能。然而,LLM是否进行推理仍不清楚。传统神经网络如何在质量上扩展以超越统计范式并实现高级认知是一个开放性问题。在这里,我们通过将计算构建块从向量泛化到球体,提出了一种极简的定性扩展。我们提出了用于类人推理的球形神经网络(SphNN),通过模型构建和检查,开发了用于三段论推理的SphNN,这是人类理性的微观模型。SphNN使用邻域空间关系的神经符号转换映射而不是训练数据来引导从当前球体配置向目标的转换。SphNN是第一个可以在一个训练轮(epoch)内通过构建作为欧拉图的球体配置来确定长链三段论推理有效性的神经模型,最坏计算复杂度为O(N^2)。SphNN可以演化成各种类型的推理,如时空推理、带否定和析取的逻辑推理、事件推理、神经符号推理和幽默理解(最高级认知)。所有这些都预示着一种新型的、带有两个神经刀片的Herbert A. Simon剪刀。SphNN将极大地促进跨学科合作,以发展这两个神经刀片,实现确定性神经推理和人类有限理性,并将LLM提升为可靠的心理人工智能。这项工作表明,球体的非零半径是阻止传统深度学习系统达到理性推理领域并导致LLM陷入幻觉泥潭的缺失组件。
更新时间: 2024-04-17 20:02:20
领域: cs.AI
Implementation and Evaluation of a Gradient Descent-Trained Defensible Blackboard Architecture System
A variety of artificial intelligence systems have been developed. Two well-known techniques are neural networks and rule-fact expert systems. The former can be trained from presented data while the latter is typically developed by human domain experts. A combined implementation that uses gradient descent to train a rule-fact expert system has been previously proposed. A related system type, the Blackboard Architecture, adds an actualization capability to expert systems. This paper proposes and evaluates the incorporation of a defensible-style gradient descent training capability into the Blackboard Architecture. It also introduces the use of activation functions for defensible artificial intelligence systems and implements and evaluates a new best-path-based training algorithm.
Updated: 2024-04-17 19:55:58
标题: 梯度下降训练的可防御黑板架构系统的实施和评估
摘要: 人工智能系统有多种形式。两种著名的技术是神经网络和规则事实专家系统。前者可以从提供的数据中进行训练,而后者通常由领域专家开发。之前提出了一种使用梯度下降训练规则事实专家系统的组合实现。相关的系统类型,即黑板架构,为专家系统增加了一个实现能力。本文提出并评估了将可辩护风格的梯度下降训练能力纳入黑板架构的方法。它还介绍了激活函数在可辩护人工智能系统中的使用,并实施并评估了一种新的基于最佳路径的训练算法。
更新时间: 2024-04-17 19:55:58
领域: cs.AI
GLaM: Fine-Tuning Large Language Models for Domain Knowledge Graph Alignment via Neighborhood Partitioning and Generative Subgraph Encoding
Integrating large language models (LLMs) with knowledge graphs derived from domain-specific data represents an important advancement towards more powerful and factual reasoning. As these models grow more capable, it is crucial to enable them to perform multi-step inferences over real-world knowledge graphs while minimizing hallucination. While large language models excel at conversation and text generation, their ability to reason over domain-specialized graphs of interconnected entities remains limited. For example, can we query a LLM to identify the optimal contact in a professional network for a specific goal, based on relationships and attributes in a private database? The answer is no: such capabilities lie beyond current methods. However, this question underscores a critical technical gap that must be addressed. Many high-value applications in areas such as science, security, and e-commerce rely on proprietary knowledge graphs encoding unique structures, relationships, and logical constraints. We introduce a fine-tuning framework for developing Graph-aligned LAnguage Models (GLaM) that transforms a knowledge graph into an alternate text representation with labeled question-answer pairs. We demonstrate that grounding the models in specific graph-based knowledge expands the models' capacity for structure-based reasoning. Our methodology leverages the large-language model's generative capabilities to create the dataset and proposes an efficient alternative to retrieval-augmented-generation-style methods.
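A toy illustration of the graph-to-text transformation: a node's neighborhood is rendered as a textual subgraph encoding together with a labeled question-answer pair for supervised fine-tuning. The graph, relation names, and question template are invented for the example; the paper additionally partitions neighborhoods and uses the LLM's own generative capabilities to build the dataset.

```python
# A tiny stand-in knowledge graph: node -> list of (relation, object) edges.
graph = {
    "Alice": [("works_at", "AcmeLabs"), ("expert_in", "graph_ml")],
    "Bob":   [("works_at", "AcmeLabs"), ("expert_in", "cryptography")],
}

def neighborhood_to_text(node, edges):
    """Serialize a one-hop neighborhood into a textual subgraph encoding."""
    facts = "; ".join(f"{node} {rel.replace('_', ' ')} {obj}" for rel, obj in edges)
    return f"Facts: {facts}."

def make_qa_pair(node, edges, rel="expert_in"):
    """Emit one labeled QA pair grounded in the serialized neighborhood."""
    answer = next(obj for r, obj in edges if r == rel)
    return {"context": neighborhood_to_text(node, edges),
            "question": f"What is {node} an expert in?",
            "answer": answer}

pairs = [make_qa_pair(n, e) for n, e in graph.items()]
print(pairs[0])   # each pair becomes one fine-tuning example for the LLM
```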
Updated: 2024-04-17 19:55:37
标题: GLaM:通过邻域分区和生成子图编码对领域知识图对齐进行大型语言模型的微调
摘要: 将大型语言模型(LLMs)与从特定领域数据衍生的知识图集成,代表了朝着更强大和基于事实的推理的重要进步。随着这些模型变得更加强大,至关重要的是使它们能够在现实世界知识图上执行多步推理,同时最小化幻觉。虽然大型语言模型擅长对话和文本生成,但其在推理领域专业图中的相互连接实体方面的能力仍然有限。例如,我们是否可以查询LLM以基于私有数据库中的关系和属性,识别专业网络中的最佳联系人以实现特定目标?答案是否定的-这些功能超出了当前方法的范围。然而,这个问题突显了必须解决的关键技术差距。许多价值高的应用程序,如科学、安全和电子商务,依赖于编码独特结构、关系和逻辑约束的专有知识图。我们引入了一个用于开发Graph-aligned Language Models(GLaM)的微调框架,将知识图转换为带有标记的问题-答案对的替代文本表示。我们证明,将模型基于特定基于图的知识使其扩展了模型的结构推理能力。我们的方法利用大型语言模型的生成能力创建数据集,并提出了一种高效的检索增强生成风格方法的替代方案。
更新时间: 2024-04-17 19:55:37
领域: cs.AI
Improving Privacy-Preserving Techniques for Smart Grid using Lattice-based Cryptography
Advancements in communication and information technology gave rise to the Smart Grid, optimizing energy and data transmission. Yet user privacy is at risk due to frequent data collection. Existing privacy schemes are vulnerable to quantum-capable adversaries. To tackle this, the LPM2DA scheme is introduced, utilizing lattice-based encryption and signatures for secure data aggregation. It ensures privacy, integrity, and authentication, enabling statistical analysis while preserving user privacy. Traditional aggregation schemes also suffer from weak network models and centralization issues. SPDBlock, a blockchain-based solution, addresses these with privacy, integrity, and resistance to attacks. It detects and prosecutes malicious entities while efficiently handling multi-dimensional data transmission. Through distributed decryption and secret sharing, only valid data can be decrypted, with minimal involvement from smart meters. Performance tests reveal SPDBlock's superiority in communication and computational efficiency over traditional schemes.
Updated: 2024-04-17 19:51:52
标题: 使用基于格的密码学改进智能电网的隐私保护技术
摘要: 通信和信息技术的进步催生了智能电网,优化了能源和数据传输。然而,由于频繁的数据收集,用户隐私面临风险。现有的隐私方案在量子机器方面存在脆弱性。为了解决这个问题,引入了LPM2DA方案,利用基于格的加密和签名进行安全数据聚合。它确保隐私、完整性和身份验证,同时保护用户隐私,使统计分析成为可能。传统的聚合方案存在网络模型薄弱和集中化问题。SPDBlock是一个基于区块链的解决方案,确保隐私、完整性和抵抗攻击。它可以检测和起诉恶意实体,并有效处理多维数据传输。通过分布式解密和秘密共享,只有有效数据才能以最少的智能电表参与进行解密。性能测试显示,与传统方案相比,SPDBlock在通信和计算效率方面具有优势。
更新时间: 2024-04-17 19:51:52
领域: cs.CR
Read Between the Layers: Leveraging Intra-Layer Representations for Rehearsal-Free Continual Learning with Pre-Trained Models
We address the Continual Learning (CL) problem, wherein a model must learn a sequence of tasks from non-stationary distributions while preserving prior knowledge upon encountering new experiences. With the advancement of foundation models, CL research has pivoted from the initial learning-from-scratch paradigm towards utilizing generic features from large-scale pre-training. However, existing approaches to CL with pre-trained models primarily focus on separating class-specific features from the final representation layer and neglect the potential of intermediate representations to capture low- and mid-level features, which are more invariant to domain shifts. In this work, we propose LayUP, a new prototype-based approach to continual learning that leverages second-order feature statistics from multiple intermediate layers of a pre-trained network. Our method is conceptually simple, does not require access to prior data, and works out of the box with any foundation model. LayUP surpasses the state of the art in four of the seven class-incremental learning benchmarks, all three domain-incremental learning benchmarks and in six of the seven online continual learning benchmarks, while significantly reducing memory and computational requirements compared to existing baselines. Our results demonstrate that fully exhausting the representational capacities of pre-trained models in CL goes well beyond their final embeddings.
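A rough sketch of the prototype-plus-second-order-statistics idea, assuming (as stated above) pooled features from several intermediate layers of a frozen backbone. The ridge-regularized Gram-matrix solve used to decorrelate the class prototypes is a common construction in this line of work and is an assumption of the sketch, not a confirmed detail of LayUP.

```python
import torch

def multi_layer_features(feature_maps):
    """Concatenate spatially pooled features from several intermediate
    layers of a frozen pre-trained backbone (random stand-ins here)."""
    return torch.cat([f.mean(dim=(-2, -1)) for f in feature_maps], dim=-1)

d, n_classes = 192, 10
G = torch.zeros(d, d)                   # running second-order statistic
prototypes = torch.zeros(n_classes, d)  # running class prototypes

# Streaming, rehearsal-free updates: no prior data is ever revisited.
for _ in range(100):
    maps = [torch.rand(1, 64, 8, 8) for _ in range(3)]   # 3 intermediate layers
    y = torch.randint(n_classes, (1,)).item()
    f = multi_layer_features(maps)                        # (1, 192)
    G += f.t() @ f
    prototypes[y] += f.squeeze(0)

# Inference: ridge-regularized decorrelation of the prototypes.
lam = 1.0
W = torch.linalg.solve(G + lam * torch.eye(d), prototypes.t())  # (d, n_classes)
logits = multi_layer_features([torch.rand(1, 64, 8, 8)] * 3) @ W
pred = logits.argmax(dim=-1)
```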
Updated: 2024-04-17 19:32:47
标题: 在层之间阅读:利用层内表示进行无需复习的预训练模型持续学习
摘要: 我们解决了持续学习(CL)问题,其中模型必须从非稳态分布中学习一系列任务,同时在遇到新经验时保留先前的知识。随着基础模型的进步,CL研究已经从最初的从头学习范式转向利用大规模预训练的通用特征。然而,现有的基于预训练模型的CL方法主要集中在将类别特定特征与最终表示层分离,并忽略了中间表示捕获低级和中级特征的潜力,这些特征对领域转移更具不变性。在这项工作中,我们提出了LayUP,一种基于原型的持续学习方法,利用了预训练网络的多个中间层的二阶特征统计。我们的方法在概念上简单,不需要访问先前的数据,并且可以与任何基础模型直接使用。与现有基线相比,LayUP在七个类增量学习基准中超越了现有技术水平,在所有三个领域增量学习基准中以及在七个在线持续学习基准中的六个基准中,同时显著减少了内存和计算需求。我们的结果表明,充分利用预训练模型在CL中的表征能力远远超出了它们的最终嵌入。
更新时间: 2024-04-17 19:32:47
领域: cs.LG,cs.CV
Pretraining Billion-scale Geospatial Foundational Models on Frontier
As AI workloads increase in scope, generalization capability becomes challenging for small task-specific models, and their demand for large amounts of labeled training samples increases. On the contrary, Foundation Models (FMs) are trained with internet-scale unlabeled data via self-supervised learning and have been shown to adapt to various tasks with minimal fine-tuning. Although large FMs have demonstrated significant impact in natural language processing and computer vision, efforts toward FMs for geospatial applications have been restricted to smaller-size models, as pretraining larger models requires very large computing resources equipped with state-of-the-art hardware accelerators. Current satellite constellations collect 100+ TB of data per day, resulting in images that are billions of pixels and multimodal in nature. Such geospatial data poses unique challenges, opening up new opportunities to develop FMs. We investigate billion-scale FMs and HPC training profiles for geospatial applications by pretraining on publicly available data. We study end-to-end how scaling the model size affects the performance and impact of the solution. Our larger 3B-parameter model achieves up to a 30% improvement in top-1 scene classification accuracy compared to a 100M-parameter model. Moreover, we detail performance experiments on the Frontier supercomputer, America's first exascale system, where we study different model- and data-parallel approaches using PyTorch's Fully Sharded Data Parallel library. Specifically, we study variants of the Vision Transformer architecture (ViT), conducting performance analysis for ViT models with up to 15B parameters. By discussing throughput and performance bottlenecks under different parallelism configurations, we offer insights on how to leverage such leadership-class HPC resources when developing large models for geospatial imagery applications.
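For reference, a minimal PyTorch Fully Sharded Data Parallel setup of a ViT-like encoder, in the spirit of the parallelism study; the model is a generic Transformer stand-in, all sizes are illustrative, and the script assumes a multi-GPU `torchrun` launch.

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    dist.init_process_group("nccl")
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    model = torch.nn.TransformerEncoder(   # stand-in for a ViT encoder
        torch.nn.TransformerEncoderLayer(d_model=1024, nhead=16, batch_first=True),
        num_layers=24,
    ).cuda()
    model = FSDP(model)   # parameters, gradients, optimizer state get sharded

    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    x = torch.rand(8, 196, 1024, device="cuda")   # (batch, patch tokens, dim)
    loss = model(x).pow(2).mean()                 # dummy objective
    loss.backward()
    opt.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```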
Updated: 2024-04-17 19:16:32
标题: 在前沿上对十亿级地理空间基础模型进行预训练
摘要: 随着人工智能工作负载范围的增加,小型任务特定模型的泛化能力变得具有挑战性,它们对大量标记训练样本的需求增加。相反,基础模型(FMs)通过自监督学习使用互联网规模的未标记数据进行训练,并已显示出在最小微调情况下适应各种任务。尽管大型基础模型已在自然语言处理和计算机视觉中展示了显著影响,但针对地理空间应用的基础模型的努力仅限于较小规模的模型,因为预训练较大模型需要配备最先进的硬件加速器的非常大的计算资源。当前卫星星座每天收集100多TB的数据,导致图像像素数以十亿计且具有多模态性。这种地理空间数据带来独特挑战,为开发基础模型提供了新机会。我们通过在公开可用数据上进行预训练来研究地理空间应用的十亿级基础模型和高性能计算(HPC)训练配置文件。我们端到端地研究了扩大模型规模对解决方案性能和影响的作用。我们更大的30亿参数模型与1亿参数模型相比,在top-1场景分类准确率上实现了高达30%的提升。此外,我们详细描述了在Frontier超级计算机(美国第一台百亿亿次级系统)上的性能实验,在这里我们研究了使用PyTorch的Fully Sharded Data Parallel库的不同模型和数据并行方法。具体来说,我们研究了Vision Transformer架构(ViT)的变体,对具有高达150亿参数的ViT模型进行性能分析。通过讨论在不同并行配置下的吞吐量和性能瓶颈,我们提供了如何利用这类领先级HPC资源开发用于地理空间图像应用的大型模型的见解。
更新时间: 2024-04-17 19:16:32
领域: cs.AI
Dynamic Frequency-Based Fingerprinting Attacks against Modern Sandbox Environments
The cloud computing landscape has evolved significantly in recent years, embracing various sandboxes to meet the diverse demands of modern cloud applications. These sandboxes encompass container-based technologies like Docker and gVisor, microVM-based solutions like Firecracker, and security-centric sandboxes relying on Trusted Execution Environments (TEEs) such as Intel SGX and AMD SEV. However, the practice of placing multiple tenants on shared physical hardware raises security and privacy concerns, most notably side-channel attacks. In this paper, we investigate the possibility of fingerprinting containers through CPU frequency reporting sensors in Intel and AMD CPUs. One key enabler of our attack is that the current CPU frequency information can be accessed by user-space attackers. We demonstrate that Docker images exhibit a unique frequency signature, enabling the distinction of different containers with up to 84.5% accuracy even when multiple containers are running simultaneously in different cores. Additionally, we assess the effectiveness of our attack when performed against several sandboxes deployed in cloud environments, including Google's gVisor, AWS' Firecracker, and TEE-based platforms like Gramine (utilizing Intel SGX) and AMD SEV. Our empirical results show that these attacks can also be carried out successfully against all of these sandboxes in less than 40 seconds, with an accuracy of over 70% in all cases. Finally, we propose a noise injection-based countermeasure to mitigate the proposed attack on cloud environments.
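The user-space access the attack relies on can be sketched as follows on Linux, assuming the sysfs cpufreq interface and a simple correlation matcher; the paper's actual sensors, sampling rates, and classifier are not reproduced here.

```python
import time

def sample_frequency_trace(n=200, period=0.01,
        path="/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq"):
    """Sample the CPU frequency (kHz) readable from user space on Linux."""
    trace = []
    for _ in range(n):
        with open(path) as f:
            trace.append(int(f.read().strip()))
        time.sleep(period)
    return trace

def correlate(a, b):
    """Pearson correlation, used to match a live trace against signatures."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a) ** 0.5
    vb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (va * vb + 1e-12)

# signatures = {"image_A": trace_A, "image_B": trace_B, ...}  # built offline
# live = sample_frequency_trace()
# guess = max(signatures, key=lambda k: correlate(live, signatures[k]))
```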
Updated: 2024-04-17 18:59:24
标题: 现代沙箱环境中的动态基于频率的指纹攻击
摘要: 近年来,云计算领域发展迅速,采用各种沙盒技术以满足现代云应用的多样化需求。这些沙盒技术包括基于容器的技术,如Docker和gVisor,基于微型虚拟机的解决方案,如Firecracker,以及依赖于可信执行环境(TEE)的安全中心沙盒,如Intel SGX和AMD SEV。然而,在共享物理硬件上放置多个租户的做法引发了安全和隐私问题,尤其是侧信道攻击问题。 在本文中,我们调查了通过Intel和AMD CPU中的CPU频率报告传感器对容器进行指纹识别的可能性。我们攻击的一个关键因素是当前CPU频率信息可被用户空间攻击者访问。我们展示了Docker镜像展示出独特的频率签名,使得即使多个容器在不同核心中同时运行,也能以高达84.5%的准确率区分不同的容器。此外,我们评估了我们的攻击对部署在云环境中的几种沙盒的有效性,包括Google的gVisor、AWS的Firecracker以及利用Intel SGX的Gramine和AMD SEV的TEE平台。我们的实证结果表明,这些攻击也可以在不到40秒的时间内成功地针对所有这些沙盒进行,而且在所有情况下的准确率均超过70%。最后,我们提出了一种基于噪声注入的对抗措施,以减轻云环境中所提出的攻击。
更新时间: 2024-04-17 18:59:24
领域: cs.CR,cs.LG
A Secure and Trustworthy Network Architecture for Federated Learning Healthcare Applications
Federated Learning (FL) has emerged as a promising approach for privacy-preserving machine learning, particularly in sensitive domains such as healthcare. In this context, the TRUSTroke project aims to leverage FL to assist clinicians in ischemic stroke prediction. This paper provides an overview of the TRUSTroke FL network infrastructure. The proposed architecture adopts a client-server model with a central Parameter Server (PS). We introduce a Docker-based design for the client nodes, offering a flexible solution for implementing FL processes in clinical settings. The impact of different communication protocols (HTTP or MQTT) on FL network operation is analyzed, with MQTT selected for its suitability in FL scenarios. A control plane to support the main operations required by FL processes is also proposed. The paper concludes with an analysis of security aspects of the FL architecture, addressing potential threats and proposing mitigation strategies to increase the trustworthiness level.
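A minimal sketch of the MQTT-based parameter exchange between a client node and the Parameter Server, assuming the paho-mqtt 1.x client API; topic names, serialization, and the broker address are illustrative, not the project's.

```python
import io
import torch
import paho.mqtt.client as mqtt

model = torch.nn.Linear(10, 2)   # stand-in for the clinical prediction model

def serialize(state_dict):
    buf = io.BytesIO()
    torch.save(state_dict, buf)
    return buf.getvalue()

def on_message(client, userdata, msg):
    # The PS publishes a new global model; the client loads it, trains
    # locally on private data (omitted), and publishes its update.
    model.load_state_dict(torch.load(io.BytesIO(msg.payload)))
    client.publish("fl/updates/client-1", serialize(model.state_dict()), qos=1)

client = mqtt.Client()
client.on_message = on_message
client.connect("parameter-server.example", 1883)
client.subscribe("fl/global_model", qos=1)
client.loop_forever()
```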
Updated: 2024-04-17 18:55:41
标题: 一个安全可信的联合学习医疗应用网络架构
摘要: Federated Learning(FL)已经成为一种有前途的隐私保护机器学习方法,特别是在敏感领域如医疗保健中。在这个背景下,TRUSTroke项目旨在利用FL来帮助临床医生预测缺血性中风。本文提供了TRUSTroke FL网络基础设施的概述。所提出的架构采用了一个具有中央参数服务器(PS)的客户端-服务器模型。我们介绍了基于Docker的设计用于客户端节点,为在临床环境中实现FL过程提供了灵活的解决方案。分析了不同通信协议(HTTP或MQTT)对FL网络操作的影响,选择了MQTT因其适用于FL场景。还提出了一个控制平面来支持FL过程所需的主要操作。文章最后对FL架构的安全方面进行了分析,解决了潜在威胁并提出了增加可信度水平的缓解策略。
更新时间: 2024-04-17 18:55:41
领域: cs.AI,cs.DC
Improving Socratic Question Generation using Data Augmentation and Preference Optimization
The Socratic method is a way of guiding students toward solving a problem independently without directly revealing the solution to the problem. Although this method has been shown to significantly improve student learning outcomes, it remains a complex labor-intensive task for instructors. Large language models (LLMs) can be used to augment human effort by automatically generating Socratic questions for students. However, existing methods that involve prompting these LLMs sometimes produce invalid outputs, e.g., those that directly reveal the solution to the problem or provide irrelevant or premature questions. To alleviate this problem, inspired by reinforcement learning with AI feedback (RLAIF), we first propose a data augmentation method to enrich existing Socratic questioning datasets with questions that are invalid in specific ways. Next, we propose a method to optimize open-source LLMs such as LLama 2 to prefer ground-truth questions over generated invalid ones, using direct preference optimization (DPO). Our experiments on a Socratic questions dataset for student code debugging show that a DPO-optimized 7B LLama 2 model can effectively avoid generating invalid questions, and as a result, outperforms existing state-of-the-art prompting methods.
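The DPO objective can be stated in a few lines: given summed token log-probabilities of the ground-truth ("chosen") and invalid ("rejected") questions under the trainable policy and a frozen reference model, the standard loss from Rafailov et al. (2023) is sketched below.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization: push the policy to prefer the
    ground-truth Socratic question over the invalid augmented one,
    regularized toward the reference model via the log-ratios."""
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy usage with fake log-probabilities for a batch of 4 preference pairs.
loss = dpo_loss(torch.tensor([-5.0, -6.0, -4.5, -7.0]),
                torch.tensor([-6.5, -6.2, -5.0, -7.5]),
                torch.tensor([-5.5, -6.1, -4.8, -7.2]),
                torch.tensor([-6.0, -6.0, -4.9, -7.3]))
```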
Updated: 2024-04-17 18:53:55
标题: 通过数据增强和偏好优化改善苏格拉底问题生成
摘要: 苏格拉底式教学方法是一种引导学生独立解决问题的方式,而不是直接揭示问题的解决方案。尽管已经证明这种方法显著提高了学生的学习成果,但对教师来说仍然是一项复杂且需要大量劳动力的任务。大型语言模型(LLMs)可以通过自动生成苏格拉底式问题来增强人类努力。然而,现有的涉及提示这些LLMs的方法有时会产生无效的输出,例如直接揭示问题的解决方案或提供无关或过早的问题。为了缓解这个问题,受到带有AI反馈的强化学习(RLAIF)的启发,我们首先提出了一种数据增强方法,以特定方式丰富现有的苏格拉底式提问数据集,其中包含无效问题。接下来,我们提出了一种方法来优化开源LLMs,如LLama 2,使其更倾向于地面真实问题而不是生成的无效问题,使用直接偏好优化(DPO)。我们在一个用于学生代码调试的苏格拉底式问题数据集上进行的实验表明,经过DPO优化的7B LLama 2模型可以有效避免生成无效问题,从而优于现有的最先进提示方法。
更新时间: 2024-04-17 18:53:55
领域: cs.CL,cs.CY,cs.LG
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
Sora is a text-to-video generative AI model, released by OpenAI in February 2024. The model is trained to generate videos of realistic or imaginative scenes from text instructions and show potential in simulating the physical world. Based on public technical reports and reverse engineering, this paper presents a comprehensive review of the model's background, related technologies, applications, remaining challenges, and future directions of text-to-video AI models. We first trace Sora's development and investigate the underlying technologies used to build this "world simulator". Then, we describe in detail the applications and potential impact of Sora in multiple industries ranging from film-making and education to marketing. We discuss the main challenges and limitations that need to be addressed to widely deploy Sora, such as ensuring safe and unbiased video generation. Lastly, we discuss the future development of Sora and video generation models in general, and how advancements in the field could enable new ways of human-AI interaction, boosting productivity and creativity of video generation.
Updated: 2024-04-17 18:41:39
标题: Sora:大视觉模型背景、技术、局限性和机遇综述
摘要: Sora是一个文本到视频生成的人工智能模型,由OpenAI于2024年2月发布。该模型经过训练,能够根据文本指令生成逼真或富有想象力的场景视频,并在模拟物理世界方面显示出潜力。本文基于公开的技术报告和逆向工程,全面回顾了该模型的背景、相关技术、应用、尚存挑战以及文本到视频人工智能模型的未来方向。我们首先追溯了Sora的发展历程,并调查了用于构建这个“世界模拟器”的基础技术。然后,我们详细描述了Sora在多个行业中的应用和潜在影响,涵盖从电影制作和教育到营销等领域。我们讨论了需要解决的主要挑战和限制,以广泛部署Sora,比如确保视频生成的安全和无偏见性。最后,我们讨论了Sora和视频生成模型的未来发展,以及该领域的进展如何能够启用新的人机交互方式,提升视频生成的生产力和创造力。
更新时间: 2024-04-17 18:41:39
领域: cs.CV,cs.AI,cs.LG
Overconfident and Unconfident AI Hinder Human-AI Collaboration
AI transparency is a central pillar of responsible AI deployment and effective human-AI collaboration. A critical approach is communicating uncertainty, such as displaying AI's confidence level, or its correctness likelihood (CL), to users. However, these confidence levels are often uncalibrated, either overestimating or underestimating actual CL, posing risks and harms to human-AI collaboration. This study examines the effects of uncalibrated AI confidence on users' trust in AI, AI advice adoption, and collaboration outcomes. We further examined the impact of increased transparency, achieved through trust calibration support, on these outcomes. Our results reveal that uncalibrated AI confidence leads to both the misuse of overconfident AI and disuse of unconfident AI, thereby hindering outcomes of human-AI collaboration. Deficiency of trust calibration support exacerbates this issue by making it harder to detect uncalibrated confidence, promoting misuse and disuse of AI. Conversely, trust calibration support aids in recognizing uncalibration and reducing misuse, but it also fosters distrust and causes disuse of AI. Our findings highlight the importance of AI confidence calibration for enhancing human-AI collaboration and suggest directions for AI design and regulation.
Updated: 2024-04-17 18:37:12
标题: 自信和缺乏自信的人工智能阻碍人工智能与人类的合作
摘要: 人工智能透明度是负责任的人工智能部署和有效的人工智能与人类合作的中心支柱。一个关键的方法是传达不确定性,比如向用户显示人工智能的置信水平,或其正确性可能性(CL)。然而,这些置信水平通常是未校准的,要么高估要么低估实际的CL,从而给人工智能与人类合作带来风险和危害。本研究检验了未校准的人工智能置信对用户对人工智能的信任、人工智能建议采纳和合作结果的影响。我们进一步研究了通过信任校准支持实现的增加透明度对这些结果的影响。我们的结果显示,未校准的人工智能置信导致对过于自信的人工智能的误用和对不自信的人工智能的不使用,从而阻碍了人工智能与人类合作的结果。信任校准支持的不足加剧了这一问题,使未校准的置信更难以检测,促进了人工智能的误用和不使用。相反,信任校准支持有助于识别未校准和减少误用,但也会促使不信任和导致人工智能的不使用。我们的研究结果强调了人工智能置信校准对增强人工智能与人类合作的重要性,并为人工智能设计和监管提出了建议方向。
更新时间: 2024-04-17 18:37:12
领域: cs.AI,cs.HC
Learning time-scales in two-layers neural networks
Gradient-based learning in multi-layer neural networks displays a number of striking features. In particular, the decrease rate of empirical risk is non-monotone even after averaging over large batches. Long plateaus in which one observes barely any progress alternate with intervals of rapid decrease. These successive phases of learning often take place on very different time scales. Finally, models learnt in an early phase are typically `simpler' or `easier to learn' although in a way that is difficult to formalize. Although theoretical explanations of these phenomena have been put forward, each of them captures at best certain specific regimes. In this paper, we study the gradient flow dynamics of a wide two-layer neural network in high-dimension, when data are distributed according to a single-index model (i.e., the target function depends on a one-dimensional projection of the covariates). Based on a mixture of new rigorous results, non-rigorous mathematical derivations, and numerical simulations, we propose a scenario for the learning dynamics in this setting. In particular, the proposed evolution exhibits separation of timescales and intermittency. These behaviors arise naturally because the population gradient flow can be recast as a singularly perturbed dynamical system.
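For concreteness, the single-index setting means the labels depend on the high-dimensional covariates only through one direction; a standard formalization (the paper's exact normalization and noise model may differ) is $y=\varphi\left(\langle \boldsymbol{w}_*, \boldsymbol{x}\rangle\right)$ with $\boldsymbol{x}\in\mathbb{R}^d$ and $\|\boldsymbol{w}_*\|_2=1$, and the wide two-layer network $\hat f(\boldsymbol{x})=\frac{1}{m}\sum_{i=1}^{m} a_i\,\sigma(\langle \boldsymbol{b}_i,\boldsymbol{x}\rangle)$ is trained by gradient flow on the population risk. The separation of time scales then shows up in how long the first-layer neurons $\boldsymbol{b}_i$ take to align with $\boldsymbol{w}_*$.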
Updated: 2024-04-17 18:36:27
标题: 双层神经网络中的学习时间尺度
摘要: 多层神经网络中的基于梯度的学习显示了一些显著特征。特别是,即使在对大批量数据进行平均之后,经验风险的减小速率也是非单调的。在观察到几乎没有进展的长期平台和快速下降的间隔交替出现。这些连续的学习阶段往往发生在非常不同的时间尺度上。最后,在早期阶段学习的模型通常更“简单”或更“容易学习”,尽管以一种难以形式化的方式。 尽管已经提出了这些现象的理论解释,但其中每一个最多只能捕捉到特定的情况。在本文中,我们研究了高维空间中基于单指数模型(即目标函数依赖于协变量的一维投影)分布的广泛双层神经网络的梯度流动动力学。基于一系列新的严格结果、非严格的数学推导和数值模拟,我们提出了这种环境下学习动力学的情景。特别是,所提出的演化展示了时间尺度的分离和间歇性。这些行为自然产生,因为人口梯度流动可以重新构造为一个奇异扰动动力系统。
更新时间: 2024-04-17 18:36:27
领域: cs.LG,math.OC,stat.ML,34E15, 37N40, 68T07
A decoder-only foundation model for time-series forecasting
Motivated by recent advances in large language models for Natural Language Processing (NLP), we design a time-series foundation model for forecasting whose out-of-the-box zero-shot performance on a variety of public datasets comes close to the accuracy of state-of-the-art supervised forecasting models for each individual dataset. Our model is based on pretraining a patched-decoder style attention model on a large time-series corpus, and can work well across different forecasting history lengths, prediction lengths and temporal granularities.
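A toy version of a patched-decoder forecaster, to make the idea concrete: the series is split into fixed-length patches, each patch becomes one token, and a causally masked Transformer predicts the next patch at every position. All sizes and the training objective are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class PatchedDecoderForecaster(nn.Module):
    def __init__(self, patch_len=32, d_model=256, n_layers=4, n_heads=4):
        super().__init__()
        self.patch_len = patch_len
        self.embed = nn.Linear(patch_len, d_model)     # one token per patch
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, patch_len)      # next-patch prediction

    def forward(self, series):                         # (B, T), T % patch_len == 0
        patches = series.unfold(1, self.patch_len, self.patch_len)  # (B, N, P)
        tok = self.embed(patches)
        # A causal mask turns the encoder stack into a decoder-only model.
        mask = nn.Transformer.generate_square_subsequent_mask(tok.size(1))
        h = self.decoder(tok, mask=mask)
        return self.head(h)                            # position t predicts patch t+1

model = PatchedDecoderForecaster()
x = torch.rand(2, 256)                                 # history of 8 patches
pred = model(x)
target = x.unfold(1, 32, 32)[:, 1:]                    # shifted patches
loss = (pred[:, :-1] - target).pow(2).mean()           # next-patch MSE
```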
Updated: 2024-04-17 18:24:45
标题: 一个仅解码器的时间序列预测基础模型
摘要: 受自然语言处理(NLP)大型语言模型的最新进展的启发,我们设计了一个用于预测的时间序列基础模型,其在各种公共数据集上的开箱即用零样本(zero-shot)性能接近于每个单独数据集的最先进监督预测模型的准确度。我们的模型基于在大规模时间序列语料库上预训练的分块解码器(patched-decoder)风格的注意力模型,并且可以在不同的预测历史长度、预测长度和时间粒度上很好地工作。
更新时间: 2024-04-17 18:24:45
领域: cs.CL,cs.AI,cs.LG
Cross-Problem Learning for Solving Vehicle Routing Problems
Existing neural heuristics often train a deep architecture from scratch for each specific vehicle routing problem (VRP), ignoring the transferable knowledge across different VRP variants. This paper proposes the cross-problem learning to assist heuristics training for different downstream VRP variants. Particularly, we modularize neural architectures for complex VRPs into 1) the backbone Transformer for tackling the travelling salesman problem (TSP), and 2) the additional lightweight modules for processing problem-specific features in complex VRPs. Accordingly, we propose to pre-train the backbone Transformer for TSP, and then apply it in the process of fine-tuning the Transformer models for each target VRP variant. On the one hand, we fully fine-tune the trained backbone Transformer and problem-specific modules simultaneously. On the other hand, we only fine-tune small adapter networks along with the modules, keeping the backbone Transformer still. Extensive experiments on typical VRPs substantiate that 1) the full fine-tuning achieves significantly better performance than the one trained from scratch, and 2) the adapter-based fine-tuning also delivers comparable performance while being notably parameter-efficient. Furthermore, we empirically demonstrate the favorable effect of our method in terms of cross-distribution application and versatility.
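The parameter-efficient variant can be sketched with a standard bottleneck adapter attached to a frozen backbone layer; the abstract only says that small adapter networks are tuned while the backbone stays fixed, so the adapter shape below is an assumption.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Residual bottleneck adapter: the only trainable part besides the
    lightweight problem-specific modules."""
    def __init__(self, d_model=128, bottleneck=16):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))

backbone_layer = nn.TransformerEncoderLayer(d_model=128, nhead=4, batch_first=True)
for p in backbone_layer.parameters():
    p.requires_grad = False              # TSP-pretrained backbone stays frozen

adapter = Adapter()
nodes = torch.rand(8, 50, 128)           # (batch, VRP nodes, embedding)
out = adapter(backbone_layer(nodes))     # gradients flow into the adapter only
print(sum(p.numel() for p in adapter.parameters()))   # a few thousand parameters
```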
Updated: 2024-04-17 18:17:50
标题: 跨问题学习用于解决车辆路径问题
摘要: 现有的神经启发式方法通常针对每个特定的车辆路径问题(VRP)从头开始训练深度架构,忽略了在不同VRP变体之间可转移的知识。本文提出了跨问题学习,以帮助启发式方法训练不同的下游VRP变体。特别地,我们将复杂VRP的神经架构模块化为1)用于解决旅行推销员问题(TSP)的主干Transformer,以及2)用于处理复杂VRP中特定问题特征的额外轻量级模块。因此,我们建议为TSP预先训练主干Transformer,然后将其应用于微调Transformer模型以适应每个目标VRP变体的过程。一方面,我们同时完全微调训练好的主干Transformer和特定问题模块。另一方面,我们仅微调小型适配器网络以及模块,保持主干Transformer不变。大量对典型VRP的实验证明,1)全面微调的性能明显优于从头开始训练的性能,2)基于适配器的微调也能提供可比较的性能,同时具有显著的参数效率。此外,我们还在交叉分布应用和多功能性方面经验性地证明了我们方法的有利效果。
更新时间: 2024-04-17 18:17:50
领域: cs.AI
Practical applications of machine-learned flows on gauge fields
Normalizing flows are machine-learned maps between different lattice theories which can be used as components in exact sampling and inference schemes. Ongoing work yields increasingly expressive flows on gauge fields, but it remains an open question how flows can improve lattice QCD at state-of-the-art scales. We discuss and demonstrate two applications of flows in replica exchange (parallel tempering) sampling, aimed at improving topological mixing, which are viable with iterative improvements upon presently available flows.
Updated: 2024-04-17 18:17:14
标题: 机器学习流在规范场上的实际应用
摘要: 正则流是不同晶格理论之间的机器学习映射,可用作精确抽样和推理方案中的组件。正在进行的工作产生了对规范场更具表现力的流,但如何通过流在最先进的尺度上改进晶格量子色动力学仍是一个未解决的问题。我们讨论并展示了在副本交换(平行淬火)抽样中利用流改进拓扑混合的两个应用,这些应用在目前可用的流基础上通过迭代改进是可行的。
更新时间: 2024-04-17 18:17:14
领域: hep-lat,cond-mat.stat-mech,cs.LG
Deep Dependency Networks and Advanced Inference Schemes for Multi-Label Classification
We present a unified framework called deep dependency networks (DDNs) that combines dependency networks and deep learning architectures for multi-label classification, with a particular emphasis on image and video data. The primary advantage of dependency networks is their ease of training, in contrast to other probabilistic graphical models like Markov networks. In particular, when combined with deep learning architectures, they provide an intuitive, easy-to-use loss function for multi-label classification. A drawback of DDNs compared to Markov networks is their lack of advanced inference schemes, necessitating the use of Gibbs sampling. To address this challenge, we propose novel inference schemes based on local search and integer linear programming for computing the most likely assignment to the labels given observations. We evaluate our novel methods on three video datasets (Charades, TACoS, Wetlab) and three image datasets (MS-COCO, PASCAL VOC, NUS-WIDE), comparing their performance with (a) basic neural architectures and (b) neural architectures combined with Markov networks equipped with advanced inference and learning techniques. Our results demonstrate the superiority of our new DDN methods over the two competing approaches.
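A bare-bones version of the local-search inference over binary label vectors: repeatedly flip the single label that most improves the joint score until a local optimum, with random restarts. The toy pairwise score stands in for the dependency network's learned conditionals; the integer-linear-programming variant is not shown.

```python
import random

def local_search(score, n_labels, iters=500, restarts=5, seed=0):
    """Greedy single-flip local search for the most likely label assignment."""
    rng = random.Random(seed)
    best, best_val = None, float("-inf")
    for _ in range(restarts):
        y = [rng.randint(0, 1) for _ in range(n_labels)]
        for _ in range(iters):
            flips = []
            for i in range(n_labels):
                y[i] ^= 1                       # try flipping label i
                flips.append((score(tuple(y)), i))
                y[i] ^= 1                       # undo
            val, i = max(flips)
            if val <= score(tuple(y)):
                break                           # local optimum reached
            y[i] ^= 1                           # commit the best flip
        if score(tuple(y)) > best_val:
            best, best_val = tuple(y), score(tuple(y))
    return best, best_val

# Toy pairwise interactions standing in for the learned joint model.
w = {(0, 1): 2.0, (1, 2): -1.5, (0, 3): 0.7}
score = lambda y: sum(v * y[i] * y[j] for (i, j), v in w.items()) + 0.1 * sum(y)
print(local_search(score, n_labels=4))
```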
Updated: 2024-04-17 18:04:37
标题: 深度依赖网络和高级推断方案用于多标签分类
摘要: 我们提出了一个统一的框架,称为深度依赖网络(DDNs),将依赖网络和深度学习架构结合起来,用于多标签分类,特别强调图像和视频数据。依赖网络的主要优势在于其易于训练,与马尔可夫网络等其他概率图模型形成对比。特别是当与深度学习架构结合时,它们提供了一个直观、易于使用的多标签分类损失函数。与马尔可夫网络相比,DDNs的一个缺点是它们缺乏高级推理方案,需要使用吉布斯采样。为了解决这一挑战,我们提出了基于局部搜索和整数线性规划的新型推理方案,用于计算给定观测的标签最可能的分配。我们在三个视频数据集(Charades、TACoS、Wetlab)和三个图像数据集(MS-COCO、PASCAL VOC、NUS-WIDE)上评估了我们的新方法,将它们的性能与(a)基本神经架构和(b)配备高级推理和学习技术的马尔可夫网络结合的神经架构进行比较。我们的结果表明,我们的新DDN方法优于这两种竞争方法。
更新时间: 2024-04-17 18:04:37
领域: cs.LG,cs.AI,cs.CV,stat.ML
Exploring DNN Robustness Against Adversarial Attacks Using Approximate Multipliers
Deep Neural Networks (DNNs) have advanced in many real-world applications, such as healthcare and autonomous driving. However, their high computational complexity and vulnerability to adversarial attacks are ongoing challenges. In this letter, approximate multipliers are used to explore DNN robustness improvement against adversarial attacks. By uniformly replacing accurate multipliers with state-of-the-art approximate ones in DNN layer models, we explore the DNNs' robustness against various adversarial attacks in a feasible time. Results show up to a 7% accuracy drop due to approximations when no attack is present, while robust accuracy improves by up to 10% when attacks are applied.
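The evaluation idea can be emulated in software by swapping exact products for a deliberately lossy multiplier. The truncation-based stand-in below is an assumption for illustration, since actual approximate multiplier designs (truncated, logarithmic, etc.) have their own error profiles.

```python
import numpy as np

def truncated_multiply(a, b, bits=4):
    """Quantize both operands to `bits` fractional bits before multiplying,
    introducing a controlled arithmetic error."""
    scale = 2.0 ** bits
    return (np.floor(a * scale) / scale) * (np.floor(b * scale) / scale)

# Apply to one dense layer and measure the output perturbation.
rng = np.random.default_rng(0)
W, x = rng.normal(size=(64, 128)), rng.normal(size=128)
exact = W @ x
approx = truncated_multiply(W, x).sum(axis=1)   # broadcast products, then reduce
rel_err = np.abs(approx - exact).mean() / np.abs(exact).mean()
print(f"mean relative output error: {rel_err:.3%}")
```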
Updated: 2024-04-17 18:03:12
标题: 使用近似乘法器探索深度神经网络对抗性攻击的稳健性
摘要: 深度神经网络(DNNs)在许多实际应用中取得了进展,如医疗保健和自动驾驶。然而,它们的高计算复杂性和对敌对攻击的脆弱性仍然是挑战。本文使用近似乘法器来探讨DNN对抗攻击的鲁棒性改进。通过在DNN层模型中统一替换最先进的准确乘法器为近似乘法器,我们在可行的时间内探索了DNN对抗各种攻击的鲁棒性。结果显示,当没有攻击时,由于近似,准确度下降了最多7%,而在应用攻击时,鲁棒准确度提高了最多10%。
更新时间: 2024-04-17 18:03:12
领域: cs.LG,cs.CR
Designing an Intelligent Parcel Management System using IoT & Machine Learning
Parcel delivery is a critical activity for railways. More importantly, each parcel must be thoroughly checked and sorted according to its destination address. This requires an efficient and robust IoT system capable of performing all of these tasks with great precision and minimal human interaction. This paper discusses a fully-fledged solution we created using IoT and machine learning to assist trains in performing this operation efficiently. The system consists of two phases: scanning, followed by sorting. During the scanning phase, the parcel passes through three scanners that look for explosives, drugs, and other dangerous materials, and the parcel is discarded if any of these tests fail. When the scanning step is over, the parcel moves on to the sorting phase, where QR codes are used to retrieve the details of each parcel and sort it properly. The system is simulated using the Blender software. Our research shows that our procedure significantly improves accuracy over existing techniques.
Updated: 2024-04-17 18:00:46
标题: 设计一个智能包裹管理系统,利用物联网和机器学习技术。
摘要: 包裹交付在铁路中是一项关键活动。更重要的是,每个包裹必须根据其目的地地址进行彻底检查和分类。我们需要一个高效且稳健的物联网系统,能够以极高的精度和最少的人工干预完成所有这些任务。本文讨论了,我们利用物联网和机器学习创建了一个完整的解决方案,以帮助火车高效地执行这一操作。在这项研究中,我们涵盖的产品主要包括两个阶段。扫描是第一步,随后是分类。在扫描过程中,包裹将通过三台扫描仪进行检查,检查是否存在爆炸物、毒品和任何危险材料,并在任何测试失败时将其丢弃。当扫描步骤结束后,包裹将进入分类阶段,我们使用QR码来检索包裹的详细信息并进行正确的分类。系统的模拟是使用blender软件完成的。我们的研究表明,我们的程序显著提高了准确性以及对尖端技术和现有技术的评估。
更新时间: 2024-04-17 18:00:46
领域: cs.CY,cs.LG
Towards White Box Deep Learning
Deep neural networks learn fragile "shortcut" features, rendering them difficult to interpret (black box) and vulnerable to adversarial attacks. This paper proposes semantic features as a general architectural solution to this problem. The main idea is to make features locality-sensitive in the adequate semantic topology of the domain, thus introducing a strong regularization. The proof of concept network is lightweight, inherently interpretable and achieves almost human-level adversarial test metrics - with no adversarial training! These results and the general nature of the approach warrant further research on semantic features. The code is available at https://github.com/314-Foundation/white-box-nn
Updated: 2024-04-17 17:58:52
标题: 朝向白盒深度学习
摘要: 深度神经网络学习脆弱的“捷径”特征,使它们难以解释(黑盒)并容易受到对抗性攻击的影响。本文提出了语义特征作为解决这一问题的一般性架构方案。主要思想是在领域的适当语义拓扑中使特征对局部敏感,从而引入强大的正则化。概念验证网络轻量级,本质上可解释,并实现了几乎人类水平的对抗性测试指标 - 无需对抗训练!这些结果以及该方法的一般性质保证了对语义特征的进一步研究。代码可在https://github.com/314-Foundation/white-box-nn上找到。
更新时间: 2024-04-17 17:58:52
领域: cs.LG,cs.AI,cs.NE
Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning
Large language models (LLMs) have shown great potential in complex reasoning tasks, yet their performance is often hampered by the scarcity of high-quality and reasoning-focused training datasets. Addressing this challenge, we propose Key-Point-Driven Data Synthesis (KPDDS), a novel data synthesis framework that synthesizes question-answer pairs by leveraging key points and exemplar practices from authentic data sources. KPDDS ensures the generation of novel questions with rigorous quality control and substantial scalability. As a result, we present KPMath, an extensive synthetic dataset tailored for mathematical reasoning, comprising over 800K question-answer pairs. Utilizing KPMath and augmenting it with additional reasoning-intensive corpora, we create the comprehensive KPMath-Plus dataset. The fine-tuned DeepSeekMath model on KPMath-Plus achieves zero-shot PASS@1 accuracies of 83.9% on GSM8K and 48.8% on MATH, and also reaches promising performance on other math reasoning datasets, outperforming competitors in the 7B to 70B range.
Updated: 2024-04-17 17:58:39
标题: 关键点驱动的数据合成及其对数学推理的增强
摘要: 大型语言模型(LLMs)在复杂推理任务中显示出巨大潜力,然而它们的性能常常受到高质量和以推理为重点的训练数据稀缺的限制。为了解决这一挑战,我们提出了关键点驱动的数据合成(KPDDS),这是一个新颖的数据合成框架,通过利用关键点和来自真实数据源的示例实践来合成问题-答案对。KPDDS确保通过严格的质量控制和可扩展性生成新颖的问题。因此,我们提出KPMath,一个专门用于数学推理的广泛合成数据集,包括超过80万个问题-答案对。利用KPMath并将其与其他推理密集型语料库相结合,我们创建了综合的KPMath-Plus数据集。在KPMath-Plus上对经过微调的DeepSeekMath模型实现了83.9%的GSM8K和48.8%的MATH零射击PASS@1准确率,并且在其他数学推理数据集上也取得了令人满意的表现,在7B到70B范围内超越了竞争对手。
更新时间: 2024-04-17 17:58:39
领域: cs.CL,cs.AI
Soil Fertility Prediction Using Combined USB-microscope Based Soil Image, Auxiliary Variables, and Portable X-Ray Fluorescence Spectrometry
This study explored the application of portable X-ray fluorescence (PXRF) spectrometry and soil image analysis to rapidly assess soil fertility, focusing on critical parameters such as available B, organic carbon (OC), available Mn, available S, and the sulfur availability index (SAI). Analyzing 1,133 soil samples from various agro-climatic zones in Eastern India, the research combined color and texture features from microscopic soil images, PXRF data, and auxiliary soil variables (AVs) using a Random Forest model. Results indicated that integrating image features (IFs) with auxiliary variables (AVs) significantly enhanced prediction accuracy for available B (R^2 = 0.80) and OC (R^2 = 0.88). A data fusion approach, incorporating IFs, AVs, and PXRF data, further improved predictions for available Mn and SAI with R^2 values of 0.72 and 0.70, respectively. The study demonstrated how these integrated technologies have the potential to provide quick and affordable options for soil testing, opening up access to more sophisticated prediction models and a better comprehension of the fertility and health of the soil. Future research should focus on the application of deep learning models on a larger dataset of soil images, developed using soils from a broader range of agro-climatic zones under field condition.
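The data-fusion modeling step reduces to concatenating the three feature blocks and fitting a Random Forest, as sketched below with scikit-learn; all shapes, column counts, and the synthetic target are invented for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 500
IF = rng.normal(size=(n, 20))     # color/texture features from soil images
AV = rng.normal(size=(n, 5))      # auxiliary soil variables
PXRF = rng.normal(size=(n, 15))   # elemental concentrations from PXRF

# Synthetic stand-in target, e.g., available Mn.
y = IF[:, 0] + 0.5 * PXRF[:, 0] + rng.normal(scale=0.3, size=n)

X = np.hstack([IF, AV, PXRF])     # the IFs + AVs + PXRF fusion
model = RandomForestRegressor(n_estimators=300, random_state=0)
r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
print(f"fused-feature R^2: {r2:.2f}")
```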
Updated: 2024-04-17 17:57:20
标题: 使用基于组合USB显微镜土壤图像、辅助变量和便携式X射线荧光光谱仪进行土壤肥力预测
摘要: 这项研究探讨了便携式X射线荧光(PXRF)光谱法和土壤图像分析在快速评估土壤肥力方面的应用,重点关注可用硼、有机碳(OC)、可用锰、可用硫和硫可用性指数(SAI)等关键参数。通过分析来自印度东部各种农业气候区的1,133个土壤样本,研究结合了来自微观土壤图像、PXRF数据和辅助土壤变量(AV)的颜色和质地特征,使用随机森林模型。结果表明,将图像特征(IFs)与辅助变量(AVs)集成能够显著提高可用硼(R^2 = 0.80)和OC(R^2 = 0.88)的预测准确性。数据融合方法,结合IFs、AVs和PXRF数据,进一步提高了可用锰和SAI的预测准确性,分别达到了0.72和0.70的R^2值。该研究展示了这些集成技术有潜力提供快速且经济实惠的土壤测试选项,为更复杂的预测模型和对土壤肥力和健康的更好理解提供了途径。未来研究应重点关注在更广泛的农业气候区域内使用更大数据集的土壤图像开发深度学习模型的应用。
更新时间: 2024-04-17 17:57:20
领域: eess.IV,cs.CV,cs.LG
Learning to Solve the Constrained Most Probable Explanation Task in Probabilistic Graphical Models
We propose a self-supervised learning approach for solving the following constrained optimization task in log-linear models or Markov networks. Let $f$ and $g$ be two log-linear models defined over the sets $\mathbf{X}$ and $\mathbf{Y}$ of random variables respectively. Given an assignment $\mathbf{x}$ to all variables in $\mathbf{X}$ (evidence) and a real number $q$, the constrained most-probable explanation (CMPE) task seeks to find an assignment $\mathbf{y}$ to all variables in $\mathbf{Y}$ such that $f(\mathbf{x}, \mathbf{y})$ is maximized and $g(\mathbf{x}, \mathbf{y})\leq q$. In our proposed self-supervised approach, given assignments $\mathbf{x}$ to $\mathbf{X}$ (data), we train a deep neural network that learns to output near-optimal solutions to the CMPE problem without requiring access to any pre-computed solutions. The key idea in our approach is to use first principles and approximate inference methods for CMPE to derive novel loss functions that seek to push infeasible solutions towards feasible ones and feasible solutions towards optimal ones. We analyze the properties of our proposed method and experimentally demonstrate its efficacy on several benchmark problems.
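The direction of the proposed losses, pushing infeasible solutions toward feasibility and feasible ones toward optimality, can be conveyed by a simple penalty surrogate; the paper's actual losses are derived from approximate inference for CMPE and differ from this sketch.

```python
import torch

def cmpe_penalty_loss(f_val, g_val, q, rho=10.0):
    """Maximize f(x, y) subject to g(x, y) <= q by minimizing -f plus a
    hinge penalty on the constraint violation."""
    return (-f_val + rho * torch.relu(g_val - q)).mean()

# Toy usage: a network outputs soft assignments for Y given evidence x, and
# f_val/g_val are the resulting log-linear scores per example.
f_val = torch.tensor([3.2, 1.1, 2.7], requires_grad=True)
g_val = torch.tensor([0.9, 2.4, 1.0], requires_grad=True)
loss = cmpe_penalty_loss(f_val, g_val, q=1.5)
loss.backward()   # only the infeasible example (g = 2.4 > q) is penalized
```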
Updated: 2024-04-17 17:55:17
标题: 学习在概率图模型中解决受限最可能解释任务
摘要: 我们提出了一种自监督学习方法,用于解决对数线性模型或马尔可夫网络中的以下约束优化任务。设$f$和$g$是分别定义在随机变量集$\mathbf{X}$和$\mathbf{Y}$上的两个对数线性模型。给定对所有变量$\mathbf{X}$(证据)的赋值$\mathbf{x}$和一个实数$q$,约束最可能解释(CMPE)任务旨在找到对所有变量$\mathbf{Y}$的赋值$\mathbf{y}$,使得$f(\mathbf{x}, \mathbf{y})$最大化且$g(\mathbf{x}, \mathbf{y})\leq q$。在我们提出的自监督方法中,给定对$\mathbf{X}$(数据)的赋值$\mathbf{x}$,我们训练一个深度神经网络,学习输出接近最优解的CMPE问题解决方案,而无需访问任何预先计算的解决方案。我们方法的关键思想是使用第一原理和近似推理方法来推导新的损失函数,旨在将不可行解决方案推向可行解决方案,将可行解决方案推向最优解决方案。我们分析了我们提出的方法的特性,并在几个基准问题上实验证明了其有效性。
更新时间: 2024-04-17 17:55:17
领域: cs.LG,cs.AI
VG4D: Vision-Language Model Goes 4D Video Recognition
Understanding the real world through point cloud video is a crucial aspect of robotics and autonomous driving systems. However, prevailing methods for 4D point cloud recognition have limitations due to sensor resolution, which leads to a lack of detailed information. Recent advances have shown that Vision-Language Models (VLM) pre-trained on web-scale text-image datasets can learn fine-grained visual concepts that can be transferred to various downstream tasks. However, effectively integrating VLM into the domain of 4D point clouds remains an unresolved problem. In this work, we propose the Vision-Language Models Goes 4D (VG4D) framework to transfer VLM knowledge from visual-text pre-trained models to a 4D point cloud network. Our approach involves aligning the 4D encoder's representation with a VLM to learn a shared visual and text space from training on large-scale image-text pairs. By transferring the knowledge of the VLM to the 4D encoder and combining the VLM, our VG4D achieves improved recognition performance. To enhance the 4D encoder, we modernize the classic dynamic point cloud backbone and propose an improved version of PSTNet, im-PSTNet, which can efficiently model point cloud videos. Experiments demonstrate that our method achieves state-of-the-art performance for action recognition on both the NTU RGB+D 60 dataset and the NTU RGB+D 120 dataset. Code is available at \url{https://github.com/Shark0-0/VG4D}.
Updated: 2024-04-17 17:54:49
标题: VG4D:视觉语言模型进入4D视频识别领域
摘要: 通过点云视频理解真实世界是机器人技术和自动驾驶系统的一个关键方面。然而,目前用于4D点云识别的方法由于传感器分辨率的限制存在局限性,导致缺乏详细信息。最近的进展表明,在网络规模的文本-图像数据集上预训练的视觉语言模型(VLM)可以学习细粒度的视觉概念,并可以转移到各种下游任务。然而,有效地将VLM整合到4D点云领域仍然是一个未解决的问题。在这项工作中,我们提出了Vision-Language Models Goes 4D(VG4D)框架,将VLM知识从视觉-文本预训练模型转移到4D点云网络。我们的方法涉及将4D编码器的表示与VLM对齐,通过在大规模图像-文本对上进行训练学习共享的视觉和文本空间。通过将VLM的知识转移到4D编码器并结合VLM,我们的VG4D实现了改进的识别性能。为了增强4D编码器,我们对经典的动态点云骨干进行了现代化,并提出了一个改进版本的PSTNet,即im-PSTNet,可以高效地建模点云视频。实验证明,我们的方法在NTU RGB+D 60数据集和NTU RGB+D 120数据集上均实现了最先进的动作识别性能。代码可在\url{https://github.com/Shark0-0/VG4D}上找到。
更新时间: 2024-04-17 17:54:49
领域: cs.CV,cs.AI,cs.RO
Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering
Retriever-augmented instruction-following models are attractive alternatives to fine-tuned approaches for information-seeking tasks such as question answering (QA). By simply prepending retrieved documents in its input along with an instruction, these models can be adapted to various information domains and tasks without additional fine-tuning. While the model responses tend to be natural and fluent, the additional verbosity makes traditional QA evaluation metrics such as exact match (EM) and F1 unreliable for accurately quantifying model performance. In this work, we investigate the performance of instruction-following models across three information-seeking QA tasks. We use both automatic and human evaluation to evaluate these models along two dimensions: 1) how well they satisfy the user's information need (correctness), and 2) whether they produce a response based on the provided knowledge (faithfulness). Guided by human evaluation and analysis, we highlight the shortcomings of traditional metrics for both correctness and faithfulness. We then propose simple token-overlap based and model-based metrics that reflect the true performance of these models. Our analysis reveals that instruction-following models are competitive, and sometimes even outperform fine-tuned models for correctness. However, these models struggle to stick to the provided knowledge and often hallucinate in their responses. We hope our work encourages a more holistic evaluation of instruction-following models for QA. Our code and data is available at https://github.com/McGill-NLP/instruct-qa
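One natural instance of such metrics is sketched below: a recall-style token-overlap score for correctness that tolerates verbosity, and a precision-style score against the retrieved passages as a faithfulness proxy. The tokenization and exact definitions are assumptions of the sketch, not necessarily the paper's.

```python
import re

def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def answer_recall(response, reference):
    """Fraction of reference-answer tokens found in the model response;
    unlike exact match, a verbose but correct answer still scores 1.0."""
    ref, res = tokens(reference), set(tokens(response))
    return sum(t in res for t in ref) / max(len(ref), 1)

def knowledge_precision(response, passages):
    """Fraction of response tokens grounded in the provided passages,
    a simple proxy for faithfulness (hallucinated tokens lower it)."""
    grounded = {t for p in passages for t in tokens(p)}
    res = tokens(response)
    return sum(t in grounded for t in res) / max(len(res), 1)

resp = "The Eiffel Tower, completed in 1889, is located in Paris, France."
print(answer_recall(resp, "Paris"))                                  # 1.0
print(knowledge_precision(resp, ["The Eiffel Tower is in Paris."]))  # < 1.0
```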
Updated: 2024-04-17 17:52:18
标题: 评估指令遵循模型在问答中的正确性和忠实度
摘要: 检索增强的指令跟随模型是信息检索任务的一种有吸引力的替代方案,例如问答(QA)。通过简单地在输入中添加检索到的文档以及指令,这些模型可以适应各种信息领域和任务,而无需额外微调。虽然模型的响应往往自然流畅,但额外的冗余性使得传统的QA评估指标,如精确匹配(EM)和F1,无法准确量化模型性能。 在这项工作中,我们研究了指令跟随模型在三个信息检索QA任务中的性能。我们使用自动和人工评估来评估这些模型在两个维度上:1)它们如何满足用户的信息需求(正确性),以及2)它们是否基于提供的知识产生响应(忠实度)。在人工评估和分析的指导下,我们强调了传统指标在正确性和忠实度方面的缺点。然后,我们提出了基于简单的标记重叠和基于模型的指标,以反映这些模型的真实性能。我们的分析表明,指令跟随模型在正确性方面具有竞争力,有时甚至优于微调模型。然而,这些模型难以坚持提供的知识,并且在其响应中经常产生幻觉。我们希望我们的工作能够鼓励对QA中的指令跟随模型进行更全面的评估。我们的代码和数据可在https://github.com/McGill-NLP/instruct-qa找到。
更新时间: 2024-04-17 17:52:18
领域: cs.CL,cs.AI
Variational Bayesian Last Layers
We introduce a deterministic variational formulation for training Bayesian last layer neural networks. This yields a sampling-free, single-pass model and loss that effectively improves uncertainty estimation. Our variational Bayesian last layer (VBLL) can be trained and evaluated with only quadratic complexity in last layer width, and is thus (nearly) computationally free to add to standard architectures. We experimentally investigate VBLLs, and show that they improve predictive accuracy, calibration, and out of distribution detection over baselines across both regression and classification. Finally, we investigate combining VBLL layers with variational Bayesian feature learning, yielding a lower variance collapsed variational inference method for Bayesian neural networks.
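The sampling-free, quadratic-in-width machinery can be seen in exact Bayesian linear regression on fixed penultimate features, sketched below in NumPy; VBLL itself optimizes a variational objective jointly with the backbone, so this only illustrates the last-layer uncertainty computation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 16
Phi = rng.normal(size=(n, d))            # penultimate-layer features phi(x)
w_true = rng.normal(size=d)
y = Phi @ w_true + rng.normal(scale=0.5, size=n)

alpha, beta = 1.0, 4.0                   # prior precision, noise precision
S_inv = alpha * np.eye(d) + beta * Phi.T @ Phi   # posterior precision (d x d)
S = np.linalg.inv(S_inv)
mu = beta * S @ Phi.T @ y                # posterior mean over last-layer weights

phi_star = rng.normal(size=d)            # features of a test point
pred_mean = phi_star @ mu
pred_var = 1.0 / beta + phi_star @ S @ phi_star   # calibrated predictive variance
```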
Updated: 2024-04-17 17:50:24
标题: 变分贝叶斯最后层
摘要: 我们引入了一种确定性变分形式,用于训练贝叶斯最后一层神经网络。这产生了一个无需抽样、单次传递模型和损失函数,有效改善了不确定性估计。我们的变分贝叶斯最后一层(VBLL)可以通过仅有最后一层宽度的二次复杂度进行训练和评估,因此(几乎)可以免费添加到标准架构中。我们通过实验研究了VBLL,并展示了它们在回归和分类任务中相对于基线模型提高了预测准确性、校准性和超出分布检测的能力。最后,我们研究了将VBLL层与变分贝叶斯特征学习相结合,得到了一个更低方差的折叠变分推理方法,适用于贝叶斯神经网络。
更新时间: 2024-04-17 17:50:24
领域: cs.LG,cs.CV,stat.ML
Explainable Artificial Intelligence Techniques for Accurate Fault Detection and Diagnosis: A Review
As the manufacturing industry advances with sensor integration and automation, the opaque nature of deep learning models in machine learning poses a significant challenge for fault detection and diagnosis. Despite the predictive insights Artificial Intelligence (AI) can deliver, advanced machine learning engines often remain a black box. This paper reviews the eXplainable AI (XAI) tools and techniques in this context. We explore various XAI methodologies, focusing on their role in making AI decision-making transparent, particularly in critical scenarios where humans are involved. We also discuss current limitations and potential future research that aims to balance explainability with model performance while improving trustworthiness in the context of AI applications for critical industrial use cases.
Updated: 2024-04-17 17:49:38
标题: 可解释人工智能技术用于准确故障检测与诊断:综述
摘要: 随着制造业在传感器集成和自动化方面的不断发展,机器学习中深度学习模型的不透明性对于故障检测和诊断提出了重大挑战。尽管人工智能(AI)提供了相关的预测性见解,但先进的机器学习引擎往往仍然是一个黑匣子。本文回顾了在这一背景下的可解释人工智能(XAI)工具和技术。我们探讨了各种XAI方法论,重点关注它们在使AI决策透明化方面的作用,特别是在涉及人类的关键场景中。我们还讨论了目前的局限性以及可能的未来研究,旨在在提高AI应用在关键工业用例中的可信度的同时,平衡解释性和模型性能。
更新时间: 2024-04-17 17:49:38
领域: cs.AI,cs.LG
Re-Nerfing: Improving Novel Views Synthesis through Novel Views Synthesis
Neural Radiance Fields (NeRFs) have shown remarkable novel view synthesis capabilities even in large-scale, unbounded scenes, albeit requiring hundreds of views or introducing artifacts in sparser settings. Their optimization suffers from shape-radiance ambiguities wherever only a small visual overlap is available. This leads to erroneous scene geometry and artifacts. In this paper, we propose Re-Nerfing, a simple and general multi-stage data augmentation approach that leverages NeRF's own view synthesis ability to address these limitations. With Re-Nerfing, we enhance the geometric consistency of novel views as follows: First, we train a NeRF with the available views. Then, we use the optimized NeRF to synthesize pseudo-views around the original ones with a view selection strategy to improve coverage and preserve view quality. Finally, we train a second NeRF with both the original images and the pseudo views masking out uncertain regions. Extensive experiments applying Re-Nerfing on various pipelines on the mip-NeRF 360 dataset, including Gaussian Splatting, provide valuable insights into the improvements achievable without external data or supervision, on denser and sparser input scenarios. Project page: https://renerfing.github.io
Updated: 2024-04-17 17:44:44
标题: Re-Nerfing:通过新视角合成改进新视角合成
摘要: 神经辐射场(NeRFs)已经展示出在大规模、无界场景中的出色新视图合成能力,尽管需要数百个视图或在稀疏场景中引入伪影。它们的优化在仅有少量视觉重叠的情况下存在形状-辐射模糊,导致错误的场景几何和伪影。在本文中,我们提出了Re-Nerfing,一种简单而通用的多阶段数据增强方法,利用NeRF自身的视图合成能力来解决这些限制。通过Re-Nerfing,我们增强了新视图的几何一致性:首先,我们用可用的视图训练一个NeRF。然后,我们使用优化后的NeRF在原始视图周围合成伪视图,采用视图选择策略来改善覆盖范围并保持视图质量。最后,我们用原始图像和遮盖不确定区域的伪视图来训练第二个NeRF。对mip-NeRF 360数据集上应用Re-Nerfing的大量实验,包括高斯点阵,为在更密集和稀疏输入情况下实现的改进提供了有价值的见解,而无需外部数据或监督。项目页面:https://renerfing.github.io
更新时间: 2024-04-17 17:44:44
领域: cs.CV,cs.GR,cs.LG
Mastering Diverse Domains through World Models
Developing a general algorithm that learns to solve tasks across a wide range of applications has been a fundamental challenge in artificial intelligence. Although current reinforcement learning algorithms can be readily applied to tasks similar to what they have been developed for, configuring them for new application domains requires significant human expertise and experimentation. We present DreamerV3, a general algorithm that outperforms specialized methods across over 150 diverse tasks, with a single configuration. Dreamer learns a model of the environment and improves its behavior by imagining future scenarios. Robustness techniques based on normalization, balancing, and transformations enable stable learning across domains. Applied out of the box, Dreamer is the first algorithm to collect diamonds in Minecraft from scratch without human data or curricula. This achievement has been posed as a significant challenge in artificial intelligence that requires exploring farsighted strategies from pixels and sparse rewards in an open world. Our work allows solving challenging control problems without extensive experimentation, making reinforcement learning broadly applicable.
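One of the transformations behind this single-configuration robustness is the symlog squashing of targets described in the DreamerV3 paper, which handles reward and observation scales that vary wildly across domains:

```python
import torch

def symlog(x):
    """Symmetric log transform: compresses large magnitudes of either sign
    while staying roughly linear near zero."""
    return torch.sign(x) * torch.log1p(torch.abs(x))

def symexp(x):
    """Inverse of symlog, used to decode predictions back to raw scale."""
    return torch.sign(x) * (torch.exp(torch.abs(x)) - 1.0)

v = torch.tensor([-1000.0, -1.0, 0.0, 1.0, 1000.0], dtype=torch.float64)
assert torch.allclose(symexp(symlog(v)), v)
```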
Updated: 2024-04-17 17:41:20
标题: 通过世界模型掌握多样领域
摘要: 开发一个通用算法,学习解决各种应用中的任务,一直是人工智能领域的一个基本挑战。虽然当前的强化学习算法可以很容易地应用于它们被开发的类似任务,但为新的应用领域配置它们需要大量的人类专业知识和实验。我们提出DreamerV3,一个通用算法,在超过150个不同任务中表现优于专门方法,只需一个配置。Dreamer通过想象未来场景来学习环境模型,并改进其行为。基于归一化、平衡和转换的鲁棒性技术实现了跨领域的稳定学习。Dreamer是第一个不需要人类数据或课程,就能从头开始在《我的世界》中收集钻石的算法。这一成就被提出为人工智能领域的一个重大挑战,需要在一个开放世界中从像素和稀疏奖励中探索有前瞻性的策略。我们的工作可以解决具有挑战性的控制问题,而无需进行大量实验,使强化学习得以广泛应用。
更新时间: 2024-04-17 17:41:20
领域: cs.AI,cs.LG,stat.ML
Prompt Optimizer of Text-to-Image Diffusion Models for Abstract Concept Understanding
The rapid evolution of text-to-image diffusion models has opened the door to generative AI, enabling the translation of textual descriptions into visually compelling images with remarkable quality. However, a persistent challenge within this domain is the optimization of prompts to effectively convey abstract concepts into concrete objects. For example, text encoders can hardly express "peace", while they can easily illustrate olive branches and white doves. This paper introduces a novel approach named Prompt Optimizer for Abstract Concepts (POAC) specifically designed to enhance the performance of text-to-image diffusion models in interpreting and generating images from abstract concepts. We propose a Prompt Language Model (PLM), which is initialized from a pre-trained language model and then fine-tuned with a curated dataset of abstract concept prompts. The dataset is created with GPT-4 to extend each abstract concept to a scene with concrete objects. Our framework employs a Reinforcement Learning (RL)-based optimization strategy, focusing on the alignment between the images generated by a stable diffusion model and the optimized prompts. Through extensive experiments, we demonstrate that our proposed POAC significantly improves the accuracy and aesthetic quality of generated images, particularly in the description of abstract concepts and alignment with optimized prompts. We also present a comprehensive analysis of our model's performance across diffusion models under different settings, showcasing its versatility and effectiveness in enhancing abstract concept representation.
Updated: 2024-04-17 17:38:56
标题: 用于抽象概念理解的文本到图像扩散模型提示优化器
摘要: 文本到图像扩散模型的快速发展打开了生成AI的大门,使得将文字描述转化为具有显著质量的视觉吸引力图像成为可能。然而,在这一领域中一个持续存在的挑战是优化提示以有效地将抽象概念传达为具体对象。例如,文本编码器很难表达“和平”,而可以很容易地描绘橄榄枝和白鸽。本文介绍了一种名为Prompt Optimizer for Abstract Concepts (POAC)的新方法,专门设计用于提升文本到图像扩散模型在解释和生成来自抽象概念的图像方面的性能。我们提出了一个Prompt Language Model (PLM),它从一个预训练的语言模型初始化,然后通过一个经过筛选的抽象概念提示数据集进行微调。该数据集是使用GPT-4创建的,将抽象概念扩展到一个场景和具体对象。我们的框架采用了基于强化学习(RL)的优化策略,重点关注通过稳定的扩散模型生成的图像与优化提示之间的对齐。通过广泛的实验,我们证明了我们提出的POAC显著提高了生成图像的准确性和审美质量,特别是在描述抽象概念和与优化提示对齐方面。我们还对我们的模型在不同设置下跨扩散模型的性能进行了全面分析,展示了它在增强抽象概念表达方面的多功能性和有效性。
更新时间: 2024-04-17 17:38:56
领域: cs.CV,cs.AI,cs.LG
TCJA-SNN: Temporal-Channel Joint Attention for Spiking Neural Networks
Spiking Neural Networks (SNNs) are attracting widespread interest due to their biological plausibility, energy efficiency, and powerful spatio-temporal information representation ability. Given the critical role of attention mechanisms in enhancing neural network performance, the integration of SNNs and attention mechanisms exhibits potential to deliver energy-efficient and high-performance computing paradigms. We present a novel Temporal-Channel Joint Attention mechanism for SNNs, referred to as TCJA-SNN. The proposed TCJA-SNN framework can effectively assess the significance of spike sequence from both spatial and temporal dimensions. More specifically, our essential technical contribution lies on: 1) We employ the squeeze operation to compress the spike stream into an average matrix. Then, we leverage two local attention mechanisms based on efficient 1D convolutions to facilitate comprehensive feature extraction at the temporal and channel levels independently. 2) We introduce the Cross Convolutional Fusion (CCF) layer as a novel approach to model the inter-dependencies between the temporal and channel scopes. This layer breaks the independence of these two dimensions and enables the interaction between features. Experimental results demonstrate that the proposed TCJA-SNN outperforms SOTA by up to 15.7% accuracy on standard static and neuromorphic datasets, including Fashion-MNIST, CIFAR10-DVS, N-Caltech 101, and DVS128 Gesture. Furthermore, we apply the TCJA-SNN framework to image generation tasks by leveraging a variation autoencoder. To the best of our knowledge, this study is the first instance where the SNN-attention mechanism has been employed for image classification and generation tasks. Notably, our approach has achieved SOTA performance in both domains, establishing a significant advancement in the field. Codes are available at https://github.com/ridgerchu/TCJA.
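A compact sketch of the joint attention computation: squeeze the spike stream into a (T, C) average matrix, run one 1D convolution along time and one along channels, and fuse the resulting maps. The multiplicative fusion below is a stand-in for the paper's CCF layer, and the kernel sizes are assumptions.

```python
import torch
import torch.nn as nn

class TCJASketch(nn.Module):
    def __init__(self, T, C, k=3):
        super().__init__()
        self.t_conv = nn.Conv1d(C, C, k, padding=k // 2)  # local attention over time
        self.c_conv = nn.Conv1d(T, T, k, padding=k // 2)  # local attention over channels

    def forward(self, spikes):                  # (B, T, C, H, W) spike tensor
        m = spikes.mean(dim=(-2, -1))           # squeeze to the (B, T, C) average matrix
        a_t = self.t_conv(m.transpose(1, 2))    # (B, C, T)
        a_c = self.c_conv(m)                    # (B, T, C)
        attn = torch.sigmoid(a_t.transpose(1, 2) * a_c)   # fused joint scores
        return spikes * attn[..., None, None]   # reweight the spike stream

x = (torch.rand(2, 8, 16, 10, 10) > 0.9).float()
out = TCJASketch(T=8, C=16)(x)
```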
Updated: 2024-04-17 17:36:19
标题: TCJA-SNN: 脉冲神经网络的时空通道联合注意力
摘要: 脉冲神经网络(SNNs)因其生物合理性、能源效率和强大的时空信息表示能力而引起了广泛关注。鉴于注意力机制在提升神经网络性能方面的关键作用,将SNNs和注意力机制集成起来展现出提供能源高效和高性能计算范式的潜力。我们提出了一种新颖的面向SNNs的时间-通道联合注意力机制,称为TCJA-SNN。所提出的TCJA-SNN框架能够有效地评估来自空间和时间维度的脉冲序列的重要性。具体而言,我们的关键技术贡献在于:1)我们利用挤压操作将脉冲流压缩成平均矩阵。然后,我们利用基于高效1D卷积的两个局部注意力机制分别促进了在时间和通道级别的全面特征提取。2)我们引入了交叉卷积融合(CCF)层作为一种新颖方法来建模时间和通道范围之间的相互依赖关系。这一层打破了这两个维度的独立性,并实现了特征之间的交互。实验结果表明,所提出的TCJA-SNN在标准静态和神经形态数据集上的准确率比SOTA高出多达15.7%,包括Fashion-MNIST、CIFAR10-DVS、N-Caltech 101和DVS128 Gesture。此外,我们将TCJA-SNN框架应用于图像生成任务,利用变分自动编码器。据我们所知,这项研究是首次在图像分类和生成任务中采用SNN-注意力机制。值得注意的是,我们的方法在两个领域中均取得了SOTA性能,为该领域的显著进步奠定了基础。代码可在https://github.com/ridgerchu/TCJA 上找到。
更新时间: 2024-04-17 17:36:19
领域: cs.CV,cs.AI
Spatial Context-based Self-Supervised Learning for Handwritten Text Recognition
Handwritten Text Recognition (HTR) is a relevant problem in computer vision, and implies unique challenges owing to its inherent variability and the rich contextualization required for its interpretation. Despite the success of Self-Supervised Learning (SSL) in computer vision, its application to HTR has been rather scattered, leaving key SSL methodologies unexplored. This work focuses on one of them, namely Spatial Context-based SSL. We investigate how this family of approaches can be adapted and optimized for HTR and propose new workflows that leverage the unique features of handwritten text. Our experiments demonstrate that the methods considered lead to advancements in the state-of-the-art of SSL for HTR in a number of benchmark cases.
Updated: 2024-04-17 17:33:32
标题: 基于空间背景的自监督学习用于手写文本识别
摘要: 手写文本识别(HTR)是计算机视觉中一个相关的问题,由于其固有的可变性和需要丰富的语境化解释,它意味着独特的挑战。尽管自监督学习(SSL)在计算机视觉中取得了成功,但其在HTR中的应用相对零散,使得关键SSL方法尚未被探索。本文专注于其中之一,即基于空间上下文的SSL。我们研究这类方法如何被调整和优化用于HTR,并提出利用手写文本的独特特征的新工作流程。我们的实验表明,所考虑的方法在一些基准案例中带来了HTR的SSL最新技术进展。
更新时间: 2024-04-17 17:33:32
领域: cs.AI
The Landscape of Emerging AI Agent Architectures for Reasoning, Planning, and Tool Calling: A Survey
This survey paper examines the recent advancements in AI agent implementations, with a focus on their ability to achieve complex goals that require enhanced reasoning, planning, and tool execution capabilities. The primary objectives of this work are to a) communicate the current capabilities and limitations of existing AI agent implementations, b) share insights gained from our observations of these systems in action, and c) suggest important considerations for future developments in AI agent design. We achieve this by providing overviews of single-agent and multi-agent architectures, identifying key patterns and divergences in design choices, and evaluating their overall impact on accomplishing a provided goal. Our contribution outlines key themes when selecting an agentic architecture, the impact of leadership on agent systems, agent communication styles, and key phases for planning, execution, and reflection that enable robust AI agent systems.
Updated: 2024-04-17 17:32:41
Domains: cs.AI,cs.CL
LLMTune: Accelerate Database Knob Tuning with Large Language Models
Database knob tuning is a critical challenge in the database community, aiming to optimize knob values to enhance database performance for specific workloads. DBMSs often feature hundreds of tunable knobs, posing a significant challenge for DBAs to recommend optimal configurations. Consequently, many machine learning-based tuning methods have been developed to automate this process. Despite the introduction of various optimizers, practical applications have unveiled a new problem: they typically require numerous workload runs to achieve satisfactory performance, a process that is both time-consuming and resource-intensive. This inefficiency largely stems from the optimal configuration often being substantially different from the default setting, necessitating multiple iterations during tuning. Recognizing this, we argue that an effective starting point could significantly reduce redundant exploration in less efficient areas, thereby potentially speeding up the tuning process for the optimizers. Based on this assumption, we introduce LLMTune, a large language model-based configuration generator designed to produce an initial, high-quality configuration for new workloads. These generated configurations can then serve as starting points for various base optimizers, accelerating their tuning processes. To obtain training data for LLMTune's supervised fine-tuning, we have devised a new automatic data generation framework capable of efficiently creating a large number of <workload, configuration> pairs. We have conducted thorough experiments to evaluate LLMTune's effectiveness with different workloads, such as TPC-H and JOB. Compared to leading methods, LLMTune identifies superior configurations more quickly. For instance, on the challenging TPC-H workload, LLMTune achieves a significant 15.6x speed-up in finding the best-performing configurations.
Updated: 2024-04-17 17:28:05
Domains: cs.AI,cs.DB
Deep Policy Optimization with Temporal Logic Constraints
Temporal logics, such as linear temporal logic (LTL), offer a precise means of specifying tasks for (deep) reinforcement learning (RL) agents. In our work, we consider the setting where the task is specified by an LTL objective and there is an additional scalar reward that we need to optimize. Previous works either focus solely on learning an LTL-satisfying policy or are restricted to finite state spaces. We make two contributions: First, we introduce an RL-friendly approach to this setting by formulating this problem as a single optimization objective. Our formulation guarantees that an optimal policy will be reward-maximal among the set of policies that maximize the likelihood of satisfying the LTL specification. Second, we address a sparsity issue that often arises for LTL-guided deep RL policies by introducing Cycle Experience Replay (CyclER), a technique that automatically guides RL agents towards the satisfaction of an LTL specification. Our experiments demonstrate the efficacy of CyclER in finding performant deep RL policies in both continuous and discrete experimental domains.
Updated: 2024-04-17 17:24:44
Domains: cs.LG,cs.AI,cs.FL
Towards Reliable Empirical Machine Unlearning Evaluation: A Game-Theoretic View
Machine unlearning is the process of updating machine learning models to remove the information of specific training data samples, in order to comply with data protection regulations that allow individuals to request the removal of their personal data. Despite the recent development of numerous unlearning algorithms, reliable evaluation of these algorithms remains an open research question. In this work, we focus on membership inference attack (MIA) based evaluation, one of the most common approaches for evaluating unlearning algorithms, and address various pitfalls of existing evaluation metrics that lack reliability. Specifically, we propose a game-theoretic framework that formalizes the evaluation process as a game between unlearning algorithms and MIA adversaries, measuring the data removal efficacy of unlearning algorithms by the capability of the MIA adversaries. Through careful design of the game, we demonstrate that the natural evaluation metric induced from the game enjoys provable guarantees that the existing evaluation metrics fail to satisfy. Furthermore, we propose a practical and efficient algorithm to estimate the evaluation metric induced from the game, and demonstrate its effectiveness through both theoretical analysis and empirical experiments. This work presents a novel and reliable approach to empirically evaluating unlearning algorithms, paving the way for the development of more effective unlearning techniques.
Updated: 2024-04-17 17:20:27
Domains: cs.LG,cs.AI
Predicting Traffic Congestion at Urban Intersections Using Data-Driven Modeling
Traffic congestion at intersections is a significant issue in urban areas, leading to increased commute times, safety hazards, and operational inefficiencies. This study aims to develop a predictive model for congestion at intersections in major U.S. cities, utilizing a dataset of trip-logging metrics from commercial vehicles across 4,800 intersections. The dataset encompasses 27 features, including intersection coordinates, street names, time of day, and traffic metrics (Kashyap et al., 2019). Additional features, such as rainfall/snowfall percentage, distance from downtown and outskirts, and road types, were incorporated to enhance the model's predictive power. The methodology involves data exploration, feature transformation, and handling missing values through low-rank models and label encoding. The proposed model has the potential to assist city planners and governments in anticipating traffic hot spots, optimizing operations, and identifying infrastructure challenges.
Updated: 2024-04-17 17:20:04
Domains: cs.LG
Decentralized Personalized Federated Learning for Min-Max Problems
Personalized Federated Learning (PFL) has witnessed remarkable advancements, enabling the development of innovative machine learning applications that preserve the privacy of training data. However, existing theoretical research in this field has primarily focused on distributed optimization for minimization problems. This paper is the first to study PFL for saddle-point problems, which encompass a broader range of optimization problems and require more than solving minimization problems. In this work, we consider a recently proposed PFL setting with a mixing objective function, an approach that combines the learning of a global model with locally distributed learners. Unlike most previous work, which considered only the centralized setting, we work in a more general, decentralized setup that allows us to design and analyze more practical and federated ways to connect devices to the network. We propose new algorithms to address this problem and provide a theoretical analysis of smooth (strongly) convex-(strongly) concave saddle-point problems in the stochastic and deterministic cases. Numerical experiments on bilinear problems and neural networks with adversarial noise demonstrate the effectiveness of the proposed methods.
Updated: 2024-04-17 17:16:55
Domains: cs.LG,cs.DC,math.OC
ML-Bench: Evaluating Large Language Models for Code Generation in Repository-Level Machine Learning Tasks
While Large Language Models (LLMs) have demonstrated proficiency in code generation benchmarks, translating these results into practical development scenarios - where leveraging existing repository-level libraries is the norm - remains challenging. To bridge the gap between lab-scale benchmarks and real-world coding practices, we introduce ML-Bench: a novel benchmark designed to assess LLMs' ability to integrate and utilize repository-level open-source libraries to complete machine learning tasks. ML-Bench comprises a diverse set of 9,641 samples across 169 distinct tasks derived from 18 GitHub repositories. Our findings reveal that while GPT-4 outshines other LLMs, it successfully addresses only 33.82% of the tasks, highlighting the complexity of the challenge. Complementarily, we introduce a baseline agent, ML-Agent, capable of skillful codebase navigation and precise generation of functional code segments. This groundwork aims at catalyzing the development of more sophisticated LLM agents that can handle the intricacies of real-world programming. Our code, data, and models are available at https://github.com/gersteinlab/ML-bench.
Updated: 2024-04-17 17:13:03
Domains: cs.CL,cs.AI
Generative Representational Instruction Tuning
All text-based language problems can be reduced to either generation or embedding. Current models only perform well at one or the other. We introduce generative representational instruction tuning (GRIT), whereby a large language model is trained to handle both generative and embedding tasks by distinguishing between them through instructions. Compared to other open models, our resulting GritLM 7B sets a new state of the art on the Massive Text Embedding Benchmark (MTEB) and outperforms all models up to its size on a range of generative tasks. By scaling up further, GritLM 8x7B outperforms all open generative language models that we tried while still being among the best embedding models. Notably, we find that GRIT matches training on generative or embedding data alone, so we can unify both at no performance loss. Among other benefits, the unification via GRIT speeds up Retrieval-Augmented Generation (RAG) by more than 60% for long documents by no longer requiring separate retrieval and generation models. Models, code, etc. are freely available at https://github.com/ContextualAI/gritlm.
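A minimal sketch of how one model can be trained on both objectives at once, assuming a hypothetical `model` that exposes `lm_logits` (token logits) and `encode` (pooled embeddings); the actual GRIT recipe (attention masks, pooling, loss weighting) differs in detail.

```python
import torch
import torch.nn.functional as F

def grit_style_loss(model, gen_batch, emb_batch, tau: float = 0.05):
    """Joint generative + embedding loss (illustrative simplification)."""
    # Generative objective: next-token prediction on instruction data.
    ids = gen_batch["input_ids"]                        # (B, L)
    logits = model.lm_logits(ids[:, :-1])               # (B, L-1, V)
    gen_loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), ids[:, 1:].reshape(-1)
    )

    # Embedding objective: in-batch InfoNCE over (query, positive) pairs.
    q = F.normalize(model.encode(emb_batch["query_ids"]), dim=-1)  # (B, D)
    p = F.normalize(model.encode(emb_batch["pos_ids"]), dim=-1)    # (B, D)
    sim = q @ p.T / tau                                            # (B, B)
    emb_loss = F.cross_entropy(sim, torch.arange(sim.size(0)))

    return gen_loss + emb_loss
```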
Updated: 2024-04-17 17:12:05
Domains: cs.CL,cs.AI,cs.LG
Simple Image Signal Processing using Global Context Guidance
In modern smartphone cameras, the Image Signal Processor (ISP) is the core element that converts the RAW readings from the sensor into perceptually pleasant RGB images for the end users. The ISP is typically proprietary and handcrafted and consists of several blocks such as white balance, color correction, and tone mapping. Deep learning-based ISPs aim to transform RAW images into DSLR-like RGB images using deep neural networks. However, most learned ISPs are trained using patches (small regions) due to computational limitations. Such methods lack global context, which limits their efficacy on full-resolution images and harms their ability to capture global properties such as color constancy or illumination. First, we propose a novel module that can be integrated into any neural ISP to capture the global context information from the full RAW images. Second, we propose an efficient and simple neural ISP that utilizes our proposed module. Our model achieves state-of-the-art results on different benchmarks using diverse and real smartphone images.
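A sketch of what such a global-context module could look like, assuming a 4-channel packed RAW input and channel-wise feature modulation; the layer sizes, thumbnail resolution, and scale-shift conditioning are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalContextGuidance(nn.Module):
    """Encode the full RAW frame into a global vector that modulates
    local features, so patch-trained ISPs still see image-wide cues
    such as illumination or color cast."""

    def __init__(self, feat_channels: int = 64, ctx_dim: int = 32):
        super().__init__()
        self.ctx_encoder = nn.Sequential(
            nn.Conv2d(4, ctx_dim, 3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),              # -> (B, ctx_dim, 1, 1)
        )
        self.to_scale_shift = nn.Linear(ctx_dim, 2 * feat_channels)

    def forward(self, features: torch.Tensor, full_raw: torch.Tensor) -> torch.Tensor:
        # features: (B, C, h, w) local features; full_raw: (B, 4, H, W).
        thumb = F.interpolate(full_raw, size=(64, 64), mode="bilinear")
        ctx = self.ctx_encoder(thumb).flatten(1)  # (B, ctx_dim) global vector
        scale, shift = self.to_scale_shift(ctx).chunk(2, dim=1)
        return features * (1 + scale[:, :, None, None]) + shift[:, :, None, None]
```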
Updated: 2024-04-17 17:11:47
Domains: cs.CV,cs.LG,eess.IV
On the Scalability of GNNs for Molecular Graphs
Scaling deep learning models has been at the heart of recent revolutions in language modelling and image generation. Practitioners have observed a strong relationship between model size, dataset size, and performance. However, structure-based architectures such as Graph Neural Networks (GNNs) are yet to show the benefits of scale, mainly due to the lower efficiency of sparse operations, large data requirements, and lack of clarity about the effectiveness of various architectures. We address this drawback of GNNs by studying their scaling behavior. Specifically, we analyze message-passing networks, graph Transformers, and hybrid architectures on the largest public collection of 2D molecular graphs. For the first time, we observe that GNNs benefit tremendously from increasing scale in depth, width, number of molecules, number of labels, and diversity of the pretraining datasets, resulting in a 30.25% improvement when scaling to 1 billion parameters and a 28.98% improvement when increasing the dataset size eightfold. We further demonstrate strong finetuning scaling behavior on 38 tasks, outclassing previous large models. We hope that our work paves the way for an era where foundational GNNs drive pharmaceutical drug discovery.
Updated: 2024-04-17 17:11:31
Domains: cs.LG
MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation
We introduce a new architecture for personalization of text-to-image diffusion models, coined Mixture-of-Attention (MoA). Inspired by the Mixture-of-Experts mechanism utilized in large language models (LLMs), MoA distributes the generation workload between two attention pathways: a personalized branch and a non-personalized prior branch. MoA is designed to retain the original model's prior by fixing its attention layers in the prior branch, while minimally intervening in the generation process with the personalized branch that learns to embed subjects in the layout and context generated by the prior branch. A novel routing mechanism manages the distribution of pixels in each layer across these branches to optimize the blend of personalized and generic content creation. Once trained, MoA facilitates the creation of high-quality, personalized images featuring multiple subjects with compositions and interactions as diverse as those generated by the original model. Crucially, MoA enhances the distinction between the model's pre-existing capability and the newly augmented personalized intervention, thereby offering a more disentangled subject-context control that was previously unattainable. Project page: https://snap-research.github.io/mixture-of-attention
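A hypothetical sketch of the two-pathway routing idea: a frozen prior attention layer and a trainable personalized layer are mixed per token by a learned router. The `prior_attn(x, context)` signature and the sigmoid router are assumptions made for illustration; the paper's per-pixel, per-layer routing is more elaborate.

```python
import torch
import torch.nn as nn

class MixtureOfAttentionSketch(nn.Module):
    """Blend a frozen prior attention branch with a personalized one."""

    def __init__(self, prior_attn: nn.Module, dim: int, heads: int = 8):
        super().__init__()
        self.prior_attn = prior_attn
        for p in self.prior_attn.parameters():
            p.requires_grad_(False)  # keep the original model's prior intact
        self.personal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.router = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        prior_out = self.prior_attn(x, context)            # assumed signature
        personal_out, _ = self.personal_attn(x, context, context)
        w = torch.sigmoid(self.router(x))                  # (B, N, 1) per-token weight
        # Personalization intervenes only where the router says it should.
        return (1 - w) * prior_out + w * personal_out
```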
Updated: 2024-04-17 17:08:05
Domains: cs.CV,cs.AI,cs.GR
HateCOT: An Explanation-Enhanced Dataset for Generalizable Offensive Speech Detection via Large Language Models
The ubiquitousness of social media has led to the need for reliable and efficient detection of offensive content to limit harmful effects. This has led to a proliferation of datasets and models related to detecting offensive content. While sophisticated models have attained strong performance on individual datasets, these models often do not generalize due to differences in how "offensive content" is conceptualized, and the resulting differences in how these datasets are labeled. In this paper, we introduce HateCOT, a dataset of 52,000 samples drawn from diverse existing sources, with explanations generated by GPT-3.5-Turbo and curated by humans. We show that pre-training models for offensive-content detection on HateCOT significantly boosts open-source language models on three benchmark datasets in both zero- and few-shot settings, despite differences in domain and task. We further find that HateCOT enables effective K-shot fine-tuning in low-resource settings.
Updated: 2024-04-17 16:59:35
Domains: cs.CL,cs.AI,cs.SI
mEdIT: Multilingual Text Editing via Instruction Tuning
We introduce mEdIT, a multilingual extension to CoEdIT -- the recent state-of-the-art text editing models for writing assistance. mEdIT models are trained by fine-tuning large multilingual pre-trained language models (LLMs) via instruction tuning. They are designed to take instructions from the user specifying the attributes of the desired text in the form of natural language instructions, such as Grammatik korrigieren (German) or Parafrasee la oración (Spanish). We build mEdIT by curating data from multiple publicly available human-annotated text editing datasets for three text editing tasks (Grammatical Error Correction (GEC), Text Simplification, and Paraphrasing) across diverse languages belonging to six different language families. We detail the design and training of mEdIT models and demonstrate their strong performance on many multilingual text editing benchmarks against other multilingual LLMs. We also find that mEdIT generalizes effectively to new languages over multilingual baselines. We publicly release our data, code, and trained models at https://github.com/vipulraheja/medit.
Updated: 2024-04-17 16:59:30
Domains: cs.CL,cs.AI,I.2.7
Information theory for data-driven model reduction in physics and biology
Model reduction is the construction of simple yet predictive descriptions of the dynamics of many-body systems in terms of a few relevant variables. A prerequisite to model reduction is the identification of these relevant variables, a task for which no general method exists. Here, we develop a systematic approach based on the information bottleneck to identify the relevant variables, defined as those most predictive of the future. We elucidate analytically the relation between these relevant variables and the eigenfunctions of the transfer operator describing the dynamics. Further, we show that in the limit of high compression, the relevant variables are directly determined by the slowest-decaying eigenfunctions. Our information-based approach indicates when to optimally stop increasing the complexity of the reduced model. Furthermore, it provides a firm foundation to construct interpretable deep learning tools that perform model reduction. We illustrate how these tools work in practice by considering uncurated videos of atmospheric flows from which our algorithms automatically extract the dominant slow collective variables, as well as experimental videos of cyanobacteria colonies in which we discover an emergent synchronization order parameter.
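In symbols, the standard information-bottleneck objective underlying this approach (notation assumed here, not taken verbatim from the paper) reads
$$\min_{p(z \mid x_{\text{past}})} \; I(Z; X_{\text{past}}) \;-\; \beta\, I(Z; X_{\text{future}}),$$
where $Z$ denotes the reduced (relevant) variables and $\beta$ trades compression of the past against predictiveness of the future. Sweeping $\beta$ traces out reduced models of increasing complexity, and in the strong-compression limit the optimal $Z$ collapses onto the slowest-decaying eigenfunctions of the transfer operator, as the abstract states.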
Updated: 2024-04-17 16:58:36
Domains: cond-mat.stat-mech,cs.IT,cs.LG,math.IT
Quantifying Multilingual Performance of Large Language Models Across Languages
The training process of Large Language Models (LLMs) requires extensive text corpora. However, these data are often unevenly distributed across languages. As a result, LLMs perform well on common languages, such as English, German, and French, but poorly on low-resource languages. Yet there is currently no work that quantitatively measures the performance of LLMs in low-resource languages. To fill this gap, we propose the Language Ranker, which aims to benchmark and rank different languages according to the performance of LLMs on those languages. We employ the LLM's performance on an English corpus as a baseline against which to compare its performance in other languages. We have the following three findings: 1. The performance rankings of different LLMs are roughly the same across all languages. 2. LLMs of different sizes exhibit the same partial ordering of performance. 3. There is a strong correlation between LlaMa2's performance in a given language and the proportion of that language in the pre-training corpus. These findings illustrate that the Language Ranker can be used as an indicator of the language performance of LLMs.
Updated: 2024-04-17 16:53:16
Domains: cs.CL,cs.AI,cs.LG
SDIP: Self-Reinforcement Deep Image Prior Framework for Image Processing
The deep image prior (DIP) proposed in recent research has revealed the inherent ability of convolutional neural networks (CNNs) to capture substantial low-level image statistics priors. This framework efficiently addresses inverse problems in image processing and has found extensive applications in various domains. However, because the whole algorithm is initialized randomly, the DIP algorithm often lacks stability, leaving room for further improvement. In this paper, we propose the self-reinforcement deep image prior (SDIP) as an improved version of the original DIP. We observed that the changes in the DIP network's input and output are highly correlated across iterations. SDIP exploits this trait in a reinforcement-learning manner: the current iteration's output is used by a steering algorithm to update the network input for the next iteration, guiding the algorithm toward improved results. Experimental results across multiple applications demonstrate that our proposed SDIP framework improves upon the original DIP method and other state-of-the-art methods.
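A sketch of what such a self-reinforced loop could look like; the exponential-moving-average update of the input is an illustrative choice standing in for the paper's steering algorithm, and the network input is assumed to share the degraded image's shape.

```python
import torch
import torch.nn.functional as F

def sdip_loop(net, degraded, steps: int = 2000, alpha: float = 0.1, lr: float = 1e-3):
    """DIP with a feedback step: the current output nudges the next input."""
    z = torch.randn_like(degraded)                 # plain DIP would freeze this
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):
        out = net(z)
        loss = F.mse_loss(out, degraded)           # fit the observed image
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            # Steering rule (assumed): pull the input toward the output,
            # exploiting the observed input-output correlation.
            z = (1 - alpha) * z + alpha * out.detach()
    return net(z)
```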
Updated: 2024-04-17 16:50:14
Domains: cs.CV,cs.LG,eess.IV
Graph Set-colorings And Hypergraphs In Topological Coding
To construct more complex number-based strings from topological coding, defending against intelligent attacks equipped with quantum computing and providing effective protection technology for the age of quantum computing, we introduce set-colored graphs admitting set-colorings, which have considerable cryptanalytic significance and are especially related to hypergraphs. We use the set-colorings of graphs to reflect the intersection of elements, and add further constraints to express more connections between sets (as hyperedges). Since we seek simple and effective graph-theoretic techniques for practical application, we use intersected-graphs admitting set-colorings defined on hyperedge sets to observe topological structures of hypergraphs, along with string-type, set-type, graph-type, hypergraph-type, and matrix-type Topcode-matrices, etc. We show that each connected graph is the intersected-graph of some hypergraph, and investigate hypergraph connectivity, colorings of hypergraphs, hypergraph homomorphisms, hypernetworks, scale-free network generators, and compound hypergraphs whose intersected-graphs have vertices that are themselves hypergraphs (for high-dimensional extension diagrams). Naturally, we obtain various graphic lattices, such as the edge-coincided intersected-graph lattice, the vertex-coincided intersected-graph lattice, the edge-hamiltonian graphic lattice, the hypergraph lattice, and the intersected-network lattice. Many techniques in this article can be translated into polynomial algorithms, since we aim to apply hypergraphs and graph set-colorings to homomorphic encryption and asymmetric cryptography.
Updated: 2024-04-17 16:42:04
Domains: cs.CR,cs.IT,math.IT
Precise Asymptotics for Spectral Methods in Mixed Generalized Linear Models
In a mixed generalized linear model, the objective is to learn multiple signals from unlabeled observations: each sample comes from exactly one signal, but it is not known which one. We consider the prototypical problem of estimating two statistically independent signals in a mixed generalized linear model with Gaussian covariates. Spectral methods are a popular class of estimators which output the top two eigenvectors of a suitable data-dependent matrix. However, despite the wide applicability, their design is still obtained via heuristic considerations, and the number of samples $n$ needed to guarantee recovery is super-linear in the signal dimension $d$. In this paper, we develop exact asymptotics on spectral methods in the challenging proportional regime in which $n, d$ grow large and their ratio converges to a finite constant. By doing so, we are able to optimize the design of the spectral method, and combine it with a simple linear estimator, in order to minimize the estimation error. Our characterization exploits a mix of tools from random matrices, free probability and the theory of approximate message passing algorithms. Numerical simulations for mixed linear regression and phase retrieval demonstrate the advantage enabled by our analysis over existing designs of spectral methods.
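A numpy sketch of the generic spectral estimator being analyzed: form $D = \frac{1}{n}\sum_i \mathcal{T}(y_i)\, x_i x_i^\top$ for a preprocessing function $\mathcal{T}$ and return its top two eigenvectors. The identity preprocessing used as the default below is a placeholder; choosing $\mathcal{T}$ is exactly the design freedom the paper's asymptotics optimize.

```python
import numpy as np

def spectral_estimate(X, y, preproc=lambda y: y):
    """Top-2 eigenvectors of a data-dependent matrix as signal estimates.

    X: (n, d) Gaussian covariates; y: (n,) responses;
    preproc: the tunable scalar preprocessing T applied to each y_i.
    """
    n, d = X.shape
    # (X.T * T(y)) scales column i of X.T by T(y_i), so the product
    # equals sum_i T(y_i) x_i x_i^T, divided by n.
    D = (X.T * preproc(y)) @ X / n                 # (d, d), symmetric
    eigvals, eigvecs = np.linalg.eigh(D)           # ascending eigenvalues
    return eigvecs[:, -1], eigvecs[:, -2]          # two signal directions
```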
Updated: 2024-04-17 16:36:36
Domains: math.ST,cs.IT,cs.LG,math.IT,stat.ML,stat.TH
GenFighter: A Generative and Evolutive Textual Attack Removal
Adversarial attacks pose significant challenges to deep neural networks (DNNs) such as Transformer models in natural language processing (NLP). This paper introduces a novel defense strategy, called GenFighter, which enhances adversarial robustness by learning and reasoning on the training classification distribution. GenFighter identifies potentially malicious instances deviating from the distribution, transforms them into semantically equivalent instances aligned with the training data, and employs ensemble techniques for a unified and robust response. By conducting extensive experiments, we show that GenFighter outperforms state-of-the-art defenses in accuracy under attack and attack success rate metrics. Additionally, it requires a high number of queries per attack, making the attack more challenging in real scenarios. The ablation study shows that our approach integrates transfer learning, a generative/evolutive procedure, and an ensemble method, providing an effective defense against NLP adversarial attacks.
Updated: 2024-04-17 16:32:13
Domains: cs.LG,cs.CL
DiscDiff: Latent Diffusion Model for DNA Sequence Generation
This paper introduces a novel framework for DNA sequence generation, comprising two key components: DiscDiff, a Latent Diffusion Model (LDM) tailored for generating discrete DNA sequences, and Absorb-Escape, a post-training algorithm designed to refine these sequences. Absorb-Escape enhances the realism of the generated sequences by correcting rounding errors inherent in the conversion between latent and input spaces. Our approach not only sets new standards in DNA sequence generation but also demonstrates superior performance over existing diffusion models in generating both short and long DNA sequences. Additionally, we introduce EPD-GenDNA, the first comprehensive, multi-species dataset for DNA generation, encompassing 160,000 unique sequences from 15 species. We hope this study will advance the generative modelling of DNA, with potential implications for gene therapy and protein production.
Updated: 2024-04-17 16:31:33
Domains: q-bio.GN,cs.AI,cs.LG
FedPFT: Federated Proxy Fine-Tuning of Foundation Models
Adapting Foundation Models (FMs) to downstream tasks through Federated Learning (FL) emerges as a promising strategy for protecting data privacy and valuable FMs. Existing methods fine-tune an FM by allocating a sub-FM to each client in FL; however, this leads to suboptimal performance due to insufficient tuning and the inevitable accumulation of gradient errors. In this paper, we propose Federated Proxy Fine-Tuning (FedPFT), a novel method that enhances FM adaptation to downstream tasks through FL via two key modules. First, the sub-FM construction module employs a layer-wise compression approach, facilitating comprehensive FM fine-tuning across all layers by emphasizing crucial neurons. Second, the sub-FM alignment module conducts a two-step distillation, at the layer level and at the neuron level, before and during FL fine-tuning respectively, to reduce gradient error by accurately aligning the sub-FM with the FM under theoretical guarantees. Experimental results on seven commonly used datasets (four text and three vision) demonstrate the superiority of FedPFT.
Updated: 2024-04-17 16:30:06
Domains: cs.LG,cs.AI
Decomposing and Editing Predictions by Modeling Model Computation
How does the internal computation of a machine learning model transform inputs into predictions? In this paper, we introduce a task called component modeling that aims to address this question. The goal of component modeling is to decompose an ML model's prediction in terms of its components -- simple functions (e.g., convolution filters, attention heads) that are the "building blocks" of model computation. We focus on a special case of this task, component attribution, where the goal is to estimate the counterfactual impact of individual components on a given prediction. We then present COAR, a scalable algorithm for estimating component attributions; we demonstrate its effectiveness across models, datasets, and modalities. Finally, we show that component attributions estimated with COAR directly enable model editing across five tasks, namely: fixing model errors, "forgetting" specific classes, boosting subpopulation robustness, localizing backdoor attacks, and improving robustness to typographic attacks. We provide code for COAR at https://github.com/MadryLab/modelcomponents.
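One simplified reading of component attribution, sketched below under stated assumptions: ablate random subsets of components, record the model output on a fixed example, and fit a linear surrogate whose coefficients estimate each component's counterfactual effect. The `eval_fn(mask)` callback (running the model with masked-out components) is a hypothetical interface, and COAR's actual estimator is more refined than this least-squares fit.

```python
import numpy as np

def estimate_component_attributions(eval_fn, n_components, n_samples=1000, p_drop=0.1):
    """Linear-surrogate attribution over random component ablations."""
    # Each row is a 0/1 mask: 0 means the component is ablated.
    masks = (np.random.rand(n_samples, n_components) > p_drop).astype(float)
    outputs = np.array([eval_fn(m) for m in masks])   # scalar output per mask
    # Least-squares fit: outputs ~ masks @ w + b; w_i estimates the
    # counterfactual effect of keeping component i.
    A = np.hstack([masks, np.ones((n_samples, 1))])
    coef, *_ = np.linalg.lstsq(A, outputs, rcond=None)
    return coef[:-1]                                  # per-component attributions
```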
Updated: 2024-04-17 16:28:08
Domains: cs.LG,cs.AI,stat.ML
What are human values, and how do we align AI to them?
There is an emerging consensus that we need to align AI systems with human values (Gabriel, 2020; Ji et al., 2024), but it remains unclear how to apply this to language models in practice. We split the problem of "aligning to human values" into three parts: first, eliciting values from people; second, reconciling those values into an alignment target for training ML models; and third, actually training the model. In this paper, we focus on the first two parts, and ask the question: what are "good" ways to synthesize diverse human inputs about values into a target for aligning language models? To answer this question, we first define a set of 6 criteria that we believe must be satisfied for an alignment target to shape model behavior in accordance with human values. We then propose a process for eliciting and reconciling values called Moral Graph Elicitation (MGE), which uses a large language model to interview participants about their values in particular contexts; our approach is inspired by the philosophy of values advanced by Taylor (1977), Chang (2004), and others. We trial MGE with a representative sample of 500 Americans, on 3 intentionally divisive prompts (e.g. advice about abortion). Our results demonstrate that MGE is promising for improving model alignment across all 6 criteria. For example, almost all participants (89.1%) felt well represented by the process, and (89%) thought the final moral graph was fair, even if their value wasn't voted as the wisest. Our process often results in "expert" values (e.g. values from women who have solicited abortion advice) rising to the top of the moral graph, without defining who is considered an expert in advance.
Updated: 2024-04-17 16:27:37
Domains: cs.CY,cs.AI,cs.CL,cs.HC,cs.LG
Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey
Large models represent a groundbreaking advancement in multiple application fields, enabling remarkable achievements across various tasks. However, their unprecedented scale comes with significant computational costs. These models, often consisting of billions of parameters, require vast amounts of computational resources for execution. In particular, their expansive scale and computational demands pose considerable challenges when customizing them for particular downstream tasks, especially on hardware platforms constrained by computational capability. Parameter-Efficient Fine-Tuning (PEFT) provides a practical solution by efficiently adapting large models to various downstream tasks. Specifically, PEFT refers to the process of adjusting the parameters of a pre-trained large model to adapt it to a specific task while minimizing the number of additional parameters introduced or computational resources required. This approach is particularly important when dealing with large language models with high parameter counts, as fine-tuning these models from scratch can be computationally expensive and resource-intensive, posing considerable challenges in the design of supporting system platforms. In this survey, we present comprehensive studies of various PEFT algorithms, examining their performance and computational overhead. Moreover, we provide an overview of applications developed using different PEFT algorithms and discuss common techniques employed to mitigate their computation costs. In addition to the algorithmic perspective, we overview various real-world system designs to investigate the implementation costs associated with different PEFT algorithms. This survey serves as an indispensable resource for researchers aiming to understand both PEFT algorithms and their system implementations, offering detailed insights into recent advancements and practical applications.
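As one canonical instance of the family such a survey covers, here is a minimal LoRA-style sketch: the frozen pre-trained weight is augmented with a trainable low-rank update, so only r * (d_in + d_out) parameters are tuned instead of d_in * d_out. Hyperparameters here (r, alpha, init scale) are illustrative defaults.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A linear layer with a frozen base weight plus a low-rank update."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                # freeze pre-trained weights
        # Low-rank factors: B @ A has shape (out_features, in_features).
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # start at zero update
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)
```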
Updated: 2024-04-17 16:23:47
Domains: cs.LG
Landmark Guided Active Exploration with State-specific Balance Coefficient
Goal-conditioned hierarchical reinforcement learning (GCHRL) decomposes long-horizon tasks into sub-tasks through a hierarchical framework and it has demonstrated promising results across a variety of domains. However, the high-level policy's action space is often excessively large, presenting a significant challenge to effective exploration and resulting in potentially inefficient training. In this paper, we design a measure of prospect for sub-goals by planning in the goal space based on the goal-conditioned value function. Building upon the measure of prospect, we propose a landmark-guided exploration strategy by integrating the measures of prospect and novelty which aims to guide the agent to explore efficiently and improve sample efficiency. In order to dynamically consider the impact of prospect and novelty on exploration, we introduce a state-specific balance coefficient to balance the significance of prospect and novelty. The experimental results demonstrate that our proposed exploration strategy significantly outperforms the baseline methods across multiple tasks.
Updated: 2024-04-17 16:19:48
Domains: cs.LG
StructComp: Substituting propagation with Structural Compression in Training Graph Contrastive Learning
Graph contrastive learning (GCL) has become a powerful tool for learning graph data, but its scalability remains a significant challenge. In this work, we propose a simple yet effective training framework called Structural Compression (StructComp) to address this issue. Inspired by a sparse low-rank approximation on the diffusion matrix, StructComp trains the encoder with the compressed nodes. This allows the encoder not to perform any message passing during the training stage, and significantly reduces the number of sample pairs in the contrastive loss. We theoretically prove that the original GCL loss can be approximated with the contrastive loss computed by StructComp. Moreover, StructComp can be regarded as an additional regularization term for GCL models, resulting in a more robust encoder. Empirical studies on various datasets show that StructComp greatly reduces the time and memory consumption while improving model performance compared to the vanilla GCL models and scalable training methods.
Updated: 2024-04-17 16:19:29
Domains: cs.LG
PAC Privacy Preserving Diffusion Models
Data privacy protection is garnering increased attention among researchers. Diffusion models (DMs), particularly those with strict differential privacy, can potentially produce images with both high privacy and visual quality. However, challenges remain, such as ensuring robust protection when privatizing specific data attributes, an area where current models often fall short. To address these challenges, we introduce the PAC Privacy Preserving Diffusion Model, a model that leverages diffusion principles and ensures Probably Approximately Correct (PAC) privacy. We enhance privacy protection by integrating private classifier guidance into the Langevin sampling process. Additionally, recognizing the gap in measuring the privacy of models, we have developed a novel metric to gauge privacy levels. Our model, assessed with this new metric and supported by Gaussian matrix computations for the PAC bound, has shown superior privacy protection over existing leading private generative models according to benchmark tests.
Updated: 2024-04-17 16:18:54
Domains: cs.LG,cs.AI
A Comparison of Traditional and Deep Learning Methods for Parameter Estimation of the Ornstein-Uhlenbeck Process
We consider the Ornstein-Uhlenbeck (OU) process, a stochastic process widely used in finance, physics, and biology. Parameter estimation of the OU process is a challenging problem. Thus, we review traditional tracking methods and compare them with novel applications of deep learning to estimate the parameters of the OU process. We use a multi-layer perceptron to estimate the parameters of the OU process and compare its performance with traditional parameter estimation methods, such as the Kalman filter and maximum likelihood estimation. We find that the multi-layer perceptron can accurately estimate the parameters of the OU process given a large dataset of observed trajectories; however, traditional parameter estimation methods may be more suitable for smaller datasets.
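For concreteness, the OU process $dX_t = \theta(\mu - X_t)\,dt + \sigma\,dW_t$ admits an exact discretization, from which classical estimates follow by an AR(1) regression of $X_{t+\Delta}$ on $X_t$. The sketch below shows both; it is one of the standard baselines the comparison involves, though the paper's exact estimators may differ.

```python
import numpy as np

def simulate_ou(theta, mu, sigma, x0, n, dt):
    """Exact discretization: X_{t+dt} = mu + (X_t - mu) e^{-theta dt} + eps,
    with eps ~ N(0, sigma^2 (1 - e^{-2 theta dt}) / (2 theta))."""
    a = np.exp(-theta * dt)
    noise_sd = sigma * np.sqrt((1 - a**2) / (2 * theta))
    x = np.empty(n)
    x[0] = x0
    for t in range(1, n):
        x[t] = mu + (x[t - 1] - mu) * a + noise_sd * np.random.randn()
    return x

def estimate_ou(x, dt):
    """Closed-form parameter estimates from an AR(1) fit of x[1:] on x[:-1]."""
    a, b = np.polyfit(x[:-1], x[1:], 1)        # slope = e^{-theta dt}, intercept
    theta = -np.log(a) / dt
    mu = b / (1 - a)
    resid = x[1:] - (a * x[:-1] + b)
    sigma = np.std(resid) * np.sqrt(2 * theta / (1 - a**2))
    return theta, mu, sigma
```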
Updated: 2024-04-17 16:16:50
Domains: q-fin.CP,cs.LG
Provable Reward-Agnostic Preference-Based Reinforcement Learning
Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories, rather than explicit reward signals. While PbRL has demonstrated practical success in fine-tuning language models, existing theoretical work focuses on regret minimization and fails to capture most of the practical frameworks. In this study, we fill in such a gap between theoretical PbRL and practical algorithms by proposing a theoretical reward-agnostic PbRL framework where exploratory trajectories that enable accurate learning of hidden reward functions are acquired before collecting any human feedback. Theoretical analysis demonstrates that our algorithm requires less human feedback for learning the optimal policy under preference-based models with linear parameterization and unknown transitions, compared to the existing theoretical literature. Specifically, our framework can incorporate linear and low-rank MDPs with efficient sample complexity. Additionally, we investigate reward-agnostic RL with action-based comparison feedback and introduce an efficient querying algorithm tailored to this scenario.
Updated: 2024-04-17 16:13:54
Domains: cs.LG,cs.AI,math.ST,stat.ML,stat.TH
Embedding Privacy in Computational Social Science and Artificial Intelligence Research
Privacy is a human right. It ensures that individuals are free to engage in discussions, participate in groups, and form relationships online or offline without fear of their data being inappropriately harvested, analyzed, or otherwise used to harm them. Preserving privacy has emerged as a critical factor in research, particularly in the computational social science (CSS), artificial intelligence (AI) and data science domains, given their reliance on individuals' data for novel insights. The increasing use of advanced computational models stands to exacerbate privacy concerns because, if inappropriately used, they can quickly infringe privacy rights and lead to adverse effects for individuals - especially vulnerable groups - and society. We have already witnessed a host of privacy issues emerge with the advent of large language models (LLMs), such as ChatGPT, which further demonstrate the importance of embedding privacy from the start. This article contributes to the field by discussing the role of privacy and the primary issues that researchers working in CSS, AI, data science and related domains are likely to face. It then presents several key considerations for researchers to ensure participant privacy is best preserved in their research design, data collection and use, analysis, and dissemination of research results.
Updated: 2024-04-17 16:07:53
Domains: cs.AI,cs.CY,cs.ET,cs.HC
VC Theory for Inventory Policies
Advances in computational power and AI have increased interest in reinforcement learning approaches to inventory management. This paper provides a theoretical foundation for these approaches and investigates the benefits of restricting to policy structures that are well-established by decades of inventory theory. In particular, we prove generalization guarantees for learning several well-known classes of inventory policies, including base-stock and (s, S) policies, by leveraging the celebrated Vapnik-Chervonenkis (VC) theory. We apply the concepts of the Pseudo-dimension and Fat-shattering dimension from VC theory to determine the generalizability of inventory policies, that is, the difference between an inventory policy's performance on training data and its expected performance on unseen data. We focus on a classical setting without contexts, but allow for an arbitrary distribution over demand sequences and do not make any assumptions such as independence over time. We corroborate our supervised learning results using numerical simulations. Managerially, our theory and simulations translate to the following insights. First, there is a principle of "learning less is more" in inventory management: depending on the amount of data available, it may be beneficial to restrict oneself to a simpler, albeit suboptimal, class of inventory policies to minimize overfitting errors. Second, the number of parameters in a policy class may not be the correct measure of overfitting error: in fact, the class of policies defined by T time-varying base-stock levels exhibits a generalization error comparable to that of the two-parameter (s, S) policy class. Finally, our research suggests situations in which it could be beneficial to incorporate the concepts of base-stock and inventory position into black-box learning machines, instead of having these machines directly learn the order quantity actions.
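For reference, the two policy classes covered by the guarantees are simple to state in code (standard textbook definitions, not taken from the paper):

```python
def base_stock_order(inventory_position: float, S: float) -> float:
    """Base-stock policy: order up to level S whenever below it."""
    return max(0.0, S - inventory_position)

def s_S_order(inventory_position: float, s: float, S: float) -> float:
    """(s, S) policy: if the position falls to s or below, order up to S."""
    return S - inventory_position if inventory_position <= s else 0.0
```

The "learning less is more" insight then amounts to choosing between such low-dimensional classes and richer ones (e.g., T time-varying base-stock levels) based on how much demand data is available.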
Updated: 2024-04-17 16:05:03
Domains: stat.ML,cs.LG
A Text Classification Framework for Simple and Effective Early Depression Detection Over Social Media Streams
With the rise of the Internet, there is a growing need to build intelligent systems that are capable of efficiently dealing with early risk detection (ERD) problems on social media, such as early depression detection, early rumor detection or identification of sexual predators. These systems, nowadays mostly based on machine learning techniques, must be able to deal with data streams since users provide their data over time. In addition, these systems must be able to decide when the processed data is sufficient to actually classify users. Moreover, since ERD tasks involve risky decisions by which people's lives could be affected, such systems must also be able to justify their decisions. However, most standard and state-of-the-art supervised machine learning models are not well suited to deal with this scenario. This is due to the fact that they either act as black boxes or do not support incremental classification/learning. In this paper we introduce SS3, a novel supervised learning model for text classification that naturally supports these aspects. SS3 was designed to be used as a general framework to deal with ERD problems. We evaluated our model on the CLEF's eRisk2017 pilot task on early depression detection. Most of the 30 contributions submitted to this competition used state-of-the-art methods. Experimental results show that our classifier was able to outperform these models and standard classifiers, despite being less computationally expensive and having the ability to explain its rationale.
Updated: 2024-04-17 15:58:18
Domains: cs.CY,cs.CL,cs.IR,cs.LG,cs.SI
Towards Coarse-to-Fine Evaluation of Inference Efficiency for Large Language Models
In real world, large language models (LLMs) can serve as the assistant to help users accomplish their jobs, and also support the development of advanced applications. For the wide application of LLMs, the inference efficiency is an essential concern, which has been widely studied in existing work, and numerous optimization algorithms and code libraries have been proposed to improve it. Nonetheless, users still find it challenging to compare the effectiveness of all the above methods and understand the underlying mechanisms. In this work, we perform a detailed coarse-to-fine analysis of the inference performance of various code libraries. To evaluate the overall effectiveness, we examine four usage scenarios within two practical applications. We further provide both theoretical and empirical fine-grained analyses of each module in the Transformer architecture. Our experiments yield comprehensive results that are invaluable for researchers to evaluate code libraries and improve inference strategies.
Updated: 2024-04-17 15:57:50
Categories: cs.CL,cs.AI
Paraphrase and Solve: Exploring and Exploiting the Impact of Surface Form on Mathematical Reasoning in Large Language Models
This paper studies the relationship between the surface form of a mathematical problem and its solvability by large language models. We find that subtle alterations in the surface form can significantly impact the answer distribution and the solve rate, exposing the language model's lack of robustness and sensitivity to the surface form in reasoning through complex problems. To improve mathematical reasoning performance, we propose Self-Consistency-over-Paraphrases (SCoP), which diversifies reasoning paths from specific surface forms of the problem. We evaluate our approach on four mathematics reasoning benchmarks over three large language models and show that SCoP improves mathematical reasoning performance over vanilla self-consistency, particularly for problems initially deemed unsolvable. Finally, we provide additional experiments and discussion regarding problem difficulty and surface forms, including cross-model difficulty agreement and paraphrasing transferability, and Variance of Variations (VOV) for language model evaluation.
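The voting idea behind self-consistency over paraphrases can be sketched in a few lines of Python; llm_answer and paraphrase below are assumed stand-ins for model calls, not the authors' code:

from collections import Counter

def scop_answer(problem, llm_answer, paraphrase, n_paraphrases=4, n_samples=8):
    answers = []
    variants = [problem] + [paraphrase(problem) for _ in range(n_paraphrases)]
    for variant in variants:                 # diversify surface forms
        for _ in range(n_samples):           # sample reasoning paths per form
            answers.append(llm_answer(variant))
    return Counter(answers).most_common(1)[0][0]   # self-consistency vote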
Updated: 2024-04-17 15:53:49
Categories: cs.CL,cs.AI
A Data-Driven Representation for Sign Language Production
Phonetic representations are used when recording spoken languages, but no equivalent exists for recording signed languages. As a result, linguists have proposed several annotation systems that operate on the gloss or sub-unit level; however, these resources are notably irregular and scarce. Sign Language Production (SLP) aims to automatically translate spoken language sentences into continuous sequences of sign language. However, current state-of-the-art approaches rely on scarce linguistic resources to work, which has limited progress in the field. This paper introduces an innovative solution by transforming the continuous pose generation problem into a discrete sequence generation problem, thus overcoming the need for costly annotation; where such annotation is available, we leverage it to enhance our approach. By applying Vector Quantisation (VQ) to sign language data, we first learn a codebook of short motions that can be combined to create a natural sequence of sign, where each token in the codebook can be thought of as the lexicon of our representation. Then, using a transformer, we perform a translation from spoken language text to a sequence of codebook tokens. Each token can be directly mapped to a sequence of poses, allowing the translation to be performed by a single network. Furthermore, we present a sign stitching method to effectively join tokens together. We evaluate on the RWTH-PHOENIX-Weather-2014T (PHOENIX14T) and the more challenging Meine DGS Annotated (mDGS) datasets. An extensive evaluation shows our approach outperforms previous methods, increasing the BLEU-1 back translation score by up to 72%.
Updated: 2024-04-17 15:52:38
Categories: cs.CL,cs.AI
Runtime Analysis of Evolutionary Diversity Optimization on the Multi-objective (LeadingOnes, TrailingZeros) Problem
Diversity optimization is a class of optimization problems in which we aim to find a diverse set of good solutions. One frequently used approach to solving such problems is to use evolutionary algorithms that evolve a desired diverse population. This approach is called evolutionary diversity optimization (EDO). In this paper, we analyze EDO on a 3-objective function LOTZ$_k$, which is a modification of the 2-objective benchmark function (LeadingOnes, TrailingZeros). We prove that the GSEMO computes a set of all Pareto-optimal solutions in $O(kn^3)$ expected iterations. We also analyze the runtime of the GSEMO$_D$ (a modification of the GSEMO for diversity optimization) until it finds a population with the best possible diversity for two different diversity measures, the total imbalance and the sorted imbalances vector. For the first measure we show that the GSEMO$_D$ optimizes it asymptotically faster than it finds a Pareto-optimal population, in $O(kn^2\log(n))$ expected iterations, and for the second measure we show an upper bound of $O(k^2n^3\log(n))$ expected iterations. We complement our theoretical analysis with an empirical study, which shows very similar behavior for both diversity measures that is close to the theoretical predictions.
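For intuition, the underlying 2-objective benchmark can be computed as follows (a plain Python sketch of the classic definition; the paper's 3-objective LOTZ$_k$ modification is not reproduced here):

def leading_ones(x):            # length of the prefix of ones
    n = 0
    for bit in x:
        if bit != 1:
            break
        n += 1
    return n

def trailing_zeros(x):          # length of the suffix of zeros
    n = 0
    for bit in reversed(x):
        if bit != 0:
            break
        n += 1
    return n

def lotz(x):
    return leading_ones(x), trailing_zeros(x)   # both to be maximized

assert lotz([1, 1, 0, 1, 0, 0]) == (2, 2)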
Updated: 2024-04-17 15:51:15
Categories: cs.NE,cs.AI
arcjetCV: an open-source software to analyze material ablation
arcjetCV is an open-source Python software designed to automate time-resolved measurements of heatshield material recession and recession rates from arcjet test video footage. This new automated and accessible capability greatly exceeds previous manual extraction methods, enabling rapid and detailed characterization of material recession for any sample with a profile video. arcjetCV automates the video segmentation process using machine learning models, including a one-dimensional (1D) Convolutional Neural Network (CNN) to infer the time-window of interest, a two-dimensional (2D) CNN for image and edge segmentation, and a Local Outlier Factor (LOF) for outlier filtering. A graphical user interface (GUI) simplifies the user experience and an application programming interface (API) allows users to call the core functions from scripts, enabling video batch processing. arcjetCV's capability to measure time-resolved recession in turn enables characterization of non-linear processes (shrinkage, swelling, melt flows, etc.), contributing to higher fidelity validation and improved modeling of heatshield material performance. The source code associated with this article can be found at https://github.com/magnus-haw/arcjetCV.
Updated: 2024-04-17 15:47:26
Categories: cs.CV,cs.AI,cs.LG
Multi-resolution Rescored ByteTrack for Video Object Detection on Ultra-low-power Embedded Systems
This paper introduces Multi-Resolution Rescored Byte-Track (MR2-ByteTrack), a novel video object detection framework for ultra-low-power embedded processors. This method reduces the average compute load of an off-the-shelf Deep Neural Network (DNN) based object detector by up to 2.25$\times$ by alternating the processing of high-resolution images (320$\times$320 pixels) with multiple down-sized frames (192$\times$192 pixels). To tackle the accuracy degradation due to the reduced image input size, MR2-ByteTrack correlates the output detections over time using the ByteTrack tracker and corrects potential misclassification using a novel probabilistic Rescore algorithm. By interleaving two down-sized images for every high-resolution one as the input of different state-of-the-art DNN object detectors with our MR2-ByteTrack, we demonstrate an average accuracy increase of 2.16% and a latency reduction of 43% on the GAP9 microcontroller compared to a baseline frame-by-frame inference scheme using exclusively full-resolution images. Code available at: https://github.com/Bomps4/Multi_Resolution_Rescored_ByteTrack
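A minimal sketch of the interleaving schedule (our illustration with assumed helper names detect and tracker and a placeholder resize; not the released code):

import itertools

def resolution_schedule():
    # yields 320, 192, 192, 320, 192, 192, ...
    return itertools.cycle([320, 192, 192])

def resize(frame, size):
    # placeholder: stands in for an actual image resize (e.g., OpenCV)
    return frame

def run(frames, detect, tracker):
    for frame, size in zip(frames, resolution_schedule()):
        dets = detect(resize(frame, size), input_size=size)
        yield tracker.update(dets)   # association over time rescored low-res outputs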
Updated: 2024-04-17 15:45:49
Categories: cs.CV,cs.AI,I.4
Visibility into AI Agents
Increased delegation of commercial, scientific, governmental, and personal activities to AI agents -- systems capable of pursuing complex goals with limited supervision -- may exacerbate existing societal risks and introduce new risks. Understanding and mitigating these risks involves critically evaluating existing governance structures, revising and adapting these structures where needed, and ensuring accountability of key stakeholders. Information about where, why, how, and by whom certain AI agents are used, which we refer to as visibility, is critical to these objectives. In this paper, we assess three categories of measures to increase visibility into AI agents: agent identifiers, real-time monitoring, and activity logging. For each, we outline potential implementations that vary in intrusiveness and informativeness. We analyze how the measures apply across a spectrum of centralized through decentralized deployment contexts, accounting for various actors in the supply chain including hardware and software service providers. Finally, we discuss the implications of our measures for privacy and concentration of power. Further work into understanding the measures and mitigating their negative impacts can help to build a foundation for the governance of AI agents.
Updated: 2024-04-17 15:45:13
Categories: cs.CY,cs.AI
From Zero to Hero: Detecting Leaked Data through Synthetic Data Injection and Model Querying
Safeguarding the Intellectual Property (IP) of data has become critically important as machine learning applications continue to proliferate, and their success heavily relies on the quality of training data. While various mechanisms exist to secure data during storage, transmission, and consumption, fewer studies have been developed to detect whether they are already leaked for model training without authorization. This issue is particularly challenging due to the absence of information and control over the training process conducted by potential attackers. In this paper, we concentrate on the domain of tabular data and introduce a novel methodology, Local Distribution Shifting Synthesis (LDSS), to detect leaked data that are used to train classification models. The core concept behind LDSS involves injecting a small volume of synthetic data--characterized by local shifts in class distribution--into the owner's dataset. This enables the effective identification of models trained on leaked data through model querying alone, as the synthetic data injection results in a pronounced disparity in the predictions of models trained on leaked and modified datasets. LDSS is model-oblivious and hence compatible with a diverse range of classification models. We have conducted extensive experiments on seven types of classification models across five real-world datasets. The comprehensive results affirm the reliability, robustness, fidelity, security, and efficiency of LDSS. Extending LDSS to regression tasks further highlights its versatility and efficacy compared with baseline methods.
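A toy sketch of the injection step, under our own simplifying assumptions (not the authors' implementation): around a few seed rows, synthesize nearby points with flipped labels, creating a local class-distribution shift that a model trained on the leaked, modified data will reproduce under querying.

import numpy as np

def inject_local_shift(X, y, n_seeds=5, n_synth_per_seed=20, radius=0.05, seed=0):
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    seeds = rng.choice(len(X), size=n_seeds, replace=False)
    X_synth, y_synth = [], []
    for i in seeds:
        noise = rng.normal(scale=radius, size=(n_synth_per_seed, X.shape[1]))
        X_synth.append(X[i] + noise)                  # stay in a local neighborhood
        flipped = classes[classes != y[i]]
        y_synth.append(rng.choice(flipped, size=n_synth_per_seed))
    return np.vstack([X, *X_synth]), np.concatenate([y, *y_synth])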
Updated: 2024-04-17 15:45:11
Categories: cs.LG,cs.DB
AgentKit: Flow Engineering with Graphs, not Coding
We propose an intuitive LLM prompting framework (AgentKit) for multifunctional agents. AgentKit offers a unified framework for explicitly constructing a complex "thought process" from simple natural language prompts. The basic building block in AgentKit is a node, containing a natural language prompt for a specific subtask. The user then puts together chains of nodes, like stacking LEGO pieces. The chains of nodes can be designed to explicitly enforce a naturally structured "thought process". For example, for the task of writing a paper, one may start with the thought process of 1) identify a core message, 2) identify prior research gaps, etc. The nodes in AgentKit can be designed and combined in different ways to implement multiple advanced capabilities including on-the-fly hierarchical planning, reflection, and learning from interactions. In addition, due to the modular nature and the intuitive design to simulate explicit human thought process, a basic agent could be implemented as simple as a list of prompts for the subtasks and therefore could be designed and tuned by someone without any programming experience. Quantitatively, we show that agents designed through AgentKit achieve SOTA performance on WebShop and Crafter. These advances underscore AgentKit's potential in making LLM agents effective and accessible for a wider range of applications. https://github.com/holmeswww/AgentKit
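A minimal sketch of the node-and-chain abstraction described above (assumed class and function names; see the repository for the actual interface):

class Node:
    def __init__(self, prompt):
        self.prompt = prompt                 # natural-language subtask prompt

def run_chain(nodes, llm, context=""):
    for node in nodes:                       # stack nodes like LEGO pieces
        context = llm(f"{context}\n{node.prompt}")
    return context

paper_chain = [Node("Identify the core message."),
               Node("Identify gaps in prior research."),
               Node("Draft an outline grounded in the answers above.")]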
Updated: 2024-04-17 15:40:45
Categories: cs.AI,cs.LG
Discovering Nuclear Models from Symbolic Machine Learning
Numerous phenomenological nuclear models have been proposed to describe specific observables within different regions of the nuclear chart. However, developing a unified model that describes the complex behavior of all nuclei remains an open challenge. Here, we explore whether novel symbolic Machine Learning (ML) can rediscover traditional nuclear physics models or identify alternatives with improved simplicity, fidelity, and predictive power. To address this challenge, we developed a Multi-objective Iterated Symbolic Regression approach that handles symbolic regressions over multiple target observables, accounts for experimental uncertainties and is robust against high-dimensional problems. As a proof of principle, we applied this method to describe the nuclear binding energies and charge radii of light and medium mass nuclei. Our approach identified simple analytical relationships based on the number of protons and neutrons, providing interpretable models with precision comparable to state-of-the-art nuclear models. Additionally, we integrated this ML-discovered model with an existing complementary model to estimate the limits of nuclear stability. These results highlight the potential of symbolic ML to develop accurate nuclear models and guide our description of complex many-body problems.
Updated: 2024-04-17 15:32:58
Categories: nucl-th,cs.AI,cs.LG,nucl-ex
Taxonomy to Regulation: A (Geo)Political Taxonomy for AI Risks and Regulatory Measures in the EU AI Act
Technological innovations have shown remarkable capabilities to benefit and harm society alike. AI constitutes a democratized sophisticated technology accessible to large parts of society, including malicious actors. This work proposes a taxonomy focusing on (geo)political risks associated with AI. It identifies 12 risks in total, divided into four categories: (1) Geopolitical Pressures, (2) Malicious Usage, (3) Environmental, Social, and Ethical Risks, and (4) Privacy and Trust Violations. Incorporating a regulatory perspective, this paper conducts a policy assessment of the EU AI Act. Adopted in March 2024, the landmark regulation has the potential to have a positive top-down impact concerning AI risk reduction, but needs regulatory adjustments to mitigate risks more comprehensively. Regulatory exceptions for open-source models, excessively high parameters for the classification of GPAI models as a systemic risk, and the exclusion of systems designed exclusively for military purposes from the regulation's obligations leave room for future action.
Updated: 2024-04-17 15:32:56
Categories: cs.AI,cs.CY,cs.LG,68T01,I.2.0
AdaIR: Exploiting Underlying Similarities of Image Restoration Tasks with Adapters
Existing image restoration approaches typically employ extensive networks specifically trained for designated degradations. Despite being effective, such methods inevitably entail considerable storage costs and computational overheads due to the reliance on task-specific networks. In this work, we go beyond this well-established framework and exploit the inherent commonalities among image restoration tasks. The primary objective is to identify components that are shareable across restoration tasks and augment the shared components with modules specifically trained for individual tasks. Towards this goal, we propose AdaIR, a novel framework that enables low storage cost and efficient training without sacrificing performance. Specifically, a generic restoration network is first constructed through self-supervised pre-training using synthetic degradations. Subsequent to the pre-training phase, adapters are trained to adapt the pre-trained network to specific degradations. AdaIR requires solely the training of lightweight, task-specific modules, ensuring a more efficient storage and training regimen. We have conducted extensive experiments to validate the effectiveness of AdaIR and analyze the influence of the pre-training strategy on discovering shareable components. Extensive experimental results show that AdaIR achieves outstanding results on multi-task restoration while utilizing significantly fewer parameters (1.9 MB) and less training time (7 hours) for each restoration task. The source codes and trained models will be released.
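As a hedged illustration of the general adapter idea (not AdaIR's exact architecture), a lightweight PyTorch bottleneck module added to a frozen shared backbone and trained per degradation type might look like:

import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, channels, bottleneck=8):
        super().__init__()
        self.down = nn.Conv2d(channels, bottleneck, kernel_size=1)
        self.up = nn.Conv2d(bottleneck, channels, kernel_size=1)
        self.act = nn.GELU()

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))   # residual adaptation

# Only the adapter parameters are trained; the shared backbone stays frozen:
# for p in backbone.parameters(): p.requires_grad_(False)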
Updated: 2024-04-17 15:31:06
Categories: cs.CV,cs.AI
Assessing The Effectiveness Of Current Cybersecurity Regulations And Policies In The US
This article assesses the effectiveness of current cybersecurity regulations and policies in the United States amidst the escalating frequency and sophistication of cyber threats. The focus is on the comprehensive framework established by the U.S. government, with a spotlight on the National Institute of Standards and Technology (NIST) Cybersecurity Framework and key regulations such as HIPAA, GLBA, FISMA, CISA, CCPA, and the DOD Cybersecurity Maturity Model Certification. The study evaluates the impact of these regulations on different sectors and analyzes trends in cybercrime data from 2000 to 2022. The findings highlight the challenges, successes, and the need for continuous adaptation in the face of evolving cyber threats.
Updated: 2024-04-17 15:26:55
Categories: cs.CR,cs.CY
Solving morphological analogies: from retrieval to generation
Analogical inference is a remarkable capability of human reasoning, and has been used to solve hard reasoning tasks. Analogy based reasoning (AR) has gained increasing interest from the artificial intelligence community and has shown its potential in multiple machine learning tasks such as classification, decision making and recommendation with competitive results. We propose a deep learning (DL) framework to address and tackle two key tasks in AR: analogy detection and solving. The framework is thoroughly tested on the Siganalogies dataset of morphological analogical proportions (APs) between words, and shown to outperform symbolic approaches in many languages. Previous work has explored the behavior of the Analogy Neural Network for classification (ANNc) on analogy detection and of the Analogy Neural Network for retrieval (ANNr) on analogy solving by retrieval, as well as the potential of an autoencoder (AE) for analogy solving by generating the solution word. In this article we summarize these findings and extend them by combining ANNr and the AE embedding model, and by checking the performance of ANNc as a retrieval method. The combination of ANNr and AE outperforms the other approaches in almost all cases, and ANNc as a retrieval method achieves competitive or better performance than 3CosMul. We conclude with general guidelines on using our framework to tackle APs with DL.
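For reference, the 3CosMul baseline mentioned above (Levy and Goldberg, 2014) can be sketched as follows over L2-normalized embeddings, with cosines shifted to be non-negative as in the original formulation:

import numpy as np

def three_cos_mul(E, a, b, c, eps=1e-3):
    # E: (vocab, dim) L2-normalized embeddings; solve the analogy a : b :: c : ?
    cos = (E @ np.stack([E[a], E[b], E[c]]).T + 1.0) / 2.0  # shift to [0, 1]
    scores = cos[:, 1] * cos[:, 2] / (cos[:, 0] + eps)
    scores[[a, b, c]] = -np.inf      # exclude the query words themselves
    return int(np.argmax(scores))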
Updated: 2024-04-17 15:23:12
Categories: cs.CL,cs.AI,cs.LG
A Federated Learning Approach to Privacy Preserving Offensive Language Identification
The spread of various forms of offensive speech online is an important concern in social media. While platforms have been investing heavily in ways of coping with this problem, the question of privacy remains largely unaddressed. Models trained to detect offensive language on social media are trained and/or fine-tuned using large amounts of data often stored in centralized servers. Since most social media data originates from end users, we propose a privacy preserving decentralized architecture for identifying offensive language online by introducing Federated Learning (FL) in the context of offensive language identification. FL is a decentralized architecture that allows multiple models to be trained locally without the need for data sharing hence preserving users' privacy. We propose a model fusion approach to perform FL. We trained multiple deep learning models on four publicly available English benchmark datasets (AHSD, HASOC, HateXplain, OLID) and evaluated their performance in detail. We also present initial cross-lingual experiments in English and Spanish. We show that the proposed model fusion approach outperforms baselines in all the datasets while preserving privacy.
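A minimal sketch of a FedAvg-style fusion step (our assumption of the fusion rule; the paper's model fusion approach may differ):

import copy

def fuse(global_model, client_models, weights):
    # weights: per-client importance (e.g., data share), summing to 1
    fused = copy.deepcopy(global_model)
    state = fused.state_dict()
    for key in state:
        state[key] = sum(w * m.state_dict()[key].float()
                         for w, m in zip(weights, client_models))
    fused.load_state_dict(state)
    return fused   # raw user posts never leave the clients; only weights do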
Updated: 2024-04-17 15:23:12
Categories: cs.CL,cs.LG
Interpreting and generalizing deep learning in physics-based problems with functional linear models
Although deep learning has achieved remarkable success in various scientific machine learning applications, its opaque nature poses concerns regarding interpretability and generalization capabilities beyond the training data. Interpretability is crucial and often desired in modeling physical systems. Moreover, acquiring extensive datasets that encompass the entire range of input features is challenging in many physics-based learning tasks, leading to increased errors when encountering out-of-distribution (OOD) data. In this work, motivated by the field of functional data analysis (FDA), we propose generalized functional linear models as an interpretable surrogate for a trained deep learning model. We demonstrate that our model could be trained either based on a trained neural network (post-hoc interpretation) or directly from training data (interpretable operator learning). A library of generalized functional linear models with different kernel functions is considered and sparse regression is used to discover an interpretable surrogate model that could be analytically presented. We present test cases in solid mechanics, fluid mechanics, and transport. Our results demonstrate that our model can achieve comparable accuracy to deep learning and can improve OOD generalization while providing more transparency and interpretability. Our study underscores the significance of interpretable representation in scientific machine learning and showcases the potential of functional linear models as a tool for interpreting and generalizing deep learning.
Updated: 2024-04-17 15:16:07
Categories: cs.LG,physics.flu-dyn
A Large-scale Fine-grained Analysis of Packages in Open-Source Software Ecosystems
Package managers such as NPM, Maven, and PyPI play a pivotal role in open-source software (OSS) ecosystems, streamlining the distribution and management of various freely available packages. The fine-grained details within software packages can unveil potential risks within existing OSS ecosystems, offering valuable insights for detecting malicious packages. In this study, we undertake a large-scale empirical analysis focusing on fine-grained information (FGI): the metadata, static, and dynamic functions. Specifically, we investigate the FGI usage across a diverse set of 50,000+ legitimate and 1,000+ malicious packages. Based on this diverse data collection, we conducted a comparative analysis between legitimate and malicious packages. Our findings reveal that (1) malicious packages have less metadata content and utilize fewer static and dynamic functions than legitimate ones; (2) malicious packages demonstrate a higher tendency to invoke HTTP/URL functions as opposed to other application services, such as FTP or SMTP; (3) FGI serves as a distinguishable indicator between legitimate and malicious packages; and (4) one dimension in FGI has sufficient distinguishable capability to detect malicious packages, and combining all dimensions in FGI cannot significantly improve overall performance.
Updated: 2024-04-17 15:16:01
Categories: cs.SE,cs.CR
Predictive representations: building blocks of intelligence
Adaptive behavior often requires predicting future events. The theory of reinforcement learning prescribes what kinds of predictive representations are useful and how to compute them. This paper integrates these theoretical ideas with work on cognition and neuroscience. We pay special attention to the successor representation (SR) and its generalizations, which have been widely applied both as engineering tools and models of brain function. This convergence suggests that particular kinds of predictive representations may function as versatile building blocks of intelligence.
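For concreteness, the tabular successor representation and its standard TD(0) update: M(s, s') estimates the expected discounted future occupancy of s' when starting from s.

import numpy as np

def sr_td_update(M, s, s_next, gamma=0.95, lr=0.1):
    onehot = np.eye(M.shape[0])[s]
    td_error = onehot + gamma * M[s_next] - M[s]
    M[s] += lr * td_error
    return M

M = np.eye(4)                       # common initialization: M = I
M = sr_td_update(M, s=0, s_next=1)  # after observing the transition 0 -> 1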
Updated: 2024-04-17 15:14:30
Categories: cs.AI,cs.LG
Leave No One Behind: Online Self-Supervised Self-Distillation for Sequential Recommendation
Sequential recommendation methods play a pivotal role in modern recommendation systems. A key challenge lies in accurately modeling user preferences in the face of data sparsity. To tackle this challenge, recent methods leverage contrastive learning (CL) to derive self-supervision signals by maximizing the mutual information of two augmented views of the original user behavior sequence. Despite their effectiveness, CL-based methods encounter a limitation in fully exploiting self-supervision signals for users with limited behavior data, as users with extensive behaviors naturally offer more information. To address this problem, we introduce a novel learning paradigm, named Online Self-Supervised Self-distillation for Sequential Recommendation ($S^4$Rec), effectively bridging the gap between self-supervised learning and self-distillation methods. Specifically, we employ online clustering to proficiently group users by their distinct latent intents. Additionally, an adversarial learning strategy is utilized to ensure that the clustering procedure is not affected by the behavior length factor. Subsequently, we employ self-distillation to facilitate the transfer of knowledge from users with extensive behaviors (teachers) to users with limited behaviors (students). Experiments conducted on four real-world datasets validate the effectiveness of the proposed method.
Updated: 2024-04-17 15:10:32
Categories: cs.IR,cs.LG
Using Game Engines and Machine Learning to Create Synthetic Satellite Imagery for a Tabletop Verification Exercise
Satellite imagery is regarded as a great opportunity for citizen-based monitoring of activities of interest. Relevant imagery may however not be available at sufficiently high resolution, quality, or cadence -- let alone be uniformly accessible to open-source analysts. This limits an assessment of the true long-term potential of citizen-based monitoring of nuclear activities using publicly available satellite imagery. In this article, we demonstrate how modern game engines combined with advanced machine-learning techniques can be used to generate synthetic imagery of sites of interest with the ability to choose relevant parameters upon request; these include time of day, cloud cover, season, or level of activity onsite. At the same time, resolution and off-nadir angle can be adjusted to simulate different characteristics of the satellite. While there are several possible use-cases for synthetic imagery, here we focus on its usefulness to support tabletop exercises in which simple monitoring scenarios can be examined to better understand verification capabilities enabled by new satellite constellations and very short revisit times.
Updated: 2024-04-17 15:09:31
Categories: cs.CV,cs.AI,cs.HC,cs.LG
Learn to Tour: Operator Design For Solution Feasibility Mapping in Pickup-and-delivery Traveling Salesman Problem
This paper aims to develop a learning method for a special class of traveling salesman problems (TSP), namely, the pickup-and-delivery TSP (PDTSP), which finds the shortest tour along a sequence of one-to-one pickup-and-delivery nodes. One-to-one here means that the transported people or goods are associated with designated pairs of pickup and delivery nodes, in contrast to that indistinguishable goods can be delivered to any nodes. In PDTSP, precedence constraints need to be satisfied that each pickup node must be visited before its corresponding delivery node. Classic operations research (OR) algorithms for PDTSP are difficult to scale to large-sized problems. Recently, reinforcement learning (RL) has been applied to TSPs. The basic idea is to explore and evaluate visiting sequences in a solution space. However, this approach could be less computationally efficient, as it has to potentially evaluate many infeasible solutions of which precedence constraints are violated. To restrict solution search within a feasible space, we utilize operators that always map one feasible solution to another, without spending time exploring the infeasible solution space. Such operators are evaluated and selected as policies to solve PDTSPs in an RL framework. We make a comparison of our method and baselines, including classic OR algorithms and existing learning methods. Results show that our approach can find tours shorter than baselines.
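As an illustrative example of a feasibility-preserving operator of the kind described (our own variant, not necessarily one used in the paper), relocating a pickup-delivery pair so that the pickup always stays before its delivery:

import random

def relocate_pair(tour, pickup, delivery):
    # tour: list of node ids containing both pickup and delivery
    rest = [v for v in tour if v not in (pickup, delivery)]
    i = random.randrange(len(rest) + 1)            # new pickup slot
    j = random.randrange(i, len(rest) + 1)         # delivery slot at/after pickup
    rest.insert(i, pickup)
    rest.insert(j + 1, delivery)                   # +1 accounts for the inserted pickup
    return rest                                    # precedence holds by construction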
Updated: 2024-04-17 15:05:51
Categories: cs.AI
Unifying Bias and Unfairness in Information Retrieval: A Survey of Challenges and Opportunities with Large Language Models
With the rapid advancement of large language models (LLMs), information retrieval (IR) systems, such as search engines and recommender systems, have undergone a significant paradigm shift. This evolution, while heralding new opportunities, introduces emerging challenges, particularly in terms of biases and unfairness, which may threaten the information ecosystem. In this paper, we present a comprehensive survey of existing works on emerging and pressing bias and unfairness issues in IR systems when integrating LLMs. We first unify bias and unfairness issues as distribution mismatch problems, providing a groundwork for categorizing various mitigation strategies through distribution alignment. Subsequently, we systematically delve into the specific bias and unfairness issues arising from three critical stages of LLM integration into IR systems: data collection, model development, and result evaluation. In doing so, we meticulously review and analyze recent literature, focusing on the definitions, characteristics, and corresponding mitigation strategies associated with these issues. Finally, we identify and highlight some open problems and challenges for future work, aiming to inspire researchers and stakeholders in the IR field and beyond to better understand and mitigate bias and unfairness issues of IR in this LLM era. We also consistently maintain a GitHub repository for the relevant papers and resources in this rising direction at https://github.com/KID-22/LLM-IR-Bias-Fairness-Survey.
Updated: 2024-04-17 15:05:03
Categories: cs.IR,cs.AI,cs.CL
Reformatted Alignment
The quality of finetuning data is crucial for aligning large language models (LLMs) with human values. Current methods to improve data quality are either labor-intensive or prone to factual errors caused by LLM hallucinations. This paper explores elevating the quality of existing instruction data to better align with human values, introducing a simple and effective approach named ReAlign, which reformats the responses of instruction data into a format that better aligns with pre-established criteria and the collated evidence. This approach minimizes human annotation, hallucination, and the difficulty in scaling, remaining orthogonal to existing alignment techniques. Experimentally, ReAlign significantly boosts the general alignment ability, math reasoning, factuality, and readability of the LLMs. Encouragingly, without introducing any additional data or advanced training techniques, and merely by reformatting the response, LLaMA-2-13B's mathematical reasoning ability on GSM8K can be improved from 46.77% to 56.63% in accuracy. Additionally, a mere 5% of ReAlign data yields a 67% boost in general alignment ability measured by the Alpaca dataset. This work highlights the need for further research into the science and mechanistic interpretability of LLMs. We have made the associated code and data publicly accessible to support future studies at https://github.com/GAIR-NLP/ReAlign.
Updated: 2024-04-17 15:03:19
Categories: cs.CL,cs.AI,cs.LG
GOLF: Goal-Oriented Long-term liFe tasks supported by human-AI collaboration
The advent of ChatGPT and similar large language models (LLMs) has revolutionized the human-AI interaction and information-seeking process. Leveraging LLMs as an alternative to search engines, users can now access summarized information tailored to their queries, significantly reducing the cognitive load associated with navigating vast information resources. This shift underscores the potential of LLMs in redefining information access paradigms. Drawing on the foundation of task-focused information retrieval and LLMs' task planning ability, this research extends the scope of LLM capabilities beyond routine task automation to support users in navigating long-term and significant life tasks. It introduces the GOLF framework (Goal-Oriented Long-term liFe tasks), which focuses on enhancing LLMs' ability to assist in significant life decisions through goal orientation and long-term planning. The methodology encompasses a comprehensive simulation study to test the framework's efficacy, followed by model and human evaluations to develop a dataset benchmark for long-term life tasks, and experiments across different models and settings. By shifting the focus from short-term tasks to the broader spectrum of long-term life goals, this research underscores the transformative potential of LLMs in enhancing human decision-making processes and task management, marking a significant step forward in the evolution of human-AI collaboration.
Updated: 2024-04-17 15:00:58
Categories: cs.HC,cs.AI,cs.IR
ECoDepth: Effective Conditioning of Diffusion Models for Monocular Depth Estimation
In the absence of parallax cues, a learning-based single image depth estimation (SIDE) model relies heavily on shading and contextual cues in the image. While this simplicity is attractive, it is necessary to train such models on large and varied datasets, which are difficult to capture. It has been shown that using embeddings from pre-trained foundational models, such as CLIP, improves zero shot transfer in several applications. Taking inspiration from this, in our paper we explore the use of global image priors generated from a pre-trained ViT model to provide more detailed contextual information. We argue that the embedding vector from a ViT model, pre-trained on a large dataset, captures greater relevant information for SIDE than the usual route of generating pseudo image captions, followed by CLIP based text embeddings. Based on this idea, we propose a new SIDE model using a diffusion backbone which is conditioned on ViT embeddings. Our proposed design establishes a new state-of-the-art (SOTA) for SIDE on NYUv2 dataset, achieving Abs Rel error of 0.059 (14% improvement) compared to 0.069 by the current SOTA (VPD). And on KITTI dataset, achieving Sq Rel error of 0.139 (2% improvement) compared to 0.142 by the current SOTA (GEDepth). For zero-shot transfer with a model trained on NYUv2, we report mean relative improvement of (20%, 23%, 81%, 25%) over NeWCRFs on (Sun-RGBD, iBims1, DIODE, HyperSim) datasets, compared to (16%, 18%, 45%, 9%) by ZoeDepth. The project page is available at https://ecodepth-iitd.github.io
Updated: 2024-04-17 14:59:51
Categories: cs.CV,cs.AI,cs.LG
Real-Time Trajectory Synthesis with Local Differential Privacy
Trajectory streams are being generated from location-aware devices, such as smartphones and in-vehicle navigation systems. Due to the sensitive nature of the location data, directly sharing user trajectories suffers from privacy leakage issues. Local differential privacy (LDP), which perturbs sensitive data on the user side before it is shared or analyzed, emerges as a promising solution for private trajectory stream collection and analysis. Unfortunately, existing stream release approaches often neglect the rich spatial-temporal context information within trajectory streams, resulting in suboptimal utility and limited types of downstream applications. To this end, we propose RetraSyn, a novel real-time trajectory synthesis framework, which is able to perform on-the-fly trajectory synthesis based on the mobility patterns privately extracted from users' trajectory streams. Thus, the downstream trajectory analysis can be performed on the high-utility synthesized data with privacy protection. We also take the genuine behaviors of real-world mobile travelers into consideration, ensuring authenticity and practicality. The key components of RetraSyn include the global mobility model, dynamic mobility update mechanism, real-time synthesis, and adaptive allocation strategy. We conduct extensive experiments on multiple real-world and synthetic trajectory datasets under various location-based utility metrics, encompassing both streaming and historical scenarios. The empirical results demonstrate the superiority and versatility of our proposed framework.
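For background, the classic k-ary randomized response mechanism over discretized map cells, a common LDP building block (RetraSyn's actual mechanism is richer than this sketch):

import math, random

def k_rr(true_cell, k, epsilon):
    # Report the true grid cell with probability e^eps / (e^eps + k - 1),
    # otherwise a uniformly random other cell; this satisfies eps-LDP.
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if random.random() < p:
        return true_cell
    return random.choice([c for c in range(k) if c != true_cell])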
Updated: 2024-04-17 14:55:49
Categories: cs.DB,cs.CR
AI-Enhanced Cognitive Behavioral Therapy: Deep Learning and Large Language Models for Extracting Cognitive Pathways from Social Media Texts
Cognitive Behavioral Therapy (CBT) is an effective technique for addressing the irrational thoughts stemming from mental illnesses, but it necessitates precise identification of cognitive pathways to be successfully implemented in patient care. In current society, individuals frequently express negative emotions on social media on specific topics, often exhibiting cognitive distortions, including suicidal behaviors in extreme cases. Yet, there is a notable absence of methodologies for analyzing cognitive pathways that could aid psychotherapists in conducting effective interventions online. In this study, we gathered data from social media and established the task of extracting cognitive pathways, annotating the data based on a cognitive theoretical framework. We initially categorized the task of extracting cognitive pathways as a hierarchical text classification with four main categories and nineteen subcategories. Following this, we structured a text summarization task to help psychotherapists quickly grasp the essential information. Our experiments evaluate the performance of deep learning and large language models (LLMs) on these tasks. The results demonstrate that our deep learning method achieved a micro-F1 score of 62.34% in the hierarchical text classification task. Meanwhile, in the text summarization task, GPT-4 attained a Rouge-1 score of 54.92 and a Rouge-2 score of 30.86, surpassing the experimental deep learning model's performance. However, it may suffer from an issue of hallucination. We have made all models and codes publicly available to support further research in this field.
Updated: 2024-04-17 14:55:27
Categories: cs.CL,cs.LG
Research on emotionally intelligent dialogue generation based on automatic dialogue system
Automated dialogue systems are important applications of artificial intelligence, and traditional systems struggle to understand user emotions and provide empathetic feedback. This study integrates emotional intelligence technology into automated dialogue systems and creates a dialogue generation model with emotional intelligence through deep learning and natural language processing techniques. The model can detect and understand a wide range of emotions and specific pain signals in real time, enabling the system to provide empathetic interaction. By integrating the results of the study "Can artificial intelligence detect pain and express pain empathy?", the model's ability to understand the subtle elements of pain empathy has been enhanced, setting higher standards for emotional intelligence dialogue systems. The project aims to provide theoretical understanding and practical suggestions to integrate advanced emotional intelligence capabilities into dialogue systems, thereby improving user experience and interaction quality.
Updated: 2024-04-17 14:55:03
Categories: cs.AI,cs.CL
Open-Ended Wargames with Large Language Models
Wargames are a powerful tool for understanding and rehearsing real-world decision making. Automated play of wargames using artificial intelligence (AI) enables possibilities beyond those of human-conducted games, such as playing the game many times over to see a range of possible outcomes. There are two categories of wargames: quantitative games, with discrete types of moves, and qualitative games, which revolve around open-ended responses. Historically, automation efforts have focused on quantitative games, but large language models (LLMs) make it possible to automate qualitative wargames. We introduce "Snow Globe," an LLM-powered multi-agent system for playing qualitative wargames. With Snow Globe, every stage of a text-based qualitative wargame from scenario preparation to post-game analysis can be optionally carried out by AI, humans, or a combination thereof. We describe its software architecture conceptually and release an open-source implementation alongside this publication. As case studies, we simulate a tabletop exercise about an AI incident response and a political wargame about a geopolitical crisis. We discuss potential applications of the approach and how it fits into the broader wargaming ecosystem.
Updated: 2024-04-17 14:54:58
Categories: cs.CL,cs.AI,cs.CY
Prediction of Unmanned Surface Vessel Motion Attitude Based on CEEMDAN-PSO-SVM
Unmanned boats, while navigating at sea, utilize active compensation systems to mitigate wave disturbances experienced by onboard instruments and equipment. However, there exists a lag in the measurement of unmanned boat attitudes, so motion attitude prediction is introduced to compensate for this lag in the signal acquisition process. This paper, based on the basic principles of waves, derives the disturbance patterns of waves on unmanned boats from the wave energy spectrum. Through simulation analysis of unmanned boat motion attitudes, motion attitude data is obtained, providing experimental data for subsequent work. A combined prediction model based on Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN), Particle Swarm Optimization (PSO), and Support Vector Machine (SVM) is designed to predict the motion attitude of unmanned boats. Simulation results validate its superior prediction accuracy compared to traditional prediction models; for example, in terms of mean absolute error, it improves by 17% compared to the EMD-PSO-SVM model.
Updated: 2024-04-17 14:53:03
Categories: cs.AI
Neuroevolving Electronic Dynamical Networks
Neuroevolution is a powerful method of applying an evolutionary algorithm to refine the performance of artificial neural networks through natural selection; however, the fitness evaluation of these networks can be time-consuming and computationally expensive, particularly for continuous time recurrent neural networks (CTRNNs) that necessitate the simulation of differential equations. To overcome this challenge, field programmable gate arrays (FPGAs) have emerged as an increasingly popular solution, due to their high performance and low power consumption. Further, their ability to undergo dynamic and partial reconfiguration enables the extremely rapid evaluation of the fitness of CTRNNs, effectively addressing the bottleneck associated with conventional methods of evolvable hardware. By incorporating fitness evaluation directly upon the programmable logic of the FPGA, hyper-parallel evaluation becomes feasible, dramatically reducing the time required for assessment. This inherent parallelism of FPGAs accelerates the entire neuroevolutionary process by several orders of magnitude, facilitating faster convergence to an optimal solution. The work presented in this study demonstrates the potential of utilizing dynamic and partial reconfiguration on capable FPGAs as a powerful platform for neuroevolving dynamic neural networks.
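For concreteness, the standard CTRNN dynamics whose repeated numerical integration makes fitness evaluation expensive, here integrated with a simple Euler step: tau_i dy_i/dt = -y_i + sum_j w_ji * sigma(y_j + theta_j) + I_i.

import numpy as np

def ctrnn_step(y, W, theta, tau, I, dt=0.01):
    sigma = 1.0 / (1.0 + np.exp(-(y + theta)))     # logistic activation
    dydt = (-y + W @ sigma + I) / tau
    return y + dt * dydt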
Updated: 2024-04-17 14:50:36
领域: cs.NE,cs.AI,cs.AR
OmniLytics+: A Secure, Efficient, and Affordable Blockchain Data Market for Machine Learning through Off-Chain Processing
The rapid development of large machine learning (ML) models requires a massive amount of training data, resulting in booming demand for data sharing and trading through data markets. Traditional centralized data markets suffer from a low level of security, and emerging decentralized platforms are faced with efficiency and privacy challenges. In this paper, we propose OmniLytics+, the first decentralized data market, built upon blockchain and smart contract technologies, to simultaneously achieve 1) data (resp., model) privacy for the data (resp., model) owner; 2) robustness against malicious data owners; 3) efficient data validation and aggregation. Specifically, adopting the zero-knowledge (ZK) rollup paradigm, OmniLytics+ proposes to secret share encrypted local gradients, computed from the encrypted global model, with a set of untrusted off-chain servers, who collaboratively generate a ZK proof on the validity of the gradient. In this way, the storage and processing overheads are securely offloaded from blockchain verifiers, significantly improving the privacy, efficiency, and affordability over existing rollup solutions. We implement the proposed OmniLytics+ data market as an Ethereum smart contract. Extensive experiments demonstrate the effectiveness of OmniLytics+ in training large ML models in the presence of malicious data owners, and the substantial advantages of OmniLytics+ in gas cost and execution time over baselines.
Updated: 2024-04-17 14:41:14
标题: OmniLytics+:一种安全、高效且经济实惠的区块链数据市场,通过离链处理为机器学习提供支持
摘要: 大型机器学习(ML)模型的快速发展需要大量的训练数据,导致通过数据市场进行数据共享和交易的需求蓬勃发展。传统的集中式数据市场存在安全性较低的问题,新兴的去中心化平台面临效率和隐私挑战。在本文中,我们提出了OmniLytics+,这是第一个建立在区块链和智能合约技术之上的去中心化数据市场,可以同时实现以下目标:1)为数据(模型)所有者提供数据(模型)隐私保护;2)抵御恶意数据所有者的攻击;3)有效进行数据验证和聚合。具体来说,OmniLytics+采用零知识(ZK)Rollup范例,提出将由加密全局模型计算得到的加密本地梯度以秘密共享方式分发给一组不受信任的链下服务器,由它们协作生成关于梯度有效性的ZK证明。通过这种方式,存储和处理开销被安全地从区块链验证者处卸载,相比现有Rollup解决方案显著提高了隐私性、效率和经济性。我们将提出的OmniLytics+数据市场实现为以太坊智能合约。大量实验表明,在存在恶意数据所有者的情况下,OmniLytics+能够有效训练大型ML模型,并且在燃气成本和执行时间方面相对基准具有巨大优势。
更新时间: 2024-04-17 14:41:14
领域: cs.CR,cs.LG
TorchSurv: A Lightweight Package for Deep Survival Analysis
TorchSurv is a Python package that serves as a companion tool to perform deep survival modeling within the PyTorch environment. Unlike existing libraries that impose specific parametric forms, TorchSurv enables the use of custom PyTorch-based deep survival models. With its lightweight design, minimal input requirements, full PyTorch backend, and freedom from restrictive survival model parameterizations, TorchSurv facilitates efficient deep survival model implementation and is particularly beneficial for high-dimensional and complex input data scenarios.
Updated: 2024-04-17 14:38:04
标题: TorchSurv:一种用于深度生存分析的轻量级包
摘要: TorchSurv是一个Python软件包,作为在PyTorch环境中执行深度生存建模的配套工具。与强加特定参数形式的现有库不同,TorchSurv允许使用自定义的基于PyTorch的深度生存模型。凭借其轻量化设计、最小的输入要求、完整的PyTorch后端,以及不受限制性生存模型参数化的约束,TorchSurv有助于高效地实现深度生存模型,对高维和复杂输入数据场景尤其有益。
更新时间: 2024-04-17 14:38:04
领域: cs.LG
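Since the package's selling point is accepting arbitrary PyTorch networks rather than fixed parametric forms, a sketch of the kind of custom model this enables may help. The following is a generic deep survival example in plain PyTorch with a hand-written Cox negative partial log-likelihood; it is not TorchSurv's actual API, whose loss and metric names should be taken from the package documentation.

```python
# Illustrative sketch: an MLP producing log-hazards, trained with a
# hand-written Cox negative partial log-likelihood (Breslow-style risk sets).
import torch

def cox_neg_partial_log_likelihood(log_hz, time, event):
    order = torch.argsort(time, descending=True)     # risk set of i = rows 0..i
    log_hz = log_hz[order].squeeze(-1)
    event = event[order]
    log_cum_hz = torch.logcumsumexp(log_hz, dim=0)   # log sum of hazards over risk set
    return -(event * (log_hz - log_cum_hz)).sum() / event.sum().clamp(min=1)

net = torch.nn.Sequential(torch.nn.Linear(8, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
x = torch.randn(128, 8)                              # covariates
time = torch.rand(128)                               # observed times
event = (torch.rand(128) < 0.7).float()              # 1 = event, 0 = censored
for _ in range(100):
    opt.zero_grad()
    loss = cox_neg_partial_log_likelihood(net(x), time, event)
    loss.backward()
    opt.step()
print("final loss:", float(loss))
```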
Instantiations and Computational Aspects of Non-Flat Assumption-based Argumentation
Most existing computational tools for assumption-based argumentation (ABA) focus on so-called flat frameworks, disregarding the more general case. In this paper, we study an instantiation-based approach for reasoning in possibly non-flat ABA. We make use of a semantics-preserving translation between ABA and bipolar argumentation frameworks (BAFs). By utilizing compilability theory, we establish that the constructed BAFs will in general be of exponential size. In order to keep the number of arguments and computational cost low, we present three ways of identifying redundant arguments. Moreover, we identify fragments of ABA which admit a poly-sized instantiation. We propose two algorithmic approaches for reasoning in possibly non-flat ABA. The first approach utilizes the BAF instantiation while the second works directly without constructing arguments. An empirical evaluation shows that the former outperforms the latter on many instances, reflecting the lower complexity of BAF reasoning. This result is in contrast to flat ABA, where direct approaches dominate instantiation-based approaches.
Updated: 2024-04-17 14:36:47
标题: 非扁平假设论证的实例化和计算方面
摘要: 大多数现有的基于假设的论证(ABA)计算工具都集中在所谓的扁平框架上,忽略了更一般的情况。在本文中,我们研究了一种基于实例化的方法,用于可能非扁平的ABA推理。我们利用ABA和双极论证框架(BAFs)之间的语义保留转换。通过利用可编译性理论,我们证明了所构建的BAFs通常将具有指数级大小。为了保持论点数量和计算成本较低,我们提出了三种识别冗余论点的方法。此外,我们确定了允许多项式规模实例化的ABA片段。我们提出了两种用于可能非扁平ABA推理的算法方法:第一种方法利用BAF实例化,而第二种方法直接工作而不构建论点。经验评估表明,前者在许多实例上优于后者,反映了BAF推理的较低复杂性。这一结果与扁平ABA的情况相反,在扁平ABA中直接方法优于基于实例化的方法。
更新时间: 2024-04-17 14:36:47
领域: cs.AI
Scaling Instructable Agents Across Many Simulated Worlds
Building embodied AI systems that can follow arbitrary language instructions in any 3D environment is a key challenge for creating general AI. Accomplishing this goal requires learning to ground language in perception and embodied actions, in order to accomplish complex tasks. The Scalable, Instructable, Multiworld Agent (SIMA) project tackles this by training agents to follow free-form instructions across a diverse range of virtual 3D environments, including curated research environments as well as open-ended, commercial video games. Our goal is to develop an instructable agent that can accomplish anything a human can do in any simulated 3D environment. Our approach focuses on language-driven generality while imposing minimal assumptions. Our agents interact with environments in real-time using a generic, human-like interface: the inputs are image observations and language instructions and the outputs are keyboard-and-mouse actions. This general approach is challenging, but it allows agents to ground language across many visually complex and semantically rich environments while also allowing us to readily run agents in new environments. In this paper we describe our motivation and goal, the initial progress we have made, and promising preliminary results on several diverse research environments and a variety of commercial video games.
Updated: 2024-04-17 14:36:27
标题: 在许多模拟世界中扩展可指导代理
摘要: 构建能够在任何3D环境中遵循任意语言指令的具身人工智能系统,是创建通用人工智能的关键挑战。实现这一目标需要学习将语言落实到感知和具身行动中,以完成复杂任务。可扩展、可指导、多世界代理(SIMA)项目通过训练代理在各种虚拟3D环境中遵循自由形式指令来解决这个挑战,包括精心策划的研究环境以及开放式的商业视频游戏。我们的目标是开发一种可指导的代理,可以在任何模拟的3D环境中完成人类可以做的任何事情。我们的方法专注于以语言驱动的普适性,同时施加最少的假设。我们的代理使用通用的、类似于人类的接口实时与环境交互:输入为图像观察和语言指令,输出为键盘和鼠标操作。这一通用方法具有挑战性,但它使代理能够在许多视觉复杂和语义丰富的环境中落实语言,同时也使我们能够方便地在新环境中运行代理。在本文中,我们描述了我们的动机和目标、已取得的初步进展,以及在几个不同的研究环境和多种商业视频游戏中有前景的初步结果。
更新时间: 2024-04-17 14:36:27
领域: cs.RO,cs.AI,cs.HC,cs.LG
Explainable Lung Disease Classification from Chest X-Ray Images Utilizing Deep Learning and XAI
Lung diseases remain a critical global health concern, and it's crucial to have accurate and quick ways to diagnose them. This work focuses on classifying different lung diseases into five groups: viral pneumonia, bacterial pneumonia, COVID, tuberculosis, and normal lungs. Employing advanced deep learning techniques, we explore a diverse range of models including CNN, hybrid models, ensembles, transformers, and Big Transfer. The research encompasses comprehensive methodologies such as hyperparameter tuning, stratified k-fold cross-validation, and transfer learning with fine-tuning. Remarkably, our findings reveal that the Xception model, fine-tuned through 5-fold cross-validation, achieves the highest accuracy of 96.21%. This success shows that our methods work well in accurately identifying different lung diseases. The exploration of explainable artificial intelligence (XAI) methodologies further enhances our understanding of the decision-making processes employed by these models, contributing to increased trust in their clinical applications.
Updated: 2024-04-17 14:34:35
标题: 利用深度学习和XAI进行可解释的胸部X射线图像肺部疾病分类
摘要: 肺部疾病仍然是一个关键的全球健康问题,拥有准确快速的诊断方法至关重要。本研究着眼于将不同肺部疾病分类为五组:病毒性肺炎、细菌性肺炎、COVID、结核病和正常肺。采用先进的深度学习技术,我们探索了包括CNN、混合模型、集成模型、Transformer和Big Transfer在内的多样化模型。研究涵盖了超参数调整、分层k折交叉验证和带微调的迁移学习等综合方法。值得注意的是,我们的研究结果表明,通过5折交叉验证微调的Xception模型取得了最高96.21%的准确率。这一成功表明我们的方法在准确识别不同肺部疾病方面表现良好。对可解释人工智能(XAI)方法的探索进一步增强了我们对这些模型决策过程的理解,有助于增加对其临床应用的信任。
更新时间: 2024-04-17 14:34:35
领域: eess.IV,cs.CV,cs.LG
Short-term wind speed forecasting model based on an attention-gated recurrent neural network and error correction strategy
The accurate forecasting of wind speed series is pivotal to the security of grid dispatching and the application of wind power. Nevertheless, on account of its nonlinear and non-stationary nature, short-term wind speed forecasting is extremely challenging. This paper therefore proposes a short-term wind speed forecasting model based on an attention-improved gated recurrent neural network (AtGRU) and an error correction strategy. The model uses the AtGRU model as the preliminary predictor and the GRU model as the error corrector. First, singular spectrum analysis (SSA) is applied to the raw wind speed series to reduce noise. The historical wind speed series is then used to train the predictor. Since the resulting predictions carry certain errors, the error sequence, processed by variational modal decomposition (VMD), is used to train the error corrector. The final forecast is simply the sum of the predictor's output and the error corrector's output. The proposed SSA-AtGRU-VMD-GRU model outperforms the compared models in three case studies on Woodburn, St. Thomas, and Santa Cruz, indicating that the model markedly improves the accuracy of wind speed forecasting.
Updated: 2024-04-17 14:27:45
标题: 基于注意力门控循环神经网络和误差校正策略的短期风速预测模型
摘要: 准确的风速序列预测对电网调度安全和风电应用至关重要。然而,由于风速序列的非线性和非平稳特性,其短期预测极具挑战性。为此,本文提出一种基于注意力机制改进的门控循环神经网络(AtGRU)与误差校正策略相结合的短期风速预测模型。该模型以AtGRU模型作为初步预测器,以GRU模型作为误差校正器。首先,采用奇异谱分析(SSA)对原始风速序列降噪;随后,利用历史风速序列训练预测器。在此过程中,预测会产生一定误差,这些误差序列经变分模态分解(VMD)处理后用于训练误差校正器。最终预测结果即预测器输出与误差校正器输出之和。在Woodburn、St. Thomas和Santa Cruz三个案例研究中,所提出的SSA-AtGRU-VMD-GRU模型均优于对比模型,表明该模型显著提高了风速预测的准确性。
更新时间: 2024-04-17 14:27:45
领域: cs.LG,cs.AI,physics.ao-ph
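The error-correction strategy at the core of the model can be sketched in a few lines: fit a preliminary predictor, fit a second model on the predictor's error series, and output their sum. The stand-in below deliberately simplifies, omitting SSA denoising and VMD of the errors and replacing the AtGRU/GRU networks with ridge regressors on lagged values.

```python
# Simplified sketch of the predictor-plus-error-corrector scheme.
import numpy as np
from sklearn.linear_model import Ridge

def lagged(series, n_lags=6):
    X = np.column_stack([series[i:len(series) - n_lags + i] for i in range(n_lags)])
    return X, series[n_lags:]

rng = np.random.default_rng(0)
t = np.linspace(0, 30, 600)
wind = 8 + 2 * np.sin(t) + 0.8 * np.sin(3.3 * t) + 0.3 * rng.standard_normal(t.size)

X, y = lagged(wind)
split = int(0.8 * len(y))
predictor = Ridge().fit(X[:split], y[:split])

residuals = y[:split] - predictor.predict(X[:split])  # training-set error series
Xe, ye = lagged(residuals)
corrector = Ridge().fit(Xe, ye)                       # learns structure in the errors

base = predictor.predict(X[split:])
hist = list(residuals[-6:])                           # rolling window of past errors
final = np.empty_like(base)
for i in range(len(base)):
    err_hat = corrector.predict(np.array(hist[-6:])[None, :])[0]
    final[i] = base[i] + err_hat                      # forecast = predictor + corrector
    hist.append(y[split + i] - base[i])               # true error observed afterwards

print("MAE base: %.4f  MAE corrected: %.4f"
      % (np.abs(y[split:] - base).mean(), np.abs(y[split:] - final).mean()))
```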
E2R: a Hierarchical-Learning inspired Novelty-Search method to generate diverse repertoires of grasping trajectories
Robotic grasping refers to the task of making a robotic system pick an object by applying forces and torques on its surface. Despite the recent advances in data-driven approaches, grasping remains an unsolved problem. Most of the works on this task rely on priors and heavy constraints to avoid the exploration problem. Novelty Search (NS) refers to evolutionary algorithms that replace selection of the best-performing individuals with selection of the most novel ones. Such methods have already shown promising results on hard exploration problems. In this work, we introduce a new NS-based method that can generate large datasets of grasping trajectories in a platform-agnostic manner. Inspired by the hierarchical learning paradigm, our method decouples approach and prehension to make the behavioral space smoother. Experiments conducted on 3 different robot-gripper setups and on several standard objects show that our method outperforms the state of the art for generating diverse repertoires of grasping trajectories, achieving a higher successful run ratio as well as better diversity for both approach and prehension. Some of the generated solutions have been successfully deployed on a real robot, showing the exploitability of the obtained repertoires.
Updated: 2024-04-17 14:20:02
标题: E2R:一种受层次学习启发的新颖搜索方法,用于生成多样化的抓取轨迹库
摘要: 机器人抓取是指通过在物体表面施加力和扭矩使机器人系统抓取物体的任务。尽管最近数据驱动方法取得了进展,但抓取仍然是一个尚未解决的问题。大多数关于这一任务的研究依赖于先验知识和严格的约束来避免探索问题。新颖性搜索(NS)指的是用选择最新颖的个体替代选择表现最好的个体的进化算法。这种方法已经在难题探索问题上显示出了有希望的结果。在这项工作中,我们介绍了一种基于NS的新方法,可以以平台无关的方式生成大量抓取轨迹数据集。受层次学习范式的启发,我们的方法将接近和抓取解耦,使行为空间更加平滑。在3种不同的机器人夹具设置和几种标准物体上进行的实验表明,我们的方法在生成多样化的抓取轨迹方面优于最先进技术,获得了更高的成功运行比例,以及更好的接近和抓取多样性。一些生成的解决方案已成功部署在真实机器人上,显示了所获得的技能库的可利用性。
更新时间: 2024-04-17 14:20:02
领域: cs.RO,cs.LG
Time Fairness in Online Knapsack Problems
The online knapsack problem is a classic problem in the field of online algorithms. Its canonical version asks how to pack items of different values and weights arriving online into a capacity-limited knapsack so as to maximize the total value of the admitted items. Although optimal competitive algorithms are known for this problem, they may be fundamentally unfair, i.e., individual items may be treated inequitably in different ways. We formalize a practically-relevant notion of time fairness which effectively models a trade off between static and dynamic pricing in a motivating application such as cloud resource allocation, and show that existing algorithms perform poorly under this metric. We propose a parameterized deterministic algorithm where the parameter precisely captures the Pareto-optimal trade-off between fairness (static pricing) and competitiveness (dynamic pricing). We show that randomization is theoretically powerful enough to be simultaneously competitive and fair; however, it does not work well in experiments. To further improve the trade-off between fairness and competitiveness, we develop a nearly-optimal learning-augmented algorithm which is fair, consistent, and robust (competitive), showing substantial performance improvements in numerical experiments.
Updated: 2024-04-17 14:13:52
标题: 在线背包问题中的时间公平性
摘要: 在线背包问题是在线算法领域的一个经典问题。其经典版本询问如何将价值和重量各不相同、在线到达的物品装入容量有限的背包中,以最大化所接纳物品的总价值。虽然对于这个问题已知存在最优竞争算法,但它们可能从根本上是不公平的,即个别物品可能以不同方式受到不公平对待。我们形式化了一个与实践相关的时间公平性概念,它在云资源分配等典型应用中有效刻画了静态定价与动态定价之间的权衡,并且展示了现有算法在这一度量下表现不佳。我们提出了一个参数化确定性算法,其中参数精确刻画了公平性(静态定价)和竞争性(动态定价)之间的帕累托最优权衡。我们表明随机化在理论上足够强大,可以同时具备竞争性和公平性;然而,它在实验中表现不佳。为了进一步改善公平性和竞争性之间的权衡,我们开发了一个接近最优的学习增强算法,它公平、一致且鲁棒(具有竞争力),在数值实验中显示出显著的性能改进。
更新时间: 2024-04-17 14:13:52
领域: cs.LG,cs.CY,cs.DS
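For intuition about the trade-off the parameter captures, consider online knapsack with value densities known to lie in a range [L, U]. The classic competitive algorithm prices remaining capacity with an exponential curve (dynamic pricing), while a flat price is fair over time. The sketch below holds the price static up to a utilization alpha and follows the exponential curve afterwards; this simple interpolation is illustrative only and is not the paper's exact algorithm or parameterization.

```python
# Illustrative admission rule: static price up to utilization alpha, then
# the classic exponential dynamic price; alpha = 1 is fully static (fair),
# alpha = 0 fully dynamic (competitive).
import math

L_DENSITY, U_DENSITY = 1.0, 50.0    # assumed bounds on value-to-weight ratios

def price(z, alpha):
    if alpha >= 1.0 or z <= alpha:
        return L_DENSITY                        # static phase: flat admission price
    zz = (z - alpha) / (1.0 - alpha)            # rescaled utilization in dynamic phase
    dyn = (U_DENSITY * math.e / L_DENSITY) ** zz * (L_DENSITY / math.e)
    return max(L_DENSITY, dyn)

def run(items, alpha):
    z, value = 0.0, 0.0                         # utilization and accrued value
    for v, w in items:                          # items arrive online as (value, weight)
        if z + w <= 1.0 and v / w >= price(z, alpha):
            z, value = z + w, value + v
    return value

items = [(3.0, 0.1), (0.5, 0.1), (20.0, 0.5), (4.0, 0.05), (13.5, 0.3)]
for alpha in (0.0, 0.5, 1.0):
    print("alpha=%.1f -> admitted value %.2f" % (alpha, run(items, alpha)))
```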
Don't Let MEV Slip: The Costs of Swapping on the Uniswap Protocol
We present the first in-depth empirical characterization of the costs of trading on a decentralized exchange (DEX). Using quoted prices from the Uniswap Labs interface for two pools -- USDC-ETH (5bps) and PEPE-ETH (30bps) -- we evaluate the efficiency of trading on DEXs. Our main tool is slippage -- the difference between the realized execution price of a trade, and its quoted price -- which we breakdown into its benign and adversarial components. We also present an alternative way to quantify and identify slippage due to adversarial reordering of transactions, which we call reordering slippage, that does not require quoted prices or mempool data to calculate. We find that the composition of transaction costs varies tremendously with the trade's characteristics. Specifically, while for small swaps, gas costs dominate costs, for large swaps price-impact and slippage account for the majority of it. Moreover, when trading PEPE, a popular 'memecoin', the probability of adversarial slippage is about 80% higher than when trading a mature asset like USDC. Overall, our results provide preliminary evidence that DEXs offer a compelling trust-less alternative to centralized exchanges for trading digital assets.
Updated: 2024-04-17 14:12:56
标题: 不要让MEV滑落:在Uniswap协议上进行交换的成本
摘要: 我们首次对去中心化交易所(DEX)的交易成本进行了深入的实证表征。利用Uniswap Labs界面上两个池——USDC-ETH(5bps)和PEPE-ETH(30bps)——的报价价格,我们评估了在DEX上交易的效率。我们的主要工具是滑点——交易的实际执行价格与其报价价格之间的差异——我们将其分解为良性成分和对抗性成分。我们还提出了另一种量化和识别由交易对抗性重排序引起的滑点的方法,称之为重排序滑点,其计算不需要报价价格或内存池数据。我们发现交易成本的构成随交易特征变化很大。具体而言,对于小额交易,燃气成本占主导地位;而对于大额交易,价格冲击和滑点占成本的大部分。此外,交易PEPE这种热门"模因币"时,出现对抗性滑点的概率比交易USDC等成熟资产时高出约80%。总的来说,我们的结果初步证明,对于数字资产交易,DEX提供了一种有吸引力的、无需信任的中心化交易所替代方案。
更新时间: 2024-04-17 14:12:56
领域: cs.CR,q-fin.TR
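The decomposition underlying the analysis is simple arithmetic, shown below with made-up numbers: total slippage splits into a benign component explained by price movement between quote and execution, and an adversarial component attributable to transaction reordering such as a sandwich attack.

```python
# Worked example of the slippage decomposition (all prices are hypothetical).
quoted_price = 1850.00        # price shown by the interface at quote time
fair_price_at_exec = 1851.20  # pool price just before the swap executes
executed_price = 1853.75      # price actually realized by the swap

total_slippage = executed_price - quoted_price              # 3.75
benign_slippage = fair_price_at_exec - quoted_price         # 1.20 (market moved)
adversarial_slippage = executed_price - fair_price_at_exec  # 2.55 (e.g., sandwiched)

assert abs(total_slippage - (benign_slippage + adversarial_slippage)) < 1e-9
print(total_slippage, benign_slippage, adversarial_slippage)
```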
SERENE: A Collusion Resilient Replication-based Verification Framework
The rapid advancement of autonomous driving technology is accompanied by substantial challenges, particularly the reliance on remote task execution without a guarantee of reliable and accurate returned results. This reliance on external compute servers, which may be malicious or rogue, represents a major security threat. Researchers have explored verifiable computing and replication-based task verification as simple, fast, and dependable methods to assess the correctness of results; however, colluding malicious workers can easily defeat these methods. Existing collusion detection and mitigation solutions often require a trusted third-party server or verified tasks, which may be hard to guarantee, or assume that only a minority of servers collude. We propose SERENE, a collusion-resilient replication-based verification framework that detects and mitigates colluding workers. Unlike state-of-the-art solutions, SERENE uses a lightweight detection algorithm that detects collusion based on a single verification task. Mitigation follows a two-stage process that groups the workers and separates colluding from honest ones. We implement SERENE and compare its performance to that of Staab et al., achieving an average improvement of 50% and 60% in detection and mitigation accuracy, respectively.
Updated: 2024-04-17 14:11:31
标题: SERENE:一种抗串谋复制验证框架
摘要: 自主驾驶技术的快速发展伴随着巨大的挑战,特别是依赖远程任务执行却无法确保返回结果可靠和准确。对可能恶意或失控的外部计算服务器的依赖构成重大安全威胁。研究人员一直在探索可验证计算,并将基于复制的任务验证作为一种简单、快速、可靠的结果正确性评估方法;然而,串通的恶意工作者可以轻易击败这种方法。现有的串通检测和缓解解决方案通常需要使用受信任的第三方服务器或经过验证的任务(这可能难以保证),或者假设只有少数服务器串通。我们提出了SERENE,一个抗串通的基于复制的验证框架,可检测和缓解串通工作者。与最先进的解决方案不同,SERENE使用一种轻量级检测算法,基于单个验证任务检测串通。缓解采用两阶段过程,先对工作者分组,再从诚实工作者中识别出串通者。我们实现了SERENE并与Staab等人的方案进行了性能比较,结果显示检测和缓解的准确率分别平均提高了50%和60%。
更新时间: 2024-04-17 14:11:31
领域: cs.CR,cs.NI
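To convey the flavor of replication-based verification with a single verification task, the toy below groups workers by their answer to a known-answer probe, flags identical wrong answers as collusion, and decides the real task by majority vote among the rest. This is an illustrative simplification in the spirit of the two-stage process described above, not SERENE's actual algorithm.

```python
# Toy detect-then-mitigate verification over replicated worker results.
from collections import defaultdict

def verify(task_results, probe_results, probe_truth):
    # Stage 1 (detection): group workers by their probe answer; a group that
    # agrees on a wrong answer is flagged as colluding.
    groups = defaultdict(list)
    for worker, ans in probe_results.items():
        groups[ans].append(worker)
    colluding = {w for ans, ws in groups.items()
                 if ans != probe_truth and len(ws) > 1}
    # Stage 2 (mitigation): majority vote on the real task among honest workers.
    votes = defaultdict(int)
    for worker, ans in task_results.items():
        if worker not in colluding:
            votes[ans] += 1
    accepted = max(votes, key=votes.get) if votes else None
    return accepted, colluding

task = {"w1": 42, "w2": 42, "w3": 99, "w4": 99, "w5": 42}
probe = {"w1": 7, "w2": 7, "w3": 13, "w4": 13, "w5": 7}
print(verify(task, probe, probe_truth=7))   # -> (42, {'w3', 'w4'})
```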
DUPE: Detection Undermining via Prompt Engineering for Deepfake Text
As large language models (LLMs) become increasingly commonplace, concern about distinguishing between human and AI text increases as well. The growing power of these models is of particular concern to teachers, who may worry that students will use LLMs to write school assignments. Facing a technology with which they are unfamiliar, teachers may turn to publicly-available AI text detectors. Yet the accuracy of many of these detectors has not been thoroughly verified, posing potential harm to students who are falsely accused of academic dishonesty. In this paper, we evaluate three different AI text detectors-Kirchenbauer et al. watermarks, ZeroGPT, and GPTZero-against human and AI-generated essays. We find that watermarking results in a high false positive rate, and that ZeroGPT has both high false positive and false negative rates. Further, we are able to significantly increase the false negative rate of all detectors by using ChatGPT 3.5 to paraphrase the original AI-generated texts, thereby effectively bypassing the detectors.
Updated: 2024-04-17 14:10:27
标题: DUPE:通过提示工程破坏深度伪造文本的检测
摘要: 随着大型语言模型(LLMs)变得越来越普遍,对区分人类文本和AI文本的担忧也在增加。这些模型日益增强的能力尤其让教师们感到担忧,他们可能担心学生会使用LLMs来写作业。面对一种自己并不熟悉的技术,教师们可能会转向公开可用的AI文本检测器。然而,许多这类检测器的准确性尚未得到彻底验证,可能给被错误指控学术不端的学生带来潜在危害。在本文中,我们针对人类和AI生成的文章评估了三种不同的AI文本检测器:Kirchenbauer等人的水印、ZeroGPT和GPTZero。我们发现,水印技术导致高误报率,而ZeroGPT的误报率和漏报率都很高。此外,通过使用ChatGPT 3.5改写原始的AI生成文本,我们能够显著提高所有检测器的漏报率,从而有效绕过这些检测器。
更新时间: 2024-04-17 14:10:27
领域: cs.AI
EPIM: Efficient Processing-In-Memory Accelerators based on Epitome
The utilization of large-scale neural networks on Processing-In-Memory (PIM) accelerators encounters challenges due to constrained on-chip memory capacity. To tackle this issue, current works explore model compression algorithms to reduce the size of Convolutional Neural Networks (CNNs). Most of these algorithms either aim to represent neural operators with reduced-size parameters (e.g., quantization) or search for the best combinations of neural operators (e.g., neural architecture search). Designing neural operators to align with PIM accelerators' specifications is an area that warrants further study. In this paper, we introduce the Epitome, a lightweight neural operator offering convolution-like functionality, to craft memory-efficient CNN operators for PIM accelerators (EPIM). On the software side, we evaluate epitomes' latency and energy on PIM accelerators and introduce a PIM-aware layer-wise design method to enhance their hardware efficiency. We apply epitome-aware quantization to further reduce the size of epitomes. On the hardware side, we modify the datapath of current PIM accelerators to accommodate epitomes and implement a feature map reuse technique to reduce computation cost. Experimental results reveal that our 3-bit quantized EPIM-ResNet50 attains 71.59% top-1 accuracy on ImageNet, reducing crossbar areas by 30.65 times. EPIM surpasses the state-of-the-art pruning methods on PIM.
Updated: 2024-04-17 14:09:52
标题: EPIM:基于Epitome的高效内存处理加速器
摘要: 在存内计算(PIM)加速器上部署大规模神经网络时,受限的片上内存容量带来了挑战。为了解决这个问题,当前的研究探索模型压缩算法来减小卷积神经网络(CNNs)的大小。这些算法大多要么旨在用较小的参数表示神经算子(例如,量化),要么寻找最佳的神经算子组合(例如,神经架构搜索)。设计与PIM加速器规格相符的神经算子是一个值得进一步研究的领域。在本文中,我们介绍了Epitome,一种提供类卷积功能的轻量级神经算子,用于为PIM加速器打造内存高效的CNN算子(EPIM)。在软件方面,我们评估了Epitome在PIM加速器上的延迟和能耗,并引入了一种PIM感知的逐层设计方法来增强其硬件效率,还应用了Epitome感知的量化方法进一步减小其大小。在硬件方面,我们修改了当前PIM加速器的数据通路以适应Epitome,并实现了一种特征图重用技术来降低计算成本。实验结果显示,我们的3比特量化EPIM-ResNet50在ImageNet上达到了71.59%的top-1准确率,并将交叉开关阵列面积减少了30.65倍。EPIM超越了PIM上最先进的剪枝方法。
更新时间: 2024-04-17 14:09:52
领域: cs.AR,cs.LG
MISC: Ultra-low Bitrate Image Semantic Compression Driven by Large Multimodal Model
With the evolution of storage and communication protocols, ultra-low bitrate image compression has become a highly demanding topic. However, existing compression algorithms must sacrifice either consistency with the ground truth or perceptual quality at ultra-low bitrate. In recent years, the rapid development of the Large Multimodal Model (LMM) has made it possible to balance these two goals. To solve this problem, this paper proposes a method called Multimodal Image Semantic Compression (MISC), which consists of an LMM encoder for extracting the semantic information of the image, a map encoder to locate the region corresponding to the semantic, an image encoder generates an extremely compressed bitstream, and a decoder reconstructs the image based on the above information. Experimental results show that our proposed MISC is suitable for compressing both traditional Natural Sense Images (NSIs) and emerging AI-Generated Images (AIGIs) content. It can achieve optimal consistency and perception results while saving 50% bitrate, which has strong potential applications in the next generation of storage and communication. The code will be released on https://github.com/lcysyzxdxc/MISC.
Updated: 2024-04-17 14:06:28
标题: MISC:由大型多模态模型驱动的超低比特率图像语义压缩
摘要: 随着存储和通信协议的演进,超低比特率图像压缩已成为一个备受关注的话题。然而,在超低比特率下,现有的压缩算法必须牺牲与真实图像的一致性或感知质量二者之一。近年来,大型多模态模型(LMM)的快速发展使得平衡这两个目标成为可能。为了解决这个问题,本文提出了一种名为多模态图像语义压缩(MISC)的方法,该方法包括一个用于提取图像语义信息的LMM编码器,一个用于定位语义所对应区域的地图编码器,一个生成极度压缩比特流的图像编码器,以及一个基于上述信息重构图像的解码器。实验结果表明,我们提出的MISC适用于压缩传统的自然感知图像(NSIs)和新兴的人工智能生成图像(AIGIs)内容。它可以在节省50%比特率的同时实现最佳的一致性和感知效果,在下一代存储和通信中具有强大的应用潜力。代码将发布在 https://github.com/lcysyzxdxc/MISC。
更新时间: 2024-04-17 14:06:28
领域: cs.CV,cs.AI,eess.IV
On Unified Prompt Tuning for Request Quality Assurance in Public Code Review
Public Code Review (PCR) can be implemented through a Software Question Answering (SQA) community, which facilitates high knowledge dissemination. Current methods mainly focus on the reviewer's perspective, including finding a capable reviewer, predicting comment quality, and recommending/generating review comments. Our intuition is that satisfying review necessity requests can increase their visibility, which in turn is a prerequisite for better review responses. To this end, we propose a unified framework called UniPCR to complete developer-based request quality assurance (i.e., predicting request necessity and recommending tags subtask) under a Masked Language Model (MLM). Specifically, we reformulate both subtasks via 1) text prompt tuning, which converts two subtasks into MLM by constructing prompt templates using hard prompt; 2) code prefix tuning, which optimizes a small segment of generated continuous vectors as the prefix of the code representation using soft prompt. Experimental results on the Public Code Review dataset for the time span 2011-2022 demonstrate that our UniPCR framework adapts to the two subtasks and outperforms comparable accuracy-based results with state-of-the-art methods for request quality assurance. These conclusions highlight the effectiveness of our unified framework from the developer's perspective in public code review.
Updated: 2024-04-17 14:04:50
标题: 面向公共代码审查中请求质量保证的统一提示调优
摘要: 公共代码审查(PCR)可以通过软件问答(SQA)社区实施,这有助于知识的广泛传播。目前的方法主要关注审阅者的视角,包括寻找合适的审阅者、预测评论质量以及推荐/生成审阅评论。我们的直觉是,满足审阅必要性请求可以增加它们的可见性,而这正是获得更好审阅响应的前提。为此,我们提出了一个名为UniPCR的统一框架,在掩码语言模型(MLM)下完成基于开发者的请求质量保证(即预测请求必要性与推荐标签两个子任务)。具体来说,我们通过以下方式重新表述这两个子任务:1)文本提示调优,使用硬提示构建提示模板,将两个子任务转换为MLM任务;2)代码前缀调优,使用软提示,将一小段生成的连续向量优化为代码表示的前缀。在2011年至2022年时间跨度的公共代码审查数据集上的实验结果表明,我们的UniPCR框架能够适应这两个子任务,并且在请求质量保证方面优于最先进方法的基于准确性的可比结果。这些结论凸显了我们的统一框架从开发者视角在公共代码审查中的有效性。
更新时间: 2024-04-17 14:04:50
领域: cs.SE,cs.AI
Enhancing Data Privacy In Wireless Sensor Networks: Investigating Techniques And Protocols To Protect Privacy Of Data Transmitted Over Wireless Sensor Networks In Critical Applications Of Healthcare And National Security
The article discusses the emergence of Wireless Sensor Networks (WSNs) as a groundbreaking technology in data processing and communication. It outlines how WSNs, composed of dispersed autonomous sensors, are utilized to monitor physical and environmental factors, transmitting data wirelessly for analysis. The article explores various applications of WSNs in healthcare, national security, emergency response, and infrastructure monitoring, highlighting their roles in enhancing patient care, public health surveillance, border security, disaster management, and military operations. Additionally, it examines the foundational concepts of data privacy in WSNs, focusing on encryption techniques, authentication mechanisms, anonymization techniques, and access control mechanisms. The article also addresses vulnerabilities, threats, and challenges related to data privacy in healthcare and national security contexts, emphasizing regulatory compliance, ethical considerations, and socio-economic factors. Furthermore, it introduces the Diffusion of Innovation Theory as a framework for understanding the adoption of privacy-enhancing technologies in WSNs. Finally, the article reviews empirical studies demonstrating the efficacy of security solutions in preserving data privacy in WSNs, offering insights into advancements in safeguarding sensitive information.
Updated: 2024-04-17 13:48:30
标题: 在无线传感器网络中增强数据隐私:研究用于保护在卫生保健和国家安全关键应用中传输数据隐私的技术和协议
摘要: 这篇文章讨论了无线传感器网络(WSNs)作为数据处理和通信中的突破性技术的出现。文章概述了由分散的自主传感器组成的WSNs如何被用于监测物理和环境因素,并通过无线传输数据进行分析。文章探讨了WSNs在医疗保健、国家安全、应急响应和基础设施监测等领域的各种应用,并强调它们在提高患者护理、公共卫生监测、边境安全、灾难管理和军事行动中的作用。此外,文章还探讨了WSNs中数据隐私的基本概念,重点关注加密技术、认证机制、匿名化技术和访问控制机制。文章还讨论了与医疗保健和国家安全领域数据隐私相关的漏洞、威胁和挑战,强调了监管合规性、伦理考虑和社会经济因素。此外,文章引入了创新扩散理论作为理解WSNs中采用增强隐私技术的框架。最后,文章审查了证明安全解决方案在保护数据隐私方面有效性的实证研究,为保护敏感信息方面的进展提供了见解。
更新时间: 2024-04-17 13:48:30
领域: cs.CR,cs.NI
UMAIR-FPS: User-aware Multi-modal Animation Illustration Recommendation Fusion with Painting Style
The rapid advancement of high-quality image generation models based on AI has generated a deluge of anime illustrations. Recommending illustrations to users within massive data has become a challenging and popular task. However, existing anime recommendation systems have focused on text features but still need to integrate image features. In addition, most multi-modal recommendation research is constrained by tightly coupled datasets, limiting its applicability to anime illustrations. We propose the User-aware Multi-modal Animation Illustration Recommendation Fusion with Painting Style (UMAIR-FPS) to tackle these gaps. In the feature extraction phase, for image features, we are the first to combine image painting style features with semantic features to construct a dual-output image encoder for enhancing representation. For text features, we obtain text embeddings by fine-tuning Sentence-Transformers with domain knowledge, composing a variety of domain text pairs from multilingual mappings, entity relationships, and term explanation perspectives, respectively. In the multi-modal fusion phase, we novelly propose a user-aware multi-modal contribution measurement mechanism to weight multi-modal features dynamically according to user features at the interaction level and employ the DCN-V2 module to model bounded-degree multi-modal crosses effectively. UMAIR-FPS surpasses the state-of-the-art baselines on large real-world datasets, demonstrating substantial performance enhancements.
Updated: 2024-04-17 13:46:56
标题: UMAIR-FPS:用户感知多模态动画插图推荐与绘画风格融合
摘要: 基于人工智能的高质量图像生成模型的快速发展已经产生了海量的动漫插图。在海量数据中向用户推荐插图已成为一项具有挑战性且受欢迎的任务。然而,现有的动漫推荐系统主要关注文本特征,仍需整合图像特征。此外,大多数多模态推荐研究受到紧密耦合数据集的限制,限制了其对动漫插图的适用性。我们提出了用户感知的融合绘画风格的多模态动画插图推荐(UMAIR-FPS)来弥补这些不足。在特征提取阶段,对于图像特征,我们首次将图像绘画风格特征与语义特征结合,构建双输出图像编码器以增强表示。对于文本特征,我们通过微调Sentence-Transformers获得文本嵌入,分别从多语言映射、实体关系和术语解释等角度构造多种领域文本对以融入领域知识。在多模态融合阶段,我们新颖地提出了一种用户感知的多模态贡献度测量机制,根据交互层面的用户特征动态加权多模态特征,并采用DCN-V2模块有效建模有界阶的多模态特征交叉。UMAIR-FPS在大型真实世界数据集上超越了最先进的基线,显示出显著的性能提升。
更新时间: 2024-04-17 13:46:56
领域: cs.IR,cs.AI
Exploring Key Point Analysis with Pairwise Generation and Graph Partitioning
Key Point Analysis (KPA), the summarization of multiple arguments into a concise collection of key points, continues to be a significant and unresolved issue within the field of argument mining. Existing models adopt a two-stage pipeline of clustering arguments or generating key points for argument clusters. This approach relies on semantic similarity instead of measuring the existence of shared key points among arguments. Additionally, it only models the intra-cluster relationship among arguments, disregarding the inter-cluster relationship between arguments that do not share key points. To address these limitations, we propose a novel approach for KPA with pairwise generation and graph partitioning. Our objective is to train a generative model that can simultaneously provide a score indicating the presence of a shared key point between a pair of arguments and generate the shared key point. Subsequently, to map generated redundant key points to a concise set of key points, we proceed to construct an argument graph by considering the arguments as vertices, the generated key points as edges, and the scores as edge weights. We then propose a graph partitioning algorithm to partition all arguments sharing the same key points into the same subgraph. Notably, our experimental findings demonstrate that our proposed model surpasses previous models when evaluated on both the ArgKP and QAM datasets.
Updated: 2024-04-17 13:44:29
标题: 使用成对生成和图分区探索关键点分析
摘要: 关键点分析(KPA)将多个论点总结为简洁的关键点集合,仍然是论点挖掘领域中一个重要且未解决的问题。现有模型采用两阶段流程:先对论点进行聚类,再为各论点簇生成关键点。这种方法依赖语义相似性,而非衡量论点之间是否存在共享的关键点。此外,它仅建模簇内论点之间的关系,忽略了不共享关键点的论点之间的簇间关系。为解决这些局限,我们提出了一种基于成对生成和图分区的新颖KPA方法。我们的目标是训练一个生成模型,它既能给出表示一对论点之间存在共享关键点的分数,又能生成该共享关键点。随后,为了将生成的冗余关键点映射到简洁的关键点集合,我们将论点视为顶点、生成的关键点视为边、分数视为边权重,构建论点图。然后,我们提出了一个图分区算法,将所有共享相同关键点的论点划分到同一个子图中。值得注意的是,实验结果表明,我们提出的模型在ArgKP和QAM数据集上的评估中均超越了先前的模型。
更新时间: 2024-04-17 13:44:29
领域: cs.CL,cs.LG
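The graph step can be made concrete with a small sketch: arguments become vertices, each pair is scored by the pairwise generative model (mocked here with a lookup table), and confident shared-key-point edges define the partition. Thresholded connected components via networkx stand in for the paper's partitioning algorithm.

```python
# Toy argument graph built from pairwise (score, key point) generations.
import itertools
import networkx as nx

arguments = ["A0", "A1", "A2", "A3", "A4"]

def pairwise_generate(a, b):
    # Stand-in for the generative model: (shared-key-point score, key point).
    table = {("A0", "A1"): (0.9, "kp: pricing is unfair"),
             ("A0", "A2"): (0.8, "kp: pricing is unfair"),
             ("A3", "A4"): (0.7, "kp: support is slow")}
    return table.get((a, b), (0.1, None))

G = nx.Graph()
G.add_nodes_from(arguments)
for a, b in itertools.combinations(arguments, 2):
    score, kp = pairwise_generate(a, b)
    if score >= 0.5:                 # keep only confident shared-key-point edges
        G.add_edge(a, b, weight=score, key_point=kp)

for component in nx.connected_components(G):
    print(sorted(component))         # arguments sharing key points group together
```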
Off-Path TCP Hijacking in Wi-Fi Networks: A Packet-Size Side Channel Attack
In this paper, we unveil a fundamental side channel in Wi-Fi networks, specifically the observable frame size, which can be exploited by attackers to conduct TCP hijacking attacks. Despite the various security mechanisms (e.g., WEP and WPA2/WPA3) implemented to safeguard Wi-Fi networks, our study reveals that an off-path attacker can still extract sufficient information from the frame size side channel to hijack the victim's TCP connection. Our side channel attack is based on two significant findings: (i) response packets (e.g., ACK and RST) generated by TCP receivers vary in size, and (ii) the encrypted frames containing these response packets have consistent and distinguishable sizes. By observing the size of the victim's encrypted frames, the attacker can detect and hijack the victim's TCP connections. We validate the effectiveness of this side channel attack through two case studies, i.e., SSH DoS and web traffic manipulation. Precisely, our attack can terminate the victim's SSH session in 19 seconds and inject malicious data into the victim's web traffic within 28 seconds. Furthermore, we conduct extensive measurements to evaluate the impact of our attack on real-world Wi-Fi networks. We test 30 popular wireless routers from 9 well-known vendors, and none of these routers can protect victims from our attack. Besides, we implement our attack in 80 real-world Wi-Fi networks and successfully hijack the victim's TCP connections in 75 (93.75%) evaluated Wi-Fi networks. We have responsibly disclosed the vulnerability to the Wi-Fi Alliance and proposed several mitigation strategies to address this issue.
Updated: 2024-04-17 13:37:04
标题: Wi-Fi网络中的Off-Path TCP劫持:一种包大小侧信道攻击
摘要: 在这篇论文中,我们揭示了Wi-Fi网络中的一个根本性侧信道,即可观察的帧大小,攻击者可以利用它来进行TCP劫持攻击。尽管已经实施了各种安全机制(例如WEP和WPA2/WPA3)来保护Wi-Fi网络,但我们的研究表明,路径外(off-path)攻击者仍然可以从帧大小侧信道中提取足够的信息来劫持受害者的TCP连接。我们的侧信道攻击基于两个重要发现:(i)TCP接收方生成的响应数据包(例如ACK和RST)大小不同;(ii)包含这些响应数据包的加密帧具有一致且可区分的大小。通过观察受害者加密帧的大小,攻击者可以检测并劫持受害者的TCP连接。我们通过两个案例研究验证了这种侧信道攻击的有效性,即SSH拒绝服务(DoS)和网页流量操纵。具体来说,我们的攻击可以在19秒内终止受害者的SSH会话,并在28秒内向受害者的网页流量中注入恶意数据。此外,我们进行了广泛的测量,评估了我们的攻击对真实Wi-Fi网络的影响。我们测试了来自9个知名厂商的30款热门无线路由器,没有一款能够保护受害者免受我们的攻击。此外,我们在80个真实Wi-Fi网络中实施了攻击,并在其中75个(93.75%)受评估的Wi-Fi网络中成功劫持了受害者的TCP连接。我们已负责任地向Wi-Fi联盟披露了该漏洞,并提出了若干缓解策略来解决这个问题。
更新时间: 2024-04-17 13:37:04
领域: cs.NI,cs.CR
Unveiling Code Pre-Trained Models: Investigating Syntax and Semantics Capacities
Past research has examined how well code models grasp code syntax, yet their understanding of code semantics still needs to be explored. We extensively analyze seven code models to investigate how code models represent code syntax and semantics. This includes four prominent code pre-trained models (CodeBERT, GraphCodeBERT, CodeT5, and UnixCoder) and three large language models (StarCoder, CodeLlama, and CodeT5+). We have developed four probing tasks to evaluate the models' abilities to learn code syntax and semantics. These tasks focus on reconstructing code syntax and semantic structures, such as AST, CFG, CDG, and DDG, within the models' representation spaces. These structures are fundamental to understanding code. Additionally, we explore the role of syntax tokens in each token representation and the extended dependencies among code tokens. Furthermore, we examine the distribution of attention weights concerning code semantic structures. Through detailed analysis, our results emphasize the strengths and weaknesses of various code models in mastering code syntax and semantics. The findings reveal that these models are proficient in grasping code syntax, effectively capturing the relationships and roles of syntax tokens. However, their ability to encode code semantics shows more variability. This study enriches our understanding of the capabilities of code models in analyzing syntax and semantics. Our findings offer valuable insights for future code model enhancements, helping optimize their application across a range of code-related tasks.
Updated: 2024-04-17 13:35:09
标题: 揭示代码预训练模型:探究语法和语义能力
摘要: 过去的研究已经检验了代码模型对代码语法的掌握程度,但它们对代码语义的理解仍有待探索。我们广泛分析了七种代码模型,以研究代码模型如何表示代码的语法和语义,其中包括四种著名的代码预训练模型(CodeBERT,GraphCodeBERT,CodeT5和UnixCoder)以及三种大型语言模型(StarCoder,CodeLlama和CodeT5+)。我们开发了四个探测任务来评估模型学习代码语法和语义的能力。这些任务聚焦于在模型的表示空间内重建代码的语法和语义结构,如AST、CFG、CDG和DDG。这些结构对于理解代码至关重要。此外,我们探讨了语法标记在每个标记表示中的作用以及代码标记之间的扩展依赖关系,并考察了与代码语义结构相关的注意力权重分布。通过详细分析,我们的结果强调了各种代码模型在掌握代码语法和语义方面的优势和劣势。研究结果表明,这些模型在掌握代码语法方面表现出色,能有效捕捉语法标记之间的关系和角色;然而,它们编码代码语义的能力则表现出更大的差异。这项研究加深了我们对代码模型分析语法和语义能力的理解,为未来代码模型的改进提供了宝贵的见解,有助于优化它们在各种代码相关任务中的应用。
更新时间: 2024-04-17 13:35:09
领域: cs.SE,cs.AI
Tensor Factorisation for Polypharmacy Side Effect Prediction
Adverse reactions caused by drug combinations are an increasingly common phenomenon, making their accurate prediction an important challenge in modern medicine. However, the polynomial nature of this problem renders lab-based identification of adverse reactions insufficient. Dozens of computational approaches have therefore been proposed for the task in recent years, with varying degrees of success. One group of methods that has seemingly been under-utilised in this area is tensor factorisation, despite their clear applicability to this type of data. In this work, we apply three such models to a benchmark dataset in order to compare them against established techniques. We find, in contrast to previous reports, that for this task tensor factorisation models are competitive with state-of-the-art graph neural network models and we recommend that future work in this field considers cheaper methods with linear complexity before running costly deep learning processes.
Updated: 2024-04-17 13:32:05
标题: 张量分解用于多药联合使用副作用预测
摘要: 由药物组合引起的不良反应日益普遍,对其进行准确预测已成为现代医学中的重要挑战。然而,由于该问题的多项式增长特性,仅靠实验室手段识别不良反应是不够的。因此,近年来已经提出了数十种计算方法来处理这一任务,其成功程度各不相同。在这一领域似乎未被充分利用的一类方法是张量分解,尽管它们明显适用于这种类型的数据。在这项工作中,我们将三种这样的模型应用于一个基准数据集,以便将它们与已有技术进行比较。与以往的报道相反,我们发现,在这个任务中,张量分解模型与最先进的图神经网络模型具有竞争力,并建议该领域未来的工作在运行昂贵的深度学习流程之前,先考虑具有线性复杂度的更便宜的方法。
更新时间: 2024-04-17 13:32:05
领域: cs.LG,q-bio.BM
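As a concrete instance of the kind of model discussed, the sketch below runs a CP (canonical polyadic) decomposition by alternating least squares on a toy binary (drug, drug, side-effect) tensor, then scores unseen triples from the learned factors. It is a minimal educational version; tuned implementations exist in packages such as tensorly, and a real study would add regularization and proper evaluation splits.

```python
# Minimal CP decomposition via alternating least squares (ALS) with numpy.
import numpy as np

rng = np.random.default_rng(0)
n1, n2, n3, rank = 20, 20, 10, 4
T = (rng.random((n1, n2, n3)) < 0.08).astype(float)  # toy interaction tensor

A, B, C = (rng.standard_normal((n, rank)) for n in (n1, n2, n3))

def als_update(T_contract, F1, F2):
    # Normal equations for one factor: the Gram matrix of the Khatri-Rao
    # product is the elementwise product of the factor Gram matrices.
    gram = (F1.T @ F1) * (F2.T @ F2)
    return T_contract @ np.linalg.pinv(gram)

for _ in range(30):
    A = als_update(np.einsum('ijk,jr,kr->ir', T, B, C), B, C)
    B = als_update(np.einsum('ijk,ir,kr->jr', T, A, C), A, C)
    C = als_update(np.einsum('ijk,ir,jr->kr', T, A, B), A, B)

pred = np.einsum('ir,jr,kr->ijk', A, B, C)
print("reconstruction MSE:", float(((pred - T) ** 2).mean()))
# Rank candidate (drug i, drug j, effect k) triples by pred[i, j, k].
```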
S3PHER: Secure and Searchable System for Patient-driven HEalth data shaRing
Healthcare data contains some of the most sensitive information about an individual, yet sharing this data with healthcare practitioners can significantly enhance patient care and support research efforts. However, current systems for sharing health data between patients and caregivers do not fully address the critical security requirements of privacy, confidentiality, and consent management. Furthermore, compliance with regulatory laws such as GDPR and HIPAA is often deficient, largely because patients typically are asked to provide general consent for healthcare entities to access their data. Recognizing the limitations of existing systems, we present S3PHER, a novel approach to sharing health data that provides patients with control over who accesses their data, what data is accessed, and when. Our system ensures end to end privacy by integrating a Proxy ReEncryption Scheme with a Searchable Encryption Scheme, utilizing Homomorphic Encryption to enable healthcare practitioners to privately search and access patients' documents. The practicality and benefits of S3PHER are further validated through end to end deployment and use case analyses, with tests on real datasets demonstrating promising execution times.
Updated: 2024-04-17 13:31:50
标题: S3PHER:用于患者驱动的健康数据共享的安全和可搜索系统
摘要: 医疗保健数据包含个人最敏感的信息,然而与医疗从业者共享这些数据可以显著改善患者护理并支持研究工作。但是,目前用于在患者和护理人员之间共享健康数据的系统并未完全满足隐私、保密和同意管理等关键安全要求。此外,对GDPR和HIPAA等法规的遵守往往不足,主要是因为患者通常被要求提供允许医疗实体访问其数据的笼统同意。认识到现有系统的局限性,我们提出了S3PHER,一种新颖的健康数据共享方法,它赋予患者控制权,决定谁可以访问他们的数据、访问哪些数据以及何时访问。我们的系统通过将代理重加密方案与可搜索加密方案相结合,并利用同态加密,实现端到端的隐私保护,使医疗从业者能够私密地搜索和访问患者的文件。通过端到端的部署和用例分析,以及在真实数据集上的测试,进一步验证了S3PHER的实用性和益处,其执行时间令人满意。
更新时间: 2024-04-17 13:31:50
领域: cs.CR,E.3; H.3.1; H.3.2; H.3.3
Characterizing and modeling harms from interactions with design patterns in AI interfaces
The proliferation of applications using artificial intelligence (AI) systems has led to a growing number of users interacting with these systems through sophisticated interfaces. Human-computer interaction research has long shown that interfaces shape both user behavior and user perception of technical capabilities and risks. Yet, practitioners and researchers evaluating the social and ethical risks of AI systems tend to overlook the impact of anthropomorphic, deceptive, and immersive interfaces on human-AI interactions. Here, we argue that design features of interfaces with adaptive AI systems can have cascading impacts, driven by feedback loops, which extend beyond those previously considered. We first conduct a scoping review of AI interface designs and their negative impact to extract salient themes of potentially harmful design patterns in AI interfaces. Then, we propose Design-Enhanced Control of AI systems (DECAI), a conceptual model to structure and facilitate impact assessments of AI interface designs. DECAI draws on principles from control systems theory -- a theory for the analysis and design of dynamic physical systems -- to dissect the role of the interface in human-AI systems. Through two case studies on recommendation systems and conversational language model systems, we show how DECAI can be used to evaluate AI interface designs.
Updated: 2024-04-17 13:30:45
标题: 对人工智能界面中设计模式相互作用所导致的危害进行特征化和建模
摘要: 使用人工智能(AI)系统的应用不断增多,使得越来越多的用户通过复杂的界面与这些系统交互。人机交互研究长期以来表明,界面既塑造用户行为,也塑造用户对技术能力和风险的认知。然而,评估AI系统社会和伦理风险的从业者和研究人员往往忽视拟人化、欺骗性和沉浸式界面对人机交互的影响。在这里,我们认为,自适应AI系统的界面设计特征在反馈循环的驱动下可能产生级联影响,其范围超出以往的考虑。我们首先对AI界面设计及其负面影响进行了范围综述,提炼出AI界面中潜在有害设计模式的显著主题。然后,我们提出了AI系统的设计增强控制(DECAI),这是一个用于组织和促进AI界面设计影响评估的概念模型。DECAI借鉴了控制系统理论(一种用于分析和设计动态物理系统的理论)的原则,以剖析界面在人机AI系统中的作用。通过推荐系统和对话式语言模型系统的两个案例研究,我们展示了如何使用DECAI来评估AI界面设计。
更新时间: 2024-04-17 13:30:45
领域: cs.HC,cs.AI,cs.CY
Causal Intervention for Fairness in Multi-behavior Recommendation
Recommender systems usually learn user interests from various user behaviors, including clicks and post-click behaviors (e.g., like and favorite). However, these behaviors inevitably exhibit popularity bias, leading to some unfairness issues: 1) for items with similar quality, more popular ones get more exposure; and 2) even worse, popular items of lower quality might receive more exposure than less popular items of higher quality. Existing work on mitigating popularity bias blindly eliminates the bias and usually ignores the effect of item quality. We argue that the relationships between different user behaviors (e.g., conversion rate) actually reflect the item quality. Therefore, to handle the unfairness issues, we propose to mitigate the popularity bias by considering multiple user behaviors. In this work, we examine causal relationships behind the interaction generation procedure in multi-behavior recommendation. Specifically, we find that: 1) item popularity is a confounder between the exposed items and users' post-click interactions, leading to the first unfairness; and 2) some hidden confounders (e.g., the reputation of item producers) affect both item popularity and quality, resulting in the second unfairness. To alleviate these confounding issues, we propose a causal framework to estimate the causal effect, which leverages backdoor adjustment to block the backdoor paths caused by the confounders. In the inference stage, we remove the negative effect of popularity and utilize the good effect of quality for recommendation. Experiments on two real-world datasets validate the effectiveness of our proposed framework, which enhances fairness without sacrificing recommendation accuracy.
Updated: 2024-04-17 13:16:02
标题: 多行为推荐中的公平性因果干预
摘要: 推荐系统通常从各种用户行为中学习用户兴趣,包括点击和点击后的行为(例如,点赞和收藏)。然而,这些行为不可避免地带有流行度偏差,导致一些不公平问题:1)对于质量相近的物品,更受欢迎的物品获得更多曝光;2)更糟糕的是,质量较低但更受欢迎的物品可能比质量较高但不那么受欢迎的物品获得更多曝光。现有的缓解流行度偏差的研究盲目地消除偏差,并且通常忽略物品质量的影响。我们认为不同用户行为之间的关系(例如,转化率)实际上反映了物品质量。因此,为了处理不公平问题,我们提出通过考虑多种用户行为来缓解流行度偏差。在这项工作中,我们研究了多行为推荐中交互生成过程背后的因果关系。具体来说,我们发现:1)物品流行度是曝光物品与用户点击后交互之间的混杂因素,导致第一种不公平;2)一些隐藏的混杂因素(例如,物品生产者的声誉)同时影响物品的流行度和质量,导致第二种不公平。为了缓解这些混杂问题,我们提出了一个估计因果效应的因果框架,利用后门调整来阻断混杂因素引起的后门路径。在推断阶段,我们消除流行度的负面影响,并利用质量的正面影响进行推荐。在两个真实世界数据集上的实验验证了我们所提框架的有效性:在不牺牲推荐准确性的情况下提高了公平性。
更新时间: 2024-04-17 13:16:02
领域: cs.IR,cs.AI
Distributed Fractional Bayesian Learning for Adaptive Optimization
This paper considers a distributed adaptive optimization problem, where all agents only have access to their local cost functions with a common unknown parameter, while they aim to collaboratively estimate the true parameter and find the optimal solution over a connected network. A general mathematical framework for such a problem has not been studied yet. We aim to provide valuable insights for addressing parameter uncertainty in distributed optimization problems and simultaneously find the optimal solution. Thus, we propose a novel Prediction while Optimization scheme, which utilizes distributed fractional Bayesian learning through weighted averaging on the log-beliefs to update the beliefs of unknown parameters, and distributed gradient descent for renewing the estimation of the optimal solution. Then under suitable assumptions, we prove that all agents' beliefs and decision variables converge almost surely to the true parameter and the optimal solution under the true parameter, respectively. We further establish a sublinear convergence rate for the belief sequence. Finally, numerical experiments are implemented to corroborate the theoretical analysis.
Updated: 2024-04-17 13:09:33
标题: 分布式分数贝叶斯学习用于自适应优化
摘要: 本文考虑一个分布式自适应优化问题:所有代理只能访问各自含有一个共同未知参数的本地成本函数,并希望在连通网络上协作估计真实参数并求得最优解。此类问题的一般数学框架尚未被研究。我们的目标是为解决分布式优化问题中的参数不确定性提供有价值的见解,并同时找到最优解。为此,我们提出了一种新颖的"边预测边优化"方案:通过对对数信念进行加权平均的分布式分数贝叶斯学习来更新对未知参数的信念,并通过分布式梯度下降来更新对最优解的估计。在适当的假设下,我们证明所有代理的信念和决策变量分别几乎必然收敛到真实参数以及该真实参数下的最优解。我们进一步建立了信念序列的次线性收敛速度。最后,我们通过数值实验验证了理论分析。
更新时间: 2024-04-17 13:09:33
领域: math.OC,cs.DC,cs.LG,cs.MA
Calibrating Bayesian Learning via Regularization, Confidence Minimization, and Selective Inference
The application of artificial intelligence (AI) models in fields such as engineering is limited by the known difficulty of quantifying the reliability of an AI's decision. A well-calibrated AI model must correctly report its accuracy on in-distribution (ID) inputs, while also enabling the detection of out-of-distribution (OOD) inputs. A conventional approach to improve calibration is the application of Bayesian ensembling. However, owing to computational limitations and model misspecification, practical ensembling strategies do not necessarily enhance calibration. This paper proposes an extension of variational inference (VI)-based Bayesian learning that integrates calibration regularization for improved ID performance, confidence minimization for OOD detection, and selective calibration to ensure a synergistic use of calibration regularization and confidence minimization. The scheme is constructed successively by first introducing calibration-regularized Bayesian learning (CBNN), then incorporating out-of-distribution confidence minimization (OCM) to yield CBNN-OCM, and finally integrating also selective calibration to produce selective CBNN-OCM (SCBNN-OCM). Selective calibration rejects inputs for which the calibration performance is expected to be insufficient. Numerical results illustrate the trade-offs between ID accuracy, ID calibration, and OOD calibration attained by both frequentist and Bayesian learning methods. Among the main conclusions, SCBNN-OCM is seen to achieve best ID and OOD performance as compared to existing state-of-the-art approaches at the cost of rejecting a sufficiently large number of inputs.
Updated: 2024-04-17 13:08:26
标题: 通过正则化、置信度最小化和选择性推断对贝叶斯学习进行校准
摘要: 人工智能(AI)模型在工程等领域的应用受限于一个众所周知的困难,即难以量化AI决策的可靠性。一个校准良好的AI模型必须正确报告其在分布内(ID)输入上的准确性,同时还要能够检测分布外(OOD)输入。改善校准的一种传统方法是应用贝叶斯集成。然而,由于计算限制和模型设定偏差,实际的集成策略并不一定能增强校准。本文提出了一种基于变分推断(VI)的贝叶斯学习的扩展,它集成了用于改进ID性能的校准正则化、用于OOD检测的置信度最小化,以及确保二者协同使用的选择性校准。该方案逐步构建:首先引入校准正则化的贝叶斯学习(CBNN),然后结合分布外置信度最小化(OCM)得到CBNN-OCM,最后再整合选择性校准,得到选择性CBNN-OCM(SCBNN-OCM)。选择性校准会拒绝那些预期校准性能不足的输入。数值结果展示了频率学派与贝叶斯学习方法在ID准确性、ID校准和OOD校准之间所达到的权衡。主要结论之一是,与现有最先进方法相比,SCBNN-OCM取得了最好的ID和OOD性能,但代价是需要拒绝相当数量的输入。
更新时间: 2024-04-17 13:08:26
领域: cs.LG,cs.AI,eess.SP
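For readers unfamiliar with the calibration notion used above, the sketch below computes the standard expected calibration error (ECE), which bins predictions by confidence and compares each bin's average confidence with its accuracy. It only illustrates what "correctly reporting accuracy on ID inputs" means operationally; the paper's regularizers and selective-calibration machinery build on such measures.

```python
# Standard expected calibration error on synthetic, overconfident predictions.
import numpy as np

def expected_calibration_error(confidence, correct, n_bins=10):
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidence > lo) & (confidence <= hi)
        if mask.any():
            gap = abs(confidence[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap            # weight gap by bin population
    return ece

rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, size=2000)
correct = (rng.random(2000) < conf * 0.85).astype(float)   # overconfident model
print("ECE:", expected_calibration_error(conf, correct))
```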
Large Language Models meet Collaborative Filtering: An Efficient All-round LLM-based Recommender System
Collaborative filtering recommender systems (CF-RecSys) have shown successive results in enhancing the user experience on social media and e-commerce platforms. However, as CF-RecSys struggles under cold scenarios with sparse user-item interactions, recent strategies have focused on leveraging modality information of user/items (e.g., text or images) based on pre-trained modality encoders and Large Language Models (LLMs). Despite their effectiveness under cold scenarios, we observe that they underperform simple traditional collaborative filtering models under warm scenarios due to the lack of collaborative knowledge. In this work, we propose an efficient All-round LLM-based Recommender system, called A-LLMRec, that excels not only in the cold scenario but also in the warm scenario. Our main idea is to enable an LLM to directly leverage the collaborative knowledge contained in a pre-trained state-of-the-art CF-RecSys so that the emergent ability of the LLM as well as the high-quality user/item embeddings that are already trained by the state-of-the-art CF-RecSys can be jointly exploited. This approach yields two advantages: (1) model-agnostic, allowing for integration with various existing CF-RecSys, and (2) efficiency, eliminating the extensive fine-tuning typically required for LLM-based recommenders. Our extensive experiments on various real-world datasets demonstrate the superiority of A-LLMRec in various scenarios, including cold/warm, few-shot, cold user, and cross-domain scenarios. Beyond the recommendation task, we also show the potential of A-LLMRec in generating natural language outputs based on the understanding of the collaborative knowledge by performing a favorite genre prediction task. Our code is available at https://github.com/ghdtjr/A-LLMRec .
Updated: 2024-04-17 13:03:07
标题: 大型语言模型遇见协同过滤:一种高效的基于LLM的全能推荐系统
摘要: 协同过滤推荐系统(CF-RecSys)在提升社交媒体和电商平台用户体验方面取得了持续的成果。然而,在用户-物品交互稀疏的冷启动场景下,CF-RecSys表现不佳,因此最近的策略侧重于基于预训练模态编码器和大型语言模型(LLMs)利用用户/物品的模态信息(例如文本或图像)。尽管这些方法在冷启动场景下有效,但我们观察到,由于缺乏协同知识,它们在热场景下的表现不如简单的传统协同过滤模型。在这项工作中,我们提出了一种高效的全能LLM推荐系统,称为A-LLMRec,它不仅在冷启动场景下表现出色,在热场景下同样出色。我们的主要思想是让LLM直接利用预训练的最先进CF-RecSys中所包含的协同知识,从而同时发挥LLM的涌现能力以及最先进CF-RecSys已训练好的高质量用户/物品嵌入。这种方法具有两个优点:(1)模型无关,可与各种现有CF-RecSys集成;(2)高效,免去了基于LLM的推荐器通常所需的大量微调。我们在各种真实世界数据集上进行了广泛的实验,证明了A-LLMRec在各种场景下(包括冷/热、少样本、冷用户和跨域场景)的优越性。除了推荐任务外,我们还通过执行喜爱流派预测任务,展示了A-LLMRec基于对协同知识的理解生成自然语言输出的潜力。我们的代码可在 https://github.com/ghdtjr/A-LLMRec 获取。
更新时间: 2024-04-17 13:03:07
领域: cs.IR,cs.AI
The Causal Chambers: Real Physical Systems as a Testbed for AI Methodology
In some fields of AI, machine learning and statistics, the validation of new methods and algorithms is often hindered by the scarcity of suitable real-world datasets. Researchers must often turn to simulated data, which yields limited information about the applicability of the proposed methods to real problems. As a step forward, we have constructed two devices that allow us to quickly and inexpensively produce large datasets from non-trivial but well-understood physical systems. The devices, which we call causal chambers, are computer-controlled laboratories that allow us to manipulate and measure an array of variables from these physical systems, providing a rich testbed for algorithms from a variety of fields. We illustrate potential applications through a series of case studies in fields such as causal discovery, out-of-distribution generalization, change point detection, independent component analysis, and symbolic regression. For applications to causal inference, the chambers allow us to carefully perform interventions. We also provide and empirically validate a causal model of each chamber, which can be used as ground truth for different tasks. All hardware and software is made open source, and the datasets are publicly available at causalchamber.org or through the Python package causalchamber.
Updated: 2024-04-17 13:00:52
标题: 因果室:作为人工智能方法论测试平台的真实物理系统
摘要: 在人工智能、机器学习和统计学的一些领域,新方法和算法的验证常常因缺乏合适的真实世界数据集而受阻。研究人员往往不得不转向模拟数据,而模拟数据对于所提方法在真实问题上的适用性只能提供有限的信息。作为向前迈出的一步,我们构建了两台设备,可以快速、廉价地从非平凡但已被充分理解的物理系统中产生大型数据集。这些被我们称为因果室的设备是由计算机控制的实验室,允许我们操纵和测量这些物理系统的一系列变量,为各个领域的算法提供了丰富的测试平台。我们通过一系列案例研究展示了潜在的应用,涉及因果发现、分布外泛化、变点检测、独立成分分析和符号回归等领域。对于因果推断应用,因果室允许我们精确地实施干预。我们还提供并用实验验证了每个因果室的因果模型,它可以作为不同任务的基准真值。所有硬件和软件均已开源,数据集可在 causalchamber.org 或通过Python包causalchamber公开获取。
更新时间: 2024-04-17 13:00:52
领域: cs.AI,cs.LG,stat.ME,stat.ML
LLMs for Cyber Security: New Opportunities
Large language models (LLMs) are a class of powerful and versatile models that are beneficial to many industries. With the emergence of LLMs, we take a fresh look at cyber security, specifically exploring and summarizing the potential of LLMs in addressing challenging problems in the security and safety domains.
Updated: 2024-04-17 12:58:51
标题: LLMs用于网络安全:新机遇
摘要: 大型语言模型(LLMs)是一类功能强大且多功能的模型,对许多行业都有益处。随着LLMs的出现,我们重新审视网络安全,具体探讨和总结LLMs在解决安全领域中具有挑战性问题的潜力。
更新时间: 2024-04-17 12:58:51
领域: cs.CR,cs.SE
SoccerNet Game State Reconstruction: End-to-End Athlete Tracking and Identification on a Minimap
Tracking and identifying athletes on the pitch holds a central role in collecting essential insights from the game, such as estimating the total distance covered by players or understanding team tactics. This tracking and identification process is crucial for reconstructing the game state, defined by the athletes' positions and identities on a 2D top-view of the pitch, (i.e. a minimap). However, reconstructing the game state from videos captured by a single camera is challenging. It requires understanding the position of the athletes and the viewpoint of the camera to localize and identify players within the field. In this work, we formalize the task of Game State Reconstruction and introduce SoccerNet-GSR, a novel Game State Reconstruction dataset focusing on football videos. SoccerNet-GSR is composed of 200 video sequences of 30 seconds, annotated with 9.37 million line points for pitch localization and camera calibration, as well as over 2.36 million athlete positions on the pitch with their respective role, team, and jersey number. Furthermore, we introduce GS-HOTA, a novel metric to evaluate game state reconstruction methods. Finally, we propose and release an end-to-end baseline for game state reconstruction, bootstrapping the research on this task. Our experiments show that GSR is a challenging novel task, which opens the field for future research. Our dataset and codebase are publicly available at https://github.com/SoccerNet/sn-gamestate.
Updated: 2024-04-17 12:53:45
标题: SoccerNet比赛状态重建:端到端运动员在小地图上的跟踪和识别
摘要: 在球场上追踪和识别运动员对于从比赛中收集重要见解(比如估计球员跑动的总距离或理解球队战术)起着核心作用。这种追踪和识别过程对于重建比赛状态至关重要,比赛状态由运动员在球场2D俯视图(即小地图)上的位置和身份定义。然而,从单个摄像机拍摄的视频重建比赛状态具有挑战性:它需要理解运动员的位置和摄像机的视角,才能在场地内定位和识别球员。在这项工作中,我们形式化了比赛状态重建任务,并推出了SoccerNet-GSR,一个专注于足球视频的新型比赛状态重建数据集。SoccerNet-GSR由200段30秒的视频序列组成,标注了用于球场定位和摄像机标定的937万个线点,以及球员在球场上的超过236万个位置及其各自的角色、球队和球衣号码。此外,我们引入了GS-HOTA,一种评估比赛状态重建方法的新型指标。最后,我们提出并发布了一个端到端的比赛状态重建基线,为这项任务的研究提供起点。我们的实验表明,GSR是一项具有挑战性的新任务,为未来研究开辟了空间。我们的数据集和代码库可在 https://github.com/SoccerNet/sn-gamestate 公开获取。
更新时间: 2024-04-17 12:53:45
领域: cs.CV,cs.AI,cs.LG
Investigating Gender Fairness in Machine Learning-driven Personalized Care for Chronic Pain
Chronic pain significantly diminishes the quality of life for millions worldwide. While psychoeducation and therapy can improve pain outcomes, many individuals experiencing pain lack access to evidence-based treatments or fail to complete the necessary number of sessions to achieve benefit. Reinforcement learning (RL) shows potential in tailoring personalized pain management interventions according to patients' individual needs while ensuring the efficient use of scarce clinical resources. However, clinicians, patients, and healthcare decision-makers are concerned that RL solutions could exacerbate disparities associated with patient characteristics like race or gender. In this article, we study gender fairness in personalized pain care recommendations using a real-world application of reinforcement learning (Piette et al., 2022a). Here, adhering to gender fairness translates to minimal or no disparity in the utility received by subpopulations as defined by gender. We investigate whether the selection of relevant patient information (referred to as features) used to assist decision-making affects gender fairness. Our experiments, conducted using real-world data (Piette, 2022), indicate that included features can impact gender fairness. Moreover, we propose an RL solution, NestedRecommendation, that demonstrates the ability: i) to adaptively learn to select the features that optimize for utility and fairness, and ii) to accelerate feature selection and, in turn, improve pain care recommendations from early on, by leveraging clinicians' domain expertise.
Updated: 2024-04-17 12:52:07
标题: 调查机器学习驱动的个性化慢性疼痛护理中的性别公平性
摘要: 长期疼痛严重影响全球数百万人的生活质量。虽然心理教育和治疗可以改善疼痛结果,但许多经历疼痛的个体缺乏获取基于证据的治疗或未能完成必要的会话次数以获益。强化学习(RL)在根据患者个体需求定制个性化疼痛管理干预方面显示出潜力,同时确保有效利用有限的临床资源。然而,临床医生、患者和医疗决策者担心RL解决方案可能加剧与患者特征(如种族或性别)相关的不平等现象。在本文中,我们使用强化学习的真实世界应用(Piette等人,2022a)研究了个性化疼痛护理建议中的性别公平性。在这里,遵循性别公平性意味着按性别定义的亚人群所接受的效用最小或没有差异。我们研究了用于辅助决策的相关患者信息(称为特征)的选择是否影响性别公平性。我们的实验使用真实世界数据(Piette,2022)表明包含的特征可能影响性别公平性。此外,我们提出了一个RL解决方案,NestedRecommendation,它展示了以下能力:i)能够自适应学习选择优化效用和公平性的特征,并且ii)通过利用临床医生的领域专业知识来加速特征选择,并进而改善早期提供的疼痛护理建议。
更新时间: 2024-04-17 12:52:07
领域: cs.LG,cs.CY
Toward Understanding the Disagreement Problem in Neural Network Feature Attribution
In recent years, neural networks have demonstrated their remarkable ability to discern intricate patterns and relationships from raw data. However, understanding the inner workings of these black box models remains challenging, yet crucial for high-stake decisions. Among the prominent approaches for explaining these black boxes are feature attribution methods, which assign relevance or contribution scores to each input variable for a model prediction. Despite the plethora of proposed techniques, ranging from gradient-based to backpropagation-based methods, a significant debate persists about which method to use. Various evaluation metrics have been proposed to assess the trustworthiness or robustness of their results. However, current research highlights disagreement among state-of-the-art methods in their explanations. Our work addresses this confusion by investigating the explanations' fundamental and distributional behavior. Additionally, through a comprehensive simulation study, we illustrate the impact of common scaling and encoding techniques on the explanation quality, assess their efficacy across different effect sizes, and demonstrate the origin of inconsistency in rank-based evaluation metrics.
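The disagreement discussed above, and the inconsistency of rank-based evaluation, can be made concrete with a toy measurement: compute the rank correlation and top-k overlap between two attribution vectors for the same prediction. The two "methods" below are synthetic stand-ins, not any particular explainer.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
n_features = 20

# Stand-ins for two attribution methods applied to the same prediction
# (e.g., a gradient-based and a backpropagation-based explainer).
attr_a = rng.normal(size=n_features)
attr_b = attr_a + rng.normal(scale=0.8, size=n_features)  # partially agreeing

rho, _ = spearmanr(attr_a, attr_b)            # rank agreement over all features
top_a = set(np.argsort(-np.abs(attr_a))[:5])  # top-5 features by |attribution|
top_b = set(np.argsort(-np.abs(attr_b))[:5])
print(f"Spearman rho: {rho:.2f}, top-5 overlap: {len(top_a & top_b)}/5")
```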
Updated: 2024-04-17 12:45:59
标题: 朝向理解神经网络特征归因中的分歧问题
摘要: 近年来,神经网络展示了它们从原始数据中识别复杂模式和关系的显著能力。然而,理解这些黑匣子模型的内部运作仍然具有挑战性,但对于高风险决策至关重要。在解释这些黑匣子的突出方法中,特征归因方法是其中之一,它为模型预测的每个输入变量分配相关性或贡献分数。尽管提出了大量技术,从基于梯度的到基于反向传播的方法,但在使用哪种方法上仍然存在重大争议。已经提出了各种评估指标来评估它们的结果的可信度或稳健性。然而,当前的研究突显出现代方法在解释方面存在分歧。我们的工作通过调查解释的基本和分布行为来解决这种混乱。此外,通过全面的模拟研究,我们展示了常见的缩放和编码技术对解释质量的影响,评估了它们在不同效果大小上的功效,并展示了在基于排名的评估指标中不一致性的来源。
更新时间: 2024-04-17 12:45:59
领域: stat.ML,cs.LG
On Learning Parities with Dependent Noise
In this expository note we show that the learning parities with noise (LPN) assumption is robust to weak dependencies in the noise distribution of small batches of samples. This provides a partial converse to the linearization technique of [AG11]. The material in this note is drawn from a recent work by the authors [GMR24], where the robustness guarantee was a key component in a cryptographic separation between reinforcement learning and supervised learning.
Updated: 2024-04-17 12:36:20
标题: 关于依赖噪声的奇偶性学习
摘要: 在这篇说明性短文中,我们展示了带噪声奇偶性学习(LPN)假设对小批量样本噪声分布中的弱依赖性是稳健的。这为[AG11]的线性化技术提供了部分逆命题。本文的材料取自作者最近的一项工作[GMR24],在该工作中,这一稳健性保证是强化学习与监督学习之间密码学分离的关键组成部分。
更新时间: 2024-04-17 12:36:20
领域: cs.CR,cs.DS
Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives
The Composed Image Retrieval (CIR) task aims to retrieve target images using a composed query consisting of a reference image and a modified text. Advanced methods often utilize contrastive learning as the optimization objective, which benefits from adequate positive and negative examples. However, the triplet for CIR incurs high manual annotation costs, resulting in limited positive examples. Furthermore, existing methods commonly use in-batch negative sampling, which reduces the number of negatives available to the model. To address the problem of lack of positives, we propose a data generation method by leveraging a multi-modal large language model to construct triplets for CIR. To introduce more negatives during fine-tuning, we design a two-stage fine-tuning framework for CIR, whose second stage introduces plenty of static representations of negatives to optimize the representation space rapidly. The above two improvements can be effectively stacked and designed to be plug-and-play, easily applied to existing CIR models without changing their original architectures. Extensive experiments and ablation analysis demonstrate that our method effectively scales positives and negatives and achieves state-of-the-art results on both FashionIQ and CIRR datasets. In addition, our methods also perform well in zero-shot composed image retrieval, providing a new CIR solution for the low-resources scenario.
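As a rough illustration of the second-stage idea, enlarging the negative set with many precomputed representations, here is a hedged InfoNCE-style sketch in PyTorch; the function name and static-bank mechanics are our assumptions, not the paper's exact design.

```python
import torch
import torch.nn.functional as F

def contrastive_loss_with_static_negatives(query, positive, static_negatives, tau=0.07):
    """InfoNCE-style loss where, beyond in-batch negatives, a large bank of
    precomputed ("static") negative embeddings enlarges the negative set.

    query:            (B, D) composed-query embeddings (image + text)
    positive:         (B, D) target-image embeddings
    static_negatives: (N, D) frozen negative embeddings, N >> B
    """
    query = F.normalize(query, dim=-1)
    positive = F.normalize(positive, dim=-1)
    static_negatives = F.normalize(static_negatives, dim=-1)

    pos_logits = (query * positive).sum(-1, keepdim=True)   # (B, 1)
    in_batch = query @ positive.t()                         # (B, B)
    extra = query @ static_negatives.t()                    # (B, N)
    mask = torch.eye(query.size(0), dtype=torch.bool)
    in_batch = in_batch.masked_fill(mask, float("-inf"))    # drop self-pairs
    logits = torch.cat([pos_logits, in_batch, extra], dim=1) / tau
    labels = torch.zeros(query.size(0), dtype=torch.long)   # positives at index 0
    return F.cross_entropy(logits, labels)

loss = contrastive_loss_with_static_negatives(
    torch.randn(4, 16), torch.randn(4, 16), torch.randn(256, 16))
print(loss)
```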
Updated: 2024-04-17 12:30:54
标题: 通过缩放正负样本的对比学习来改善组合图像检索
摘要: 组合图像检索(CIR)任务旨在使用由参考图像和修改文本组成的组合查询来检索目标图像。先进的方法通常利用对比学习作为优化目标,这有赖于充足的正负示例。然而,CIR的三元组需要高昂的人工标注成本,因此正例有限。此外,现有方法通常使用批内负采样,这减少了模型可用的负例数量。为了解决正例不足的问题,我们提出了一种数据生成方法,利用多模态大型语言模型构建CIR的三元组。为了在微调过程中引入更多负例,我们为CIR设计了一个两阶段微调框架,其第二阶段引入了大量静态负例表示以快速优化表示空间。上述两项改进可以有效堆叠并设计成即插即用,易于应用于现有CIR模型而无需改变它们的原始架构。大量实验和消融分析表明,我们的方法有效扩展了正例和负例,并在FashionIQ和CIRR数据集上取得了最先进的结果。此外,我们的方法在零样本组合图像检索中表现良好,为低资源场景提供了一种新的CIR解决方案。
更新时间: 2024-04-17 12:30:54
领域: cs.CV,cs.AI
Bridging the Gap: Learning Pace Synchronization for Open-World Semi-Supervised Learning
In open-world semi-supervised learning, a machine learning model is tasked with uncovering novel categories from unlabeled data while maintaining performance on seen categories from labeled data. The central challenge is the substantial learning gap between seen and novel categories, as the model learns the former faster due to accurate supervisory information. Moreover, capturing the semantics of unlabeled novel category samples is also challenging due to the missing label information. To address the above issues, we introduce 1) the adaptive synchronizing marginal loss which imposes class-specific negative margins to alleviate the model bias towards seen classes, and 2) the pseudo-label contrastive clustering which exploits pseudo-labels predicted by the model to group unlabeled data from the same category together in the output space. Extensive experiments on benchmark datasets demonstrate that previous approaches may significantly hinder novel class learning, whereas our method strikingly balances the learning pace between seen and novel classes, achieving a remarkable 3% average accuracy increase on the ImageNet dataset. Importantly, we find that fine-tuning the self-supervised pre-trained model significantly boosts the performance, which is overlooked in prior literature. Our code is available at https://github.com/yebo0216best/LPS-main.
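One plausible reading of the adaptive synchronizing marginal loss, a per-class negative margin that handicaps fast-learning seen classes, can be sketched as follows; the margin placement and values are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def margin_adjusted_loss(logits, targets, class_margins):
    """Cross-entropy with a per-class margin subtracted from the logits, so
    classes with larger margins (e.g., fast-learning seen classes) receive a
    handicap that synchronizes the learning pace with novel classes.

    logits:        (B, C) raw scores
    targets:       (B,)   labels
    class_margins: (C,)   margin per class (larger = stronger handicap)
    """
    adjusted = logits - class_margins.unsqueeze(0)  # penalize every column of a handicapped class
    return F.cross_entropy(adjusted, targets)

# Toy usage: seen classes 0-4 get a margin, novel classes 5-9 get none.
logits = torch.randn(8, 10)
targets = torch.randint(0, 10, (8,))
margins = torch.cat([torch.full((5,), 0.5), torch.zeros(5)])
print(margin_adjusted_loss(logits, targets, margins))
```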
Updated: 2024-04-17 12:27:25
标题: 桥接鸿沟:开放世界半监督学习的学习节奏同步
摘要: 在开放世界的半监督学习中,一个机器学习模型被要求从未标记的数据中发现新颖的类别,同时保持在标记数据中已见类别的性能。中心挑战在于已见和新颖类别之间的显著学习差距,因为模型由于准确的监督信息而更快地学习前者。此外,由于缺少标签信息,捕捉未标记新颖类别样本的语义也具有挑战性。为了解决以上问题,我们引入了1)自适应同步边际损失,它对特定类别施加负边际以减轻模型对已见类别的偏见,以及2)伪标签对比聚类,它利用模型预测的伪标签将同一类别的未标记数据聚集在输出空间中。对基准数据集进行的大量实验证明,先前的方法可能会严重阻碍新颖类别的学习,而我们的方法显著平衡了已见和新颖类别之间的学习速度,在ImageNet数据集上实现了显著的3%平均准确率提升。重要的是,我们发现微调自监督预训练模型显著提升了性能,这在先前的文献中被忽视了。我们的代码可以在https://github.com/yebo0216best/LPS-main 上找到。
更新时间: 2024-04-17 12:27:25
领域: cs.LG,cs.CV
NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results
This paper reviews the NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions were submitted and evaluated on the collected dataset KVQ from the popular short-form video platform, i.e., Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The purpose is to build new benchmarks and advance the development of S-UGC VQA. The competition had 200 participants and 13 teams submitted valid solutions for the final testing phase. The proposed solutions achieved state-of-the-art performances for S-UGC VQA. The project can be found at https://github.com/lixinustc/KVQChallenge-CVPR-NTIRE2024.
Updated: 2024-04-17 12:26:13
标题: NTIRE 2024挑战赛:短视频UGC质量评估的方法和结果
摘要: 这篇论文回顾了NTIRE 2024挑战,主题是短视频UGC视频质量评估(S-UGC VQA),各种优秀解决方案在收集的来自流行短视频平台快手/Kwai平台的KVQ数据集上提交并评估。KVQ数据库分为三部分,包括2926个用于训练的视频,420个用于验证的视频和854个用于测试的视频。其目的是建立新的基准并推动S-UGC VQA的发展。比赛共有200名参与者,13支队伍提交了有效解决方案进行最终测试阶段。所提出的解决方案在S-UGC VQA方面取得了最先进的性能。该项目可在https://github.com/lixinustc/KVQChallenge-CVPR-NTIRE2024找到。
更新时间: 2024-04-17 12:26:13
领域: eess.IV,cs.AI
Use of Parallel Explanatory Models to Enhance Transparency of Neural Network Configurations for Cell Degradation Detection
In a previous paper, we have shown that a recurrent neural network (RNN) can be used to detect cellular network radio signal degradations accurately. We unexpectedly found, though, that accuracy gains diminished as we added layers to the RNN. To investigate this, in this paper, we build a parallel model to illuminate and understand the internal operation of neural networks, such as the RNN, which store their internal state in order to process sequential inputs. This model is widely applicable in that it can be used with any input domain where the inputs can be represented by a Gaussian mixture. By looking at the RNN processing from a probability density function perspective, we are able to show how each layer of the RNN transforms the input distributions to increase detection accuracy. At the same time we also discover a side effect acting to limit the improvement in accuracy. To demonstrate the fidelity of the model we validate it against each stage of RNN processing as well as the output predictions. As a result, we have been able to explain the reasons for the RNN performance limits with useful insights for future designs for RNNs and similar types of neural network.
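For the linear part of a layer, tracking a Gaussian-mixture input through the network has a closed form: each component's mean and covariance transform as mu' = W mu + b and Sigma' = W Sigma W^T (nonlinearities require further approximation). A minimal sketch with illustrative dimensions:

```python
import numpy as np

def propagate_gaussian_through_linear(mu, Sigma, W, b):
    """Closed-form push-forward of one Gaussian mixture component
    through a linear layer y = W x + b."""
    return W @ mu + b, W @ Sigma @ W.T

# Toy mixture with two components in R^3 pushed into R^2.
rng = np.random.default_rng(2)
W, b = rng.normal(size=(2, 3)), rng.normal(size=2)
components = [
    (np.array([0.0, 0.0, 0.0]), np.eye(3)),
    (np.array([1.0, -1.0, 0.5]), 0.5 * np.eye(3)),
]
for mu, Sigma in components:
    mu_out, Sigma_out = propagate_gaussian_through_linear(mu, Sigma, W, b)
    print(mu_out, np.diag(Sigma_out))
```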
Updated: 2024-04-17 12:22:54
标题: 使用并行解释模型增强用于蜂窝小区劣化检测的神经网络配置的透明度
摘要: 在先前的一篇论文中,我们已经展示了递归神经网络(RNN)可以准确地检测蜂窝网络无线信号的劣化。然而,我们意外地发现,随着我们向RNN添加层级,准确性增益减少。为了调查这一现象,在本文中,我们建立了一个并行模型来阐明和理解神经网络(比如RNN)的内部运作,这类网络存储其内部状态以处理序列输入。该模型具有广泛的适用性,因为它可以用于任何输入可以用高斯混合表示的输入域。通过从概率密度函数的角度观察RNN处理过程,我们能够展示RNN的每一层如何变换输入分布以提高检测准确性。与此同时,我们还发现了一个限制准确性提高的副作用。为了证明模型的保真度,我们将其与RNN处理的每个阶段以及输出预测进行了对比验证。因此,我们得以解释RNN性能受限的原因,并为未来RNN及类似类型神经网络的设计提供了有用的见解。
更新时间: 2024-04-17 12:22:54
领域: cs.LG
The Impact of AI Tool on Engineering at ANZ Bank: An Empirical Study on GitHub Copilot within Corporate Environment
The increasing popularity of AI, particularly Large Language Models (LLMs), has significantly impacted various domains, including Software Engineering. This study explores the integration of AI tools in software engineering practices within a large organization. We focus on ANZ Bank, which employs over 5000 engineers covering all aspects of the software development life cycle. This paper details an experiment conducted using GitHub Copilot, a notable AI tool, within a controlled environment to evaluate its effectiveness in real-world engineering tasks. Additionally, this paper shares initial findings on the productivity improvements observed after GitHub Copilot was adopted on a large scale, with about 1000 engineers using it. ANZ Bank's six-week experiment with GitHub Copilot included two weeks of preparation and four weeks of active testing. The study evaluated participant sentiment and the tool's impact on productivity, code quality, and security. Initially, participants used GitHub Copilot for proposed use-cases, with their feedback gathered through regular surveys. In the second phase, they were divided into Control and Copilot groups, each tackling the same Python challenges, and their experiences were again surveyed. Results showed a notable boost in productivity and code quality with GitHub Copilot, though its impact on code security remained inconclusive. Participant responses were overall positive, confirming GitHub Copilot's effectiveness in large-scale software engineering environments. Early data from 1000 engineers also indicated a significant increase in productivity and job satisfaction.
Updated: 2024-04-17 12:14:40
标题: AI工具对澳新银行工程的影响:GitHub Copilot在企业环境中的实证研究
摘要: 人工智能的日益普及,特别是大型语言模型(LLMs),显著影响了包括软件工程在内的各个领域。本研究探讨了在大型组织中将人工智能工具整合到软件工程实践中。我们以涵盖软件开发生命周期各个方面的5000多名工程师的澳新银行为重点。本文详细介绍了在受控环境中使用备受关注的AI工具GitHub Copilot进行的实验,以评估其在现实工程任务中的有效性。此外,本文分享了在大规模采用GitHub Copilot后观察到的生产力改进的初步发现,大约有1000名工程师在使用该工具。澳新银行对GitHub Copilot进行了为期六周的实验,包括两周的准备和四周的积极测试。该研究评估了参与者的情绪以及该工具对生产力、代码质量和安全性的影响。最初,参与者使用GitHub Copilot进行拟议的用例,并通过定期调查收集他们的反馈意见。在第二阶段,他们被分为控制组和Copilot组,分别解决相同的Python挑战,并再次进行调查。结果显示,GitHub Copilot显著提升了生产力和代码质量,尽管其对代码安全性的影响尚不明确。参与者的反应总体上是积极的,确认了GitHub Copilot在大规模软件工程环境中的有效性。来自1000名工程师的初步数据还表明生产力和工作满意度显著提高。
更新时间: 2024-04-17 12:14:40
领域: cs.SE,cs.AI
A Semantic Segmentation-guided Approach for Ground-to-Aerial Image Matching
Nowadays, the accurate geo-localization of ground-view images has an important role across domains as diverse as journalism, forensics analysis, transports, and Earth Observation. This work addresses the problem of matching a query ground-view image with the corresponding satellite image without GPS data. This is done by comparing the features from a ground-view image and a satellite one, innovatively leveraging the latter's corresponding segmentation mask through a three-stream Siamese-like network. The proposed method, Semantic Align Net (SAN), focuses on limited Field-of-View (FoV) and ground panorama images (images with a FoV of 360°). The novelty lies in the fusion of satellite images in combination with their semantic segmentation masks, aimed at ensuring that the model can extract useful features and focus on the significant parts of the images. This work shows how SAN through semantic analysis of images improves the performance on the unlabelled CVUSA dataset for all the tested FoVs.
Updated: 2024-04-17 12:13:18
标题: 一种基于语义分割引导的地面至空中图像匹配方法
摘要: 如今,地面图像的精确地理定位在新闻报道、法证分析、交通运输和地球观测等各个领域都起着重要作用。本文研究了在没有GPS数据的情况下,将查询地面图像与相应的卫星图像进行匹配的问题。这是通过比较地面图像和卫星图像的特征,创新地利用后者的分割掩模,通过三流Siamese-like网络来实现的。所提出的方法,语义对齐网络(SAN),侧重于有限视野(FoV)和地面全景图像(视野为360度的图像)。其创新之处在于将卫星图像与其语义分割掩模融合在一起,旨在确保模型能够提取有用的特征并关注图像的重要部分。本文展示了SAN通过对图像进行语义分析,如何在未标记的CVUSA数据集上提高了所有测试FoV的性能。
更新时间: 2024-04-17 12:13:18
领域: cs.CV,cs.LG
Learning from Unlabelled Data with Transformers: Domain Adaptation for Semantic Segmentation of High Resolution Aerial Images
Data from satellites or aerial vehicles are most of the times unlabelled. Annotating such data accurately is difficult, requires expertise, and is costly in terms of time. Even if Earth Observation (EO) data were correctly labelled, labels might change over time. Learning from unlabelled data within a semi-supervised learning framework for segmentation of aerial images is challenging. In this paper, we develop a new model for semantic segmentation of unlabelled images, the Non-annotated Earth Observation Semantic Segmentation (NEOS) model. NEOS performs domain adaptation as the target domain does not have ground truth semantic segmentation masks. The distribution inconsistencies between the target and source domains are due to differences in acquisition scenes, environment conditions, sensors, and times. Our model aligns the learned representations of the different domains to make them coincide. The evaluation results show that NEOS is successful and outperforms other models for semantic segmentation of unlabelled data.
Updated: 2024-04-17 12:12:48
标题: 使用Transformer模型从无标签数据中学习:高分辨率航空图像语义分割的域自适应
摘要: 卫星或航空器的数据大多数情况下是未标记的。准确注释这些数据是困难的,需要专业知识,并且在时间上成本高昂。即使地球观测(EO)数据被正确标记,标签也可能随时间变化。在半监督学习框架下学习未标记数据用于航空图像分割是具有挑战性的。在本文中,我们开发了一个新的模型用于未标记图像的语义分割,即非标注地球观测语义分割(NEOS)模型。NEOS执行域自适应,因为目标域没有地面真值语义分割掩模。目标和源域之间的分布不一致是由于采集场景、环境条件、传感器和时间的差异。我们的模型对不同域的学习表示进行对齐以使它们重合。评估结果表明NEOS成功并在未标记数据的语义分割中胜过其他模型。
更新时间: 2024-04-17 12:12:48
领域: cs.CV,cs.LG
Do Counterfactual Examples Complicate Adversarial Training?
We leverage diffusion models to study the robustness-performance tradeoff of robust classifiers. Our approach introduces a simple, pretrained diffusion method to generate low-norm counterfactual examples (CEs): semantically altered data which results in different true class membership. We report that the confidence and accuracy of robust models on their clean training data are associated with the proximity of the data to their CEs. Moreover, robust models perform very poorly when evaluated on the CEs directly, as they become increasingly invariant to the low-norm, semantic changes brought by CEs. The results indicate a significant overlap between non-robust and semantic features, countering the common assumption that non-robust features are not interpretable.
Updated: 2024-04-17 12:09:17
标题: 反事实例是否会使对抗训练变得更复杂?
摘要: 我们利用扩散模型研究鲁棒分类器的鲁棒性与性能权衡。我们的方法引入了一种简单的、预先训练好的扩散方法,用于生成低范数的反事实例子(CEs):经语义改变、从而导致真实类别归属发生变化的数据。我们报告说,鲁棒模型在其干净训练数据上的置信度和准确性与数据与它们的CEs的接近程度有关。此外,当直接在CEs上评估时,鲁棒模型表现非常糟糕,因为它们对CEs带来的低范数、语义变化变得越来越不变。结果表明,非鲁棒和语义特征之间存在显著的重叠,反驳了非鲁棒特征不可解释的常见假设。
更新时间: 2024-04-17 12:09:17
领域: cs.LG,cs.CV
AAVDiff: Experimental Validation of Enhanced Viability and Diversity in Recombinant Adeno-Associated Virus (AAV) Capsids through Diffusion Generation
Recombinant adeno-associated virus (rAAV) vectors have revolutionized gene therapy, but their broad tropism and suboptimal transduction efficiency limit their clinical applications. To overcome these limitations, researchers have focused on designing and screening capsid libraries to identify improved vectors. However, the large sequence space and limited resources present challenges in identifying viable capsid variants. In this study, we propose an end-to-end diffusion model to generate capsid sequences with enhanced viability. Using publicly available AAV2 data, we generated 38,000 diverse AAV2 viral protein (VP) sequences, and evaluated 8,000 for viral selection. The results attested to the superiority of our model compared to traditional methods. Additionally, in the absence of AAV9 capsid data, apart from one wild-type sequence, we used the same model to directly generate a number of viable sequences with up to 9 mutations, and we transferred the remaining 30,000 samples to the AAV9 domain. Furthermore, we conducted mutagenesis on AAV9 VP hypervariable regions VI and V, contributing to the continuous improvement of the AAV9 VP sequence. This research represents a significant advancement in the design and functional validation of rAAV vectors, offering innovative solutions to enhance specificity and transduction efficiency in gene therapy applications.
Updated: 2024-04-17 12:08:46
标题: AAVDiff:通过扩散生成实验验证重组腺相关病毒(AAV)壳体增强存活率和多样性
摘要: 重组腺相关病毒(rAAV)载体已经彻底改变了基因治疗,但它们广泛的嗜性和欠佳的转导效率限制了它们的临床应用。为了克服这些限制,研究人员专注于设计和筛选衣壳库,以找到改进的载体。然而,庞大的序列空间和有限的资源在识别可行的衣壳变体方面存在挑战。在这项研究中,我们提出了一个端到端的扩散模型,用于生成具有增强生存能力的衣壳序列。利用公开可用的AAV2数据,我们生成了38,000个不同的AAV2病毒蛋白(VP)序列,并对8,000个进行了病毒选择。结果证实了我们的模型相对于传统方法的优越性。此外,在缺乏AAV9衣壳数据的情况下,除了一个野生型序列外,我们使用相同的模型直接生成了一些具有多达9个突变的可行序列,并将其余的30,000个样本转移到AAV9域。此外,我们对AAV9 VP的超变区域VI和V进行了突变,为持续改进AAV9 VP序列做出了贡献。这项研究代表了rAAV载体的设计和功能验证的重大进展,为增强基因治疗应用中的特异性和转导效率提供了创新解决方案。
更新时间: 2024-04-17 12:08:46
领域: cs.AI,cs.CE
How to Exhibit More Predictable Behaviors
This paper looks at predictability problems, i.e., wherein an agent must choose its strategy in order to optimize the predictions that an external observer could make. We address these problems while taking into account uncertainties on the environment dynamics and on the observed agent's policy. To that end, we assume that the observer 1. seeks to predict the agent's future action or state at each time step, and 2. models the agent using a stochastic policy computed from a known underlying problem, and we leverage on the framework of observer-aware Markov decision processes (OAMDPs). We propose action and state predictability performance criteria through reward functions built on the observer's belief about the agent policy; show that these induced predictable OAMDPs can be represented by goal-oriented or discounted MDPs; and analyze the properties of the proposed reward functions both theoretically and empirically on two types of grid-world problems.
Updated: 2024-04-17 12:06:17
标题: 如何展现更可预测的行为
摘要: 本文探讨了可预测性问题,即一个代理必须选择其策略以优化外部观察者可能做出的预测。我们在考虑环境动态和观察代理政策的不确定性的情况下解决了这些问题。为此,我们假设观察者1. 寻求在每个时间步预测代理的未来动作或状态,2. 使用从已知基本问题计算出的随机策略对代理进行建模,并利用观察者感知马尔可夫决策过程(OAMDPs)的框架。我们通过基于观察者对代理策略的信念构建的奖励函数提出了行动和状态可预测性绩效标准;展示了这些引发可预测性的OAMDPs可以通过面向目标或折扣的MDPs来表示;并从理论和实证的角度分析了所提出的奖励函数在两种类型的网格世界问题上的性质。
更新时间: 2024-04-17 12:06:17
领域: cs.AI
Inductive Cognitive Diagnosis for Fast Student Learning in Web-Based Online Intelligent Education Systems
Cognitive diagnosis aims to gauge students' mastery levels based on their response logs. Serving as a pivotal module in web-based online intelligent education systems (WOIESs), it plays an upstream and fundamental role in downstream tasks like learning item recommendation and computerized adaptive testing. WOIESs are open learning environments where numerous new students constantly register and complete exercises. In WOIESs, efficient cognitive diagnosis is crucial to fast feedback and accelerating student learning. However, the existing cognitive diagnosis methods always employ intrinsically transductive student-specific embeddings, which become slow and costly due to retraining when dealing with new students who are unseen during training. To this end, this paper proposes an inductive cognitive diagnosis model (ICDM) for fast new students' mastery levels inference in WOIESs. Specifically, in ICDM, we propose a novel student-centered graph (SCG). Rather than inferring mastery levels through updating student-specific embedding, we derive the inductive mastery levels as the aggregated outcomes of students' neighbors in SCG. Namely, SCG enables to shift the task from finding the most suitable student-specific embedding that fits the response logs to finding the most suitable representations for different node types in SCG, and the latter is more efficient since it no longer requires retraining. To obtain this representation, ICDM consists of a construction-aggregation-generation-transformation process to learn the final representation of students, exercises and concepts. Extensive experiments across real-world datasets show that, compared with the existing cognitive diagnosis methods that are always transductive, ICDM is much more faster while maintains the competitive inference performance for new students.
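A minimal sketch of the inductive idea, inferring a new student's mastery from the learned representations of their graph neighbors rather than from a retrained embedding, is given below; the correctness weighting and sigmoid squashing are our simplifying assumptions, not the paper's construction-aggregation-generation-transformation pipeline.

```python
import numpy as np

def inductive_mastery(response_log, exercise_emb):
    """Infer a new student's mastery vector without retraining: aggregate the
    learned representations of the exercises the student answered (their
    neighbors in a student-centered graph), weighted by correctness.

    response_log: list of (exercise_id, correct) pairs for the new student
    exercise_emb: (E, K) learned exercise representations, K = #concepts
    """
    weights = np.array([1.0 if c else -1.0 for _, c in response_log])
    neigh = exercise_emb[[e for e, _ in response_log]]
    return 1 / (1 + np.exp(-(weights[:, None] * neigh).mean(axis=0)))  # squash to [0, 1]

exercise_emb = np.random.default_rng(3).normal(size=(100, 8))
print(inductive_mastery([(4, True), (17, False), (42, True)], exercise_emb))
```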
Updated: 2024-04-17 11:55:43
标题: 面向基于网络的在线智能教育系统中学生快速学习的归纳式认知诊断
摘要: 认知诊断旨在根据学生的作答记录来衡量他们的掌握水平。作为基于网络的在线智能教育系统(WOIES)中的一个关键模块,它在学习项目推荐和计算机化自适应测试等下游任务中发挥着上游和基础性作用。WOIES是开放的学习环境,不断有大量新学生注册并完成练习。在这样的系统中,高效的认知诊断对于快速反馈和加速学生学习至关重要。然而,现有的认知诊断方法通常采用本质上直推式的学生特定嵌入,当处理训练期间未见过的新学生时,由于需要重新训练而变得缓慢且昂贵。因此,本文提出了一种归纳式认知诊断模型(ICDM),用于在WOIES中快速推断新学生的掌握水平。具体来说,在ICDM中,我们提出了一种新颖的以学生为中心的图(SCG)。我们不是通过更新学生特定嵌入来推断掌握水平,而是将归纳式掌握水平作为SCG中学生邻居的聚合结果。换句话说,SCG使任务从寻找最适合作答记录的学生特定嵌入转变为寻找SCG中不同节点类型的最适合表示,后者更高效,因为不再需要重新训练。为了获得这种表示,ICDM包括一个构建-聚合-生成-转换过程来学习学生、练习和概念的最终表示。在真实数据集上的大量实验表明,与现有的始终为直推式的认知诊断方法相比,ICDM对新学生的推断速度要快得多,同时保持了有竞争力的推断性能。
更新时间: 2024-04-17 11:55:43
领域: cs.AI
Amplifying Main Memory-Based Timing Covert and Side Channels using Processing-in-Memory Operations
The adoption of processing-in-memory (PiM) architectures has been gaining momentum because they provide high performance and low energy consumption by alleviating the data movement bottleneck. Yet, the security of such architectures has not been thoroughly explored. The adoption of PiM solutions provides a new way to directly access main memory, which can be potentially exploited by malicious user applications. We show that this new way to access main memory opens opportunities for high-throughput timing attack vectors that are hard-to-mitigate without significant performance overhead. We introduce IMPACT, a set of high-throughput main memory-based timing attacks that leverage characteristics of PiM architectures to establish covert and side channels. IMPACT enables high-throughput communication and private information leakage. To achieve this, IMPACT (i) eliminates expensive cache bypassing steps required by processor-centric main memory and cache-based timing attacks and (ii) leverages the intrinsic parallelism of PiM operations. First, we showcase two covert-channel attack variants that run on the host CPU and leverage PiM architectures to gain direct and fast access to main memory and establish high-throughput communication covert channels. Second, we showcase a side-channel attack on a DNA sequence analysis application that leaks the private characteristics of a user's sample genome by leveraging PiM operations. Our results demonstrate that (i) our covert channels achieve up to 14.16 Mb/s communication throughput, which is 6.38x faster than the state-of-the-art main memory-based covert channels, and (ii) our side-channel attack allows the attacker to determine the properties of a sample genome at a throughput of 7.5 Mb/s with 96% accuracy. We discuss and evaluate several countermeasures for IMPACT to enable secure and robust PiM architectures.
Updated: 2024-04-17 11:48:14
标题: 利用内存中的处理操作加强基于主内存的时序隐蔽和侧信道
摘要: 处理内存(PiM)架构通过缓解数据移动瓶颈提供高性能和低能耗,其采用正日益增长。然而,这种架构的安全性尚未得到彻底探讨。采用PiM解决方案提供了一种新的直接访问主内存的方式,这可能会被恶意用户应用程序利用。我们展示了这种新的访问主存储器的方法为高吞吐量的时序攻击向量提供了机会,这些攻击向量很难在没有显著性能开销的情况下得到缓解。 我们引入了IMPACT,一组基于高吞吐量主存储器的时序攻击,利用PiM架构的特性建立隐蔽和侧通道。IMPACT实现了高吞吐量通信和私人信息泄漏。为了实现这一目标,IMPACT(i)消除了处理器为中心的主存储器和基于缓存的时序攻击所需的昂贵的缓存绕过步骤,(ii)利用了PiM操作的固有并行性。首先,我们展示了两种在主机CPU上运行的隐蔽通道攻击变体,利用PiM架构获得对主内存的直接和快速访问,并建立高吞吐量通信隐蔽通道。其次,我们展示了对DNA序列分析应用程序的侧通道攻击,通过利用PiM操作泄露用户样本基因组的私人特征。我们的结果表明,(i)我们的隐蔽通道实现了高达14.16 Mb/s的通信吞吐量,比最先进的基于主存储器的隐蔽通道快6.38倍,(ii)我们的侧通道攻击使攻击者能够以7.5 Mb/s的吞吐量和96%的准确率确定样本基因组的特性。我们讨论和评估了IMPACT的若干反制措施,以实现安全和稳健的PiM架构。
更新时间: 2024-04-17 11:48:14
领域: cs.CR,cs.AR
Unlocking Bias Detection: Leveraging Transformer-Based Models for Content Analysis
Bias detection in text is crucial for combating the spread of negative stereotypes, misinformation, and biased decision-making. Traditional language models frequently face challenges in generalizing beyond their training data and are typically designed for a single task, often focusing on bias detection at the sentence level. To address this, we present the Contextualized Bi-Directional Dual Transformer (CBDT) classifier. This model combines two complementary transformer networks: the Context Transformer and the Entity Transformer, with a focus on improving bias detection capabilities. We have prepared a dataset specifically for training these models to identify and locate biases in texts. Our evaluations across various datasets demonstrate CBDT's effectiveness in distinguishing biased narratives from neutral ones and identifying specific biased terms. This work paves the way for applying the CBDT model in various linguistic and cultural contexts, enhancing its utility in bias detection efforts. We also make the annotated dataset available for research purposes.
Updated: 2024-04-17 11:48:11
标题: 解锁偏见检测:利用基于Transformer的模型进行内容分析
摘要: 文本中的偏见检测对于遏制负面刻板印象、误导信息和偏见决策至关重要。传统的语言模型通常面临着超越训练数据的泛化挑战,并且通常设计用于单一任务,常常侧重于句子级别的偏见检测。为了解决这一问题,我们提出了Contextualized Bi-Directional Dual Transformer(CBDT)分类器。该模型结合了两个互补的Transformer网络:Context Transformer和Entity Transformer,注重提高偏见检测能力。我们特别准备了一个数据集用于训练这些模型以识别和定位文本中的偏见。我们在各种数据集上的评估表明,CBDT在区分有偏见叙述和中立叙述以及识别特定有偏见术语方面的有效性。这项工作为在各种语言和文化背景下应用CBDT模型打开了道路,提高了其在偏见检测工作中的实用性。我们还提供了为研究目的而注释的数据集。
更新时间: 2024-04-17 11:48:11
领域: cs.CL,cs.AI
KnowTuning: Knowledge-aware Fine-tuning for Large Language Models
Despite their success at many natural language processing (NLP) tasks, large language models still struggle to effectively leverage knowledge for knowledge-intensive tasks, manifesting limitations such as generating incomplete, non-factual, or illogical answers. These limitations stem from inadequate knowledge awareness of LLMs during vanilla fine-tuning. To address these problems, we propose a knowledge-aware fine-tuning (KnowTuning) method to improve fine-grained and coarse-grained knowledge awareness of LLMs. We devise a fine-grained knowledge augmentation stage to train LLMs to identify difficult fine-grained knowledge in answers. We also propose a coarse-grained knowledge comparison stage to train LLMs to distinguish between reliable and unreliable knowledge, in three aspects: completeness, factuality, and logicality. Extensive experiments on both generic and medical question answering (QA) datasets confirm the effectiveness of KnowTuning, through automatic and human evaluations, across various sizes of LLMs. We further verify that KnowTuning generates more facts with less factual error rate under fine-grained facts evaluation.
Updated: 2024-04-17 11:45:00
标题: KnowTuning:针对大型语言模型的知识感知微调
摘要: 尽管大型语言模型在许多自然语言处理(NLP)任务中取得了成功,但它们在利用知识进行知识密集型任务时仍然存在困难,表现出生成不完整、非事实性或不合逻辑的答案等限制。这些限制源于普通微调期间LLM的知识意识不足。为解决这些问题,我们提出了一种知识感知微调(KnowTuning)方法,以提高LLM的精细和粗粒度知识意识。我们设计了一个精细知识增强阶段,以训练LLM识别答案中的困难精细知识。我们还提出了一个粗粒度知识比较阶段,以训练LLM区分可靠和不可靠的知识,在完整性、事实性和逻辑性三个方面。通过对通用和医学问答(QA)数据集进行广泛实验,通过自动和人工评估,跨各种大小的LLM证实了KnowTuning的有效性。我们进一步验证了在精细事实评估中,KnowTuning生成更多事实且具有更低的事实错误率。
更新时间: 2024-04-17 11:45:00
领域: cs.CL,cs.AI
SoK: Decentralized Finance (DeFi) -- Fundamentals, Taxonomy and Risks
Decentralized Finance (DeFi) refers to financial services that are not necessarily related to crypto-currencies. By employing blockchain for security and integrity, DeFi creates new possibilities that attract retail and institution users, including central banks. Given its novel applications and sophisticated designs, the distinction between DeFi services and understanding the risk involved is often complex. This work systematically presents the major categories of DeFi protocols that cover over 90% of total value locked (TVL) in DeFi. It establishes a structured methodology to differentiate between DeFi protocols based on their design and architecture. Every DeFi protocol is classified into one of three groups: liquidity pools, pegged and synthetic tokens, and aggregator protocols, followed by risk analysis. In particular, we classify stablecoins, liquid staking tokens, and bridged (wrapped) assets as pegged tokens resembling similar risks. The full risk exposure of DeFi users is derived not only from the DeFi protocol design but also from how it is used and with which tokens.
Updated: 2024-04-17 11:42:53
标题: SoK: 去中心化金融(DeFi)-- 基础知识、分类和风险
摘要: DeFi(去中心化金融)指的是不一定与加密货币相关的金融服务。通过利用区块链技术确保安全性和完整性,DeFi创造了吸引零售和机构用户(包括央行)的新可能性。由于其新颖的应用和复杂的设计,理解DeFi服务之间的区别以及涉及的风险往往是复杂的。本文系统地介绍了涵盖DeFi中超过90%总锁定价值(TVL)的主要DeFi协议类别。它建立了一种结构化方法来区分基于设计和架构的DeFi协议。每个DeFi协议被分类为三种类型之一:流动性池、固定和合成代币、聚合器协议,然后进行风险分析。特别是,我们将稳定币、流动性质押代币和桥接(封装)资产分类为具有类似风险的固定代币。DeFi用户的完整风险暴露不仅取决于DeFi协议的设计,还取决于它的使用方式以及使用哪些代币。
更新时间: 2024-04-17 11:42:53
领域: cs.CR
Image Generative Semantic Communication with Multi-Modal Similarity Estimation for Resource-Limited Networks
To reduce network traffic and support environments with limited resources, a method for transmitting images with low amounts of transmission data is required. Machine learning-based image compression methods, which compress the data size of images while maintaining their features, have been proposed. However, in certain situations, reconstructing a part of semantic information of images at the receiver end may be sufficient. To realize this concept, semantic-information-based communication, called semantic communication, has been proposed, along with an image transmission method using semantic communication. This method transmits only the semantic information of an image, and the receiver reconstructs the image using an image-generation model. This method utilizes one type of semantic information, but reconstructing images similar to the original image using only it is challenging. This study proposes a multi-modal image transmission method that leverages diverse semantic information for efficient semantic communication. The proposed method extracts multi-modal semantic information from an image and transmits only it. Subsequently, the receiver generates multiple images using an image-generation model and selects an output based on semantic similarity. The receiver must select the output based only on the received features; however, evaluating semantic similarity using conventional metrics is challenging. Therefore, this study explored new metrics to evaluate the similarity between semantic features of images and proposes two scoring procedures. The results indicate that the proposed procedures can compare semantic similarities, such as position and composition, between semantic features of the original and generated images. Thus, the proposed method can facilitate the transmission and utilization of photographs through mobile networks for various service applications.
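The receiver-side selection step is easy to make concrete: score each generated candidate against the transmitted semantic features and keep the best match. The cosine scoring below is a stand-in for the paper's proposed similarity metrics, and the feature vectors are synthetic; the image-generation model is abstracted away.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def select_candidate(received_features, candidate_features):
    """Receiver-side selection: score every generated image against the
    transmitted semantic features and return the index of the best match."""
    scores = [cosine(received_features, f) for f in candidate_features]
    return int(np.argmax(scores)), scores

rng = np.random.default_rng(4)
sent = rng.normal(size=128)                    # transmitted semantic features
cands = [sent + rng.normal(scale=s, size=128)  # features of generated images
         for s in (0.3, 1.0, 2.0)]
best, scores = select_candidate(sent, cands)
print(best, [f"{s:.2f}" for s in scores])
```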
Updated: 2024-04-17 11:42:39
标题: 资源有限网络中基于多模态相似度估计的图像生成语义通信
摘要: 为了减少网络流量并支持资源有限的环境,需要一种传输图像数据量较低的方法。基于机器学习的图像压缩方法被提出,它可以在保持图像特征的同时压缩图像数据大小。然而,在某些情况下,接收端只需要重建图像的部分语义信息就足够了。为了实现这一概念,提出了基于语义信息的通信,称为语义通信,以及一种使用语义通信的图像传输方法。该方法只传输图像的语义信息,接收端使用图像生成模型重建图像。该方法利用一种类型的语义信息,但仅使用它来重建类似于原始图像的图像是具有挑战性的。本研究提出了一种利用多模态语义信息实现高效语义通信的图像传输方法。所提出的方法从图像中提取多模态语义信息并仅传输它。随后,接收端使用图像生成模型生成多个图像,并基于语义相似性选择输出。接收端必须仅基于接收到的特征选择输出;然而,使用传统指标评估语义相似性是具有挑战性的。因此,本研究探讨了评估原始图像和生成图像的语义特征之间相似性的新指标,并提出了两种评分程序。结果表明,所提出的程序可以比较原始和生成图像的语义特征之间的相似性,如位置和构成。因此,所提出的方法可以促进通过移动网络传输和利用照片,用于各种服务应用。
更新时间: 2024-04-17 11:42:39
领域: cs.NI,cs.AI
ICSVR: Investigating Compositional and Syntactic Understanding in Video Retrieval Models
Video retrieval (VR) involves retrieving the ground truth video from the video database given a text caption or vice-versa. The two important components of compositionality, objects & attributes and actions, are joined using correct syntax to form a proper text query. These components (objects & attributes, actions and syntax) each play an important role to help distinguish among videos and retrieve the correct ground truth video. However, it is unclear what is the effect of these components on the video retrieval performance. We therefore, conduct a systematic study to evaluate the compositional and syntactic understanding of video retrieval models on standard benchmarks such as MSRVTT, MSVD and DIDEMO. The study is performed on two categories of video retrieval models: (i) which are pre-trained on video-text pairs and fine-tuned on downstream video retrieval datasets (Eg. Frozen-in-Time, Violet, MCQ etc.) (ii) which adapt pre-trained image-text representations like CLIP for video retrieval (Eg. CLIP4Clip, XCLIP, CLIP2Video etc.). Our experiments reveal that actions and syntax play a minor role compared to objects & attributes in video understanding. Moreover, video retrieval models that use pre-trained image-text representations (CLIP) have better syntactic and compositional understanding as compared to models pre-trained on video-text data. The code is available at https://github.com/IntelLabs/multimodal_cognitive_ai/tree/main/ICSVR
Updated: 2024-04-17 11:38:12
标题: ICSVR:探讨视频检索模型中的组合和句法理解
摘要: 视频检索(VR)涉及从视频数据库中检索出真实视频,给定一个文本标题或反之亦然。组合性的两个重要组成部分:对象和属性以及动作,通过正确的语法结合在一起形成一个合适的文本查询。这些组成部分(对象和属性、动作和语法)各自发挥着重要作用,帮助区分视频并检索正确的真实视频。然而,目前尚不清楚这些组成部分对视频检索性能的影响。因此,我们进行了系统性研究,评估了视频检索模型对标准基准数据集(如MSRVTT、MSVD和DIDEMO)的组合性和语法理解。该研究针对两类视频检索模型进行了:(i)在视频文本对上进行预训练并在下游视频检索数据集上微调(如Frozen-in-Time、Violet、MCQ等);(ii)利用预训练的图像文本表示(如CLIP)适应视频检索的模型(如CLIP4Clip、XCLIP、CLIP2Video等)。我们的实验揭示了在视频理解中,动作和语法相对于对象和属性起较小的作用。此外,使用预训练的图像文本表示(CLIP)的视频检索模型相比于在视频文本数据上进行预训练的模型具有更好的语法和组合理解。代码可在https://github.com/IntelLabs/multimodal_cognitive_ai/tree/main/ICSVR 获取。
更新时间: 2024-04-17 11:38:12
领域: cs.CV,cs.AI,cs.CL
Quantum-inspired Techniques in Tensor Networks for Industrial Contexts
In this paper we present a study of the applicability and feasibility of quantum-inspired algorithms and techniques in tensor networks for industrial environments and contexts, with a compilation of the available literature and an analysis of the use cases that may be affected by such methods. In addition, we explore the limitations of such techniques in order to determine their potential scalability.
Updated: 2024-04-17 11:34:14
标题: 量子启发技术在张量网络中的工业应用
摘要: 在本文中,我们提出了一个关于量子启发式算法和张量网络在工业环境和背景中的适用性和可行性的研究,结合了现有文献的编译和对可能受到这些方法影响的使用案例的分析。此外,我们探讨了这些技术的局限性,以确定它们的潜在可扩展性。
更新时间: 2024-04-17 11:34:14
领域: quant-ph,cs.ET,cs.LG,physics.comp-ph,81P68, 15A69,G.1.3; G.2.1; I.2; I.4
RD2Bench: Toward Data-Centric Automatic R&D
The progress of humanity is driven by those successful discoveries accompanied by countless failed experiments. Researchers often seek the potential research directions by reading and then verifying them through experiments. The process imposes a significant burden on researchers. In the past decade, the data-driven black-box deep learning method demonstrates its effectiveness in a wide range of real-world scenarios, which exacerbates the experimental burden of researchers and thus renders the potential successful discoveries veiled. Therefore, automating such a research and development (R&D) process is an urgent need. In this paper, we make the first effort to formalize the goal by proposing a Real-world Data-centric automatic R&D Benchmark, namely RD2Bench. RD2Bench benchmarks all the operations in data-centric automatic R&D (D-CARD) as a whole to navigate future work toward our goal directly. We focus on evaluating the interaction and synergistic effects of various model capabilities and aiding the selection of well-performing, trustworthy models. Although RD2Bench is very challenging for the state-of-the-art (SOTA) large language model (LLM) GPT-4, indicating ample research opportunities and the need for further research effort, LLMs possess promising potential to bring more significant development to D-CARD: They are able to implement some simple methods without adopting any additional techniques. We appeal to future work to consider developing techniques for tackling automatic R&D, thus opening the door to a potentially revolutionary upgrade to human productivity.
Updated: 2024-04-17 11:33:21
标题: RD2Bench:迈向以数据为中心的自动研发
摘要: 人类的进步是由那些成功的发现推动的,这些发现伴随着无数次失败的实验。研究人员经常通过阅读寻找潜在的研究方向,然后通过实验证实这些方向。这个过程给研究人员带来了巨大的负担。在过去的十年中,基于数据驱动的黑盒深度学习方法在各种真实场景中展示出了其有效性,这加重了研究人员的实验负担,因此使潜在的成功发现变得模糊。因此,自动化这样一个研究和开发(R&D)过程是一个紧迫的需求。在本文中,我们作为第一个努力,通过提出一个以真实数据为中心的自动R&D基准,即RD2Bench,系统化地明确了这个目标。RD2Bench将数据中心的自动R&D(D-CARD)中的所有操作作为一个整体进行基准测试,以直接引导未来的工作朝着我们的目标发展。我们重点评估各种模型能力的相互作用和协同效应,并帮助选择性能良好的可信赖模型。尽管RD2Bench对于名为GPT-4的最新大型语言模型(LLM)非常具有挑战性,但表明了充分的研究机会和更多的研究努力,LLMs具有潜在的带来更大发展的潜力:它们能够实施一些简单的方法而不需要采用任何额外的技术。我们呼吁未来的工作考虑发展解决自动R&D的技术,从而为人类生产力的潜在革命性升级带来机会。
更新时间: 2024-04-17 11:33:21
领域: cs.AI,q-fin.GN
DACAD: Domain Adaptation Contrastive Learning for Anomaly Detection in Multivariate Time Series
Time series anomaly detection (TAD) faces a significant challenge due to the scarcity of labelled data, which hinders the development of accurate detection models. Unsupervised domain adaptation (UDA) addresses this challenge by leveraging a labelled dataset from a related domain to detect anomalies in a target dataset. Existing domain adaptation techniques assume that the number of anomalous classes does not change between the source and target domains. In this paper, we propose a novel Domain Adaptation Contrastive learning for Anomaly Detection in multivariate time series (DACAD) model to address this issue by combining UDA and contrastive representation learning. DACAD's approach includes an anomaly injection mechanism that introduces various types of synthetic anomalies, enhancing the model's ability to generalise across unseen anomalous classes in different domains. This method significantly broadens the model's adaptability and robustness. Additionally, we propose a supervised contrastive loss for the source domain and a self-supervised contrastive triplet loss for the target domain, improving comprehensive feature representation learning and extraction of domain-invariant features. Finally, an effective Centre-based Entropy Classifier (CEC) is proposed specifically for anomaly detection, facilitating accurate learning of normal boundaries in the source domain. Our extensive evaluation across multiple real-world datasets against leading models in time series anomaly detection and UDA underscores DACAD's effectiveness. The results validate DACAD's superiority in transferring knowledge across domains and its potential to mitigate the challenge of limited labelled data in time series anomaly detection.
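A hedged sketch of the two target-domain ingredients, synthetic anomaly injection and a self-supervised triplet objective, is shown below; the spike-style anomaly, the weak augmentation, and the toy encoder are illustrative assumptions rather than DACAD's exact components.

```python
import torch
import torch.nn.functional as F

def inject_spike(window, magnitude=3.0):
    """Synthetic anomaly: add a large spike to one random time step of a
    (T, D) time-series window."""
    out = window.clone()
    t = torch.randint(0, window.size(0), (1,)).item()
    out[t] += magnitude * window.std()
    return out

def target_triplet_loss(encoder, window):
    """Self-supervised triplet on the unlabelled target domain: a weakly
    augmented copy serves as the positive, an anomaly-injected copy as the
    negative."""
    anchor = encoder(window)
    positive = encoder(window + 0.01 * torch.randn_like(window))  # weak augmentation
    negative = encoder(inject_spike(window))
    return F.triplet_margin_loss(anchor, positive, negative, margin=1.0)

lin = torch.nn.Linear(50 * 3, 16)
encoder = lambda w: lin(w.reshape(1, -1))  # toy encoder: flatten + linear
print(target_triplet_loss(encoder, torch.randn(50, 3)))
```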
Updated: 2024-04-17 11:20:14
标题: DACAD:多元时间序列异常检测中的领域自适应对比学习
摘要: 时间序列异常检测(TAD)面临着一个重要挑战,即标记数据的稀缺性,这阻碍了准确检测模型的发展。无监督领域自适应(UDA)通过利用相关领域的标记数据集来检测目标数据集中的异常,解决了这一挑战。现有的领域自适应技术假设在源域和目标域之间异常类的数量不会发生变化。在本文中,我们提出了一种新颖的多变量时间序列异常检测领域自适应对比学习(DACAD)模型,通过结合UDA和对比表示学习来解决这一问题。DACAD的方法包括一个异常注入机制,引入各种类型的合成异常,增强了模型在不同领域中未见异常类的泛化能力。这种方法显著扩大了模型的适应性和鲁棒性。此外,我们提出了一个针对源域的监督对比损失,以及一个针对目标域的自监督对比三元组损失,改善了全面特征表示学习和提取领域不变特征。最后,提出了一个专门用于异常检测的有效的基于中心熵分类器(CEC),有助于在源域准确学习正常边界。我们在多个真实世界数据集上对比了时间序列异常检测和UDA领域中的领先模型,结果突出了DACAD的有效性。结果验证了DACAD在跨领域知识传递和缓解时间序列异常检测中标记数据有限挑战的优越性。
更新时间: 2024-04-17 11:20:14
领域: cs.LG,cs.AI
Composition in Differential Privacy for General Granularity Notions (Long Version)
The composition theorems of differential privacy (DP) allow data curators to combine different algorithms to obtain a new algorithm that continues to satisfy DP. However, new granularity notions (i.e., neighborhood definitions), data domains, and composition settings have appeared in the literature that the classical composition theorems do not cover. For instance, the original parallel composition theorem does not translate well to general granularity notions. This complicates the opportunity of composing DP mechanisms in new settings and obtaining accurate estimates of the incurred privacy loss after composition. To overcome these limitations, we study the composability of DP in a general framework and for any kind of data domain or neighborhood definition. We give a general composition theorem in both independent and adaptive versions and we provide analogous composition results for approximate, zero-concentrated, and Gaussian DP. Besides, we study the hypothesis needed to obtain the best composition bounds. Our theorems cover both parallel and sequential composition settings. Importantly, they also cover every setting in between, allowing us to compute the final privacy loss of a composition with greatly improved accuracy.
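For orientation, the two classical endpoints that the general theorem interpolates between can be written down directly: sequential composition adds the (epsilon, delta) budgets, while parallel composition over disjoint data partitions pays only the worst single mechanism. A minimal accounting sketch of these standard facts:

```python
def sequential_composition(budgets):
    """Basic (non-adaptive) sequential composition: privacy parameters add up
    when every mechanism sees the whole dataset."""
    eps = sum(e for e, _ in budgets)
    delta = sum(d for _, d in budgets)
    return eps, delta

def parallel_composition(budgets):
    """Classical parallel composition: on disjoint data partitions the total
    cost is the worst single mechanism (stated here for the usual
    neighborhood notion; the paper's theorems cover the settings between
    these two extremes and general granularity notions)."""
    return max(e for e, _ in budgets), max(d for _, d in budgets)

budgets = [(0.5, 1e-6), (0.3, 1e-6), (0.2, 0.0)]
print(sequential_composition(budgets))  # (1.0, 2e-06)
print(parallel_composition(budgets))    # (0.5, 1e-06)
```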
Updated: 2024-04-17 11:17:25
标题: 差分隐私中的组合性:针对一般粒度概念的长篇版本
摘要: 差分隐私(DP)的组合定理允许数据管理员将不同的算法组合起来,得到一个新的算法,仍然满足DP。然而,文献中出现了新的粒度概念(即邻域定义)、数据域和组合设置,经典的组合定理并未涵盖这些内容。例如,原始的并行组合定理在一般粒度概念下无法很好地转化。这使得在新设置中组合DP机制并获得组合后隐私损失的准确估计变得复杂。 为了克服这些限制,我们研究了在一般框架下以及针对任何类型的数据域或邻域定义的DP的可组合性。我们提供了独立和自适应版本的一般组合定理,并为近似、零集中和高斯DP提供了类似的组合结果。此外,我们研究了获得最佳组合界限所需的假设。我们的定理涵盖了并行和顺序组合设置。重要的是,它们还涵盖了这两者之间的每种设置,使我们能够以更高的准确度计算组合的最终隐私损失。
更新时间: 2024-04-17 11:17:25
领域: cs.CR,cs.DS,68P27 (Primary)
Incremental Residual Concept Bottleneck Models
Concept Bottleneck Models (CBMs) map the black-box visual representations extracted by deep neural networks onto a set of interpretable concepts and use the concepts to make predictions, enhancing the transparency of the decision-making process. Multimodal pre-trained models can match visual representations with textual concept embeddings, allowing for obtaining the interpretable concept bottleneck without the expertise concept annotations. Recent research has focused on the concept bank establishment and the high-quality concept selection. However, it is challenging to construct a comprehensive concept bank through humans or large language models, which severely limits the performance of CBMs. In this work, we propose the Incremental Residual Concept Bottleneck Model (Res-CBM) to address the challenge of concept completeness. Specifically, the residual concept bottleneck model employs a set of optimizable vectors to complete missing concepts, then the incremental concept discovery module converts the complemented vectors with unclear meanings into potential concepts in the candidate concept bank. Our approach can be applied to any user-defined concept bank, as a post-hoc processing method to enhance the performance of any CBMs. Furthermore, to measure the descriptive efficiency of CBMs, the Concept Utilization Efficiency (CUE) metric is proposed. Experiments show that the Res-CBM outperforms the current state-of-the-art methods in terms of both accuracy and efficiency and achieves comparable performance to black-box models across multiple datasets.
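The core mechanism, concept activations as similarities against a fixed concept bank extended by a small set of learnable residual vectors for missing concepts, can be sketched as follows; the dimensions, the cosine scoring, and the linear head are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn.functional as F

class ResidualConceptBottleneck(torch.nn.Module):
    """Concept scores = cosine similarity between the image embedding and a
    fixed bank of concept text embeddings, extended with learnable residual
    vectors that stand in for missing concepts."""

    def __init__(self, concept_bank, n_residual, n_classes):
        super().__init__()
        self.register_buffer("bank", F.normalize(concept_bank, dim=1))  # (C, D) frozen
        self.residual = torch.nn.Parameter(torch.randn(n_residual, concept_bank.size(1)))
        self.head = torch.nn.Linear(concept_bank.size(0) + n_residual, n_classes)

    def forward(self, img_emb):                        # img_emb: (B, D)
        img = F.normalize(img_emb, dim=1)
        concepts = torch.cat([self.bank, F.normalize(self.residual, dim=1)])
        scores = img @ concepts.t()                    # (B, C + R) concept activations
        return self.head(scores), scores

model = ResidualConceptBottleneck(torch.randn(50, 128), n_residual=8, n_classes=10)
logits, scores = model(torch.randn(4, 128))
print(logits.shape, scores.shape)  # torch.Size([4, 10]) torch.Size([4, 58])
```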
Updated: 2024-04-17 10:59:59
标题: 增量残差概念瓶颈模型
摘要: 概念瓶颈模型(CBMs)将深度神经网络提取的黑盒视觉表示映射到一组可解释的概念,并利用这些概念进行预测,增强决策过程的透明度。多模态预训练模型可以将视觉表示与文本概念嵌入匹配,从而可以获得可解释的概念瓶颈,无需专业概念注释。最近的研究集中在概念库的建立和高质量概念的选择。然而,通过人类或大型语言模型构建全面的概念库是具有挑战性的,这严重限制了CBMs的性能。在这项工作中,我们提出了增量残差概念瓶颈模型(Res-CBM)来应对概念完整性的挑战。具体来说,残差概念瓶颈模型采用一组可优化的向量来补全缺失的概念,然后增量概念发现模块将含义尚不明确的补全向量转换为候选概念库中的潜在概念。我们的方法可以应用于任何用户定义的概念库,作为一种事后处理方法来增强任何CBMs的性能。此外,为了衡量CBMs的描述效率,提出了概念利用效率(CUE)指标。实验表明,Res-CBM在准确性和效率方面优于当前的最先进方法,并在多个数据集上实现了与黑盒模型相当的性能。
更新时间: 2024-04-17 10:59:59
领域: cs.LG,cs.AI
AQuA -- Combining Experts' and Non-Experts' Views To Assess Deliberation Quality in Online Discussions Using LLMs
Measuring the quality of contributions in political online discussions is crucial in deliberation research and computer science. Research has identified various indicators to assess online discussion quality, and with deep learning advancements, automating these measures has become feasible. While some studies focus on analyzing specific quality indicators, a comprehensive quality score incorporating various deliberative aspects is often preferred. In this work, we introduce AQuA, an additive score that calculates a unified deliberative quality score from multiple indices for each discussion post. Unlike other singular scores, AQuA preserves information on the deliberative aspects present in comments, enhancing model transparency. We develop adapter models for 20 deliberative indices, and calculate correlation coefficients between experts' annotations and the perceived deliberativeness by non-experts to weigh the individual indices into a single deliberative score. We demonstrate that the AQuA score can be computed easily from pre-trained adapters and aligns well with annotations on other datasets that have not been seen during training. The analysis of experts' vs. non-experts' annotations confirms theoretical findings in the social science literature.
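Since AQuA is an additive score, the scoring step itself is a weighted sum of the per-index adapter outputs, with weights derived offline from the expert/non-expert correlation analysis. A minimal sketch with placeholder weights (the real weights come from the annotation study):

```python
import numpy as np

def aqua_score(index_scores: np.ndarray, weights: np.ndarray) -> float:
    """Additive deliberative-quality score: a weighted sum of the per-index
    adapter outputs for one comment. Weights are derived offline, e.g. from
    correlations between expert annotations and non-expert ratings."""
    return float(index_scores @ weights)

rng = np.random.default_rng(5)
weights = rng.uniform(0, 1, size=20)           # placeholder correlation-based weights
weights /= weights.sum()                       # normalize so the score stays in [0, 1]
comment_indices = rng.uniform(0, 1, size=20)   # outputs of the 20 adapter models
print(f"AQuA score: {aqua_score(comment_indices, weights):.3f}")
```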
Updated: 2024-04-17 10:56:48
标题: AQuA -- 结合专家和非专家观点,利用LLMs评估在线讨论中的审议质量
摘要: 在线政治讨论中贡献质量的衡量对于审议研究和计算机科学至关重要。研究已经确定了各种指标来评估在线讨论质量,并随着深度学习的进展,自动化这些测量变得可行。虽然一些研究集中分析特定质量指标,但通常更偏好综合考虑各种审议方面的综合质量评分。在这项工作中,我们介绍了AQuA,这是一个附加得分,它从多个指标为每个讨论帖计算一个统一的审议质量评分。与其他单一评分不同,AQuA保留了评论中存在的审议方面的信息,增强了模型的透明度。我们为20个审议指标开发了适配器模型,并计算了专家注释与非专家感知的审议性之间的相关系数,以将各个指标加权到单一的审议分数中。我们证明了AQuA分数可以轻松地从预训练的适配器中计算出来,并且与在训练过程中未见过的其他数据集上的注释相吻合。专家与非专家注释的分析证实了社会科学文献中的理论发现。
更新时间: 2024-04-17 10:56:48
领域: cs.CL,cs.AI,cs.LG
Can LLMs perform structured graph reasoning?
Pretrained Large Language Models (LLMs) have demonstrated various reasoning capabilities through language-based prompts alone, particularly in unstructured task settings (tasks purely based on language semantics). However, LLMs often struggle with structured tasks, because of the inherent incompatibility of input representation. Reducing structured tasks to uni-dimensional language semantics often renders the problem trivial. Keeping the trade-off between LLM compatibility and structure complexity in mind, we design various graph reasoning tasks as a proxy to semi-structured tasks in this paper, in order to test the ability to navigate through representations beyond plain text in various LLMs. Particularly, we design 10 distinct problems of graph traversal, each representing increasing levels of complexity, and benchmark 5 different instruct-finetuned LLMs (GPT-4, GPT-3.5, Claude-2, Llama-2 and Palm-2) on the aforementioned tasks. Further, we analyse the performance of models across various settings such as varying sizes of graphs as well as different forms of k-shot prompting. We highlight various limitations, biases and properties of LLMs through this benchmarking process, such as an inverse relation to the average degrees of freedom of traversal per node in graphs, the overall negative impact of k-shot prompting on graph reasoning tasks, and a positive response bias which prevents LLMs from identifying the absence of a valid solution. Finally, we introduce a new prompting technique specially designed for graph traversal tasks (PathCompare), which demonstrates a notable increase in the performance of LLMs in comparison to standard prompting techniques such as Chain-of-Thought (CoT).
Updated: 2024-04-17 10:50:04
标题: LLMs能够执行结构化图推理吗?
摘要: 预训练的大型语言模型(LLMs)仅通过基于语言的提示就展示了各种推理能力,特别是在非结构化任务设置中(完全基于语言语义的任务)。然而,由于输入表示的固有不兼容性,LLMs通常在结构化任务中遇到困难。将结构化任务简化为一维语言语义通常会使问题变得琐碎。在考虑LLM兼容性和结构复杂性之间的权衡的基础上,我们在本文中设计了各种图形推理任务作为半结构化任务的代理,以测试LLMs在各种表示之外航行的能力。特别是,我们设计了10个不同的图遍历问题,每个问题代表着不断增加的复杂性水平,并在上述任务上对5种不同的指令微调LLMs(GPT-4、GPT-3.5、Claude-2、Llama-2和Palm-2)进行基准测试。此外,我们分析了模型在不同设置下的性能,如不同大小的图形以及不同形式的k-shot提示。通过这一基准测试过程,我们强调了LLMs的各种局限性、偏见和特性,比如与图中每个节点的遍历自由度的平均度数呈反比关系、k-shot提示对图形推理任务的整体负面影响,以及阻止LLMs识别有效解决方案缺失的正响应偏差。最后,我们介绍了一种专门为图遍历任务设计的新提示技术(PathCompare),它展示了LLMs性能相对于标准提示技术(如CoT)的显著增加。
更新时间: 2024-04-17 10:50:04
领域: cs.CL,cs.AI
Optical Image-to-Image Translation Using Denoising Diffusion Models: Heterogeneous Change Detection as a Use Case
We introduce an innovative deep learning-based method that uses a denoising diffusion-based model to translate low-resolution images to high-resolution ones from different optical sensors while preserving the contents and avoiding undesired artifacts. The proposed method is trained and tested on a large and diverse data set of paired Sentinel-II and Planet Dove images. We show that it can solve serious image generation issues observed when the popular classifier-free guided Denoising Diffusion Implicit Model (DDIM) framework is used in the task of Image-to-Image Translation of multi-sensor optical remote sensing images and that it can generate large images with highly consistent patches, both in colors and in features. Moreover, we demonstrate how our method improves heterogeneous change detection results in two urban areas: Beirut, Lebanon, and Austin, USA. Our contributions are: i) a new training and testing algorithm based on denoising diffusion models for optical image translation; ii) a comprehensive image quality evaluation and ablation study; iii) a comparison with the classifier-free guided DDIM framework; and iv) change detection experiments on heterogeneous data.
Updated: 2024-04-17 10:49:00
标题: 使用去噪扩散模型进行光学图像到图像的转换:异质变化检测作为一个使用案例
摘要: 我们介绍了一种创新的基于深度学习的方法,该方法使用去噪扩散模型将低分辨率图像转换为高分辨率图像,同时保留内容并避免不良伪影。所提出的方法在大型和多样化的Sentinel-II和Planet Dove图像配对数据集上进行了训练和测试。我们展示了当流行的无分类器引导去噪扩散隐式模型(DDIM)框架用于多传感器光学遥感图像的图像到图像转换任务时,它可以解决严重的图像生成问题,并且可以生成具有高度一致补丁的大型图像,无论是在颜色还是特征上。此外,我们展示了我们的方法如何改善贝鲁特(黎巴嫩)和奥斯汀(美国)两个城市地区的异质变化检测结果。我们的贡献包括:i)基于去噪扩散模型的光学图像转换的新训练和测试算法;ii)全面的图像质量评估和消融研究;iii)与无分类器引导DDIM框架的比较;和iv)异质数据上的变化检测实验。
更新时间: 2024-04-17 10:49:00
领域: cs.CV,cs.AI
Commitments are equivalent to one-way state generators
One-way state generators (OWSG) are natural quantum analogs to classical one-way functions. We show that $O\left(\frac{n}{\log(n)}\right)$-copy OWSGs ($n$ represents the input length) are equivalent to $poly(n)$-copy OWSG and to quantum commitments. Since known results show that $o\left(\frac{n}{\log(n)}\right)$-copy OWSG cannot imply commitments, this shows that $O\left(\frac{n}{\log(n)}\right)$-copy OWSGs are the weakest OWSGs from which we can get commitments (and hence much of quantum cryptography). Our construction follows along the lines of Håstad, Impagliazzo, Levin and Luby [HILL], who obtained classical pseudorandom generators (PRG) from classical one-way functions (OWF), however with crucial modifications. Our construction, when applied to the classical case, provides an alternative to the construction provided by [HILL]. Since we do not argue conditioned on the output of the one-way function, our construction and analysis are arguably simpler and may be of independent interest.
Updated: 2024-04-17 10:29:25
标题: "承诺等同于单向状态生成器"
摘要: 一方向状态生成器(OWSG)是经典一方向函数的自然量子模拟。我们展示了$O\left(\frac{n}{\log(n)}\right)$-复制的OWSG($n$代表输入长度)等效于$poly(n)$-复制的OWSG和量子承诺。由于已知结果表明$o\left(\frac{n}{\log(n)}\right)$-复制的OWSG不能导出承诺,这表明$O\left(\frac{n}{\log(n)}\right)$-复制的OWSG是我们可以获得承诺(因此很多量子密码学)的最弱的OWSG。我们的构造沿着H\r{a}stad,Impagliazzo,Levin和Luby [HILL]的思路进行,他们从经典一方向函数(OWF)获得了经典伪随机生成器(PRG),但有关键性修改。我们的构造在应用于经典情况时,提供了[HILL]提供的构造的替代方案。由于我们不是根据一方向函数的输出进行论证,我们的构造和分析可能更简单,并且可能具有独立的兴趣。
更新时间: 2024-04-17 10:29:25
领域: quant-ph,cs.CR
Energy-Efficient Uncertainty-Aware Biomass Composition Prediction at the Edge
Clover fixes nitrogen from the atmosphere into the soil, making grass-clover mixtures highly desirable to reduce external nitrogen fertilization. Herbage containing clover additionally promotes higher food intake, resulting in higher milk production. Herbage probing however remains largely unused as it requires a time-intensive manual laboratory analysis. Without this information, farmers are unable to perform localized clover sowing or take targeted fertilization decisions. Deep learning algorithms have been proposed with the goal to estimate the dry biomass composition from images of the grass directly in the fields. The energy-intensive nature of deep learning however limits deployment to practical edge devices such as smartphones. This paper proposes to fill this gap by applying filter pruning to reduce the energy requirement of existing deep learning solutions. We report that although pruned networks are accurate on controlled, high-quality images of the grass, they struggle to generalize to real-world smartphone images that are blurry or taken from challenging angles. We address this challenge by training filter-pruned models using a variance attenuation loss so they can predict the uncertainty of their predictions. When the uncertainty exceeds a threshold, we re-infer using a more accurate unpruned model. This hybrid approach allows us to reduce energy consumption while retaining a high accuracy. We evaluate our algorithm on two datasets: the GrassClover and the Irish clover using an NVIDIA Jetson Nano edge device. We find that we reduce energy consumption with respect to state-of-the-art solutions by 50% on average, with only a 4% accuracy loss.
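A hedged sketch of the two mechanisms, variance-attenuation training via a Gaussian negative log-likelihood and threshold-gated fallback from the pruned to the unpruned model, is given below; the heads, shapes, and threshold value are illustrative, not the paper's exact configuration.

```python
import torch

# Variance-attenuation training: the pruned model predicts a mean and a
# variance per target and is fitted with the Gaussian NLL, so that the
# predicted variance learns to reflect the model's own uncertainty.
nll = torch.nn.GaussianNLLLoss()
mean, log_var = torch.randn(8, 2), torch.randn(8, 2)  # stand-ins for the two heads
target = torch.randn(8, 2)
loss = nll(mean, target, log_var.exp())               # variance must be positive

def hybrid_predict(pruned, unpruned, x, threshold=0.05):
    """Energy-saving inference: trust the cheap pruned model unless its
    predicted variance exceeds the threshold, then re-infer with the
    accurate unpruned model."""
    mean, var = pruned(x)
    if var.mean().item() > threshold:
        return unpruned(x)   # rare, expensive path
    return mean              # common, cheap path

pruned = lambda x: (x.mean(dim=1, keepdim=True), torch.full((1, 1), 0.10))
unpruned = lambda x: x.mean(dim=1, keepdim=True)
print(loss, hybrid_predict(pruned, unpruned, torch.randn(1, 4)))
```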
Updated: 2024-04-17 10:26:49
标题: 在边缘进行能效和不确定性感知的生物质组成预测
摘要: 三叶草从大气中固定氮向地面,使得草三叶草混合物成为减少外部氮肥施用的理想选择。含有三叶草的牧草还可以促进更高的食物摄入,从而导致更高的牛奶产量。然而,牧草探测仍然很少被使用,因为它需要耗时的手工实验室分析。缺乏这些信息,农民无法进行局部三叶草播种或做出有针对性的施肥决策。深度学习算法被提出,旨在从田间直接的草图像中估计干生物量组成。然而,深度学习的高能耗性质限制了其部署到实际边缘设备,如智能手机。本文提出通过应用滤波器修剪来减少现有深度学习解决方案的能量需求,以填补这一空白。我们报告,虽然经修剪的网络在受控的、高质量的草图像上准确,但在模糊或从具有挑战性角度拍摄的真实世界智能手机图像上却难以泛化。我们通过使用方差衰减损失来训练经过滤波修剪的模型,使其能够预测其预测的不确定性。当不确定性超过阈值时,我们将使用更准确的未修剪模型重新推断。这种混合方法使我们能够在保持高准确性的同时减少能耗。我们在两个数据集上评估了我们的算法:GrassClover和爱尔兰三叶草,使用NVIDIA Jetson Nano边缘设备。我们发现,相对于最先进的解决方案,我们能够将能耗减少平均50%,仅有4%的准确性损失。
更新时间: 2024-04-17 10:26:49
领域: cs.CV,cs.AI
Online Bin Packing with Predictions
Bin packing is a classic optimization problem with a wide range of applications, from load balancing to supply chain management. In this work, we study the online variant of the problem, in which a sequence of items of various sizes must be placed into a minimum number of bins of uniform capacity. The online algorithm is enhanced with a (potentially erroneous) prediction concerning the frequency of item sizes in the sequence. We design and analyze online algorithms with efficient tradeoffs between the consistency (i.e., the competitive ratio assuming no prediction error) and the robustness (i.e., the competitive ratio under adversarial error), and whose performance degrades near-optimally as a function of the prediction error. This is the first theoretical and experimental study of online bin packing under competitive analysis, in the realistic setting of learnable predictions. Previous work addressed only extreme cases with respect to the prediction error, and relied on overly powerful and error-free oracles.
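For reference, the classic First-Fit baseline that prediction-augmented algorithms build on looks as follows; the comment indicates where predicted size frequencies would enter, which is our paraphrase rather than the paper's specific algorithm.

```python
def first_fit(items, capacity=1.0):
    """Classic online baseline: place each arriving item into the first bin
    with enough residual capacity, opening a new bin otherwise. A
    prediction-augmented algorithm would additionally use the predicted
    frequency of each item size to pre-partition bins, trading consistency
    for robustness as the prediction error grows."""
    bins = []
    for item in items:
        for b in bins:
            if sum(b) + item <= capacity:
                b.append(item)
                break
        else:
            bins.append([item])
    return bins

print(len(first_fit([0.5, 0.7, 0.5, 0.2, 0.4, 0.2, 0.5, 0.1])))  # 4 bins
```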
Updated: 2024-04-17 10:25:45
标题: 使用预测的在线装箱
摘要: 装箱问题是一个经典的优化问题,具有广泛的应用,从负载平衡到供应链管理。在这项工作中,我们研究了该问题的在线变体,在该变体中,必须将一系列不同大小的物品放入具有统一容量的最少数量的箱中。在线算法通过对序列中物品大小频率的(潜在的错误)预测进行增强。我们设计并分析了在线算法,在一致性(即在没有预测错误的情况下的竞争比率)和鲁棒性(即在对抗性错误下的竞争比率)之间实现了高效的权衡,并且其性能会随着预测错误的增加而接近最佳。这是在线装箱问题在竞争性分析下的第一项理论和实验研究,在可学习预测的现实设置中。先前的工作只涉及极端情况下的预测错误,并依赖于过于强大且无误差的预言。
更新时间: 2024-04-17 10:25:45
领域: cs.DS,cs.LG,68T05,I.2.6; F.2.0
Semantic Communication for Cooperative Multi-Task Processing over Wireless Networks
In this paper, we expand semantic communication, which has so far been limited to processing a single task, into a more general system that can handle multiple tasks concurrently. In pursuit of this, we first introduce our definition of the "semantic source", enabling the interpretation of multiple semantics based on a single observation. A semantic encoder design is then introduced, featuring the division of the encoder into a common unit and multiple specific units enabling cooperative multi-task processing. Simulation results demonstrate the effectiveness of the proposed semantic source and the system design. Our approach employs information maximization (infomax) and end-to-end design principles.
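The encoder split described above, one common unit computed once per observation plus per-task specific units, can be sketched directly; all dimensions and layer choices below are illustrative assumptions.

```python
import torch

class MultiTaskSemanticEncoder(torch.nn.Module):
    """Cooperative multi-task semantic encoder: one common unit extracts
    shared features from the observation; per-task specific units then emit
    the semantics each task needs. Dimensions are illustrative."""

    def __init__(self, obs_dim=64, shared_dim=32, task_dims=(8, 8, 4)):
        super().__init__()
        self.common = torch.nn.Sequential(
            torch.nn.Linear(obs_dim, shared_dim), torch.nn.ReLU())
        self.specific = torch.nn.ModuleList(
            [torch.nn.Linear(shared_dim, d) for d in task_dims])

    def forward(self, obs):
        shared = self.common(obs)                        # computed once per observation
        return [unit(shared) for unit in self.specific]  # one message per task

enc = MultiTaskSemanticEncoder()
msgs = enc(torch.randn(2, 64))
print([m.shape for m in msgs])
```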
Updated: 2024-04-17 10:21:47
标题: 无线网络上用于合作多任务处理的语义通信
摘要: 在这篇论文中,我们扩展了目前仅限于处理单个任务的语义通信的现状,使其成为一个能够同时处理多个任务的更通用的系统。为此,我们首先介绍了我们对“语义源”的定义,使其能够基于单个观测解释多种语义。接着介绍了一种语义编码器设计,其中编码器被划分为一个通用单元和多个特定单元,使其能够进行协同多任务处理。模拟结果表明了所提出的语义源和系统设计的有效性。我们的方法采用信息最大化(infomax)和端到端设计原则。
更新时间: 2024-04-17 10:21:47
领域: eess.SP,cs.IT,cs.LG,math.IT
In-Context Learning State Vector with Inner and Momentum Optimization
Large Language Models (LLMs) have exhibited an impressive ability to perform In-Context Learning (ICL) from only a few examples. Recent works have indicated that the functions learned by ICL can be represented through compressed vectors derived from the transformer. However, the working mechanisms and optimization of these vectors are yet to be thoroughly explored. In this paper, we address this gap by presenting a comprehensive analysis of these compressed vectors, drawing parallels to the parameters trained with gradient descent, and introduce the concept of state vector. Inspired by the works on model soup and momentum-based gradient descent, we propose inner and momentum optimization methods that are applied to refine the state vector progressively as test-time adaptation. Moreover, we simulate state vector aggregation in the multiple example setting, where demonstrations comprising numerous examples are usually too lengthy for regular ICL, and further propose a divide-and-conquer aggregation method to address this challenge. We conduct extensive experiments using Llama-2 and GPT-J in both zero-shot setting and few-shot setting. The experimental results show that our optimization method effectively enhances the state vector and achieves the state-of-the-art performance on diverse tasks. Code is available at https://github.com/HITsz-TMG/ICL-State-Vector
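A minimal sketch of momentum-style test-time refinement of a state vector is shown below; the update signal is a synthetic stand-in for the paper's inner-optimization step, so treat this as the shape of the idea rather than the method itself.

```python
import torch

def refine_state_vector(state, grad_like_update, velocity, lr=0.1, beta=0.9):
    """One step of momentum-style test-time refinement of an ICL state
    vector: the update direction plays the role of a gradient, and an
    exponential moving average smooths it across steps."""
    velocity = beta * velocity + (1 - beta) * grad_like_update
    return state - lr * velocity, velocity

state = torch.randn(768)                      # compressed ICL state vector
velocity = torch.zeros_like(state)
for _ in range(5):                            # progressive test-time adaptation
    update = torch.randn_like(state) * 0.01   # stand-in for the inner-optimization signal
    state, velocity = refine_state_vector(state, update, velocity)
print(state.norm())
```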
Updated: 2024-04-17 10:19:15
标题: 具有内部和动量优化的上下文学习状态向量
摘要: 大型语言模型(LLMs)展示了令人印象深刻的能力,可以从少数示例中执行上下文学习(ICL)。最近的研究表明,ICL学习到的功能可以通过从transformer派生的压缩向量来表示。然而,这些向量的工作机制和优化尚未被彻底探索。在本文中,我们通过对这些压缩向量进行全面分析,将其与通过梯度下降训练的参数进行类比,并引入状态向量的概念来填补这一空白。受模型汤(model soup)和基于动量的梯度下降的相关工作启发,我们提出了内部和动量优化方法,在测试时自适应地逐步完善状态向量。此外,我们模拟了多示例设置中的状态向量聚合,在该设置下由大量示例组成的演示对于常规ICL而言通常太长,并进一步提出了一种分而治之的聚合方法来解决这一挑战。我们在零样本设置和少样本设置中使用Llama-2和GPT-J进行了广泛的实验。实验结果表明,我们的优化方法有效地增强了状态向量,并在各种任务上实现了最先进的性能。源代码可在https://github.com/HITsz-TMG/ICL-State-Vector找到。
更新时间: 2024-04-17 10:19:15
领域: cs.CL,cs.AI
Analytical results for uncertainty propagation through trained machine learning regression models
Machine learning (ML) models are increasingly being used in metrology applications. However, for ML models to be credible in a metrology context they should be accompanied by principled uncertainty quantification. This paper addresses the challenge of uncertainty propagation through trained/fixed machine learning (ML) regression models. Analytical expressions for the mean and variance of the model output are derived for certain input data distributions and for a variety of ML models. Our results cover several popular ML models including linear regression, penalised linear regression, kernel ridge regression, Gaussian Processes (GPs), support vector machines (SVMs) and relevance vector machines (RVMs). We present numerical experiments in which we validate our methods and compare them with a Monte Carlo approach from a computational efficiency point of view. We also illustrate our methods in the context of a metrology application, namely modelling the state-of-health of lithium-ion cells based upon Electrical Impedance Spectroscopy (EIS) data.
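For the simplest member of this model family, a fixed linear regressor with a Gaussian input, the propagated mean and variance have the well-known closed form E[y] = w^T mu + b and Var[y] = w^T Sigma w; the snippet below verifies this against Monte Carlo (weights and distribution parameters are made up for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)

# Trained/fixed linear model y = w @ x + b (weights assumed given).
w = np.array([0.8, -1.2, 0.5])
b = 0.3

# Gaussian input distribution x ~ N(mu, Sigma).
mu = np.array([1.0, 0.0, 2.0])
Sigma = np.diag([0.04, 0.09, 0.01])

# Closed form for a linear model.
mean_analytic = w @ mu + b
var_analytic = w @ Sigma @ w

# Monte Carlo check.
xs = rng.multivariate_normal(mu, Sigma, size=200_000)
ys = xs @ w + b
print(mean_analytic, ys.mean())   # should agree closely
print(var_analytic, ys.var())
```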
Updated: 2024-04-17 10:16:20
标题: 经过训练的机器学习回归模型中不确定性传播的分析结果
摘要: 机器学习(ML)模型越来越多地被用于计量学应用。然而,在计量学环境中,为了使ML模型可信,应该伴随有原则性的不确定性量化。本文解决了通过训练/固定的机器学习(ML)回归模型传播不确定性的挑战。针对特定的输入数据分布和多种ML模型,我们推导了模型输出的均值和方差的解析表达式。我们的结果涵盖了几种流行的ML模型,包括线性回归、惩罚线性回归、核岭回归、高斯过程(GPs)、支持向量机(SVMs)和相关向量机(RVMs)。我们进行了数值实验,验证了我们的方法,并从计算效率的角度与蒙特卡洛方法进行了比较。我们还在计量学应用的背景下展示了我们的方法,即基于电阻抗谱(EIS)数据对锂离子电池的健康状态进行建模。
更新时间: 2024-04-17 10:16:20
领域: cs.LG,stat.ML
Federated Class-Incremental Learning with New-Class Augmented Self-Distillation
Federated Learning (FL) enables collaborative model training among participants while guaranteeing the privacy of raw data. Mainstream FL methodologies overlook the dynamic nature of real-world data, particularly its tendency to grow in volume and diversify in classes over time. This oversight results in FL methods suffering from catastrophic forgetting, where the trained models inadvertently discard previously learned information upon assimilating new data. In response to this challenge, we propose a novel Federated Class-Incremental Learning (FCIL) method, named \underline{Fed}erated \underline{C}lass-Incremental \underline{L}earning with New-Class \underline{A}ugmented \underline{S}elf-Di\underline{S}tillation (FedCLASS). The core of FedCLASS is to enrich the class scores of historical models with new class scores predicted by current models and utilize the combined knowledge for self-distillation, enabling a more sufficient and precise knowledge transfer from historical models to current models. Theoretical analyses demonstrate that FedCLASS stands on reliable foundations, considering scores of old classes predicted by historical models as conditional probabilities in the absence of new classes, and the scores of new classes predicted by current models as the conditional probabilities of class scores derived from historical models. Empirical experiments demonstrate the superiority of FedCLASS over four baseline algorithms in reducing average forgetting rate and boosting global accuracy.
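A minimal sketch of the score-combination idea, as we read it from the abstract: old-class logits come from the historical model, new-class logits from the current model, and the concatenation serves as a distillation teacher. The temperature, the KL form, and all shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def fedclass_self_distillation(cur_logits, hist_logits, num_old, T=2.0):
    """Enrich the historical model's old-class scores with the current
    model's new-class scores, then distill the combined distribution
    back into the current model (illustrative, not the paper's exact loss)."""
    old_part = hist_logits[:, :num_old]          # historical model: old classes
    new_part = cur_logits[:, num_old:]           # current model: new classes
    teacher_logits = torch.cat([old_part, new_part], dim=1)
    teacher = F.softmax(teacher_logits / T, dim=1)
    student = F.log_softmax(cur_logits / T, dim=1)
    return F.kl_div(student, teacher, reduction="batchmean") * T * T

cur = torch.randn(16, 10)     # current model: 10 classes (7 old + 3 new)
hist = torch.randn(16, 7)     # historical model: 7 old classes
loss = fedclass_self_distillation(cur, hist, num_old=7)
```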
Updated: 2024-04-17 10:13:36
标题: 具有新类增强自蒸馏的联邦式类增量学习
摘要: 联邦学习(FL)使参与者能够在保证原始数据隐私的同时进行协作模型训练。主流的FL方法忽视了现实世界数据的动态特性,特别是其数据量随时间增长、类别随时间多样化的趋势。这种忽视导致FL方法遭受灾难性遗忘,即训练模型在吸收新数据时无意中丢弃了先前学到的信息。为了应对这一挑战,我们提出了一种新颖的联邦类增量学习(FCIL)方法,命名为FedCLASS,即具有新类增强自蒸馏的联邦类增量学习。FedCLASS的核心是利用当前模型预测的新类分数来丰富历史模型的类分数,并利用合并后的知识进行自蒸馏,从而实现从历史模型到当前模型更充分、更精确的知识迁移。理论分析表明,FedCLASS建立在可靠的基础之上:它将历史模型预测的旧类分数视为不存在新类情形下的条件概率,并将当前模型预测的新类分数视为基于历史模型类分数的条件概率。实证实验证明,FedCLASS在降低平均遗忘率和提高全局准确率方面优于四种基准算法。
更新时间: 2024-04-17 10:13:36
领域: cs.LG
Breaking the Heavy-Tailed Noise Barrier in Stochastic Optimization Problems
We consider stochastic optimization problems with heavy-tailed noise with structured density. For such problems, we show that it is possible to get faster rates of convergence than $\mathcal{O}(K^{-2(\alpha - 1)/\alpha})$, when the stochastic gradients have finite moments of order $\alpha \in (1, 2]$. In particular, our analysis allows the noise norm to have an unbounded expectation. To achieve these results, we stabilize stochastic gradients, using smoothed medians of means. We prove that the resulting estimates have negligible bias and controllable variance. This allows us to carefully incorporate them into clipped-SGD and clipped-SSTM and derive new high-probability complexity bounds in the considered setup.
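A toy illustration of the stabilization step: a (plain, unsmoothed) median of block means tames heavy-tailed per-sample gradients before clipping. The smoothing used in the paper is omitted here, and the noise model below is only one example of an $\alpha \in (1, 2]$ regime.

```python
import numpy as np

def median_of_means(grads, num_blocks=5):
    """Coordinate-wise median of block means: a robust aggregate of
    per-sample gradients under heavy-tailed noise. (The paper uses a
    *smoothed* median; the plain median is shown for simplicity.)"""
    blocks = np.array_split(np.asarray(grads), num_blocks)
    block_means = np.stack([b.mean(axis=0) for b in blocks])
    return np.median(block_means, axis=0)

def clip(g, lam):
    """Norm clipping as in clipped-SGD: scale g down to norm <= lam."""
    norm = np.linalg.norm(g)
    return g if norm <= lam else g * (lam / norm)

rng = np.random.default_rng(0)
true_grad = np.array([1.0, -2.0])
# Heavy-tailed per-sample gradients: Student-t noise with 1.5 dof
# (finite mean, infinite variance).
samples = true_grad + rng.standard_t(df=1.5, size=(200, 2))
g_hat = clip(median_of_means(samples), lam=5.0)
print(g_hat)   # close to true_grad despite the heavy tails
```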
Updated: 2024-04-17 10:12:59
标题: 打破随机优化问题中的重尾噪声障碍
摘要: 我们考虑具有结构化密度的重尾噪声下的随机优化问题。对于这类问题,我们证明当随机梯度具有阶数为$\alpha \in (1, 2]$的有限矩时,可以获得比$\mathcal{O}(K^{-2(\alpha - 1)/\alpha})$更快的收敛速度。特别地,我们的分析允许噪声范数具有无界期望。为了实现这些结果,我们使用平滑的均值中位数(smoothed medians of means)来稳定随机梯度。我们证明了由此得到的估计具有可忽略的偏差和可控的方差。这使我们能够将它们谨慎地结合到剪切SGD和剪切SSTM中,并在所考虑的设置中推导出新的高概率复杂度界。
更新时间: 2024-04-17 10:12:59
领域: math.OC,cs.DS,cs.LG,math.ST,stat.TH
BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences
Effective attention modules have played a crucial role in the success of Transformer-based large language models (LLMs), but the quadratic time and memory complexities of these attention modules also pose a challenge when processing long sequences. One potential solution for the long sequence problem is to utilize distributed clusters to parallelize the computation of attention modules across multiple devices (e.g., GPUs). However, adopting a distributed approach inevitably introduces extra memory overheads to store local attention results and incurs additional communication costs to aggregate local results into global ones. In this paper, we propose a distributed attention framework named ``BurstAttention'' to optimize memory access and communication operations at both the global cluster and local device levels. In our experiments, we compare BurstAttention with other competitive distributed attention solutions for long sequence processing. The experimental results under different length settings demonstrate that BurstAttention offers significant advantages for processing long sequences compared with these competitive baselines, reducing communication overheads by 40% and achieving a 1.37x speedup when training on 128K-token sequences across 32 A100 GPUs.
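The reason only small per-device statistics need exchanging can be seen in the standard safe-softmax merge, sketched below in NumPy. This illustrates exact aggregation of local attention results over key/value shards, not BurstAttention's specific memory and communication optimizations.

```python
import numpy as np

def local_attention_stats(q, K, V):
    """Per-device partial attention over a local K/V shard. Returns
    (row max, softmax denominator, unnormalized output): the only
    statistics needed for an exact global merge."""
    scores = q @ K.T / np.sqrt(q.shape[-1])
    m = scores.max(axis=-1, keepdims=True)
    p = np.exp(scores - m)
    return m, p.sum(axis=-1, keepdims=True), p @ V

def merge(stats):
    """Numerically stable merge of per-shard partial softmax results."""
    m, s, o = stats[0]
    for m2, s2, o2 in stats[1:]:
        m_new = np.maximum(m, m2)
        s = s * np.exp(m - m_new) + s2 * np.exp(m2 - m_new)
        o = o * np.exp(m - m_new) + o2 * np.exp(m2 - m_new)
        m = m_new
    return o / s

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 32))                  # 4 queries
K = rng.standard_normal((1024, 32))
V = rng.standard_normal((1024, 64))
shards = np.split(np.arange(1024), 8)             # K/V sharded over 8 "devices"
out = merge([local_attention_stats(q, K[idx], V[idx]) for idx in shards])

# Reference: monolithic softmax attention.
scores = q @ K.T / np.sqrt(32)
w = np.exp(scores - scores.max(-1, keepdims=True))
ref = (w / w.sum(-1, keepdims=True)) @ V
assert np.allclose(out, ref)
```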
Updated: 2024-04-17 10:07:14
标题: BurstAttention:一种用于极长序列的高效分布式注意力框架
摘要: 有效的注意力模块在基于Transformer的大型语言模型(LLMs)的成功中起着至关重要的作用,但这些注意力模块的二次时间和内存复杂度在处理长序列时也带来了挑战。解决长序列问题的一个潜在方案是利用分布式集群在多个设备(例如GPU)上并行计算注意力模块。然而,采用分布式方法不可避免地会引入额外的内存开销来存储本地注意力结果,并带来额外的通信成本以将本地结果聚合成全局结果。在本文中,我们提出了一种名为"BurstAttention"的分布式注意力框架,以优化全局集群和本地设备两个层面的内存访问和通信操作。在我们的实验中,我们将BurstAttention与其他有竞争力的分布式注意力方案就长序列处理进行了比较。不同长度设置下的实验结果表明,与这些竞争性基线相比,BurstAttention在处理长序列时具有显著优势:通信开销减少了40%,并在32块A100上以128K序列长度训练时实现了1.37倍的加速。
更新时间: 2024-04-17 10:07:14
领域: cs.DC,cs.LG
Position Engineering: Boosting Large Language Models through Positional Information Manipulation
The performance of large language models (LLMs) is significantly influenced by the quality of the prompts provided. In response, researchers have developed numerous prompt engineering strategies aimed at modifying the prompt text to enhance task performance. In this paper, we introduce a novel technique termed position engineering, which offers a more efficient way to guide large language models. Unlike prompt engineering, which requires substantial effort to modify the text provided to LLMs, position engineering merely involves altering the positional information in the prompt without modifying the text itself. We have evaluated position engineering in two widely-used LLM scenarios: retrieval-augmented generation (RAG) and in-context learning (ICL). Our findings show that position engineering substantially improves upon the baseline in both cases. Position engineering thus represents a promising new strategy for exploiting the capabilities of large language models.
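A hypothetical sketch of the idea with Hugging Face transformers: the token sequence stays fixed while a gap is inserted into the position ids between two prompt segments. The gap size and segment layout are arbitrary assumptions, not the paper's settings.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Two prompt segments, e.g. retrieved context followed by a question.
context_ids = tok("Retrieved document: ...", return_tensors="pt").input_ids
question_ids = tok(" Question: ...", return_tensors="pt").input_ids
input_ids = torch.cat([context_ids, question_ids], dim=1)

# Position engineering: same tokens, but a gap of 50 unused positions
# is inserted between the segments (gap size is an illustrative choice).
gap = 50
pos_ctx = torch.arange(context_ids.shape[1])
pos_q = torch.arange(question_ids.shape[1]) + context_ids.shape[1] + gap
position_ids = torch.cat([pos_ctx, pos_q]).unsqueeze(0)

out = model(input_ids=input_ids, position_ids=position_ids)
```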
Updated: 2024-04-17 10:00:56
标题: 位置工程:通过位置信息操作提升大型语言模型
摘要: 大型语言模型(LLMs)的性能受到所提供提示的质量显著影响。作为回应,研究人员开发了大量的提示工程策略,旨在修改提示文本以增强任务性能。本文介绍了一种名为位置工程的新技术,它提供了一种更有效的方式来引导大型语言模型。与需要大量努力修改提供给LLMs的文本的提示工程不同,位置工程仅涉及在不修改文本本身的情况下修改提示的位置信息。我们在两种广泛使用的LLM场景中评估了位置工程:检索增强生成(RAG)和上下文学习(ICL)。我们的发现表明,位置工程在两种情况下都显著改善了基线。因此,位置工程代表了一种利用大型语言模型能力的有前途的新策略。
更新时间: 2024-04-17 10:00:56
领域: cs.CL,cs.AI,cs.LG
Feature Corrective Transfer Learning: End-to-End Solutions to Object Detection in Non-Ideal Visual Conditions
A significant challenge in the field of object detection lies in the system's performance under non-ideal imaging conditions, such as rain, fog, low illumination, or raw Bayer images that lack ISP processing. Our study introduces "Feature Corrective Transfer Learning", a novel approach that leverages transfer learning and a bespoke loss function to facilitate the end-to-end detection of objects in these challenging scenarios without the need to convert non-ideal images into their RGB counterparts. In our methodology, we initially train a comprehensive model on a pristine RGB image dataset. Subsequently, non-ideal images are processed by comparing their feature maps against those from the initial ideal RGB model. This comparison employs the Extended Area Novel Structural Discrepancy Loss (EANSDL), a novel loss function designed to quantify similarities and integrate them into the detection loss. This approach refines the model's ability to perform object detection across varying conditions through direct feature map correction, encapsulating the essence of Feature Corrective Transfer Learning. Experimental validation on variants of the KITTI dataset demonstrates a significant improvement in mean Average Precision (mAP): a 3.8-8.1% relative enhancement in detection under non-ideal conditions compared to the baseline model, while remaining within 1.3% of the mAP@[0.5:0.95] achieved under ideal conditions by the standard Faster R-CNN algorithm.
Updated: 2024-04-17 09:58:53
标题: 特征校正迁移学习:端到端解决在非理想视觉条件下的目标检测
摘要: 目标检测领域面临的一个重要挑战在于系统在非理想成像条件下的性能,比如雨、雾、低照度或缺乏ISP处理的原始Bayer图像。我们的研究引入了一种称为"特征校正迁移学习"的新方法,它利用迁移学习和定制损失函数,在这些具有挑战性的场景中实现端到端的目标检测,而无需将非理想图像转换为其RGB对应图像。在我们的方法中,我们首先在一个纯净的RGB图像数据集上训练一个完整模型。随后,通过将非理想图像的特征图与初始理想RGB模型的特征图进行比较来处理非理想图像。这种比较采用扩展区域新型结构差异损失(EANSDL),这是一种旨在量化相似性并将其整合到检测损失中的新颖损失函数。该方法通过直接的特征图校正提升了模型在不同条件下执行目标检测的能力,体现了特征校正迁移学习的本质。在KITTI数据集的变体上进行的实验验证显示,平均精度(mAP)显著提高:与基准模型相比,非理想条件下的检测相对提升了3.8-8.1%,并且与标准Faster R-CNN算法在理想条件下取得的mAP@[0.5:0.95]相比,差距仅在1.3%以内。
更新时间: 2024-04-17 09:58:53
领域: cs.CV,cs.AI
Revisiting Noise Resilience Strategies in Gesture Recognition: Short-Term Enhancement in Surface Electromyographic Signal Analysis
Gesture recognition based on surface electromyography (sEMG) has been gaining importance in many 3D Interactive Scenes. However, sEMG is easily influenced by various forms of noise in real-world environments, leading to challenges in providing long-term stable interactions through sEMG. Existing methods often struggle to enhance model noise resilience through various predefined data augmentation techniques. In this work, we revisit the problem from a short-term enhancement perspective to improve precision and robustness against various common noisy scenarios, using learnable denoising based on sEMG intrinsic pattern information and sliding-window attention. We propose a Short-Term Enhancement Module (STEM), which can be easily integrated with various models. STEM offers several benefits: 1) learnable denoising, enabling noise reduction without manual data augmentation; 2) scalability, adaptable to various models; and 3) cost-effectiveness, achieving short-term enhancement through minimal weight-sharing in an efficient attention mechanism. In particular, we incorporate STEM into a transformer, creating the Short-Term Enhanced Transformer (STET). Compared with best-competing approaches, the impact of noise on STET is reduced by more than 20%. We also report promising results on both classification and regression datasets and demonstrate that STEM generalizes across different gesture recognition tasks.
Updated: 2024-04-17 09:57:40
标题: 重新审视手势识别中的噪声鲁棒性策略:表面肌电信号分析中的短期增强
摘要: 基于表面肌电图(sEMG)的手势识别在许多3D交互场景中变得越来越重要。然而,在现实环境中,sEMG很容易受到各种形式噪声的影响,使得通过sEMG提供长期稳定的交互面临挑战。现有方法通常难以通过各种预定义的数据增强技术来增强模型的抗噪能力。在这项工作中,我们从短期增强的角度重新审视这个问题,利用sEMG固有模式信息和滑动窗口注意力进行可学习去噪,以提高在各种常见噪声场景下的精度和稳健性。我们提出了一个短期增强模块(STEM),它可以轻松集成到各种模型中。STEM具有几个优点:1)可学习去噪,无需手动数据增强即可实现降噪;2)可扩展性,适用于各种模型;3)成本效益,通过高效注意力机制中的最小权重共享实现短期增强。特别地,我们将STEM整合到transformer中,创建了短期增强Transformer(STET)。与最佳竞争方法相比,噪声对STET的影响降低了超过20%。我们还在分类和回归数据集上报告了有希望的结果,并证明STEM可以泛化到不同的手势识别任务。
更新时间: 2024-04-17 09:57:40
领域: eess.SP,cs.AI
Prompt-Guided Generation of Structured Chest X-Ray Report Using a Pre-trained LLM
Medical report generation automates radiology descriptions from images, easing the burden on physicians and minimizing errors. However, current methods lack structured outputs and physician interactivity for clear, clinically relevant reports. Our method introduces a prompt-guided approach to generate structured chest X-ray reports using a pre-trained large language model (LLM). First, we identify anatomical regions in chest X-rays to generate focused sentences that center on key visual elements, thereby establishing a structured report foundation with anatomy-based sentences. We also convert the detected anatomy into textual prompts conveying anatomical comprehension to the LLM. Additionally, the clinical context prompts guide the LLM to emphasize interactivity and clinical requirements. By integrating anatomy-focused sentences and anatomy/clinical prompts, the pre-trained LLM can generate structured chest X-ray reports tailored to prompted anatomical regions and clinical contexts. We evaluate using language generation and clinical effectiveness metrics, demonstrating strong performance.
Updated: 2024-04-17 09:45:43
标题: 提示引导的结构化胸部X射线报告生成:使用预训练LLM
摘要: 医学报告生成自动化地从图像中提取放射学描述,减轻了医生的负担并最小化了错误。然而,当前的方法缺乏结构化输出和医生互动,无法产生清晰、临床相关的报告。我们的方法引入了一种通过提示引导的方法,使用预训练的大型语言模型(LLM)生成结构化的胸部X射线报告。首先,我们识别胸部X射线中的解剖区域,生成以关键视觉元素为中心的专注句子,从而建立以解剖学句子为基础的结构化报告基础。我们还将检测到的解剖结构转换为传达解剖理解的文本提示,提供给LLM。此外,临床背景提示引导LLM强调互动性和临床要求。通过整合以解剖为中心的句子和解剖/临床提示,预训练的LLM可以生成根据提示的解剖区域和临床背景定制的结构化胸部X射线报告。我们通过使用语言生成和临床有效性指标进行评估,展示了强大的性能。
更新时间: 2024-04-17 09:45:43
领域: cs.AI,cs.CV,cs.MM
CAGE: Causality-Aware Shapley Value for Global Explanations
As Artificial Intelligence (AI) is having more influence on our everyday lives, it becomes important that AI-based decisions are transparent and explainable. As a consequence, the field of eXplainable AI (or XAI) has become popular in recent years. One way to explain AI models is to elucidate the predictive importance of the input features for the AI model in general, also referred to as global explanations. Inspired by cooperative game theory, Shapley values offer a convenient way for quantifying the feature importance as explanations. However many methods based on Shapley values are built on the assumption of feature independence and often overlook causal relations of the features which could impact their importance for the ML model. Inspired by studies of explanations at the local level, we propose CAGE (Causally-Aware Shapley Values for Global Explanations). In particular, we introduce a novel sampling procedure for out-coalition features that respects the causal relations of the input features. We derive a practical approach that incorporates causal knowledge into global explanation and offers the possibility to interpret the predictive feature importance considering their causal relation. We evaluate our method on synthetic data and real-world data. The explanations from our approach suggest that they are not only more intuitive but also more faithful compared to previous global explanation methods.
Updated: 2024-04-17 09:43:54
标题: CAGE:因果感知夏普利值用于全局解释
摘要: 随着人工智能(AI)对我们日常生活的影响越来越大,AI决策的透明性和可解释性变得至关重要。因此,"可解释AI"(XAI)领域近年来变得流行起来。解释AI模型的一种方法是阐明输入特征对AI模型整体预测的重要性,也称为全局解释。受合作博弈理论启发,Shapley值提供了一种将特征重要性量化为解释的便捷方式。然而,许多基于Shapley值的方法建立在特征独立性假设之上,往往忽视了可能影响特征对ML模型重要性的特征间因果关系。受局部层面解释研究的启发,我们提出了CAGE(用于全局解释的因果感知Shapley值)。具体而言,我们为联盟外(out-coalition)特征引入了一种尊重输入特征因果关系的新颖采样程序。我们推导出一种将因果知识纳入全局解释的实用方法,使得在解释预测特征重要性时可以考虑其因果关系。我们在合成数据和真实世界数据上评估了我们的方法。结果表明,与先前的全局解释方法相比,我们方法给出的解释不仅更直观,而且更忠实。
更新时间: 2024-04-17 09:43:54
领域: cs.AI
Exploring the Transferability of Visual Prompting for Multimodal Large Language Models
Although Multimodal Large Language Models (MLLMs) have demonstrated promising versatile capabilities, their performance is still inferior to specialized models on downstream tasks, which makes adaptation necessary to enhance their utility. However, fine-tuning methods require independent training for every model, leading to huge computation and memory overheads. In this paper, we propose a novel setting where we aim to improve the performance of diverse MLLMs with a group of shared parameters optimized for a downstream task. To achieve this, we propose Transferable Visual Prompting (TVP), a simple and effective approach to generate visual prompts that can transfer to different models and improve their performance on downstream tasks after trained on only one model. We introduce two strategies to address the issue of cross-model feature corruption of existing visual prompting methods and enhance the transferability of the learned prompts, including 1) Feature Consistency Alignment: which imposes constraints to the prompted feature changes to maintain task-agnostic knowledge; 2) Task Semantics Enrichment: which encourages the prompted images to contain richer task-specific semantics with language guidance. We validate the effectiveness of TVP through extensive experiments with 6 modern MLLMs on a wide variety of tasks ranging from object recognition and counting to multimodal reasoning and hallucination correction.
Updated: 2024-04-17 09:39:07
标题: 探索多模态大型语言模型的视觉提示可迁移性
摘要: 尽管多模态大型语言模型(MLLMs)展现出了令人期待的多才多艺的能力,但它们在下游任务中的表现仍然不如专门的模型,这使得适应性成为提高它们实用性的必要条件。然而,微调方法需要为每个模型进行独立训练,导致巨大的计算和内存开销。在本文中,我们提出了一个新颖的设置,旨在通过一组为下游任务优化的共享参数来提高不同MLLMs的性能。为了实现这一目标,我们提出了可转移的视觉提示(TVP),这是一种简单而有效的方法,可以生成可以转移到不同模型并在仅训练一个模型后提高它们在下游任务中的性能的视觉提示。我们引入了两种策略来解决现有视觉提示方法的跨模型特征污染问题,并增强学习提示的可转移性,包括1)特征一致性对齐:对提示的特征变化施加约束,以保持任务不可知的知识;2)任务语义丰富化:鼓励提示的图像包含更丰富的具有语言指导的任务特定语义。我们通过广泛的实验验证了TVP的有效性,在各种任务上与6个现代MLLMs进行了实验,从对象识别和计数到多模态推理和幻觉纠正。
更新时间: 2024-04-17 09:39:07
领域: cs.CV,cs.AI,cs.LG
Zero-Shot Reinforcement Learning from Low Quality Data
Zero-shot reinforcement learning (RL) promises to provide agents that can perform any task in an environment after an offline, reward-free pre-training phase. Methods leveraging successor measures and successor features have shown strong performance in this setting, but require access to large heterogenous datasets for pre-training which cannot be expected for most real problems. Here, we explore how the performance of zero-shot RL methods degrades when trained on small homogeneous datasets, and propose fixes inspired by conservatism, a well-established feature of performant single-task offline RL algorithms. We evaluate our proposals across various datasets, domains and tasks, and show that conservative zero-shot RL algorithms outperform their non-conservative counterparts on low quality datasets, and perform no worse on high quality datasets. Somewhat surprisingly, our proposals also outperform baselines that get to see the task during training. Our code is available via https://enjeeneer.io/projects/zero-shot-rl/.
Updated: 2024-04-17 09:36:47
标题: 基于低质量数据的零样本强化学习
摘要: 零样本强化学习(RL)承诺在离线、无奖励的预训练阶段之后,提供能够在环境中执行任何任务的智能体。利用后继度量(successor measures)和后继特征(successor features)的方法在这种设置下表现出色,但需要访问大型异构数据集进行预训练,而这对大多数实际问题来说是不可期望的。在这里,我们探讨了零样本RL方法在小型同质数据集上训练时性能如何下降,并提出了受保守主义启发的修复方法,保守主义是高性能单任务离线RL算法的一个成熟特征。我们在各种数据集、领域和任务上评估了我们的提议,并展示了保守的零样本RL算法在低质量数据集上优于非保守的对应算法,而在高质量数据集上表现不差。有些令人惊讶的是,我们的提议还优于在训练期间能够看到任务的基线。我们的代码可通过https://enjeeneer.io/projects/zero-shot-rl/获得。
更新时间: 2024-04-17 09:36:47
领域: cs.LG,cs.AI
T$^3$Bench: Benchmarking Current Progress in Text-to-3D Generation
Recent methods in text-to-3D leverage powerful pretrained diffusion models to optimize NeRF. Notably, these methods are able to produce high-quality 3D scenes without training on 3D data. Due to the open-ended nature of the task, most studies evaluate their results with subjective case studies and user experiments, thereby presenting a challenge in quantitatively addressing the question: How has current progress in Text-to-3D gone so far? In this paper, we introduce T$^3$Bench, the first comprehensive text-to-3D benchmark containing diverse text prompts of three increasing complexity levels that are specially designed for 3D generation. To assess both the subjective quality and the text alignment, we propose two automatic metrics based on multi-view images produced by the 3D contents. The quality metric combines multi-view text-image scores and regional convolution to detect quality and view inconsistency. The alignment metric uses multi-view captioning and GPT-4 evaluation to measure text-3D consistency. Both metrics closely correlate with different dimensions of human judgments, providing a paradigm for efficiently evaluating text-to-3D models. The benchmarking results, shown in Fig. 1, reveal performance differences among an extensive 10 prevalent text-to-3D methods. Our analysis further highlights the common struggles for current methods on generating surroundings and multi-object scenes, as well as the bottleneck of leveraging 2D guidance for 3D generation. Our project page is available at: https://t3bench.com.
Updated: 2024-04-17 09:09:17
标题: T$^3$Bench:文本到3D生成领域的当前进展基准测试
摘要: 最近的文本到3D方法利用强大的预训练扩散模型来优化NeRF。值得注意的是,这些方法能够在不使用3D数据训练的情况下生成高质量的3D场景。由于任务的开放性,大多数研究通过主观案例研究和用户实验来评估其结果,因此在定量回答以下问题上存在挑战:目前文本到3D的进展究竟如何?在本文中,我们介绍了T$^3$Bench,这是第一个全面的文本到3D基准,包含三个复杂度递增层次、专为3D生成设计的多样化文本提示。为了同时评估主观质量和文本对齐,我们提出了两个基于3D内容生成的多视图图像的自动度量标准。质量度量结合多视图文本-图像分数和区域卷积来检测质量和视图不一致性;对齐度量使用多视图字幕和GPT-4评估来衡量文本-3D一致性。这两个度量与人类判断的不同维度密切相关,为高效评估文本到3D模型提供了一个范例。图1中显示的基准结果揭示了10种主流文本到3D方法之间的性能差异。我们的分析进一步突显了当前方法在生成环境和多物体场景方面的共同困难,以及利用2D引导进行3D生成的瓶颈。我们的项目页面网址为:https://t3bench.com。
更新时间: 2024-04-17 09:09:17
领域: cs.CV,cs.CL,cs.LG
The Writing is on the Wall: Analyzing the Boom of Inscriptions and its Impact on Rollup Performance and Cost Efficiency
Late 2023 witnessed significant user activity on EVM chains, resulting in a surge in transaction activity and putting many rollups to their first live test. While some rollups performed well, others experienced downtime during this period, affecting transaction finality time and gas fees. To address the lack of empirical research on rollups, we perform the first study of the heightened activity of the late 2023 transaction boom, attributed to inscriptions - a novel technique that enables NFT and ERC-20 token creation on Bitcoin and other blockchains. We observe that minting inscription-based meme tokens on zkSync Era allows for trading at a fraction of the costs, compared to the Bitcoin or Ethereum networks. We also found that the increased transaction activity, over 99% attributed to the minting of new inscription tokens, positively affected other users of zkSync Era, resulting in lowered gas fees. Unlike L1 blockchains, ZK rollups may experience lower gas fees with increased transaction volume. Lastly, the introduction of blobs - a form of temporary data storage - decreased the gas costs of Ethereum rollups, but also raised a number of questions about the security of inscription-based tokens.
Updated: 2024-04-17 09:08:42
标题: 墙上的文字:分析铭文热潮及其对Rollup性能和成本效率的影响
摘要: 2023年末,以太坊虚拟机链上的用户活动显著增加,导致交易活动激增,许多Rollups进入了首次实时测试阶段。虽然有些Rollups表现良好,但在此期间也有一些经历了停机时间,影响了交易最终确认时间和Gas费用。为了解决对Rollups缺乏经验性研究的问题,我们进行了首次研究,针对2023年末交易繁荣期间的高活动性,归因于铭文技术——一种在比特币和其他区块链上实现NFT和ERC-20代币创建的新技术。我们发现,在zkSync Era上铸造基于铭文的迷因代币可以以较低成本进行交易,相对于比特币或以太坊网络。我们还发现,超过99%的交易活动归因于铸造新的铭文代币,积极影响了zkSync Era的其他用户,降低了Gas费用。与L1区块链不同,ZK Rollups在交易量增加时可能会降低Gas费用。最后,引入了Blob——一种临时数据存储形式,降低了以太坊Rollups的Gas成本,但也引发了一些关于基于铭文代币安全性的问题。
更新时间: 2024-04-17 09:08:42
领域: cs.CR
How Prevalent is Gender Bias in ChatGPT? -- Exploring German and English ChatGPT Responses
With the introduction of ChatGPT, OpenAI made large language models (LLM) accessible to users with limited IT expertise. However, users with no background in natural language processing (NLP) might lack a proper understanding of LLMs and, being unaware of their inherent limitations, may take the systems' output at face value. In this paper, we systematically analyse prompts and the generated responses to identify possible problematic issues, with a special focus on gender biases, which users need to be aware of when processing the system's output. We explore how ChatGPT reacts in English and German if prompted to answer from a female, male, or neutral perspective. In an in-depth investigation, we examine selected prompts and analyse to what extent responses differ if the system is prompted several times in an identical way. On this basis, we show that ChatGPT is indeed useful for helping non-IT users draft texts for their daily work. However, it is absolutely crucial to thoroughly check the system's responses for biases as well as for syntactic and grammatical mistakes.
Updated: 2024-04-17 09:04:28
标题: ChatGPT中存在性别偏见有多普遍? - 探讨德语和英语ChatGPT的回复
摘要: 随着ChatGPT的推出,OpenAI使得大型语言模型(LLM)对IT专业知识有限的用户变得容易使用。然而,没有自然语言处理(NLP)背景的用户可能缺乏对LLM的恰当理解,由于意识不到其固有局限,可能会将系统的输出照单全收。在本文中,我们系统地分析提示和生成的响应,以识别可能存在的问题,特别关注用户在处理系统输出时需要注意的性别偏见。我们探讨了当ChatGPT被要求以女性、男性或中性视角回答时,其在英语和德语中的反应。在深入调查中,我们检查了选定的提示,并分析了以相同方式多次提示系统时响应的差异程度。在此基础上,我们表明ChatGPT确实有助于非IT用户为日常工作起草文本。然而,彻底检查系统响应中的偏见以及句法和语法错误是绝对关键的。
更新时间: 2024-04-17 09:04:28
领域: cs.CL,cs.AI,cs.CY,cs.LG
KI-GAN: Knowledge-Informed Generative Adversarial Networks for Enhanced Multi-Vehicle Trajectory Forecasting at Signalized Intersections
Reliable prediction of vehicle trajectories at signalized intersections is crucial to urban traffic management and autonomous driving systems. However, it presents unique challenges, due to the complex roadway layout at intersections, involvement of traffic signal controls, and interactions among different types of road users. To address these issues, we present in this paper a novel model called Knowledge-Informed Generative Adversarial Network (KI-GAN), which integrates both traffic signal information and multi-vehicle interactions to predict vehicle trajectories accurately. Additionally, we propose a specialized attention pooling method that accounts for vehicle orientation and proximity at intersections. Based on the SinD dataset, our KI-GAN model is able to achieve an Average Displacement Error (ADE) of 0.05 and a Final Displacement Error (FDE) of 0.12 for a 6-second observation and 6-second prediction cycle. When the prediction window is extended to 9 seconds, the ADE and FDE values are further reduced to 0.11 and 0.26, respectively. These results demonstrate the effectiveness of the proposed KI-GAN model in vehicle trajectory prediction under complex scenarios at signalized intersections, which represents a significant advancement in the target field.
Updated: 2024-04-17 08:53:59
标题: KI-GAN:基于知识的生成对抗网络,用于增强信号化交叉口多车辆轨迹预测
摘要: 在信号化交叉口可靠地预测车辆轨迹对城市交通管理和自动驾驶系统至关重要。然而,由于交叉口复杂的道路布局、交通信号控制的参与以及不同类型道路使用者之间的互动,这一问题存在独特的挑战。为解决这些问题,在本文中我们提出了一种新颖的模型,名为知识引导生成对抗网络(KI-GAN),该模型整合了交通信号信息和多车辆互动,能够准确预测车辆轨迹。此外,我们提出了一种专门的注意力池化方法,考虑了车辆在交叉口的方向和距离。基于SinD数据集,我们的KI-GAN模型能够在6秒观测和6秒预测周期内实现平均位移误差(ADE)为0.05和最终位移误差(FDE)为0.12。当预测窗口延长至9秒时,ADE和FDE值分别进一步降低至0.11和0.26。这些结果展示了所提出的KI-GAN模型在信号化交叉口复杂场景下的车辆轨迹预测中的有效性,这代表了目标领域的重大进步。
更新时间: 2024-04-17 08:53:59
领域: cs.LG,cs.AI,cs.RO
Boolean proportions
The author has recently introduced an abstract algebraic framework of analogical proportions within the general setting of universal algebra. This paper studies analogical proportions in the boolean domain consisting of two elements 0 and 1 within his framework. It turns out that our notion of boolean proportions coincides with two prominent models from the literature in different settings. This means that we can capture two separate modellings of boolean proportions within a single framework which is mathematically appealing and provides further evidence for the robustness and applicability of the general framework.
Updated: 2024-04-17 08:47:32
标题: 布尔比例
摘要: 作者最近在泛代数的一般背景下引入了一个类比比例的抽象代数框架。本文在该框架内研究由两个元素0和1组成的布尔域中的类比比例。结果表明,我们的布尔比例概念与文献中两个不同设置下的著名模型相一致。这意味着我们可以在单一框架内同时刻画布尔比例的两种独立建模,这在数学上颇具吸引力,并为该通用框架的稳健性和适用性提供了进一步的证据。
更新时间: 2024-04-17 08:47:32
领域: cs.AI,cs.DM,cs.LO
Deep Neural Networks via Complex Network Theory: a Perspective
Deep Neural Networks (DNNs) can be represented as graphs whose links and vertices iteratively process data and solve tasks sub-optimally. Complex Network Theory (CNT), merging statistical physics with graph theory, provides a method for interpreting neural networks by analysing their weights and neuron structures. However, classic works adapt CNT metrics that only permit a topological analysis as they do not account for the effect of the input data. In addition, CNT metrics have been applied to a limited range of architectures, mainly including Fully Connected neural networks. In this work, we extend the existing CNT metrics with measures that sample from the DNNs' training distribution, shifting from a purely topological analysis to one that connects with the interpretability of deep learning. For the novel metrics, in addition to the existing ones, we provide a mathematical formalisation for Fully Connected, AutoEncoder, Convolutional and Recurrent neural networks, of which we vary the activation functions and the number of hidden layers. We show that these metrics differentiate DNNs based on the architecture, the number of hidden layers, and the activation function. Our contribution provides a method rooted in physics for interpreting DNNs that offers insights beyond the traditional input-output relationship and the CNT topological analysis.
Updated: 2024-04-17 08:42:42
标题: 透过复杂网络理论看深度神经网络:一个视角
摘要: 深度神经网络(DNNs)可以被表示为图,其链接和顶点迭代地处理数据并以次优方式解决任务。复杂网络理论(CNT)将统计物理学与图论相结合,通过分析神经网络的权重和神经元结构提供了一种解释神经网络的方法。然而,经典研究所采用的CNT度量只允许进行拓扑分析,因为它们没有考虑输入数据的影响。此外,CNT度量只被应用于有限范围的架构,主要是全连接神经网络。在这项工作中,我们用从DNNs训练分布中采样的度量扩展了现有的CNT度量,从纯拓扑分析转向与深度学习可解释性相关联的分析。对于这些新度量以及现有度量,我们为全连接、自编码器、卷积和循环神经网络提供了数学形式化,并改变其激活函数和隐藏层数量。我们展示了这些度量能够根据架构、隐藏层数量和激活函数区分DNNs。我们的贡献提供了一种植根于物理学的解释DNNs的方法,提供了超越传统输入-输出关系和CNT拓扑分析的见解。
更新时间: 2024-04-17 08:42:42
领域: cs.LG,cs.AI
Personalized Heart Disease Detection via ECG Digital Twin Generation
Heart diseases rank among the leading causes of global mortality, demonstrating a crucial need for early diagnosis and intervention. Most traditional electrocardiogram (ECG) based automated diagnosis methods are trained at population level, neglecting the customization of personalized ECGs to enhance individual healthcare management. A potential solution to address this limitation is to employ digital twins to simulate symptoms of diseases in real patients. In this paper, we present an innovative prospective learning approach for personalized heart disease detection, which generates digital twins of healthy individuals' anomalous ECGs and enhances the model sensitivity to the personalized symptoms. In our approach, a vector quantized feature separator is proposed to locate and isolate the disease symptom and normal segments in ECG signals with ECG report guidance. Thus, the ECG digital twins can simulate specific heart diseases used to train a personalized heart disease detection model. Experiments demonstrate that our approach not only excels in generating high-fidelity ECG signals but also improves personalized heart disease detection. Moreover, our approach ensures robust privacy protection, safeguarding patient data in model development.
Updated: 2024-04-17 08:40:54
标题: 通过心电图数字孪生生成实现个性化心脏疾病检测
摘要: 心脏疾病居全球死亡原因之首,突显早期诊断和干预的重要性。大多数传统的基于心电图(ECG)的自动诊断方法是在人群水平上进行训练,忽视了个性化ECG的定制化,以增强个体健康管理。解决这一限制的潜在解决方案是利用数字孪生体在真实患者身上模拟疾病症状。在本文中,我们提出了一种创新的前瞻性学习方法,用于个性化心脏疾病检测,该方法生成健康个体异常ECG的数字孪生体,并增强了模型对个性化症状的敏感性。在我们的方法中,提出了一种矢量量化特征分离器,用于在ECG信号中定位和隔离疾病症状和正常段,辅以ECG报告指导。因此,ECG数字孪生体可以模拟用于训练个性化心脏疾病检测模型的特定心脏疾病。实验证明,我们的方法不仅在生成高保真度的ECG信号方面表现出色,还改善了个性化心脏疾病检测。此外,我们的方法确保了强大的隐私保护,保护了模型开发中的患者数据。
更新时间: 2024-04-17 08:40:54
领域: cs.LG,cs.AI,eess.SP
KDAS: Knowledge Distillation via Attention Supervision Framework for Polyp Segmentation
Polyp segmentation, a contentious issue in medical imaging, has seen numerous proposed methods aimed at improving the quality of segmented masks. While current state-of-the-art techniques yield impressive results, the size and computational cost of these models create challenges for practical industry applications. To address this challenge, we present KDAS, a Knowledge Distillation framework that incorporates attention supervision, and our proposed Symmetrical Guiding Module. This framework is designed to facilitate a compact student model with fewer parameters, allowing it to learn the strengths of the teacher model and mitigate the inconsistency between teacher features and student features, a common challenge in Knowledge Distillation, via the Symmetrical Guiding Module. Through extensive experiments, our compact models demonstrate their strength by achieving competitive results with state-of-the-art methods, offering a promising approach to creating compact models with high accuracy for polyp segmentation and in the medical imaging field. The implementation is available on https://github.com/huyquoctrinh/KDAS.
Updated: 2024-04-17 08:38:54
标题: KDAS:通过注意力监督框架进行知识蒸馏,用于息肉分割
摘要: 息肉分割是医学成像中一个备受关注的问题,已经出现了众多旨在提高分割掩模质量的方法。尽管当前最先进的技术取得了令人印象深刻的结果,但这些模型的规模和计算成本给实际工业应用带来了挑战。为了解决这一挑战,我们提出了KDAS,这是一个结合了注意力监督和我们提出的对称引导模块的知识蒸馏框架。该框架旨在构建一个参数更少的紧凑学生模型,使其能够学习教师模型的优势,并通过对称引导模块缓解知识蒸馏中常见的教师特征与学生特征之间的不一致性。通过大量实验,我们的紧凑模型展示了其实力,取得了与最先进方法相竞争的结果,为息肉分割乃至整个医学成像领域创建高精度的紧凑模型提供了一种有希望的方法。实现代码可在https://github.com/huyquoctrinh/KDAS 上找到。
更新时间: 2024-04-17 08:38:54
领域: eess.IV,cs.CV,cs.LG
Recommender Systems in the Era of Large Language Models (LLMs)
With the prosperity of e-commerce and web applications, Recommender Systems (RecSys) have become an important component of our daily life, providing personalized suggestions that cater to user preferences. While Deep Neural Networks (DNNs) have made significant advancements in enhancing recommender systems by modeling user-item interactions and incorporating textual side information, DNN-based methods still face limitations, such as difficulties in understanding users' interests and capturing textual side information, inabilities in generalizing to various recommendation scenarios and reasoning on their predictions, etc. Meanwhile, the emergence of Large Language Models (LLMs), such as ChatGPT and GPT4, has revolutionized the fields of Natural Language Processing (NLP) and Artificial Intelligence (AI), due to their remarkable abilities in fundamental responsibilities of language understanding and generation, as well as impressive generalization and reasoning capabilities. As a result, recent studies have attempted to harness the power of LLMs to enhance recommender systems. Given the rapid evolution of this research direction in recommender systems, there is a pressing need for a systematic overview that summarizes existing LLM-empowered recommender systems, to provide researchers in relevant fields with an in-depth understanding. Therefore, in this paper, we conduct a comprehensive review of LLM-empowered recommender systems from various aspects including Pre-training, Fine-tuning, and Prompting. More specifically, we first introduce representative methods to harness the power of LLMs (as a feature encoder) for learning representations of users and items. Then, we review recent techniques of LLMs for enhancing recommender systems from three paradigms, namely pre-training, fine-tuning, and prompting. Finally, we comprehensively discuss future directions in this emerging field.
Updated: 2024-04-17 08:36:26
标题: 大语言模型(LLMs)时代的推荐系统
摘要: 随着电子商务和网络应用的繁荣,推荐系统(RecSys)已经成为我们日常生活中重要的组成部分,提供个性化建议以满足用户偏好。虽然深度神经网络(DNNs)在增强推荐系统方面取得了显著进展,通过建模用户-物品交互和整合文本侧信息,但基于DNN的方法仍然面临一些限制,比如难以理解用户兴趣和捕捉文本侧信息、在各种推荐场景中无法泛化和推理预测等。与此同时,大型语言模型(LLMs),如ChatGPT和GPT4的出现,已经彻底改变了自然语言处理(NLP)和人工智能(AI)领域,由于它们在语言理解和生成的基本职责上具有杰出能力,以及令人印象深刻的泛化和推理能力。因此,最近的研究尝试利用LLMs的力量来增强推荐系统。鉴于推荐系统中这一研究方向的迅速发展,迫切需要一个系统性概述,总结现有的LLM增强推荐系统,以便为相关领域的研究人员提供深入了解。因此,在本文中,我们从预训练、微调和提示等各个方面对LLM增强推荐系统进行了全面审查。具体来说,我们首先介绍了利用LLMs(作为特征编码器)学习用户和物品表示的代表性方法。然后,我们回顾了最近LLMs技术在三种范式(预训练、微调和提示)中增强推荐系统的技术。最后,我们全面讨论了这一新兴领域的未来方向。
更新时间: 2024-04-17 08:36:26
领域: cs.IR,cs.AI,cs.CL
Hacking Task Confounder in Meta-Learning
Meta-learning enables rapid generalization to new tasks by learning knowledge from various tasks. It is intuitively assumed that as the training progresses, a model will acquire richer knowledge, leading to better generalization performance. However, our experiments reveal an unexpected result: there is negative knowledge transfer between tasks, affecting generalization performance. To explain this phenomenon, we construct Structural Causal Models (SCMs) for causal analysis. Our investigation uncovers the presence of spurious correlations between task-specific causal factors and labels in meta-learning. Furthermore, the confounding factors differ across different batches. We refer to these confounding factors as ``Task Confounders". Based on these findings, we propose a plug-and-play Meta-learning Causal Representation Learner (MetaCRL) to eliminate task confounders. It encodes decoupled generating factors from multiple tasks and utilizes an invariant-based bi-level optimization mechanism to ensure their causality for meta-learning. Extensive experiments on various benchmark datasets demonstrate that our work achieves state-of-the-art (SOTA) performance.
Updated: 2024-04-17 08:36:14
标题: 在元学习中操纵任务混淆因素
摘要: 元学习通过从各种任务中学习知识,实现对新任务的快速泛化。直觉上人们认为,随着训练的进行,模型将获得更丰富的知识,从而带来更好的泛化性能。然而,我们的实验揭示了一个意外的结果:任务之间存在负面的知识迁移,影响泛化性能。为了解释这一现象,我们构建了结构因果模型(SCMs)进行因果分析。我们的调查揭示了元学习中任务特定的因果因素与标签之间存在虚假相关性。此外,混淆因素在不同批次之间也不同。我们将这些混淆因素称为"任务混淆因素"。基于这些发现,我们提出了一个即插即用的元学习因果表示学习器(MetaCRL)来消除任务混淆因素。它对来自多个任务的解耦生成因素进行编码,并利用基于不变性的双层优化机制确保它们对元学习的因果性。在各种基准数据集上的大量实验表明,我们的工作实现了最先进的性能。
更新时间: 2024-04-17 08:36:14
领域: cs.LG,stat.ML
LongVQ: Long Sequence Modeling with Vector Quantization on Structured Memory
Transformer models have been successful in various sequence processing tasks, but the self-attention mechanism's computational cost limits its practicality for long sequences. Although there are existing attention variants that improve computational efficiency, they have a limited ability to abstract global information effectively based on their hand-crafted mixing strategies. On the other hand, state-space models (SSMs) are tailored for long sequences but cannot capture complicated local information. Therefore, the combination of them as a unified token mixer is a trend in recent long-sequence models. However, the linearized attention degrades performance significantly even when equipped with SSMs. To address the issue, we propose a new method called LongVQ. LongVQ uses the vector quantization (VQ) technique to compress the global abstraction as a length-fixed codebook, enabling the linear-time computation of the attention matrix. This technique effectively maintains dynamic global and local patterns, which helps to complement the lack of long-range dependency issues. Our experiments on the Long Range Arena benchmark, autoregressive language modeling, and image and speech classification demonstrate the effectiveness of LongVQ. Our model achieves significant improvements over other sequence models, including variants of Transformers, Convolutions, and recent State Space Models.
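A minimal sketch of the central mechanism as we understand it: queries attend over a fixed-size learned codebook rather than over all tokens, so the cost is linear in sequence length. The VQ training procedure and the local-pattern branch are omitted, and all sizes are illustrative.

```python
import torch
import torch.nn as nn

class CodebookAttention(nn.Module):
    """Queries attend over a length-fixed codebook of K entries that
    summarizes global context: O(N*K) rather than O(N^2). The codebook
    here is a plain learned matrix, standing in for the paper's VQ one."""
    def __init__(self, dim=64, codebook_size=32):
        super().__init__()
        self.codebook = nn.Parameter(torch.randn(codebook_size, dim))
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, 2 * dim)

    def forward(self, x):                        # x: (batch, N, dim)
        q = self.q(x)                            # (B, N, D)
        k, v = self.kv(self.codebook).chunk(2, dim=-1)   # (K, D) each
        attn = torch.softmax(q @ k.T / k.shape[-1] ** 0.5, dim=-1)  # (B, N, K)
        return attn @ v                          # (B, N, D): linear in N

layer = CodebookAttention()
y = layer(torch.randn(2, 4096, 64))              # long sequence, linear-time mixing
```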
Updated: 2024-04-17 08:26:34
标题: LongVQ:基于结构化内存的向量量化进行长序列建模
摘要: Transformer模型在各种序列处理任务中取得了成功,但自注意力机制的计算成本限制了其在长序列上的实用性。虽然现有的注意力变体改进了计算效率,但由于其手工设计的混合策略,它们有效抽象全局信息的能力有限。另一方面,状态空间模型(SSMs)专为长序列而设计,但无法捕捉复杂的局部信息。因此,将二者结合为统一的令牌混合器是近期长序列模型的趋势。然而,即使配备了SSMs,线性化注意力的性能也会显著下降。为了解决这个问题,我们提出了一种名为LongVQ的新方法。LongVQ利用向量量化(VQ)技术将全局抽象压缩为一个长度固定的码本,从而实现注意力矩阵的线性时间计算。这种技术有效地保持了动态的全局和局部模式,有助于弥补长程依赖问题的不足。我们在Long Range Arena基准、自回归语言建模以及图像和语音分类上的实验表明了LongVQ的有效性。我们的模型相对于其他序列模型(包括Transformer、卷积和近期状态空间模型的变体)取得了显著的改进。
更新时间: 2024-04-17 08:26:34
领域: cs.LG
R2 Indicator and Deep Reinforcement Learning Enhanced Adaptive Multi-Objective Evolutionary Algorithm
Choosing an appropriate optimization algorithm is essential to achieving success in optimization challenges. Here we present a new evolutionary algorithm structure that utilizes a reinforcement learning-based agent to address this challenge. The agent employs a double deep q-network to choose a specific evolutionary operator based on feedback it receives from the environment during optimization. The algorithm's structure contains five single-objective evolutionary algorithm operators. This single-objective structure is transformed into a multi-objective one using the R2 indicator. This indicator serves two purposes within our structure: first, it renders the algorithm multi-objective, and second, it provides a means to evaluate each algorithm's performance in each generation to facilitate constructing the reinforcement learning-based reward function. The proposed R2-reinforcement learning multi-objective evolutionary algorithm (R2-RLMOEA) is compared with six other multi-objective algorithms that are based on R2 indicators. These six algorithms include the operators used in R2-RLMOEA as well as an R2 indicator-based algorithm that randomly selects operators during optimization. We benchmark performance using the CEC09 functions, with performance measured by inverted generational distance and spacing. The R2-RLMOEA algorithm outperforms all other algorithms with strong statistical significance (p<0.001) on the average spacing metric across all ten benchmarks.
Updated: 2024-04-17 08:23:23
标题: R2指标和深度强化学习增强的自适应多目标进化算法
摘要: 选择合适的优化算法对于在优化挑战中取得成功至关重要。在这里,我们提出了一种新的进化算法结构,它利用基于强化学习的智能体来应对这一挑战。该智能体采用双深度Q网络,根据其在优化过程中从环境接收到的反馈来选择特定的进化算子。该算法结构包含五个单目标进化算法算子。这个单目标结构通过R2指标转化为多目标结构。该指标在我们的结构中有两个用途:首先,它使算法成为多目标算法;其次,它提供了一种在每一代评估各算法性能的方法,以便构建基于强化学习的奖励函数。所提出的R2强化学习多目标进化算法(R2-RLMOEA)与其他六种基于R2指标的多目标算法进行了比较。这六种算法包括R2-RLMOEA中使用的算子,以及一种在优化过程中随机选择算子的基于R2指标的算法。我们使用CEC09函数进行基准测试,性能通过反转世代距离(IGD)和间距(spacing)来衡量。在全部十个基准上的平均间距指标比较中,R2-RLMOEA以很强的统计显著性(p<0.001)优于所有其他算法。
更新时间: 2024-04-17 08:23:23
领域: cs.NE,cs.AI
Pre-processing matters: A segment search method for WSI classification
Pre-processing for whole slide images can affect classification performance in both the training and inference stages. Our study analyzes the impact of pre-processing parameters on inference and training across single- and multiple-domain datasets. However, searching for an optimal parameter set is time-consuming. To overcome this, we propose a novel Similarity-based Simulated Annealing approach for fast parameter tuning to enhance inference performance on single-domain data. Our method demonstrates significant performance improvements, raising accuracy from 0.512 to 0.847 in a single domain. We further extend our insight to training performance on multi-domain data by employing Bayesian optimization to search for optimal pre-processing parameters, resulting in a high AUC of 0.967. We highlight that better pre-processing for WSIs can contribute to further accuracy improvement in the histology area.
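For context, a generic simulated-annealing loop over a hypothetical pre-processing parameter set looks like the sketch below; the similarity-based variant from the paper, the actual WSI parameters, and the scoring function are all replaced by stand-ins.

```python
import math
import random

def simulated_annealing(score_fn, neighbors_fn, init, steps=200, t0=1.0):
    """Generic simulated annealing over a parameter configuration. In the
    intended use, score_fn would evaluate downstream accuracy for a
    candidate pre-processing setting."""
    cur, cur_score = init, score_fn(init)
    best, best_score = cur, cur_score
    for step in range(steps):
        temp = t0 * (1 - step / steps)          # linear cooling schedule
        cand = neighbors_fn(cur)
        cand_score = score_fn(cand)
        # Accept improvements always; worse moves with Boltzmann probability.
        if cand_score > cur_score or random.random() < math.exp(
                (cand_score - cur_score) / max(temp, 1e-9)):
            cur, cur_score = cand, cand_score
            if cur_score > best_score:
                best, best_score = cur, cur_score
    return best, best_score

# Hypothetical WSI pre-processing knobs: tile size and a saturation threshold.
def neighbors(p):
    return {"tile": max(128, p["tile"] + random.choice([-64, 0, 64])),
            "sat_thresh": min(1.0, max(0.0, p["sat_thresh"]
                                       + random.uniform(-0.05, 0.05)))}

def fake_score(p):   # stand-in for "validation accuracy of the classifier"
    return -abs(p["tile"] - 256) / 256 - abs(p["sat_thresh"] - 0.2)

best, s = simulated_annealing(fake_score, neighbors,
                              {"tile": 512, "sat_thresh": 0.5})
```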
Updated: 2024-04-17 08:21:02
标题: 预处理很重要:一种用于WSI分类的段搜索方法
摘要: 全切片图像的预处理会影响训练和推断两个阶段的分类性能。我们的研究分析了预处理参数对单领域和多领域数据集上推断和训练的影响。然而,寻找最佳参数集非常耗时。为了克服这一问题,我们提出了一种基于相似性的模拟退火方法用于快速参数调整,以增强单领域数据上的推断性能。我们的方法在准确率方面表现出显著改进,在单一领域中将准确率从0.512提高到0.847。我们进一步采用贝叶斯优化在多领域数据中搜索最佳预处理参数,将这一见解扩展到训练性能,实现了高达0.967的AUC。我们强调,更好的WSI预处理有助于在组织学领域进一步提高准确率。
更新时间: 2024-04-17 08:21:02
领域: cs.CV,cs.LG
Low-Cost Language Models: Survey and Performance Evaluation on Python Code Generation
Large Language Models (LLMs) have become the go-to solution for many Natural Language Processing (NLP) tasks due to their ability to tackle various problems and produce high-quality results. Specifically, they are increasingly used to automatically generate code, easing the burden on developers by handling repetitive tasks. However, this improvement in quality has led to high computational and memory demands, making LLMs inaccessible to users with limited resources. In this paper, we focus on Central Processing Unit (CPU)-compatible models and conduct a thorough semi-manual evaluation of their strengths and weaknesses in generating Python code. We enhance their performance by introducing a Chain-of-Thought prompt that guides the model in problem-solving. Additionally, we propose a dataset of 60 programming problems with varying difficulty levels for evaluation purposes. Our assessment also includes testing these models on two state-of-the-art datasets: HumanEval and EvalPlus. We commit to sharing our dataset and experimental results publicly to ensure transparency.
Updated: 2024-04-17 08:16:48
标题: 低成本语言模型:对Python代码生成的调查和性能评估
摘要: 大型语言模型(LLMs)已成为许多自然语言处理(NLP)任务的首选解决方案,因为它们能够解决各种问题并产生高质量的结果。具体而言,它们越来越多地被用于自动生成代码,通过处理重复性任务来减轻开发人员的负担。然而,质量的提高带来了高计算和内存需求,使资源有限的用户无法使用LLMs。在本文中,我们专注于与中央处理单元(CPU)兼容的模型,并对它们生成Python代码的优势和劣势进行了彻底的半手动评估。我们通过引入一个引导模型解决问题的"思维链"提示来提高它们的性能。此外,我们提出了一个包含60个不同难度级别编程问题的数据集,用于评估。我们的评估还包括在两个最先进的数据集HumanEval和EvalPlus上测试这些模型。我们承诺公开分享我们的数据集和实验结果,以确保透明度。
更新时间: 2024-04-17 08:16:48
领域: cs.AI
Explainable Machine Learning System for Predicting Chronic Kidney Disease in High-Risk Cardiovascular Patients
As the global population ages, the incidence of Chronic Kidney Disease (CKD) is rising. CKD often remains asymptomatic until advanced stages, which significantly burdens both the healthcare system and patient quality of life. This research developed an explainable machine learning system for predicting CKD in patients with cardiovascular risks, utilizing medical history and laboratory data. The Random Forest model achieved the highest sensitivity of 88.2%. The study introduces a comprehensive explainability framework that extends beyond traditional feature importance methods, incorporating global and local interpretations, bias inspection, biomedical relevance, and safety assessments. Key predictive features identified in global interpretation were the use of diabetic and ACEI/ARB medications, and initial eGFR values. Local interpretation provided model insights through counterfactual explanations, which aligned with other system parts. After conducting a bias inspection, it was found that the initial eGFR values and CKD predictions exhibited some bias, but no significant gender bias was identified. The model's logic, extracted by scoped rules, was confirmed to align with existing medical literature. The safety assessment tested potentially dangerous cases and confirmed that the model behaved safely. This system enhances the explainability, reliability, and accountability of the model, promoting its potential integration into healthcare settings and compliance with upcoming regulatory standards, and showing promise for broader applications in healthcare machine learning.
Updated: 2024-04-17 07:59:33
标题: 可解释的机器学习系统用于预测高危心血管患者的慢性肾脏疾病
摘要: 随着全球人口老龄化,慢性肾脏疾病(CKD)的发病率正在上升。CKD在晚期阶段通常保持无症状,这显著加重了医疗系统和患者生活质量的负担。本研究开发了一个可解释的机器学习系统,用于预测具有心血管风险的患者中的CKD,利用医疗史和实验室数据。随机森林模型实现了最高灵敏度88.2%。该研究介绍了一个全面的解释性框架,超越传统的特征重要性方法,包括全局和局部解释、偏见检查、生物医学相关性和安全评估。在全局解释中识别出的关键预测特征是使用糖尿病和ACEI/ARB药物,以及初始eGFR值。局部解释通过反事实解释提供了模型见解,与其他系统部分一致。进行偏见检查后发现,初始eGFR值和CKD预测表现出一些偏见,但未发现明显的性别偏见。通过范围规则提取的模型逻辑已确认与现有医学文献一致。安全评估测试了潜在危险案例,并确认模型表现安全。该系统增强了模型的可解释性、可靠性和责任性,促进其潜在整合到医疗环境中,并符合即将出台的监管标准,展示了在医疗机器学习中更广泛应用的潜力。
更新时间: 2024-04-17 07:59:33
领域: cs.AI,cs.LG,J.3; I.2.6
Comprehensive Taxonomies of Nature- and Bio-inspired Optimization: Inspiration versus Algorithmic Behavior, Critical Analysis and Recommendations (from 2020 to 2024)
In recent years, bio-inspired optimization methods, which mimic biological processes to solve complex problems, have gained popularity, and the proliferation of proposals for nature- and bio-inspired algorithms, applications, and guidelines proves the growing interest in this field. However, the exponential rise in the number of bio-inspired algorithms poses a challenge to the future trajectory of this research domain. Across the five versions of this document, the number of approaches has grown incessantly, with the presentation of a new biological metaphor often taking precedence over real problem-solving. This document presents two comprehensive taxonomies: one based on principles of biological similarity, and the other based on operational aspects associated with the iteration of population models that initially have a biological inspiration. These taxonomies enable researchers to categorize existing algorithmic developments into well-defined classes, considering two criteria: the source of inspiration, and the behavior exhibited by each algorithm. Using these taxonomies, we classify 518 algorithms based on nature-inspired and bio-inspired principles. Each algorithm within these categories is thoroughly examined, allowing for a critical synthesis of design trends and similarities, and identifying the most analogous classical algorithm for each proposal. From our analysis, we conclude that a poor relationship is often found between the natural inspiration of an algorithm and its behavior. Furthermore, similarities in terms of behavior between different algorithms are greater than what is claimed in their public disclosure: specifically, we show that more than one-fourth of the reviewed solvers are versions of classical algorithms. The conclusions from the analysis of the algorithms lead to several learned lessons.
Updated: 2024-04-17 07:59:26
标题: 自然与生物启发优化的全面分类法:灵感与算法行为、批判性分析与建议(2020-2024)
摘要: 近年来,模仿生物过程来解决复杂问题的生物启发优化方法越来越受欢迎,自然与生物启发算法、应用和指南等方案的激增证明了这一领域日益增长的兴趣。然而,生物启发算法数量的指数式增长对这一研究领域未来的发展轨迹构成了挑战。在本文档的五个版本中,方法的数量不断增长,提出新的生物学隐喻往往优先于真正的问题解决。本文档提出了两个全面的分类法:一个基于生物相似性原则,另一个基于最初具有生物启发的种群模型迭代的操作方面。这些分类法使研究人员能够按照两个标准(灵感来源以及每个算法表现出的行为)将现有的算法发展归入定义明确的类别。使用这些分类法,我们对518种基于自然启发和生物启发原则的算法进行了分类。我们对这些类别中的每种算法进行了彻底审查,从而对设计趋势和相似之处进行批判性综合,并为每个提案确定最相似的经典算法。从分析中我们得出结论:算法的自然灵感与其行为之间往往关系不大。此外,不同算法之间行为上的相似性大于其公开论述中所声称的:具体而言,我们表明超过四分之一的被审查求解器是经典算法的变体。对这些算法的分析结论带来了若干经验教训。
更新时间: 2024-04-17 07:59:26
领域: cs.AI,I.2.8
Finding Decision Tree Splits in Streaming and Massively Parallel Models
In this work, we provide data stream algorithms that compute optimal splits in decision tree learning. In particular, given a data stream of observations $x_i$ and their labels $y_i$, the goal is to find the optimal split point $j$ that divides the data into two sets such that the mean squared error (for regression) or misclassification rate (for classification) is minimized. We provide various fast streaming algorithms that use sublinear space and a small number of passes for these problems. These algorithms can also be extended to the massively parallel computation model. Our work, while not directly comparable, complements the seminal work of Domingos and Hulten (KDD 2000).
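The exact quantity these streaming algorithms approximate can be computed offline in O(n log n) with a sort and prefix sums, as in this reference implementation for the regression case:

```python
import numpy as np

def best_split_mse(x, y):
    """Exact optimal regression split: sort once, then score every
    threshold in O(n) via prefix sums of y and y^2, returning the split
    minimizing total squared error (SSE = sum(y^2) - (sum(y))^2 / n)."""
    order = np.argsort(x)
    xs, ys = np.asarray(x)[order], np.asarray(y)[order]
    n = len(ys)
    pref = np.cumsum(ys)                  # prefix sums of labels
    pref2 = np.cumsum(ys ** 2)            # prefix sums of squared labels
    best = (np.inf, None)
    for j in range(1, n):                 # left set = first j sorted points
        if xs[j] == xs[j - 1]:
            continue                      # no threshold separates equal x values
        sse_left = pref2[j - 1] - pref[j - 1] ** 2 / j
        sr, sr2 = pref[-1] - pref[j - 1], pref2[-1] - pref2[j - 1]
        total = sse_left + (sr2 - sr ** 2 / (n - j))
        if total < best[0]:
            best = (total, (xs[j - 1] + xs[j]) / 2)
    return best                           # (min total SSE, threshold)

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 1000)
y = np.where(x < 0.4, 1.0, 3.0) + 0.1 * rng.standard_normal(1000)
print(best_split_mse(x, y))               # threshold near 0.4
```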
Updated: 2024-04-17 07:57:44
标题: 在流式和大规模并行模型中寻找决策树分裂
摘要: 在这项工作中,我们提供了一些数据流算法,用于计算决策树学习中的最佳分裂。具体来说,给定一组观测数据流$x_i$及其标签$y_i$,目标是找到将数据分成两组的最佳分裂点$j$,使得均方误差(回归问题)或误分类率(分类问题)最小化。我们提供了各种快速数据流算法,这些算法使用亚线性空间和少量遍历次数来解决这些问题。这些算法还可以扩展到大规模并行计算模型中。我们的工作虽然不能直接比较,但与Domingos和Hulten(KDD 2000)的开创性工作相辅相成。
更新时间: 2024-04-17 07:57:44
领域: cs.DS,cs.AI,cs.LG
Locality Sensitive Sparse Encoding for Learning World Models Online
Acquiring an accurate world model online for model-based reinforcement learning (MBRL) is challenging due to data nonstationarity, which typically causes catastrophic forgetting for neural networks (NNs). From the online learning perspective, a Follow-The-Leader (FTL) world model is desirable, which optimally fits all previous experiences at each round. Unfortunately, NN-based models need re-training on all accumulated data at every interaction step to achieve FTL, which is computationally expensive for lifelong agents. In this paper, we revisit models that can achieve FTL with incremental updates. Specifically, our world model is a linear regression model supported by nonlinear random features. The linear part ensures efficient FTL update while the nonlinear random feature empowers the fitting of complex environments. To best trade off model capacity and computation efficiency, we introduce a locality sensitive sparse encoding, which allows us to conduct efficient sparse updates even with very high dimensional nonlinear features. We validate the representation power of our encoding and verify that it allows efficient online learning under data covariate shift. We also show, in the Dyna MBRL setting, that our world models learned online using a single pass of trajectory data either surpass or match the performance of deep world models trained with replay and other continual learning methods.
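A small sketch of the FTL flavor of model: a linear regressor over fixed nonlinear random features, updated incrementally so each step yields the exact fit to all data seen so far. Dense random Fourier features stand in here for the paper's locality sensitive sparse encoding.

```python
import numpy as np

class IncrementalRidge:
    """Follow-The-Leader linear model over fixed nonlinear random features,
    updated in O(d^2) per observation via the Sherman-Morrison identity,
    so no replay over past data is needed."""
    def __init__(self, in_dim, feat_dim=256, lam=1.0, rng=None):
        rng = rng or np.random.default_rng(0)
        self.W = rng.standard_normal((feat_dim, in_dim))    # random projection
        self.b = rng.uniform(0, 2 * np.pi, feat_dim)
        self.A_inv = np.eye(feat_dim) / lam                 # (X^T X + lam I)^{-1}
        self.theta = np.zeros(feat_dim)
        self.Xty = np.zeros(feat_dim)

    def features(self, x):
        return np.cos(self.W @ x + self.b)                  # random Fourier features

    def update(self, x, y):
        phi = self.features(x)
        Av = self.A_inv @ phi
        self.A_inv -= np.outer(Av, Av) / (1.0 + phi @ Av)   # Sherman-Morrison
        self.Xty += phi * y
        self.theta = self.A_inv @ self.Xty                  # exact FTL/ridge solution

    def predict(self, x):
        return self.features(x) @ self.theta

model = IncrementalRidge(in_dim=3)
rng = np.random.default_rng(1)
for _ in range(500):                       # nonstationary stream, no replay buffer
    x = rng.standard_normal(3)
    model.update(x, np.sin(x[0]) + 0.1 * rng.standard_normal())
```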
Updated: 2024-04-17 07:54:45
标题: 局部敏感稀疏编码用于在线学习世界模型
摘要: 在线为基于模型的强化学习(MBRL)获取准确的世界模型具有挑战性,这是由于数据的非稳态性,这通常会导致神经网络(NN)的灾难性遗忘。从在线学习的角度来看,一种跟随领导者(FTL)世界模型是理想的,它在每一轮中都最佳地适应所有先前的经验。不幸的是,基于NN的模型需要在每次交互步骤上重新训练所有累积的数据以实现FTL,这对于终身代理来说在计算上是昂贵的。在本文中,我们重新审视了可以通过增量更新实现FTL的模型。具体而言,我们的世界模型是一个由非线性随机特征支持的线性回归模型。线性部分确保了有效的FTL更新,而非线性随机特征则增强了对复杂环境的拟合能力。为了在模型容量和计算效率之间取得最佳平衡,我们引入了一种局部敏感的稀疏编码,这使我们能够即使使用非常高维度的非线性特征也能进行高效的稀疏更新。我们验证了我们编码的表征能力,并验证了它允许在数据协变量转移下进行高效的在线学习。在Dyna MBRL设置中,我们还展示了,使用单次轨迹数据在线学习的世界模型要么超过要么与使用重播和其他持续学习方法训练的深度世界模型的性能相匹配。
更新时间: 2024-04-17 07:54:45
领域: cs.LG,cs.AI
Weight Copy and Low-Rank Adaptation for Few-Shot Distillation of Vision Transformers
Few-shot knowledge distillation recently emerged as a viable approach to harness the knowledge of large-scale pre-trained models, using limited data and computational resources. In this paper, we propose a novel few-shot feature distillation approach for vision transformers. Our approach is based on two key steps. Leveraging the fact that vision transformers have a consistent depth-wise structure, we first copy the weights from intermittent layers of existing pre-trained vision transformers (teachers) into shallower architectures (students), where the intermittence factor controls the complexity of the student transformer with respect to its teacher. Next, we employ an enhanced version of Low-Rank Adaptation (LoRA) to distill knowledge into the student in a few-shot scenario, aiming to recover the information processing carried out by the skipped teacher layers. We present comprehensive experiments with supervised and self-supervised transformers as teachers, on five data sets from various domains, including natural, medical and satellite images. The empirical results confirm the superiority of our approach over competitive baselines. Moreover, the ablation results demonstrate the usefulness of each component of the proposed pipeline.
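The two key steps can be sketched briefly: copy every k-th teacher block into the student, then adapt with low-rank updates. The block structure, rank, and scaling below are illustrative assumptions, and the distillation loss itself is omitted.

```python
import copy
import torch
import torch.nn as nn

def copy_intermittent_layers(teacher_blocks, stride=2):
    """Initialize a shallower student by copying every `stride`-th teacher
    block (the 'intermittence factor'); deep copies keep the teacher intact."""
    return nn.ModuleList(copy.deepcopy(b) for b in teacher_blocks[::stride])

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update W + (alpha/r) B A,
    the standard LoRA parametrization; rank/alpha values are assumptions."""
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False             # only the low-rank path trains
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Toy "teacher": 12 blocks; student copies every 2nd block, then LoRA-adapts.
teacher = nn.ModuleList(nn.Sequential(nn.Linear(64, 64), nn.GELU())
                        for _ in range(12))
student = copy_intermittent_layers(teacher, stride=2)     # 6 blocks
student[0][0] = LoRALinear(student[0][0])                 # wrap a linear with LoRA
```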
Updated: 2024-04-17 07:46:28
标题: 用于视觉Transformer少样本蒸馏的权重复制与低秩适应
摘要: 最近,少样本知识蒸馏作为一种可行的方法,利用有限的数据和计算资源来利用大规模预训练模型的知识。在本文中,我们提出了一种新颖的少样本特征蒸馏方法,用于视觉Transformer。我们的方法基于两个关键步骤。首先利用视觉Transformer具有一致的深度结构这一事实,我们将现有预训练的视觉Transformer(教师)的间歇层的权重复制到更浅的架构(学生)中,其中间歇因子控制学生Transformer相对于其教师的复杂性。接下来,我们使用增强版本的低秩适应(LoRA)来在少样本场景中将知识蒸馏到学生中,旨在恢复被跳过的教师层进行的信息处理。我们在包括自然、医学和卫星图像在内的各个领域的五个数据集上进行了全面实验,使用监督和自监督Transformer作为教师。实证结果证实了我们的方法优于竞争基线的优越性。此外,消融实验结果证明了建议管道的每个组件的有用性。
更新时间: 2024-04-17 07:46:28
领域: cs.CV,cs.AI,cs.LG
Self-adaptive PSRO: Towards an Automatic Population-based Game Solver
Policy-Space Response Oracles (PSRO) as a general algorithmic framework has achieved state-of-the-art performance in learning equilibrium policies of two-player zero-sum games. However, the hand-crafted hyperparameter value selection in most of the existing works requires extensive domain knowledge, forming the main barrier to applying PSRO to different games. In this work, we make the first attempt to investigate the possibility of self-adaptively determining the optimal hyperparameter values in the PSRO framework. Our contributions are three-fold: (1) Using several hyperparameters, we propose a parametric PSRO that unifies the gradient descent ascent (GDA) and different PSRO variants. (2) We propose the self-adaptive PSRO (SPSRO) by casting the hyperparameter value selection of the parametric PSRO as a hyperparameter optimization (HPO) problem where our objective is to learn an HPO policy that can self-adaptively determine the optimal hyperparameter values during the running of the parametric PSRO. (3) To overcome the poor performance of online HPO methods, we propose a novel offline HPO approach to optimize the HPO policy based on the Transformer architecture. Experiments on various two-player zero-sum games demonstrate the superiority of SPSRO over different baselines.
Updated: 2024-04-17 07:40:57
标题: 自适应PSRO:迈向自动的基于种群的博弈求解器
摘要: 策略空间响应预言机(PSRO)作为一种通用算法框架,在学习双人零和博弈的均衡策略方面取得了最先进的性能。然而,大多数现有工作中手工选择超参数值需要大量领域知识,这构成了将PSRO应用于不同博弈的主要障碍。在这项工作中,我们首次尝试研究在PSRO框架中自适应确定最优超参数值的可能性。我们的贡献有三点:(1)我们提出了一个使用若干超参数的参数化PSRO,统一了梯度下降上升(GDA)和不同的PSRO变体。(2)我们提出了自适应PSRO(SPSRO),将参数化PSRO的超参数值选择构建为一个超参数优化(HPO)问题,目标是学习一个能够在参数化PSRO运行期间自适应确定最优超参数值的HPO策略。(3)为了克服在线HPO方法性能不佳的问题,我们提出了一种基于Transformer架构优化HPO策略的新颖离线HPO方法。在各种双人零和博弈上的实验表明了SPSRO相对于不同基线的优越性。
更新时间: 2024-04-17 07:40:57
Domains: cs.AI,cs.GT,cs.MA
Manifold Gaussian Variational Bayes on the Precision Matrix
We propose an optimization algorithm for Variational Inference (VI) in complex models. Our approach relies on natural gradient updates where the variational space is a Riemann manifold. We develop an efficient algorithm for Gaussian Variational Inference whose updates satisfy the positive definite constraint on the variational covariance matrix. Our Manifold Gaussian Variational Bayes on the Precision matrix (MGVBP) solution provides simple update rules, is straightforward to implement, and the use of the precision matrix parametrization has a significant computational advantage. Due to its black-box nature, MGVBP stands as a ready-to-use solution for VI in complex models. Over five datasets, we empirically validate our feasible approach on different statistical and econometric models, discussing its performance with respect to baseline methods.
Updated: 2024-04-17 07:38:55
Domains: stat.ML,cs.LG
Adaptive Lasso, Transfer Lasso, and Beyond: An Asymptotic Perspective
This paper presents a comprehensive exploration of the theoretical properties inherent in the Adaptive Lasso and the Transfer Lasso. The Adaptive Lasso, a well-established method, employs regularization divided by initial estimators and is characterized by asymptotic normality and variable selection consistency. In contrast, the recently proposed Transfer Lasso employs regularization subtracted by initial estimators with the demonstrated capacity to curtail non-asymptotic estimation errors. A pivotal question thus emerges: Given the distinct ways the Adaptive Lasso and the Transfer Lasso employ initial estimators, what benefits or drawbacks does this disparity confer upon each method? This paper conducts a theoretical examination of the asymptotic properties of the Transfer Lasso, thereby elucidating its differentiation from the Adaptive Lasso. Informed by the findings of this analysis, we introduce a novel method, one that amalgamates the strengths and compensates for the weaknesses of both methods. The paper concludes with validations of our theory and comparisons of the methods via simulation experiments.
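To see the contrast at a glance, the two estimators can be written as follows (standard formulations with squared loss; $\tilde{\beta}$ is the initial estimator and $\gamma, \lambda, \lambda_1, \lambda_2 > 0$ are tuning parameters — the paper's own notation may differ):

```latex
% Adaptive Lasso: penalty divided by the initial estimator
\hat{\beta}_{\mathrm{AdaLasso}} = \arg\min_{\beta}\;
  \|y - X\beta\|_2^2 + \lambda \sum_{j} \frac{|\beta_j|}{|\tilde{\beta}_j|^{\gamma}}

% Transfer Lasso: penalty on the deviation from the initial estimator
\hat{\beta}_{\mathrm{TransLasso}} = \arg\min_{\beta}\;
  \|y - X\beta\|_2^2 + \lambda_1 \|\beta\|_1 + \lambda_2 \|\beta - \tilde{\beta}\|_1
```

The division makes the adaptive penalty nearly vanish on coordinates where the initial estimate is large, driving oracle-type variable selection, whereas the subtraction pulls the new estimate toward the initial one, which is what curbs non-asymptotic estimation error.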
Updated: 2024-04-17 07:31:57
Domains: stat.ML,cs.LG,math.ST,stat.ME,stat.TH
On Diversified Preferences of Large Language Model Alignment
Aligning large language models (LLMs) with human preferences has been recognized as the key to improving LLMs' interaction quality. However, in this pluralistic world, human preferences can be diversified due to annotators' different tastes, which hinders the effectiveness of LLM alignment methods. This paper presents the first quantitative analysis of commonly used human feedback datasets to investigate the impact of diversified preferences on reward modeling. Our analysis reveals a correlation between the calibration performance of reward models (RMs) and the alignment performance of LLMs. We find that diversified preference data negatively affect the calibration performance of RMs on human-shared preferences, such as Harmless&Helpful, thereby impairing the alignment performance of LLMs. To address the ineffectiveness, we propose a novel Multi-Objective Reward learning method (MORE) to enhance the calibration performance of RMs on shared preferences. We validate our findings by experiments on three models and five human preference datasets. Our method significantly improves the prediction calibration of RMs, leading to better alignment of the Alpaca-7B model with Harmless&Helpful preferences. Furthermore, the connection between reward calibration and preference alignment performance suggests that calibration error can be adopted as a key metric for evaluating RMs. The open-source code and data are available at https://github.com/dunzeng/MORE.
Updated: 2024-04-17 07:28:00
Domains: cs.AI
Learning epidemic trajectories through Kernel Operator Learning: from modelling to optimal control
Once an infectious pathogen starts spreading in a susceptible population, mathematical models can provide policy makers with reliable forecasts and scenario analyses, which can be concretely implemented or solely consulted. In these complex epidemiological scenarios, machine learning architectures can play an important role, since they directly reconstruct data-driven models circumventing the specific modelling choices and the parameter calibration, typical of classical compartmental models. In this work, we discuss the efficacy of Kernel Operator Learning (KOL) to reconstruct population dynamics during epidemic outbreaks, where the transmission rate is ruled by an input strategy. In particular, we introduce two surrogate models, named KOL-m and KOL-$\partial$, which reconstruct in two different ways the evolution of the epidemics. Moreover, we evaluate the generalization performances of the two approaches with different kernels, including the Neural Tangent Kernels, and compare them with a classical neural network model learning method. Employing synthetic but semi-realistic data, we show how the two introduced approaches are suitable for realizing fast and robust forecasts and scenario analyses, and how these approaches are competitive for determining optimal intervention strategies with respect to specific performance measures.
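To give a feel for the surrogate idea, a kernel operator surrogate can be fit with off-the-shelf kernel ridge regression, mapping a time-discretized transmission-rate strategy to the resulting epidemic trajectory. The sketch below makes its own simple choices (including a stand-in for the simulator) and is neither KOL-m nor KOL-$\partial$.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

# Hypothetical training data: each row of `strategies` is a discretized
# transmission-rate input beta(t); each row of `trajectories` would be the
# corresponding infected-compartment curve I(t) from an epidemic simulator.
rng = np.random.default_rng(0)
strategies = rng.uniform(0.1, 0.5, size=(200, 50))
trajectories = np.cumsum(strategies, axis=1)   # stand-in for simulator output

surrogate = KernelRidge(kernel="rbf", alpha=1e-3, gamma=0.1)
surrogate.fit(strategies, trajectories)        # one multi-output regression

new_strategy = rng.uniform(0.1, 0.5, size=(1, 50))
predicted_curve = surrogate.predict(new_strategy)  # fast scenario analysis
```

Once fitted, the surrogate replaces the expensive forward model inside forecasting loops or an optimal-control search over candidate strategies.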
Updated: 2024-04-17 07:21:17
Domains: math.NA,cs.LG,cs.NA,math.OC,q-bio.PE
What's under the hood: Investigating Automatic Metrics on Meeting Summarization
Meeting summarization has become a critical task considering the increase in online interactions. While new techniques are introduced regularly, their evaluation uses metrics not designed to capture meeting-specific errors, undermining effective evaluation. This paper investigates what the frequently used automatic metrics capture and which errors they mask by correlating automatic metric scores with human evaluations across a broad error taxonomy. We commence with a comprehensive literature review on English meeting summarization to define key challenges like speaker dynamics and contextual turn-taking and error types such as missing information and linguistic inaccuracy, concepts previously loosely defined in the field. We examine the relationship between characteristic challenges and errors by using annotated transcripts and summaries from Transformer-based sequence-to-sequence and autoregressive models from the general summary QMSum dataset. Through experimental validation, we find that different model architectures respond variably to challenges in meeting transcripts, resulting in different pronounced links between challenges and errors. The metrics currently used by default struggle to capture observable errors, showing weak to mid correlations, while a third of the correlations show trends of error masking. Only a subset reacts accurately to specific errors, while most correlations show either unresponsiveness or failure to reflect the error's impact on summary quality.
Updated: 2024-04-17 07:15:07
Domains: cs.CL,cs.AI
Small Language Models are Good Too: An Empirical Study of Zero-Shot Classification
This study is part of the debate on the efficiency of large versus small language models for text classification by prompting. We assess the performance of small language models in zero-shot text classification, challenging the prevailing dominance of large models. Across 15 datasets, our investigation benchmarks language models from 77M to 40B parameters using different architectures and scoring functions. Our findings reveal that small models can effectively classify texts, getting on par with or surpassing their larger counterparts. We developed and shared a comprehensive open-source repository that encapsulates our methodologies. This research underscores the notion that bigger isn't always better, suggesting that resource-efficient small models may offer viable solutions for specific data classification challenges.
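As an illustration of what "scoring functions" means here, a minimal zero-shot classifier scores each candidate label by the log-probability the causal LM assigns to its verbalizer tokens; the model name and prompt template below are assumptions for the sketch, not the paper's exact setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")        # any causal LM; gpt2 is a placeholder
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def label_logprob(text: str, label: str) -> float:
    """Score = sum of log-probs of the label tokens given the prompt."""
    prompt = f"Text: {text}\nLabel:"               # hypothetical template
    prompt_len = tok(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tok(prompt + " " + label, return_tensors="pt").input_ids
    with torch.no_grad():
        logprobs = lm(full_ids).logits.log_softmax(dim=-1)
    # Position i-1 of the logits predicts token i (assumes the prompt
    # tokenization is a prefix of the full tokenization).
    return sum(logprobs[0, i - 1, full_ids[0, i]].item()
               for i in range(prompt_len, full_ids.shape[1]))

def classify(text: str, labels=("positive", "negative")) -> str:
    return max(labels, key=lambda lab: label_logprob(text, lab))
```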
Updated: 2024-04-17 07:10:28
Domains: cs.AI
TransLinkGuard: Safeguarding Transformer Models Against Model Stealing in Edge Deployment
Proprietary large language models (LLMs) have been widely applied in various scenarios. Additionally, deploying LLMs on edge devices is trending for efficiency and privacy reasons. However, edge deployment of proprietary LLMs introduces new security challenges: edge-deployed models are white-box accessible to users, enabling adversaries to conduct effective model stealing (MS) attacks. Unfortunately, existing defense mechanisms fail to provide effective protection. Specifically, we identify four critical protection properties that existing methods fail to simultaneously satisfy: (1) maintaining protection after a model is physically copied; (2) authorizing model access at request level; (3) safeguarding runtime reverse engineering; (4) achieving high security with negligible runtime overhead. To address the above issues, we propose TransLinkGuard, a plug-and-play model protection approach against model stealing on edge devices. The core part of TransLinkGuard is a lightweight authorization module residing in a secure environment, e.g., TEE. The authorization module can freshly authorize each request based on its input. Extensive experiments show that TransLinkGuard achieves the same security protection as the black-box security guarantees with negligible overhead.
Updated: 2024-04-17 07:08:45
Domains: cs.CR,cs.AI
Variational quantization for state space models
Forecasting tasks using large datasets gathering thousands of heterogeneous time series is a crucial statistical problem in numerous sectors. The main challenge is to model a rich variety of time series, leverage any available external signals and provide sharp predictions with statistical guarantees. In this work, we propose a new forecasting model that combines discrete state space hidden Markov models with recent neural network architectures and training procedures inspired by vector quantized variational autoencoders. We introduce a variational discrete posterior distribution of the latent states given the observations and a two-stage training procedure to alternatively train the parameters of the latent states and of the emission distributions. By learning a collection of emission laws and temporarily activating them depending on the hidden process dynamics, the proposed method allows to explore large datasets and leverage available external signals. We assess the performance of the proposed method using several datasets and show that it outperforms other state-of-the-art solutions.
Updated: 2024-04-17 07:01:41
Domains: cs.LG
Music Enhancement with Deep Filters: A Technical Report for The ICASSP 2024 Cadenza Challenge
In this challenge, we disentangle the deep filters from the original DeepfilterNet and incorporate them into our Spec-UNet-based network to further improve a hybrid Demucs (hdemucs) based remixing pipeline. The motivation behind the use of the deep filter component lies at its potential in better handling temporal fine structures. We demonstrate an incremental improvement in both the Signal-to-Distortion Ratio (SDR) and the Hearing Aid Audio Quality Index (HAAQI) metrics when comparing the performance of hdemucs against different versions of our model.
Updated: 2024-04-17 07:01:29
Domains: cs.SD,cs.AI,cs.LG,cs.MM,eess.AS
Reuse out-of-year data to enhance land cover mapping via feature disentanglement and contrastive learning
Timely up-to-date land use/land cover (LULC) maps play a pivotal role in supporting agricultural territory management, environmental monitoring and facilitating well-informed and sustainable decision-making. Typically, when creating a land cover (LC) map, precise ground truth data is collected through time-consuming and expensive field campaigns. This data is then utilized in conjunction with satellite image time series (SITS) through advanced machine learning algorithms to get the final map. Unfortunately, each time this process is repeated (e.g., annually over a region to estimate agricultural production or potential biodiversity loss), new ground truth data must be collected, leading to the complete disregard of previously gathered reference data despite the substantial financial and time investment they have required. How to extract value from historical data, from the same or similar study sites, to enhance the current LULC mapping process constitutes a significant challenge; addressing it would allow the financial and human-resource efforts invested in previous data campaigns to be valued again. Aiming to tackle this important challenge, we here propose a deep learning framework based on recent advances in domain adaptation and generalization to combine remote sensing and reference data coming from two different domains (e.g. historical data and fresh ones) to ameliorate the current LC mapping process. Our approach, namely REFeD (data Reuse with Effective Feature Disentanglement for land cover mapping), leverages a disentanglement strategy, based on contrastive learning, where invariant and specific per-domain features are derived to recover the intrinsic information related to the downstream LC mapping task and alleviate possible distribution shifts between domains. Additionally, REFeD is equipped with an effective supervision scheme where feature disentanglement is further enforced via multiple levels of supervision at different granularities. The experimental assessment over two study areas covering extremely diverse and contrasted landscapes, namely Koumbia (located in the West-Africa region, in Burkina Faso) and Centre Val de Loire (located in centre Europe, France), underlines the quality of our framework and the obtained findings demonstrate that out-of-year information coming from the same (or similar) study site, at different periods of time, can constitute a valuable additional source of information to enhance the LC mapping process.
Updated: 2024-04-17 07:00:20
Domains: cs.LG
KernJC: Automated Vulnerable Environment Generation for Linux Kernel Vulnerabilities
Linux kernel vulnerability reproduction is a critical task in system security. To reproduce a kernel vulnerability, the vulnerable environment and the Proof of Concept (PoC) program are needed. Most existing research focuses on the generation of PoC, while the construction of environment is overlooked. However, establishing an effective vulnerable environment to trigger a vulnerability is challenging. Firstly, it is hard to guarantee that the selected kernel version for reproduction is vulnerable, as the vulnerability version claims in online databases can occasionally be spurious. Secondly, many vulnerabilities can not be reproduced in kernels built with default configurations. Intricate non-default kernel configurations must be set to include and trigger a kernel vulnerability, but less information is available on how to recognize these configurations. To solve these challenges, we propose a patch-based approach to identify real vulnerable kernel versions and a graph-based approach to identify necessary configs for activating a specific vulnerability. We implement these approaches in a tool, KernJC, automating the generation of vulnerable environments for kernel vulnerabilities. To evaluate the efficacy of KernJC, we build a dataset containing 66 representative real-world vulnerabilities with PoCs from kernel vulnerability research in the past five years. The evaluation shows that KernJC builds vulnerable environments for all these vulnerabilities, 48.5% of which require non-default configs, and 4 have incorrect version claims in the National Vulnerability Database (NVD). Furthermore, we conduct large-scale spurious version detection on kernel vulnerabilities and identify 128 vulnerabilities which have spurious version claims in NVD. To foster future research, we release KernJC with the dataset in the community.
Updated: 2024-04-17 06:45:05
Domains: cs.CR,cs.SE
Fourier-Mixed Window Attention: Accelerating Informer for Long Sequence Time-Series Forecasting
We study a fast local-global window-based attention method to accelerate Informer for long sequence time-series forecasting. While window attention, being local, offers a considerable computational saving, it lacks the ability to capture global token information, which is compensated by a subsequent Fourier transform block. Our method, named FWin, does not rely on the query sparsity hypothesis and an empirical approximation underlying the ProbSparse attention of Informer. Through experiments on univariate and multivariate datasets, we show that FWin transformers improve the overall prediction accuracies of Informer while accelerating its inference speeds by 1.6 to 2 times. We also provide a mathematical definition of FWin attention, and prove that it is equivalent to the canonical full attention under the block diagonal invertibility (BDI) condition of the attention matrix. The BDI is shown experimentally to hold with high probability for typical benchmark datasets.
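A minimal sketch of the local-window-attention-plus-Fourier-mixing idea follows; the dimensions, the FNet-style Fourier sub-block, and the residual placement are our assumptions, and the paper's FWin layer may differ in detail.

```python
import torch
import torch.nn as nn

class FWinBlock(nn.Module):
    """Local window attention followed by an FNet-style Fourier mixing
    sub-block that restores global token interaction (sketch)."""
    def __init__(self, d_model: int, n_heads: int, window: int):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (B, L, D)
        B, L, D = x.shape
        assert L % self.window == 0, "pad the sequence to a window multiple"
        w = x.reshape(B * L // self.window, self.window, D)  # non-overlapping windows
        local, _ = self.attn(w, w, w)                        # cost O(L * window), not O(L^2)
        x = self.norm1(x + local.reshape(B, L, D))
        mixed = torch.fft.fft(torch.fft.fft(x, dim=-1), dim=-2).real  # global mixing
        return self.norm2(x + mixed)
```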
Updated: 2024-04-17 06:37:30
Domains: cs.LG,cs.AI
Synthesizing Realistic Data for Table Recognition
To overcome the limitations and challenges of current automatic table data annotation methods and random table data synthesis approaches, we propose a novel method for synthesizing annotation data specifically designed for table recognition. This method utilizes the structure and content of existing complex tables, facilitating the efficient creation of tables that closely replicate the authentic styles found in the target domain. By leveraging the actual structure and content of tables from Chinese financial announcements, we have developed the first extensive table annotation dataset in this domain. We used this dataset to train several recent deep learning-based end-to-end table recognition models. Additionally, we have established the inaugural benchmark for real-world complex tables in the Chinese financial announcement domain, using it to assess the performance of models trained on our synthetic data, thereby effectively validating our method's practicality and effectiveness. Furthermore, we applied our synthesis method to augment the FinTabNet dataset, extracted from English financial announcements, by increasing the proportion of tables with multiple spanning cells to introduce greater complexity. Our experiments show that models trained on this augmented dataset achieve comprehensive improvements in performance, especially in the recognition of tables with multiple spanning cells.
Updated: 2024-04-17 06:36:17
Domains: cs.CV,cs.LG
Partial Rankings of Optimizers
We introduce a framework for benchmarking optimizers according to multiple criteria over various test functions. Based on a recently introduced union-free generic depth function for partial orders/rankings, it fully exploits the ordinal information and allows for incomparability. Our method describes the distribution of all partial orders/rankings, avoiding the notorious shortcomings of aggregation. This permits to identify test functions that produce central or outlying rankings of optimizers and to assess the quality of benchmarking suites.
Updated: 2024-04-17 06:31:27
Domains: cs.LG,stat.ML
Inductive-Deductive Strategy Reuse for Multi-Turn Instructional Dialogues
Aligning large language models (LLMs) with human expectations requires high-quality instructional dialogues, which can be achieved by posing diverse, in-depth, and insightful instructions that deepen interactions. Existing methods target instructions from real instruction dialogues as a learning goal and fine-tune a user simulator for posing instructions. However, the user simulator struggles to implicitly model complex dialogue flows and pose high-quality instructions. In this paper, we take inspiration from the cognitive abilities inherent in human learning and propose the explicit modeling of complex dialogue flows through instructional strategy reuse. Specifically, we first induce high-level strategies from various real instruction dialogues. These strategies are applied to new dialogue scenarios deductively, where the instructional strategies facilitate high-quality instructions. Experimental results show that our method can generate diverse, in-depth, and insightful instructions for a given dialogue history. The constructed multi-turn instructional dialogues can outperform competitive baselines on the downstream chat model.
Updated: 2024-04-17 06:26:32
Domains: cs.CL,cs.AI
DeepVARwT: Deep Learning for a VAR Model with Trend
The vector autoregressive (VAR) model has been used to describe the dependence within and across multiple time series. This is a model for stationary time series which can be extended to allow the presence of a deterministic trend in each series. Detrending the data either parametrically or nonparametrically before fitting the VAR model gives rise to additional errors in the latter step. In this study, we propose a new approach called DeepVARwT that employs deep learning methodology for maximum likelihood estimation of the trend and the dependence structure at the same time. A Long Short-Term Memory (LSTM) network is used for this purpose. To ensure the stability of the model, we enforce the causality condition on the autoregressive coefficients using the transformation of Ansley & Kohn (1986). We provide a simulation study and an application to real data. In the simulation study, we use realistic trend functions generated from real data and compare the estimates with true function/parameter values. In the real data application, we compare the prediction performance of this model with state-of-the-art models in the literature.
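One way to write the model class being estimated jointly (a sketch consistent with the description above; the paper's exact parameterization may differ): each series carries a deterministic trend $m_t$ and the detrended process follows a causal VAR($p$),

```latex
x_t = m_t + \sum_{i=1}^{p} \Phi_i \, (x_{t-i} - m_{t-i}) + \varepsilon_t,
\qquad \varepsilon_t \sim \mathcal{N}(0, \Sigma),
```

where the trend $m_t$ is produced by the LSTM and the coefficient matrices $\Phi_i$ are obtained through the Ansley & Kohn (1986) transformation, which maps unconstrained parameters to coefficients satisfying the causality (stationarity) condition.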
Updated: 2024-04-17 06:24:16
Domains: stat.ME,cs.AI
Leave-one-out Distinguishability in Machine Learning
We introduce an analytical framework to quantify the changes in a machine learning algorithm's output distribution following the inclusion of a few data points in its training set, a notion we define as leave-one-out distinguishability (LOOD). This is key to measuring data memorization and information leakage as well as the influence of training data points in machine learning. We illustrate how our method broadens and refines existing empirical measures of memorization and privacy risks associated with training data. We use Gaussian processes to model the randomness of machine learning algorithms, and validate LOOD with extensive empirical analysis of leakage using membership inference attacks. Our analytical framework enables us to investigate the causes of leakage and where the leakage is high. For example, we analyze the influence of activation functions on data memorization. Additionally, our method allows us to identify queries that disclose the most information about the training data in the leave-one-out setting. We illustrate how optimal queries can be used for accurate reconstruction of training data.
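In schematic form, the quantity being analyzed can be written as follows (our paraphrase; $\mathcal{A}$ is the randomized training algorithm, $Q$ a set of query points, and $d$ a divergence between output distributions):

```latex
\mathrm{LOOD}(x; D, Q) \;=\;
d\Big(\, \mathbb{P}\big[\mathcal{A}(D)(Q)\big] \,,\;
      \mathbb{P}\big[\mathcal{A}(D \cup \{x\})(Q)\big] \,\Big)
```

Modeling $\mathcal{A}$ with a Gaussian process makes both output distributions Gaussian, so the divergence is available in closed form, and queries $Q$ that maximize LOOD are the most disclosing ones in this sense.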
Updated: 2024-04-17 06:17:59
Domains: cs.LG
Neural Network Approach for Non-Markovian Dissipative Dynamics of Many-Body Open Quantum Systems
Simulating the dynamics of open quantum systems coupled to non-Markovian environments remains an outstanding challenge due to exponentially scaling computational costs. We present an artificial intelligence strategy to overcome this obstacle by integrating the neural quantum states approach into the dissipaton-embedded quantum master equation in second quantization (DQME-SQ). Our approach utilizes restricted Boltzmann machines (RBMs) to compactly represent the reduced density tensor, explicitly encoding the combined effects of system-environment correlations and non-Markovian memory. Applied to model systems exhibiting prominent effects of system-environment correlation and non-Markovian memory, our approach achieves comparable accuracy to conventional hierarchical equations of motion, while requiring significantly fewer dynamical variables. The novel RBM-based DQME-SQ approach paves the way for investigating non-Markovian open quantum dynamics in previously intractable regimes, with implications spanning various frontiers of modern science.
Updated: 2024-04-17 06:17:08
Domains: quant-ph,cs.LG
Ciphertext-Only Attack on a Secure $k$-NN Computation on Cloud
The rise of cloud computing has spurred a trend of transferring data storage and computational tasks to the cloud. To protect confidential information such as customer data and business details, it is essential to encrypt this sensitive data before cloud storage. Implementing encryption can prevent unauthorized access, data breaches, and the resultant financial loss, reputation damage, and legal issues. Moreover, to facilitate the execution of data mining algorithms on the cloud-stored data, the encryption needs to be compatible with domain computation. The $k$-nearest neighbor ($k$-NN) computation for a specific query vector is widely used in fields like location-based services. Sanyashi et al. (ICISS 2023) proposed an encryption scheme to facilitate privacy-preserving $k$-NN computation on the cloud by utilizing Asymmetric Scalar-Product-Preserving Encryption (ASPE). In this work, we identify a significant vulnerability in the aforementioned encryption scheme of Sanyashi et al. Specifically, we give an efficient algorithm and also empirically demonstrate that their encryption scheme is vulnerable to the ciphertext-only attack (COA).
Updated: 2024-04-17 06:09:03
Domains: cs.CR
Strings from the Library of Babel: Random Sampling as a Strong Baseline for Prompt Optimisation
Recent prompt optimisation approaches use the generative nature of language models to produce prompts -- even rivaling the performance of human-curated prompts. In this paper, we demonstrate that randomly sampling tokens from the model vocabulary as ``separators'' can be as effective as language models for prompt-style text classification. Our experiments show that random separators are competitive baselines, having less than a 1% difference compared to previous self-optimisation methods and showing a 12% average relative improvement over strong human baselines across nine text classification tasks and eight language models. We further analyse this phenomenon in detail using three different random generation strategies, establishing that the language space is rich with potentially good separators, with a greater than 40% average chance that a randomly drawn separator performs better than human-curated separators. These observations challenge the common assumption that an effective prompt should be human readable or task relevant and establish a strong baseline for prompt optimisation research.
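A minimal sketch of the random-separator baseline follows; the vocabulary, prompt layout, and selection loop are illustrative assumptions rather than the paper's exact protocol.

```python
import random

def random_separator(vocab: list, length: int = 4, seed: int = 0) -> str:
    """Draw `length` tokens uniformly at random from the model vocabulary."""
    rng = random.Random(seed)
    return " ".join(rng.choice(vocab) for _ in range(length))

def build_prompt(text: str, separator: str) -> str:
    # The random separator stands where a human-written instruction would go,
    # e.g. "Classify the sentiment of the following review:".
    return f"{text} {separator}"

# Usage: sample a few candidate separators, score each on a small validation
# split with the classifier's usual label-scoring function, keep the best.
candidates = [random_separator(["alpha", "beta", "gamma", "delta"], seed=s)
              for s in range(5)]
```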
Updated: 2024-04-17 06:00:51
Domains: cs.CL,cs.AI
ViLLM-Eval: A Comprehensive Evaluation Suite for Vietnamese Large Language Models
The rapid advancement of large language models (LLMs) necessitates the development of new benchmarks to accurately assess their capabilities. To address this need for Vietnamese, this work aims to introduce ViLLM-Eval, a comprehensive evaluation suite designed to measure the advanced knowledge and reasoning abilities of foundation models within a Vietnamese context. ViLLM-Eval consists of multiple-choice questions and next-word prediction tasks spanning various difficulty levels and diverse disciplines, ranging from humanities to science and engineering. A thorough evaluation of the most advanced LLMs on ViLLM-Eval revealed that even the best performing models have significant room for improvement in understanding and responding to Vietnamese language tasks. ViLLM-Eval is believed to be instrumental in identifying key strengths and weaknesses of foundation models, ultimately promoting their development and enhancing their performance for Vietnamese users.
Updated: 2024-04-17 05:57:17
Domains: cs.CL,cs.AI
Dependency-based Anomaly Detection: a General Framework and Comprehensive Evaluation
Anomaly detection is crucial for understanding unusual behaviors in data, as anomalies offer valuable insights. This paper introduces Dependency-based Anomaly Detection (DepAD), a general framework that utilizes variable dependencies to uncover meaningful anomalies with better interpretability. DepAD reframes unsupervised anomaly detection as supervised feature selection and prediction tasks, which allows users to tailor anomaly detection algorithms to their specific problems and data. We extensively evaluate representative off-the-shelf techniques for the DepAD framework. Two DepAD algorithms emerge as all-rounders and superior performers in handling a wide range of datasets compared to nine state-of-the-art anomaly detection methods. Additionally, we demonstrate that DepAD algorithms provide new and insightful interpretations for detected anomalies.
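The reframing can be made concrete in a few lines: predict each variable from the others and turn the prediction residuals into per-instance anomaly scores. The regressor and the score aggregation below are illustrative choices, not DepAD's fixed components; the framework's point is precisely that they are pluggable.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def depad_scores(X: np.ndarray) -> np.ndarray:
    """Dependency-based anomaly scores (sketch): predict each variable from
    the others and aggregate the per-variable residuals per instance."""
    n, d = X.shape
    residuals = np.zeros((n, d))
    for j in range(d):
        # A full instantiation would first run a feature-selection step to
        # pick the relevant variables for X[:, j]; here we simply use the rest.
        others = np.delete(X, j, axis=1)
        model = RandomForestRegressor(n_estimators=50, random_state=0)
        residuals[:, j] = np.abs(X[:, j] - model.fit(others, X[:, j]).predict(others))
    z = residuals / (residuals.std(axis=0) + 1e-9)   # normalize per variable
    return z.mean(axis=1)                            # larger = more anomalous
```

The per-variable residuals are also what give the detected anomalies their interpretation: they show which expected dependencies an anomalous instance violates.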
Updated: 2024-04-17 05:44:10
Domains: cs.LG,cs.AI
MCPNet: An Interpretable Classifier via Multi-Level Concept Prototypes
Recent advancements in post-hoc and inherently interpretable methods have markedly enhanced the explanations of black box classifier models. These methods operate either through post-analysis or by integrating concept learning during model training. Although effective in bridging the semantic gap between a model's latent space and human interpretation, these explanation methods only partially reveal the model's decision-making process. The outcome is typically limited to high-level semantics derived from the last feature map. We argue that explanations lacking insights into the decision processes at low and mid-level features are neither fully faithful nor useful. Addressing this gap, we introduce the Multi-Level Concept Prototypes Classifier (MCPNet), an inherently interpretable model. MCPNet autonomously learns meaningful concept prototypes across multiple feature map levels using Centered Kernel Alignment (CKA) loss and an energy-based weighted PCA mechanism, and it does so without reliance on predefined concept labels. Further, we propose a novel classifier paradigm that learns and aligns multi-level concept prototype distributions for classification purposes via Class-aware Concept Distribution (CCD) loss. Our experiments reveal that our proposed MCPNet, while being adaptable to various model architectures, offers comprehensive multi-level explanations while maintaining classification accuracy. Additionally, its concept distribution-based classification approach shows improved generalization capabilities in few-shot classification scenarios.
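For reference, the CKA similarity at the heart of the loss can be computed as below for two feature matrices; this is the standard linear CKA, assumed here for concreteness (the paper's loss may use a kernelized variant).

```python
import torch

def linear_cka(X: torch.Tensor, Y: torch.Tensor) -> torch.Tensor:
    """Linear Centered Kernel Alignment between features X (n, p1) and Y (n, p2)."""
    X = X - X.mean(dim=0, keepdim=True)   # center each feature dimension
    Y = Y - Y.mean(dim=0, keepdim=True)
    hsic = (Y.T @ X).norm(p="fro") ** 2   # ||Y^T X||_F^2
    norm_x = (X.T @ X).norm(p="fro")
    norm_y = (Y.T @ Y).norm(p="fro")
    return hsic / (norm_x * norm_y)       # in [0, 1]; 1 = identical subspaces
```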
Updated: 2024-04-17 05:42:52
Domains: cs.CV,cs.LG
Sisu: Decentralized Trustless Bridge For Full Ethereum Node
In this paper, we present a detailed approach and implementation to prove an Ethereum full node using recursive SNARKs, distributed general GKR and Groth16. Our protocol's name is Sisu, whose architecture is based on distributed Virgo in zkBridge with some major improvements. Besides proving signature aggregation, we provide solutions to two hard problems in proving an Ethereum full node: 1) any public key is valid under the previous beacon state and 2) all public keys are pairwise distinct. Our solution does not require worker-to-worker communication and therefore reduces total worker-to-worker network traffic from terabytes of data to zero compared to zkBridge. This makes our approach suitable for emerging distributed prover markets and more decentralized compared to zkBridge. Our design is highly parallelizable and capable of running on GPU for most parts.
Updated: 2024-04-17 05:42:14
Domains: cs.CR
SYNFAC-EDIT: Synthetic Imitation Edit Feedback for Factual Alignment in Clinical Summarization
Large Language Models (LLMs) such as GPT & Llama have demonstrated significant achievements in summarization tasks but struggle with factual inaccuracies, a critical issue in clinical NLP applications where errors could lead to serious consequences. To counter the high costs and limited availability of expert-annotated data for factual alignment, this study introduces an innovative pipeline that utilizes >100B parameter GPT variants like GPT-3.5 & GPT-4 to act as synthetic experts to generate high-quality synthetic feedback aimed at enhancing factual consistency in clinical note summarization. Our research primarily focuses on edit feedback generated by these synthetic feedback experts without additional human annotations, mirroring and optimizing the practical scenario in which medical professionals refine AI system outputs. Although such 100B+ parameter GPT variants have proven to demonstrate expertise in various clinical NLP tasks, such as the Medical Licensing Examination, there is scant research on their capacity to act as synthetic feedback experts and deliver expert-level edit feedback for improving the generation quality of weaker (<10B parameter) LLMs like GPT-2 (1.5B) & Llama 2 (7B) in the clinical domain. So in this work, we leverage 100B+ GPT variants to act as synthetic feedback experts offering expert-level edit feedback, which is used to reduce hallucinations and align weaker (<10B parameter) LLMs with medical facts using two distinct alignment algorithms (DPO & SALT), endeavoring to narrow the divide between AI-generated content and factual accuracy. This highlights the substantial potential of LLM-based synthetic edits in enhancing the alignment of clinical factuality.
Updated: 2024-04-17 05:27:25
Domains: cs.CL,cs.AI
Web 3.0 and Quantum Security: Long-Distance Free-Space QSDC for Global Web 3.0 Networks
With the advent of Web 3.0, the swift advancement of technology confronts an imminent threat from quantum computing. Security protocols safeguarding the integrity of Web 2.0 and Web 3.0 are growing more susceptible to both quantum attacks and sophisticated classical threats. The article introduces our novel long-distance free-space quantum secure direct communication (LF QSDC) as a method to safeguard against security breaches in both quantum and classical contexts. Differing from techniques like quantum key distribution (QKD), LF QSDC surpasses constraints by facilitating encrypted data transmission sans key exchanges, thus diminishing the inherent weaknesses of key-based systems. The distinctiveness of this attribute, coupled with its quantum mechanics base, protects against quantum computer assaults and advanced non-quantum dangers, harmonizing seamlessly with the untrustworthy tenets of the Web 3.0 age. The focus of our study is the technical design and incorporation of LF QSDC into web 3.0 network infrastructures, highlighting its efficacy for extended-range communication. LF QSDC is based on the memory DL04 protocol and enhanced with our novel Quantum-Aware Low-Density Parity Check (LDPC), Pointing, Acquisition, and Tracking (PAT) technologies, and Atmospheric Quantum Correction Algorithm (AQCA). Utilizing this method not only bolsters the security of worldwide Web 3.0 networks but also guarantees their endurance in a time when quantum and sophisticated classical threats exist simultaneously. Consequently, LF QSDC stands out as a robust security solution, well-suited for Web 3.0 systems amidst the constantly evolving digital environment.
Updated: 2024-04-17 05:25:56
Domains: quant-ph,cs.CR
EEG_GLT-Net: Optimising EEG Graphs for Real-time Motor Imagery Signals Classification
Brain-Computer Interfaces connect the brain to external control devices, necessitating the accurate translation of brain signals such as from electroencephalography (EEG) into executable commands. Graph Neural Networks (GCN) have been increasingly applied for classifying EEG Motor Imagery signals, primarily because they incorporate the spatial relationships among EEG channels, resulting in improved accuracy over traditional convolutional methods. Recent advances by GCNs-Net in real-time EEG MI signal classification utilised Pearson Coefficient Correlation (PCC) for constructing adjacency matrices, yielding significant results on the PhysioNet dataset. Our paper introduces the EEG Graph Lottery Ticket (EEG_GLT) algorithm, an innovative technique for constructing adjacency matrices for EEG channels. It does not require pre-existing knowledge of inter-channel relationships, and it can be tailored to suit both individual subjects and GCN model architectures. Our findings demonstrated that the PCC method outperformed the Geodesic approach by 9.65% in mean accuracy, while our EEG_GLT matrix consistently exceeded the performance of the PCC method by a mean accuracy of 13.39%. Also, we found that the construction of the adjacency matrix significantly influenced accuracy, to a greater extent than GCN model configurations. A basic GCN configuration utilising our EEG_GLT matrix exceeded the performance of even the most complex GCN setup with a PCC matrix in average accuracy. Our EEG_GLT method also reduced MACs by up to 97% compared to the PCC method, while maintaining or enhancing accuracy. In conclusion, the EEG_GLT algorithm marks a breakthrough in the development of optimal adjacency matrices, effectively boosting both computational accuracy and efficiency, making it well-suited for real-time classification of EEG MI signals that demand intensive computational resources.
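As a point of reference, the PCC adjacency matrix that the paper measures against is straightforward to construct; the thresholding rule below is an illustrative assumption.

```python
import numpy as np

def pcc_adjacency(eeg: np.ndarray, keep: float = 0.2) -> np.ndarray:
    """Build a channel graph from EEG data of shape (channels, timesteps)
    by keeping the strongest |Pearson| correlations (sketch)."""
    corr = np.abs(np.corrcoef(eeg))           # (C, C) channel-wise PCC
    np.fill_diagonal(corr, 0.0)               # no self-loops
    threshold = np.quantile(corr, 1.0 - keep) # retain the top `keep` fraction
    adj = (corr >= threshold).astype(float)
    return np.maximum(adj, adj.T)             # keep the graph symmetric
```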
Updated: 2024-04-17 05:16:12
Domains: cs.LG,cs.AI,eess.SP
Large Language Models Meet User Interfaces: The Case of Provisioning Feedback
Incorporating Generative AI (GenAI) and Large Language Models (LLMs) in education can enhance teaching efficiency and enrich student learning. Current LLM usage involves conversational user interfaces (CUIs) for tasks like generating materials or providing feedback. However, this presents challenges including the need for educator expertise in AI and CUIs, ethical concerns with high-stakes decisions, and privacy risks. CUIs also struggle with complex tasks. To address these, we propose transitioning from CUIs to user-friendly applications leveraging LLMs via API calls. We present a framework for ethically incorporating GenAI into educational tools and demonstrate its application in our tool, Feedback Copilot, which provides personalized feedback on student assignments. Our evaluation shows the effectiveness of this approach, with implications for GenAI researchers, educators, and technologists. This work charts a course for the future of GenAI in education.
Updated: 2024-04-17 05:05:05
Domains: cs.HC,cs.AI
ScaleFold: Reducing AlphaFold Initial Training Time to 10 Hours
AlphaFold2 has been hailed as a breakthrough in protein folding. It can rapidly predict protein structures with lab-grade accuracy. However, its implementation does not include the necessary training code. OpenFold is the first trainable public reimplementation of AlphaFold. The AlphaFold training procedure is prohibitively time-consuming, and gets diminishing benefits from scaling to more compute resources. In this work, we conducted a comprehensive analysis on the AlphaFold training procedure based on Openfold, and identified that inefficient communications and overhead-dominated computations were the key factors that prevented the AlphaFold training from effective scaling. We introduced ScaleFold, a systematic training method that incorporated optimizations specifically for these factors. ScaleFold successfully scaled the AlphaFold training to 2080 NVIDIA H100 GPUs with high resource utilization. In the MLPerf HPC v3.0 benchmark, ScaleFold finished the OpenFold benchmark in 7.51 minutes, showing an over $6\times$ speedup over the baseline. For training the AlphaFold model from scratch, ScaleFold completed the pretraining in 10 hours, a significant improvement over the seven days required by the original AlphaFold pretraining baseline.
Updated: 2024-04-17 04:55:33
Domains: cs.LG,cs.AI,cs.DC,q-bio.QM
The False Dawn: Reevaluating Google's Reinforcement Learning for Chip Macro Placement
Reinforcement learning (RL) for physical design of silicon chips in a Google 2021 Nature paper stirred controversy due to poorly documented claims that raised eyebrows and drew critical media coverage. The paper withheld critical methodology steps and most inputs needed to reproduce results. Our meta-analysis shows how two separate evaluations filled in the gaps and demonstrated that Google RL lags behind (i) human designers, (ii) a well-known algorithm (Simulated Annealing), and (iii) generally-available commercial software, while being slower; and in a 2023 open research contest, RL methods weren't in top 5. Crosschecked data indicate that the integrity of the Nature paper is substantially undermined owing to errors in conduct, analysis and reporting. Before publishing, Google rebuffed internal allegations of fraud, which still stand. We note policy implications and conclusions for chip design.
Updated: 2024-04-17 04:55:19
Domains: cs.LG,cs.AI,cs.AR,cs.CY
MICRO: Model-Based Offline Reinforcement Learning with a Conservative Bellman Operator
Offline reinforcement learning (RL) faces a significant challenge of distribution shift. Model-free offline RL penalizes the Q value for out-of-distribution (OOD) data or constrains the policy to stay close to the behavior policy to tackle this problem, but this inhibits the exploration of the OOD region. Model-based offline RL, which uses the trained environment model to generate more OOD data and performs conservative policy optimization within that model, has become an effective method for this problem. However, the current model-based algorithms rarely consider agent robustness when incorporating conservatism into policy. Therefore, a new model-based offline algorithm with a conservative Bellman operator (MICRO) is proposed. This method trades off performance and robustness via introducing the robust Bellman operator into the algorithm. Compared with previous model-based algorithms with robust adversarial models, MICRO can significantly reduce the computation cost by only choosing the minimal Q value in the state uncertainty set. Extensive experiments demonstrate that MICRO outperforms prior RL algorithms in offline RL benchmarks and is considerably robust to adversarial perturbations.
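Schematically, a conservative Bellman operator of this kind backs up the minimal value over a state uncertainty set, which is where both the robustness and the computational saving come from. The following is our paraphrase of the construction (with $U(s,a)$ the uncertainty set of next states produced by the learned model), not the paper's exact definition:

```latex
\widehat{\mathcal{T}} Q(s,a) \;=\; r(s,a) \;+\;
\gamma \min_{s' \in U(s,a)} \; \mathbb{E}_{a' \sim \pi(\cdot \mid s')}\big[ Q(s',a') \big]
```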
Updated: 2024-04-17 04:54:39
Domains: cs.LG,cs.AI
Rethinking 3D Dense Caption and Visual Grounding in A Unified Framework through Prompt-based Localization
3D Visual Grounding (3DVG) and 3D Dense Captioning (3DDC) are two crucial tasks in various 3D applications, which require both shared and complementary information in localization and visual-language relationships. Therefore, existing approaches adopt the two-stage "detect-then-describe/discriminate" pipeline, which relies heavily on the performance of the detector, resulting in suboptimal performance. Inspired by DETR, we propose a unified framework, 3DGCTR, to jointly solve these two distinct but closely related tasks in an end-to-end fashion. The key idea is to reconsider the prompt-based localization ability of the 3DVG model. In this way, the 3DVG model with a well-designed prompt as input can assist the 3DDC task by extracting localization information from the prompt. In terms of implementation, we integrate a Lightweight Caption Head into the existing 3DVG network with a Caption Text Prompt as a connection, effectively harnessing the existing 3DVG model's inherent localization capacity, thereby boosting 3DDC capability. This integration facilitates simultaneous multi-task training on both tasks, mutually enhancing their performance. Extensive experimental results demonstrate the effectiveness of this approach. Specifically, on the ScanRefer dataset, 3DGCTR surpasses the state-of-the-art 3DDC method by 4.3% in CIDEr@0.5IoU in MLE training and improves upon the SOTA 3DVG method by 3.16% in Acc@0.25IoU.
Updated: 2024-04-17 04:46:27
Domains: cs.CV,cs.AI
LLM Agents can Autonomously Exploit One-day Vulnerabilities
LLMs have become increasingly powerful, both in their benign and malicious uses. With the increase in capabilities, researchers have become increasingly interested in their ability to exploit cybersecurity vulnerabilities. In particular, recent work has conducted preliminary studies on the ability of LLM agents to autonomously hack websites. However, these studies are limited to simple vulnerabilities. In this work, we show that LLM agents can autonomously exploit one-day vulnerabilities in real-world systems. To show this, we collected a dataset of 15 one-day vulnerabilities that include ones categorized as critical severity in the CVE description. When given the CVE description, GPT-4 is capable of exploiting 87% of these vulnerabilities, compared to 0% for every other model we test (GPT-3.5, open-source LLMs) and open-source vulnerability scanners (ZAP and Metasploit). Fortunately, our GPT-4 agent requires the CVE description for high performance: without the description, GPT-4 can exploit only 7% of the vulnerabilities. Our findings raise questions around the widespread deployment of highly capable LLM agents.
Updated: 2024-04-17 04:34:39
Domains: cs.CR,cs.AI
Can Large Language Models Infer Causation from Correlation?
Causal inference is one of the hallmarks of human intelligence. While the field of CausalNLP has attracted much interest in recent years, existing causal inference datasets in NLP primarily rely on discovering causality from empirical knowledge (e.g., commonsense knowledge). In this work, we propose the first benchmark dataset to test the pure causal inference skills of large language models (LLMs). Specifically, we formulate a novel task Corr2Cause, which takes a set of correlational statements and determines the causal relationship between the variables. We curate a large-scale dataset of more than 200K samples, on which we evaluate seventeen existing LLMs. Through our experiments, we identify a key shortcoming of LLMs in terms of their causal inference skills, and show that these models achieve almost close to random performance on the task. This shortcoming is somewhat mitigated when we try to re-purpose LLMs for this skill via finetuning, but we find that these models still fail to generalize -- they can only perform causal inference in in-distribution settings when variable names and textual expressions used in the queries are similar to those in the training set, but fail in out-of-distribution settings generated by perturbing these queries. Corr2Cause is a challenging task for LLMs, and would be helpful in guiding future research on improving LLMs' pure reasoning skills and generalizability. Our data is at https://huggingface.co/datasets/causalnlp/corr2cause. Our code is at https://github.com/causalNLP/corr2cause.
Updated: 2024-04-17 04:27:10
Domains: cs.CL,cs.AI,cs.LG
Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models
We investigate the internal behavior of Transformer-based Large Language Models (LLMs) when they generate factually incorrect text. We propose modeling factual queries as constraint satisfaction problems and use this framework to investigate how the LLM interacts internally with factual constraints. We find a strong positive relationship between the LLM's attention to constraint tokens and the factual accuracy of generations. We curate a suite of 10 datasets containing over 40,000 prompts to study the task of predicting factual errors with the Llama-2 family across all scales (7B, 13B, 70B). We propose SAT Probe, a method that probes attention patterns, which can predict factual errors and fine-grained constraint satisfaction and allows early error identification. The approach and findings take another step towards using the mechanistic understanding of LLMs to enhance their reliability.
Updated: 2024-04-17 04:25:21
Domains: cs.CL,cs.AI,cs.LG
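As a rough sketch of the probing recipe above, one can aggregate the attention mass that flows from the final token to the constraint tokens, per layer and head, and fit a small classifier on those features. The tensor shapes, the random stand-in attention maps, and the labels below are illustrative assumptions only:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(5)
    n_prompts, n_layers, n_heads, seq_len = 200, 8, 4, 16
    # Stand-in attention maps; a real run would extract these from the model.
    attn = rng.random(size=(n_prompts, n_layers, n_heads, seq_len, seq_len))
    constraint_mask = np.zeros(seq_len, dtype=bool)
    constraint_mask[3:6] = True                     # positions of constraint tokens

    # Feature: attention from the last token to constraint tokens, per layer/head.
    feats = attn[:, :, :, -1, :][..., constraint_mask].sum(-1)
    feats = feats.reshape(n_prompts, -1)
    is_error = rng.integers(0, 2, size=n_prompts)   # stand-in factuality labels
    probe = LogisticRegression(max_iter=1000).fit(feats, is_error)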
Use of a Structured Knowledge Base Enhances Metadata Curation by Large Language Models
Metadata play a crucial role in ensuring the findability, accessibility, interoperability, and reusability of datasets. This paper investigates the potential of large language models (LLMs), specifically GPT-4, to improve adherence to metadata standards. We conducted experiments on 200 random data records describing human samples relating to lung cancer from the NCBI BioSample repository, evaluating GPT-4's ability to suggest edits for adherence to metadata standards. We computed the adherence accuracy of field name-field value pairs through a peer review process, and we observed a marginal average improvement in adherence to the standard data dictionary from 79% to 80% (p<0.01). We then prompted GPT-4 with domain information in the form of the textual descriptions of CEDAR templates and recorded a significant improvement to 97% from 79% (p<0.01). These results indicate that, while LLMs may not be able to correct legacy metadata to ensure satisfactory adherence to standards when unaided, they do show promise for use in automated metadata curation when integrated with a structured knowledge base.
Updated: 2024-04-17 04:17:12
Domains: cs.AI,cs.CL,cs.IR
Differentially Private SGD Without Clipping Bias: An Error-Feedback Approach
Differentially Private Stochastic Gradient Descent with Gradient Clipping (DPSGD-GC) is a powerful tool for training deep learning models using sensitive data, providing both a solid theoretical privacy guarantee and high efficiency. However, using DPSGD-GC to ensure Differential Privacy (DP) comes at the cost of model performance degradation due to DP noise injection and gradient clipping. Existing research has extensively analyzed the theoretical convergence of DPSGD-GC, and has shown that it only converges when using large clipping thresholds that are dependent on problem-specific parameters. Unfortunately, these parameters are often unknown in practice, making it hard to choose the optimal clipping threshold. Therefore, in practice, DPSGD-GC suffers from degraded performance due to the {\it constant} bias introduced by the clipping. In our work, we propose a new error-feedback (EF) DP algorithm as an alternative to DPSGD-GC, which not only offers a diminishing utility bound without inducing a constant clipping bias, but more importantly, it allows for an arbitrary choice of clipping threshold that is independent of the problem. We establish an algorithm-specific DP analysis for our proposed algorithm, providing privacy guarantees based on R{\'e}nyi DP. Additionally, we demonstrate that under mild conditions, our algorithm can achieve nearly the same utility bound as DPSGD without gradient clipping. Our empirical results on Cifar-10/100 and E2E datasets show that the proposed algorithm achieves higher accuracies than DPSGD while maintaining the same level of DP guarantee.
Updated: 2024-04-17 04:16:15
Domains: cs.LG,cs.CR
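The error-feedback mechanism can be sketched generically: remember the residual that clipping discards and fold it back into the next gradient, so the clipping bias does not accumulate into a constant offset. This is a schematic in the spirit of the abstract, with illustrative constants, not the paper's exact update or its DP accounting:

    import numpy as np

    def clip(v, c):
        n = np.linalg.norm(v)
        return v if n <= c else v * (c / n)

    def ef_clipped_step(w, e, grad, lr, c, sigma, rng):
        corrected = grad + e                   # fold in previously clipped residual
        update = clip(corrected, c)
        e_new = corrected - update             # remember what clipping removed
        noisy = update + sigma * c * rng.normal(size=w.shape)
        return w - lr * noisy, e_new

    rng = np.random.default_rng(8)
    w, e = np.zeros(4), np.zeros(4)
    for _ in range(100):
        g = 5.0 * np.ones(4)                   # gradient far above the clip level c
        w, e = ef_clipped_step(w, e, g, lr=0.01, c=1.0, sigma=0.0, rng=rng)

Because the residual e keeps re-entering the update, the gradient direction is still applied at every step even when each individual step is clipped.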
LMEraser: Large Model Unlearning through Adaptive Prompt Tuning
To address the growing demand for privacy protection in machine learning, we propose a novel and efficient machine unlearning approach for \textbf{L}arge \textbf{M}odels, called \textbf{LM}Eraser. Existing unlearning research suffers from entangled training data and complex model architectures, incurring extremely high computational costs for large models. LMEraser takes a divide-and-conquer strategy with a prompt tuning architecture to isolate data influence. The training dataset is partitioned into public and private datasets. Public data are used to train the backbone of the model. Private data are adaptively clustered based on their diversity, and each cluster is used to optimize a prompt separately. This adaptive prompt tuning mechanism reduces unlearning costs and maintains model performance. Experiments demonstrate that LMEraser achieves a $100$-fold reduction in unlearning costs without compromising accuracy compared to prior work. Our code is available at: \url{https://github.com/lmeraser/lmeraser}.
Updated: 2024-04-17 04:08:38
Domains: cs.LG,cs.AI,cs.CR
Policy Learning with Competing Agents
Decision makers often aim to learn a treatment assignment policy under a capacity constraint on the number of agents that they can treat. When agents can respond strategically to such policies, competition arises, complicating estimation of the optimal policy. In this paper, we study capacity-constrained treatment assignment in the presence of such interference. We consider a dynamic model where the decision maker allocates treatments at each time step and heterogeneous agents myopically best respond to the previous treatment assignment policy. When the number of agents is large but finite, we show that the threshold for receiving treatment under a given policy converges to the policy's mean-field equilibrium threshold. Based on this result, we develop a consistent estimator for the policy gradient. In a semi-synthetic experiment with data from the National Education Longitudinal Study of 1988, we demonstrate that this estimator can be used for learning capacity-constrained policies in the presence of strategic behavior.
Updated: 2024-04-17 04:06:03
Domains: stat.ML,cs.LG,econ.EM
The Use of Binary Choice Forests to Model and Estimate Discrete Choices
Problem definition. In retailing, discrete choice models (DCMs) are commonly used to capture the choice behavior of customers when offered an assortment of products. When estimating DCMs using transaction data, flexible models (such as machine learning models or nonparametric models) are typically not interpretable and hard to estimate, while tractable models (such as the multinomial logit model) tend to misspecify the complex behavior represented in the data. Methodology/results. In this study, we use a forest of binary decision trees to represent DCMs. This approach is based on random forests, a popular machine learning algorithm. The resulting model is interpretable: the decision trees can explain the decision-making process of customers during the purchase. We show that our approach can predict the choice probability of any DCM consistently and thus never suffers from misspecification. Moreover, our algorithm predicts assortments unseen in the training data. The mechanism and errors can be theoretically analyzed. We also prove that the random forest can recover preference rankings of customers thanks to the splitting criterion such as the Gini index and information gain ratio. Managerial implications. The framework has unique practical advantages. It can capture customers' behavioral patterns such as irrationality or sequential searches when purchasing a product. It handles nonstandard formats of training data that result from aggregation. It can measure product importance based on how frequently a random customer would make decisions depending on the presence of the product. It can also incorporate price information and customer features. Our numerical experiments using synthetic and real data show that using random forests to estimate customer choices can outperform existing methods.
Updated: 2024-04-17 04:02:41
Domains: cs.LG,econ.EM,stat.ML
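As a toy instance of this approach, the sketch below fits a random forest on assortment indicator vectors and reads off choice probabilities for an unseen assortment; the ranking-based synthetic customer stands in for real transaction data:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(7)
    n, p = 1000, 6
    assortments = rng.integers(0, 2, size=(n, p))   # which products were offered
    rank = rng.permutation(p)                        # a latent preference ranking

    def choice(a):
        offered = np.flatnonzero(a)
        # Pick the best-ranked offered product; class p encodes no-purchase.
        return offered[np.argmin(rank[offered])] if offered.size else p

    choices = np.array([choice(a) for a in assortments])
    forest = RandomForestClassifier(n_estimators=100, random_state=0)
    forest.fit(assortments, choices)
    print(forest.predict_proba(np.array([[1, 0, 1, 0, 1, 0]])))  # unseen assortment

Splitting criteria such as the Gini index are what let the forest recover the latent ranking, which is the recovery result the abstract refers to.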
Supervised Contrastive Vision Transformer for Breast Histopathological Image Classification
Invasive ductal carcinoma (IDC) is the most prevalent form of breast cancer. Breast tissue histopathological examination is critical in diagnosing and classifying breast cancer. Although existing methods have shown promising results, there is still room for improvement in the classification accuracy and generalization of IDC using histopathology images. We present a novel approach, Supervised Contrastive Vision Transformer (SupCon-ViT), for improving the classification of invasive ductal carcinoma in terms of accuracy and generalization by leveraging the inherent strengths and advantages of both transfer learning, i.e., pre-trained vision transformer, and supervised contrastive learning. Our results on a benchmark breast cancer dataset demonstrate that SupCon-ViT achieves state-of-the-art performance in IDC classification, with an F1-score of 0.8188, precision of 0.7692, and specificity of 0.8971, outperforming existing methods. In addition, the proposed model demonstrates resilience in scenarios with minimal labeled data, making it highly efficient in real-world clinical settings where labeled data is limited. Our findings suggest that supervised contrastive learning in conjunction with pre-trained vision transformers appears to be a viable strategy for an accurate classification of IDC, thus paving the way for a more efficient and reliable diagnosis of breast cancer through histopathological image analysis.
Updated: 2024-04-17 03:51:55
Domains: cs.CV,cs.LG
Stepwise Alignment for Constrained Language Model Policy Optimization
Safety and trustworthiness are indispensable requirements for applying AI systems based on large language models (LLMs) in real-world applications. This paper formulates human value alignment as a language model policy optimization problem to maximize reward under a safety constraint, and then proposes an algorithm called Stepwise Alignment for Constrained Policy Optimization (SACPO). A key idea behind SACPO, supported by theory, is that the optimal policy incorporating both reward and safety can be directly obtained from a reward-aligned policy. Based on this key idea, SACPO aligns the LLMs with each metric step-wise while leveraging simple yet powerful alignment algorithms such as direct preference optimization (DPO). SACPO provides many benefits such as simplicity, stability, computational efficiency, and flexibility regarding algorithms and dataset selection. Under mild assumptions, our theoretical analysis provides upper bounds on near-optimality and safety constraint violation. Our experimental results show that SACPO can fine-tune Alpaca-7B better than the state-of-the-art method in terms of both helpfulness and harmlessness.
Updated: 2024-04-17 03:44:58
Domains: cs.LG,cs.AI,cs.CL
Lightweight Unsupervised Federated Learning with Pretrained Vision Language Model
Federated learning aims to tackle the ``isolated data island'' problem, where it trains a collective model from physically isolated clients while safeguarding the privacy of users' data. However, supervised federated learning necessitates that each client labels their data for training, which can be both time-consuming and resource-intensive, and may even be impractical for edge devices. Moreover, the training and transmission of deep models present challenges to the computation and communication capabilities of the clients. To address these two inherent challenges in supervised federated learning, we propose a novel lightweight unsupervised federated learning approach that leverages unlabeled data on each client to perform lightweight model training and communication by harnessing pretrained vision-language models, such as CLIP. By capitalizing on the zero-shot prediction capability and the well-trained image encoder of the pre-trained CLIP model, we have carefully crafted an efficient and resilient self-training approach. This method refines the initial zero-shot predicted pseudo-labels of unlabeled instances through the sole training of a linear classifier on top of the fixed image encoder. Additionally, to address data heterogeneity within each client, we propose a class-balanced text feature sampling strategy for generating synthetic instances in the feature space to support local training. Experiments are conducted on multiple benchmark datasets. The experimental results demonstrate that our proposed method greatly enhances model performance in comparison to CLIP's zero-shot predictions and even outperforms supervised federated learning benchmark methods given limited computational and communication overhead.
Updated: 2024-04-17 03:42:48
Domains: cs.AI,cs.CV,cs.LG
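A minimal sketch of the self-training core described above, with random arrays standing in for frozen CLIP image features and per-class text embeddings (the federated orchestration and the class-balanced synthetic sampling are omitted):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    feats = rng.normal(size=(500, 64))         # frozen image-encoder features
    text_protos = rng.normal(size=(10, 64))    # one text embedding per class

    # 1) Zero-shot pseudo-labels: nearest text prototype by cosine similarity.
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    t = text_protos / np.linalg.norm(text_protos, axis=1, keepdims=True)
    sims = f @ t.T
    pseudo, conf = sims.argmax(axis=1), sims.max(axis=1)

    # 2) Train only a linear classifier on high-confidence instances;
    #    the encoder itself stays fixed, keeping the client lightweight.
    keep = conf >= np.quantile(conf, 0.5)
    clf = LogisticRegression(max_iter=1000).fit(feats[keep], pseudo[keep])

    # 3) Refined pseudo-labels feed the next self-training round.
    pseudo = clf.predict(feats)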
On the Empirical Complexity of Reasoning and Planning in LLMs
Large Language Models (LLMs) work surprisingly well for some complex reasoning problems via chain-of-thought (CoT) or tree-of-thought (ToT), but the underlying reasons remain unclear. We seek to understand the performance of these methods by conducting experimental case studies and linking the outcomes to sample and computational complexity in machine learning. We found that if problems can be decomposed into a sequence of reasoning steps and learning to predict the next step has a low sample and computational complexity, explicitly outlining the reasoning chain with all necessary information for predicting the next step may improve performance. Conversely, for problems where predicting the next step is computationally hard, adopting ToT may yield better reasoning outcomes than attempting to formulate a short reasoning chain.
Updated: 2024-04-17 03:34:27
Domains: cs.AI,cs.LG
Cross-Platform Hate Speech Detection with Weakly Supervised Causal Disentanglement
Content moderation faces a challenging task, as social media's ability to spread hate speech contrasts with its role in promoting global connectivity. With rapidly evolving slang and hate speech, the adaptability of conventional deep learning to the fluid landscape of online dialogue remains limited. In response, causality-inspired disentanglement has shown promise by segregating platform-specific peculiarities from universal hate indicators. However, its dependency on available ground-truth target labels for discerning these nuances faces practical hurdles with the incessant evolution of platforms and the mutable nature of hate speech. Using confidence-based reweighting and contrastive regularization, this study presents HATE WATCH, a novel framework of weakly supervised causal disentanglement that circumvents the need for explicit target labeling and effectively disentangles input features into invariant representations of hate. Empirical validation across four platforms (two with target labels and two without) positions HATE WATCH as a novel method for cross-platform hate speech detection with superior performance. HATE WATCH advances scalable content moderation techniques towards developing safer online communities.
Updated: 2024-04-17 03:25:54
Domains: cs.LG,cs.CL
LaVy: Vietnamese Multimodal Large Language Model
Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) have taken the world by storm with impressive abilities in complex reasoning and linguistic comprehension. While there is a plethora of work on Vietnamese Large Language Models, the lack of high-quality multimodal resources limits the progress of Vietnamese MLLMs. In this paper, we pioneer in addressing this by introducing LaVy, a state-of-the-art Vietnamese MLLM, and LaVy-Bench, a benchmark designed for evaluating MLLMs' understanding of Vietnamese visual language tasks. Our project is public at https://github.com/baochi0212/LaVy
Updated: 2024-04-17 03:23:33
Domains: cs.CL,cs.CV,cs.LG
CORE: Data Augmentation for Link Prediction via Information Bottleneck
Link prediction (LP) is a fundamental task in graph representation learning, with numerous applications in diverse domains. However, the generalizability of LP models is often compromised due to the presence of noisy or spurious information in graphs and the inherent incompleteness of graph data. To address these challenges, we draw inspiration from the Information Bottleneck principle and propose a novel data augmentation method, COmplete and REduce (CORE) to learn compact and predictive augmentations for LP models. In particular, CORE aims to recover missing edges in graphs while simultaneously removing noise from the graph structures, thereby enhancing the model's robustness and performance. Extensive experiments on multiple benchmark datasets demonstrate the applicability and superiority of CORE over state-of-the-art methods, showcasing its potential as a leading approach for robust LP in graph representation learning.
Updated: 2024-04-17 03:20:42
Domains: cs.LG,cs.SI
GBSD: Generative Bokeh with Stage Diffusion
The bokeh effect is an artistic technique that blurs out-of-focus areas in a photograph and has gained interest due to recent developments in text-to-image synthesis and the ubiquity of smart-phone cameras and photo-sharing apps. Prior work on rendering bokeh effects has focused on post hoc image manipulation to produce similar blurring effects in existing photographs using classical computer graphics or neural rendering techniques, but it either suffers from depth discontinuity artifacts or is restricted to reproducing bokeh effects that are present in the training data. More recent diffusion-based models can synthesize images with an artistic style, but they either require the generation of high-dimensional masks, expensive fine-tuning, or affect global image characteristics. In this paper, we present GBSD, the first generative text-to-image model that synthesizes photorealistic images with a bokeh style. Motivated by how image synthesis occurs progressively in diffusion models, our approach combines latent diffusion models with a 2-stage conditioning algorithm to render bokeh effects on semantically defined objects. Since we can focus the effect on objects, this semantic bokeh effect is more versatile than classical rendering techniques. We evaluate GBSD both quantitatively and qualitatively and demonstrate its ability to be applied in both text-to-image and image-to-image settings.
Updated: 2024-04-17 03:14:21
Domains: cs.CV,cs.AI
Empowering Large Language Models on Robotic Manipulation with Affordance Prompting
While large language models (LLMs) are successful in completing various language processing tasks, they often fail to interact with the physical world because they cannot generate control sequences properly. We find that the main reason is that LLMs are not grounded in the physical world. Existing LLM-based approaches circumvent this problem by relying on additional pre-defined skills or pre-trained sub-policies, making it hard to adapt to new tasks. In contrast, we aim to address this problem and explore the possibility of prompting pre-trained LLMs to accomplish a series of robotic manipulation tasks in a training-free paradigm. Accordingly, we propose a framework called LLM+A(ffordance) where the LLM serves as both the sub-task planner (that generates high-level plans) and the motion controller (that generates low-level control sequences). To ground these plans and control sequences in the physical world, we develop the affordance prompting technique that stimulates the LLM to 1) predict the consequences of generated plans and 2) generate affordance values for relevant objects. Empirically, we evaluate the effectiveness of LLM+A in various language-conditioned robotic manipulation tasks, which show that our approach substantially improves performance by enhancing the feasibility of generated plans and control and can easily generalize to different environments.
Updated: 2024-04-17 03:06:32
Domains: cs.AI
ESFL: Efficient Split Federated Learning over Resource-Constrained Heterogeneous Wireless Devices
Federated learning (FL) allows multiple parties (distributed devices) to train a machine learning model without sharing raw data. How to effectively and efficiently utilize the resources on devices and the central server is a highly interesting yet challenging problem. In this paper, we propose an efficient split federated learning algorithm (ESFL) to take full advantage of the powerful computing capabilities at a central server under a split federated learning framework with heterogeneous end devices (EDs). By splitting the model into different submodels between the server and EDs, our approach jointly optimizes user-side workload and server-side computing resource allocation by considering users' heterogeneity. We formulate the whole optimization problem as a mixed-integer non-linear program, which is an NP-hard problem, and develop an iterative approach to obtain an approximate solution efficiently. Extensive simulations have been conducted to validate the significantly increased efficiency of our ESFL approach compared with standard federated learning, split learning, and splitfed learning.
Updated: 2024-04-17 02:59:30
Domains: cs.LG,cs.AI,cs.NI
Advancing Social Intelligence in AI Agents: Technical Challenges and Open Questions
Building socially-intelligent AI agents (Social-AI) is a multidisciplinary, multimodal research goal that involves creating agents that can sense, perceive, reason about, learn from, and respond to affect, behavior, and cognition of other agents (human or artificial). Progress towards Social-AI has accelerated in the past decade across several computing communities, including natural language processing, machine learning, robotics, human-machine interaction, computer vision, and speech. Natural language processing, in particular, has been prominent in Social-AI research, as language plays a key role in constructing the social world. In this position paper, we identify a set of underlying technical challenges and open questions for researchers across computing communities to advance Social-AI. We anchor our discussion in the context of social intelligence concepts and prior progress in Social-AI research.
Updated: 2024-04-17 02:57:42
Domains: cs.HC,cs.CL,cs.LG
You do not have to train Graph Neural Networks at all on text-attributed graphs
Graph structured data, specifically text-attributed graphs (TAG), effectively represent relationships among varied entities. Such graphs are essential for semi-supervised node classification tasks. Graph Neural Networks (GNNs) have emerged as a powerful tool for handling this graph-structured data. Although gradient descent is commonly utilized for training GNNs for node classification, this study ventures into alternative methods, eliminating the iterative optimization processes. We introduce TrainlessGNN, a linear GNN model capitalizing on the observation that text encodings from the same class often cluster together in a linear subspace. This model constructs a weight matrix to represent each class's node attribute subspace, offering an efficient approach to semi-supervised node classification on TAG. Extensive experiments reveal that our trainless models can either match or even surpass their conventionally trained counterparts, demonstrating the possibility of refraining from gradient descent in certain configurations.
Updated: 2024-04-17 02:52:11
Domains: cs.LG
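The trainless construction can be caricatured in a few lines: build each class's weight vector directly from the text encodings of its labeled nodes (here a simple class mean, a cruder stand-in for the paper's subspace construction) and classify by inner product, with no gradient descent anywhere:

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 32))          # node text encodings
    y = rng.integers(0, 4, size=200)        # ground-truth classes (mostly hidden)
    labeled = np.arange(40)                 # semi-supervised: 40 labeled nodes

    # One weight vector per observed class, computed in closed form.
    classes = np.unique(y[labeled])
    W = np.stack([X[labeled][y[labeled] == c].mean(axis=0) for c in classes])

    # Every node is assigned to its best-matching class representation.
    pred = classes[(X @ W.T).argmax(axis=1)]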
Distributed Random Reshuffling Methods with Improved Convergence
This paper proposes two distributed random reshuffling methods, namely Gradient Tracking with Random Reshuffling (GT-RR) and Exact Diffusion with Random Reshuffling (ED-RR), to solve the distributed optimization problem over a connected network, where a set of agents aim to minimize the average of their local cost functions. Both algorithms invoke random reshuffling (RR) update for each agent, inherit favorable characteristics of RR for minimizing smooth nonconvex objective functions, and improve the performance of previous distributed random reshuffling methods both theoretically and empirically. Specifically, both GT-RR and ED-RR achieve the convergence rate of $O(1/[(1-\lambda)^{1/3}m^{1/3}T^{2/3}])$ in driving the (minimum) expected squared norm of the gradient to zero, where $T$ denotes the number of epochs, $m$ is the sample size for each agent, and $1-\lambda$ represents the spectral gap of the mixing matrix. When the objective functions further satisfy the Polyak-{\L}ojasiewicz (PL) condition, we show GT-RR and ED-RR both achieve $O(1/[(1-\lambda)mT^2])$ convergence rate in terms of the averaged expected differences between the agents' function values and the global minimum value. Notably, both results are comparable to the convergence rates of centralized RR methods (up to constant factors depending on the network topology) and outperform those of previous distributed random reshuffling algorithms. Moreover, we support the theoretical findings with a set of numerical experiments.
Updated: 2024-04-17 02:51:49
Domains: math.OC,cs.LG,cs.MA
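For reference, the random reshuffling inner loop each agent runs per epoch looks like the least-squares sketch below: one pass over a fresh permutation of its m local samples, one SGD step per sample. The gradient-tracking and mixing steps that make GT-RR and ED-RR distributed are omitted:

    import numpy as np

    def rr_epoch(w, X, y, lr, rng):
        # One RR epoch for the loss sum_i 0.5 * (x_i @ w - y_i)^2.
        for i in rng.permutation(len(y)):
            w = w - lr * (X[i] @ w - y[i]) * X[i]
        return w

    rng = np.random.default_rng(2)
    X, w_true = rng.normal(size=(64, 5)), rng.normal(size=5)
    y = X @ w_true
    w = np.zeros(5)
    for _ in range(300):                     # T epochs
        w = rr_epoch(w, X, y, lr=0.02, rng=rng)
    print(np.linalg.norm(w - w_true))        # near zero on this noiseless problem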
Many-Shot In-Context Learning
Large language models (LLMs) excel at few-shot in-context learning (ICL) -- learning from a few examples provided in context at inference, without any weight updates. Newly expanded context windows allow us to investigate ICL with hundreds or thousands of examples -- the many-shot regime. Going from few-shot to many-shot, we observe significant performance gains across a wide variety of generative and discriminative tasks. While promising, many-shot ICL can be bottlenecked by the available amount of human-generated examples. To mitigate this limitation, we explore two new settings: Reinforced and Unsupervised ICL. Reinforced ICL uses model-generated chain-of-thought rationales in place of human examples. Unsupervised ICL removes rationales from the prompt altogether, and prompts the model only with domain-specific questions. We find that both Reinforced and Unsupervised ICL can be quite effective in the many-shot regime, particularly on complex reasoning tasks. Finally, we demonstrate that, unlike few-shot learning, many-shot learning is effective at overriding pretraining biases and can learn high-dimensional functions with numerical inputs. Our analysis also reveals the limitations of next-token prediction loss as an indicator of downstream ICL performance.
Updated: 2024-04-17 02:49:26
Domains: cs.LG,cs.AI,cs.CL
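A small illustrative helper for composing such prompts; all field names are our own. In Reinforced ICL the rationale slot would carry model-generated chains of thought, and in Unsupervised ICL the examples carry questions only:

    def build_many_shot_prompt(examples, query, mode="reinforced"):
        parts = []
        for ex in examples:
            if mode == "unsupervised":
                parts.append(f"Q: {ex['question']}")
            else:
                parts.append(f"Q: {ex['question']}\nRationale: {ex['rationale']}\n"
                             f"A: {ex['answer']}")
        parts.append(f"Q: {query}\nA:")
        return "\n\n".join(parts)

    shots = [{"question": "2+2?", "rationale": "2 plus 2 is 4.", "answer": "4"}] * 500
    prompt = build_many_shot_prompt(shots, "3+5?")   # hundreds of shots: many-shot

The only practical requirement is a context window long enough to hold hundreds or thousands of such examples.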
MaeFuse: Transferring Omni Features with Pretrained Masked Autoencoders for Infrared and Visible Image Fusion via Guided Training
In this research, we introduce MaeFuse, a novel autoencoder model designed for infrared and visible image fusion (IVIF). The existing approaches for image fusion often rely on training combined with downstream tasks to obtain high-level visual information, which is effective in emphasizing target objects and delivering impressive results in visual quality and task-specific applications. MaeFuse, however, deviates from the norm. Instead of being driven by downstream tasks, our model utilizes a pretrained encoder from Masked Autoencoders (MAE), which facilitates omni-feature extraction for low-level reconstruction and high-level vision tasks, to obtain perception-friendly features at a low cost. In order to eliminate the domain gap of different modal features and the block effect caused by the MAE encoder, we further develop a guided training strategy. This strategy is meticulously crafted to ensure that the fusion layer seamlessly adjusts to the feature space of the encoder, gradually enhancing the fusion effect. It facilitates the comprehensive integration of feature vectors from both infrared and visible modalities, preserving the rich details inherent in each. MaeFuse not only introduces a novel perspective in the realm of fusion techniques but also stands out with impressive performance across various public datasets.
Updated: 2024-04-17 02:47:39
Domains: cs.CV,cs.AI
FedFa: A Fully Asynchronous Training Paradigm for Federated Learning
Federated learning has been identified as an efficient decentralized training paradigm for scaling machine learning model training on a large number of devices while guaranteeing the data privacy of the trainers. FedAvg has become a foundational parameter update strategy for federated learning, promising to eliminate the effect of heterogeneous data across clients and guarantee convergence. However, the synchronous parameter-update barrier at each communication round forces clients to spend significant time waiting, slowing down the training procedure. Therefore, recent state-of-the-art solutions propose using semi-asynchronous approaches to mitigate the waiting-time cost with guaranteed convergence. Nevertheless, emerging semi-asynchronous approaches are unable to eliminate the waiting time completely. We propose a fully asynchronous training paradigm, called FedFa, which can guarantee model convergence and eliminate the waiting time completely for federated learning by using a few buffered results on the server for parameter updating. Further, we provide a theoretical proof of the convergence rate for our proposed FedFa. Extensive experimental results indicate our approach effectively improves the training performance of federated learning by up to 6x and 4x speedup compared to the state-of-the-art synchronous and semi-asynchronous strategies, while retaining high accuracy in both IID and Non-IID scenarios.
Updated: 2024-04-17 02:46:59
Domains: cs.LG,cs.AI,cs.DC
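A schematic of a fully asynchronous server in the spirit of FedFa: no round barrier, a small buffer of the most recent client updates, and a model refresh on every arrival. The uniform weighting over the buffer is our assumption, not the paper's exact rule:

    from collections import deque
    import numpy as np

    class AsyncServer:
        def __init__(self, model, buffer_size=4):
            self.model = model
            self.buffer = deque(maxlen=buffer_size)   # K most recent client deltas

        def on_client_update(self, delta):
            # Called the moment any client finishes -- no waiting for stragglers.
            self.buffer.append(delta)
            self.model = self.model + np.mean(self.buffer, axis=0)
            return self.model      # freshest model goes straight back to the client

    rng = np.random.default_rng(9)
    server = AsyncServer(model=np.zeros(10))
    for _ in range(20):            # updates arrive one at a time, in arbitrary order
        server.on_client_update(0.01 * rng.normal(size=10))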
Towards Multi-agent Reinforcement Learning based Traffic Signal Control through Spatio-temporal Hypergraphs
Traffic signal control systems (TSCSs) are integral to intelligent traffic management, fostering efficient vehicle flow. Traditional approaches often simplify road networks into standard graphs, which results in a failure to consider the dynamic nature of traffic data at neighboring intersections, thereby neglecting higher-order interconnections necessary for real-time control. To address this, we propose a novel TSCS framework to realize intelligent traffic control. This framework collaborates with multiple neighboring edge computing servers to collect traffic information across the road network. To elevate the efficiency of traffic signal control, we have crafted a multi-agent soft actor-critic (MA-SAC) reinforcement learning algorithm. Within this algorithm, individual agents are deployed at each intersection with a mandate to optimize traffic flow across the entire road network collectively. Furthermore, we introduce hypergraph learning into the critic network of MA-SAC to enable the spatio-temporal interactions from multiple intersections in the road network. This method fuses hypergraph and spatio-temporal graph structures to encode traffic data and capture the complex spatial and temporal correlations between multiple intersections. Our empirical evaluation, tested on varied datasets, demonstrates the superiority of our framework in minimizing average vehicle travel times and sustaining high-throughput performance. This work facilitates the development of more intelligent and reactive urban traffic management solutions.
Updated: 2024-04-17 02:46:18
Domains: cs.MA,cs.AI
Control Theoretic Approach to Fine-Tuning and Transfer Learning
Given a training set in the form of a paired $(\mathcal{X},\mathcal{Y})$, we say that the control system $\dot{x} = f(x,u)$ has learned the paired set via the control $u^*$ if the system steers each point of $\mathcal{X}$ to its corresponding target in $\mathcal{Y}$. Most existing methods for finding a control function $u^*$ require learning of a new control function if the training set is updated. To overcome this limitation, we introduce the concept of $\textit{tuning without forgetting}$. We develop $\textit{an iterative algorithm}$ to tune the control function $u^*$ when the training set expands, whereby points already in the paired set are still matched, and new training samples are learned. More specifically, at each update of our method, the control $u^*$ is projected onto the kernel of the end-point mapping generated by the controlled dynamics at the learned samples. It ensures keeping the end points for the previously learned samples constant while iteratively learning additional samples. Our work contributes to the scalability of control methods, offering a novel approach to adaptively handle training set expansions.
Updated: 2024-04-17 02:44:25
Domains: cs.LG,math.OC
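In the simplest linear setting, "tuning without forgetting" boils down to a null-space projection, which the toy below makes concrete: project a candidate update onto the kernel of the Jacobian J of the end-point mapping at the learned samples, so their end points stay fixed to first order while new samples are learned. J is random here purely for illustration:

    import numpy as np

    rng = np.random.default_rng(6)
    J = rng.normal(size=(3, 10))              # constraints from 3 learned samples
    g = rng.normal(size=10)                   # candidate update for a new sample

    P = np.eye(10) - np.linalg.pinv(J) @ J    # projector onto ker(J)
    g_safe = P @ g
    print(np.linalg.norm(J @ g_safe))         # ~0: learned end points are preserved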
AKGNet: Attribute Knowledge-Guided Unsupervised Lung-Infected Area Segmentation
Lung-infected area segmentation is crucial for assessing the severity of lung diseases. However, existing image-text multi-modal methods typically rely on labour-intensive annotations for model training, posing challenges regarding time and expertise. To address this issue, we propose a novel attribute knowledge-guided framework for unsupervised lung-infected area segmentation (AKGNet), which achieves segmentation solely based on image-text data without any mask annotation. AKGNet facilitates text attribute knowledge learning, attribute-image cross-attention fusion, and high-confidence-based pseudo-label exploration simultaneously. It can learn statistical information and capture spatial correlations between image and text attributes in the embedding space, iteratively refining the mask to enhance segmentation. Specifically, we introduce a text attribute knowledge learning module by extracting attribute knowledge and incorporating it into feature representations, enabling the model to learn statistical information and adapt to different attributes. Moreover, we devise an attribute-image cross-attention module by calculating the correlation between attributes and images in the embedding space to capture spatial dependency information, thus selectively focusing on relevant regions while filtering irrelevant areas. Finally, a self-training mask improvement process is employed by generating pseudo-labels using high-confidence predictions to iteratively enhance the mask and segmentation. Experimental results on a benchmark medical image dataset demonstrate the superior performance of our method compared to state-of-the-art segmentation techniques in unsupervised scenarios.
Updated: 2024-04-17 02:36:02
Domains: cs.CV,cs.AI
ToDA: Target-oriented Diffusion Attacker against Recommendation System
Recommendation systems (RS) have become indispensable tools for web services to address information overload, thus enhancing user experiences and bolstering platforms' revenues. However, with their increasing ubiquity, security concerns have also emerged. Given the public accessibility of RS, they are susceptible to specific malicious attacks where adversaries can manipulate user profiles, leading to biased recommendations. Recent research often integrates additional modules using generative models to craft these deceptive user profiles, ensuring that they are imperceptible while causing the intended harm. Despite their efficacy, these models face challenges of unstable training and the exploration-exploitation dilemma, which can lead to suboptimal results. In this paper, we pioneer the investigation of the potential of diffusion models (DMs) for shilling attacks. Specifically, we propose a novel Target-oriented Diffusion Attack model (ToDA). It incorporates a pre-trained autoencoder that transforms user profiles into a high-dimensional space, paired with a Latent Diffusion Attacker (LDA), the core component of ToDA. LDA introduces noise into the profiles within this latent space, adeptly steering the approximation towards targeted items through cross-attention mechanisms. The global horizon, implemented by a bipartite graph, is involved in LDA and derived from the encoded user profile feature. This makes it possible for LDA to extend the generation beyond the user feature being processed, and bridges the gap between diffused user features and target item features. Extensive experiments against several SOTA baselines demonstrate ToDA's effectiveness. Specific studies exploit the elaborate design of ToDA and underscore the potency of advanced generative models in such contexts.
Updated: 2024-04-17 02:34:40
Domains: cs.CR
Shaping Realities: Enhancing 3D Generative AI with Fabrication Constraints
Generative AI tools are becoming more prevalent in 3D modeling, enabling users to manipulate or create new models with text or images as inputs. This makes it easier for users to rapidly customize and iterate on their 3D designs and explore new creative ideas. These methods focus on the aesthetic quality of the 3D models, refining them to look similar to the prompts provided by the user. However, when creating 3D models intended for fabrication, designers need to trade-off the aesthetic qualities of a 3D model with their intended physical properties. To be functional post-fabrication, 3D models have to satisfy structural constraints informed by physical principles. Currently, such requirements are not enforced by generative AI tools. This leads to the development of aesthetically appealing, but potentially non-functional 3D geometry, that would be hard to fabricate and use in the real world. This workshop paper highlights the limitations of generative AI tools in translating digital creations into the physical world and proposes new augmentations to generative AI tools for creating physically viable 3D models. We advocate for the development of tools that manipulate or generate 3D models by considering not only the aesthetic appearance but also using physical properties as constraints. This exploration seeks to bridge the gap between digital creativity and real-world applicability, extending the creative potential of generative AI into the tangible domain.
Updated: 2024-04-17 02:33:32
Domains: cs.HC,cs.AI
LLMBind: A Unified Modality-Task Integration Framework
In the multi-modal domain, the dependence of various models on specific input formats leads to user confusion and hinders progress. To address this challenge, we introduce \textbf{LLMBind}, a novel framework designed to unify a diverse array of multi-modal tasks. By harnessing a Mixture-of-Experts (MoE) Large Language Model (LLM), LLMBind processes multi-modal inputs and generates task-specific tokens, enabling the invocation of corresponding models to accomplish tasks. This unique approach empowers LLMBind to interpret inputs and generate outputs across various modalities, including image, text, video, and audio. Furthermore, we have constructed an interaction dataset comprising 400k instructions, which unlocks the ability of LLMBind for interactive visual generation and editing tasks. Extensive experimentation demonstrates that LLMBind achieves superior performance across diverse tasks and outperforms existing models in user evaluations conducted in real-world scenarios. Moreover, the adaptability of LLMBind allows for seamless integration with the latest models and extension to new modality tasks, highlighting its potential to serve as a unified AI agent for modeling universal modalities.
Updated: 2024-04-17 02:23:29
Domains: cs.CL,cs.AI
Online Algorithms with Limited Data Retention
We introduce a model of online algorithms subject to strict constraints on data retention. An online learning algorithm encounters a stream of data points, one per round, generated by some stationary process. Crucially, each data point can request that it be removed from memory $m$ rounds after it arrives. To model the impact of removal, we do not allow the algorithm to store any information or calculations between rounds other than a subset of the data points (subject to the retention constraints). At the conclusion of the stream, the algorithm answers a statistical query about the full dataset. We ask: what level of performance can be guaranteed as a function of $m$? We illustrate this framework for multidimensional mean estimation and linear regression problems. We show it is possible to obtain an exponential improvement over a baseline algorithm that retains all data as long as possible. Specifically, we show that $m = \textsc{Poly}(d, \log(1/\epsilon))$ retention suffices to achieve mean squared error $\epsilon$ after observing $O(1/\epsilon)$ $d$-dimensional data points. This matches the error bound of the optimal, yet infeasible, algorithm that retains all data forever. We also show a nearly matching lower bound on the retention required to guarantee error $\epsilon$. One implication of our results is that data retention laws are insufficient to guarantee the right to be forgotten even in a non-adversarial world in which firms merely strive to (approximately) optimize the performance of their algorithms. Our approach makes use of recent developments in the multidimensional random subset sum problem to simulate the progression of stochastic gradient descent under a model of adversarial noise, which may be of independent interest.
Updated: 2024-04-17 02:17:23
Subjects: cs.LG,cs.DS
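The retention model above is easy to state in code. The sketch below implements only the naive baseline, where the sole state carried between rounds is a buffer of data points, each expiring $m$ rounds after arrival; the paper's actual algorithm, which simulates SGD to achieve exponentially better retention, is not reproduced here.

```python
import random
from collections import deque

def retained_mean(stream, m):
    """Naive baseline under the retention model: the only state carried
    between rounds is a buffer of data points, each kept at most m rounds."""
    buffer = deque()  # holds (arrival_round, point)
    for t, x in enumerate(stream):
        buffer.append((t, x))
        while buffer and t - buffer[0][0] >= m:  # expire after m rounds
            buffer.popleft()
    points = [x for _, x in buffer]
    return sum(points) / len(points) if points else None

random.seed(0)
stream = [random.gauss(3.0, 1.0) for _ in range(10_000)]
print(retained_mean(stream, m=500))  # close to the true mean 3.0
```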
Clipped SGD Algorithms for Privacy Preserving Performative Prediction: Bias Amplification and Remedies
Clipped stochastic gradient descent (SGD) algorithms are among the most popular algorithms for privacy-preserving optimization, reducing the leakage of users' identities during model training. This paper studies the convergence properties of these algorithms in a performative prediction setting, where the data distribution may shift in response to the deployed prediction model; such shifts arise, for example, from strategic users during the training of a bank's loan policy. Our contributions are two-fold. First, we show that a straightforward implementation of the projected clipped SGD (PCSGD) algorithm may converge to a solution that is biased relative to the performative stable solution. We quantify lower and upper bounds on the magnitude of this bias and demonstrate a bias amplification phenomenon, where the bias grows with the sensitivity of the data distribution. Second, we suggest two remedies for the bias amplification effect. The first utilizes an optimal step size design for PCSGD that takes the privacy guarantee into account. The second uses the recently proposed DiceSGD algorithm [Zhang et al., 2024]. We show that the latter can successfully remove the bias and converge to the performative stable solution. Numerical experiments verify our analysis.
Updated: 2024-04-17 02:17:05
Subjects: math.OC,cs.CR,cs.LG
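A minimal sketch of the generic PCSGD update loop discussed above: clip the stochastic gradient, add privacy noise, take a step, and project. All constants and the toy objective below are illustrative; the paper's point is that this very loop can converge to a biased limit once the data distribution reacts to the deployed model.

```python
import numpy as np

def l2_clip(g, c):
    """Clip a gradient to L2 norm at most c (the privacy-motivated step)."""
    norm = np.linalg.norm(g)
    return g if norm <= c else g * (c / norm)

def pcsgd(grad_fn, project, x0, steps, step_size, clip_c, noise_std, rng):
    """Projected clipped SGD: clip the stochastic gradient, add Gaussian
    privacy noise, step, and project back onto the feasible set."""
    x = x0
    for _ in range(steps):
        g = l2_clip(grad_fn(x, rng), clip_c)
        g = g + rng.normal(0.0, noise_std, size=x.shape)  # DP noise
        x = project(x - step_size * g)
    return x

# Toy quadratic objective with minimum at 1; all constants are illustrative.
rng = np.random.default_rng(0)
grad_fn = lambda x, rng: 2.0 * (x - 1.0) + rng.normal(0.0, 0.1, size=x.shape)
project = lambda x: np.clip(x, -5.0, 5.0)  # projection onto a box
print(pcsgd(grad_fn, project, np.zeros(3), 2000, 0.01, 1.0, 0.05, rng))
```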
Unleashing the Potential of Large Language Models for Predictive Tabular Tasks in Data Science
In the domain of data science, the predictive tasks of classification, regression, and imputation of missing values are commonly encountered challenges associated with tabular data. This research applies Large Language Models (LLMs) to these predictive tasks. Despite their proficiency in comprehending natural language, LLMs fall short in dealing with structured tabular data. This limitation stems from their lack of exposure to the intricacies of tabular data during their foundational training. Our research aims to mitigate this gap by compiling a comprehensive corpus of tables annotated with instructions and executing large-scale training of Llama-2 on this enriched dataset. Furthermore, we investigate applying the trained model to zero-shot prediction, few-shot prediction, and in-context learning scenarios. Through extensive experiments, our methodology has shown significant improvements over existing benchmarks. These advancements highlight the efficacy of tailoring LLM training to solve table-related problems in data science, thereby establishing a new benchmark in the utilization of LLMs for enhancing tabular intelligence.
Updated: 2024-04-17 02:11:56
Subjects: cs.LG,cs.AI
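One standard way to hand a table record to an LLM, as in the instruction-annotated corpus described above, is to serialize each row into a prompt. The template wording below is an assumption for illustration, not the paper's exact format.

```python
def row_to_prompt(schema, row, target, task="classification"):
    """Serialize one table row into an instruction prompt; the template
    wording is an assumption, not the paper's exact format."""
    facts = "; ".join(f"{col} = {val}" for col, val in zip(schema, row)
                      if col != target)
    return (f"Below is a record from a table. Based on its features, "
            f"predict the value of '{target}' ({task}).\n"
            f"Features: {facts}\nAnswer:")

schema = ["age", "income", "employment", "default"]
print(row_to_prompt(schema, [42, 58_000, "salaried", None], target="default"))
```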
Function Approximation for Reinforcement Learning Controller for Energy from Spread Waves
Industrial multi-generator Wave Energy Converters (WECs) must handle multiple simultaneous waves coming from different directions, called spread waves. These complex devices, operating in challenging circumstances, need controllers with multiple objectives: energy capture efficiency, reduction of structural stress to limit maintenance, and proactive protection against high waves. A Multi-Agent Reinforcement Learning (MARL) controller trained with the Proximal Policy Optimization (PPO) algorithm can handle these complexities. In this paper, we explore different function approximations for the policy and critic networks to model the sequential nature of the system dynamics and find that they are key to better performance. We investigated the performance of fully connected neural network (FCN), LSTM, and Transformer model variants with varying depths and gated residual connections. Our results show that the transformer model of moderate depth, with gated residual connections around the multi-head attention, the multi-layer perceptron, and the transformer block (STrXL), proposed in this paper, is optimal and boosts energy efficiency by an average of 22.1% for these complex spread waves over the existing spring damper (SD) controller. Furthermore, unlike the default SD controller, the transformer controller almost eliminated the mechanical stress from rotational yaw motion for angled waves. Demo: https://tinyurl.com/yueda3jh
Updated: 2024-04-17 02:04:10
Subjects: cs.AI,cs.LG,cs.SY,eess.SY
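A plausible reading of the "gated residual connection" used in STrXL, sketched in NumPy: a learned sigmoid gate mixes a sublayer's output into the residual stream instead of adding it unconditionally. The exact gating in the paper may differ, and the weights here are random placeholders.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention (minimal form)."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores) @ v

def gated_residual(x, sublayer_out, Wg, bg):
    """Gated residual: a learned sigmoid gate mixes the sublayer output
    into the residual stream instead of adding it unconditionally."""
    gate = 1.0 / (1.0 + np.exp(-(x @ Wg + bg)))  # elementwise sigmoid
    return x + gate * sublayer_out

rng = np.random.default_rng(0)
T, d = 8, 16  # sequence length and model width (illustrative sizes)
x = rng.normal(size=(T, d))
Wq, Wk, Wv, Wg = (rng.normal(size=(d, d)) * 0.1 for _ in range(4))
y = gated_residual(x, attention(x, Wq, Wk, Wv), Wg, np.zeros(d))
print(y.shape)  # (8, 16)
```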
Vision Augmentation Prediction Autoencoder with Attention Design (VAPAAD)
Recent advancements in sequence prediction have significantly improved the accuracy of video data interpretation; however, existing models often overlook the potential of attention-based mechanisms for next-frame prediction. This study introduces the Vision Augmentation Prediction Autoencoder with Attention Design (VAPAAD), an innovative approach that integrates attention mechanisms into sequence prediction, enabling nuanced analysis and understanding of temporal dynamics in video sequences. Utilizing the Moving MNIST dataset, we demonstrate VAPAAD's robust performance and superior handling of complex temporal data compared to traditional methods. VAPAAD combines data augmentation, ConvLSTM2D layers, and a custom-built self-attention mechanism to effectively focus on salient features within a sequence, enhancing predictive accuracy and context-aware analysis. This methodology not only adheres to human cognitive processes during video interpretation but also addresses limitations in conventional models, which often struggle with the variability inherent in video sequences. The experimental results confirm that VAPAAD outperforms existing models, especially in integrating attention mechanisms, which significantly improve predictive performance.
Updated: 2024-04-17 02:02:33
Subjects: cs.CV,cs.AI
Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF
In practice, preference learning from human feedback depends on incomplete data with hidden context. Hidden context refers to data that affects the feedback received, but which is not represented in the data used to train a preference model. This captures common issues of data collection, such as having human annotators with varied preferences, cognitive processes that result in seemingly irrational behavior, and combining data labeled according to different criteria. We prove that standard applications of preference learning, including reinforcement learning from human feedback (RLHF), implicitly aggregate over hidden contexts according to a well-known voting rule called Borda count. We show this can produce counter-intuitive results that are very different from other methods which implicitly aggregate via expected utility. Furthermore, our analysis formalizes the way that preference learning from users with diverse values tacitly implements a social choice function. A key implication of this result is that annotators have an incentive to misreport their preferences in order to influence the learned model, leading to vulnerabilities in the deployment of RLHF. As a step towards mitigating these problems, we introduce a class of methods called distributional preference learning (DPL). DPL methods estimate a distribution of possible score values for each alternative in order to better account for hidden context. Experimental results indicate that applying DPL to RLHF for LLM chatbots identifies hidden context in the data and significantly reduces subsequent jailbreak vulnerability. Our code and data are available at https://github.com/cassidylaidlaw/hidden-context
Updated: 2024-04-17 01:58:09
Subjects: cs.LG,cs.AI,stat.ML
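The Borda-count aggregation the paper identifies can be illustrated in a few lines: pooling pairwise labels and ranking alternatives by win counts. The annotator split below is invented to show how hidden context gets flattened.

```python
from collections import Counter

def borda_scores(comparisons):
    """Borda-style aggregation of pairwise preferences: an alternative's
    score is the number of comparisons it wins, pooled over all annotators."""
    wins = Counter()
    for winner, loser in comparisons:
        wins[winner] += 1
        wins[loser] += 0  # ensure losers appear in the tally
    return wins.most_common()

# Invented hidden context: 6 annotators prefer B over C, 5 prefer C over B,
# and everyone prefers A over C; pooling the labels implicitly runs Borda.
comparisons = [("A", "C")] * 10 + [("B", "C")] * 6 + [("C", "B")] * 5
print(borda_scores(comparisons))  # [('A', 10), ('B', 6), ('C', 5)]
```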
FairSSD: Understanding Bias in Synthetic Speech Detectors
Methods that can generate synthetic speech perceptually indistinguishable from speech recorded by a human speaker are easily available. Several incidents report misuse of synthetic speech generated by these methods to commit fraud. To counter such misuse, many methods have been proposed to detect synthetic speech. Some of these detectors are more interpretable, can generalize to detect synthetic speech in the wild, and are robust to noise. However, limited work has been done on understanding bias in these detectors. In this work, we examine bias in existing synthetic speech detectors to determine whether they unfairly target a particular gender, age, or accent group. We also inspect whether these detectors have a higher misclassification rate for bona fide speech from speech-impaired speakers relative to fluent speakers. Extensive experiments on 6 existing synthetic speech detectors using more than 0.9 million speech signals demonstrate that most detectors are gender, age, and accent biased, and future work is needed to ensure fairness. To support future research, we release our evaluation dataset, the models used in our study, and source code at https://gitlab.com/viper-purdue/fairssd.
Updated: 2024-04-17 01:53:03
Subjects: cs.CV,cs.LG,cs.MM,cs.SD,eess.AS
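A bias audit of the kind described above reduces, at its core, to computing error rates per demographic group. A minimal sketch (with toy labels) for the false-positive rate on bona fide speech:

```python
from collections import defaultdict

def per_group_fpr(y_true, y_pred, groups):
    """False-positive rate per group: the fraction of bona fide signals
    (label 0) wrongly flagged as synthetic (prediction 1)."""
    stats = defaultdict(lambda: [0, 0])  # group -> [false positives, bona fide total]
    for yt, yp, g in zip(y_true, y_pred, groups):
        if yt == 0:  # bona fide speech only
            stats[g][1] += 1
            stats[g][0] += int(yp == 1)
    return {g: fp / n for g, (fp, n) in stats.items() if n}

# Toy labels: 0 = bona fide, 1 = synthetic; a fair detector would show
# similar rates across groups.
y_true = [0, 0, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 0, 1]
groups = ["female", "female", "male", "male", "female", "male", "young", "young"]
print(per_group_fpr(y_true, y_pred, groups))  # gaps across groups suggest bias
```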
From Paper to Platform: Evolution of a Novel Learning Environment for Tabletop Exercises
For undergraduate students of computing, learning to solve complex practical problems in a team is an essential skill for their future careers. This skill is needed in various fields, such as in cybersecurity and IT governance. Tabletop exercises are an innovative teaching method used in practice for training teams in incident response and evaluation of contingency plans. However, tabletop exercises are not yet widely established in university education. This paper presents data and teaching experience from a cybersecurity course that introduces tabletop exercises in classrooms using a novel technology: INJECT Exercise Platform (IXP), a web-based learning environment for delivering and evaluating the exercises. This technology substantially improves the prior practice, since tabletop exercises worldwide have usually been conducted using pen and paper. Unlike in traditional tabletop exercises, which are difficult to evaluate manually, IXP provides insights into students' behavior and learning based on automated analysis of interaction data. We demonstrate IXP's capabilities and evolution by comparing exercise sessions hosted throughout three years at different stages of the platform's readiness. The analysis of student data is supplemented by the discussion of the lessons learned from employing IXP in computing education contexts. The data analytics enabled a detailed comparison of the teams' performance and behavior. Instructors who consider innovating their classes with tabletop exercises may use IXP and benefit from the insights in this paper.
Updated: 2024-04-17 01:52:48
Subjects: cs.CY,cs.CR,K.3
Graph Continual Learning with Debiased Lossless Memory Replay
Real-life graph data often expands continually, rendering the learning of graph neural networks (GNNs) on static graph data impractical. Graph continual learning (GCL) tackles this problem by continually adapting GNNs to the expanded graph of the current task while maintaining the performance over the graph of previous tasks. Memory replay-based methods, which aim to replay data of previous tasks when learning new tasks, have been explored as one principled approach to mitigate the forgetting of the knowledge learned from the previous tasks. In this paper we extend this methodology with a novel framework, called Debiased Lossless Memory replay (DeLoMe). Unlike existing methods that sample nodes/edges of previous graphs to construct the memory, DeLoMe learns small lossless synthetic node representations as the memory. The learned memory can not only preserve the graph data privacy but also capture the holistic graph information, for which the sampling-based methods are not viable. Further, prior methods suffer from bias toward the current task due to the data imbalance between the classes in the memory data and the current data. A debiased GCL loss function is devised in DeLoMe to effectively alleviate this bias. Extensive experiments on four graph datasets show the effectiveness of DeLoMe under both class- and task-incremental learning settings.
Updated: 2024-04-17 01:31:00
Subjects: cs.LG
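The debiased GCL loss itself is specific to the paper, but the imbalance it targets, tiny replayed memories versus large current-task data, is commonly offset with inverse-frequency class weights. A hedged sketch of that generic recipe, not the paper's exact loss:

```python
import numpy as np

def debiased_weights(class_counts):
    """Inverse-frequency class weights, normalized to mean 1; a generic way
    to offset imbalance between replayed memory classes and current classes."""
    counts = np.asarray(class_counts, dtype=float)
    w = counts.sum() / (len(counts) * counts)
    return w / w.mean()

# e.g. three old classes kept as tiny memories vs. two large new classes
print(debiased_weights([20, 20, 20, 5000, 5000]))
```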
A Survey on Retrieval-Augmented Text Generation for Large Language Models
Retrieval-Augmented Generation (RAG) merges retrieval methods with deep learning advancements to address the static limitations of large language models (LLMs) by enabling the dynamic integration of up-to-date external information. This methodology, focusing primarily on the text domain, provides a cost-effective solution to the generation of plausible but incorrect responses by LLMs, thereby enhancing the accuracy and reliability of their outputs through the use of real-world data. As RAG grows in complexity and incorporates multiple concepts that can influence its performance, this paper organizes the RAG paradigm into four categories: pre-retrieval, retrieval, post-retrieval, and generation, offering a detailed perspective from the retrieval viewpoint. It outlines RAG's evolution and discusses the field's progression through the analysis of significant studies. Additionally, the paper introduces evaluation methods for RAG, addressing the challenges faced and proposing future research directions. By offering an organized framework and categorization, the study aims to consolidate existing research on RAG, clarify its technological underpinnings, and highlight its potential to broaden the adaptability and applications of LLMs.
Updated: 2024-04-17 01:27:42
Subjects: cs.IR,cs.AI,cs.CL
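The four-stage decomposition above maps directly onto code. Below is a deliberately toy pipeline (bag-of-words retrieval, a stub in place of the LLM call) that only illustrates where each stage sits, not any particular RAG system:

```python
# `generate` is a stub standing in for an LLM call; everything here is a toy.
def pre_retrieval(query):             # stage 1: e.g. query rewriting/expansion
    return query.lower().split()

def retrieve(terms, corpus, k=2):     # stage 2: crude substring-overlap scoring
    scored = [(sum(t in doc.lower() for t in terms), doc) for doc in corpus]
    return [doc for s, doc in sorted(scored, reverse=True)[:k] if s > 0]

def post_retrieval(docs):             # stage 3: e.g. rerank/compress context
    return " ".join(docs)

def generate(query, context):         # stage 4: normally the LLM call
    return f"Answer to '{query}' grounded in: {context}"

corpus = ["RAG augments LLMs with retrieval.", "Cats sleep a lot.",
          "Retrieval reduces hallucination in LLMs."]
query = "How does retrieval help LLMs?"
print(generate(query, post_retrieval(retrieve(pre_retrieval(query), corpus))))
```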
Hyper Evidential Deep Learning to Quantify Composite Classification Uncertainty
Deep neural networks (DNNs) have been shown to perform well on exclusive, multi-class classification tasks. However, when different classes have similar visual features, it becomes challenging for human annotators to differentiate them. This scenario necessitates the use of composite class labels. In this paper, we propose a novel framework called Hyper-Evidential Neural Network (HENN) that explicitly models predictive uncertainty due to composite class labels in training data, in the context of the belief theory called Subjective Logic (SL). By placing a grouped Dirichlet distribution on the class probabilities, we treat predictions of a neural network as parameters of hyper-subjective opinions and learn the network that collects both single and composite evidence leading to these hyper-opinions by a deterministic DNN from data. We introduce a new uncertainty type called vagueness, originally designed for hyper-opinions in SL, to quantify composite classification uncertainty for DNNs. Our results demonstrate that HENN outperforms its state-of-the-art counterparts on four image datasets. The code and datasets are available at: https://github.com/Hugo101/HyperEvidentialNN.
Updated: 2024-04-17 01:26:15
Subjects: cs.CV,cs.LG
Leveraging 3D LiDAR Sensors to Enable Enhanced Urban Safety and Public Health: Pedestrian Monitoring and Abnormal Activity Detection
The integration of Light Detection and Ranging (LiDAR) and Internet of Things (IoT) technologies offers transformative opportunities for public health informatics in urban safety and pedestrian well-being. This paper proposes a novel framework utilizing these technologies for enhanced 3D object detection and activity classification in urban traffic scenarios. By employing elevated LiDAR, we obtain detailed 3D point cloud data, enabling precise pedestrian activity monitoring. To overcome urban data scarcity, we create a specialized dataset through simulated traffic environments in Blender, facilitating targeted model training. Our approach employs a modified Point Voxel-Region-based Convolutional Neural Network (PV-RCNN) for robust 3D detection and PointNet for classifying pedestrian activities. This dual-model approach not only enhances urban traffic management but also contributes significantly to public health by providing insights into pedestrian behavior and promoting safer urban environments.
Updated: 2024-04-17 01:23:49
Subjects: cs.CV,cs.AI,cs.LG
Group-Aware Coordination Graph for Multi-Agent Reinforcement Learning
Cooperative Multi-Agent Reinforcement Learning (MARL) necessitates seamless collaboration among agents, often represented by an underlying relation graph. Existing methods for learning this graph primarily focus on agent-pair relations, neglecting higher-order relationships. While several approaches attempt to extend cooperation modelling to encompass behaviour similarities within groups, they commonly fall short in concurrently learning the latent graph, thereby constraining the information exchange among partially observed agents. To overcome these limitations, we present a novel approach to infer the Group-Aware Coordination Graph (GACG), which is designed to capture both the cooperation between agent pairs based on current observations and group-level dependencies from behaviour patterns observed across trajectories. This graph is further used in graph convolution for information exchange between agents during decision-making. To further ensure behavioural consistency among agents within the same group, we introduce a group distance loss, which promotes group cohesion and encourages specialization between groups. Our evaluations, conducted on StarCraft II micromanagement tasks, demonstrate GACG's superior performance. An ablation study further provides experimental evidence of the effectiveness of each component of our method.
Updated: 2024-04-17 01:17:10
Subjects: cs.LG,cs.AI,cs.MA
Runner re-identification from single-view running video in the open-world setting
In many sports, player re-identification is crucial for automatic video processing and analysis. However, most current studies on player re-identification in multi- or single-view sports videos focus on re-identification in the closed-world setting using labeled image datasets, and player re-identification in the open-world setting for automatic video analysis is not well developed. In this paper, we propose a runner re-identification system that directly processes single-view video to address the open-world setting, in which we cannot use a labeled dataset and have to process video directly. The proposed system automatically processes raw video as input to identify runners, and it can identify runners even when they leave the camera frame multiple times. For the automatic processing, we first detect the runners in the video using the pre-trained YOLOv8 and a fine-tuned EfficientNet. We then track the runners using ByteTrack and detect their shoes with a fine-tuned YOLOv8. Finally, we extract the image features of the runners using an unsupervised method based on a gated recurrent unit autoencoder and a mixture of global and local features. To improve the accuracy of runner re-identification, we use shoe images as local image features and dynamic features of running sequence images. We evaluated the system on a running practice video dataset and showed that the proposed method identified runners with higher accuracy than some state-of-the-art models for unsupervised re-identification. We also showed that our proposed local image features and running dynamic features were effective for runner re-identification. Our runner re-identification system can be useful for the automatic analysis of running videos.
Updated: 2024-04-17 01:04:07
Subjects: cs.CV,cs.AI,cs.LG
Scoring Intervals using Non-Hierarchical Transformer for Automatic Piano Transcription
The neural semi-Markov Conditional Random Field (semi-CRF) framework has demonstrated promise for event-based piano transcription. In this framework, all events (notes or pedals) are represented as closed intervals tied to specific event types. The neural semi-CRF approach requires an interval scoring matrix that assigns a score for every candidate interval. However, designing an efficient and expressive architecture for scoring intervals is not trivial. In this paper, we introduce a simple method for scoring intervals using scaled inner product operations that resemble how attention scoring is done in transformers. We show theoretically that, due to the special structure from encoding the non-overlapping intervals, under a mild condition, the inner product operations are expressive enough to represent an ideal scoring matrix that can yield the correct transcription result. We then demonstrate that an encoder-only non-hierarchical transformer backbone, operating only on a low-time-resolution feature map, is capable of transcribing piano notes and pedals with high accuracy and time precision. The experiment shows that our approach achieves the new state-of-the-art performance across all subtasks in terms of the F1 measure on the Maestro dataset.
Updated: 2024-04-17 01:03:00
Subjects: cs.SD,cs.LG,eess.AS
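The scaled-inner-product interval scoring is straightforward to sketch: project per-frame features with two matrices and take their scaled product, so entry (i, j) scores the candidate interval [i, j]. The projections below are random placeholders rather than trained weights:

```python
import numpy as np

def interval_scores(h, d_proj, rng):
    """Score every candidate interval (i, j), i <= j, as a scaled inner
    product of projected boundary features, as in attention scoring.
    Projections here are random placeholders rather than trained weights."""
    Wq = rng.normal(size=(h.shape[1], d_proj)) * 0.1
    Wk = rng.normal(size=(h.shape[1], d_proj)) * 0.1
    q, k = h @ Wq, h @ Wk
    S = (q @ k.T) / np.sqrt(d_proj)  # S[i, j] scores candidate interval [i, j]
    return np.triu(S)                # zero out invalid entries with i > j

rng = np.random.default_rng(0)
frames = rng.normal(size=(32, 64))   # 32 time steps of 64-dim features
print(interval_scores(frames, d_proj=32, rng=rng).shape)  # (32, 32)
```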
GenSERP: Large Language Models for Whole Page Presentation
The advent of large language models (LLMs) brings an opportunity to minimize the effort in search engine result page (SERP) organization. In this paper, we propose GenSERP, a framework that leverages LLMs with vision in a few-shot setting to dynamically organize intermediate search results, including generated chat answers, website snippets, multimedia data, and knowledge panels, into a coherent SERP layout based on a user's query. Our approach has three main stages: (1) An information gathering phase, where the LLM continuously orchestrates API tools to retrieve different types of items and proposes candidate layouts based on the retrieved items, until it is confident enough to generate the final result. (2) An answer generation phase, where the LLM populates the layouts with the retrieved content. In this phase, the LLM adaptively optimizes the ranking of items and the UX configurations of the SERP; consequently, it assigns a location on the page to each item, along with the UX display details. (3) A scoring phase, where an LLM with vision scores all the generated SERPs based on how likely each is to satisfy the user, and then sends the one with the highest score for rendering. GenSERP features two generation paradigms: (1) coarse-to-fine generation, which allows it to approach an optimal layout in a more manageable way, and (2) beam search, which gives it a better chance of hitting the optimal solution than greedy decoding. Offline experimental results on real-world data demonstrate how LLMs can contextually organize heterogeneous search results on the fly and provide a promising user experience.
Updated: 2024-04-17 00:55:09
Subjects: cs.IR,cs.LG
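Of the two generation paradigms, beam search is the easier to illustrate generically: keep the k best partial layouts at each step rather than committing greedily to one. The layout items and scoring function below are invented stand-ins:

```python
def beam_search(initial, expand, score, width=3, depth=4):
    """Generic beam search: keep the `width` best partial layouts at each
    step instead of committing greedily to a single candidate."""
    beam = [initial]
    for _ in range(depth):
        candidates = [nxt for state in beam for nxt in expand(state)]
        if not candidates:
            break
        beam = sorted(candidates, key=score, reverse=True)[:width]
    return max(beam, key=score)

# Toy stand-ins: a "layout" is a tuple of page items; expand appends one.
ITEMS = ["chat_answer", "web_snippets", "images", "knowledge_panel"]
expand = lambda s: [s + (item,) for item in ITEMS if item not in s]
score = lambda s: len(s) - (0.5 if s and s[0] != "chat_answer" else 0.0)
print(beam_search(tuple(), expand, score))
```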
Design for Trust utilizing Rareness Reduction
Increasing design complexity and reduced time-to-market have motivated manufacturers to outsource some parts of the System-on-Chip (SoC) design flow to third-party vendors. This provides an opportunity for attackers to introduce hardware Trojans by constructing stealthy triggers consisting of rare events (e.g., rare signals, states, and transitions). There are promising test generation-based hardware Trojan detection techniques that rely on the activation of rare events. In this paper, we investigate rareness reduction as a design-for-trust solution to make it harder for an adversary to hide Trojans (easier for Trojan detection). Specifically, we analyze different avenues to reduce the potential rare trigger cases, including design diversity and area optimization. While there is a good understanding of the relationship between area, power, energy, and performance, this research provides a better insight into the dependency between area and security. Our experimental evaluation demonstrates that area reduction leads to a reduction in rareness. It also reveals that reducing rareness leads to faster Trojan detection as well as improved coverage by Trojan detection methods.
Updated: 2024-04-17 00:54:51
Subjects: cs.CR,cs.AR,cs.LO
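Rareness here can be estimated by simulating a netlist under random inputs and counting signals that activate with low probability. The tiny circuit and threshold below are illustrative only:

```python
import random

def rare_signal_fraction(circuit, n_inputs, trials=20_000, threshold=0.1, seed=1):
    """Estimate the fraction of internal signals that are 'rare', i.e. take
    value 1 with probability below `threshold` under random input vectors."""
    random.seed(seed)
    counts = [0] * len(circuit)
    for _ in range(trials):
        wires = [random.randint(0, 1) for _ in range(n_inputs)]
        for gate_id, (op, a, b) in enumerate(circuit):
            v = wires[a] & wires[b] if op == "AND" else wires[a] | wires[b]
            wires.append(v)
            counts[gate_id] += v
    return sum(c / trials < threshold for c in counts) / len(circuit)

# Tiny illustrative netlist: wires 0-3 are primary inputs, and each gate's
# output is appended as wires 4, 5, 6, 7 in order.
circuit = [("AND", 0, 1), ("AND", 2, 4), ("AND", 3, 5), ("OR", 0, 3)]
print(rare_signal_fraction(circuit, n_inputs=4))  # deep AND chains are rare
```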
Gauging Public Acceptance of Conditionally Automated Vehicles in the United States
Public acceptance of conditionally automated vehicles is a crucial step in the realization of smart cities. Prior research in Europe has shown that the factors of hedonic motivation, social influence, and performance expectancy, in decreasing order of importance, influence acceptance. Moreover, a generally positive acceptance of the technology was reported. However, there is a lack of information regarding the public acceptance of conditionally automated vehicles in the United States. In this study, we carried out a web-based experiment where participants were provided information regarding the technology and then completed a questionnaire on their perceptions. The collected data was analyzed using PLS-SEM to examine the factors that may lead to public acceptance of the technology in the United States. Our findings showed that social influence, performance expectancy, effort expectancy, hedonic motivation, and facilitating conditions determine conditionally automated vehicle acceptance. Additionally, certain factors were found to influence the perception of how useful the technology is, the effort required to use it, and the facilitating conditions for its use. By integrating the insights gained from this study, stakeholders can better facilitate the adoption of autonomous vehicle technology, contributing to safer, more efficient, and user-friendly transportation systems in the future that help realize the vision of the smart city.
Updated: 2024-04-17 00:43:52
Subjects: cs.CY,cs.AI
Structured Entity Extraction Using Large Language Models
Recent advances in machine learning have significantly impacted the field of information extraction, with Large Language Models (LLMs) playing a pivotal role in extracting structured information from unstructured text. Prior works typically represent information extraction as triplet-centric and use classical metrics such as precision and recall for evaluation. We reformulate the task to be entity-centric, enabling the use of diverse metrics that can provide more insights from various perspectives. We contribute to the field by introducing Structured Entity Extraction (SEE) and proposing the Approximate Entity Set OverlaP (AESOP) metric, designed to appropriately assess model performance. Later, we introduce a new model that harnesses the power of LLMs for enhanced effectiveness and efficiency by decomposing the extraction task into multiple stages. Quantitative and human side-by-side evaluations confirm that our model outperforms baselines, offering promising directions for future advancements in structured entity extraction.
Updated: 2024-04-17 00:24:57
Subjects: cs.CL,cs.LG
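AESOP's precise definition is given in the paper; in its spirit, a generic entity-set overlap can be computed by optimally matching predicted to gold entities and averaging their property overlap. This sketch assumes SciPy and is not the official metric:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def overlap(a: dict, b: dict) -> float:
    """Property-level Jaccard overlap between two entities (dicts of fields)."""
    pa, pb = set(a.items()), set(b.items())
    return len(pa & pb) / len(pa | pb) if pa | pb else 0.0

def entity_set_overlap(predicted, gold):
    """Optimal one-to-one matching of predicted to gold entities, then the
    total matched overlap averaged over the larger set size."""
    if not predicted or not gold:
        return 0.0
    sim = np.array([[overlap(p, g) for g in gold] for p in predicted])
    rows, cols = linear_sum_assignment(-sim)  # negate to maximize similarity
    return sim[rows, cols].sum() / max(len(predicted), len(gold))

pred = [{"name": "Ada Lovelace", "field": "math"}, {"name": "Alan Turing"}]
gold = [{"name": "Ada Lovelace", "field": "math"},
        {"name": "Alan Turing", "field": "CS"}]
print(entity_set_overlap(pred, gold))  # 0.75
```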
An Innovative Information Theory-based Approach to Tackle and Enhance The Transparency in Phishing Detection
Phishing attacks have become a serious and challenging issue for detection, explanation, and defense. Despite more than a decade of research on phishing, encompassing both technical and non-technical remedies, phishing continues to be a serious problem. Nowadays, AI-based phishing detection stands out as one of the most effective solutions for defending against phishing attacks by providing vulnerability (i.e., phishing or benign) predictions for the data. However, it lacks explainability in terms of providing comprehensive interpretations for the predictions, such as identifying the specific information that causes the data to be classified as phishing. To this end, we propose an innovative deep learning-based approach for localizing phishing attacks in emails, the most common phishing vector. Our method can not only predict the vulnerability of the email data but also automatically learn and identify the most important and phishing-relevant information (i.e., sentences) in the phishing email, where the selected information serves as a useful and concise explanation of the vulnerability. Rigorous experiments on seven diverse real-world email datasets show the effectiveness and advancement of our proposed method in selecting crucial information and offering concise explanations for the vulnerability of phishing email data. In particular, our method achieves significantly higher performance, ranging from approximately 1.5% to 3.5%, than state-of-the-art baselines, as measured by the combined average of two main metrics, Label-Accuracy and Cognitive-True-Positive.
Updated: 2024-04-17 00:18:17
Subjects: cs.CR
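The paper's localization is trained end-to-end; as a simpler stand-in that conveys the goal, one can score sentences by occlusion, measuring how much the phishing score drops when each sentence is removed. The keyword-counting classifier below is a toy placeholder for a real detector, not the paper's information-theoretic method:

```python
def sentence_importance(email_sentences, classify):
    """Occlusion-style localization: drop one sentence at a time and measure
    how much the phishing score falls; big drops mark the sentences that
    drive the prediction. `classify` returns P(phishing) for a text."""
    base = classify(" ".join(email_sentences))
    scores = []
    for i in range(len(email_sentences)):
        reduced = " ".join(s for j, s in enumerate(email_sentences) if j != i)
        scores.append((base - classify(reduced), email_sentences[i]))
    return sorted(scores, reverse=True)

# Toy scorer counting suspicious keywords; a real system would use the
# trained detector itself.
SUSPICIOUS = {"verify", "urgent", "password", "click"}
classify = lambda text: sum(w in text.lower() for w in SUSPICIOUS) / len(SUSPICIOUS)
email = ["Dear user, your account needs attention.",
         "Click the link to verify your password immediately.",
         "Best regards, Support Team."]
for delta, sent in sentence_importance(email, classify):
    print(f"{delta:+.2f}  {sent}")
```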