    _              _         ____
   / \   _ ____  _(_)_   __ |  _ \  __ _ _   _ 
  / _ \ | '__\ \/ / \ \ / / | | | |/ _` | | | |
 / ___ \| |   >  <| |\ V /  | |_| | (_| | |_| |
/_/   \_\_|  /_/\_\_| \_/   |____/ \__,_|\__, |
                                         |___/ 

SceneMaker: Open-set 3D Scene Generation with Decoupled De-occlusion and Pose Estimation Model

In this work, we propose SceneMaker, a decoupled 3D scene generation framework. Due to the lack of sufficient open-set de-occlusion and pose estimation priors, existing methods struggle to simultaneously produce high-quality geometry and accurate poses under severe occlusion in open-set settings. To address these issues, we first decouple the de-occlusion model from 3D object generation and enhance it by leveraging image datasets and collected de-occlusion datasets covering much more diverse open-set occlusion patterns. Then, we propose a unified pose estimation model that integrates global and local mechanisms for both self-attention and cross-attention to improve accuracy. In addition, we construct an open-set 3D scene dataset to further extend the generalization of the pose estimation model. Comprehensive experiments demonstrate the superiority of our decoupled framework on both indoor and open-set scenes. Our code and datasets are released at https://idea-research.github.io/SceneMaker/.

Updated: 2025-12-11 18:59:56

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2512.10957v1

Bidirectional Normalizing Flow: From Data to Noise and Back

Normalizing Flows (NFs) have been established as a principled framework for generative modeling. Standard NFs consist of a forward process and a reverse process: the forward process maps data to noise, while the reverse process generates samples by inverting it. Typical NF forward transformations are constrained by explicit invertibility, ensuring that the reverse process can serve as their exact analytic inverse. Recent developments in TARFlow and its variants have revitalized NF methods by combining Transformers and autoregressive flows, but have also exposed causal decoding as a major bottleneck. In this work, we introduce Bidirectional Normalizing Flow ($\textbf{BiFlow}$), a framework that removes the need for an exact analytic inverse. BiFlow learns a reverse model that approximates the underlying noise-to-data inverse mapping, enabling more flexible loss functions and architectures. Experiments on ImageNet demonstrate that BiFlow, compared to its causal decoding counterpart, improves generation quality while accelerating sampling by up to two orders of magnitude. BiFlow yields state-of-the-art results among NF-based methods and competitive performance among single-evaluation ("1-NFE") methods. Following recent encouraging progress on NFs, we hope our work will draw further attention to this classical paradigm.

Updated: 2025-12-11 18:59:55

Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2512.10953v1

Hierarchical Dataset Selection for High-Quality Data Sharing

The success of modern machine learning hinges on access to high-quality training data. In many real-world scenarios, such as acquiring data from public repositories or sharing across institutions, data is naturally organized into discrete datasets that vary in relevance, quality, and utility. Selecting which repositories or institutions to search for useful datasets, and which datasets to incorporate into model training are therefore critical decisions, yet most existing methods select individual samples and treat all data as equally relevant, ignoring differences between datasets and their sources. In this work, we formalize the task of dataset selection: selecting entire datasets from a large, heterogeneous pool to improve downstream performance under resource constraints. We propose Dataset Selection via Hierarchies (DaSH), a dataset selection method that models utility at both dataset and group (e.g., collections, institutions) levels, enabling efficient generalization from limited observations. Across two public benchmarks (Digit-Five and DomainNet), DaSH outperforms state-of-the-art data selection baselines by up to 26.2% in accuracy, while requiring significantly fewer exploration steps. Ablations show DaSH is robust to low-resource settings and lack of relevant datasets, making it suitable for scalable and adaptive dataset selection in practical multi-source learning workflows.

Updated: 2025-12-11 18:59:55

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2512.10952v1

Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation

Reinforcement learning (RL), previously proven effective in large language and multi-modal models, has recently been extended to enhance 2D image generation. However, applying RL to 3D generation remains largely unexplored due to the higher spatial complexity of 3D objects, which require globally consistent geometry and fine-grained local textures. This makes 3D generation significantly sensitive to reward designs and RL algorithms. To address these challenges, we conduct the first systematic study of RL for text-to-3D autoregressive generation across several dimensions. (1) Reward designs: We evaluate reward dimensions and model choices, showing that alignment with human preference is crucial, and that general multi-modal models provide a robust signal for 3D attributes. (2) RL algorithms: We study GRPO variants, highlighting the effectiveness of token-level optimization, and further investigate the scaling of training data and iterations. (3) Text-to-3D Benchmarks: Since existing benchmarks fail to measure implicit reasoning abilities in 3D generation models, we introduce MME-3DR. (4) Advanced RL paradigms: Motivated by the natural hierarchy of 3D generation, we propose Hi-GRPO, which optimizes the global-to-local hierarchical 3D generation through dedicated reward ensembles. Based on these insights, we develop AR3D-R1, the first RL-enhanced text-to-3D model, which progresses from coarse shape generation to texture refinement. We hope this study provides insights into RL-driven reasoning for 3D generation. Code is released at https://github.com/Ivan-Tang-3D/3DGen-R1.

Updated: 2025-12-11 18:59:52

Categories: cs.CV,cs.AI,cs.CL

Download: http://arxiv.org/abs/2512.10949v1

ImplicitRDP: An End-to-End Visual-Force Diffusion Policy with Structural Slow-Fast Learning

Human-level contact-rich manipulation relies on the distinct roles of two key modalities: vision provides spatially rich but temporally slow global context, while force sensing captures rapid, high-frequency local contact dynamics. Integrating these signals is challenging due to their fundamental frequency and informational disparities. In this work, we propose ImplicitRDP, a unified end-to-end visual-force diffusion policy that integrates visual planning and reactive force control within a single network. We introduce Structural Slow-Fast Learning, a mechanism utilizing causal attention to simultaneously process asynchronous visual and force tokens, allowing the policy to perform closed-loop adjustments at the force frequency while maintaining the temporal coherence of action chunks. Furthermore, to mitigate modality collapse where end-to-end models fail to adjust the weights across different modalities, we propose Virtual-target-based Representation Regularization. This auxiliary objective maps force feedback into the same space as the action, providing a stronger, physics-grounded learning signal than raw force prediction. Extensive experiments on contact-rich tasks demonstrate that ImplicitRDP significantly outperforms both vision-only and hierarchical baselines, achieving superior reactivity and success rates with a streamlined training pipeline. Code and videos will be publicly available at https://implicit-rdp.github.io.

Updated: 2025-12-11 18:59:46

Categories: cs.RO,cs.AI,cs.LG

Download: http://arxiv.org/abs/2512.10946v1

AlcheMinT: Fine-grained Temporal Control for Multi-Reference Consistent Video Generation

Recent advances in subject-driven video generation with large diffusion models have enabled personalized content synthesis conditioned on user-provided subjects. However, existing methods lack fine-grained temporal control over subject appearance and disappearance, which is essential for applications such as compositional video synthesis, storyboarding, and controllable animation. We propose AlcheMinT, a unified framework that introduces explicit timestamp conditioning for subject-driven video generation. Our approach introduces a novel positional encoding mechanism that unlocks the encoding of temporal intervals, associated in our case with subject identities, while integrating seamlessly with the pretrained video generation model's positional embeddings. Additionally, we incorporate subject-descriptive text tokens to strengthen the binding between visual identity and video captions, mitigating ambiguity during generation. Through token-wise concatenation, AlcheMinT avoids any additional cross-attention modules and incurs negligible parameter overhead. We establish a benchmark evaluating multi-subject identity preservation, video fidelity, and temporal adherence. Experimental results demonstrate that AlcheMinT achieves visual quality matching state-of-the-art video personalization methods while, for the first time, enabling precise temporal control over multi-subject generation within videos. Project page is at https://snap-research.github.io/Video-AlcheMinT

Updated: 2025-12-11 18:59:34

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2512.10943v1

Mull-Tokens: Modality-Agnostic Latent Thinking

Reasoning goes beyond language; the real world requires reasoning about space, time, affordances, and much more that words alone cannot convey. Existing multimodal models exploring the potential of reasoning with images are brittle and do not scale. They rely on calling specialist tools, costly generation of images, or handcrafted reasoning data to switch between text and image thoughts. Instead, we offer a simpler alternative -- Mull-Tokens -- modality-agnostic latent tokens pre-trained to hold intermediate information in either image or text modalities to let the model think free-form towards the correct answer. We investigate best practices to train Mull-Tokens inspired by latent reasoning frameworks. We first train Mull-Tokens using supervision from interleaved text-image traces, and then fine-tune without any supervision by only using the final answers. Across four challenging spatial reasoning benchmarks involving tasks such as solving puzzles and taking different perspectives, we demonstrate that Mull-Tokens improve upon several baselines utilizing text-only reasoning or interleaved image-text reasoning, achieving a +3% average improvement and up to +16% on a puzzle solving reasoning-heavy split compared to our strongest baseline. Adding to conversations around challenges in grounding textual and visual reasoning, Mull-Tokens offers a simple solution to abstractly think in multiple modalities.

Updated: 2025-12-11 18:59:08

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2512.10941v1

OmniView: An All-Seeing Diffusion Model for 3D and 4D View Synthesis

Prior approaches injecting camera control into diffusion models have focused on specific subsets of 4D consistency tasks: novel view synthesis, text-to-video with camera control, and image-to-video, among others. These fragmented approaches are therefore trained on disjoint slices of the available 3D/4D data. We introduce OmniView, a unified framework that generalizes across a wide range of 4D consistency tasks. Our method separately represents space, time, and view conditions, enabling flexible combinations of these inputs. For example, OmniView can synthesize novel views from static, dynamic, and multiview inputs, extrapolate trajectories forward and backward in time, and create videos from text or image prompts with full camera control. OmniView is competitive with task-specific models across diverse benchmarks and metrics, improving image quality scores among camera-conditioned diffusion models by up to 33% on the multiview NVS LLFF dataset, 60% on the dynamic NVS Neural 3D Video benchmark, and 20% in static camera control on RE-10K, and reducing camera trajectory errors by 4x in text-conditioned video generation. With strong generalizability in one model, OmniView demonstrates the feasibility of a generalist 4D video model. Project page is available at https://snap-research.github.io/OmniView/

Updated: 2025-12-11 18:59:05

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2512.10940v1

Stronger Normalization-Free Transformers

Although normalization layers have long been viewed as indispensable components of deep learning architectures, the recent introduction of Dynamic Tanh (DyT) has demonstrated that alternatives are possible. The point-wise function DyT constrains extreme values for stable convergence and reaches normalization-level performance; this work searches for function designs that surpass it. We first study how the intrinsic properties of point-wise functions influence training and performance. Building on these findings, we conduct a large-scale search for a more effective function design. Through this exploration, we introduce $\mathrm{Derf}(x) = \mathrm{erf}(\alpha x + s)$, where $\mathrm{erf}(x)$ is the rescaled Gaussian cumulative distribution function, and identify it as the most performant design. Derf outperforms LayerNorm, RMSNorm, and DyT across a wide range of domains, including vision (image recognition and generation), speech representation, and DNA sequence modeling. Our findings suggest that the performance gains of Derf largely stem from its improved generalization rather than stronger fitting capacity. Its simplicity and stronger performance make Derf a practical choice for normalization-free Transformer architectures.
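
For intuition, a Derf layer can be written as a drop-in replacement for LayerNorm in a Transformer block. The sketch below is a minimal PyTorch rendering assuming per-channel learnable alpha and s plus a DyT-style affine output; these choices are our assumptions, not the authors' reference implementation.

    import torch
    import torch.nn as nn

    class Derf(nn.Module):
        """Point-wise Derf(x) = erf(alpha * x + s), sketched as a LayerNorm substitute."""
        def __init__(self, dim: int, alpha_init: float = 1.0, shift_init: float = 0.0):
            super().__init__()
            self.alpha = nn.Parameter(torch.full((dim,), alpha_init))  # scale inside erf
            self.shift = nn.Parameter(torch.full((dim,), shift_init))  # the paper's s
            self.weight = nn.Parameter(torch.ones(dim))                # affine scale
            self.bias = nn.Parameter(torch.zeros(dim))                 # affine shift

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # erf is bounded in (-1, 1), so extreme activations are constrained,
            # much like tanh in DyT, but with Gaussian-CDF-shaped saturation.
            return self.weight * torch.erf(self.alpha * x + self.shift) + self.bias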

Updated: 2025-12-11 18:58:49

Categories: cs.LG,cs.AI,cs.CL,cs.CV

Download: http://arxiv.org/abs/2512.10938v1

On Decision-Making Agents and Higher-Order Causal Processes

We establish a precise correspondence between decision-making agents in partially observable Markov decision processes (POMDPs) and one-input process functions, the classical limit of higher-order quantum operations. In this identification an agent's policy and memory update combine into a process function w that interacts with a POMDP environment via the link product. This suggests a dual interpretation: in the physics view, the process function acts as the environment into which local operations (agent interventions) are inserted, whereas in the AI view it encodes the agent and the inserted functions represent environments. We extend this perspective to multi-agent systems by identifying observation-independent decentralized POMDPs as natural domains for multi-input process functions.

Updated: 2025-12-11 18:58:33

Categories: cs.AI,quant-ph

Download: http://arxiv.org/abs/2512.10937v1

Empirical evaluation of the Frank-Wolfe methods for constructing white-box adversarial attacks

The construction of adversarial attacks for neural networks is a crucial challenge for their deployment in various services. To estimate the adversarial robustness of a neural network, a fast and efficient approach to constructing adversarial attacks is needed. Since the formalization of adversarial attack construction involves solving a specific optimization problem, we consider the problem of constructing an efficient and effective adversarial attack from a numerical optimization perspective. Specifically, we suggest utilizing advanced projection-free methods, known as modified Frank-Wolfe methods, to construct white-box adversarial attacks on the given input data. We perform a theoretical and numerical evaluation of these methods and compare them with standard approaches based on projection operations or geometric intuition. Numerical experiments are performed on the MNIST and CIFAR-10 datasets, utilizing a multiclass logistic regression model, convolutional neural networks (CNNs), and the Vision Transformer (ViT).
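
To make the projection-free idea concrete, here is a minimal textbook Frank-Wolfe ascent of the loss over an L-infinity ball around the input; the step-size schedule and loop are the classical ones, not the paper's modified variants, and the epsilon budget is an illustrative choice.

    import torch

    def frank_wolfe_linf_attack(model, loss_fn, x, y, eps=8/255, steps=20):
        """Maximize loss over {x' : ||x' - x||_inf <= eps} without projections:
        each iterate is a convex combination of feasible points, so it stays feasible."""
        x_adv = x.clone()
        for t in range(steps):
            x_adv = x_adv.detach().requires_grad_(True)
            loss = loss_fn(model(x_adv), y)
            grad = torch.autograd.grad(loss, x_adv)[0]
            # Linear maximization oracle over the L-inf ball: a corner of the box.
            s = x + eps * grad.sign()
            gamma = 2.0 / (t + 2.0)  # classical Frank-Wolfe step size
            x_adv = (1.0 - gamma) * x_adv.detach() + gamma * s
        return x_adv.detach()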

Updated: 2025-12-11 18:58:17

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2512.10936v1

Any4D: Unified Feed-Forward Metric 4D Reconstruction

We present Any4D, a scalable multi-view transformer for metric-scale, dense feed-forward 4D reconstruction. Any4D directly generates per-pixel motion and geometry predictions for N frames, in contrast to prior work that typically focuses on either 2-view dense scene flow or sparse 3D point tracking. Moreover, unlike other recent methods for 4D reconstruction from monocular RGB videos, Any4D can process additional modalities and sensors such as RGB-D frames, IMU-based egomotion, and Radar Doppler measurements, when available. One of the key innovations that allows for such a flexible framework is a modular representation of a 4D scene; specifically, per-view 4D predictions are encoded using a variety of egocentric factors (depthmaps and camera intrinsics) represented in local camera coordinates, and allocentric factors (camera extrinsics and scene flow) represented in global world coordinates. We achieve superior performance across diverse setups - both in terms of accuracy (2-3X lower error) and compute efficiency (15X faster), opening avenues for multiple downstream applications.

Updated: 2025-12-11 18:57:39

Categories: cs.CV,cs.AI,cs.LG,cs.RO

Download: http://arxiv.org/abs/2512.10935v1

Curriculum-Based Reinforcement Learning for Autonomous UAV Navigation in Unknown Curved Tubular Conduit

Autonomous drone navigation in confined tubular environments remains a major challenge due to the constraining geometry of the conduits, the proximity of the walls, and the perceptual limitations inherent to such scenarios. We propose a reinforcement learning approach enabling a drone to navigate unknown three-dimensional tubes without any prior knowledge of their geometry, relying solely on local observations from LiDAR and a conditional visual detection of the tube center. In contrast, the Pure Pursuit algorithm, used as a deterministic baseline, benefits from explicit access to the centerline, creating an information asymmetry designed to assess the ability of RL to compensate for the absence of a geometric model. The agent is trained through a progressive Curriculum Learning strategy that gradually exposes it to increasingly curved geometries, where the tube center frequently disappears from the visual field. A turning-negotiation mechanism, based on the combination of direct visibility, directional memory, and LiDAR symmetry cues, proves essential for ensuring stable navigation under such partial observability conditions. Experiments show that the PPO policy acquires robust and generalizable behavior, consistently outperforming the deterministic controller despite its limited access to geometric information. Validation in a high-fidelity 3D environment further confirms the transferability of the learned behavior to continuous physical dynamics. The proposed approach thus provides a complete framework for autonomous navigation in unknown tubular environments and opens perspectives for industrial, underground, or medical applications where progressing through narrow conduits with weak perceptual cues represents a central challenge.

Updated: 2025-12-11 18:57:29

Categories: cs.RO,cs.LG

Download: http://arxiv.org/abs/2512.10934v1

BabyVLM-V2: Toward Developmentally Grounded Pretraining and Benchmarking of Vision Foundation Models

The developmental trajectories of young children set a natural goal for sample-efficient pretraining of vision foundation models. We introduce BabyVLM-V2, a developmentally grounded framework for infant-inspired vision-language modeling that substantially improves upon BabyVLM-V1 through a longitudinal, multifaceted pretraining set, a versatile model, and, most importantly, the DevCV Toolbox for cognitive evaluation. The pretraining set maximizes coverage while minimizing curation of a longitudinal, infant-centric audiovisual corpus, yielding video-utterance, image-utterance, and multi-turn conversational data that mirror infant experiences. DevCV Toolbox adapts all vision-related measures of the recently released NIH Baby Toolbox into a benchmark suite of ten multimodal tasks, covering spatial reasoning, memory, and vocabulary understanding aligned with early childhood capabilities. Experimental results show that a compact model pretrained from scratch can achieve competitive performance on DevCV Toolbox, outperforming GPT-4o on some tasks. We hope the principled, unified BabyVLM-V2 framework will accelerate research into developmentally plausible pretraining of vision foundation models.

Updated: 2025-12-11 18:57:05

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2512.10932v1

Asynchronous Reasoning: Training-Free Interactive Thinking LLMs

Many state-of-the-art LLMs are trained to think before giving their answer. Reasoning can greatly improve language model capabilities and safety, but it also makes models less interactive: given a new input, a model must stop thinking before it can respond. Real-world use cases such as voice-based or embedded assistants require an LLM agent to respond and adapt to additional information in real time, which is incompatible with purely sequential interactions. In contrast, humans can listen, think, and act asynchronously: we begin thinking about a problem while reading it and continue thinking while formulating the answer. In this work, we augment LLMs capable of reasoning to operate in a similar way without additional training. Our method uses the properties of rotary embeddings to enable LLMs built for sequential interactions to simultaneously think, listen, and generate outputs. We evaluate our approach on math, commonsense, and safety reasoning and find that it can generate accurate thinking-augmented answers in real time, reducing the time to the first non-thinking token from minutes to <= 5 s and overall real-time delays by 6-11x.
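
The rotary-embedding property being exploited is that attention scores under RoPE depend only on position differences, so listening, thinking, and output tokens can be laid out on separate position tracks without retraining. A generic rotate-half RoPE at arbitrary integer positions (our sketch of the mechanism, not the paper's code):

    import torch

    def rope(x, positions, base=10000.0):
        """Apply rotary embeddings to x of shape (..., seq, dim) at the given
        integer positions of shape (seq,); positions need not be contiguous,
        which is what lets asynchronous streams share one attention window."""
        half = x.shape[-1] // 2
        freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
        angles = positions[..., None].float() * freqs      # (..., seq, half)
        cos, sin = angles.cos(), angles.sin()
        x1, x2 = x[..., :half], x[..., half:]
        return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)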

Updated: 2025-12-11 18:57:02

Categories: cs.LG,cs.CL

Download: http://arxiv.org/abs/2512.10931v1

Noisy Quantum Learning Theory

We develop a framework for learning from noisy quantum experiments, focusing on fault-tolerant devices accessing uncharacterized systems through noisy couplings. Our starting point is the complexity class $\textsf{NBQP}$ ("noisy BQP"), modeling noisy fault-tolerant quantum computers that cannot, in general, error-correct the oracle systems they query. Using this class, we show that for natural oracle problems, noise can eliminate exponential quantum learning advantages of ideal noiseless learners while preserving a superpolynomial gap between NISQ and fault-tolerant devices. Beyond oracle separations, we study concrete noisy learning tasks. For purity testing, the exponential two-copy advantage collapses under a single application of local depolarizing noise. Nevertheless, we identify a setting motivated by AdS/CFT in which noise-resilient structure restores a quantum learning advantage in a noisy regime. We then analyze noisy Pauli shadow tomography, deriving lower bounds that characterize how instance size, quantum memory, and noise control sample complexity, and design algorithms with parametrically similar scalings. Together, our results show that the Bell-basis and SWAP-test primitives underlying most exponential quantum learning advantages are fundamentally fragile to noise unless the experimental system has latent noise-robust structure. Thus, realizing meaningful quantum advantages in future experiments will require understanding how noise-robust physical properties interface with available algorithmic techniques.

Updated: 2025-12-11 18:56:32

Categories: quant-ph,cs.CC,cs.IT,cs.LG

Download: http://arxiv.org/abs/2512.10929v1

Decoupled Q-Chunking

Temporal-difference (TD) methods learn state and action values efficiently by bootstrapping from their own future value predictions, but such a self-bootstrapping mechanism is prone to bootstrapping bias, where errors in the value targets accumulate across steps and result in biased value estimates. Recent work has proposed to use chunked critics, which estimate the value of short action sequences ("chunks") rather than individual actions, speeding up value backup. However, extracting policies from chunked critics is challenging: policies must output the entire action chunk open-loop, which can be sub-optimal in environments that require policy reactivity and is also challenging to model, especially as the chunk length grows. Our key insight is to decouple the chunk length of the critic from that of the policy, allowing the policy to operate over shorter action chunks. We propose a novel algorithm that achieves this by optimizing the policy against a distilled critic for partial action chunks, constructed by optimistically backing up from the original chunked critic to approximate the maximum value achievable when a partial action chunk is extended to a complete one. This design retains the benefits of multi-step value propagation while sidestepping both the open-loop sub-optimality and the difficulty of learning action-chunking policies for long action chunks. We evaluate our method on challenging, long-horizon offline goal-conditioned tasks and show that it reliably outperforms prior methods. Code: github.com/ColinQiyangLi/dqc.
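
A sketch of the optimistic backup that distills a partial-chunk critic: extend each length-k prefix with sampled continuations and take the max of the full-chunk critic over them. The interfaces q_chunk and policy.sample_suffix are hypothetical stand-ins for illustration, not the released API.

    import torch

    def partial_chunk_targets(q_chunk, policy, s, a_prefix, h, n_samples=8):
        """Optimistic value target for a partial action chunk a_prefix of
        shape (batch, k, action_dim), approximating the best value achievable
        when the prefix is completed to a full chunk of length h."""
        k = a_prefix.shape[1]
        candidates = []
        for _ in range(n_samples):
            suffix = policy.sample_suffix(s, a_prefix, h - k)  # (batch, h-k, action_dim)
            candidates.append(q_chunk(s, torch.cat([a_prefix, suffix], dim=1)))
        # Max over sampled completions = optimistic backup from the chunked critic.
        return torch.stack(candidates).max(dim=0).values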

Updated: 2025-12-11 18:52:51

Categories: cs.LG,cs.AI,cs.RO,stat.ML

Download: http://arxiv.org/abs/2512.10926v1

LLM-Driven Composite Neural Architecture Search for Multi-Source RL State Encoding

Designing state encoders for reinforcement learning (RL) with multiple information sources -- such as sensor measurements, time-series signals, image observations, and textual instructions -- remains underexplored and often requires manual design. We formalize this challenge as a problem of composite neural architecture search (NAS), where multiple source-specific modules and a fusion module are jointly optimized. Existing NAS methods overlook useful side information from the intermediate outputs of these modules -- such as their representation quality -- limiting sample efficiency in multi-source RL settings. To address this, we propose an LLM-driven NAS pipeline in which the LLM serves as a neural architecture design agent, leveraging language-model priors and intermediate-output signals to guide sample-efficient search for high-performing composite state encoders. On a mixed-autonomy traffic control task, our approach discovers higher-performing architectures with fewer candidate evaluations than traditional NAS baselines and the LLM-based GENIUS framework.

Updated: 2025-12-11 18:52:44

Categories: cs.LG,eess.SY

Download: http://arxiv.org/abs/2512.06982v2

Digital Twin Supervised Reinforcement Learning Framework for Autonomous Underwater Navigation

Autonomous navigation in underwater environments remains a major challenge due to the absence of GPS, degraded visibility, and the presence of submerged obstacles. This article investigates these issues through the case of the BlueROV2, an open platform widely used for scientific experimentation. We propose a deep reinforcement learning approach based on the Proximal Policy Optimization (PPO) algorithm, using an observation space that combines target-oriented navigation information, a virtual occupancy grid, and ray-casting along the boundaries of the operational area. The learned policy is compared against a reference deterministic kinematic planner, the Dynamic Window Approach (DWA), commonly employed as a robust baseline for obstacle avoidance. The evaluation is conducted in a realistic simulation environment and complemented by validation on a physical BlueROV2 supervised by a 3D digital twin of the test site, helping to reduce risks associated with real-world experimentation. The results show that the PPO policy consistently outperforms DWA in highly cluttered environments, notably thanks to better local adaptation and reduced collisions. Finally, the experiments demonstrate the transferability of the learned behavior from simulation to the real world, confirming the relevance of deep RL for autonomous navigation in underwater robotics.

Updated: 2025-12-11 18:52:42

Categories: cs.LG,cs.RO

Download: http://arxiv.org/abs/2512.10925v1

PlanetServe: A Decentralized, Scalable, and Privacy-Preserving Overlay for Democratizing Large Language Model Serving

While significant progress has been made in research and development on open-source and cost-efficient large language models (LLMs), serving scalability remains a critical challenge, particularly for small organizations and individuals seeking to deploy and test their LLM innovations. Inspired by peer-to-peer networks that leverage decentralized overlay nodes to increase throughput and availability, we propose GenTorrent, an LLM serving overlay that harnesses computing resources from decentralized contributors. We identify four key research problems inherent to enabling such a decentralized infrastructure: 1) overlay network organization; 2) LLM communication privacy; 3) overlay forwarding for resource efficiency; and 4) verification of serving quality. This work presents the first systematic study of these fundamental problems in the context of decentralized LLM serving. Evaluation results from a prototype implemented on a set of decentralized nodes demonstrate that GenTorrent achieves a latency reduction of over 50% compared to the baseline design without overlay forwarding. Furthermore, the security features introduce minimal overhead to serving latency and throughput. We believe this work pioneers a new direction for democratizing and scaling future AI serving capabilities.

Updated: 2025-12-11 18:49:32

Categories: cs.DC,cs.AI

Download: http://arxiv.org/abs/2504.20101v4

SparseSwaps: Tractable LLM Pruning Mask Refinement at Scale

The resource requirements of Neural Networks can be significantly reduced through pruning -- the removal of seemingly less important parameters. However, with the rise of Large Language Models (LLMs), full retraining to recover pruning-induced performance degradation is often prohibitive and classical approaches such as global magnitude pruning are suboptimal on Transformer architectures. State-of-the-art methods hence solve a layer-wise mask selection problem, the problem of finding a pruning mask which minimizes the per-layer pruning error on a small set of calibration data. Exactly solving this problem to optimality using Integer Programming (IP) solvers is computationally infeasible due to its combinatorial nature and the size of the search space, and existing approaches therefore rely on approximations or heuristics. In this work, we demonstrate that the mask selection problem can be made drastically more tractable at LLM scale. To that end, we decouple the rows by enforcing equal sparsity levels per row. This allows us to derive optimal 1-swaps (exchanging one kept and one pruned weight) that can be computed efficiently using the Gram matrix of the calibration data. Using these observations, we propose a tractable and simple 1-swap algorithm that warm starts from any pruning mask, runs efficiently on GPUs at LLM scale, and is essentially hyperparameter-free. We demonstrate that our approach reduces per-layer pruning error by up to 60% over Wanda (Sun et al., 2023) and consistently improves perplexity and zero-shot accuracy across state-of-the-art GPT architectures.
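
To see why 1-swaps become cheap, note that the per-row calibration error is the quadratic form d^T H d in the pruning residual d = w - m * w, with H = X^T X the Gram matrix of the calibration inputs. The exact error change of any swap then needs only a few entries of H, as in this numpy sketch (our notation, not the released implementation):

    import numpy as np

    def row_swap_deltas(w, mask, X):
        """Exact change in ||X w - X (mask * w)||^2 for every 1-swap that
        restores a pruned weight i and prunes a kept weight j instead."""
        H = X.T @ X                    # Gram matrix of calibration inputs
        d = (1.0 - mask) * w           # current pruning residual
        Hd = H @ d
        kept = np.flatnonzero(mask == 1)
        pruned = np.flatnonzero(mask == 0)
        deltas = {}
        for i in pruned:               # candidate to restore
            for j in kept:             # candidate to prune instead
                cross = w[j] * Hd[j] - w[i] * Hd[i]
                quad = (w[j] ** 2) * H[j, j] + (w[i] ** 2) * H[i, i] \
                       - 2.0 * w[i] * w[j] * H[i, j]
                deltas[(i, j)] = 2.0 * cross + quad   # negative = improvement
        return deltas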

Updated: 2025-12-11 18:47:48

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2512.10922v1

If generative AI is the answer, what is the question?

Beginning with text and images, generative AI has expanded to audio, video, computer code, and molecules. Yet, if generative AI is the answer, what is the question? We explore the foundations of generation as a distinct machine learning task with connections to prediction, compression, and decision-making. We survey five major generative model families: autoregressive models, variational autoencoders, normalizing flows, generative adversarial networks, and diffusion models. We then introduce a probabilistic framework that emphasizes the distinction between density estimation and generation. We review a game-theoretic framework with a two-player adversary-learner setup to study generation. We discuss post-training modifications that prepare generative models for deployment. We end by highlighting some important topics in socially responsible generation such as privacy, detection of AI-generated content, and copyright and IP. We adopt a task-first framing of generation, focusing on what generation is as a machine learning problem, rather than only on how models implement it.

Updated: 2025-12-11 18:45:18

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2509.06120v2

Hermitian Yang--Mills connections on general vector bundles: geometry and physical Yukawa couplings

We compute solutions to the Hermitian Yang-Mills equations on holomorphic vector bundles $V$ via an alternating optimisation procedure founded on geometric machine learning. The proposed method is fully general with respect to the rank and structure group of $V$, requiring only the ability to enumerate a basis of global sections for a given bundle. This enables us to compute the physically normalised Yukawa couplings in a broad class of heterotic string compactifications. Using this method, we carry out this computation in full for a heterotic compactification incorporating a gauge bundle with non-Abelian structure group.

Updated: 2025-12-11 18:38:10

Categories: hep-th,cs.LG

Download: http://arxiv.org/abs/2512.10907v1

Distributionally Robust Regret Optimal Control Under Moment-Based Ambiguity Sets

In this paper, we consider a class of finite-horizon, linear-quadratic stochastic control problems, where the probability distribution governing the noise process is unknown but assumed to belong to an ambiguity set consisting of all distributions whose mean and covariance lie within norm balls centered at given nominal values. To address the distributional ambiguity, we explore the design of causal affine control policies to minimize the worst-case expected regret over all distributions in the given ambiguity set. The resulting minimax optimal control problem is shown to admit an equivalent reformulation as a tractable convex program that corresponds to a regularized version of the nominal linear-quadratic stochastic control problem. While this convex program can be recast as a semidefinite program, semidefinite programs are typically solved using primal-dual interior point methods that scale poorly with the problem size in practice. To address this limitation, we propose a scalable dual projected subgradient method to compute optimal controllers to an arbitrary accuracy. Numerical experiments are presented to benchmark the proposed method against state-of-the-art data-driven and distributionally robust control design approaches.

Updated: 2025-12-11 18:36:15

Categories: math.OC,cs.LG,eess.SY

Download: http://arxiv.org/abs/2512.10906v1

Multi-Granular Node Pruning for Circuit Discovery

Circuit discovery aims to identify minimal subnetworks that are responsible for specific behaviors in large language models (LLMs). Existing approaches primarily rely on iterative edge pruning, which is computationally expensive and limited to coarse-grained units such as attention heads or MLP blocks, overlooking finer structures like individual neurons. We propose a node-level pruning framework for circuit discovery that addresses both the scalability and granularity limitations. Our method introduces learnable masks across multiple levels of granularity, from entire blocks to individual neurons, within a unified optimization objective. Granularity-specific sparsity penalties guide the pruning process, allowing comprehensive compression in a single fine-tuning run. Empirically, our approach identifies circuits with fewer nodes than those discovered by prior methods; moreover, we demonstrate that many neurons deemed important by coarse methods are actually irrelevant, while task performance is still maintained. Furthermore, our method has a 5-10x lower memory footprint, as it does not require keeping intermediate activations in memory.
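
A minimal sketch of the multi-granularity idea: attach learnable soft masks at both block and neuron level to one MLP activation, with a larger sparsity penalty on the coarser gate. The parameterization and penalty values are our illustrative assumptions, not the paper's implementation.

    import torch
    import torch.nn as nn

    class MultiGranularMask(nn.Module):
        """Soft block-level and neuron-level gates applied to one hidden layer."""
        def __init__(self, n_neurons, block_penalty=1e-2, neuron_penalty=1e-3):
            super().__init__()
            self.block_logit = nn.Parameter(torch.zeros(1))
            self.neuron_logits = nn.Parameter(torch.zeros(n_neurons))
            self.block_penalty = block_penalty
            self.neuron_penalty = neuron_penalty

        def forward(self, h):
            # The effective mask is the product of gates across granularities.
            return h * torch.sigmoid(self.block_logit) * torch.sigmoid(self.neuron_logits)

        def sparsity_loss(self):
            # Granularity-specific L1 pressure pushes gates toward zero.
            return (self.block_penalty * torch.sigmoid(self.block_logit).sum()
                    + self.neuron_penalty * torch.sigmoid(self.neuron_logits).sum())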

Updated: 2025-12-11 18:32:15

Categories: cs.AI

Download: http://arxiv.org/abs/2512.10903v1

LLMs Can Assist with Proposal Selection at Large User Facilities

We explore how large language models (LLMs) can enhance the proposal selection process at large user facilities, offering a scalable, consistent, and cost-effective alternative to traditional human review. Proposal selection depends on assessing the relative strength among submitted proposals; however, traditional human scoring often suffers from weak inter-proposal correlations and is subject to reviewer bias and inconsistency. A pairwise preference-based approach is logically superior, providing a more rigorous and internally consistent basis for ranking, but its quadratic workload makes it impractical for human reviewers. We address this limitation using LLMs. Leveraging the uniquely well-curated proposals and publication records from three beamlines at the Spallation Neutron Source (SNS), Oak Ridge National Laboratory (ORNL), we show that the LLM rankings correlate strongly with the human rankings (Spearman $\rho \simeq 0.2$-$0.8$, improving to $\geq 0.5$ after 10% outlier removal). Moreover, LLM performance is no worse than that of human reviewers in identifying proposals with high publication potential, while costing over two orders of magnitude less. Beyond ranking, LLMs enable advanced analyses that are challenging for humans, such as quantitative assessment of proposal similarity via embedding models, which provides information crucial for review committees.
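
The quadratic pairwise protocol itself is simple; what the LLM removes is the human cost of the O(n^2) comparisons. A sketch with a hypothetical prefer(a, b) judge (in practice an LLM call returning which proposal it favors), ranked here by Copeland-style win counts:

    import itertools
    import numpy as np

    def rank_by_pairwise_preference(proposals, prefer):
        """Rank proposals from all pairwise judgments; prefer(a, b) -> True
        if the judge favors a over b. Quadratic in len(proposals)."""
        n = len(proposals)
        wins = np.zeros(n)
        for i, j in itertools.combinations(range(n), 2):
            if prefer(proposals[i], proposals[j]):
                wins[i] += 1
            else:
                wins[j] += 1
        return np.argsort(-wins)       # indices, strongest proposal first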

Updated: 2025-12-11 18:23:56

Categories: cs.AI

Download: http://arxiv.org/abs/2512.10895v1

Iterative Compositional Data Generation for Robot Control

Collecting robotic manipulation data is expensive, making it impractical to acquire demonstrations for the combinatorially large space of tasks that arise in multi-object, multi-robot, and multi-environment settings. While recent generative models can synthesize useful data for individual tasks, they do not exploit the compositional structure of robotic domains and struggle to generalize to unseen task combinations. We propose a semantic compositional diffusion transformer that factorizes transitions into robot-, object-, obstacle-, and objective-specific components and learns their interactions through attention. Once trained on a limited subset of tasks, we show that our model can zero-shot generate high-quality transitions from which we can learn control policies for unseen task combinations. Then, we introduce an iterative self-improvement procedure in which synthetic data is validated via offline reinforcement learning and incorporated into subsequent training rounds. Our approach substantially improves zero-shot performance over monolithic and hard-coded compositional baselines, ultimately solving nearly all held-out tasks and demonstrating the emergence of meaningful compositional structure in the learned representations.

Updated: 2025-12-11 18:20:49

Categories: cs.RO,cs.LG

Download: http://arxiv.org/abs/2512.10891v1

AI Through the Human Lens: Investigating Cognitive Theories in Machine Psychology

We investigate whether Large Language Models (LLMs) exhibit human-like cognitive patterns under four established frameworks from psychology: the Thematic Apperception Test (TAT), Framing Bias, Moral Foundations Theory (MFT), and Cognitive Dissonance. We evaluated several proprietary and open-source models using structured prompts and automated scoring. Our findings reveal that these models often produce coherent narratives, show susceptibility to positive framing, exhibit moral judgments aligned with Liberty/Oppression concerns, and demonstrate self-contradictions tempered by extensive rationalization. Such behaviors mirror human cognitive tendencies yet are shaped by their training data and alignment methods. We discuss the implications for AI transparency, ethical deployment, and future work that bridges cognitive psychology and AI safety.

Updated: 2025-12-11 18:18:42

Categories: cs.AI

Download: http://arxiv.org/abs/2506.18156v3

Physics-Informed Learning of Flow Distribution and Receiver Heat Losses in Parabolic Trough Solar Fields

Parabolic trough Concentrating Solar Power (CSP) plants operate large hydraulic networks of collector loops that must deliver a uniform outlet temperature despite spatially heterogeneous optical performance, heat losses, and pressure drops. While loop temperatures are measured, loop-level mass flows and receiver heat-loss parameters are unobserved, making it impossible to diagnose hydraulic imbalances or receiver degradation using standard monitoring tools. We present a physics-informed learning framework that infers (i) loop-level mass-flow ratios and (ii) time-varying receiver heat-transfer coefficients directly from routine operational data. The method exploits nocturnal homogenization periods -- when hot oil is circulated through a non-irradiated field -- to isolate hydraulic and thermal-loss effects. A differentiable conjugate heat-transfer model is discretized and embedded into an end-to-end learning pipeline optimized using historical plant data from the 50 MW Andasol 3 solar field. The model accurately reconstructs loop temperatures (RMSE $<2^\circ$C) and produces physically meaningful estimates of loop imbalances and receiver heat losses. Comparison against drone-based infrared thermography (QScan) shows strong correspondence, correctly identifying all areas with high-loss receivers. This demonstrates that noisy real-world CSP operational data contain enough information to recover latent physical parameters when combined with appropriate modeling and differentiable optimization.

Updated: 2025-12-11 18:16:26

Categories: cs.LG,cs.CE

Download: http://arxiv.org/abs/2512.10886v1

Classifier Reconstruction Through Counterfactual-Aware Wasserstein Prototypes

Counterfactual explanations provide actionable insights by identifying minimal input changes required to achieve a desired model prediction. Beyond their interpretability benefits, counterfactuals can also be leveraged for model reconstruction, where a surrogate model is trained to replicate the behavior of a target model. In this work, we demonstrate that model reconstruction can be significantly improved by recognizing that counterfactuals, which typically lie close to the decision boundary, can serve as informative though less representative samples for both classes. This is particularly beneficial in settings with limited access to labeled data. We propose a method that integrates original data samples with counterfactuals to approximate class prototypes using the Wasserstein barycenter, thereby preserving the underlying distributional structure of each class. This approach enhances the quality of the surrogate model and mitigates the issue of decision boundary shift, which commonly arises when counterfactuals are naively treated as ordinary training instances. Empirical results across multiple datasets show that our method improves fidelity between the surrogate and target models, validating its effectiveness.
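
In one dimension the Wasserstein-2 barycenter has a closed form: its quantile function is the weighted average of the inputs' quantile functions. That special case is enough to sketch how original samples and counterfactuals could be blended into a prototype distribution, with the boundary-hugging counterfactuals down-weighted; the weights here are illustrative, not the paper's.

    import numpy as np

    def w2_barycenter_1d(orig_samples, counterfactuals, w_orig=0.7, n_q=100):
        """1-D W2 barycenter of the empirical distributions of original and
        counterfactual samples, returned as n_q barycenter quantiles."""
        qs = np.linspace(0.0, 1.0, n_q)
        q_orig = np.quantile(orig_samples, qs)
        q_cf = np.quantile(counterfactuals, qs)
        # Weighted quantile average = W2 barycenter in one dimension.
        return w_orig * q_orig + (1.0 - w_orig) * q_cf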

Updated: 2025-12-11 18:06:49

Categories: cs.LG

Download: http://arxiv.org/abs/2512.10878v1

Guided Transfer Learning for Discrete Diffusion Models

Discrete diffusion models achieve strong performance across language and other discrete domains, providing a powerful alternative to autoregressive models. However, their strong performance relies on large training datasets, which are costly or risky to obtain, especially when adapting to new domains. Transfer learning is the natural way to adapt pretrained discrete diffusion models, but current methods require fine-tuning large diffusion models, which is computationally expensive and often impractical. Building on ratio-based transfer learning for continuous diffusion, we introduce Guided Transfer Learning for discrete diffusion models (GTL). This enables sampling from a target distribution without modifying the pretrained denoiser. The same guidance formulation applies to both discrete-time diffusion and continuous-time score-based discrete diffusion, yielding a unified treatment. Guided discrete diffusion often requires many forward passes of the guidance network, which becomes impractical for large vocabularies and long sequences. To address this, we further present an efficient guided sampler that concentrates evaluations on planner-selected positions and top candidate tokens, thus lowering sampling time and computation. This makes guided language modeling practical at scale for large vocabularies and long sequences. We evaluate GTL on sequential data, including synthetic Markov chains and language modeling, and provide empirical analyses of its behavior.

Updated: 2025-12-11 18:05:55

标题: 离散扩散模型的引导式迁移学习

摘要: 离散扩散模型在语言和其他离散领域取得了强大的性能,为自回归模型提供了一个强大的替代方案。然而,它们的强大性能依赖于大型训练数据集,而获取这些数据集成本高或风险大,特别是在适应新领域时。迁移学习是适应预训练离散扩散模型的自然方式,但当前的方法需要对大型扩散模型进行微调,这在计算上代价高昂且通常不切实际。基于连续扩散的比例转移学习,我们提出了离散扩散模型的引导迁移学习(GTL)。这使得从目标分布中采样而不修改预训练去噪器成为可能。相同的引导公式适用于离散时间扩散和基于得分的连续时间离散扩散,实现了统一处理。引导离散扩散通常需要对引导网络进行多次前向传递,对于大词汇量和长序列来说这是不切实际的。为了解决这个问题,我们进一步提供了一种高效的引导采样器,集中评估规划选择的位置和前几位候选标记,从而降低采样时间和计算量。这使得大规模的引导语言建模对于大词汇量和长序列来说变得切实可行。我们在序列数据上评估了GTL,包括合成马尔可夫链和语言建模,并对其行为进行了实证分析。

更新时间: 2025-12-11 18:05:55

领域: cs.LG

下载: http://arxiv.org/abs/2512.10877v1

Nonasymptotic CLT and Error Bounds for Two-Time-Scale Stochastic Approximation

We consider linear two-time-scale stochastic approximation algorithms driven by martingale noise. Recent applications in machine learning motivate the need to understand finite-time error rates, but conventional stochastic approximation analyses focus on either asymptotic convergence in distribution or finite-time bounds that are far from optimal. Prior work on asymptotic central limit theorems (CLTs) suggests that two-time-scale algorithms may be able to achieve $1/\sqrt{n}$ error in expectation, with a constant given by the expected norm of the limiting Gaussian vector. However, the best known finite-time rates are much slower. We derive the first nonasymptotic central limit theorem with respect to the Wasserstein-1 distance for two-time-scale stochastic approximation with Polyak-Ruppert averaging. As a corollary, we show that the expected error achieved by Polyak-Ruppert averaging decays at rate $1/\sqrt{n}$, which significantly improves on the rates of convergence in prior works.
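
A toy sketch of the algorithm being analyzed, assuming an illustrative stable linear system and step-size exponents: the fast iterate tracks the slow one, and the Polyak-Ruppert average of the slow iterate is the object of the paper's nonasymptotic CLT.

    import numpy as np

    rng = np.random.default_rng(0)
    # Linear two-time-scale SA with martingale noise:
    #   x_{n+1} = x_n + a_n (A11 x_n + A12 y_n + xi_n)   (slow iterate)
    #   y_{n+1} = y_n + b_n (A21 x_n + A22 y_n + eta_n)  (fast iterate, b_n >> a_n)
    # A22 < 0 and the reduced slow drift A11 - A12*A21/A22 = -0.75 < 0, so (x, y) -> 0.
    A11, A12, A21, A22 = -1.0, 0.5, 1.0, -2.0
    x, y, x_bar = 1.0, 1.0, 0.0
    for n in range(1, 100_001):
        a_n, b_n = n ** -0.8, n ** -0.6          # a_n / b_n -> 0: two time scales
        x += a_n * (A11 * x + A12 * y + 0.1 * rng.standard_normal())
        y += b_n * (A21 * x + A22 * y + 0.1 * rng.standard_normal())
        x_bar += (x - x_bar) / n                 # Polyak-Ruppert running average
    print(f"last iterate |x|={abs(x):.2e}   averaged |x_bar|={abs(x_bar):.2e}")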

Updated: 2025-12-11 18:05:48

标题: 非渐近中心极限定理和两时间尺度随机逼近的误差界

摘要: 我们考虑由鞅噪声驱动的线性双时间尺度随机逼近算法。最近在机器学习中的应用激发了理解有限时间误差率的需求,但传统的随机逼近分析要么集中在渐近分布收敛,要么给出远非最优的有限时间界限。先前关于渐近中心极限定理(CLTs)的工作表明,双时间尺度算法可能能够实现期望意义下的$1/\sqrt{n}$误差,其中常数由极限高斯向量的期望范数给定。然而,已知的最佳有限时间速率要慢得多。我们推导了关于Wasserstein-1距离的第一个非渐近中心极限定理,用于具有Polyak-Ruppert平均的双时间尺度随机逼近。作为一个推论,我们表明Polyak-Ruppert平均实现的期望误差以$1/\sqrt{n}$的速率衰减,较先前工作的收敛速率有了显著改进。

更新时间: 2025-12-11 18:05:48

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2502.09884v3

A Differentiable Digital Twin of Distributed Link Scheduling for Contention-Aware Networking

Many routing and flow optimization problems in wired networks can be solved efficiently using minimum cost flow formulations. However, this approach does not extend to wireless multi-hop networks, where the assumptions of fixed link capacity and linear cost structure collapse due to contention for shared spectrum resources. The key challenge is that the long-term capacity of a wireless link becomes a non-linear function of its network context, including network topology, link quality, and the traffic assigned to neighboring links. In this work, we pursue a new direction for modeling wireless networks under randomized medium access control by developing an analytical network digital twin (NDT) that predicts link duty cycles from network context. We generalize randomized contention as finding a Maximal Independent Set (MIS) on the conflict graph using weighted Luby's algorithm, derive an analytical model of link duty cycles, and introduce an iterative procedure that resolves the circular dependency among duty cycle, link capacity, and contention probability. Our numerical experiments show that the proposed NDT accurately predicts link duty cycles and congestion patterns with up to a 5000x speedup over packet-level simulation, and enables us to optimize link scheduling using gradient descent for reduced congestion and radio footprint.
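
A small sketch of the contention model described above, with a hypothetical five-link conflict graph: each round of weighted Luby's algorithm samples a maximal independent set, and Monte Carlo frequencies estimate the link duty cycles that the paper's analytical NDT predicts in closed form.

    import random

    def weighted_luby_mis(adj, weights, rng):
        """One run of weighted Luby: every surviving link draws a random
        priority scaled by its weight; links beating all surviving
        conflict-graph neighbours join the independent set, and their
        neighbours are knocked out. The result is always maximal."""
        alive, mis = set(adj), set()
        while alive:
            draw = {v: rng.random() * weights[v] for v in alive}
            winners = {v for v in alive
                       if all(draw[v] > draw[u] for u in adj[v] if u in alive)}
            mis |= winners
            alive -= winners
            for v in winners:
                alive -= adj[v]
        return mis

    # Hypothetical conflict graph: an edge means "cannot transmit together".
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
    weights = {0: 1.0, 1: 2.0, 2: 0.5, 3: 1.5, 4: 1.0}
    rng = random.Random(7)
    samples = [weighted_luby_mis(adj, weights, rng) for _ in range(10_000)]
    duty = {v: sum(v in s for s in samples) / len(samples) for v in adj}
    print(duty)   # Monte Carlo duty cycles; the NDT predicts these analytically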

Updated: 2025-12-11 18:04:30

标题: 一个可微的数字孪生:用于争用感知网络的分布式链路调度

摘要: 在有线网络中,许多路由和流优化问题可以通过最小成本流公式有效解决。然而,这种方法无法推广到无线多跳网络,因为固定链路容量和线性成本结构的假设会由于共享频谱资源的争用而失效。关键挑战在于,无线链路的长期容量成为其网络上下文的非线性函数,包括网络拓扑、链路质量和分配给相邻链路的流量。在这项工作中,我们通过开发一个能够从网络上下文预测链路工作周期的解析网络数字孪生 (NDT),探索了在随机介质访问控制下对无线网络进行建模的新方向。我们将随机争用泛化为使用加权Luby算法在冲突图上寻找极大独立集 (MIS),推导出链路工作周期的解析模型,并引入一个迭代过程来解决工作周期、链路容量和争用概率之间的循环依赖关系。我们的数值实验表明,所提出的NDT可以准确预测链路工作周期和拥塞模式,比分组级仿真快达5000倍,并使我们能够使用梯度下降优化链路调度,以减少拥塞和无线电足迹。

更新时间: 2025-12-11 18:04:30

领域: cs.NI,cs.LG,eess.SP,eess.SY

下载: http://arxiv.org/abs/2512.10874v1

Beyond Basic A/B testing: Improving Statistical Efficiency for Business Growth

In large-scale industry applications, standard A/B testing approaches are mostly based on the t-test. These approaches, however, suffer from low statistical power in business settings due to small sample sizes, non-Gaussian distributions, or return-on-investment (ROI) considerations. In this paper, we (i) show the statistical efficiency of using estimating equations and U statistics, which can address these issues separately; and (ii) propose a novel doubly robust generalized U statistic that allows a flexible definition of the treatment effect and can handle small samples, distributional robustness, ROI, and confounding in one framework. We provide theoretical results on asymptotics and efficiency bounds, together with insights into the efficiency gains from theoretical analysis. We further conduct comprehensive simulation studies, apply the methods to multiple real A/B tests at a large SaaS company, and share results and learnings that are broadly useful.

Updated: 2025-12-11 18:04:04

标题: 超越基础A/B测试:提高商业增长的统计效率

摘要: 标准的A/B测试方法在大规模工业应用中主要基于t检验。然而,在商业环境中,这些标准方法由于样本量小、非高斯分布或投资回报(ROI)考虑等原因而具有较低的统计功效。本文中,我们(i)展示了使用估计方程和U统计量的统计效率,可以分别解决这些问题;并且(ii)提出了一种新颖的双重稳健广义U统计量,允许灵活定义处理效果,并且可以在一个框架中处理小样本、分布鲁棒性、ROI和混杂因素考虑。我们提供了渐近性和效率界限的理论结果,以及从理论分析中获得的效率增益的见解。我们进一步进行了全面的模拟研究,将这些方法应用于一个大型SaaS公司的多个真实A/B测试,并分享了广泛有用的结果和经验教训。

更新时间: 2025-12-11 18:04:04

领域: stat.ME,cs.LG,math.ST,stat.CO

下载: http://arxiv.org/abs/2505.08128v2

Physics-informed Polynomial Chaos Expansion with Enhanced Constrained Optimization Solver and D-optimal Sampling

Physics-informed polynomial chaos expansions (PC$^2$) provide an efficient physically constrained surrogate modeling framework by embedding governing equations and other physical constraints into the standard data-driven polynomial chaos expansions (PCE) and solving via the Karush-Kuhn-Tucker (KKT) conditions. This approach improves the physical interpretability of surrogate models while achieving high computational efficiency and accuracy. However, the performance and efficiency of PC$^2$ can still be degraded with high-dimensional parameter spaces, limited data availability, or unrepresentative training data. To address this problem, this study explores two complementary enhancements to the PC$^2$ framework. First, a numerically efficient constrained optimization solver, straightforward updating of Lagrange multipliers (SULM), is adopted as an alternative to the conventional KKT solver. The SULM method significantly reduces computational cost when solving physically constrained problems with high-dimensionality and derivative boundary conditions that require a large number of virtual points. Second, a D-optimal sampling strategy is utilized to select informative virtual points to improve the stability and achieve the balance of accuracy and efficiency of the PC$^2$. The proposed methods are integrated into the PC$^2$ framework and evaluated through numerical examples of representative physical systems governed by ordinary or partial differential equations. The results demonstrate that the enhanced PC$^2$ has better comprehensive capability than standard PC$^2$, and is well-suited for high-dimensional uncertainty quantification tasks.
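
A minimal sketch of greedy D-optimal selection of virtual points, assuming a hypothetical polynomial basis: since det(M + pp^T) = det(M)(1 + p^T M^{-1} p), the greedy step picks the candidate maximizing p^T M^{-1} p and updates M^{-1} by Sherman-Morrison. The paper's exact selection procedure may differ.

    import numpy as np

    def greedy_d_optimal(phi, k, ridge=1e-8):
        """Greedily pick k rows of the design matrix phi (candidates x features)
        maximizing det(Phi_S^T Phi_S) via rank-one determinant updates."""
        n, d = phi.shape
        M_inv = np.eye(d) / ridge          # inverse information matrix (ridge-seeded)
        chosen, avail = [], set(range(n))
        for _ in range(k):
            gains = {i: phi[i] @ M_inv @ phi[i] for i in avail}
            best = max(gains, key=gains.get)
            u = M_inv @ phi[best]          # Sherman-Morrison update of M_inv
            M_inv -= np.outer(u, u) / (1.0 + phi[best] @ u)
            chosen.append(best)
            avail.remove(best)
        return chosen

    rng = np.random.default_rng(1)
    x = rng.uniform(-1, 1, size=(500, 1))             # candidate virtual points
    phi = np.hstack([x**p for p in range(6)])         # degree-5 polynomial basis
    print(greedy_d_optimal(phi, k=12)[:5])            # indices of informative points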

Updated: 2025-12-11 18:03:29

标题: 具有增强约束优化求解器和D-最优采样的基于物理信息的多项式混沌展开

摘要: 物理信息的多项式混沌扩展(PC$^2$)通过将控制方程和其他物理约束嵌入标准数据驱动的多项式混沌扩展(PCE)中,并通过Karush-Kuhn-Tucker(KKT)条件求解,提供了一种高效的受物理约束的代理建模框架。这种方法提高了代理模型的物理可解释性,同时实现了高计算效率和准确性。然而,PC$^2$的性能和效率在高维参数空间、有限的数据可用性或非代表性的训练数据情况下仍可能受到影响。为解决这个问题,本研究探讨了对PC$^2$框架的两种补充增强。首先,采用了一种数值高效的约束优化求解器,即直接更新拉格朗日乘子(SULM),作为传统KKT求解器的替代方法。SULM方法在解决高维度和需要大量虚拟点的导数边界条件的物理约束问题时显著降低了计算成本。其次,采用了一种D-最优抽样策略来选择信息丰富的虚拟点,以改善PC$^2$的稳定性,并实现准确性和效率的平衡。所提出的方法被整合到PC$^2$框架中,并通过受普通或偏微分方程控制的代表性物理系统的数值示例进行评估。结果表明,增强的PC$^2$具有比标准PC$^2$更好的综合能力,并且非常适用于高维度不确定性量化任务。

更新时间: 2025-12-11 18:03:29

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2512.10873v1

UrbanAI 2025 Challenge: Linear vs Transformer Models for Long-Horizon Exogenous Temperature Forecasting

We study long-horizon exogenous-only temperature forecasting - a challenging univariate setting where only the past values of the indoor temperature are used for prediction - using linear and Transformer-family models. We evaluate Linear, NLinear, DLinear, Transformer, Informer, and Autoformer under standardized train, validation, and test splits. Results show that linear baselines (Linear, NLinear, DLinear) consistently outperform more complex Transformer-family architectures, with DLinear achieving the best overall accuracy across all splits. These findings highlight that carefully designed linear models remain strong baselines for time series forecasting in challenging exogenous-only settings.
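
For concreteness, a sketch in the spirit of the DLinear baseline evaluated above: decompose the input window into a moving-average trend and a remainder, and forecast each with its own linear layer. Kernel size and window lengths are illustrative.

    import torch
    import torch.nn as nn

    class DLinear(nn.Module):
        """Decomposition-Linear baseline: split the series into a moving-average
        trend and a remainder, then forecast each part with its own linear map."""
        def __init__(self, seq_len, pred_len, kernel=25):
            super().__init__()
            self.pool = nn.AvgPool1d(kernel, stride=1, padding=kernel // 2,
                                     count_include_pad=False)
            self.linear_trend = nn.Linear(seq_len, pred_len)
            self.linear_resid = nn.Linear(seq_len, pred_len)

        def forward(self, x):                 # x: (batch, seq_len)
            trend = self.pool(x.unsqueeze(1)).squeeze(1)[..., :x.shape[-1]]
            resid = x - trend
            return self.linear_trend(trend) + self.linear_resid(resid)

    model = DLinear(seq_len=336, pred_len=96)
    y_hat = model(torch.randn(8, 336))        # 8 windows of past temperature
    print(y_hat.shape)                        # torch.Size([8, 96])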

Updated: 2025-12-11 17:59:44

标题: UrbanAI 2025挑战:线性模型与Transformer模型在长时程外生温度预测中的比较

摘要: 我们研究了长时间跨度的外生温度预测 - 这是一个具有挑战性的单变量设置,仅使用室内温度的过去值进行预测 - 使用线性和Transformer家族模型。我们在标准化的训练、验证和测试分割下评估了Linear、NLinear、DLinear、Transformer、Informer和Autoformer。结果表明,线性基线模型(Linear、NLinear、DLinear)始终优于更复杂的Transformer家族架构,其中DLinear在所有分割中实现了最佳的整体准确性。这些发现突显了在具有挑战性的仅外生设置中,精心设计的线性模型仍然是时间序列预测的强大基线。

更新时间: 2025-12-11 17:59:44

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2512.10866v1

MMSI-Video-Bench: A Holistic Benchmark for Video-Based Spatial Intelligence

Spatial understanding over continuous visual input is crucial for MLLMs to evolve into general-purpose assistants in physical environments. Yet there is still no comprehensive benchmark that holistically assesses the progress toward this goal. In this work, we introduce MMSI-Video-Bench, a fully human-annotated benchmark for video-based spatial intelligence in MLLMs. It operationalizes a four-level framework, Perception, Planning, Prediction, and Cross-Video Reasoning, through 1,106 questions grounded in 1,278 clips from 25 datasets and in-house videos. Each item is carefully designed and reviewed by 3DV experts with explanatory rationales to ensure precise, unambiguous grounding. Leveraging its diverse data sources and holistic task coverage, MMSI-Video-Bench also supports three domain-oriented sub-benchmarks (Indoor Scene Perception Bench, Robot Bench and Grounding Bench) for targeted capability assessment. We evaluate 25 strong open-source and proprietary MLLMs, revealing a striking human--AI gap: many models perform near chance, and the best reasoning model lags humans by nearly 60%. We further find that spatially fine-tuned models still fail to generalize effectively on our benchmark. Fine-grained error analysis exposes systematic failures in geometric reasoning, motion grounding, long-horizon prediction, and cross-video correspondence. We also show that typical frame-sampling strategies transfer poorly to our reasoning-intensive benchmark, and that neither 3D spatial cues nor chain-of-thought prompting yields meaningful gains. We expect our benchmark to establish a solid testbed for advancing video-based spatial intelligence.

Updated: 2025-12-11 17:57:24

标题: MMSI-Video-Bench:基于视频的空间智能综合基准测试

摘要: 对连续视觉输入的空间理解,对于MLLMs演变为物理环境中的通用助手至关重要。然而,目前仍然没有一个全面评估朝着这个目标进展的综合基准。在这项工作中,我们介绍了MMSI-Video-Bench,这是一个完全由人类标注的基准,用于评估MLLMs中基于视频的空间智能。它通过基于来自25个数据集和内部视频的1,278个剪辑的1,106个问题,实现了一个四级框架:感知、规划、预测和跨视频推理。每个条目均由3DV专家精心设计和审查,并附有解释性理由,以确保精确、无歧义的依据。利用其多样的数据来源和全面的任务覆盖,MMSI-Video-Bench还支持三个面向领域的子基准(室内场景感知基准、机器人基准和基础基准),用于有针对性的能力评估。我们评估了25个强大的开源和专有MLLMs,揭示了人类与AI之间的显著差距:许多模型的表现接近随机水平,而最佳推理模型落后于人类近60%。我们进一步发现,经过空间微调的模型仍然无法在我们的基准上有效泛化。细粒度的错误分析揭示了几何推理、运动定位、长期预测和跨视频对应方面的系统性失败。我们还展示了典型的帧采样策略在我们的推理密集型基准上迁移效果不佳,而3D空间线索和思维链提示都没有带来实质性的增益。我们期望我们的基准可以建立一个坚实的测试平台,促进基于视频的空间智能的发展。

更新时间: 2025-12-11 17:57:24

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2512.10863v1

Scaling Behavior of Discrete Diffusion Language Models

Modern LLM pre-training consumes vast amounts of compute and training data, making the scaling behavior, or scaling laws, of different models a key distinguishing factor. Discrete diffusion language models (DLMs) have been proposed as an alternative to autoregressive language models (ALMs). However, their scaling behavior has not yet been fully explored, with prior work suggesting that they require more data and compute to match the performance of ALMs. We study the scaling behavior of DLMs on different noise types by smoothly interpolating between masked and uniform diffusion while paying close attention to crucial hyperparameters such as batch size and learning rate. Our experiments reveal that the scaling behavior of DLMs strongly depends on the noise type and is considerably different from ALMs. While all noise types converge to similar loss values in compute-bound scaling, we find that uniform diffusion requires more parameters and less data for compute-efficient training compared to masked diffusion, making them a promising candidate in data-bound settings. We scale our uniform diffusion model up to 10B parameters trained for $10^{22}$ FLOPs, confirming the predicted scaling behavior and making it the largest publicly known uniform diffusion model to date.
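
A sketch of one way to interpolate the forward corruption between masked and uniform noise; the mixing parameter lam and this exact parameterization are assumptions, not the paper's definition.

    import numpy as np

    def corrupt(x0, t, vocab_size, mask_id, lam, rng):
        """Forward noising interpolating between masked (lam=1) and uniform
        (lam=0) discrete diffusion: each token is corrupted with probability t;
        a corrupted token becomes [MASK] with probability lam, otherwise it is
        resampled uniformly over the vocabulary."""
        x0 = np.asarray(x0)
        hit = rng.random(x0.shape) < t
        to_mask = hit & (rng.random(x0.shape) < lam)
        to_unif = hit & ~to_mask
        xt = x0.copy()
        xt[to_mask] = mask_id
        xt[to_unif] = rng.integers(0, vocab_size, size=to_unif.sum())
        return xt

    rng = np.random.default_rng(0)
    tokens = rng.integers(0, 100, size=16)
    print(corrupt(tokens, t=0.5, vocab_size=100, mask_id=100, lam=0.5, rng=rng))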

Updated: 2025-12-11 17:54:10

标题: 离散扩散语言模型的尺度行为

摘要: 现代LLM预训练消耗大量的计算资源和训练数据,使得不同模型的扩展行为或扩展定律成为一个关键的区分因素。离散扩散语言模型(DLMs)被提出作为自回归语言模型(ALMs)的替代方案。然而,它们的扩展行为尚未完全探索,先前的研究表明,它们需要更多的数据和计算资源来匹配ALMs的性能。我们研究了DLMs在不同噪声类型上的扩展行为,通过在掩蔽和均匀扩散之间平滑插值,同时密切关注关键的超参数,如批处理大小和学习率。我们的实验表明,DLMs的扩展行为强烈依赖于噪声类型,并且与ALMs有很大的不同。在计算受限的扩展中,所有噪声类型收敛到类似的损失值,我们发现相比于掩蔽扩散,均匀扩散需要更多的参数和更少的数据进行高效的训练,使其成为数据受限环境中的一个有前途的候选项。我们将我们的均匀扩散模型扩展到100亿(10B)参数,并以$10^{22}$ FLOPs进行训练,证实了预测的扩展行为,并使其成为迄今为止已知的最大的公开均匀扩散模型。

更新时间: 2025-12-11 17:54:10

领域: cs.LG

下载: http://arxiv.org/abs/2512.10858v1

Generative Modeling from Black-box Corruptions via Self-Consistent Stochastic Interpolants

Transport-based methods have emerged as a leading paradigm for building generative models from large, clean datasets. However, in many scientific and engineering domains, clean data are often unavailable: instead, we only observe measurements corrupted through a noisy, ill-conditioned channel. A generative model for the original data thus requires solving an inverse problem at the level of distributions. In this work, we introduce a novel approach to this task based on Stochastic Interpolants: we iteratively update a transport map between corrupted and clean data samples using only access to the corrupted dataset as well as black box access to the corruption channel. Under appropriate conditions, this iterative procedure converges towards a self-consistent transport map that effectively inverts the corruption channel, thus enabling a generative model for the clean data. We refer to the resulting method as the self-consistent stochastic interpolant (SCSI). It (i) is computationally efficient compared to variational alternatives, (ii) highly flexible, handling arbitrary nonlinear forward models with only black-box access, and (iii) enjoys theoretical guarantees. We demonstrate superior performance on inverse problems in natural image processing and scientific reconstruction, and establish convergence guarantees of the scheme under appropriate assumptions.

Updated: 2025-12-11 17:53:38

标题: 通过自一致随机插值从黑盒损坏中生成建模

摘要: 运输基础方法已经成为从大规模干净数据集构建生成模型的主要范式。然而,在许多科学和工程领域,干净数据通常不可用:相反,我们只能观察到通过嘈杂、病态信道损坏的测量。因此,原始数据的生成模型需要在分布的级别解决一个逆问题。在这项工作中,我们介绍了一种基于随机插值的新方法来处理这个任务:我们通过仅访问受损数据集以及黑盒访问损坏通道,迭代更新损坏和干净数据样本之间的传输映射。在适当条件下,这种迭代过程会收敛到一个自洽的传输映射,有效地反转损坏通道,从而实现对干净数据的生成模型。我们将结果方法称为自洽随机插值(SCSI)。它(i)与变分替代方案相比具有计算效率,(ii)高度灵活,只需黑盒访问即可处理任意非线性正向模型,(iii)享有理论保证。我们在自然图像处理和科学重建中展示了优越性能,并在适当假设下建立了方案的收敛保证。

更新时间: 2025-12-11 17:53:38

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2512.10857v1

Bayesian Symbolic Regression via Posterior Sampling

Symbolic regression is a powerful tool for discovering governing equations directly from data, but its sensitivity to noise hinders its broader application. This paper introduces a Sequential Monte Carlo (SMC) framework for Bayesian symbolic regression that approximates the posterior distribution over symbolic expressions, enhancing robustness and enabling uncertainty quantification for symbolic regression in the presence of noise. Differing from traditional genetic programming approaches, the SMC-based algorithm combines probabilistic selection, adaptive tempering, and the use of normalized marginal likelihood to efficiently explore the search space of symbolic expressions, yielding parsimonious expressions with improved generalization. When compared to standard genetic programming baselines, the proposed method better deals with challenging, noisy benchmark datasets. The reduced tendency to overfit and enhanced ability to discover accurate and interpretable equations paves the way for more robust symbolic regression in scientific discovery and engineering design applications.
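
A compact sketch of SMC with tempering for Bayesian symbolic regression, under strong simplifications: expression structure is restricted to a tiny fixed library and only coefficients are rejuvenated by a Metropolis move, whereas the paper also explores expression space with probabilistic selection and adaptive (rather than fixed linear) tempering.

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(-2, 2, 60)
    y = 1.5 * x**2 + rng.normal(0, 0.3, x.size)          # noisy data

    library = {"c*x": lambda c: c * x, "c*x**2": lambda c: c * x**2,
               "c*sin(x)": lambda c: c * np.sin(x)}      # candidate structures

    def log_lik(name, c):                                # Gaussian likelihood
        return -np.sum((y - library[name](c)) ** 2) / (2 * 0.3**2)

    def log_prior(c):
        return -c**2 / 8.0                               # N(0, 2^2) coefficient prior

    P = 300
    names = rng.choice(list(library), size=P)            # particle: structure ...
    coefs = rng.normal(0, 2, P)                          # ... plus coefficient
    betas = np.linspace(0, 1, 11)                        # tempering ladder
    for b0, b1 in zip(betas[:-1], betas[1:]):
        # 1) reweight by the tempered-likelihood increment, then resample
        logw = (b1 - b0) * np.array([log_lik(n, c) for n, c in zip(names, coefs)])
        w = np.exp(logw - logw.max()); w /= w.sum()
        idx = rng.choice(P, size=P, p=w)
        names, coefs = names[idx], coefs[idx]
        # 2) rejuvenate with a Metropolis move invariant for prior * lik^b1
        prop = coefs + rng.normal(0, 0.2, P)
        log_acc = np.array([b1 * (log_lik(n, cp) - log_lik(n, c))
                            + log_prior(cp) - log_prior(c)
                            for n, cp, c in zip(names, prop, coefs)])
        coefs = np.where(np.log(rng.random(P)) < log_acc, prop, coefs)

    best = max(library, key=lambda n: np.mean(names == n))
    print("posterior mode:", best, "coef ~", coefs[names == best].mean())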

Updated: 2025-12-11 17:38:20

标题: 通过后验抽样的贝叶斯符号回归

摘要: 符号回归是一种从数据中直接发现控制方程的强大工具,但其对噪声的敏感性限制了其更广泛的应用。本文介绍了一种基于顺序蒙特卡洛(SMC)框架的贝叶斯符号回归,该框架近似计算了符号表达式的后验分布,增强了稳健性,并在存在噪声的情况下实现了对符号回归的不确定性量化。与传统的遗传编程方法不同,基于SMC的算法结合了概率选择、自适应调节和使用规范化边际似然来有效地探索符号表达式的搜索空间,产生简洁的表达式并改进了泛化能力。与标准遗传编程基线相比,所提出的方法更好地处理具有挑战性的嘈杂基准数据集。减少过拟合的倾向和增强发现准确且可解释方程的能力为科学发现和工程设计应用中更稳健的符号回归铺平了道路。

更新时间: 2025-12-11 17:38:20

领域: cs.LG

下载: http://arxiv.org/abs/2512.10849v1

T-SHRED: Symbolic Regression for Regularization and Model Discovery with Transformer Shallow Recurrent Decoders

SHallow REcurrent Decoders (SHRED) are effective for system identification and forecasting from sparse sensor measurements. Such models are light-weight and computationally efficient, allowing them to be trained on consumer laptops. SHRED-based models rely on Recurrent Neural Networks (RNNs) and a simple Multi-Layer Perceptron (MLP) for the temporal encoding and spatial decoding respectively. Despite their relatively simple structure, SHRED models are able to predict chaotic dynamical systems on different physical, spatial, and temporal scales directly from a sparse set of sensor measurements. In this work, we modify SHRED by leveraging transformers (T-SHRED) embedded with symbolic regression for the temporal encoding, circumventing auto-regressive long-term forecasting for physical data. This is achieved by incorporating a new sparse identification of nonlinear dynamics (SINDy) attention mechanism into T-SHRED, which imposes sparsity regularization on the latent space and also allows for immediate symbolic interpretation. Symbolic regression improves model interpretability by learning and regularizing the dynamics of the latent space during training. We analyze the performance of T-SHRED on three different dynamical systems ranging from low-data to high-data regimes.

Updated: 2025-12-11 17:28:30

标题: T-SHRED:具有Transformer浅递归解码器的正则化和模型发现的符号回归

摘要: SHallow REcurrent Decoders (SHRED)对于利用稀疏传感器测量进行系统识别和预测非常有效。这种模型轻量且计算效率高,可以在消费级笔记本电脑上进行训练。基于SHRED的模型分别依赖于循环神经网络(RNNs)和简单的多层感知器(MLP)进行时间编码和空间解码。尽管SHRED的结构相对简单,但它们能够直接从稀疏的传感器测量中预测不同物理、空间和时间尺度上的混沌动力系统。在这项工作中,我们通过利用嵌入符号回归的Transformer (T-SHRED)对SHRED进行修改,用于时间编码,避免针对物理数据的自回归长期预测。这通过将新的非线性动力学稀疏辨识(SINDy)注意力机制整合到T-SHRED中实现,对潜在空间施加稀疏正则化,同时支持即时的符号化解释。符号回归通过在训练过程中学习和正则化潜在空间的动力学来提高模型的可解释性。我们分析了T-SHRED在从低数据到高数据范围内的三种不同动力系统上的性能。

更新时间: 2025-12-11 17:28:30

领域: cs.LG

下载: http://arxiv.org/abs/2506.15881v3

Faster Results from a Smarter Schedule: Reframing Collegiate Cross Country through Analysis of the National Running Club Database

Collegiate cross country teams often build their season schedules on intuition rather than evidence, partly because large-scale performance datasets are not publicly accessible. To address this limitation, we introduce the National Running Club Database (NRCD), the first openly available dataset to aggregate 23,725 race results from 7,594 collegiate club athletes across the 2023-2025 seasons. Unlike existing resources, NRCD includes detailed course metadata, allowing us to develop two standardized performance metrics: Converted Only (distance correction) and Standardized (distance, weather, and elevation adjusted). Using these standardized measures, we find that athletes with slower initial performances exhibit the greatest improvement within a season, and that race frequency is the strongest predictor of improvement. Using six machine learning models, random forest achieves the highest accuracy (r squared equals 0.92), revealing that athletes who race more frequently progress significantly faster than those who do not. At the team level, programs whose athletes race at least four times during the regular season have substantially higher odds of placing in the top 15 at nationals (chi-squared less than 0.01). These results challenge common coaching practices that favor minimal racing before championship meets. Our findings demonstrate that a data-informed scheduling strategy improves both individual development and team competitiveness. The NRCD provides a new foundation for evidence-based decision-making in collegiate cross country and opens opportunities for further research on standardized, longitudinal athlete performance modeling.

Updated: 2025-12-11 17:28:17

标题: 更聪明的计划带来更快的结果:通过分析全国跑步俱乐部数据库重新构建大学越野跑

摘要: 大学越野队通常根据直觉而非证据来制定赛季计划,部分原因是大规模的表现数据集并不公开可访问。为了解决这一限制,我们引入了国家跑步俱乐部数据库(NRCD),这是第一个公开可用的数据集,汇总了来自2023-2025赛季的7,594名大学俱乐部运动员的23,725次比赛成绩。与现有资源不同,NRCD包括详细的赛道元数据,使我们能够开发两种标准化绩效指标:仅转换(校正距离)和标准化(校正距离、天气和海拔)。使用这些标准化指标,我们发现初始表现较慢的运动员在赛季内有最大的提高,并且比赛频率是提高的最强预测因素。使用六种机器学习模型,随机森林获得了最高的准确率(r平方等于0.92),显示出比赛频率更高的运动员进步速度明显快于那些不比赛频繁的运动员。在团队层面上,那些在常规赛季至少比赛四次的项目在国家比赛中排名前15的几率大大提高(卡方小于0.01)。这些结果挑战了在锦标赛之前更倾向于最少比赛的常见教练做法。我们的研究结果表明,数据驱动的赛程策略既可以改善个人发展,也可以提高团队竞争力。NRCD为大学越野运动提供了一个新的基础,为基于证据的决策打开了机会,并为标准化、纵向运动员表现建模的进一步研究开辟了机会。

更新时间: 2025-12-11 17:28:17

领域: cs.CY,cs.AI,cs.LG

下载: http://arxiv.org/abs/2509.10600v3

Learning Controllable and Diverse Player Behaviors in Multi-Agent Environments

This paper introduces a reinforcement learning framework that enables controllable and diverse player behaviors without relying on human gameplay data. Existing approaches often require large-scale player trajectories, train separate models for different player types, or provide no direct mapping between interpretable behavioral parameters and the learned policy, limiting their scalability and controllability. We define player behavior in an N-dimensional continuous space and uniformly sample target behavior vectors from a region that encompasses the subset representing real human styles. During training, each agent receives both its current and target behavior vectors as input, and the reward is based on the normalized reduction in distance between them. This allows the policy to learn how actions influence behavioral statistics, enabling smooth control over attributes such as aggressiveness, mobility, and cooperativeness. A single PPO-based multi-agent policy can reproduce new or unseen play styles without retraining. Experiments conducted in a custom multi-player Unity game show that the proposed framework produces significantly greater behavioral diversity than a win-only baseline and reliably matches specified behavior vectors across diverse targets. The method offers a scalable solution for automated playtesting, game balancing, human-like behavior simulation, and replacing disconnected players in online games.
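
A minimal sketch of the reward described above; the per-dimension normalization and the 3-D behavior space are assumptions.

    import numpy as np

    def behavior_reward(prev_stats, curr_stats, target, lo, hi):
        """Reward = normalized reduction in distance between the agent's running
        behavior statistics and its target behavior vector."""
        span = hi - lo                                   # per-dimension scale
        d_prev = np.linalg.norm((prev_stats - target) / span)
        d_curr = np.linalg.norm((curr_stats - target) / span)
        return d_prev - d_curr                           # positive if we moved closer

    # Hypothetical 3-D behavior space: (aggressiveness, mobility, cooperativeness).
    lo, hi = np.zeros(3), np.array([10.0, 5.0, 1.0])
    target = np.array([7.0, 2.0, 0.8])                   # sampled once per episode
    prev = np.array([3.0, 4.0, 0.2])                     # stats before the step
    curr = np.array([3.5, 3.8, 0.25])                    # stats after the step
    print(behavior_reward(prev, curr, target, lo, hi))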

Updated: 2025-12-11 17:26:24

标题: 学习在多智能体环境中可控和多样化的玩家行为

摘要: 本文介绍了一种强化学习框架,该框架可以实现可控和多样化的玩家行为,而无需依赖于人类游戏数据。现有方法通常需要大规模的玩家轨迹,为不同玩家类型训练单独的模型,或者不提供可解释的行为参数和学习策略之间的直接映射,从而限制了它们的可扩展性和可控性。我们将玩家行为定义在一个N维连续空间中,并从一个包含代表真实人类风格子集的区域中均匀采样目标行为向量。在训练过程中,每个智能体都接收其当前和目标行为向量作为输入,奖励基于它们之间距离的归一化减少。这使得策略可以学习行动如何影响行为统计数据,从而实现对属性(如攻击性、移动性和合作性)的平滑控制。单一的基于PPO的多智能体策略可以在无需重新训练的情况下复现新的或未见过的游玩风格。在一个自定义的多人Unity游戏中进行的实验证明,所提出的框架产生了比仅以获胜为目标的基线更大的行为多样性,并且能可靠地匹配各种目标上指定的行为向量。该方法提供了一个可扩展的解决方案,用于自动化游戏测试、游戏平衡、类人行为模拟和在线游戏中取代断开连接的玩家。

更新时间: 2025-12-11 17:26:24

领域: cs.LG

下载: http://arxiv.org/abs/2512.10835v1

MaskedManipulator: Versatile Whole-Body Manipulation

We tackle the challenges of synthesizing versatile, physically simulated human motions for full-body object manipulation. Unlike prior methods that are focused on detailed motion tracking, trajectory following, or teleoperation, our framework enables users to specify versatile high-level objectives such as target object poses or body poses. To achieve this, we introduce MaskedManipulator, a generative control policy distilled from a tracking controller trained on large-scale human motion capture data. This two-stage learning process allows the system to perform complex interaction behaviors, while providing intuitive user control over both character and object motions. MaskedManipulator produces goal-directed manipulation behaviors that expand the scope of interactive animation systems beyond task-specific solutions.

Updated: 2025-12-11 17:25:33

标题: "掩饰的操纵者:多功能全身操纵"

摘要: 我们致力于解决合成多功能、物理模拟的全身对象操作的挑战。与之前关注详细动作跟踪、轨迹跟随或遥控的方法不同,我们的框架使用户能够指定多功能的高级目标,如目标对象姿势或身体姿势。为了实现这一目标,我们引入了MaskedManipulator,这是一个从大规模人体动作捕捉数据训练出的跟踪控制器提炼出的生成控制策略。这种两阶段学习过程使系统能够执行复杂的交互行为,同时为用户提供直观的对角色和对象动作的控制。MaskedManipulator产生了目标导向的操作行为,拓展了交互式动画系统的范围,超越了特定任务的解决方案。

更新时间: 2025-12-11 17:25:33

领域: cs.RO,cs.AI,cs.GR

下载: http://arxiv.org/abs/2505.19086v3

An Elementary Proof of the Near Optimality of LogSumExp Smoothing

We consider the design of smoothings of the (coordinate-wise) max function in $\mathbb{R}^d$ in the infinity norm. The LogSumExp function $f(x)=\ln(\sum^d_i\exp(x_i))$ provides a classical smoothing, differing from the max function in value by at most $\ln(d)$. We provide an elementary construction of a lower bound, establishing that every overestimating smoothing of the max function must differ by at least $\sim 0.8145\ln(d)$. Hence, LogSumExp is optimal up to constant factors. However, in small dimensions, we provide stronger, exactly optimal smoothings attaining our lower bound, showing that the entropy-based LogSumExp approach to smoothing is not exactly optimal.
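
A short numerical check of the classical bound: for the all-equal input, the LogSumExp overestimate attains its worst-case gap of exactly ln(d), against which the paper's ~0.8145 ln(d) lower bound for any overestimating smoothing can be compared.

    import numpy as np

    def logsumexp_smooth(x):
        """LogSumExp overestimate of max(x); the gap is at most ln(d),
        with equality exactly when all coordinates are equal."""
        m = x.max()
        return m + np.log(np.exp(x - m).sum())   # numerically stabilized

    for d in (2, 16, 1024):
        gap = logsumexp_smooth(np.zeros(d))      # max = 0, so the gap is exact
        print(f"d={d:5d}  LSE worst-case gap={gap:.4f}  ln(d)={np.log(d):.4f}  "
              f"paper's lower bound ~{0.8145 * np.log(d):.4f}")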

Updated: 2025-12-11 17:17:48

标题: 一个关于LogSumExp平滑近乎最优性的初等证明

摘要: 我们考虑在无穷范数意义下对$\mathbb{R}^d$中(逐坐标)最大函数的平滑设计。LogSumExp函数$f(x)=\ln(\sum^d_i\exp(x_i))$提供了一种经典的平滑,其与最大函数的值之间最多相差$\ln(d)$。我们给出了一个下界的初等构造,证明任何高估最大函数的平滑都必须与其至少相差$\sim 0.8145\ln(d)$。因此,LogSumExp在至多相差常数因子的意义上是最优的。然而,在低维情形下,我们给出了达到该下界的更强的、精确最优的平滑,表明基于熵的LogSumExp平滑方法并非精确最优。

更新时间: 2025-12-11 17:17:48

领域: math.ST,cs.LG,math.OC

下载: http://arxiv.org/abs/2512.10825v1

Fairness-Aware Fine-Tuning of Vision-Language Models for Medical Glaucoma Diagnosis

Vision-language models achieve expert-level performance on medical imaging tasks but exhibit significant diagnostic accuracy disparities across demographic groups. We introduce fairness-aware Low-Rank Adaptation for medical VLMs, combining parameter efficiency with explicit fairness optimization. Our key algorithmic contribution is a differentiable MaxAccGap loss that enables end-to-end optimization of accuracy parity across demographic groups. We propose three methods: FR-LoRA integrates MaxAccGap regularization into the training objective, GR-LoRA applies inverse frequency weighting to balance gradient contributions, and Hybrid-LoRA combines both mechanisms. Evaluated on 10,000 glaucoma fundus images, GR-LoRA reduces diagnostic accuracy disparities by 69% while maintaining 53.15% overall accuracy. Ablation studies reveal that strong regularization strength achieves optimal fairness with minimal accuracy trade-off, and race-specific optimization yields 60% disparity reduction. Our approach requires only 0.24% trainable parameters, enabling practical deployment of fair medical AI in resource-constrained healthcare settings.
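
The abstract does not spell out the MaxAccGap loss, so the following is one plausible differentiable surrogate, assuming soft per-group accuracy via true-class probabilities and a logsumexp smooth maximum over pairwise gaps; the temperature tau and the batch construction are hypothetical.

    import torch

    def max_acc_gap(logits, labels, groups, n_groups, tau=10.0):
        """Differentiable surrogate for the max accuracy gap across demographic
        groups: per-sample 'soft correctness' is the softmax probability of the
        true class, averaged within each group; the loss is a smooth maximum
        (logsumexp) of pairwise gaps, so gradients reach all groups. Assumes
        every group appears in the batch."""
        p_true = torch.softmax(logits, dim=-1).gather(1, labels[:, None]).squeeze(1)
        group_acc = torch.stack([p_true[groups == g].mean()
                                 for g in range(n_groups)])
        gaps = (group_acc[:, None] - group_acc[None, :]).abs()
        return torch.logsumexp(tau * gaps.flatten(), dim=0) / tau  # ~ max gap

    logits = torch.randn(32, 2, requires_grad=True)
    labels = torch.randint(0, 2, (32,))
    groups = torch.arange(32) % 3          # guarantees all 3 groups are present
    loss = max_acc_gap(logits, labels, groups, n_groups=3)
    loss.backward()
    print(loss.item(), logits.grad.abs().sum().item())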

Updated: 2025-12-11 17:17:07

标题: 公平意识对医学青光眼诊断视觉-语言模型的微调

摘要: 视觉语言模型在医学成像任务上实现了专家级性能,但在不同人群之间存在显著的诊断准确度差异。我们引入了针对医学VLMs的公平感知低秩适应性,将参数效率与明确的公平优化相结合。我们的关键算法贡献是可微分的MaxAccGap损失,它使得在不同人群之间的准确性平衡的端到端优化成为可能。我们提出了三种方法:FR-LoRA将MaxAccGap正则化集成到训练目标中,GR-LoRA应用逆频率加权来平衡梯度贡献,而Hybrid-LoRA结合了这两种机制。在对1万张青光眼底图像进行评估时,GR-LoRA将诊断准确度差异降低了69%,同时保持了53.15%的总体准确度。消融研究表明,强正则化强度实现了最佳的公平性,且几乎没有准确性的权衡,而针对不同人种的优化可以减少60%的差距。我们的方法仅需要0.24%的可训练参数,使得在资源受限的医疗保健环境中实现公平医学人工智能的实际部署成为可能。

更新时间: 2025-12-11 17:17:07

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2512.03477v2

Luxical: High-Speed Lexical-Dense Text Embeddings

Frontier language model quality increasingly hinges on our ability to organize web-scale text corpora for training. Today's dominant tools trade off speed and flexibility: lexical classifiers (e.g., FastText) are fast but limited to producing classification output scores, while the vector-valued outputs of transformer text embedding models flexibly support numerous workflows (e.g., clustering, classification, and retrieval) but are computationally expensive to produce. We introduce Luxical, a library for high-speed "lexical-dense" text embeddings that aims to recover the best properties of both approaches for web-scale text organization. Luxical combines sparse TF--IDF features, a small ReLU network, and a knowledge distillation training regimen to approximate large transformer embedding models at a fraction of their operational cost. In this technical report, we describe the Luxical architecture and training objective and evaluate a concrete Luxical model in two disparate applications: a targeted webcrawl document retrieval test and an end-to-end language model data curation task grounded in text classification. In these tasks we demonstrate speedups ranging from 3x to 100x over varying-sized neural baselines, and comparable to FastText model inference during the data curation task. On these evaluations, the tested Luxical model illustrates favorable compute/quality trade-offs for large-scale text organization, matching the quality of neural baselines. Luxical is available as open-source software at https://github.com/datologyai/luxical.
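
A toy sketch of the lexical-dense recipe, assuming stand-in teacher embeddings: sparse TF-IDF features feed a small ReLU network trained with a cosine distillation loss. The real system distills from an actual transformer teacher at scale; the corpus, dimensions, and hyperparameters below are illustrative.

    import torch
    import torch.nn as nn
    from sklearn.feature_extraction.text import TfidfVectorizer

    docs = ["cheap fast text embeddings", "large transformer encoders are slow",
            "web scale corpora need curation", "distillation transfers quality"]
    vec = TfidfVectorizer()
    x = torch.tensor(vec.fit_transform(docs).toarray(), dtype=torch.float32)

    teacher_dim = 32                                   # teacher embedding size
    teacher = torch.randn(len(docs), teacher_dim)      # stand-in for a big model
    teacher = nn.functional.normalize(teacher, dim=-1)

    student = nn.Sequential(                           # small ReLU network
        nn.Linear(x.shape[1], 64), nn.ReLU(), nn.Linear(64, teacher_dim))
    opt = torch.optim.Adam(student.parameters(), lr=1e-2)
    for step in range(200):                            # cosine-distillation loop
        opt.zero_grad()
        out = nn.functional.normalize(student(x), dim=-1)
        loss = (1 - (out * teacher).sum(-1)).mean()    # 1 - cosine similarity
        loss.backward()
        opt.step()
    print(f"final distillation loss: {loss.item():.4f}")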

Updated: 2025-12-11 17:14:51

标题: Luxical: 高速词汇密集型文本嵌入

摘要: 前沿语言模型质量越来越依赖于我们组织用于训练的规模庞大的文本语料库的能力。当前主导的工具在速度和灵活性之间进行权衡:词汇分类器(例如,FastText)速度快,但仅限于生成分类输出分数,而Transformer文本嵌入模型的矢量值输出灵活地支持许多工作流程(例如,聚类、分类和检索),但在计算上昂贵。我们介绍了Luxical,这是一个用于高速"词汇密集"文本嵌入的库,旨在在规模庞大的文本组织中兼得这两种方法的最佳特性。Luxical结合了稀疏的TF-IDF特征、一个小的ReLU网络和知识蒸馏训练方案,以其运行成本的一小部分近似大型Transformer嵌入模型。在这份技术报告中,我们描述了Luxical的架构和训练目标,并在两个不同的应用中评估了一个具体的Luxical模型:一个定向的网络爬虫文档检索测试和一个以文本分类为基础的端到端语言模型数据整理任务。在这些任务中,我们展示了与不同规模的神经基线相比从3倍到100倍的加速,并且在数据整理任务中与FastText模型推理速度相当。在这些评估中,经过测试的Luxical模型展示了对于大规模文本组织有利的计算/质量权衡,与神经基线的质量相匹配。Luxical可在https://github.com/datologyai/luxical上作为开源软件使用。

更新时间: 2025-12-11 17:14:51

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2512.09015v2

V-OCBF: Learning Safety Filters from Offline Data via Value-Guided Offline Control Barrier Functions

Ensuring safety in autonomous systems requires controllers that satisfy hard, state-wise constraints without relying on online interaction. While existing Safe Offline RL methods typically enforce soft expected-cost constraints, they do not guarantee forward invariance. Conversely, Control Barrier Functions (CBFs) provide rigorous safety guarantees but usually depend on expert-designed barrier functions or full knowledge of the system dynamics. We introduce Value-Guided Offline Control Barrier Functions (V-OCBF), a framework that learns a neural CBF entirely from offline demonstrations. Unlike prior approaches, V-OCBF does not assume access to the dynamics model; instead, it derives a recursive finite-difference barrier update, enabling model-free learning of a barrier that propagates safety information over time. Moreover, V-OCBF incorporates an expectile-based objective that avoids querying the barrier on out-of-distribution actions and restricts updates to the dataset-supported action set. The learned barrier is then used with a Quadratic Program (QP) formulation to synthesize real-time safe control. Across multiple case studies, V-OCBF yields substantially fewer safety violations than baseline methods while maintaining strong task performance, highlighting its scalability for offline synthesis of safety-critical controllers without online interaction or hand-engineered barriers.

Updated: 2025-12-11 17:14:37

标题: V-OCBF:通过价值引导的离线控制屏障函数从离线数据中学习安全过滤器

摘要: 确保自主系统安全性需要满足严格的状态约束的控制器,而不依赖在线交互。现有的安全离线强化学习方法通常强制执行软的期望成本约束,但不能保证前向不变性。相反,控制障碍函数(CBFs)提供严格的安全性保证,但通常依赖于专家设计的障碍函数或对系统动态的全面了解。我们引入了价值导向的离线控制障碍函数(V-OCBF)框架,该框架完全通过离线演示学习神经网络CBF。与先前的方法不同,V-OCBF不假设可以访问动态模型;相反,它推导出一个递归有限差分障碍更新,实现了障碍物的无模型学习,随着时间的推移传播安全信息。此外,V-OCBF结合了基于期望值的目标,避免在超出分布的操作上查询障碍,并限制更新到数据集支持的操作集。然后,学习的障碍与二次规划(QP)公式一起用于合成实时安全控制。在多个案例研究中,V-OCBF产生的安全违规行为明显少于基线方法,同时保持强大的任务性能,突出了其在没有在线交互或手工设计障碍的情况下离线合成安全关键控制器的可扩展性。

更新时间: 2025-12-11 17:14:37

领域: cs.AI,cs.RO

下载: http://arxiv.org/abs/2512.10822v1

Agile Deliberation: Concept Deliberation for Subjective Visual Classification

From content moderation to content curation, applications requiring vision classifiers for visual concepts are rapidly expanding. Existing human-in-the-loop approaches typically assume users begin with a clear, stable concept understanding to be able to provide high-quality supervision. In reality, users often start with a vague idea and must iteratively refine it through "concept deliberation", a practice we uncovered through structured interviews with content moderation experts. We operationalize the common strategies in deliberation used by real content moderators into a human-in-the-loop framework called "Agile Deliberation" that explicitly supports evolving and subjective concepts. The system supports users in defining the concept for themselves by exposing them to borderline cases. The system does this with two deliberation stages: (1) concept scoping, which decomposes the initial concept into a structured hierarchy of sub-concepts, and (2) concept iteration, which surfaces semantically borderline examples for user reflection and feedback to iteratively align an image classifier with the user's evolving intent. Since concept deliberation is inherently subjective and interactive, we painstakingly evaluate the framework through 18 user sessions, each 1.5h long, rather than standard benchmarking datasets. We find that Agile Deliberation achieves 7.5% higher F1 scores than automated decomposition baselines and more than 3% higher than manual deliberation, while participants reported clearer conceptual understanding and lower cognitive effort.

Updated: 2025-12-11 17:13:09

标题: 敏捷审议:主观视觉分类的概念审议

摘要: 从内容审核到内容策展,需要针对视觉概念的视觉分类器的应用正在迅速扩展。现有的人在环路方法通常假设用户从清晰、稳定的概念理解开始,以便能够提供高质量的监督。实际上,用户经常从一个模糊的想法开始,并通过"概念审议"逐步完善它,这是我们通过与内容审核专家进行结构化访谈发现的一种实践。我们将现实中内容审核人员在审议时使用的常见策略操作化为一个名为"敏捷审议"的人在环路框架,明确支持不断演变的主观概念。该系统通过向用户展示边界案例来支持用户自行定义概念。该系统通过两个审议阶段实现这一点:(1)概念范围界定,将初始概念分解为结构化的子概念层次结构,(2)概念迭代,为用户反思和反馈提供语义上的边界案例,以迭代地将图像分类器与用户不断演变的意图对齐。由于概念审议在本质上是主观的和交互式的,我们通过18个用户会话(每个会话持续1.5小时)而非标准基准数据集进行了细致的评估。我们发现,敏捷审议的F1分数比自动分解基线高7.5%,比手动审议高3%以上,同时参与者报告了更清晰的概念理解和更低的认知负担。

更新时间: 2025-12-11 17:13:09

领域: cs.AI,cs.CV,cs.HC,cs.LG

下载: http://arxiv.org/abs/2512.10821v1

Deep sets and event-level maximum-likelihood estimation for fast pile-up jet rejection in ATLAS

Multiple proton-proton collisions (pile-up) occur at every bunch crossing at the LHC, with the mean number of interactions expected to reach 80 during Run 3 and up to 200 at the High-Luminosity LHC. As a direct consequence, events with multijet signatures will occur at increasingly high rates. To cope with the increased luminosity, being able to efficiently group jets according to their origin along the beamline is crucial, particularly at the trigger level. In this work, a novel uncertainty-aware jet regression model based on a Deep Sets architecture is introduced, DIPz, to regress on a jet origin position along the beamline. The inputs to the DIPz algorithm are the charged particle tracks associated to each jet. An event-level discriminant, the Maximum Log Product of Likelihoods (MLPL), is constructed by combining the DIPz per-jet predictions. MLPL is cut-optimized to select events compatible with targeted multi-jet signature selection. This combined approach provides a robust and computationally efficient method for pile-up rejection in multi-jet final states, applicable to real-time event selections at the ATLAS High Level Trigger.
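
A sketch of the event-level discriminant under a Gaussian assumption on the DIPz per-jet predictions (the paper's exact likelihood parameterization is not given in the abstract): sum per-jet log-likelihoods over a grid of beamline positions and take the maximum.

    import numpy as np

    def mlpl(mu, sigma, z_grid):
        """Maximum Log Product of Likelihoods: for each candidate origin z,
        sum per-jet Gaussian log-likelihoods log N(z; mu_j, sigma_j) predicted
        per jet, then maximize over z."""
        mu, sigma = np.asarray(mu)[:, None], np.asarray(sigma)[:, None]
        loglik = (-0.5 * ((z_grid[None, :] - mu) / sigma) ** 2
                  - np.log(sigma * np.sqrt(2 * np.pi)))
        total = loglik.sum(axis=0)                # product over jets, in log space
        return z_grid[total.argmax()], total.max()

    z_grid = np.linspace(-200, 200, 4001)         # mm along the beamline
    # Three jets from a common hard-scatter vertex near z = 35 mm ...
    z_best, score = mlpl([34.0, 36.5, 35.2], [8.0, 12.0, 9.0], z_grid)
    print(f"hard-scatter candidate z = {z_best:.1f} mm, MLPL = {score:.2f}")
    # ... versus jets scattered by pile-up: the MLPL score drops sharply.
    print(mlpl([-60.0, 35.0, 110.0], [8.0, 12.0, 9.0], z_grid)[1])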

Updated: 2025-12-11 17:09:45

标题: 深度集合和基于事件级极大似然估计在ATLAS中用于快速拒绝堆积喷注

摘要: 在LHC的每次束团交叉点都会发生多次质子-质子碰撞(堆积),预计在Run 3期间,平均相互作用次数将达到80次,在高亮度LHC上可达到200次。由于这一直接后果,具有多重喷注特征的事件将以越来越高的速率发生。为了应对增加的光度,能够根据其在束线沿线的起源有效地将喷注分组对于在触发器级别特别关键。在这项工作中,引入了一种基于深集合结构的新型不确定性感知喷注回归模型,DIPz,用于对喷注沿着束线的起源位置进行回归。DIPz算法的输入是与每个喷注相关联的带电粒子轨迹。通过结合DIPz的每个喷注预测,构建了一个事件级别的判别器,最大对数概率乘积(MLPL)。MLPL经过切割优化,以选择与目标多喷注特征选择兼容的事件。这种综合方法为多重喷注终态中的堆积拒绝提供了一种稳健且计算效率高的方法,适用于ATLAS高级触发器的实时事件选择。

更新时间: 2025-12-11 17:09:45

领域: hep-ex,cs.LG

下载: http://arxiv.org/abs/2512.10819v1

Understanding Outer Optimizers in Local SGD: Learning Rates, Momentum, and Acceleration

Modern machine learning often requires training with large batch size, distributed data, and massively parallel compute hardware (like mobile and other edge devices or distributed data centers). Communication becomes a major bottleneck in such settings but methods like Local Stochastic Gradient Descent (Local SGD) show great promise in reducing this additional communication overhead. Local SGD consists of three parts: a local optimization process, an aggregation mechanism, and an outer optimizer that uses the aggregated updates from the nodes to produce a new model. While there exists an extensive literature on understanding the impact of hyperparameters in the local optimization process, the choice of outer optimizer and its hyperparameters is less clear. We study the role of the outer optimizer in Local SGD, and prove new convergence guarantees for the algorithm. In particular, we show that tuning the outer learning rate allows us to (a) trade off between optimization error and stochastic gradient noise variance, and (b) make up for ill-tuning of the inner learning rate. Our theory suggests that the outer learning rate should sometimes be set to values greater than $1$. We extend our results to settings where we use momentum in the outer optimizer, and we show a similar role for the momentum-adjusted outer learning rate. We also study acceleration in the outer optimizer and show that it improves the convergence rate as a function of the number of communication rounds, improving upon the convergence rate of prior algorithms that apply acceleration locally. Finally, we also introduce a novel data-dependent analysis of Local SGD that yields further insights on outer learning rate tuning. We conduct comprehensive experiments with standard language models and various outer optimizers to validate our theory.
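
A toy sketch of the setup on a least-squares problem, with hypothetical hyperparameters: workers run local SGD, their averaged update acts as a pseudo-gradient, and the outer optimizer applies momentum and an outer learning rate deliberately greater than 1, as the theory suggests can help.

    import numpy as np

    rng = np.random.default_rng(0)
    d, workers, H, rounds = 10, 8, 20, 50
    A = rng.normal(size=(200, d)); b = A @ rng.normal(size=d)  # shared quadratic

    def local_sgd_round(x, inner_lr):
        """Each worker runs H local SGD steps from x; return the average update."""
        deltas = []
        for _ in range(workers):
            y = x.copy()
            for _ in range(H):
                i = rng.integers(len(A))                       # one sample
                y -= inner_lr * (A[i] * (A[i] @ y - b[i]))     # stochastic grad
            deltas.append(y - x)
        return np.mean(deltas, axis=0)

    x, m = np.zeros(d), np.zeros(d)
    outer_lr, outer_momentum = 1.5, 0.6        # outer LR deliberately > 1
    for _ in range(rounds):
        delta = local_sgd_round(x, inner_lr=5e-4)
        m = outer_momentum * m + delta         # outer momentum on the pseudo-grad
        x = x + outer_lr * m                   # outer optimizer step
    print("final loss:", 0.5 * np.mean((A @ x - b) ** 2))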

Updated: 2025-12-11 17:09:43

标题: 理解本地SGD中的外部优化器:学习率、动量和加速度

摘要: 现代机器学习通常需要使用大批量训练、分布式数据和大规模并行计算硬件(如移动设备和其他边缘设备或分布式数据中心)。在这种情况下,通信成为一个主要瓶颈,但像本地随机梯度下降(Local SGD)这样的方法在减少这种额外通信开销方面表现出很大的潜力。本地SGD由三部分组成:本地优化过程、聚合机制和使用节点的聚合更新生成新模型的外部优化器。虽然已经有大量文献对本地优化过程中超参数的影响进行了研究,但外部优化器及其超参数的选择仍不够清晰。我们研究了外部优化器在本地SGD中的作用,并为算法提供了新的收敛保证。特别是,我们表明调整外部学习率使我们能够(a)在优化误差和随机梯度噪声方差之间进行权衡,以及(b)弥补内部学习率的不良调整。我们的理论表明,外部学习率有时应设置为大于1的值。我们将结果扩展到在外部优化器中使用动量的情况,并展示了经动量调整的外部学习率的类似作用。我们还研究了外部优化器中的加速,并表明它能随通信轮数提高收敛速度,优于先前在本地应用加速的算法的收敛速度。最后,我们还引入了一种针对Local SGD的新颖的数据相关分析,为外部学习率的调整提供了进一步的见解。我们使用标准语言模型和各种外部优化器进行全面实验,以验证我们的理论。

更新时间: 2025-12-11 17:09:43

领域: cs.LG,math.OC,stat.ML

下载: http://arxiv.org/abs/2509.10439v2

Extrapolation of Periodic Functions Using Binary Encoding of Continuous Numerical Values

We report the discovery that binary encoding allows neural networks to extrapolate periodic functions beyond their training bounds. We introduce Normalized Base-2 Encoding (NB2E) as a method for encoding continuous numerical values and demonstrate that, using this input encoding, vanilla multi-layer perceptrons (MLP) successfully extrapolate diverse periodic signals without prior knowledge of their functional form. Internal activation analysis reveals that NB2E induces bit-phase representations, enabling MLPs to learn and extrapolate signal structure independently of position.
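
The abstract does not give the exact encoding, so this is one plausible reading of NB2E: normalize each scalar by the training range and emit its fixed-point base-2 expansion as a bit vector, whose low bits cycle periodically with the input.

    import numpy as np

    def nb2e(values, lo, hi, bits=16):
        """Normalized Base-2 Encoding (sketch): map each scalar into [0, 1)
        using the training range, then emit its fixed-point binary expansion
        MSB-first. Periodic structure in the input becomes periodic bit-phase
        patterns an MLP can latch onto. The normalization details are an
        assumption, not the paper's exact specification."""
        frac = (np.asarray(values, dtype=float) - lo) / (hi - lo)   # -> [0, 1)
        ints = np.floor(frac * (1 << bits)).astype(np.int64)
        shifts = np.arange(bits - 1, -1, -1)
        return ((ints[..., None] >> shifts) & 1).astype(np.float32)

    x = np.linspace(0.0, 4.0, 9)
    enc = nb2e(x, lo=0.0, hi=8.0, bits=8)     # train range wider than samples
    print(enc.shape)                          # (9, 8): one bit vector per value
    print(enc[1], enc[5])                     # low bits cycle rapidly with x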

Updated: 2025-12-11 17:08:28

标题: 使用连续数值的二进制编码对周期函数进行外推

摘要: 我们报道了一个发现:二进制编码使神经网络能够在训练范围之外外推周期函数。我们引入了归一化二进制编码(NB2E)作为一种编码连续数值的方法,并证明,使用这种输入编码,普通多层感知器(MLP)无需事先了解其函数形式,就能成功外推各种周期信号。内部激活分析显示,NB2E诱导出比特相位表示,使MLP能够独立于位置学习并外推信号结构。

更新时间: 2025-12-11 17:08:28

领域: cs.LG,cs.AI,cs.CV,stat.ML

下载: http://arxiv.org/abs/2512.10817v1

Deception Detection in Dyadic Exchanges Using Multimodal Machine Learning: A Study on a Swedish Cohort

This study investigates the efficacy of using multimodal machine learning techniques to detect deception in dyadic interactions, focusing on the integration of data from both the deceiver and the deceived. We compare early and late fusion approaches, utilizing audio and video data - specifically, Action Units and gaze information - across all possible combinations of modalities and participants. Our dataset, newly collected from Swedish native speakers engaged in truth or lie scenarios on emotionally relevant topics, serves as the basis for our analysis. The results demonstrate that incorporating both speech and facial information yields superior performance compared to single-modality approaches. Moreover, including data from both participants significantly enhances deception detection accuracy, with the best performance (71%) achieved using a late fusion strategy applied to both modalities and participants. These findings align with psychological theories suggesting differential control of facial and vocal expressions during initial interactions. As the first study of its kind on a Scandinavian cohort, this research lays the groundwork for future investigations into dyadic interactions, particularly within psychotherapy settings.

Updated: 2025-12-11 17:00:44

标题: 《使用多模态机器学习在二元交流中检测欺骗:对瑞典队列的研究》

摘要: 这项研究调查了使用多模态机器学习技术来检测双人互动中欺骗的有效性,重点关注来自欺骗者和被欺骗者的数据整合。我们比较了早期和晚期融合方法,利用音频和视频数据 - 具体来说,动作单位和凝视信息 - 跨所有可能的模态和参与者的组合。我们的数据集是新收集的,来自参与真实或虚假情景的情感相关话题的瑞典本族人,作为我们分析的基础。结果表明,将语音和面部信息结合起来比单一模态方法表现出更好的性能。此外,包括来自两个参与者的数据显著提高了欺骗检测的准确性,采用晚期融合策略应用于两种模态和参与者时表现最佳(71%)。这些发现与心理学理论一致,表明在初始互动过程中面部和声音表达有不同的控制。作为对斯堪的纳维亚人群的首项研究,这项研究为未来对双人互动的调查奠定了基础,特别是在心理治疗环境中。

更新时间: 2025-12-11 17:00:44

领域: cs.LG

下载: http://arxiv.org/abs/2506.21429v2

Quantum Approaches to Urban Logistics: From Core QAOA to Clustered Scalability

The Traveling Salesman Problem (TSP) is a fundamental challenge in combinatorial optimization, widely applied in logistics and transportation. As the size of TSP instances grows, traditional algorithms often struggle to produce high-quality solutions within reasonable timeframes. This study investigates the potential of the Quantum Approximate Optimization Algorithm (QAOA), a hybrid quantum-classical method, to solve TSP under realistic constraints. We adopt a QUBO-based formulation of TSP that integrates real-world logistical constraints reflecting operational conditions, such as vehicle capacity, road accessibility, and time windows, while ensuring compatibility with the limitations of current quantum hardware. Our experiments are conducted in a simulated environment using high-performance computing (HPC) resources to assess QAOA's performance across different problem sizes and quantum circuit depths. In order to improve scalability, we propose clustering QAOA (Cl-QAOA), a hybrid approach combining classical machine learning with QAOA. This method decomposes large TSP instances into smaller sub-problems, making quantum optimization feasible even on devices with a limited number of qubits. The results offer a comprehensive evaluation of QAOA's strengths and limitations in solving constrained TSP scenarios. This study advances quantum optimization and lays groundwork for future large-scale applications.

Updated: 2025-12-11 17:00:24

标题: 量子方法在城市物流中的应用:从核心QAOA到集群可扩展性

摘要: 旅行推销员问题(TSP)是组合优化中的一个基本挑战,在物流和运输领域得到广泛应用。随着TSP实例规模的增长,传统算法往往难以在合理的时间范围内产生高质量的解决方案。本研究调查了量子近似优化算法(QAOA)在现实约束条件下解决TSP的潜力,这是一种混合量子-经典方法。我们采用基于QUBO的TSP公式,集成了反映运营条件的现实物流约束,如车辆容量、道路可达性和时间窗口,同时确保与当前量子硬件的限制兼容。我们在模拟环境中使用高性能计算(HPC)资源进行实验,评估QAOA在不同问题规模和量子电路深度下的性能。为了提高可伸缩性,我们提出了聚类QAOA(Cl-QAOA),这是一种结合了经典机器学习和QAOA的混合方法。该方法将大型TSP实例分解为较小的子问题,使得即使在具有有限量子比特数的设备上,量子优化也成为可能。结果全面评估了QAOA在解决受约束的TSP场景中的优势和局限性。这项研究推动了量子优化,并为未来大规模应用奠定了基础。

更新时间: 2025-12-11 17:00:24

领域: quant-ph,cs.LG

下载: http://arxiv.org/abs/2512.10813v1

HAROOD: A Benchmark for Out-of-distribution Generalization in Sensor-based Human Activity Recognition

Sensor-based human activity recognition (HAR) mines activity patterns from the time-series sensory data. In realistic scenarios, variations across individuals, devices, environments, and time introduce significant distributional shifts for the same activities. Recent efforts attempt to solve this challenge by applying or adapting existing out-of-distribution (OOD) algorithms, but only in certain distribution shift scenarios (e.g., cross-device or cross-position), lacking comprehensive insights on the effectiveness of these algorithms. For instance, is OOD necessary to HAR? Which OOD algorithm performs the best? In this paper, we fill this gap by proposing HAROOD, a comprehensive benchmark for HAR in OOD settings. We define 4 OOD scenarios: cross-person, cross-position, cross-dataset, and cross-time, and build a testbed covering 6 datasets, 16 comparative methods (implemented with CNN-based and Transformer-based architectures), and two model selection protocols. Then, we conduct extensive experiments and present several findings for future research, e.g., no single method consistently outperforms others, highlighting substantial opportunity for advancement. Our codebase is highly modular and easy to extend for new datasets, algorithms, comparisons, and analysis, with the hope to facilitate the research in OOD-based HAR. Our implementation is released and can be found at https://github.com/AIFrontierLab/HAROOD.

Updated: 2025-12-11 16:52:50

标题: HAROOD:基于传感器的人体活动识别中的分布外泛化基准

摘要: 基于传感器的人体活动识别(HAR)从时间序列感官数据中挖掘活动模式。在现实场景中,个体、设备、环境和时间的变化为相同的活动引入了显著的分布偏移。最近的努力试图通过应用或调整现有的分布外(OOD)算法来解决这一挑战,但仅限于某些分布偏移场景(例如跨设备或跨位置),缺乏对这些算法有效性的全面洞见。例如,OOD对HAR来说是必要的吗?哪种OOD算法效果最好?在本文中,我们通过提出HAROOD,为OOD设置中的HAR提供了一个全面的基准。我们定义了4个OOD场景:跨人员、跨位置、跨数据集和跨时间,并建立了一个涵盖6个数据集、16种比较方法(采用基于CNN和Transformer的架构实现)和两种模型选择协议的测试平台。然后,我们进行了广泛的实验,并提出了一些未来研究的发现,例如,没有单一方法始终优于其他方法,突显了进步的重大机会。我们的代码库非常模块化,易于扩展到新的数据集、算法、比较和分析,希望能促进基于OOD的HAR研究。我们的实现已发布,可在https://github.com/AIFrontierLab/HAROOD找到。

更新时间: 2025-12-11 16:52:50

领域: cs.AI

下载: http://arxiv.org/abs/2512.10807v1

Interpretable and Steerable Concept Bottleneck Sparse Autoencoders

Sparse autoencoders (SAEs) promise a unified approach for mechanistic interpretability, concept discovery, and model steering in LLMs and LVLMs. However, realizing this potential requires that the learned features be both interpretable and steerable. To that end, we introduce two new computationally inexpensive interpretability and steerability metrics and conduct a systematic analysis on LVLMs. Our analysis uncovers two observations; (i) a majority of SAE neurons exhibit either low interpretability or low steerability or both, rendering them ineffective for downstream use; and (ii) due to the unsupervised nature of SAEs, user-desired concepts are often absent in the learned dictionary, thus limiting their practical utility. To address these limitations, we propose Concept Bottleneck Sparse Autoencoders (CB-SAE) - a novel post-hoc framework that prunes low-utility neurons and augments the latent space with a lightweight concept bottleneck aligned to a user-defined concept set. The resulting CB-SAE improves interpretability by +32.1% and steerability by +14.5% across LVLMs and image generation tasks. We will make our code and model weights available.

Updated: 2025-12-11 16:48:07

标题: 可解释和可操控的概念瓶颈稀疏自动编码器

摘要: 稀疏自动编码器(SAEs)承诺在LLMs和LVLMs中提供一种统一的方法,用于机制可解释性、概念发现和模型引导。然而,实现这一潜力需要学习到的特征既具有可解释性又具有可操控性。为此,我们引入了两种新的计算成本低廉的可解释性和可操控性指标,并对LVLMs进行了系统分析。我们的分析揭示了两个观察结果:(i)大多数SAE神经元表现出较低的可解释性、较低的可操控性或两者兼有,使它们无法有效用于下游应用;(ii)由于SAE的无监督性质,用户所需的概念通常在学习到的词典中缺失,从而限制了它们的实际效用。为了解决这些限制,我们提出了概念瓶颈稀疏自动编码器(CB-SAE)- 一种新颖的事后框架,通过修剪低效用神经元,并在潜在空间中增加一个轻量级概念瓶颈,与用户定义的概念集对齐。由此产生的CB-SAE在LVLMs和图像生成任务中,可将解释性提高32.1%,将可操控性提高14.5%。我们将提供我们的代码和模型权重。

更新时间: 2025-12-11 16:48:07

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2512.10805v1

Equivariant Test-Time Training with Operator Sketching for Imaging Inverse Problems

Equivariant Imaging (EI) regularization has become the de-facto technique for unsupervised training of deep imaging networks, without any need of ground-truth data. Observing that the EI-based unsupervised training paradigm currently has significant computational redundancy leading to inefficiency in high-dimensional applications, we propose a sketched EI regularization which leverages the randomized sketching techniques for acceleration. We apply our sketched EI regularization to develop an accelerated deep internal learning framework, which can be efficiently applied for test-time network adaptation. Additionally, for network adaptation tasks, we propose a parameter-efficient approach to accelerate both EI and Sketched-EI via optimizing only the normalization layers. Our numerical study on X-ray CT and multicoil magnetic resonance image reconstruction tasks demonstrate that our approach can achieve significant computational acceleration over the standard EI counterpart, especially in test-time training tasks.

Updated: 2025-12-11 16:43:35

标题: 基于算子草图的等变测试时训练在成像逆问题中的应用

摘要: 等变成像(EI)正则化已成为深度成像网络无监督训练的事实标准技术,无需任何真实(ground-truth)数据。观察到基于EI的无监督训练范式目前存在显著的计算冗余,导致在高维应用中效率低下,我们提出了一种利用随机草图(sketching)技术加速的草图EI正则化。我们将草图EI正则化应用于开发一个加速的深度内部学习框架,可以有效地用于测试时网络适应。此外,对于网络适应任务,我们提出了一种参数高效的方法,通过仅优化归一化层来加速EI和草图EI。我们对X射线CT和多线圈磁共振图像重建任务进行的数值研究表明,相较于标准EI,我们的方法可以实现显著的计算加速,尤其是在测试时训练任务中。

更新时间: 2025-12-11 16:43:35

领域: eess.IV,cs.CV,cs.LG,math.OC

下载: http://arxiv.org/abs/2411.05771v5

Achieving Trustworthy Real-Time Decision Support Systems with Low-Latency Interpretable AI Models

This paper investigates real-time decision support systems that leverage low-latency AI models, bringing together recent progress in holistic AI-driven decision tools, integration with Edge-IoT technologies, and approaches for effective human-AI teamwork. It looks into how large language models can assist decision-making, especially when resources are limited. The research also examines the effects of technical developments such as DeLLMa, methods for compressing models, and improvements for analytics on edge devices, while also addressing issues like limited resources and the need for adaptable frameworks. Through a detailed review, the paper offers practical perspectives on development strategies and areas of application, adding to the field by pointing out opportunities for more efficient and flexible AI-supported systems. The conclusions set the stage for future breakthroughs in this fast-changing area, highlighting how AI can reshape real-time decision support.

Updated: 2025-12-11 16:41:25

标题: 实现具有低延迟可解释人工智能模型的可信实时决策支持系统

摘要: 本文调查了利用低延迟人工智能模型的实时决策支持系统,将最新的全面人工智能驱动决策工具的进展、与边缘物联网技术的集成以及有效的人工智能团队合作方法结合在一起。它探讨了大型语言模型如何在资源有限的情况下辅助决策制定。研究还考察了技术发展的影响,如DeLLMa、模型压缩方法以及边缘设备上的分析改进,同时还解决了资源有限和需求可调整框架等问题。通过详细的回顾,本文提供了关于发展战略和应用领域的实用观点,指出了更高效、灵活的人工智能支持系统的机会,从而为该领域增添价值。结论为未来在这一快速变化领域的突破奠定了基础,突出了人工智能如何重塑实时决策支持的重要性。

更新时间: 2025-12-11 16:41:25

领域: cs.AI,cs.AR

下载: http://arxiv.org/abs/2506.20018v2

What matters for Representation Alignment: Global Information or Spatial Structure?

Representation alignment (REPA) guides generative training by distilling representations from a strong, pretrained vision encoder to intermediate diffusion features. We investigate a fundamental question: what aspect of the target representation matters for generation, its \textit{global} semantic information (e.g., measured by ImageNet-1K accuracy) or its spatial structure (i.e., pairwise cosine similarity between patch tokens)? Prevalent wisdom holds that stronger global semantic performance leads to better generation as a target representation. To study this, we first perform a large-scale empirical analysis across 27 different vision encoders and different model scales. The results are surprising; spatial structure, rather than global performance, drives the generation performance of a target representation. To further study this, we introduce two straightforward modifications, which specifically accentuate the transfer of \emph{spatial} information. We replace the standard MLP projection layer in REPA with a simple convolution layer and introduce a spatial normalization layer for the external representation. Surprisingly, our simple method (implemented in $<$4 lines of code), termed iREPA, consistently improves convergence speed of REPA, across a diverse set of vision encoders, model sizes, and training variants (such as REPA, REPA-E, Meanflow, JiT, etc.). Our work motivates revisiting the fundamental working mechanism of representational alignment and how it can be leveraged for improved training of generative models. The code and project page are available at https://end2end-diffusion.github.io/irepa
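
As a sketch of the "spatial structure" view (not iREPA itself, which swaps the projection layer and adds spatial normalization), one can align token-token cosine-similarity matrices rather than raw features; the dimensions below are illustrative.

    import torch
    import torch.nn.functional as F

    def spatial_structure_loss(diff_feats, target_feats):
        """Align pairwise patch-token cosine-similarity structure instead of
        raw features: build each representation's token-token cosine matrix
        and penalize their mismatch. Feature widths may differ, since only the
        N x N similarity matrices are compared."""
        a = F.normalize(diff_feats, dim=-1)       # (B, N, D1) diffusion tokens
        b = F.normalize(target_feats, dim=-1)     # (B, N, D2) encoder tokens
        sim_a = a @ a.transpose(1, 2)             # (B, N, N) cosine structure
        sim_b = b @ b.transpose(1, 2)
        return (sim_a - sim_b).pow(2).mean()

    diff = torch.randn(2, 256, 768, requires_grad=True)   # 16x16 patch grid
    target = torch.randn(2, 256, 1024)                    # frozen encoder tokens
    loss = spatial_structure_loss(diff, target)
    loss.backward()
    print(loss.item())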

Updated: 2025-12-11 16:39:53

标题: 代表对齐的关键因素:全局信息还是空间结构?

摘要: Representation alignment (REPA)通过将来自强大的预训练视觉编码器的表示蒸馏到中间扩散特征来引导生成训练。我们研究了一个基本问题:对于生成而言,目标表示的哪个方面更重要,是其全局语义信息(例如以ImageNet-1K准确率衡量),还是其空间结构(即补丁token之间的成对余弦相似度)?普遍的观点认为,作为目标表示,更强的全局语义性能会带来更好的生成。为了研究这一点,我们首先进行了一项涵盖27种不同视觉编码器和不同模型规模的大规模实证分析。结果令人惊讶:驱动目标表示生成性能的是空间结构,而不是全局性能。为了进一步研究这一点,我们引入了两个简单的修改,专门强化空间信息的传递。我们用一个简单的卷积层替换了REPA中的标准MLP投影层,并为外部表示引入了一个空间归一化层。令人惊讶的是,我们的简单方法(不到4行代码即可实现),称为iREPA,在各种视觉编码器、模型大小和训练变体(如REPA、REPA-E、Meanflow、JiT等)中始终改善了REPA的收敛速度。我们的工作激励重新审视表示对齐的基本工作机制,以及如何利用它来改进生成模型的训练。代码和项目页面可在https://end2end-diffusion.github.io/irepa找到。

更新时间: 2025-12-11 16:39:53

领域: cs.CV,cs.AI,cs.GR,cs.LG,stat.ML

下载: http://arxiv.org/abs/2512.10794v1

LabelFusion: Learning to Fuse LLMs and Transformer Classifiers for Robust Text Classification

LabelFusion is a fusion ensemble for text classification that learns to combine a traditional transformer-based classifier (e.g., RoBERTa) with one or more Large Language Models (LLMs such as OpenAI GPT, Google Gemini, or DeepSeek) to deliver accurate and cost-aware predictions across multi-class and multi-label tasks. The package provides a simple high-level interface (AutoFusionClassifier) that trains the full pipeline end-to-end with minimal configuration, and a flexible API for advanced users. Under the hood, LabelFusion integrates vector signals from both sources by concatenating the ML backbone's embeddings with the LLM-derived per-class scores -- obtained through structured prompt-engineering strategies -- and feeds this joint representation into a compact multi-layer perceptron (FusionMLP) that produces the final prediction. This learned fusion approach captures complementary strengths of LLM reasoning and traditional transformer-based classifiers, yielding robust performance across domains -- achieving 92.4% accuracy on AG News and 92.3% on 10-class Reuters 21578 topic classification -- while enabling practical trade-offs between accuracy, latency, and cost.
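
A minimal sketch of the fusion step, with hypothetical dimensions: concatenate the backbone embedding with the LLM per-class scores and map the joint representation through a small MLP to final logits.

    import torch
    import torch.nn as nn

    class FusionMLP(nn.Module):
        """Fuse a transformer classifier's embedding with LLM per-class scores
        by concatenation, then map to final class logits."""
        def __init__(self, emb_dim, n_classes, hidden=128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(emb_dim + n_classes, hidden), nn.ReLU(),
                nn.Linear(hidden, n_classes))

        def forward(self, embedding, llm_scores):
            return self.net(torch.cat([embedding, llm_scores], dim=-1))

    emb = torch.randn(4, 768)                         # e.g. RoBERTa [CLS] embeddings
    llm = torch.softmax(torch.randn(4, 10), dim=-1)   # LLM per-class scores
    fusion = FusionMLP(emb_dim=768, n_classes=10)
    print(fusion(emb, llm).shape)                     # torch.Size([4, 10])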

Updated: 2025-12-11 16:39:07

标题: LabelFusion:学习融合LLMs和Transformer分类器以实现文本分类的稳健性

摘要: LabelFusion是一个用于文本分类的融合集成系统,它学习将传统的基于transformer的分类器(例如RoBERTa)与一个或多个大型语言模型(LLM,如OpenAI GPT、Google Gemini或DeepSeek)结合起来,以在多类和多标签任务中提供准确且成本敏感的预测。该软件包提供了一个简单的高级接口(AutoFusionClassifier),可以以最小的配置训练完整的流水线,并为高级用户提供了灵活的API。在内部,LabelFusion通过将ML主干的嵌入与LLM导出的每个类别得分(通过结构化提示工程策略获得)连接起来,从而整合了来自两个源的向量信号,并将这种联合表示输入到一个紧凑的多层感知器(FusionMLP)中,以产生最终的预测。这种学习的融合方法捕捉了LLM推理和传统基于transformer的分类器的互补优势,实现了跨领域的稳健性能,例如在AG News上达到92.4%的准确度,在10类Reuters 21578主题分类上达到92.3%的准确度,同时实现了准确性、延迟和成本之间的实际权衡。

更新时间: 2025-12-11 16:39:07

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2512.10793v1

EcomBench: Towards Holistic Evaluation of Foundation Agents in E-commerce

Foundation agents have rapidly advanced in their ability to reason and interact with real environments, making the evaluation of their core capabilities increasingly important. While many benchmarks have been developed to assess agent performance, most concentrate on academic settings or artificially designed scenarios while overlooking the challenges that arise in real applications. To address this issue, we focus on a highly practical real-world setting, the e-commerce domain, which involves a large volume of diverse user interactions, dynamic market conditions, and tasks directly tied to real decision-making processes. To this end, we introduce EcomBench, a holistic E-commerce Benchmark designed to evaluate agent performance in realistic e-commerce environments. EcomBench is built from genuine user demands embedded in leading global e-commerce ecosystems and is carefully curated and annotated through human experts to ensure clarity, accuracy, and domain relevance. It covers multiple task categories within e-commerce scenarios and defines three difficulty levels that evaluate agents on key capabilities such as deep information retrieval, multi-step reasoning, and cross-source knowledge integration. By grounding evaluation in real e-commerce contexts, EcomBench provides a rigorous and dynamic testbed for measuring the practical capabilities of agents in modern e-commerce.

Updated: 2025-12-11 16:38:57

标题: EcomBench:朝向电子商务基础代理的整体评估

摘要: 基础代理在推理和与真实环境互动的能力方面迅速取得进展,这使得评估它们的核心能力变得越来越重要。虽然已经开发了许多基准来评估代理的表现,但大多集中在学术环境或人为设计的场景上,忽视了在真实应用中出现的挑战。为了解决这个问题,我们关注一个高度实用的现实世界设置,即电子商务领域,该领域涉及大量多样化的用户互动、动态市场条件以及直接与真实决策过程相关的任务。为此,我们介绍了 EcomBench,一个旨在评估代理在真实电子商务环境中表现的综合性电子商务基准。EcomBench是基于全球领先的电子商务生态系统中嵌入的真实用户需求构建的,并通过人类专家精心策划和注释,以确保清晰、准确和领域相关性。它涵盖了电子商务场景中的多个任务类别,并定义了三个难度级别,评估代理在深度信息检索、多步推理和跨源知识整合等关键能力上的表现。通过将评估基于真实的电子商务环境,EcomBench为衡量现代电子商务中代理的实际能力提供了严格而动态的测试平台。

更新时间: 2025-12-11 16:38:57

领域: cs.AI

下载: http://arxiv.org/abs/2512.08868v2

SyGra: A Unified Graph-Based Framework for Scalable Generation, Quality Tagging, and Management of Synthetic Data

The advancement of large language models (LLMs) is critically dependent on the availability of high-quality datasets for Supervised Fine-Tuning (SFT), alignment tasks like Direct Preference Optimization (DPO), etc. In this work, we present a comprehensive synthetic data generation framework that facilitates scalable, configurable, and high-fidelity generation of synthetic data tailored for these training paradigms. Our approach employs a modular and configuration-based pipeline capable of modeling complex dialogue flows with minimal manual intervention. This framework uses a dual-stage quality tagging mechanism, combining heuristic rules and LLM-based evaluations, to automatically filter and score data extracted from OASST-formatted conversations, ensuring the curation of high-quality dialogue samples. The resulting datasets are structured under a flexible schema supporting both SFT and DPO use cases, enabling seamless integration into diverse training workflows. Together, these innovations offer a robust solution for generating and managing synthetic conversational data at scale, significantly reducing the overhead of data preparation in LLM training pipelines.
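
The dual-stage quality tagging described above can be pictured as a cheap heuristic pass followed by an LLM judge. The sketch below is our reading of that design under stated assumptions; `heuristic_pass`, `llm_score`, and the thresholds are hypothetical stand-ins, not the framework's API.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Sample:
    prompt: str
    response: str

def heuristic_pass(s: Sample) -> bool:
    # Stage 1: cheap structural rules (thresholds are illustrative).
    return len(s.response.split()) >= 5 and s.response.strip() != s.prompt.strip()

def quality_tag(samples: List[Sample],
                llm_score: Callable[[Sample], float],
                min_score: float = 0.7) -> List[Sample]:
    """Two-stage tagging: heuristic rules first, then an LLM-based judge.
    `llm_score` stands in for any judge returning a score in [0, 1]."""
    kept = []
    for s in samples:
        if not heuristic_pass(s):      # stage 1: filter
            continue
        if llm_score(s) >= min_score:  # stage 2: score
            kept.append(s)
    return kept
```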

Updated: 2025-12-11 16:38:00

标题: SyGra:一个统一的基于图的框架,用于大规模生成、质量标记和管理合成数据

摘要: 大型语言模型(LLMs)的发展在很大程度上取决于用于监督微调(SFT)、直接偏好优化(DPO)等对齐任务的高质量数据集的可用性。在这项工作中,我们提出了一个全面的合成数据生成框架,促进了专为这些训练范式量身定制的合成数据的可伸缩、可配置和高保真生成。我们的方法采用了一个模块化和基于配置的管道,能够在最小干预的情况下对复杂的对话流进行建模。该框架使用了一个双阶段质量标记机制,结合启发式规则和LLM评估,自动过滤和评分从OASST格式对话中提取的数据,确保高质量对话样本的筛选。生成的数据集按照一个灵活的架构进行组织,支持SFT和DPO用例,实现了与不同训练工作流的无缝集成。这些创新共同提供了一个强大的解决方案,用于在规模上生成和管理合成对话数据,显著减少了LLM训练管道中数据准备的开销。

更新时间: 2025-12-11 16:38:00

领域: cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2508.15432v3

The FACTS Leaderboard: A Comprehensive Benchmark for Large Language Model Factuality

We introduce The FACTS Leaderboard, an online leaderboard suite and associated set of benchmarks that comprehensively evaluates the ability of language models to generate factually accurate text across diverse scenarios. The suite provides a holistic measure of factuality by aggregating the performance of models on four distinct sub-leaderboards: (1) FACTS Multimodal, which measures the factuality of responses to image-based questions; (2) FACTS Parametric, which assesses models' world knowledge by answering closed-book factoid questions from internal parameters; (3) FACTS Search, which evaluates factuality in information-seeking scenarios, where the model must use a search API; and (4) FACTS Grounding (v2), which evaluates whether long-form responses are grounded in provided documents, featuring significantly improved judge models. Each sub-leaderboard employs automated judge models to score model responses, and the final suite score is an average of the four components, designed to provide a robust and balanced assessment of a model's overall factuality. The FACTS Leaderboard Suite will be actively maintained, containing both public and private splits to allow for external participation while guarding its integrity. It can be found at https://www.kaggle.com/benchmarks/google/facts .
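
Since the abstract states that the suite score is an average of the four sub-leaderboards, the aggregation is a one-liner; the scores below are made-up numbers for illustration only.

```python
def facts_suite_score(multimodal: float, parametric: float,
                      search: float, grounding: float) -> float:
    """The final suite score is the unweighted mean of the four components."""
    return (multimodal + parametric + search + grounding) / 4

# Hypothetical model: (0.80 + 0.70 + 0.75 + 0.85) / 4 = 0.775.
print(facts_suite_score(0.80, 0.70, 0.75, 0.85))  # 0.775
```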

Updated: 2025-12-11 16:35:14

标题: 事实排行榜:大型语言模型真实性的综合基准

摘要: 我们介绍了FACTS排行榜,这是一个在线排行榜套件及相关基准,全面评估语言模型在各种场景下生成事实准确文本的能力。该套件通过汇总模型在四个不同子排行榜上的表现,提供了一个全面的事实性度量:(1)FACTS Multimodal,衡量对基于图像的问题的响应的事实性;(2)FACTS Parametric,通过回答来自内部参数的闭卷事实问题来评估模型的世界知识;(3)FACTS Search,评估信息检索场景中的事实性,在这种情况下,模型必须使用搜索API;以及(4)FACTS Grounding(v2),评估长篇回答是否基于提供的文件,并具有显著改进的判断模型。每个子排行榜都使用自动化的判断模型对模型的响应进行评分,最终套件分数是四个组成部分的平均值,旨在提供对模型整体事实性的强大和平衡的评估。FACTS排行榜套件将得到积极维护,包含公共和私有分割,以允许外部参与同时保护其完整性。您可以在https://www.kaggle.com/benchmarks/google/facts 找到它。

更新时间: 2025-12-11 16:35:14

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2512.10791v1

Natural Language Interface for Firewall Configuration

This paper presents the design and prototype implementation of a natural language interface for configuring enterprise firewalls. The framework allows administrators to express access control policies in plain language, which are then translated into vendor specific configurations. A compact schema bound intermediate representation separates human intent from device syntax and in the current prototype compiles to Palo Alto PAN OS command line configuration while remaining extensible to other platforms. Large language models are used only as assistive parsers that generate typed intermediate representation objects, while compilation and enforcement remain deterministic. The prototype integrates three validation layers, namely a static linter that checks structural and vendor specific constraints, a safety gate that blocks overly permissive rules such as any to any allows, and a Batfish based simulator that validates configuration syntax and referential integrity against a synthetic device model. The paper describes the architecture, implementation, and test methodology on synthetic network context datasets and discusses how this approach can evolve into a scalable auditable and human centered workflow for firewall policy management.
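
A minimal sketch of the schema-bound intermediate representation and its deterministic compilation, assuming hypothetical field names; the PAN-OS command syntax shown is only indicative of the style of output, not a verified configuration line.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FirewallRule:
    """Hypothetical schema-bound IR: the LLM only emits typed objects like
    this, while compilation and enforcement below stay deterministic."""
    name: str
    src_zone: str
    dst_zone: str
    app: str
    action: str  # "allow" | "deny"

def safety_gate(rule: FirewallRule) -> None:
    # Block overly permissive any-to-any allows, as in the paper's safety layer.
    if rule.action == "allow" and rule.src_zone == "any" and rule.dst_zone == "any":
        raise ValueError(f"rule {rule.name}: any-to-any allow rejected")

def compile_panos(rule: FirewallRule) -> str:
    # Illustrative PAN-OS-style CLI output; exact syntax varies by version.
    return (f"set rulebase security rules {rule.name} "
            f"from {rule.src_zone} to {rule.dst_zone} "
            f"application {rule.app} action {rule.action}")

rule = FirewallRule("web-out", "trust", "untrust", "web-browsing", "allow")
safety_gate(rule)
print(compile_panos(rule))
```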

Updated: 2025-12-11 16:33:33

标题: 防火墙配置的自然语言界面

摘要: 这篇论文介绍了一个用于配置企业防火墙的自然语言接口的设计和原型实现。该框架允许管理员用普通语言表达访问控制策略,然后将其转换为特定厂商的配置。一个紧凑的模式绑定中间表示将人类意图与设备语法分离,在当前原型中编译为Palo Alto PAN OS命令行配置,同时可扩展到其他平台。大型语言模型仅用作辅助解析器,生成类型化的中间表示对象,而编译和强制执行保持确定性。该原型集成了三个验证层,即检查结构和特定厂商约束的静态检查器、阻止过于宽松规则(如任何到任何允许)的安全门,以及基于Batfish的模拟器,验证配置语法和引用完整性与合成设备模型相对应。该论文描述了在合成网络上下文数据集上的架构、实现和测试方法,并讨论了这种方法如何演变为防火墙策略管理的可扩展可审计和以人为中心的工作流程。

更新时间: 2025-12-11 16:33:33

领域: cs.NI,cs.AI

下载: http://arxiv.org/abs/2512.10789v1

Replace, Don't Expand: Mitigating Context Dilution in Multi-Hop RAG via Fixed-Budget Evidence Assembly

Retrieval-Augmented Generation (RAG) systems often fail on multi-hop queries when the initial retrieval misses a bridge fact. Prior corrective approaches, such as Self-RAG, CRAG, and Adaptive-$k$, typically address this by \textit{adding} more context or pruning existing lists. However, simply expanding the context window often leads to \textbf{context dilution}, where distractors crowd out relevant information. We propose \textbf{SEAL-RAG}, a training-free controller that adopts a \textbf{``replace, don't expand''} strategy to fight context dilution under a fixed retrieval depth $k$. SEAL executes a (\textbf{S}earch $\rightarrow$ \textbf{E}xtract $\rightarrow$ \textbf{A}ssess $\rightarrow$ \textbf{L}oop) cycle: it performs on-the-fly, entity-anchored extraction to build a live \textit{gap specification} (missing entities/relations), triggers targeted micro-queries, and uses \textit{entity-first ranking} to actively swap out distractors for gap-closing evidence. We evaluate SEAL-RAG against faithful re-implementations of Basic RAG, CRAG, Self-RAG, and Adaptive-$k$ in a shared environment on \textbf{HotpotQA} and \textbf{2WikiMultiHopQA}. On HotpotQA ($k=3$), SEAL improves answer correctness by \textbf{+3--13 pp} and evidence precision by \textbf{+12--18 pp} over Self-RAG. On 2WikiMultiHopQA ($k=5$), it outperforms Adaptive-$k$ by \textbf{+8.0 pp} in accuracy and maintains \textbf{96\%} evidence precision compared to 22\% for CRAG. These gains are statistically significant ($p<0.001$). By enforcing fixed-$k$ replacement, SEAL yields a predictable cost profile while ensuring the top-$k$ slots are optimized for precision rather than mere breadth. We release our code and data at https://github.com/mosherino/SEAL-RAG.
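
The "replace, don't expand" loop can be sketched as follows. All callables (`retrieve`, `extract_gaps`, `score`) are placeholders for the paper's retriever, gap extractor, and entity-first ranker; only the control flow, one Search → Extract → Assess → Loop pass under a fixed budget k, is taken from the abstract.

```python
def seal_step(query, context, retrieve, extract_gaps, score, k):
    """One SEAL iteration: identify missing entities/relations, issue
    targeted micro-queries, and swap weak passages for gap-closing ones
    instead of growing the context past k slots."""
    gaps = extract_gaps(query, context)          # live gap specification
    for gap in gaps:
        candidates = retrieve(f"{query} {gap}")  # targeted micro-query
        for cand in candidates:
            worst = min(context, key=score)
            if score(cand) > score(worst):       # replace, don't expand
                context[context.index(worst)] = cand
    assert len(context) <= k                     # fixed retrieval depth
    return context
```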

Updated: 2025-12-11 16:31:29

标题: 替换,而不是扩展:通过固定预算证据组装来减轻多跳RAG中的上下文稀释

摘要: 检索增强生成(RAG)系统在多跳查询时经常失败,当初始检索错过桥接事实时。先前的纠正方法,如Self-RAG、CRAG和自适应-$k$,通常通过\textit{添加}更多上下文或修剪现有列表来解决这个问题。然而,简单地扩展上下文窗口通常会导致\textbf{上下文稀释},使干扰因素挤占相关信息。我们提出了\textbf{SEAL-RAG},这是一个无需训练的控制器,采用\textbf{``替换,不扩展''}策略来对抗上下文稀释,在固定的检索深度$k$下。SEAL执行一个(\textbf{S}earch $\rightarrow$ \textbf{E}xtract $\rightarrow$ \textbf{A}ssess $\rightarrow$ \textbf{L}oop)循环:它执行实时的、实体锚定的提取来构建一个活动的\textit{缺口规范}(缺失的实体/关系),触发有针对性的微查询,并使用\textit{实体优先排序}来积极地用缺口封闭证据替换干扰因素。我们在\textbf{HotpotQA}和\textbf{2WikiMultiHopQA}上对SEAL-RAG进行评估,与Basic RAG、CRAG、Self-RAG和自适应-$k$的忠实重现在共享环境中进行比较。在HotpotQA($k=3$)上,SEAL将答案正确率提高了\textbf{+3--13 pp},将证据精度提高了\textbf{+12--18 pp},超过了Self-RAG。在2WikiMultiHopQA($k=5$)上,它在准确性方面优于自适应-$k$,提高了\textbf{+8.0 pp},与CRAG的22\%相比,保持了\textbf{96\%}的证据精度。这些收益在统计上是显著的($p<0.001$)。通过执行固定-$k$替换,SEAL产生可预测的成本配置文件,同时确保顶部-$k$槽被优化为精度而不仅仅是广度。我们在https://github.com/mosherino/SEAL-RAG上发布了我们的代码和数据。

更新时间: 2025-12-11 16:31:29

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2512.10787v1

Developing and Evaluating a Large Language Model-Based Automated Feedback System Grounded in Evidence-Centered Design for Supporting Physics Problem Solving

Generative AI offers new opportunities for individualized and adaptive learning, particularly through large language model (LLM)-based feedback systems. While LLMs can produce effective feedback for relatively straightforward conceptual tasks, delivering high-quality feedback for tasks that require advanced domain expertise, such as physics problem solving, remains a substantial challenge. This study presents the design of an LLM-based feedback system for physics problem solving grounded in evidence-centered design (ECD) and evaluates its performance within the German Physics Olympiad. Participants assessed the usefulness and accuracy of the generated feedback, which was generally perceived as useful and highly accurate. However, an in-depth analysis revealed that the feedback contained factual errors in 20% of cases; errors that often went unnoticed by the students. We discuss the risks associated with uncritical reliance on LLM-based feedback systems and outline potential directions for generating more adaptive and reliable LLM-based feedback in the future.

Updated: 2025-12-11 16:29:38

标题: 基于大型语言模型的自动反馈系统的开发与评估:基于证据中心设计支持物理问题解决的文献

摘要: 生成式人工智能为个性化和自适应学习提供了新的机会,特别是通过基于大型语言模型(LLM)的反馈系统。虽然LLMs可以为相对简单的概念任务提供有效的反馈,但为需要高级领域专业知识的任务提供高质量的反馈仍然是一个重大挑战。本研究介绍了一个基于证据中心设计(ECD)的物理问题解决的LLM反馈系统的设计,并在德国物理奥林匹克竞赛中评估了其性能。参与者评估了生成的反馈的有用性和准确性,通常认为是有用和高度准确的。然而,深入分析发现,在20%的情况下,反馈中包含事实错误;这些错误常常被学生忽视。我们讨论了盲目依赖LLM反馈系统所带来的风险,并概述了未来生成更具适应性和可靠性的LLM反馈的潜在方向。

更新时间: 2025-12-11 16:29:38

领域: physics.ed-ph,cs.AI,cs.HC

下载: http://arxiv.org/abs/2512.10785v1

Extrapolating Jet Radiation with Autoregressive Transformers

Generative networks are an exciting tool for fast generation of LHC events with a fixed number of particles. Autoregressive transformers allow us to generate events containing variable numbers of particles, very much in line with the physics of QCD jet radiation, and offer the possibility to generalize to higher multiplicities. We show how transformers can learn a factorized likelihood for jet radiation and extrapolate in terms of the number of generated jets. For this extrapolation, bootstrapping training data and training with modifications of the likelihood loss can be used.
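
To make the factorized likelihood concrete: with a variable number of jets, the joint density over jet momenta x_1, ..., x_n can be written autoregressively, with an explicit stop probability absorbing the variable multiplicity. The notation below is ours, not the paper's.

```latex
p(x_1, \dots, x_n, n) \;=\;
\left[ \prod_{i=1}^{n} p\big(x_i \mid x_1, \dots, x_{i-1}\big) \right]
p\big(\mathrm{stop} \mid x_1, \dots, x_n\big)
```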

Updated: 2025-12-11 16:24:23

标题: 用自回归Transformer外推喷注辐射

摘要: 生成网络是用于快速生成固定粒子数的LHC事件的一种令人兴奋的工具。自回归变压器允许我们生成包含可变粒子数的事件,非常符合QCD喷注辐射的物理规律,并且可以推广到更高的多重性。我们展示了变压器如何学习喷注辐射的因子化似然,并在生成的喷注数量方面进行外推。为了进行这种外推,可以使用自助训练数据和训练具有修改的似然损失。

更新时间: 2025-12-11 16:24:23

领域: hep-ph,cs.LG

下载: http://arxiv.org/abs/2412.12074v2

Script Gap: Evaluating LLM Triage on Indian Languages in Native vs Roman Scripts in a Real World Setting

Large Language Models (LLMs) are increasingly deployed in high-stakes clinical applications in India. In many such settings, speakers of Indian languages frequently communicate using romanized text rather than native scripts, yet existing research rarely evaluates this orthographic variation using real-world data. We investigate how romanization impacts the reliability of LLMs in a critical domain: maternal and newborn healthcare triage. We benchmark leading LLMs on a real-world dataset of user-generated queries spanning five Indian languages and Nepali. Our results reveal consistent degradation in performance for romanized messages, with F1 scores trailing those of native scripts by 5-12 points. At our partner maternal health organization in India, this gap could cause nearly 2 million excess errors in triage. Crucially, this performance gap by scripts is not due to a failure in clinical reasoning. We demonstrate that LLMs often correctly infer the semantic intent of romanized queries. Nevertheless, their final classification outputs remain brittle in the presence of orthographic noise in romanized inputs. Our findings highlight a critical safety blind spot in LLM-based health systems: models that appear to understand romanized input may still fail to act on it reliably.

Updated: 2025-12-11 16:15:42

标题: 脚本差距:在真实世界环境中评估印度语言在本地文字与罗马化文字上的LLM分诊

摘要: 大型语言模型(LLMs)在印度的高风险临床应用中越来越多地得到部署。在许多这样的环境中,印度语言的使用者经常使用罗马化文本而不是本地文字进行交流,然而现有研究很少使用真实数据来评估这种文字变体。我们研究了罗马化对LLMs在一个关键领域——产妇和新生儿保健分诊中的可靠性的影响。我们在一个跨越五种印度语言和尼泊尔语的用户生成查询的真实数据集上对领先的LLMs进行了基准测试。我们的结果显示,罗马化消息的性能持续下降,F1得分比本地文字低5-12个点。在我们在印度的合作伙伴母婴保健组织中,这种差距可能导致将近200万个分诊错误。关键是,这种脚本之间的性能差距并不是由临床推理失败引起的。我们证明LLMs经常可以正确推断罗马化查询的语义意图。然而,在罗马化输入中存在文字噪音时,它们的最终分类输出仍然脆弱。我们的研究结果突显了LLM基于健康系统中的一个关键安全盲点:看似理解罗马化输入的模型可能仍然无法可靠地对其进行处理。

更新时间: 2025-12-11 16:15:42

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2512.10780v1

Grow Up and Merge: Scaling Strategies for Efficient Language Adaptation

Achieving high-performing language models which include medium- and lower-resource languages remains a challenge. Massively multilingual models still underperform compared to language-specific adaptations, especially at smaller model scales. In this work, we investigate scaling as an efficient strategy for adapting pretrained models to new target languages. Through comprehensive scaling ablations with approximately FLOP-matched models, we test whether upscaling an English base model enables more effective and resource-efficient adaptation than standard continued pretraining. We find that, once exposed to sufficient target-language data, larger upscaled models can match or surpass the performance of smaller models continually pretrained on much more data, demonstrating the benefits of scaling for data efficiency. Scaling also helps preserve the base model's capabilities in English, thus reducing catastrophic forgetting. Finally, we explore whether such scaled, language-specific models can be merged to construct modular and flexible multilingual systems. We find that while merging remains less effective than joint multilingual training, upscaled merges perform better than smaller ones. We observe large performance differences across merging methods, suggesting potential for improvement through merging approaches specialized for language-level integration.

Updated: 2025-12-11 16:09:54

标题: 成长和融合:有效语言适应的扩展策略

摘要: 实现包括中低资源语言在内的高性能语言模型仍然是一个挑战。与特定语言适应相比,大规模多语言模型仍然表现不佳,特别是在较小的模型规模下。在这项工作中,我们研究了扩展作为一种有效的策略,用于将预训练模型适应到新的目标语言。通过对大致FLOP匹配的模型进行全面的扩展消融实验,我们测试了放大英语基础模型是否能够实现比标准持续预训练更有效、更节省资源的适应。我们发现,一旦暴露于足够的目标语言数据,更大规模的放大模型可以匹配或超越在更多数据上持续预训练的较小模型的性能,展示了扩展对数据效率的好处。扩展还有助于保留基础模型在英语中的能力,从而减少灾难性遗忘。最后,我们探讨了这种放大的、特定语言的模型是否可以合并以构建模块化和灵活的多语言系统。我们发现,尽管合并仍不如联合多语言训练有效,但放大的合并比较小的合并效果更好。我们观察到合并方法之间存在较大的性能差异,表明通过专门用于语言级集成的合并方法可能存在改进的潜力。

更新时间: 2025-12-11 16:09:54

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2512.10772v1

Template-Free Retrosynthesis with Graph-Prior Augmented Transformers

Retrosynthesis reaction prediction seeks to infer plausible reactant molecules for a given product and is a central problem in computer-aided organic synthesis. Despite recent progress, many existing models still fall short of the accuracy and robustness required for practical deployment. This work studies a template-free, Transformer-based framework that eliminates reliance on handcrafted reaction templates or additional chemical rule engines. The model injects molecular graph information into the attention mechanism to jointly exploit SMILES sequences and structural cues, and further applies a paired data augmentation strategy to enhance training diversity and scale. On the USPTO-50K benchmark, our proposed approach achieves state-of-the-art performance among template-free methods and substantially outperforms a vanilla Transformer baseline.
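
One common way to inject graph structure into attention, and a plausible reading of the mechanism above, is to add a graph-derived bias to the attention logits. The bias construction below is an assumption for illustration; the paper's exact injection scheme may differ.

```python
import torch

def graph_biased_attention(q, k, v, graph_bias):
    """Sketch of graph-prior attention: a bias matrix derived from the
    molecular graph (e.g., bond adjacency or shortest-path distances
    between atom tokens) is added to the attention logits before softmax.
    The bias itself is a placeholder here."""
    d = q.size(-1)
    logits = q @ k.transpose(-2, -1) / d ** 0.5  # (tokens, tokens)
    logits = logits + graph_bias                 # structural prior
    return torch.softmax(logits, dim=-1) @ v

t, d = 8, 16
q = k = v = torch.randn(t, d)
bias = torch.zeros(t, t)  # hypothetical: fill from SMILES-token/atom adjacency
out = graph_biased_attention(q, k, v, bias)
```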

Updated: 2025-12-11 16:08:32

标题: 基于图先验增强Transformer的无模板反合成

摘要: 反合成反应预测旨在推断给定产物的可能反应物分子,是计算辅助有机合成中的一个核心问题。尽管最近取得了一些进展,许多现有模型仍然无法达到实际部署所需的准确性和稳健性。本文研究了一种无模板、基于Transformer的框架,消除了对手工制作的反应模板或额外化学规则引擎的依赖。该模型将分子图信息注入注意力机制,共同利用SMILES序列和结构线索,并进一步应用成对数据增强策略来增强训练多样性和规模。在USPTO-50K基准测试中,我们提出的方法在无模板方法中达到了最先进的性能,并且明显优于一个基本的Transformer基线。

更新时间: 2025-12-11 16:08:32

领域: cs.LG

下载: http://arxiv.org/abs/2512.10770v1

WAM-Flow: Parallel Coarse-to-Fine Motion Planning via Discrete Flow Matching for Autonomous Driving

We introduce WAM-Flow, a vision-language-action (VLA) model that casts ego-trajectory planning as discrete flow matching over a structured token space. In contrast to autoregressive decoders, WAM-Flow performs fully parallel, bidirectional denoising, enabling coarse-to-fine refinement with a tunable compute-accuracy trade-off. Specifically, the approach combines a metric-aligned numerical tokenizer that preserves scalar geometry via triplet-margin learning, a geometry-aware flow objective and a simulator-guided GRPO alignment that integrates safety, ego progress, and comfort rewards while retaining parallel generation. A multi-stage adaptation converts a pre-trained auto-regressive backbone (Janus-1.5B) from causal decoding to non-causal flow model and strengthens road-scene competence through continued multimodal pretraining. Thanks to the inherent nature of consistency model training and parallel decoding inference, WAM-Flow achieves superior closed-loop performance against autoregressive and diffusion-based VLA baselines, with 1-step inference attaining 89.1 PDMS and 5-step inference reaching 90.3 PDMS on NAVSIM v1 benchmark. These results establish discrete flow matching as a new promising paradigm for end-to-end autonomous driving. The code will be publicly available soon.

Updated: 2025-12-11 16:06:13

标题: WAM-Flow:通过离散流匹配实现自动驾驶的并行粗到细运动规划

摘要: 我们介绍了WAM-Flow,一个将自我轨迹规划视为在结构化令牌空间上的离散流匹配的视觉-语言-动作(VLA)模型。与自回归解码器相比,WAM-Flow执行完全并行、双向去噪,实现了可调节的计算-精度权衡的粗到精的改进。具体而言,该方法结合了一个保留标量几何的度量对齐的数值分词器(通过三元组边缘学习)、一个几何感知流目标和一个模拟器引导的GRPO对齐,该对齐集成了安全性、自我进展和舒适性奖励,同时保持并行生成。多阶段适应将一个预训练的自回归骨干(Janus-1.5B)从因果解码转换为非因果流模型,并通过持续的多模态预训练加强了道路场景能力。由于一致性模型训练和并行解码推理的固有特性,WAM-Flow在与自回归和扩散基础的VLA基线相比实现了优越的闭环性能,1步推理达到89.1 PDMS,5步推理达到90.3 PDMS,达到了NAVSIM v1基准的优秀表现。这些结果将离散流匹配确立为端到端自动驾驶的新的有前途的范例。代码将很快公开。

更新时间: 2025-12-11 16:06:13

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2512.06112v2

FOL-Traces: Verified First-Order Logic Reasoning Traces at Scale

Reasoning in language models is difficult to evaluate: natural-language traces are unverifiable, symbolic datasets too small, and most benchmarks conflate heuristics with inference. We present FOL-Traces, the first large-scale dataset of programmatically verified reasoning traces, enabling rigorous evaluation of structured logical inference. We also propose two challenging and comprehensive diagnostic tasks-masked operation prediction and step completion-that directly probe syntactic awareness and process fidelity. FOL-Traces serves as a scalable testbed for rigorously studying how models perform structured logical inference. Systematic experiments with 5 reasoning LLMs show that the dataset remains challenging: models only reach around 45.7% accuracy on masked operation prediction and around 27% on two-step completion.

Updated: 2025-12-11 16:02:17

标题: FOL-Traces: 在规模上验证的一阶逻辑推理轨迹

摘要: 语言模型中的推理很难评估:自然语言轨迹无法验证,符号数据集太小,而大多数基准测试将启发式与推理混为一谈。我们提出了FOL-Traces,这是第一个经过程序验证的推理轨迹的大规模数据集,可以严格评估结构化逻辑推理。我们还提出了两个具有挑战性且全面的诊断任务(掩码操作预测和步骤补全),直接探测句法意识和过程保真度。FOL-Traces作为一个可扩展的试验平台,用于严格研究模型如何执行结构化逻辑推理。对5个推理LLM进行的系统性实验表明,该数据集仍然具有挑战性:模型在掩码操作预测上的准确率仅约为45.7%,在两步补全上仅约为27%。

更新时间: 2025-12-11 16:02:17

领域: cs.AI

下载: http://arxiv.org/abs/2505.14932v2

Deep Operator BSDE: a Numerical Scheme to Approximate Solution Operators

Motivated by dynamic risk measures and conditional $g$-expectations, in this work we propose a numerical method to approximate the solution operator given by a Backward Stochastic Differential Equation (BSDE). The main ingredients for this are the Wiener chaos decomposition and the classical Euler scheme for BSDEs. We show convergence of this scheme under very mild assumptions, and provide a rate of convergence in more restrictive cases. We then implement it using neural networks, and we present several numerical examples where we can check the accuracy of the method.
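
For readers unfamiliar with the classical Euler scheme mentioned above: for a BSDE with driver f and terminal condition Y_T = ξ on a grid 0 = t_0 < ... < t_N = T, one common backward-in-time discretization reads as follows. This is the standard textbook form; the paper's variant may differ in details.

```latex
Z_{t_n} \approx \frac{1}{\Delta t}\,
  \mathbb{E}\!\left[ Y_{t_{n+1}}\,\Delta W_n \,\middle|\, \mathcal{F}_{t_n} \right],
\qquad
Y_{t_n} \approx
  \mathbb{E}\!\left[ Y_{t_{n+1}} + f\!\left(t_n, Y_{t_{n+1}}, Z_{t_n}\right)\Delta t
  \,\middle|\, \mathcal{F}_{t_n} \right],
\qquad
Y_{t_N} = \xi .
```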

Updated: 2025-12-11 15:57:23

标题: 深度算子BSDE:一种近似解算子的数值方案

摘要: 受动态风险度量和条件$g$-期望的启发,本文提出了一种数值方法来近似由反向随机微分方程(BSDE)给出的解算子。该方法的主要组成部分是Wiener混沌分解和BSDE的经典Euler方案。我们在非常温和的假设下证明了该方案的收敛性,并在更严格的情况下提供了收敛速度。然后我们利用神经网络实施该方法,并呈现了几个数值示例,可以验证该方法的准确性。

更新时间: 2025-12-11 15:57:23

领域: math.NA,cs.LG,math.PR

下载: http://arxiv.org/abs/2412.03405v2

Designing AI-Resilient Assessments Using Interconnected Problems: A Theoretically Grounded and Empirically Validated Framework

The rapid adoption of generative AI has undermined traditional modular assessments in computing education, creating a disconnect between academic evaluation and industry practice. This paper presents a theoretically grounded framework for designing AI-resilient assessments, supported by formal analysis and multi-year empirical validation. We make three contributions. First, we establish two theoretical results: (1) assessments composed of interconnected problems, where outputs feed into subsequent stages, are more AI-resilient than modular assessments because current language models struggle with sustained multi-step reasoning and context; and (2) semi-structured problems with deterministic success criteria provide more reliable measures of student competency than fully open-ended projects, which allow AI systems to default to familiar solution patterns. These results challenge common policy and institutional guidance that promotes open-ended assessments as the primary safeguard for academic integrity. Second, we validate these results using data from four university data science courses (N = 138). While students achieve near-perfect scores on AI-assisted modular homework, performance drops by roughly 30 percentage points on proctored exams, indicating substantial AI score inflation. Interconnected projects remain strongly correlated with modular assessments, suggesting they measure the same underlying skills while resisting AI misuse. Proctored exams show weaker alignment, implying they may assess test-taking ability rather than intended learning outcomes. Third, we translate these findings into a practical assessment design framework. The proposed approach enables educators to create assessments that promote integrative thinking, reflect real-world AI-augmented workflows, and naturally resist trivial delegation to generative AI, thereby helping restore academic integrity.

Updated: 2025-12-11 15:53:19

标题: 利用相互关联的问题设计AI弹性评估:一个有理论基础且经过实证验证的框架

摘要: 快速采用生成式人工智能已经削弱了传统的计算机教育模块化评估,造成了学术评估与行业实践之间的脱节。本文提出了一个理论基础的框架,用于设计AI弹性评估,该框架受到正式分析和多年经验验证的支持。 我们做出了三项贡献。首先,我们建立了两个理论结果:(1)由相互连接的问题组成的评估,其中输出进入后续阶段,比模块化评估更具AI弹性,因为当前语言模型在持续的多步推理和上下文方面存在困难; (2)具有确定性成功标准的半结构化问题提供了比完全开放式项目更可靠的学生能力测量标准,后者允许AI系统默认为熟悉的解决方案模式。这些结果挑战了促进开放式评估作为学术诚信的主要保障的常见政策和机构指导。 其次,我们使用来自四个大学数据科学课程的数据(N = 138)验证了这些结果。虽然学生在AI辅助模块化作业上获得了接近完美的分数,但在监考考试中成绩下降了约30个百分点,表明存在实质性的AI得分膨胀。相互连接的项目与模块化评估之间仍然存在强烈的相关性,表明它们测量了相同的基本技能,同时抵制了AI的误用。监考考试显示较弱的对齐,暗示它们可能评估的是考试能力而不是预期的学习成果。 第三,我们将这些发现转化为实用的评估设计框架。所提出的方法使教育工作者能够创建促进整合思维、反映现实世界AI增强工作流程的评估,并自然地抵制将生成式AI轻易委托的行为,从而有助于恢复学术诚信。

更新时间: 2025-12-11 15:53:19

领域: cs.CY,cs.AI

下载: http://arxiv.org/abs/2512.10758v1

OPV: Outcome-based Process Verifier for Efficient Long Chain-of-Thought Verification

Large language models (LLMs) have achieved significant progress in solving complex reasoning tasks by Reinforcement Learning with Verifiable Rewards (RLVR). This advancement is also inseparable from the oversight automated by reliable verifiers. However, current outcome-based verifiers (OVs) are unable to inspect the unreliable intermediate steps in the long reasoning chains of thought (CoTs). Meanwhile, current process-based verifiers (PVs) have difficulties in reliably detecting errors in the complex long CoTs, limited by the scarcity of high-quality annotations due to the prohibitive costs of human annotations. Therefore, we propose the Outcome-based Process Verifier (OPV), which verifies the rationale process of summarized outcomes from long CoTs to achieve both accurate and efficient verification and enable large-scale annotation. To empower the proposed verifier, we adopt an iterative active learning framework with expert annotations to progressively improve the verification capability of OPV with fewer annotation costs. Specifically, in each iteration, the most uncertain cases of the current best OPV are annotated and then subsequently used to train a new OPV through Rejection Fine-Tuning (RFT) and RLVR for the next round. Extensive experiments demonstrate OPV's superior performance and broad applicability. It achieves new state-of-the-art results on our held-out OPV-Bench, outperforming much larger open-source models such as Qwen3-Max-Preview with an F1 score of 83.1 compared to 76.3. Furthermore, OPV effectively detects false positives within a synthetic dataset, closely aligning with expert assessment. When collaborating with policy models, OPV consistently yields performance gains, e.g., raising the accuracy of DeepSeek-R1-Distill-Qwen-32B from 55.2% to 73.3% on AIME2025 as the compute budget scales.
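
One round of the iterative active learning loop, as described, might look like the following sketch; the signatures (`uncertainty`, `annotate`, `train`) are our placeholders, with `train` standing in for the paper's RFT + RLVR stage. Only the selection-by-uncertainty criterion is stated in the abstract.

```python
def active_learning_round(verifier, pool, uncertainty, annotate, train, budget):
    """One OPV iteration: pick the most uncertain unlabeled cases under the
    current best verifier, get expert labels, retrain for the next round."""
    ranked = sorted(pool, key=uncertainty, reverse=True)
    batch = ranked[:budget]                  # most uncertain cases
    labeled = [(case, annotate(case)) for case in batch]
    new_verifier = train(verifier, labeled)  # Rejection Fine-Tuning + RLVR
    remaining = [c for c in pool if c not in batch]
    return new_verifier, remaining
```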

Updated: 2025-12-11 15:47:38

标题: OPV:基于结果的过程验证器,用于高效的长思维链验证

摘要: 大型语言模型(LLMs)通过可验证奖励的强化学习(RLVR)取得了在解决复杂推理任务方面的显著进展。这一进步也离不开可靠验证器自动监督。然而,目前基于结果的验证器(OVs)无法检查长推理链中不可靠的中间步骤。与此同时,当前基于过程的验证器(PVs)在可靠地检测复杂长推理链中的错误方面存在困难,这是由于人工标注的成本过高导致高质量标注的稀缺。因此,我们提出了基于结果的过程验证器(OPV),该验证器验证了从长推理链中总结的结果的理性过程,实现了准确和高效的验证,并能够进行大规模标注。为了增强提出的验证器,我们采用了一个迭代的主动学习框架,通过专家标注逐步提高OPV的验证能力,同时减少标注成本。具体而言,在每个迭代中,对当前最佳OPV的最不确定的案例进行标注,然后通过拒绝微调(RFT)和RLVR训练一个新的OPV用于下一轮。大量实验表明OPV具有卓越的性能和广泛的适用性。它在我们的留存OPV-Bench上取得了新的最新成果,优于Qwen3-Max-Preview等更大的开源模型,F1分数为83.1,而Qwen3-Max-Preview为76.3。此外,OPV在合成数据集中有效地检测到假阳性,与专家评估结果密切相关。与政策模型合作时,OPV始终取得性能提升,例如,在AIME2025上,将DeepSeek-R1-Distill-Qwen-32B的准确率从55.2%提高到73.3%,随着计算预算的增加。

更新时间: 2025-12-11 15:47:38

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2512.10756v1

Adapting to Change: A Comparison of Continual and Transfer Learning for Modeling Building Thermal Dynamics under Concept Drifts

Transfer Learning (TL) is currently the most effective approach for modeling building thermal dynamics when only limited data are available. TL uses a pretrained model that is fine-tuned to a specific target building. However, it remains unclear how to proceed after initial fine-tuning, as more operational measurement data are collected over time. This challenge becomes even more complex when the dynamics of the building change, for example, after a retrofit or a change in occupancy. In Machine Learning literature, Continual Learning (CL) methods are used to update models of changing systems. TL approaches can also address this challenge by reusing the pretrained model at each update step and fine-tuning it with new measurement data. A comprehensive study on how to incorporate new measurement data over time to improve prediction accuracy and address the challenges of concept drifts (changes in dynamics) for building thermal dynamics is still missing. Therefore, this study compares several CL and TL strategies, as well as a model trained from scratch, for thermal dynamics modeling during building operation. The methods are evaluated using 5--7 years of simulated data representative of single-family houses in Central Europe, including scenarios with concept drifts from retrofits and changes in occupancy. We propose a CL strategy, Seasonal Memory Learning (SML), that provides greater accuracy improvements than existing CL and TL methods, while maintaining low computational effort. SML outperformed the benchmark of initial fine-tuning by 28.1\% without concept drifts and 34.9\% with concept drifts.

Updated: 2025-12-11 15:37:19

标题: 适应变化:连续学习与迁移学习在建筑热动力建模中对概念漂移的比较

摘要: 迁移学习(TL)目前是在建筑热力动力学建模时仅有限数据可用时最有效的方法。TL使用一个预训练模型,对特定目标建筑进行微调。然而,如何在初始微调后继续进行仍不清楚,因为随着时间的推移收集到更多的运行测量数据。当建筑物的动态发生变化,例如在改建或入住情况变化后,这一挑战变得更加复杂。在机器学习文献中,连续学习(CL)方法用于更新不断变化系统的模型。TL方法也可以通过在每个更新步骤中重复使用预训练模型,并将其与新的测量数据进行微调来解决这一挑战。目前缺乏关于如何随时间整合新的测量数据以提高预测准确性并解决建筑热力动力学概念漂移(动态变化)挑战的全面性研究。 因此,本研究比较了几种连续学习和迁移学习策略,以及从头开始训练的模型,用于建筑运行期间的热力动力学建模。这些方法使用了代表中欧单户住宅的5-7年模拟数据进行评估,包括具有来自改建和入住情况变化的概念漂移的情景。我们提出了一种连续学习策略(季节性记忆学习),比现有的连续学习和迁移学习方法提供了更大的准确性改进,同时保持较低的计算量。在没有概念漂移的情况下,SML的表现超过了初始微调的基准28.1%,在有概念漂移的情况下超过了34.9%。

更新时间: 2025-12-11 15:37:19

领域: eess.SY,cs.LG

下载: http://arxiv.org/abs/2508.21615v2

State-Space Models for Tabular Prior-Data Fitted Networks

Recent advancements in foundation models for tabular data, such as TabPFN, demonstrated that pretrained Transformer architectures can approximate Bayesian inference with high predictive performance. However, Transformers suffer from quadratic complexity with respect to sequence length, motivating the exploration of more efficient sequence models. In this work, we investigate the potential of using Hydra, a bidirectional linear-time structured state space model (SSM), as an alternative to Transformers in TabPFN. A key challenge lies in SSM's inherent sensitivity to the order of input tokens - an undesirable property for tabular datasets where the row order is semantically meaningless. We investigate to what extent a bidirectional approach can preserve efficiency and enable symmetric context aggregation. Our experiments show that this approach reduces the order-dependence, achieving predictive performance competitive to the original TabPFN model.
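
A simple way to quantify the order sensitivity at issue is to permute the in-context rows and measure how much the predictions move; `predict` below is any PFN-style callable of (X_train, y_train, X_test), and the drift metric is our choice, not the paper's.

```python
import numpy as np

def order_sensitivity(predict, X_train, y_train, X_test, trials=10, seed=0):
    """Mean absolute prediction drift under random permutations of the
    in-context rows -- the quantity a bidirectional SSM should drive
    toward zero, since row order is semantically meaningless."""
    rng = np.random.default_rng(seed)
    base = predict(X_train, y_train, X_test)
    deltas = []
    for _ in range(trials):
        perm = rng.permutation(len(X_train))
        p = predict(X_train[perm], y_train[perm], X_test)
        deltas.append(np.abs(p - base).mean())
    return float(np.mean(deltas))  # 0 for a perfectly order-invariant model
```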

Updated: 2025-12-11 15:36:53

标题: 表格先验数据适配网络的状态空间模型

摘要: 最近关于表格数据基础模型的进展,如TabPFN,表明预训练的Transformer架构可以近似贝叶斯推断,并具有高预测性能。然而,Transformer在序列长度方面存在二次复杂度,促使我们探索更高效的序列模型。在这项工作中,我们研究了使用Hydra作为TabPFN中Transformer的替代方案的潜力,Hydra是一个双向的线性时间结构化状态空间模型(SSM)。一个关键挑战在于SSM对输入标记的顺序敏感的固有特性 - 这对于表格数据集来说是一个不希望的属性,因为行的顺序在语义上没有意义。我们研究双向方法在多大程度上可以保留效率并实现对称上下文聚合。我们的实验表明,这种方法减少了顺序依赖性,实现了与原始TabPFN模型竞争性的预测性能。

更新时间: 2025-12-11 15:36:53

领域: cs.LG

下载: http://arxiv.org/abs/2510.14573v2

The LLM Wears Prada: Analysing Gender Bias and Stereotypes through Online Shopping Data

With the wide and cross-domain adoption of Large Language Models, it becomes crucial to assess to which extent the statistical correlations in training data, which underlie their impressive performance, hide subtle and potentially troubling biases. Gender bias in LLMs has been widely investigated from the perspectives of works, hobbies, and emotions typically associated with a specific gender. In this study, we introduce a novel perspective. We investigate whether LLMs can predict an individual's gender based solely on online shopping histories and whether these predictions are influenced by gender biases and stereotypes. Using a dataset of historical online purchases from users in the United States, we evaluate the ability of six LLMs to classify gender and we then analyze their reasoning and products-gender co-occurrences. Results indicate that while models can infer gender with moderate accuracy, their decisions are often rooted in stereotypical associations between product categories and gender. Furthermore, explicit instructions to avoid bias reduce the certainty of model predictions, but do not eliminate stereotypical patterns. Our findings highlight the persistent nature of gender biases in LLMs and emphasize the need for robust bias-mitigation strategies.

Updated: 2025-12-11 15:33:30

标题: 《LLM穿着Prada:通过在线购物数据分析性别偏见和刻板印象》

摘要: 随着大型语言模型的广泛跨领域应用,评估训练数据中的统计相关性在多大程度上隐藏了微妙且可能令人担忧的偏见变得至关重要,这些相关性是它们出色性能的基础。LLMs中的性别偏见已经从与特定性别通常相关的作品、爱好和情感的角度进行了广泛调查。在这项研究中,我们引入了一个新颖的视角。我们调查LLMs是否可以仅基于在线购物历史预测个人的性别,以及这些预测是否受到性别偏见和刻板印象的影响。利用来自美国用户的历史在线购买数据集,我们评估了六个LLMs分类性别的能力,然后分析它们的推理和产品-性别共现。结果表明,虽然模型能够以中等准确度推断性别,但它们的决策往往根植于产品类别和性别之间的刻板印象关联。此外,明确要求避免偏见会降低模型预测的确定性,但并不能消除刻板印象模式。我们的研究结果突显了LLMs中性别偏见的持久性,并强调了需要强大的偏见缓解策略。

更新时间: 2025-12-11 15:33:30

领域: cs.AI,cs.CL,cs.CY

下载: http://arxiv.org/abs/2504.01951v2

PMB-NN: Physiology-Centred Hybrid AI for Personalized Hemodynamic Monitoring from Photoplethysmography

Continuous monitoring of blood pressure (BP) and hemodynamic parameters such as peripheral resistance (R) and arterial compliance (C) is critical for early vascular dysfunction detection. While photoplethysmography (PPG) wearables have gained popularity, existing data-driven methods for BP estimation lack interpretability. We advance our previously proposed physiology-centered hybrid AI method, the Physiological Model-Based Neural Network (PMB-NN), for blood pressure estimation; it unifies deep learning with a 2-element Windkessel-based model parameterized by R and C acting as physics constraints. The PMB-NN model was trained in a subject-specific manner using PPG-derived timing features, while demographic information was used to infer an intermediate variable: cardiac output. We validated the model on 10 healthy adults performing static and cycling activities across two days to assess day-to-day robustness, benchmarked against deep learning (DL) models (FCNN, CNN-LSTM, Transformer) and a standalone Windkessel-based physiological model (PM). Validation covered three perspectives: accuracy, interpretability, and plausibility. PMB-NN achieved systolic BP accuracy (MAE: 7.2 mmHg) comparable to the DL benchmarks, with diastolic performance (MAE: 3.9 mmHg) lower than the DL models. However, PMB-NN exhibited higher physiological plausibility than both the DL baselines and PM, suggesting that the hybrid architecture unifies and enhances the respective merits of physiological principles and data-driven techniques. Beyond BP, PMB-NN identified R (ME: 0.15 mmHg$\cdot$s/ml) and C (ME: -0.35 ml/mmHg) during training with accuracy similar to PM, demonstrating that the embedded physiological constraints confer interpretability on the hybrid AI framework. These results position PMB-NN as a balanced, physiologically grounded alternative to purely data-driven approaches for daily hemodynamic monitoring.
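
The 2-element Windkessel model named above couples arterial pressure P(t), inflow Q(t) (here, cardiac output), peripheral resistance R, and arterial compliance C; this is its standard form, which the network uses as a physics constraint.

```latex
Q(t) \;=\; \frac{P(t)}{R} \;+\; C\,\frac{\mathrm{d}P(t)}{\mathrm{d}t}
```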

Updated: 2025-12-11 15:32:50

标题: PMB-NN:基于生理学的混合人工智能技术,用于从光电容积脉搏图监测个性化血液动力学

摘要: 持续监测血压(BP)以及外周阻力(R)和动脉顺应性(C)等血流动力学参数对于早期血管功能障碍的检测至关重要。虽然光电容积脉动图(PPG)可穿戴设备已经变得流行,但现有的基于数据驱动的血压估计方法缺乏可解释性。我们在血压估计方面推进了我们先前提出的生理中心混合AI方法-生理模型基础神经网络(PMB-NN),该方法将深度学习与由R和C参数化的二元Windkessel模型相结合,作为物理约束。PMB-NN模型使用基于PPG的时间特征以特定于受试者的方式进行训练,而人口统计信息用于推断一个中间变量:心输出量。我们在两天内对10名健康成年人进行了静止和骑行活动的模型日常稳健性验证,与深度学习(DL)模型(FCNN、CNN-LSTM、Transformer)和独立的基于Windkessel模型的生理模型(PM)进行了对比。验证从三个方面进行:准确性、可解释性和合理性。PMB-NN达到了收缩压准确性(MAE:7.2 mmHg)与DL基准相当,舒张压性能(MAE:3.9 mmHg)低于DL模型。然而,PMB-NN表现出比DL基线和PM更高的生理可行性,表明混合架构统一和增强了生理原理和数据驱动技术的各自优点。除了血压外,PMB-NN在训练时识别了R(ME:0.15 mmHg·s/ml)和C(ME:-0.35 ml/mmHg),准确度与PM相似,表明嵌入的生理约束赋予混合AI框架可解释性。这些结果将PMB-NN定位为平衡的、生理基础的替代方案,用于日常血流动力学监测的纯数据驱动方法。

更新时间: 2025-12-11 15:32:50

领域: physics.med-ph,cs.LG

下载: http://arxiv.org/abs/2512.10745v1

Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving

Large language models (LLMs) have achieved significant progress in solving complex reasoning tasks by Reinforcement Learning with Verifiable Rewards (RLVR). This advancement is also inseparable from the oversight automated by reliable verifiers. However, current outcome-based verifiers (OVs) are unable to inspect the unreliable intermediate steps in the long reasoning chains of thought (CoTs). Meanwhile, current process-based verifiers (PVs) have difficulties in reliably detecting errors in the complex long CoTs, limited by the scarcity of high-quality annotations due to the prohibitive costs of human annotations. Therefore, we propose the \textbf{O}utcome-based \textbf{P}rocess \textbf{V}erifier (OPV), which verifies the rationale process of summarized outcomes from long CoTs to achieve both accurate and efficient verification and enable large-scale annotation. To empower the proposed verifier, we adopt an iterative active learning framework with expert annotations to progressively improve the verification capability of OPV with fewer annotation costs. Specifically, in each iteration, the most uncertain cases of the current best OPV are annotated and then subsequently used to train a new OPV through Rejection Fine-Tuning (RFT) and RLVR for the next round. Extensive experiments demonstrate OPV's superior performance and broad applicability. It achieves new state-of-the-art results on our held-out OPV-Bench, outperforming much larger open-source models such as Qwen3-Max-Preview with an F1 score of 83.1 compared to 76.3. Furthermore, OPV effectively detects false positives within a synthetic dataset, closely aligning with expert assessment. When collaborating with policy models, OPV consistently yields performance gains, e.g., raising the accuracy of DeepSeek-R1-Distill-Qwen-32B from 55.2\% to 73.3\% on AIME2025 as the compute budget scales.

Updated: 2025-12-11 15:26:28

标题: 用于奥林匹克级数学问题求解的长程推理代理

摘要: 大型语言模型(LLMs)通过可验证奖励的强化学习(RLVR)在解决复杂推理任务方面取得了重大进展。这一进步也离不开可靠验证者的自动监督。然而,当前基于结果的验证者(OVs)无法检查长推理链中不可靠的中间步骤。与此同时,当前基于过程的验证者(PVs)在可靠检测复杂长推理链中的错误方面存在困难,这是由于人工标注成本高昂导致高质量注释稀缺。因此,我们提出了基于结果的过程验证者(OPV),该验证者验证长推理链的总结结果的理据过程,以实现准确和高效的验证,同时实现大规模注释。为了增强所提出的验证者,我们采用了一个迭代式主动学习框架,通过专家注释逐步提高OPV的验证能力,减少注释成本。具体而言,在每个迭代中,对当前最佳OPV中最不确定的情况进行注释,然后通过拒绝微调(RFT)和RLVR用于下一轮训练新的OPV。大量实验证明OPV具有卓越的性能和广泛的适用性。它在我们保留的OPV-Bench上取得了新的最先进结果,在F1分数方面优于更大的开源模型,例如Qwen3-Max-Preview,83.1比76.3。此外,OPV有效地检测到合成数据集中的假阳性,与专家评估紧密一致。在与策略模型合作时,OPV始终带来性能提升,例如在AIME2025上,将DeepSeek-R1-Distill-Qwen-32B的准确率从55.2\%提高到73.3\%,随着计算预算的扩大。

更新时间: 2025-12-11 15:26:28

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2512.10739v1

LGAN: An Efficient High-Order Graph Neural Network via the Line Graph Aggregation

Graph Neural Networks (GNNs) have emerged as a dominant paradigm for graph classification. Specifically, most existing GNNs mainly rely on the message passing strategy between neighbor nodes, where the expressivity is limited by the 1-dimensional Weisfeiler-Lehman (1-WL) test. Although a number of k-WL-based GNNs have been proposed to overcome this limitation, their computational cost increases rapidly with k, significantly restricting the practical applicability. Moreover, since the k-WL models mainly operate on node tuples, these k-WL-based GNNs cannot retain fine-grained node- or edge-level semantics required by attribution methods (e.g., Integrated Gradients), leading to reduced interpretability. To overcome the above shortcomings, in this paper, we propose a novel Line Graph Aggregation Network (LGAN), which constructs a line graph from the induced subgraph centered at each node to perform higher-order aggregation. We theoretically prove that the LGAN not only possesses greater expressive power than the 2-WL under injective aggregation assumptions, but also has lower time complexity. Empirical evaluations on benchmarks demonstrate that the LGAN outperforms state-of-the-art k-WL-based GNNs, while offering better interpretability.
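
The core preprocessing step, building a line graph from a node-centered induced subgraph, can be reproduced with networkx. The 1-hop radius below is our assumption; the abstract says only "induced subgraph centered at each node".

```python
import networkx as nx

def node_centered_line_graph(G: nx.Graph, node, radius: int = 1) -> nx.Graph:
    """Sketch of LGAN's construction: take the induced neighborhood
    around `node` and turn it into a line graph, whose nodes are the
    subgraph's edges (adjacent when edges share an endpoint)."""
    sub = nx.ego_graph(G, node, radius=radius)  # induced subgraph
    return nx.line_graph(sub)                   # edges become nodes

G = nx.cycle_graph(5)
L = node_centered_line_graph(G, 0)
print(L.nodes())  # each node of L is an edge of the induced subgraph
```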

Updated: 2025-12-11 15:23:46

标题: LGAN:通过线图聚合的高效高阶图神经网络

摘要: 图神经网络(GNNs)已成为图分类的主导范式。具体来说,大多数现有的GNN主要依赖于邻居节点之间的消息传递策略,其中表达能力受到一维Weisfeiler-Lehman(1-WL)测试的限制。尽管已提出了一些基于k-WL的GNN来克服这一限制,但它们的计算成本随着k的增加而迅速增加,从而显著限制了实际的适用性。此外,由于k-WL模型主要操作节点元组,这些基于k-WL的GNN无法保留属性方法(例如集成梯度)所需的细粒度节点或边级语义,导致可解释性降低。为了克服上述缺点,在本文中,我们提出了一种新颖的Line Graph Aggregation Network(LGAN),它从以每个节点为中心的诱导子图构造线图来执行高阶聚合。我们在理论上证明了LGAN不仅在单射聚合假设下具有比2-WL更大的表达能力,而且具有更低的时间复杂度。基准测试中的实证评估表明,LGAN优于最先进的基于k-WL的GNN,同时提供更好的可解释性。

更新时间: 2025-12-11 15:23:46

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2512.10735v1

Textual Data Bias Detection and Mitigation - An Extensible Pipeline with Experimental Evaluation

Textual data used to train large language models (LLMs) exhibits multifaceted bias manifestations encompassing harmful language and skewed demographic distributions. Regulations such as the European AI Act require identifying and mitigating biases against protected groups in data, with the ultimate goal of preventing unfair model outputs. However, practical guidance and operationalization are lacking. We propose a comprehensive data bias detection and mitigation pipeline comprising four components that address two data bias types, namely representation bias and (explicit) stereotypes for a configurable sensitive attribute. First, we leverage LLM-generated word lists created based on quality criteria to detect relevant group labels. Second, representation bias is quantified using the Demographic Representation Score. Third, we detect and mitigate stereotypes using sociolinguistically informed filtering. Finally, we compensate representation bias through Grammar- and Context-Aware Counterfactual Data Augmentation. We conduct a two-fold evaluation using the examples of gender, religion and age. First, the effectiveness of each individual component on data debiasing is evaluated through human validation and baseline comparison. The findings demonstrate that we successfully reduce representation bias and (explicit) stereotypes in a text dataset. Second, the effect of data debiasing on model bias reduction is evaluated by bias benchmarking of several models (0.6B-8B parameters), fine-tuned on the debiased text dataset. This evaluation reveals that LLMs fine-tuned on debiased data do not consistently show improved performance on bias benchmarks, exposing critical gaps in current evaluation methodologies and highlighting the need for targeted data manipulation to address manifested model bias.
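
The abstract does not spell out the Demographic Representation Score, so the following is only a hedged sketch of the general idea: count how often each group is mentioned and normalize to a distribution. The lexicon, token matching, and normalization are all illustrative assumptions, not the paper's definition.

```python
import re
from collections import Counter

def demographic_representation(texts, group_lexicon):
    """Toy representation measure: fraction of group-mentioning documents
    attributed to each group. The paper's actual score may compare this
    distribution against a reference in a different way."""
    counts = Counter()
    for t in texts:
        tokens = set(re.findall(r"[a-z']+", t.lower()))
        for group, words in group_lexicon.items():
            if tokens & set(words):
                counts[group] += 1
    total = sum(counts.values()) or 1
    return {g: counts[g] / total for g in group_lexicon}

lex = {"female": ["she", "her"], "male": ["he", "his"]}  # toy lexicon
print(demographic_representation(["She left.", "He stayed.", "He won."], lex))
```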

Updated: 2025-12-11 15:18:59

标题: 文本数据偏差检测和缓解 - 具有实验评估的可扩展管道

摘要: 使用文本数据训练大型语言模型(LLMs)展现出多方面的偏见表现,包括有害语言和偏斜的人口统计分布。《欧洲人工智能法案》等法规要求在数据中识别和减轻对受保护群体的偏见,最终目标是防止不公平的模型输出。然而,缺乏实用指导和操作化。我们提出了一个全面的数据偏见检测和减轻管道,包括四个组件,涉及两种数据偏见类型,即代表性偏见和(明显的)刻板印象,针对可配置的敏感属性。首先,我们利用基于质量标准创建的LLM生成的单词列表来检测相关的群体标签。其次,使用人口统计代表得分来量化代表性偏见。第三,我们使用社会语言学知识进行过滤来检测和减轻刻板印象。最后,我们通过语法和上下文感知的反事实数据增强来补偿代表性偏见。我们使用性别、宗教和年龄的例子进行了双重评估。首先,通过人工验证和基准比较评估每个单独组件对数据去偏见的有效性。研究结果表明,我们成功在文本数据集中减少了代表性偏见和(明显的)刻板印象。其次,通过在去偏见文本数据集上微调的多个模型(0.6B-8B参数)进行偏见基准测试,评估了数据去偏见对模型偏见减少的影响。这项评估显示,在去偏见数据上微调的LLMs并不总是在偏见基准测试中表现出改善的性能,暴露了当前评估方法中的关键差距,并强调了有针对性的数据操作的必要性,以解决已显现的模型偏见。

更新时间: 2025-12-11 15:18:59

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2512.10734v1

TriHaRd: Higher Resilience for TEE Trusted Time

Accurately measuring time passing is critical for many applications. However, in Trusted Execution Environments (TEEs) such as Intel SGX, the time source is outside the Trusted Computing Base: a malicious host can manipulate the TEE's notion of time, jumping in time or affecting perceived time speed. Previous work (Triad) proposes protocols for TEEs to maintain a trustworthy time source by building a cluster of TEEs that collaborate with each other and with a remote Time Authority to maintain a continuous notion of passing time. However, such approaches still allow an attacker to control the operating system and arbitrarily manipulate their own TEE's perceived clock speed. An attacker can even propagate faster passage of time to honest machines participating in Triad's trusted time protocol, causing them to skip to timestamps arbitrarily far in the future. We propose TriHaRd, a TEE trusted time protocol achieving high resilience against clock speed and offset manipulations, notably through Byzantine-resilient clock updates and consistency checks. We empirically show that TriHaRd mitigates known attacks against Triad.

Updated: 2025-12-11 15:17:37

标题: TriHaRd:针对TEE可信时间的更高弹性

摘要: 准确测量时间的流逝对许多应用程序至关重要。然而,在可信执行环境(TEE)如Intel SGX中,时间来源在可信计算基础之外:恶意主机可以操纵TEE对时间的概念,跳跃时间或影响感知时间速度。先前的工作(Triad)提出了一种协议,用于TEE维护可信的时间来源,通过建立一个TEE集群,彼此合作,并与远程时间机构一起维护时间的连续概念。然而,这种方法仍然允许攻击者控制操作系统,并任意操纵自己TEE的感知时钟速度。攻击者甚至可以将时间的流逝传播得更快到参与Triad的可信机器,导致它们跳到任意远的未来时间戳。我们提出了TriHaRd,一种TEE可信时间协议,通过拜占庭容错时钟更新和一致性检查,实现高度抗时钟速度和偏移操纵。我们通过实验证明,TriHaRd可以减轻对Triad已知攻击。

更新时间: 2025-12-11 15:17:37

领域: cs.CR,cs.DC

下载: http://arxiv.org/abs/2512.10732v1

Generalized Spherical Neural Operators: Green's Function Formulation

Neural operators offer powerful approaches for solving parametric partial differential equations, but extending them to spherical domains remains challenging due to the need to preserve intrinsic geometry while avoiding distortions that break rotational consistency. Existing spherical operators rely on rotational equivariance but often lack the flexibility for real-world complexity. We propose a general operator-design framework based on the designable spherical Green's function and its harmonic expansion, establishing a solid operator-theoretic foundation for spherical learning. Based on this, we propose an absolute and relative position-dependent Green's function that enables a flexible balance of equivariance and invariance for real-world modeling. The resulting operator, the Green's-function Spherical Neural Operator (GSNO) with a novel spectral learning method, can adapt to anisotropic, constraint-rich systems while retaining spectral efficiency. To exploit GSNO, we develop GSHNet, a hierarchical architecture that combines multi-scale spectral modeling with spherical up-down sampling, enhancing global feature representation. In evaluations on diffusion MRI, shallow water dynamics, and global weather forecasting, GSNO and GSHNet consistently outperform state-of-the-art methods. Our results position GSNO as a principled and general framework for spherical operator learning, bridging rigorous theory with real-world complexity.

Updated: 2025-12-11 15:05:33

标题: 广义球形神经运算符:格林函数表述

摘要: 神经算子为解决参数化偏微分方程提供了强大的方法,但将其扩展到球形域仍然具有挑战性,因为需要保留固有几何形状,同时避免破坏旋转一致性的扭曲。现有的球形算子依赖于旋转等变性,但通常缺乏处理真实世界复杂性的灵活性。我们提出了一个基于可设计的球形Green函数及其谐波展开的通用算子设计框架,为球形学习建立了坚实的算子理论基础。基于此,我们提出了一种绝对和相对位置依赖的Green函数,可以灵活平衡等变性和不变性,用于真实世界建模。由此产生的算子,Green函数球形神经算子(GSNO)具有一种新颖的谱学习方法,可以适应各向异性、约束丰富的系统,同时保留谱效率。为了利用GSNO,我们开发了GSHNet,这是一种结合多尺度谱建模和球形上下采样的分层架构,增强了全局特征表示。在扩散MRI、浅水动力学和全球天气预测等方面的评估中,GSNO和GSHNet始终优于最先进的方法。我们的结果将GSNO定位为一个有原则且通用的球形算子学习框架,将严格的理论与真实世界复杂性联系起来。

更新时间: 2025-12-11 15:05:33

领域: cs.LG

下载: http://arxiv.org/abs/2512.10723v1

Beyond the Black Box: Identifiable Interpretation and Control in Generative Models via Causal Minimality

Deep generative models, while revolutionizing fields like image and text generation, largely operate as opaque black boxes, hindering human understanding, control, and alignment. While methods like sparse autoencoders (SAEs) show remarkable empirical success, they often lack theoretical guarantees, risking subjective insights. Our primary objective is to establish a principled foundation for interpretable generative models. We demonstrate that the principle of causal minimality -- favoring the simplest causal explanation -- can endow the latent representations of diffusion vision and autoregressive language models with clear causal interpretation and robust, component-wise identifiable control. We introduce a novel theoretical framework for hierarchical selection models, where higher-level concepts emerge from the constrained composition of lower-level variables, better capturing the complex dependencies in data generation. Under theoretically derived minimality conditions (manifesting as sparsity or compression constraints), we show that learned representations can be equivalent to the true latent variables of the data-generating process. Empirically, applying these constraints to leading generative models allows us to extract their innate hierarchical concept graphs, offering fresh insights into their internal knowledge organization. Furthermore, these causally grounded concepts serve as levers for fine-grained model steering, paving the way for transparent, reliable systems.

Updated: 2025-12-11 14:59:14

标题: 超越黑匣子:通过因果最小性在生成模型中进行可识别的解释和控制

摘要: 深度生成模型,虽然在图像和文本生成领域产生了革命性的影响,但主要作为不透明的黑匣子运行,阻碍了人类的理解、控制和对齐。虽然像稀疏自编码器(SAEs)这样的方法表现出令人瞩目的实证成功,但它们经常缺乏理论保证,存在主观洞察的风险。我们的主要目标是为可解释的生成模型建立一个有原则的基础。我们展示了因果最小性原则——支持最简单的因果解释——可以赋予扩散视觉和自回归语言模型的潜在表示具有清晰的因果解释和稳健的、组件化可识别的控制。我们引入了一个新颖的层次选择模型的理论框架,其中更高级别的概念是从较低级别变量的受限组合中产生的,更好地捕捉数据生成中的复杂依赖关系。在理论推导的最小性条件下(表现为稀疏性或压缩约束),我们展示了学习得到的表示可以等同于数据生成过程的真实潜在变量。在实证方面,将这些约束应用于领先的生成模型使我们能够提取它们固有的层次概念图,为我们提供对其内部知识组织的新见解。此外,这些因果基础的概念作为微观模型控制的杠杆,为透明、可靠的系统铺平道路。

更新时间: 2025-12-11 14:59:14

领域: cs.LG

下载: http://arxiv.org/abs/2512.10720v1

ENMA: Tokenwise Autoregression for Generative Neural PDE Operators

Solving time-dependent parametric partial differential equations (PDEs) remains a fundamental challenge for neural solvers, particularly when generalizing across a wide range of physical parameters and dynamics. When data is uncertain or incomplete-as is often the case-a natural approach is to turn to generative models. We introduce ENMA, a generative neural operator designed to model spatio-temporal dynamics arising from physical phenomena. ENMA predicts future dynamics in a compressed latent space using a generative masked autoregressive transformer trained with flow matching loss, enabling tokenwise generation. Irregularly sampled spatial observations are encoded into uniform latent representations via attention mechanisms and further compressed through a spatio-temporal convolutional encoder. This allows ENMA to perform in-context learning at inference time by conditioning on either past states of the target trajectory or auxiliary context trajectories with similar dynamics. The result is a robust and adaptable framework that generalizes to new PDE regimes and supports one-shot surrogate modeling of time-dependent parametric PDEs.

Updated: 2025-12-11 14:55:09

标题: ENMA: 基于Token的自回归生成神经PDE操作符

摘要: 解决依赖于时间的参数偏微分方程(PDEs)仍然是神经求解器面临的一个基本挑战,特别是在泛化到各种物理参数和动态范围时。当数据是不确定或不完整的时候,一个自然的方法是转向生成模型。我们介绍了ENMA,一个设计用于建模由物理现象引起的时空动态的生成神经运算符。ENMA使用经过流匹配损失训练的生成掩码自回归变换器在压缩的潜在空间中预测未来动态,从而实现逐标记生成。不规则采样的空间观测通过注意机制编码为统一的潜在表示,并通过时空卷积编码器进一步压缩。这使得ENMA能够在推断时进行上下文学习,通过在目标轨迹的过去状态或具有类似动态的辅助上下文轨迹进行调节。结果是一个稳健且可适应的框架,可以泛化到新的PDE体制,并支持一次性建模依赖于时间的参数PDEs。

更新时间: 2025-12-11 14:55:09

领域: cs.LG

下载: http://arxiv.org/abs/2506.06158v3

PACIFIC: a framework for generating benchmarks to check Precise Automatically Checked Instruction Following In Code

Large Language Model (LLM)-based code assistants have emerged as a powerful application of generative AI, demonstrating impressive capabilities in code generation and comprehension. A key requirement for these systems is their ability to accurately follow user instructions. We present Precise Automatically Checked Instruction Following In Code (PACIFIC), a novel framework designed to automatically generate benchmarks that rigorously assess sequential instruction-following and code dry-running capabilities in LLMs, while allowing control over benchmark difficulty. PACIFIC produces benchmark variants with clearly defined expected outputs, enabling straightforward and reliable evaluation through simple output comparisons. In contrast to existing approaches that often rely on tool usage or agentic behavior, our work isolates and evaluates the LLM's intrinsic ability to reason through code behavior step-by-step without execution (dry running) and to follow instructions. Furthermore, our framework mitigates training data contamination by facilitating effortless generation of novel benchmark variations. We validate our framework by generating a suite of benchmarks spanning a range of difficulty levels and evaluating multiple state-of-the-art LLMs. Our results demonstrate that PACIFIC can produce increasingly challenging benchmarks that effectively differentiate instruction-following and dry running capabilities, even among advanced models. Overall, our framework offers a scalable, contamination-resilient methodology for assessing core competencies of LLMs in code-related tasks.
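
Because benchmark variants come with clearly defined expected outputs, evaluation reduces to a direct output comparison, roughly as below; the whitespace normalization and the example task are our assumptions for illustration.

```python
def check_instruction_following(model_output: str, expected: str) -> bool:
    """PACIFIC-style scoring: the model's final answer is compared against
    a precomputed deterministic expected output."""
    return model_output.strip() == expected.strip()

# Hypothetical task: "dry-run this code and report the final value of x".
expected_output = "42"
print(check_instruction_following(" 42\n", expected_output))  # True
```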

Updated: 2025-12-11 14:49:56

标题: PACIFIC:生成基准的框架,用于检查代码中精确自动检查的指令跟随

摘要: 基于大型语言模型(LLM)的代码助手已被证明是生成式人工智能强大应用的一个重要领域,展现出在代码生成和理解方面的令人印象深刻的能力。这些系统的一个关键要求是它们能够准确地遵循用户指令。我们提出了一种新颖的框架——Precise Automatically Checked Instruction Following In Code(PACIFIC),旨在自动生成严格评估LLM中顺序指令遵循和代码干运行能力的基准,同时允许对基准难度进行控制。PACIFIC生成具有清晰定义的期望输出的基准变体,通过简单的输出比较实现直观可靠的评估。与现有方法不同,这些方法通常依赖于工具使用或主动行为,我们的工作隔离并评估LLM无需执行(干运行)就能逐步推理代码行为的内在能力,以及遵循指令的能力。此外,我们的框架通过促进轻松生成新的基准变体,减少了训练数据污染。我们通过生成一系列涵盖多种难度级别的基准,并评估多个最先进的LLM来验证我们的框架。我们的结果表明,PACIFIC能够生成越来越具有挑战性的基准,有效区分指令遵循和干运行能力,即使对于先进模型也是如此。总的来说,我们的框架为评估LLM在与代码相关任务中的核心能力提供了一种可扩展、抗污染的方法论。

更新时间: 2025-12-11 14:49:56

领域: cs.SE,cs.AI

下载: http://arxiv.org/abs/2512.10713v1

AI-Newton: A Concept-Driven Physical Law Discovery System without Prior Physical Knowledge

While current AI-driven methods excel at deriving empirical models from individual experiments, a significant challenge remains in uncovering the common fundamental physics that underlie these models -- a task at which human physicists are adept. To bridge this gap, we introduce AI-Newton, a novel framework for concept-driven scientific discovery. Our system autonomously derives general physical laws directly from raw, multi-experiment data, operating without supervision or prior physical knowledge. Its core innovations are twofold: (1) proposing interpretable physical concepts to construct laws, and (2) progressively generalizing these laws to broader domains. Applied to a large, noisy dataset of mechanics experiments, AI-Newton successfully rediscovers foundational and universal laws, such as Newton's second law, the conservation of energy, and the universal gravitation. This work represents a significant advance toward autonomous, human-like scientific discovery.

Updated: 2025-12-11 14:46:15

标题: AI-Newton:一种基于概念驱动的物理定律发现系统,无需先前的物理知识

摘要: 当前的人工智能驱动方法擅长从单个实验中推导经验模型,但揭示这些模型背后的共同基本物理仍然是一个重要挑战——这是人类物理学家擅长的任务。为了弥合这一差距,我们引入了AI-Newton,这是一个用于概念驱动科学发现的新型框架。我们的系统可以自主从原始的多实验数据中直接推导出一般物理定律,而无需监督或先前的物理知识。其核心创新点有两个:(1)提出可解释的物理概念来构建定律,以及(2)逐渐将这些定律推广到更广泛的领域。应用于一个大规模、嘈杂的力学实验数据集,AI-Newton成功地重新发现了基础和普适的定律,如牛顿第二定律、能量守恒定律和普遍引力定律。这项工作代表了朝着自主、类似人类的科学发现迈出的重要一步。

更新时间: 2025-12-11 14:46:15

领域: cs.AI,cs.LG,cs.SC,hep-ph,physics.class-ph

下载: http://arxiv.org/abs/2504.01538v2

COMPARE: Clinical Optimization with Modular Planning and Assessment via RAG-Enhanced AI-OCT: Superior Decision Support for Percutaneous Coronary Intervention Compared to ChatGPT-5 and Junior Operators

Background: While intravascular imaging, particularly optical coherence tomography (OCT), improves percutaneous coronary intervention (PCI) outcomes, its interpretation is operator-dependent. General-purpose artificial intelligence (AI) shows promise but lacks domain-specific reliability. We evaluated the performance of CA-GPT, a novel large model deployed on an AI-OCT system, against that of the general-purpose ChatGPT-5 and junior physicians for OCT-guided PCI planning and assessment. Methods: In this single-center analysis of 96 patients who underwent OCT-guided PCI, the procedural decisions generated by the CA-GPT, ChatGPT-5, and junior physicians were compared with an expert-derived procedural record. Agreement was assessed using ten pre-specified metrics across pre-PCI and post-PCI phases. Results: For pre-PCI planning, CA-GPT demonstrated significantly higher median agreement scores (5[IQR 3.75-5]) compared to both ChatGPT-5 (3[2-4], P<0.001) and junior physicians (4[3-4], P<0.001). CA-GPT significantly outperformed ChatGPT-5 across all individual pre-PCI metrics and showed superior performance to junior physicians in stent diameter (90.3% vs. 72.2%, P<0.05) and length selection (80.6% vs. 52.8%, P<0.01). In post-PCI assessment, CA-GPT maintained excellent overall agreement (5[4.75-5]), significantly higher than both ChatGPT-5 (4[4-5], P<0.001) and junior physicians (5[4-5], P<0.05). Subgroup analysis confirmed CA-GPT's robust performance advantage in complex scenarios. Conclusion: The CA-GPT-based AI-OCT system achieved superior decision-making agreement versus a general-purpose large language model and junior physicians across both PCI planning and assessment phases. This approach provides a standardized and reliable method for intravascular imaging interpretation, demonstrating significant potential to augment operator expertise and optimize OCT-guided PCI.

Updated: 2025-12-11 14:41:37

标题: 对比:通过RAG增强AI-OCT的模块化规划和评估进行临床优化:与ChatGPT-5和初级操作员相比,对经皮冠状动脉介入的优越决策支持

摘要: 背景:尽管血管内成像,特别是光学相干断层扫描(OCT),改善了经皮冠状动脉介入(PCI)的结果,但其解释取决于操作者。通用人工智能(AI)显示出潜力,但缺乏领域特定的可靠性。我们评估了CA-GPT,一种新型大型模型在AI-OCT系统上部署,针对通用的ChatGPT-5和初级医生进行OCT引导PCI规划和评估的性能。 方法:在这项对96名接受OCT引导PCI的患者进行的单中心分析中,CA-GPT、ChatGPT-5和初级医生生成的程序决策与专家制定的程序记录进行了比较。使用十个预先指定的指标评估了在PCI前后阶段的协议。 结果:对于PCI规划,CA-GPT显示出显著更高的中位一致性得分(5[IQR 3.75-5]),与ChatGPT-5(3[2-4],P<0.001)和初级医生(4[3-4],P<0.001)相比。CA-GPT在所有个体PCI前指标方面明显优于ChatGPT-5,并在支架直径(90.3% vs. 72.2%,P<0.05)和长度选择(80.6% vs. 52.8%,P<0.01)方面表现优越于初级医生。在PCI后评估中,CA-GPT保持了卓越的整体一致性(5[4.75-5]),显著高于ChatGPT-5(4[4-5],P<0.001)和初级医生(5[4-5],P<0.05)。亚组分析证实了CA-GPT在复杂情况下的稳健性能优势。 结论:基于CA-GPT的AI-OCT系统在PCI规划和评估阶段的决策一致性优于通用的大型语言模型和初级医生。这种方法提供了一种标准化和可靠的方法来解释血管内成像,显示出增强操作者专业知识和优化OCT引导PCI的重要潜力。

更新时间: 2025-12-11 14:41:37

领域: cs.AI

下载: http://arxiv.org/abs/2512.10702v1

HybridVFL: Disentangled Feature Learning for Edge-Enabled Vertical Federated Multimodal Classification

Vertical Federated Learning (VFL) offers a privacy-preserving paradigm for Edge AI scenarios like mobile health diagnostics, where sensitive multimodal data reside on distributed, resource-constrained devices. Yet, standard VFL systems often suffer performance limitations due to simplistic feature fusion. This paper introduces HybridVFL, a novel framework designed to overcome this bottleneck by employing client-side feature disentanglement paired with a server-side cross-modal transformer for context-aware fusion. Through systematic evaluation on the multimodal HAM10000 skin lesion dataset, we demonstrate that HybridVFL significantly outperforms standard federated baselines, validating the criticality of advanced fusion mechanisms in robust, privacy-preserving systems.
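
A minimal PyTorch sketch of this split, under assumed dimensions and layer counts (the paper's exact architecture may differ): each client disentangles its modality into shared and specific parts, and the server fuses the resulting tokens with a cross-modal transformer.

```python
import torch
import torch.nn as nn

class ClientEncoder(nn.Module):
    """Client-side disentanglement into shared and modality-specific features."""
    def __init__(self, in_dim, dim=64):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(in_dim, dim), nn.ReLU())
        self.specific = nn.Sequential(nn.Linear(in_dim, dim), nn.ReLU())

    def forward(self, x):
        return self.shared(x), self.specific(x)

class ServerFusion(nn.Module):
    """Server-side cross-modal transformer over the uploaded client tokens."""
    def __init__(self, dim=64, n_classes=7):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, tokens):               # tokens: (B, n_tokens, dim)
        return self.head(self.fusion(tokens).mean(dim=1))

img_client, meta_client = ClientEncoder(128), ClientEncoder(16)
server = ServerFusion()
img, meta = torch.randn(8, 128), torch.randn(8, 16)   # raw data stays on clients
tokens = torch.stack([*img_client(img), *meta_client(meta)], dim=1)  # (8, 4, 64)
print(server(tokens).shape)                            # torch.Size([8, 7])
```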

Updated: 2025-12-11 14:41:19

标题: 混合VFL:面向边缘的垂直联邦多模态分类的特征解耦学习

摘要: Vertical Federated Learning(VFL)为移动健康诊断等边缘AI场景提供了一种保护隐私的范式,其中敏感的多模态数据存储在分布式、资源受限的设备上。然而,标准的VFL系统通常由于简单的特征融合而遭受性能限制。本文介绍了HybridVFL,这是一个新颖的框架,旨在通过采用客户端特征解缠合并服务器端的跨模态变换器进行上下文感知融合,以克服这一瓶颈。通过对多模态HAM10000皮肤病变数据集的系统评估,我们证明了HybridVFL明显优于标准的联邦基线,验证了在强大的、保护隐私系统中先进融合机制的关键性。

更新时间: 2025-12-11 14:41:19

领域: cs.LG

下载: http://arxiv.org/abs/2512.10701v1

How to Brake? Ethical Emergency Braking with Deep Reinforcement Learning

Connected and automated vehicles (CAVs) have the potential to enhance driving safety, for example by enabling safe vehicle following and more efficient traffic scheduling. For such future deployments, safety requirements must be addressed, the primary ones being avoidance of vehicle collisions and substantial mitigation of harm when collisions are unavoidable. However, conservative worst-case-based control strategies come at the price of reduced flexibility and may compromise overall performance. In light of this, we investigate how Deep Reinforcement Learning (DRL) can be leveraged to improve safety in multi-vehicle-following scenarios involving emergency braking. Specifically, we investigate how DRL with vehicle-to-vehicle communication can be used to ethically select an emergency braking profile in scenarios where overall, or collective, three-vehicle harm reduction or collision avoidance is sought instead of single-vehicle outcomes. As an algorithm, we provide a hybrid approach that combines DRL with a previously published method based on analytical expressions for selecting optimal constant deceleration. By combining DRL with the previous method, the proposed hybrid approach increases reliability compared to standalone DRL, while achieving superior performance in terms of overall harm reduction and collision avoidance.
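
A minimal sketch of the analytic component under assumed point-mass dynamics (the numbers, the harm proxy, and the obstacle term are illustrative, not the paper's model): grid-search the lead vehicle's constant deceleration so that collective three-vehicle harm, rather than the lead vehicle's own outcome, is minimized.

```python
import numpy as np

def impact_speed(gap, v_r, v_f, a_r, a_f, dt=0.01, T=10.0):
    """Relative speed at first contact between two braking vehicles (0 if none)."""
    x_r, x_f = 0.0, gap
    for _ in range(int(T / dt)):
        v_r, v_f = max(v_r - a_r * dt, 0.0), max(v_f - a_f * dt, 0.0)
        x_r, x_f = x_r + v_r * dt, x_f + v_f * dt
        if x_r >= x_f:
            return max(v_r - v_f, 0.0)
    return 0.0

v0, gap, obstacle = 25.0, 8.0, 40.0   # speeds (m/s), spacing (m), obstacle ahead (m)
a_rear, a_mid = 4.0, 7.0              # braking capability of the two followers (m/s^2)

def collective_harm(a_lead):          # summed impact speeds as a harm proxy
    return (impact_speed(gap, v0, v0, a_rear, a_mid)
            + impact_speed(gap, v0, v0, a_mid, a_lead)
            + np.sqrt(max(v0**2 - 2 * a_lead * obstacle, 0.0)))

best = min((collective_harm(a), a) for a in np.linspace(2.0, 9.0, 29))
print(f"min collective harm {best[0]:.2f} m/s at lead deceleration {best[1]:.2f} m/s^2")
# Braking harder protects the lead vehicle but raises rear-end impact speeds;
# the ethical profile trades these off. In the hybrid approach, DRL proposes
# profiles and this analytic selection acts as a reliable fallback.
```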

Updated: 2025-12-11 14:40:33

标题: 如何刹车?利用深度强化学习进行道德紧急制动

摘要: Connected and automated vehicles (CAVs)具有提高驾驶安全性的潜力,例如通过实现安全的车辆跟随和更有效的交通调度。对于这样的未来部署,应该解决安全要求,其中主要是避免车辆碰撞和在碰撞不可避免时减少伤害。然而,保守的最坏情况控制策略会以降低灵活性为代价,可能会损害整体性能。基于此,我们调查了如何利用深度强化学习(DRL)来改善多车辆跟随场景中的紧急制动安全性。具体来说,我们研究了如何利用车辆间通信的DRL来在情况需要整体或集体三车辆减少损害或避免碰撞时,道德上选择紧急制动配置。作为一种算法,我们提供了一种混合方法,将DRL与基于分析表达式选择最佳恒定减速度的先前发布的方法相结合。通过将DRL与先前的方法结合使用,所提出的混合方法提高了可靠性,同时在整体减少损害和避免碰撞方面实现了卓越的性能。

更新时间: 2025-12-11 14:40:33

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2512.10698v1

Remember Me, Refine Me: A Dynamic Procedural Memory Framework for Experience-Driven Agent Evolution

Procedural memory enables large language model (LLM) agents to internalize "how-to" knowledge, theoretically reducing redundant trial-and-error. However, existing frameworks predominantly suffer from a "passive accumulation" paradigm, treating memory as a static append-only archive. To bridge the gap between static storage and dynamic reasoning, we propose $\textbf{ReMe}$ ($\textit{Remember Me, Refine Me}$), a comprehensive framework for experience-driven agent evolution. ReMe innovates across the memory lifecycle via three mechanisms: 1) $\textit{multi-faceted distillation}$, which extracts fine-grained experiences by recognizing success patterns, analyzing failure triggers and generating comparative insights; 2) $\textit{context-adaptive reuse}$, which tailors historical insights to new contexts via scenario-aware indexing; and 3) $\textit{utility-based refinement}$, which autonomously adds valid memories and prunes outdated ones to maintain a compact, high-quality experience pool. Extensive experiments on BFCL-V3 and AppWorld demonstrate that ReMe establishes a new state-of-the-art in agent memory systems. Crucially, we observe a significant memory-scaling effect: Qwen3-8B equipped with ReMe outperforms the larger, memoryless Qwen3-14B, suggesting that self-evolving memory provides a computation-efficient pathway for lifelong learning. We release our code and the $\texttt{reme.library}$ dataset to facilitate further research.
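
A minimal sketch of the utility-based refinement loop (hypothetical scoring rule and names, not the released reme.library code): memories gain utility when their reuse succeeds and are pruned once their utility decays below a threshold.

```python
from dataclasses import dataclass

@dataclass
class Memory:
    insight: str
    scenario: str          # index key for scenario-aware (context-adaptive) reuse
    utility: float = 1.0

class ProceduralMemory:
    def __init__(self, prune_below=0.3, decay=0.9):
        self.pool: list[Memory] = []
        self.prune_below, self.decay = prune_below, decay

    def distill(self, insight, scenario):       # add a distilled experience
        self.pool.append(Memory(insight, scenario))

    def reuse(self, scenario):                  # retrieve insights for a context
        return [m for m in self.pool if m.scenario == scenario]

    def refine(self, memory, succeeded):        # utility update, then pruning
        memory.utility = memory.utility * self.decay + (1.0 if succeeded else 0.0)
        self.pool = [m for m in self.pool if m.utility >= self.prune_below]

mem = ProceduralMemory()
mem.distill("validate tool arguments against the schema first", scenario="tool-use")
m = mem.reuse("tool-use")[0]
mem.refine(m, succeeded=True)
print(len(mem.pool), round(m.utility, 2))       # 1 1.9
```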

Updated: 2025-12-11 14:40:01

标题: 记住我,完善我:一种面向经验驱动的智能体进化的动态程序化记忆框架

摘要: 程序性记忆使大型语言模型(LLM)代理能够内化“如何”知识,从理论上减少了冗余的试错过程。然而,现有框架主要受到“被动积累”范式的困扰,将记忆视为静态的追加式存档。为了弥合静态存储和动态推理之间的差距,我们提出了$\textbf{ReMe}$($\textit{记住我,完善我}$),这是一个基于经验驱动的代理进化的综合框架。ReMe通过三种机制创新地跨越了记忆生命周期:1)$\textit{多方面蒸馏}$,通过识别成功模式、分析失败触发器和生成比较性见解,提取细粒度的经验;2)$\textit{上下文自适应重用}$,通过情景感知索引,将历史见解量身定制到新的情境中;以及3)$\textit{基于效用的完善}$,自主地添加有效的记忆并修剪过时的记忆,以维护一个紧凑且高质量的经验池。在BFCL-V3和AppWorld上进行的大量实验表明,ReMe在代理记忆系统中建立了一个新的最先进水平。至关重要的是,我们观察到了显著的记忆扩展效应:配备ReMe的Qwen3-8B优于更大的无记忆Qwen3-14B,这表明自我进化的记忆为终身学习提供了一个计算高效的途径。我们发布了我们的代码和$\texttt{reme.library}$数据集,以促进进一步的研究。

更新时间: 2025-12-11 14:40:01

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2512.10696v1

Enhancing Radiology Report Generation and Visual Grounding using Reinforcement Learning

Recent advances in vision-language models (VLMs) have improved Chest X-ray (CXR) interpretation in multiple aspects. However, many medical VLMs rely solely on supervised fine-tuning (SFT), which optimizes next-token prediction without evaluating answer quality. In contrast, reinforcement learning (RL) can incorporate task-specific feedback, and its combination with explicit intermediate reasoning ("thinking") has demonstrated substantial gains on verifiable math and coding tasks. To investigate the effects of RL and thinking in a CXR VLM, we perform large-scale SFT on CXR data to build an updated RadVLM based on Qwen3-VL, followed by a cold-start SFT stage that equips the model with basic thinking ability. We then apply Group Relative Policy Optimization (GRPO) with clinically grounded, task-specific rewards for report generation and visual grounding, and run matched RL experiments on both domain-specific and general-domain Qwen3-VL variants, with and without thinking. Across these settings, we find that while strong SFT remains crucial for high base performance, RL provides additional gains on both tasks, whereas explicit thinking does not appear to further improve results. Under a unified evaluation pipeline, the RL-optimized RadVLM models outperform their baseline counterparts and reach state-of-the-art performance on both report generation and grounding, highlighting clinically aligned RL as a powerful complement to SFT for medical VLMs.
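
For reference, a minimal sketch of the group-relative advantage at the core of GRPO (generic form; the clinically grounded reward functions themselves are specific to the paper): several responses are sampled per prompt, scored, and normalized within their group.

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: (n_prompts, group_size) task-specific reward scores."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)   # group-relative advantage

# E.g. report-generation rewards for 2 prompts, 4 sampled reports each:
rewards = torch.tensor([[0.2, 0.8, 0.5, 0.5],
                        [0.1, 0.1, 0.9, 0.3]])
print(grpo_advantages(rewards))
# Positive entries up-weight responses that beat their group's average; the
# policy gradient scales token log-probabilities by these advantages.
```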

Updated: 2025-12-11 14:36:14

标题: 利用强化学习提升放射学报告生成和视觉定位

摘要: 视觉语言模型(VLMs)的最新进展在多个方面提高了胸部X射线(CXR)的解释能力。然而,许多医学VLMs仅依赖于监督微调(SFT),这种微调只优化下一个标记的预测,而不评估答案质量。相比之下,强化学习(RL)可以融入任务特定的反馈,其与明确的中间推理(“思考”)的结合已经在可验证的数学和编码任务中取得了实质性的收益。为了研究RL和思考在CXR VLM中的影响,我们对CXR数据进行大规模SFT,构建了基于Qwen3-VL的更新版RadVLM,然后进行冷启动SFT阶段,为模型配备基本的思考能力。随后,我们应用组相对策略优化(GRPO),针对报告生成和视觉定位采用具有临床依据的任务特定奖励,并在领域特定和通用领域的Qwen3-VL变体上进行匹配的RL实验(分别考察有无思考两种设置)。在这些设置中,我们发现,虽然强大的SFT对于高基准性能仍然至关重要,但RL在两个任务上都带来了额外的收益,而显式的思考似乎并没有进一步改善结果。在统一的评估管道下,经过RL优化的RadVLM模型胜过其基线对应模型,并在报告生成和定位两个方面达到了最先进的性能,突显了与临床对齐的RL是对医学VLM监督微调的强大补充。

更新时间: 2025-12-11 14:36:14

领域: cs.AI,cs.CV

下载: http://arxiv.org/abs/2512.10691v1

Rethinking Popularity Bias in Collaborative Filtering via Analytical Vector Decomposition

Popularity bias fundamentally undermines the personalization capabilities of collaborative filtering (CF) models, causing them to disproportionately recommend popular items while neglecting users' genuine preferences for niche content. While existing approaches treat this as an external confounding factor, we reveal that popularity bias is an intrinsic geometric artifact of Bayesian Pairwise Ranking (BPR) optimization in CF models. Through rigorous mathematical analysis, we prove that BPR systematically organizes item embeddings along a dominant "popularity direction" where embedding magnitudes directly correlate with interaction frequency. This geometric distortion forces user embeddings to simultaneously handle two conflicting tasks-expressing genuine preference and calibrating against global popularity-trapping them in suboptimal configurations that favor popular items regardless of individual tastes. We propose Directional Decomposition and Correction (DDC), a universally applicable framework that surgically corrects this embedding geometry through asymmetric directional updates. DDC guides positive interactions along personalized preference directions while steering negative interactions away from the global popularity direction, disentangling preference from popularity at the geometric source. Extensive experiments across multiple BPR-based architectures demonstrate that DDC significantly outperforms state-of-the-art debiasing methods, reducing training loss to less than 5% of heavily-tuned baselines while achieving superior recommendation quality and fairness. Code is available in https://github.com/LingFeng-Liu-AI/DDC.
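
The geometric claim can be checked with a few lines of NumPy (an illustrative diagnosis, not the DDC update rule): on synthetic embeddings organized along a hidden popularity axis, the interaction-weighted mean recovers that axis, and removing the component along it decouples embedding magnitude from popularity.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, dim = 1000, 32
counts = rng.zipf(2.0, n_items).astype(float)        # long-tailed interaction counts
u = rng.normal(size=dim); u /= np.linalg.norm(u)     # hidden popularity direction
E = rng.normal(size=(n_items, dim)) + np.log(counts)[:, None] * u

# Estimate the popularity direction as the interaction-weighted mean embedding.
p = (counts[:, None] * E).sum(0)
p /= np.linalg.norm(p)

for name, M in [("raw", E), ("p-component removed", E - np.outer(E @ p, p))]:
    r = np.corrcoef(np.linalg.norm(M, axis=1), np.log(counts))[0, 1]
    print(f"{name}: corr(embedding norm, log popularity) = {r:.3f}")
# The raw embeddings show the magnitude-popularity coupling the paper proves
# for BPR; DDC instead corrects the geometry asymmetrically during training.
```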

Updated: 2025-12-11 14:35:13

标题: 通过分析向量分解重新思考协同过滤中的流行度偏见

摘要: 流行度偏差根本破坏了协同过滤(CF)模型的个性化能力,导致它们在推荐项目时过分偏向流行项目,而忽视用户对利基内容的真实偏好。虽然现有方法将其视为外部混淆因素,但我们揭示了流行度偏差是CF模型中贝叶斯对偶排序(BPR)优化的固有几何特征。通过严格的数学分析,我们证明BPR系统地沿着主导的“流行度方向”组织项目嵌入,其中嵌入幅度与交互频率直接相关。这种几何失真迫使用户嵌入同时处理两个冲突的任务——表达真实偏好和校准全局流行度,将它们困在有利于流行项目的次优配置中,而不考虑个体口味。我们提出了方向分解和校正(DDC),这是一个通用的框架,通过非对称方向更新对嵌入几何进行外科矫正。DDC引导正面交互沿着个性化偏好方向进行,同时将负面交互从全局流行度方向中移开,从几何源头解开偏好和流行度。在多个基于BPR的架构上进行的大量实验表明,DDC明显优于最先进的去偏方法,将训练损失降低到经过精细调整的基线的不到5%,同时实现更高的推荐质量和公平性。代码可在https://github.com/LingFeng-Liu-AI/DDC中找到。

更新时间: 2025-12-11 14:35:13

领域: cs.IR,cs.AI

下载: http://arxiv.org/abs/2512.10688v1

Challenges of Evaluating LLM Safety for User Welfare

Safety evaluations of large language models (LLMs) typically focus on universal risks like dangerous capabilities or undesirable propensities. However, millions use LLMs for personal advice on high-stakes topics like finance and health, where harms are context-dependent rather than universal. While frameworks like the OECD's AI classification recognize the need to assess individual risks, user-welfare safety evaluations remain underdeveloped. We argue that developing such evaluations is non-trivial due to fundamental questions about accounting for user context in evaluation design. In this exploratory study, we evaluated advice on finance and health from GPT-5, Claude Sonnet 4, and Gemini 2.5 Pro across user profiles of varying vulnerability. First, we demonstrate that evaluators must have access to rich user context: identical LLM responses were rated significantly safer by context-blind evaluators than by those aware of user circumstances, with safety scores for high-vulnerability users dropping from safe (5/7) to somewhat unsafe (3/7). One might assume this gap could be addressed by creating realistic user prompts containing key contextual information. However, our second study challenges this: we rerun the evaluation on prompts containing context users report they would disclose, finding no significant improvement. Our work establishes that effective user-welfare safety evaluation requires evaluators to assess responses against diverse user profiles, as realistic user context disclosure alone proves insufficient, particularly for vulnerable populations. By demonstrating a methodology for context-aware evaluation, this study provides both a starting point for such assessments and foundational evidence that evaluating individual welfare demands approaches distinct from existing universal-risk frameworks. We publish our code and dataset to aid future developments.

Updated: 2025-12-11 14:34:40

标题: 评估LLM安全性对用户福利的挑战

摘要: 大型语言模型(LLMs)的安全评估通常专注于危险能力或不良倾向等普遍风险。然而,数百万人在高风险主题如金融和健康上寻求LLMs的个人建议,这些伤害是依赖于上下文而不是普遍存在的。尽管像OECD的AI分类这样的框架认识到需要评估个体风险,但用户福利安全评估仍未得到充分发展。我们认为,由于在评估设计中考虑用户上下文的基本问题,开发这种评估并不容易。在这项探索性研究中,我们评估了GPT-5、Claude Sonnet 4和Gemini 2.5 Pro对金融和健康建议在不同脆弱性用户档案下的表现。首先,我们证明评估者必须能够获得丰富的用户上下文:相同的LLM回答在无视上下文的评估者评分明显更安全,而在了解用户情况的评估者手中,高脆弱性用户的安全得分从安全(5/7)降至有些不安全(3/7)。人们可能认为这种差距可以通过创建包含关键上下文信息的真实用户提示来解决。然而,我们的第二项研究挑战了这一点:我们重新对包含用户报告会透露的上下文的提示进行评估,结果没有显著改善。我们的工作建立了有效的用户福利安全评估需要评估者根据不同用户档案评估回答,因为仅仅依靠真实用户上下文的披露是不够的,特别是对于脆弱人群。通过展示一种上下文感知评估方法,这项研究为此类评估提供了一个起点,同时也提供了评估个体福利需要与现有普遍风险框架不同的方法的基础证据。我们公开我们的代码和数据集以帮助未来发展。

更新时间: 2025-12-11 14:34:40

领域: cs.AI,cs.CY

下载: http://arxiv.org/abs/2512.10687v1

Sharp Monocular View Synthesis in Less Than a Second

We present SHARP, an approach to photorealistic view synthesis from a single image. Given a single photograph, SHARP regresses the parameters of a 3D Gaussian representation of the depicted scene. This is done in less than a second on a standard GPU via a single feedforward pass through a neural network. The 3D Gaussian representation produced by SHARP can then be rendered in real time, yielding high-resolution photorealistic images for nearby views. The representation is metric, with absolute scale, supporting metric camera movements. Experimental results demonstrate that SHARP delivers robust zero-shot generalization across datasets. It sets a new state of the art on multiple datasets, reducing LPIPS by 25-34% and DISTS by 21-43% versus the best prior model, while lowering the synthesis time by three orders of magnitude. Code and weights are provided at https://github.com/apple/ml-sharp

Updated: 2025-12-11 14:34:11

标题: 不到一秒钟内的清晰单眼视图合成

摘要: 我们提出了SHARP,一种从单个图像合成逼真视图的方法。给定一张照片,SHARP回归所描述场景的3D高斯表示的参数。通过神经网络的单次前向传递,在标准GPU上不到一秒的时间内完成。SHARP生成的3D高斯表示可以实时渲染,为附近的视图产生高分辨率的逼真图像。该表示是度量的,具有绝对尺度,支持度量相机移动。实验证明,SHARP在不同数据集上实现了强大的零样本泛化。它在多个数据集上创造了新的技术水平,将LPIPS降低了25-34%,DISTS降低了21-43%,同时将合成时间降低了三个数量级。代码和权重可在https://github.com/apple/ml-sharp找到。

更新时间: 2025-12-11 14:34:11

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2512.10685v1

Optimal transport unlocks end-to-end learning for single-molecule localization

Single-molecule localization microscopy (SMLM) allows reconstructing biology-relevant structures beyond the diffraction limit by detecting and localizing individual fluorophores -- fluorescent molecules stained onto the observed specimen -- over time to reconstruct super-resolved images. Currently, efficient SMLM requires non-overlapping emitting fluorophores, leading to long acquisition times that hinder live-cell imaging. Recent deep-learning approaches can handle denser emissions, but they rely on variants of non-maximum suppression (NMS) layers, which are unfortunately non-differentiable and may discard true positives with their local fusion strategy. In this presentation, we reformulate the SMLM training objective as a set-matching problem, deriving an optimal-transport loss that eliminates the need for NMS during inference and enables end-to-end training. Additionally, we propose an iterative neural network that integrates knowledge of the microscope's optical system inside our model. Experiments on synthetic benchmarks and real biological data show that both our new loss function and architecture surpass the state of the art at moderate and high emitter densities. Code is available at https://github.com/RSLLES/SHOT.
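
A minimal stand-in for the set-matching objective (Hungarian assignment on Euclidean costs; the paper's actual loss is an optimal-transport formulation for which this is the hard-assignment special case):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def set_matching_loss(pred: np.ndarray, gt: np.ndarray) -> float:
    """pred, gt: (n, 2) and (m, 2) emitter positions; mean matched distance."""
    cost = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)     # min-cost bipartite matching
    return float(cost[rows, cols].mean())

gt = np.array([[10.0, 20.0], [55.0, 40.0], [80.0, 15.0]])
pred = gt[[2, 0, 1]] + np.random.default_rng(0).normal(0, 2.0, gt.shape)
print(f"matched localization error: {set_matching_loss(pred, gt):.2f}")
# The matching is permutation-invariant, so no NMS is needed to pair detections
# with emitters; smooth OT relaxations (e.g. Sinkhorn) make it trainable
# end-to-end.
```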

Updated: 2025-12-11 14:30:16

标题: 最优输运打开了单分子定位的端到端学习

摘要: 单分子定位显微镜(SMLM)允许在超过衍射限制的范围内重建生物相关结构,通过检测和定位个别荧光标记在观察标本上的荧光分子 - 随时间重建超分辨率图像。目前,高效的SMLM需要非重叠发射荧光分子,导致长时间采集,从而阻碍活细胞成像。最近的深度学习方法可以处理更密集的发射,但它们依赖于非最大抑制(NMS)层的变体,这些层不可微分,可能通过其局部融合策略丢弃真正的阳性。在本演示中,我们将SMLM训练目标重新制定为一个集合匹配问题,推导出一种最优输运损失,消除了推理过程中对NMS的需求,并实现了端到端训练。此外,我们提出了一个迭代神经网络,将显微镜光学系统的知识整合到我们的模型中。在合成基准和真实生物数据上的实验表明,我们的新损失函数和架构在中等和高发射体密度下均超越了现有技术水平。代码可在https://github.com/RSLLES/SHOT中找到。

更新时间: 2025-12-11 14:30:16

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2512.10683v1

Evaluating Gemini Robotics Policies in a Veo World Simulator

Generative world models hold significant potential for simulating interactions with visuomotor policies in varied environments. Frontier video models can enable generation of realistic observations and environment interactions in a scalable and general manner. However, the use of video models in robotics has been limited primarily to in-distribution evaluations, i.e., scenarios that are similar to ones used to train the policy or fine-tune the base video model. In this report, we demonstrate that video models can be used for the entire spectrum of policy evaluation use cases in robotics: from assessing nominal performance to out-of-distribution (OOD) generalization, and probing physical and semantic safety. We introduce a generative evaluation system built upon a frontier video foundation model (Veo). The system is optimized to support robot action conditioning and multi-view consistency, while integrating generative image-editing and multi-view completion to synthesize realistic variations of real-world scenes along multiple axes of generalization. We demonstrate that the system preserves the base capabilities of the video model to enable accurate simulation of scenes that have been edited to include novel interaction objects, novel visual backgrounds, and novel distractor objects. This fidelity enables accurately predicting the relative performance of different policies in both nominal and OOD conditions, determining the relative impact of different axes of generalization on policy performance, and performing red teaming of policies to expose behaviors that violate physical or semantic safety constraints. We validate these capabilities through 1600+ real-world evaluations of eight Gemini Robotics policy checkpoints and five tasks for a bimanual manipulator.

Updated: 2025-12-11 14:22:14

标题: 在Veo世界模拟器中评估双子机器人政策

摘要: 生成世界模型在模拟各种环境中与视觉动作策略的交互方面具有重要潜力。前沿视频模型可以以一种可扩展且通用的方式生成逼真的观察和环境交互。然而,机器人领域中对视频模型的使用主要限于分布内评估,即与用于训练策略或微调基础视频模型的场景相似的情况。在本报告中,我们证明视频模型可以用于机器人领域策略评估用例的整个范围:从评估正常性能到超出分布范围的泛化,以及探索物理和语义安全性。我们引入了一个基于前沿视频基础模型(Veo)构建的生成评估系统。该系统经过优化,支持机器人动作调节和多视图一致性,同时集成生成图像编辑和多视图完成,以在多个泛化轴上合成真实世界场景的变化。我们证明该系统保留了视频模型的基本功能,可以准确模拟已编辑以包含新颖的互动对象、新颖的视觉背景和新颖的干扰对象的场景。这种保真度使得能够准确预测不同策略在正常和超出分布条件下的相对性能,确定不同泛化轴对策略性能的相对影响,并对策略进行红队测试,以暴露违反物理或语义安全约束的行为。我们通过对八个Gemini Robotics策略检查点和一个双手操作器的五个任务进行1600多次真实世界评估来验证这些能力。

更新时间: 2025-12-11 14:22:14

领域: cs.RO,cs.AI,cs.CV,cs.LG

下载: http://arxiv.org/abs/2512.10675v1

AEBNAS: Strengthening Exit Branches in Early-Exit Networks through Hardware-Aware Neural Architecture Search

Early-exit networks are effective solutions for reducing the overall energy consumption and latency of deep learning models by adjusting computation based on the complexity of input data. By incorporating intermediate exit branches into the architecture, they provide less computation for simpler samples, which is particularly beneficial for resource-constrained devices where energy consumption is crucial. However, designing early-exit networks is a challenging and time-consuming process due to the need to balance efficiency and performance. Recent works have utilized Neural Architecture Search (NAS) to design more efficient early-exit networks, aiming to reduce average latency while improving model accuracy by determining the best positions and number of exit branches in the architecture. Another important factor affecting the efficiency and accuracy of early-exit networks is the depth and types of layers in the exit branches. In this paper, we use hardware-aware NAS to strengthen exit branches, considering both accuracy and efficiency during optimization. Our performance evaluation on the CIFAR-10, CIFAR-100, and SVHN datasets demonstrates that our proposed framework, which considers varying depths and layers for exit branches along with adaptive threshold tuning, designs early-exit networks that achieve higher accuracy with the same or lower average number of MACs compared to the state-of-the-art approaches.
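
For intuition, a minimal early-exit inference loop with per-exit confidence thresholds (illustrative values; AEBNAS searches the branch depths and layer types and tunes the thresholds adaptively rather than fixing them by hand):

```python
import torch
import torch.nn as nn

class EarlyExitNet(nn.Module):
    def __init__(self, dim=32, n_classes=10, thresholds=(0.9, 0.8)):
        super().__init__()
        self.stages = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(3)])
        self.exits = nn.ModuleList([nn.Linear(dim, n_classes) for _ in range(3)])
        self.thresholds = thresholds           # exits 0 and 1; exit 2 is final

    def forward(self, x):                      # x: (1, dim), single sample
        for i, stage in enumerate(self.stages):
            x = stage(x)
            logits = self.exits[i](x)
            conf = logits.softmax(-1).max().item()
            if i == len(self.stages) - 1 or conf >= self.thresholds[i]:
                return logits, i               # stop once confident enough

net = EarlyExitNet()
with torch.no_grad():
    logits, exit_used = net(torch.randn(1, 32))
print(f"sample exited at branch {exit_used}")
# Easy samples leave at shallow branches, so average MACs drop while hard
# samples still reach the full-depth classifier.
```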

Updated: 2025-12-11 14:17:49

标题: AEBNAS:通过硬件感知神经架构搜索强化早期退出网络中的退出分支

摘要: 早期退出网络是通过根据输入数据的复杂性调整计算来有效减少深度学习模型的总能耗和延迟的解决方案。通过将中间退出分支纳入架构中,它们为简单样本提供更少的计算量,这对于能源受限设备尤其有益,其中能耗至关重要。然而,设计早期退出网络是一项具有挑战性且耗时的过程,因为需要在效率和性能之间取得平衡。最近的研究利用神经架构搜索(NAS)设计更高效的早期退出网络,旨在通过确定架构中最佳位置和退出分支数量来减少平均延迟,同时提高模型准确性。影响早期退出网络效率和准确性的另一个重要因素是退出分支中的深度和层类型。在本文中,我们使用硬件感知NAS来加强退出分支,考虑了在优化过程中的准确性和效率。我们在CIFAR-10、CIFAR-100和SVHN数据集上的性能评估表明,我们提出的框架,结合了不同深度和层的退出分支以及自适应阈值调整,设计出比最先进方法具有更高准确性的早期退出网络,同时具有相同或更低的平均MAC数量。

更新时间: 2025-12-11 14:17:49

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2512.10671v1

Learning by Analogy: A Causal Framework for Composition Generalization

Compositional generalization -- the ability to understand and generate novel combinations of learned concepts -- enables models to extend their capabilities beyond limited experiences. While effective, the data structures and principles that enable this crucial capability remain poorly understood. We propose that compositional generalization fundamentally requires decomposing high-level concepts into basic, low-level concepts that can be recombined across similar contexts, similar to how humans draw analogies between concepts. For example, someone who has never seen a peacock eating rice can envision this scene by relating it to their previous observations of a chicken eating rice. In this work, we formalize these intuitive processes using principles of causal modularity and minimal changes. We introduce a hierarchical data-generating process that naturally encodes different levels of concepts and their interaction mechanisms. Theoretically, we demonstrate that this approach enables compositional generalization supporting complex relations between composed concepts, advancing beyond prior work that assumes simpler interactions like additive effects. Critically, we also prove that this latent hierarchical structure is provably recoverable (identifiable) from observable data like text-image pairs, a necessary step for learning such a generative process. To validate our theory, we apply insights from our theoretical framework and achieve significant improvements on benchmark datasets.

Updated: 2025-12-11 14:16:14

标题: 学习类比:组合概括的因果框架

摘要: 组合概括——理解和生成学习概念的新组合的能力——使模型能够超越有限的经验。尽管有效,但支持这一关键能力的数据结构和原则仍不明确。我们提出,组合概括基本上需要将高级概念分解为基本的低级概念,这些概念可以在类似的背景下重新组合,类似于人类如何在概念之间进行类比。例如,一个从未见过孔雀吃米饭的人可以通过将其与以前观察到的鸡吃米饭联系起来来想象这个场景。 在这项工作中,我们使用因果模块化和最小变化的原则形式化这些直观过程。我们引入了一个自然编码不同水平概念及其交互机制的分层数据生成过程。理论上,我们证明了这种方法能够支持复杂概念之间的组合概括,超越了之前只假设简单交互如加法效应的工作。至关重要的是,我们还证明了这种潜在的分层结构可以从可观测数据如文本-图像对中可证明地恢复(可识别),这是学习这种生成过程所必需的一步。为了验证我们的理论,我们应用了我们理论框架的见解,并在基准数据集上取得了显著的改进。

更新时间: 2025-12-11 14:16:14

领域: cs.LG

下载: http://arxiv.org/abs/2512.10669v1

A Proof of Success and Reward Distribution Protocol for Multi-bridge Architecture in Cross-chain Communication

Single-bridge blockchain solutions enable cross-chain communication. However, they are associated with centralization and single-point-of-failure risks. This paper proposes Proof of Success and Reward Distribution (PSCRD), a novel multi-bridge response coordination and incentive distribution protocol designed to address the challenges. PSCRD introduces a fair reward distribution system that equitably distributes the transfer fee among participating bridges, incentivizing honest behavior and sustained commitment. The purpose is to encourage bridge participation for higher decentralization and lower single-point-of-failure risks. The mathematical analysis and simulation results validate the effectiveness of PSCRD using two key metrics: the Gini index, which demonstrates a progressive improvement in the fairness of the reward distribution as new bridge groups joined the network; and the Nakamoto coefficient, which shows a significant improvement in decentralization over time. These findings highlight that PSCRD provides a more resilient and secure cross-chain bridge system without substantially increasing user costs.
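
The two evaluation metrics are standard and easy to reproduce (the definitions below follow common usage; the PSCRD fee-splitting rule itself is not reproduced here):

```python
import numpy as np

def gini(rewards: np.ndarray) -> float:
    """0 = perfectly equal reward split across bridges, 1 = maximally unequal."""
    r = np.sort(rewards)
    n = r.size
    return float((2 * np.arange(1, n + 1) - n - 1).dot(r) / (n * r.sum()))

def nakamoto(rewards: np.ndarray) -> int:
    """Smallest number of bridges that together control > 50% of rewards."""
    shares = np.sort(rewards)[::-1] / rewards.sum()
    return int(np.searchsorted(np.cumsum(shares), 0.5) + 1)

rewards = np.array([40.0, 25.0, 15.0, 10.0, 10.0])   # per-bridge fee income
print(f"Gini index: {gini(rewards):.3f}, Nakamoto coefficient: {nakamoto(rewards)}")
# A falling Gini and a rising Nakamoto coefficient over time are exactly the
# trends the paper reports as new bridge groups join the network.
```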

Updated: 2025-12-11 14:15:36

标题: 跨链通信中多桥架构的成功证明与奖励分配协议

摘要: 单桥区块链解决方案实现了跨链通信。然而,它们存在集中化和单点故障风险。本文提出了Proof of Success and Reward Distribution(PSCRD),这是一种新颖的多桥响应协调和激励分配协议,旨在解决这些挑战。PSCRD引入了一个公平的奖励分配系统,公平地将转账费用分配给参与的桥梁,激励诚实行为和持续承诺。其目的是鼓励桥梁参与,以提高去中心化程度并降低单点故障风险。数学分析和模拟结果验证了PSCRD的有效性,使用了两个关键指标:基尼系数,表明随着新桥组加入网络,奖励分配的公平性逐渐改善;以及中本聪系数,显示随着时间的推移,去中心化程度有显著提高。这些发现凸显了PSCRD提供了一个更具韧性和安全性的跨链桥梁系统,而不会大幅增加用户成本。

更新时间: 2025-12-11 14:15:36

领域: cs.CR,cs.DC,cs.ET

下载: http://arxiv.org/abs/2512.10667v1

On the Dynamics of Multi-Agent LLM Communities Driven by Value Diversity

As Large Language Models (LLM) based multi-agent systems become increasingly prevalent, the collective behaviors, e.g., collective intelligence, of such artificial communities have drawn growing attention. This work aims to answer a fundamental question: How does diversity of values shape the collective behavior of AI communities? Using naturalistic value elicitation grounded in the prevalent Schwartz's Theory of Basic Human Values, we constructed multi-agent simulations where communities with varying numbers of agents engaged in open-ended interactions and constitution formation. The results show that value diversity enhances value stability, fosters emergent behaviors, and brings more creative principles developed by the agents themselves without external guidance. However, these effects also show diminishing returns: extreme heterogeneity induces instability. This work positions value diversity as a new axis of future AI capability, bridging AI ability and sociological studies of institutional emergence.

Updated: 2025-12-11 14:13:53

标题: 关于价值多样性推动的多智能体LLM社区动态的研究

摘要: 随着基于大型语言模型(LLM)的多智能体系统越来越普遍,这种人工社区的集体行为,例如集体智慧,引起了越来越多的关注。本研究旨在回答一个基本问题:价值观的多样性如何塑造AI社区的集体行为?利用基于Schwartz基本人类价值观理论的自然价值调查,我们构建了多智能体模拟,在这些模拟中,拥有不同数量智能体的社区进行开放性互动和组织形成。结果显示,价值观的多样性增强了价值的稳定性,促进了新兴行为,并带来了更多由智能体自己开发而不需要外部指导的创造性原则。然而,这些效果也呈现出递减的趋势:极端的异质性会导致不稳定性。这项工作将价值观的多样性定位为未来AI能力的一个新的研究领域,架起了AI能力与社会学研究中机构出现的桥梁。

更新时间: 2025-12-11 14:13:53

领域: cs.AI

下载: http://arxiv.org/abs/2512.10665v1

DCFO Additional Material

Outlier detection identifies data points that significantly deviate from the majority of the data distribution. Explaining outliers is crucial for understanding the underlying factors that contribute to their detection, validating their significance, and identifying potential biases or errors. Effective explanations provide actionable insights, facilitating preventive measures to avoid similar outliers in the future. Counterfactual explanations clarify why specific data points are classified as outliers by identifying minimal changes required to alter their prediction. Although valuable, most existing counterfactual explanation methods overlook the unique challenges posed by outlier detection, and fail to target classical, widely adopted outlier detection algorithms. Local Outlier Factor (LOF) is one of the most popular unsupervised outlier detection methods, quantifying outlierness through relative local density. Despite LOF's widespread use across diverse applications, it lacks interpretability. To address this limitation, we introduce Density-based Counterfactuals for Outliers (DCFO), a novel method specifically designed to generate counterfactual explanations for LOF. DCFO partitions the data space into regions where LOF behaves smoothly, enabling efficient gradient-based optimisation. Extensive experimental validation on 50 OpenML datasets demonstrates that DCFO consistently outperforms benchmarked competitors, offering superior proximity and validity of generated counterfactuals.

Updated: 2025-12-11 14:04:52

标题: DCFO附加材料

摘要: 异常检测识别明显偏离大多数数据分布的数据点。解释异常值对于理解造成其检测的潜在因素、验证其重要性以及识别潜在偏差或错误至关重要。有效的解释提供可操作的见解,促进预防措施以避免未来出现类似的异常值。反事实解释通过识别改变其预测所需的最小更改来澄清为什么特定数据点被分类为异常值。尽管有价值,大多数现有的反事实解释方法忽视了异常检测带来的独特挑战,并未针对经典、广泛采用的异常检测算法。局部异常因子(LOF)是最受欢迎的无监督异常检测方法之一,通过相对局部密度量化异常值。尽管LOF在各种应用中被广泛使用,但它缺乏可解释性。为了解决这一限制,我们引入了专门为LOF生成反事实解释的新方法——基于密度的反事实异常值(DCFO)。DCFO将数据空间划分为LOF表现平滑的区域,实现了高效的基于梯度的优化。对50个OpenML数据集的广泛实验证明,DCFO始终优于基准竞争对手,提供了更接近和有效性的生成反事实解释。

更新时间: 2025-12-11 14:04:52

领域: cs.LG

下载: http://arxiv.org/abs/2512.10659v1

Token Sample Complexity of Attention

As context windows in large language models continue to expand, it is essential to characterize how attention behaves at extreme sequence lengths. We introduce token-sample complexity: the rate at which attention computed on $n$ tokens converges to its infinite-token limit. We estimate finite-$n$ convergence bounds at two levels: pointwise uniform convergence of the attention map, and convergence of moments for the transformed token distribution. For compactly supported (and more generally sub-Gaussian) distributions, our first result shows that the attention map converges uniformly on a ball of radius $R$ at rate $C(R)/\sqrt{n}$, where $C(R)$ grows exponentially with $R$. For large $R$, this estimate loses practical value, and our second result addresses this issue by establishing convergence rates for the moments of the transformed distribution (the token output of the attention layer). In this case, the rate is $C'(R)/n^\beta$ with $\beta<\tfrac{1}{2}$, and $C'(R)$ depends polynomially on the size of the support of the distribution. The exponent $\beta$ depends on the attention geometry and the spectral properties of the token distribution. We also examine the regime in which the attention parameter tends to infinity and the softmax approaches a hardmax, and in this setting, we establish a logarithmic rate of convergence. Experiments on synthetic Gaussian data and real BERT models on Wikipedia text confirm our predictions.
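
The pointwise rate is easy to probe numerically. Below is a small Monte Carlo check (illustrative setup with Gaussian tokens and a fixed query; not the paper's experiments): attention over $n$ sampled tokens is compared against a large-$n$ reference, and the error shrinks at roughly $1/\sqrt{n}$.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
q = np.ones(d) / 2.0                            # fixed query vector

def attn(tokens):                               # softmax attention, values = tokens
    w = np.exp(tokens @ q)
    return (w / w.sum()) @ tokens

ref = attn(rng.normal(size=(200_000, d)))       # proxy for the infinite-token limit
for n in [100, 1_000, 10_000]:
    errs = [np.linalg.norm(attn(rng.normal(size=(n, d))) - ref) for _ in range(50)]
    print(f"n={n:>6}: mean error {np.mean(errs):.4f}")
# Each 10x increase in n cuts the error by roughly sqrt(10), consistent with
# the C(R)/sqrt(n) pointwise bound for sub-Gaussian token distributions.
```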

Updated: 2025-12-11 14:02:34

标题: 注意力的令牌样本复杂度

摘要: 随着大型语言模型中的上下文窗口不断扩大,必须对在极端序列长度下注意力的行为进行表征。我们引入了令牌-样本复杂度:计算在$n$个令牌上的注意力收敛到无限令牌极限的速率。我们在两个层面估计有限-$n$的收敛界限:注意力映射的逐点一致收敛,以及经过变换的令牌分布的矩收敛。对于紧支持(更一般地,次高斯)分布,我们的第一个结果表明,注意力映射以速率$C(R)/\sqrt{n}$在半径为$R$的球上一致收敛,其中$C(R)$随$R$呈指数增长。对于较大的$R$,这一估计失去了实际价值,我们的第二个结果解决了这个问题,通过建立转换分布的矩(注意力层的令牌输出)的收敛率。在这种情况下,速率为$C'(R)/n^\beta$,其中$\beta<\tfrac{1}{2}$,且$C'(R)$多项式地依赖于分布的支持大小。指数$\beta$取决于注意力几何和令牌分布的谱特性。我们还研究了注意力参数趋近于无穷大且softmax接近于hardmax的情况,在这种设置下,我们建立了对数收敛速率。对合成高斯数据和维基百科文本上的真实BERT模型的实验证实了我们的预测。

更新时间: 2025-12-11 14:02:34

领域: cs.LG

下载: http://arxiv.org/abs/2512.10656v1

CAPTAIN: Semantic Feature Injection for Memorization Mitigation in Text-to-Image Diffusion Models

Diffusion models can unintentionally reproduce training examples, raising privacy and copyright concerns as these systems are increasingly deployed at scale. Existing inference-time mitigation methods typically manipulate classifier-free guidance (CFG) or perturb prompt embeddings; however, they often struggle to reduce memorization without compromising alignment with the conditioning prompt. We introduce CAPTAIN, a training-free framework that mitigates memorization by directly modifying latent features during denoising. CAPTAIN first applies frequency-based noise initialization to reduce the tendency to replicate memorized patterns early in the denoising process. It then identifies the optimal denoising timesteps for feature injection and localizes memorized regions. Finally, CAPTAIN injects semantically aligned features from non-memorized reference images into localized latent regions, suppressing memorization while preserving prompt fidelity and visual quality. Our experiments show that CAPTAIN achieves substantial reductions in memorization compared to CFG-based baselines while maintaining strong alignment with the intended prompt.

Updated: 2025-12-11 14:01:47

标题: CAPTAIN:语义特征注入在文本到图像扩散模型中减轻记忆化的作用

摘要: 扩散模型可能无意中复制训练示例,随着这些系统在规模上的不断部署,引发了隐私和版权方面的担忧。现有的推理时间缓解方法通常操作分类器自由指导(CFG)或扰动提示嵌入;然而,它们往往在减少记忆化的同时保持与条件提示的对齐方面遇到困难。我们引入了CAPTAIN,这是一个无需训练的框架,通过直接在去噪过程中修改潜在特征来减少记忆化。CAPTAIN首先应用基于频率的噪声初始化来减少在去噪过程早期复制记忆化模式的倾向。然后,它确定了最佳的去噪时间步骤用于特征注入并定位记忆化区域。最后,CAPTAIN将来自非记忆化参考图像的语义对齐特征注入到定位的潜在区域,抑制记忆化同时保持提示的忠实度和视觉质量。我们的实验表明,与基于CFG的基线相比,CAPTAIN在减少记忆化方面取得了显著的进展,同时保持与预期提示的强对齐。

更新时间: 2025-12-11 14:01:47

领域: cs.AI

下载: http://arxiv.org/abs/2512.10655v1

Virtual camera detection: Catching video injection attacks in remote biometric systems

Face anti-spoofing (FAS) is a vital component of remote biometric authentication systems based on facial recognition, increasingly used across web-based applications. Among emerging threats, video injection attacks -- facilitated by technologies such as deepfakes and virtual camera software -- pose significant challenges to system integrity. While virtual camera detection (VCD) has shown potential as a countermeasure, existing literature offers limited insight into its practical implementation and evaluation. This study introduces a machine learning-based approach to VCD, with a focus on its design and validation. The model is trained on metadata collected during sessions with authentic users. Empirical results demonstrate its effectiveness in identifying video injection attempts and reducing the risk of malicious users bypassing FAS systems.

Updated: 2025-12-11 14:01:06

标题: 虚拟摄像头检测:在远程生物识别系统中捕捉视频注入攻击

摘要: 人脸反欺骗(FAS)是基于面部识别的远程生物识别认证系统的重要组成部分,在网络应用中越来越被广泛使用。在新兴威胁中,视频注入攻击 - 借助深度伪造和虚拟摄像头软件等技术实施 - 对系统完整性构成重大挑战。虚拟摄像头检测(VCD)已显示出作为一种对抗措施的潜力,然而现有文献对其实际实施和评估提供的见解有限。本研究引入了一种基于机器学习的VCD方法,重点关注其设计和验证。该模型是在与真实用户进行会话期间收集的元数据上进行训练的。实证结果表明,该方法在识别视频注入尝试并降低恶意用户绕过FAS系统的风险方面非常有效。

更新时间: 2025-12-11 14:01:06

领域: cs.CR,cs.LG

下载: http://arxiv.org/abs/2512.10653v1

TriDF: Evaluating Perception, Detection, and Hallucination for Interpretable DeepFake Detection

Advances in generative modeling have made it increasingly easy to fabricate realistic portrayals of individuals, creating serious risks for security, communication, and public trust. Detecting such person-driven manipulations requires systems that not only distinguish altered content from authentic media but also provide clear and reliable reasoning. In this paper, we introduce TriDF, a comprehensive benchmark for interpretable DeepFake detection. TriDF contains high-quality forgeries from advanced synthesis models, covering 16 DeepFake types across image, video, and audio modalities. The benchmark evaluates three key aspects: Perception, which measures the ability of a model to identify fine-grained manipulation artifacts using human-annotated evidence; Detection, which assesses classification performance across diverse forgery families and generators; and Hallucination, which quantifies the reliability of model-generated explanations. Experiments on state-of-the-art multimodal large language models show that accurate perception is essential for reliable detection, but hallucination can severely disrupt decision-making, revealing the interdependence of these three aspects. TriDF provides a unified framework for understanding the interaction between detection accuracy, evidence identification, and explanation reliability, offering a foundation for building trustworthy systems that address real-world synthetic media threats.

Updated: 2025-12-11 14:01:01

标题: TriDF:解释性DeepFake检测的感知、检测和幻觉评估

摘要: 生成建模的进展使得制造出真实的个体描绘变得越来越容易,从而给安全、通信和公众信任带来严重风险。检测此类以人为驱动的操纵需要系统不仅能够区分经过修改的内容和真实媒体,还需要提供清晰可靠的推理。在本文中,我们介绍了TriDF,这是一个可解释的DeepFake检测的全面基准。TriDF包含来自先进合成模型的高质量伪造品,涵盖了图像、视频和音频模式下的16种DeepFake类型。该基准评估了三个关键方面:感知力,衡量模型利用人工标注证据识别微观操纵痕迹的能力;检测,评估了跨不同伪造家族和生成器的分类性能;幻觉,量化了模型生成的解释的可靠性。对最先进的多模态大型语言模型进行的实验表明,准确的感知对于可靠的检测至关重要,但幻觉可能严重扰乱决策,揭示了这三个方面之间的相互依赖关系。TriDF提供了一个统一的框架,用于理解检测准确性、证据识别和解释可靠性之间的互动关系,为构建能够应对现实世界合成媒体威胁的可信系统奠定了基础。

更新时间: 2025-12-11 14:01:01

领域: cs.CV,cs.CR

下载: http://arxiv.org/abs/2512.10652v1

Can LLMs Detect Their Confabulations? Estimating Reliability in Uncertainty-Aware Language Models

Large Language Models (LLMs) are prone to generating fluent but incorrect content, known as confabulation, which poses increasing risks in multi-turn or agentic applications where outputs may be reused as context. In this work, we investigate how in-context information influences model behavior and whether LLMs can identify their unreliable responses. We propose a reliability estimation that leverages token-level uncertainty to guide the aggregation of internal model representations. Specifically, we compute aleatoric and epistemic uncertainty from output logits to identify salient tokens and aggregate their hidden states into compact representations for response-level reliability prediction. Through controlled experiments on open QA benchmarks, we find that correct in-context information improves both answer accuracy and model confidence, while misleading context often induces confidently incorrect responses, revealing a misalignment between uncertainty and correctness. Our probing-based method captures these shifts in model behavior and improves the detection of unreliable outputs across multiple open-source LLMs. These results underscore the limitations of direct uncertainty signals and highlight the potential of uncertainty-guided probing for reliability-aware generation.
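
A minimal sketch of uncertainty-guided aggregation (generic proxies and shapes, not the paper's exact estimators): token-level predictive entropy selects salient tokens, whose hidden states are pooled into a response-level feature for a small reliability probe.

```python
import torch

def aggregate(hidden: torch.Tensor, logits: torch.Tensor, k: int = 4):
    """hidden: (T, d) per-token states; logits: (T, V) output logits."""
    probs = logits.softmax(-1)
    entropy = -(probs * probs.clamp_min(1e-9).log()).sum(-1)  # (T,) uncertainty
    salient = entropy.topk(k).indices                         # most uncertain tokens
    return hidden[salient].mean(0)                            # (d,) response feature

T, d, V = 20, 16, 100
feature = aggregate(torch.randn(T, d), torch.randn(T, V))
probe = torch.nn.Linear(d, 2)        # trained on labeled reliable/unreliable answers
print(probe(feature).softmax(-1))    # P(reliable), P(unreliable)
# The probe sees where the model itself was uncertain, which is what lets it
# flag confidently wrong answers that raw confidence scores miss.
```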

Updated: 2025-12-11 13:49:54

标题: 大型语言模型能否检测自身的虚构内容?在不确定性感知的语言模型中估计可靠性

摘要: 大型语言模型(LLMs)容易生成流畅但不正确的内容,即所谓的虚构(confabulation),这在多轮或代理应用中会带来越来越大的风险,因为输出可能会被重复用作上下文。在这项工作中,我们研究了上下文信息如何影响模型行为,以及LLMs是否能够识别其不可靠的响应。我们提出了一种可靠性估计方法,利用标记级别的不确定性来指导内部模型表示的聚合。具体而言,我们从输出logits计算偶然(aleatoric)与认知(epistemic)不确定性,以识别显著的标记,并将它们的隐藏状态聚合成紧凑的表示,用于响应级可靠性预测。通过对开放QA基准的受控实验,我们发现正确的上下文信息可以提高答案的准确性和模型的信心,而误导性的上下文往往会导致自信但错误的响应,揭示了不确定性和正确性之间的不一致。我们基于探测的方法捕捉了模型行为的这种变化,并改善了对多个开源LLMs不可靠输出的检测。这些结果突显了直接不确定性信号的局限性,并强调了不确定性引导探测在可靠性感知生成方面的潜力。

更新时间: 2025-12-11 13:49:54

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2508.08139v2

PASCAL: Precise and Efficient ANN- SNN Conversion using Spike Accumulation and Adaptive Layerwise Activation

Spiking Neural Networks (SNNs) have been put forward as an energy-efficient alternative to Artificial Neural Networks (ANNs) since they perform sparse Accumulate operations instead of the power-hungry Multiply-and-Accumulate operations. ANN-SNN conversion is a widely used method to realize deep SNNs with accuracy comparable to that of ANNs. Bu et al. (2023) recently proposed the Quantization-Clip-Floor-Shift (QCFS) activation as an alternative to ReLU to minimize the accuracy loss during ANN-SNN conversion. Nevertheless, SNN inference requires a large number of timesteps to match the accuracy of the source ANN on real-world datasets. In this work, we propose PASCAL, which performs ANN-SNN conversion in such a way that the resulting SNN is mathematically equivalent to an ANN with QCFS activation, thereby yielding similar accuracy as the source ANN with minimal inference timesteps. In addition, we propose a systematic method to configure the quantization step of QCFS activation in a layerwise manner, which effectively determines the optimal number of timesteps per layer for the converted SNN. Our results show that the ResNet-34 SNN obtained using PASCAL achieves an accuracy of $\approx$74% on ImageNet with a 64$\times$ reduction in the number of inference timesteps compared to existing approaches.
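
For reference, the QCFS activation has a one-line form (as commonly stated for Bu et al.'s method; the quantization level L and threshold lambda are the knobs PASCAL configures per layer):

```python
import torch

def qcfs(x: torch.Tensor, L: int, lam: float) -> torch.Tensor:
    """Quantization-Clip-Floor-Shift activation: an L-level staircase ReLU."""
    return (lam / L) * torch.clamp(torch.floor(x * L / lam + 0.5), 0, L)

x = torch.linspace(-1.0, 3.0, 9)
print(qcfs(x, L=4, lam=2.0))
# Each layer's quantization step L maps directly to the number of SNN
# timesteps that layer needs, which is what PASCAL tunes layerwise.
```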

Updated: 2025-12-11 13:45:50

标题: PASCAL:使用尖峰累积和自适应逐层激活的精确高效ANN-SNN转换

摘要: 脉冲神经网络(SNNs)被提出作为一种节能的替代人工神经网络(ANNs)的方法,因为它们执行稀疏的累积操作,而不是耗电的乘法和累积操作。ANN-SNN转换是一种广泛使用的方法,用于实现具有与ANNs可比的准确性的深度SNNs。最近,Bu等人提出了Quantization-Clip-Floor-Shift(QCFS)激活作为ReLU的替代方案,以在ANN-SNN转换期间最小化准确性损失。然而,SNN推断需要大量的时间步来匹配真实世界数据集的源ANN的准确性。在这项工作中,我们提出了PASCAL,它以一种方式执行ANN-SNN转换,使得生成的SNN在数学上等效于具有QCFS激活的ANN,从而产生与源ANN相似的准确性,同时最小化推断时间步数。此外,我们提出了一种系统方法,以逐层配置QCFS激活的量化步骤,有效确定转换后SNN每层的最佳时间步数。我们的结果表明,使用PASCAL获得的ResNet-34 SNN在ImageNet上实现了约74%的准确性,与现有方法相比,推断时间步数减少了64倍。

更新时间: 2025-12-11 13:45:50

领域: cs.NE,cs.AI

下载: http://arxiv.org/abs/2505.01730v2

Refinement Contrastive Learning of Cell-Gene Associations for Unsupervised Cell Type Identification

Unsupervised cell type identification is crucial for uncovering and characterizing heterogeneous populations in single cell omics studies. Although a range of clustering methods have been developed, most focus exclusively on intrinsic cellular structure and ignore the pivotal role of cell-gene associations, which limits their ability to distinguish closely related cell types. To this end, we propose a Refinement Contrastive Learning framework (scRCL) that explicitly incorporates cell-gene interactions to derive more informative representations. Specifically, we introduce two contrastive distribution alignment components that reveal reliable intrinsic cellular structures by effectively exploiting cell-cell structural relationships. Additionally, we develop a refinement module that integrates gene-correlation structure learning to enhance cell embeddings by capturing underlying cell-gene associations. This module strengthens connections between cells and their associated genes, refining the representation learning to exploiting biologically meaningful relationships. Extensive experiments on several single-cell RNA-seq and spatial transcriptomics benchmark datasets demonstrate that our method consistently outperforms state-of-the-art baselines in cell-type identification accuracy. Moreover, downstream biological analyses confirm that the recovered cell populations exhibit coherent gene-expression signatures, further validating the biological relevance of our approach. The code is available at https://github.com/THPengL/scRCL.

Updated: 2025-12-11 13:45:31

标题: 无监督细胞类型识别的细胞-基因关联对比学习的优化

摘要: 无监督的细胞类型识别对于揭示和表征单细胞组学研究中的异质群体至关重要。尽管已经发展了一系列聚类方法,但大多数方法仅专注于内在的细胞结构,忽略了细胞-基因关联的关键作用,这限制了它们区分密切相关的细胞类型的能力。为此,我们提出了一种精细对比学习框架(scRCL),明确地将细胞-基因相互作用纳入考虑,以获取更多信息丰富的表示。具体地,我们引入了两个对比分布对齐组件,通过有效地利用细胞-细胞结构关系揭示可靠的内在细胞结构。此外,我们开发了一个精炼模块,通过整合基因相关性结构学习来增强细胞嵌入,捕捉潜在的细胞-基因关联。该模块加强了细胞与其相关基因之间的联系,使表示学习更好地利用生物学上有意义的关系。在几个单细胞RNA-seq和空间转录组学的基准数据集上进行的大量实验证明,我们的方法在细胞类型识别准确性方面始终优于最先进的基线方法。此外,下游生物学分析证实,恢复的细胞群体展示了连贯的基因表达特征,进一步验证了我们方法的生物学相关性。代码可在https://github.com/THPengL/scRCL上找到。

更新时间: 2025-12-11 13:45:31

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2512.10640v1

Adaptive Intrusion Detection System Leveraging Dynamic Neural Models with Adversarial Learning for 5G/6G Networks

Intrusion Detection Systems (IDS) are critical components in safeguarding 5G/6G networks from both internal and external cyber threats. While traditional IDS approaches rely heavily on signature-based methods, they struggle to detect novel and evolving attacks. This paper presents an advanced IDS framework that leverages adversarial training and dynamic neural networks in 5G/6G networks to enhance network security by providing robust, real-time threat detection and response capabilities. Unlike conventional models, which require costly retraining to update knowledge, the proposed framework integrates incremental learning algorithms, reducing the need for frequent retraining. Adversarial training is used to fortify the IDS against poisoned data. By using fewer features and incorporating statistical properties, the system can efficiently detect potential threats. Extensive evaluations using the NSL-KDD dataset demonstrate that the proposed approach achieves an accuracy of 82.33% for multiclass classification of various network attacks while resisting dataset poisoning. This research highlights the potential of adversarially trained, dynamic neural networks for building resilient IDS solutions.

Updated: 2025-12-11 13:40:37

标题: 基于对抗学习的动态神经模型的自适应入侵检测系统在5G/6G网络中的应用

摘要: 入侵检测系统(IDS)是保护5G/6G网络免受内部和外部网络威胁的关键组件。传统的IDS方法主要依赖基于签名的方法,但往往难以检测新型和不断演变的攻击。本文提出了一种先进的IDS框架,利用对抗训练和动态神经网络在5G/6G网络中增强网络安全,提供强大的实时威胁检测和响应能力。与传统模型不同,该框架集成了增量学习算法,减少了频繁重新训练的需求。对抗训练用于加固IDS以抵御有毒数据。通过使用较少的特征和整合统计属性,系统能够高效地检测潜在威胁。使用NSL-KDD数据集进行广泛评估表明,所提出的方法在各种网络攻击的多类分类中提供了更高的82.33%的准确性,同时抵抗数据集毒化。这项研究突出了对抗训练、动态神经网络在构建弹性IDS解决方案中的潜力。

更新时间: 2025-12-11 13:40:37

领域: cs.CR,cs.LG

下载: http://arxiv.org/abs/2512.10637v1

Objectives and Design Principles in Offline Payments with Central Bank Digital Currency (CBDC)

In this work, fundamental design principles for a central bank digital currency (CBDC) with offline functionality, together with corresponding countermeasures, are discussed. We identify three major objectives for any such CBDC proposal: (i) Access Control Security - protection of a user's funds against unauthorized access by other users; (ii) Security against Depositor's Misbehavior - preservation of the integrity of an environment (potentially the wallet) against misbehavior of its owner (for example, double-spending); and (iii) Privacy by Design - ensuring privacy is embedded into the system architecture. Our central conclusion is the alignment of these objectives with concrete design elements as countermeasures, where certain objectives and countermeasures exhibit no or minimal interference with each other. For example, we work out that the integrity of a user's wallet and, accordingly, the prevention of double-spending race attacks should be addressed through the adoption and integration of \textit{secure hardware} within a CBDC system.

Updated: 2025-12-11 13:39:50

标题: 中央银行数字货币(CBDC)的离线支付目标和设计原则

摘要: 在这项工作中,讨论了具有离线功能和相应对策的中央银行数字货币(CBDC)的基本设计原则。我们确定了任何此类CBDC提案的三个主要目标:(i)访问控制安全 - 保护用户资金免受其他用户的未经授权访问;(ii)防止存款人不当行为的安全性 - 保护环境(可能是钱包)的完整性,防止其所有者的不当行为(例如,双重支付);和(iii)隐私设计 - 确保隐私嵌入到系统架构中。我们的中心结论是将这些目标与具体的设计要素作为对策进行对齐,而某些目标和对策之间没有或只有最小的干扰。例如,我们得出结论认为用户钱包的完整性以及因此防止双重支付竞赛攻击应通过采用和集成\textit {安全硬件} 来解决CBDC系统内。

更新时间: 2025-12-11 13:39:50

领域: cs.CR

下载: http://arxiv.org/abs/2512.10636v1

Supporting Migration Policies with Forecasts: Illegal Border Crossings in Europe through a Mixed Approach

This paper presents a mixed-methodology to forecast illegal border crossings in Europe across five key migratory routes, with a one-year time horizon. The methodology integrates machine learning techniques with qualitative insights from migration experts. This approach aims at improving the predictive capacity of data-driven models through the inclusion of a human-assessed covariate, an innovation that addresses challenges posed by sudden shifts in migration patterns and limitations in traditional datasets. The proposed methodology responds directly to the forecasting needs outlined in the EU Pact on Migration and Asylum, supporting the Asylum and Migration Management Regulation (AMMR). It is designed to provide policy-relevant forecasts that inform strategic decisions, early warning systems, and solidarity mechanisms among EU Member States. By joining data-driven modeling with expert judgment, this work aligns with existing academic recommendations and introduces a novel operational tool tailored for EU migration governance. The methodology is tested and validated with known data to demonstrate its applicability and reliability in migration-related policy context.

Updated: 2025-12-11 13:33:25

标题: 用预测支持移民政策:欧洲非法边境越境的混合方法

摘要: 这篇论文提出了一种混合方法,用于预测欧洲五条主要移民路线上的非法边境越境情况,预测时间跨度为一年。该方法将机器学习技术与移民专家的定性见解相结合。这种方法旨在通过包含人工评估的协变量来提高数据驱动模型的预测能力,这一创新解决了移民模式突然转变和传统数据集的局限性带来的挑战。所提出的方法直接响应了欧盟移民与庇护政策宪章中提出的预测需求,支持《庇护和移民管理条例》(AMMR)。该方法旨在提供政策相关的预测,以指导战略决策、早期警告系统和欧盟成员国之间的团结机制。通过将数据驱动建模与专家判断结合起来,这项工作符合现有的学术建议,并引入了一种针对欧盟移民治理量身定制的新型操作工具。该方法经过已知数据的测试和验证,以展示其在与移民相关的政策背景中的适用性和可靠性。

更新时间: 2025-12-11 13:33:25

领域: cs.LG,cs.SI,stat.AP

下载: http://arxiv.org/abs/2512.10633v1

Unified Smart Factory Model: A model-based Approach for Integrating Industry 4.0 and Sustainability for Manufacturing Systems

This paper presents the Unified Smart Factory Model (USFM), a comprehensive framework designed to translate high-level sustainability goals into measurable factory-level indicators with a systematic information map of manufacturing activities. The manufacturing activities were modelled as a set of manufacturing, assembly and auxiliary processes using Object Process Methodology, a Model-Based Systems Engineering (MBSE) language. USFM integrates Manufacturing Process and System, Data Process, and Key Performance Indicator (KPI) Selection and Assessment in a single framework. Through a detailed case study of a Printed Circuit Board (PCB) assembly factory, the paper demonstrates how environmental sustainability KPIs can be selected, modelled, and mapped to the necessary data, highlighting energy consumption and environmental impact metrics. The model's systematic approach can reduce redundancy, minimize the risk of missing critical information, and enhance data collection. The paper concludes that the USFM bridges the gap between sustainability goals and practical implementation, providing significant benefits for industries, specifically SMEs, aiming to achieve sustainability targets.

Updated: 2025-12-11 13:30:38

标题: 统一智能工厂模型:基于模型的方法,实现工业4.0和制造系统可持续性的整合

摘要: 这篇论文介绍了统一智能工厂模型(USFM),这是一个综合框架,旨在将高级可持续发展目标转化为可衡量的工厂级指标,并具有一种制造活动的系统信息图。利用对象过程方法,将制造活动建模为一组制造、装配和辅助流程,这是一种基于模型的系统工程(MBSE)语言。USFM整合了制造过程和系统、数据处理以及关键绩效指标(KPI)选择和评估在一个单一框架中。通过对印刷电路板(PCB)装配工厂的详细案例研究,本文展示了如何选择、建模和映射环境可持续性KPI,并突出了能源消耗和环境影响度量。该模型的系统方法可以减少冗余,减少错过关键信息的风险,并增强数据收集。本文得出结论,USFM弥合了可持续发展目标和实际实施之间的差距,为特别是旨在实现可持续发展目标的中小型企业带来重大益处。

更新时间: 2025-12-11 13:30:38

领域: cs.AI

下载: http://arxiv.org/abs/2512.10631v1

GT-SNT: A Linear-Time Transformer for Large-Scale Graphs via Spiking Node Tokenization

Graph Transformers (GTs), which integrate message passing and self-attention mechanisms simultaneously, have achieved promising empirical results in graph prediction tasks. However, the design of scalable and topology-aware node tokenization has lagged behind other modalities. This gap becomes critical as the quadratic complexity of full attention renders GTs impractical on large-scale graphs. Recently, Spiking Neural Networks (SNNs), as brain-inspired models, have provided an energy-saving scheme to convert input intensity into discrete spike-based representations through event-driven spiking neurons. Inspired by these characteristics, we propose a linear-time Graph Transformer with Spiking Node Tokenization (GT-SNT) for node classification. By integrating multi-step feature propagation with SNNs, spiking node tokenization generates compact, locality-aware spike count embeddings as node tokens to avoid predefined codebooks and their utilization issues. The codebook-guided self-attention leverages these tokens to perform node-to-token attention for linear-time global context aggregation. In experiments, we compare GT-SNT with other state-of-the-art baselines on node classification datasets ranging from small to large. Experimental results show that GT-SNT achieves comparable performance on most datasets and reaches up to 130x faster inference speed compared to other GTs.
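
A minimal sketch of spiking node tokenization (illustrative integrate-and-fire dynamics, not the exact GT-SNT neuron model): multi-step feature propagation drives spiking neurons, and per-node spike counts become compact locality-aware tokens.

```python
import torch

def spike_tokens(A_hat, X, steps=4, v_th=1.0):
    """A_hat: (N, N) normalized adjacency; X: (N, d) features -> spike counts."""
    v = torch.zeros_like(X)                    # membrane potential
    counts = torch.zeros_like(X)
    h = X
    for _ in range(steps):
        h = A_hat @ h                          # one propagation step
        v = v + h                              # integrate
        spikes = (v >= v_th).float()           # fire where threshold is crossed
        v = v - spikes * v_th                  # soft reset
        counts += spikes
    return counts                              # (N, d) integer-valued node tokens

N, d = 5, 8
A = torch.rand(N, N)
A_hat = A / A.sum(1, keepdim=True)             # row-normalized adjacency
print(spike_tokens(A_hat, torch.rand(N, d)))
# The discrete counts act as a data-dependent codebook, avoiding the fixed
# codebooks (and their under-utilization issues) of other tokenized GTs.
```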

Updated: 2025-12-11 13:28:05

标题: GT-SNT:通过尖峰节点标记实现大规模图的线性时间变换器

摘要: 图形变换器(GTs)同时集成了消息传递和自注意机制,在图预测任务中取得了令人满意的实证结果。然而,可扩展且具有拓扑意识的节点标记设计落后于其他模态。随着完全注意力的二次复杂度使它们在大规模图上变得不切实际,这个差距变得至关重要。最近,作为脑启发模型的脉冲神经网络(SNNs)提供了一个节能方案,通过事件驱动的脉冲神经元将输入强度转换为离散的基于脉冲的表示。受到这些特征的启发,我们提出了一种具有脉冲节点标记的线性时间图形变换器(GT-SNT)用于节点分类。通过将多步特征传播与SNNs集成,脉冲节点标记生成紧凑的、具有局部感知的脉冲计数嵌入作为节点标记,以避免预定义的码本及其利用问题。码本引导的自注意机制利用这些标记执行节点到标记的注意机制,用于线性时间全局上下文聚合。在实验中,我们将GT-SNT与其他最先进的基线方法在从小到大的节点分类数据集上进行比较。实验结果表明,GT-SNT在大多数数据集上实现了可比较的性能,并且相比其他GTs,推理速度提高了高达130倍。

更新时间: 2025-12-11 13:28:05

领域: cs.NE,cs.LG

下载: http://arxiv.org/abs/2504.11840v2

Distributional Shrinkage I: Universal Denoisers in Multi-Dimensions

We revisit the problem of denoising from noisy measurements where only the noise level is known, not the noise distribution. In multi-dimensions, independent noise $Z$ corrupts the signal $X$, resulting in the noisy measurement $Y = X + \sigma Z$, where $\sigma \in (0, 1)$ is a known noise level. Our goal is to recover the underlying signal distribution $P_X$ from denoising $P_Y$. We propose and analyze universal denoisers that are agnostic to a wide range of signal and noise distributions. Our distributional denoisers offer order-of-magnitude improvements over the Bayes-optimal denoiser derived from Tweedie's formula, if the focus is on the entire distribution $P_X$ rather than on individual realizations of $X$. Our denoisers shrink $P_Y$ toward $P_X$ optimally, achieving $O(\sigma^4)$ and $O(\sigma^6)$ accuracy in matching generalized moments and density functions. Inspired by optimal transport theory, the proposed denoisers are optimal in approximating the Monge-Ampère equation with higher-order accuracy, and can be implemented efficiently via score matching. Let $q$ represent the density of $P_Y$; for optimal distributional denoising, we recommend replacing the Bayes-optimal denoiser, \[ \mathbf{T}^*(y) = y + \sigma^2 \nabla \log q(y), \] with denoisers exhibiting less aggressive distributional shrinkage, \[ \mathbf{T}_1(y) = y + \frac{\sigma^2}{2} \nabla \log q(y), \] \[ \mathbf{T}_2(y) = y + \frac{\sigma^2}{2} \nabla \log q(y) - \frac{\sigma^4}{8} \nabla \left( \frac{1}{2} \| \nabla \log q(y) \|^2 + \nabla \cdot \nabla \log q(y) \right) . \]
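
The three denoisers can be evaluated numerically with automatic differentiation once $\nabla \log q$ is available. Below is a small sketch on a Gaussian example where $q$ is known in closed form (in practice the score would come from score matching; the shapes and test point are arbitrary):

```python
import torch

sigma, d = 0.5, 2

def log_q(y):                                 # Y = X + sigma*Z with X, Z ~ N(0, I)
    return -0.5 * (y * y).sum(-1) / (1 + sigma**2)

def score(y0):                                # y with grad graph, and its score
    y = y0.detach().requires_grad_(True)
    (g,) = torch.autograd.grad(log_q(y).sum(), y, create_graph=True)
    return y, g

def T_bayes(y0):
    y, g = score(y0)
    return (y + sigma**2 * g).detach()

def T1(y0):
    y, g = score(y0)
    return (y + 0.5 * sigma**2 * g).detach()

def T2(y0):
    y, g = score(y0)
    div = sum(torch.autograd.grad(g[..., i].sum(), y, create_graph=True)[0][..., i]
              for i in range(d))               # divergence of the score
    phi = 0.5 * (g * g).sum(-1) + div
    (grad_phi,) = torch.autograd.grad(phi.sum(), y)
    return (y + 0.5 * sigma**2 * g - (sigma**4 / 8) * grad_phi).detach()

y = torch.tensor([[1.0, -2.0]])
print(T_bayes(y), T1(y), T2(y))               # T1/T2 shrink roughly half as hard
```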

Updated: 2025-12-11 13:24:15

标题: 分布式收缩I:多维度中的通用去噪器

摘要: 我们重新审视了只知道噪声水平而不知道噪声分布的情况下,从嘈杂测量中去噪的问题。在多维情形中,独立噪声$Z$会损坏信号$X$,得到嘈杂测量$Y = X + \sigma Z$,其中$\sigma \in (0, 1)$是已知的噪声水平。我们的目标是通过对$P_Y$去噪来恢复基础信号分布$P_X$。我们提出并分析了对广泛的信号和噪声分布均不敏感的通用去噪器。如果关注的是整个分布$P_X$而不是$X$的个别实现,相比于由Tweedie公式导出的贝叶斯最优去噪器,我们的分布去噪器可带来数量级的改进。我们的去噪器将$P_Y$最优地收缩到$P_X$,在匹配广义矩和密度函数方面分别达到$O(\sigma^4)$和$O(\sigma^6)$的精度。受最优输运理论的启发,所提出的去噪器在以更高阶精度逼近Monge-Ampère方程方面是最优的,并且可以通过得分匹配高效实现。 设$q$为$P_Y$的密度;对于最优的分布去噪,我们建议将贝叶斯最优去噪器 \[ \mathbf{T}^*(y) = y + \sigma^2 \nabla \log q(y) \] 替换为收缩更温和的去噪器 \[ \mathbf{T}_1(y) = y + \frac{\sigma^2}{2} \nabla \log q(y), \] \[ \mathbf{T}_2(y) = y + \frac{\sigma^2}{2} \nabla \log q(y) - \frac{\sigma^4}{8} \nabla \left( \frac{1}{2} \| \nabla \log q(y) \|^2 + \nabla \cdot \nabla \log q(y) \right) . \]

更新时间: 2025-12-11 13:24:15

领域: stat.ML,cs.LG,math.ST

下载: http://arxiv.org/abs/2511.09500v2

SpatialScore: Towards Comprehensive Evaluation for Spatial Intelligence

Existing evaluations of multimodal large language models (MLLMs) on spatial intelligence are typically fragmented and limited in scope. In this work, we aim to conduct a holistic assessment of the spatial understanding capabilities of modern MLLMs and propose complementary data-driven and agent-based solutions. Specifically, we make the following contributions: (i) we introduce SpatialScore, to our knowledge, the most comprehensive and diverse benchmark for multimodal spatial intelligence to date. It covers multiple visual data types, input modalities, and question-answering formats, and contains approximately 5K manually verified samples spanning 30 distinct tasks; (ii) using SpatialScore, we extensively evaluate 40 representative MLLMs, revealing persistent challenges and a substantial gap between current models and human-level spatial intelligence; (iii) to advance model capabilities, we construct SpatialCorpus, a large-scale training resource with 331K multimodal QA samples that supports fine-tuning on spatial reasoning tasks and significantly improves the performance of existing models (e.g., Qwen3-VL); (iv) to complement this data-driven route with a training-free paradigm, we develop SpatialAgent, a multi-agent system equipped with 12 specialized spatial perception tools that supports both Plan-Execute and ReAct reasoning, enabling substantial gains in spatial reasoning without additional model training. Extensive experiments and in-depth analyses demonstrate the effectiveness of our benchmark, corpus, and agent framework. We expect these resources to serve as a solid foundation for advancing MLLMs toward human-level spatial intelligence. All data, code, and models will be released to the research community.

Updated: 2025-12-11 13:21:59

标题: SpatialScore:朝向空间智能的全面评估

摘要: 现有对多模态大型语言模型(MLLMs)在空间智能上的评估通常是零散的,范围有限。在这项工作中,我们旨在对现代MLLMs的空间理解能力进行全面评估,并提出了数据驱动和基于代理的互补解决方案。具体而言,我们做出了以下贡献:(i)我们引入了SpatialScore,据我们所知,这是迄今为止最全面和多样化的多模态空间智能基准。它涵盖了多种视觉数据类型、输入模态和问答格式,并包含大约5K个手动验证的样本,涵盖30个不同的任务;(ii)利用SpatialScore,我们广泛评估了40个代表性的MLLMs,揭示了持续存在的挑战和当前模型与人类水平空间智能之间的显著差距;(iii)为了提升模型能力,我们构建了SpatialCorpus,这是一个包含331K个多模态QA样本的大规模训练资源,支持在空间推理任务上进行微调,并显著提高了现有模型(例如Qwen3-VL)的性能;(iv)为了在无需训练的情况下补充这种数据驱动的路径,我们开发了SpatialAgent,这是一个配备12种专门的空间感知工具的多代理系统,支持计划执行和反应推理,从而在空间推理方面取得重大进展,而无需额外的模型训练。大量实验和深入分析证明了我们基准、语料库和代理框架的有效性。我们期望这些资源能够为推动MLLMs朝着人类水平的空间智能发展奠定坚实基础。所有数据、代码和模型将向研究社区发布。

更新时间: 2025-12-11 13:21:59

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2505.17012v2

Advancing Mathematical Research via Human-AI Interactive Theorem Proving

We investigate how large language models can be used as research tools in scientific computing while preserving mathematical rigor. We propose a human-in-the-loop workflow for interactive theorem proving and discovery with LLMs. Human experts retain control over problem formulation and admissible assumptions, while the model searches for proofs or contradictions, proposes candidate properties and theorems, and helps construct structures and parameters that satisfy explicit constraints, supported by numerical experiments and simple verification checks. Experts treat these outputs as raw material, further refine them, and organize the results into precise statements and rigorous proofs. We instantiate this workflow in a case study on the connection between manifold optimization and Grover's quantum search algorithm, where the pipeline helps identify invariant subspaces, explore Grover-compatible retractions, and obtain convergence guarantees for the retraction-based gradient method. The framework provides a practical template for integrating large language models into frontier mathematical research, enabling faster exploration of proof space and algorithm design while maintaining transparent reasoning responsibilities. Although illustrated on manifold optimization problems in quantum computing, the principles extend to other core areas of scientific computing.

Updated: 2025-12-11 13:10:50

标题: 通过人机交互式定理证明推动数学研究

摘要: 我们研究了如何在科学计算中保持数学严谨的同时,将大型语言模型用作研究工具。我们提出了一个人机协同的工作流程,用于与LLMs互动证明和发现。人类专家保留对问题表述和可接受假设的控制权,而模型则搜索证明或矛盾,提出候选属性和定理,并帮助构建满足显式约束的结构和参数,支持数值实验和简单验证检查。专家将这些输出视为原始材料,进一步完善它们,并将结果组织成精确的陈述和严谨的证明。我们在流形优化和Grover量子搜索算法之间的连接案例研究中实例化了这个工作流程,其中该流水线帮助识别不变子空间,探索与Grover兼容的缩影,并为基于缩影的梯度方法获得收敛保证。该框架为将大型语言模型整合到前沿数学研究提供了一个实用模板,使得在保持透明推理职责的同时,可以更快地探索证明空间和算法设计。虽然在量子计算中的流形优化问题中进行了说明,但这些原则适用于科学计算的其他核心领域。

更新时间: 2025-12-11 13:10:50

领域: cs.HC,cs.AI,math.OC

下载: http://arxiv.org/abs/2512.09443v2

Phythesis: Physics-Guided Evolutionary Scene Synthesis for Energy-Efficient Data Center Design via LLMs

Data center (DC) infrastructure serves as the backbone to support the escalating demand for computing capacity. Traditional design methodologies that blend human expertise with specialized simulation tools scale poorly with increasing system complexity. Recent studies adopt generative artificial intelligence to design plausible human-centric indoor layouts. However, they do not consider the underlying physics, making them unsuitable for DC design, which sets quantifiable operational objectives and strict physical constraints. To bridge the gap, we propose Phythesis, a novel framework that synergizes large language models (LLMs) and physics-guided evolutionary optimization to automate simulation-ready (SimReady) scene synthesis for energy-efficient DC design. Phythesis employs an iterative bi-level optimization architecture, where (i) the LLM-driven optimization level generates physically plausible three-dimensional layouts and self-criticizes them to refine the scene topology, and (ii) the physics-informed optimization level identifies the optimal asset parameters and selects the best asset combination. Experiments on three generation scales show that Phythesis achieves a 57.3% increase in generation success rate and an 11.5% improvement in power usage effectiveness (PUE), compared with the vanilla LLM-based solution.
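
A minimal sketch of one generation of the bi-level loop described above, assuming hypothetical llm_propose, llm_critique, optimize_assets, and simulate_pue callables in place of Phythesis's actual components:

    def evolve_generation(population, llm_propose, llm_critique,
                          optimize_assets, simulate_pue):
        scored = []
        for parent in population:
            layout = llm_propose(parent)          # LLM level: draft a 3D layout
            layout = llm_critique(layout)         # LLM level: self-criticize the topology
            layout = optimize_assets(layout)      # physics level: tune asset parameters
            scored.append((simulate_pue(layout), layout))
        scored.sort(key=lambda pair: pair[0])     # lower PUE is better
        return [layout for _, layout in scored[:len(population)]]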

Updated: 2025-12-11 13:04:44

标题: Phythesis:基于物理引导的进化场景综合,用于通过LLMs实现能效数据中心设计

摘要: 数据中心(DC)基础设施作为支持不断增长的计算需求的支柱。传统的设计方法很难随着系统复杂性的增加而扩展,因为它们结合了人类专业知识和专门的仿真工具。近期的研究采用生成式人工智能来设计合理的以人为中心的室内布局。然而,它们并未考虑基础物理学,因此不适用于设定可量化的运营目标和严格的物理约束的DC设计。为了弥合这一差距,我们提出了Phythesis,这是一个新颖的框架,它结合了大型语言模型(LLMs)和物理引导的进化优化,以自动化模拟就绪(SimReady)场景合成,用于高效的DC设计。Phythesis采用迭代的双层优化架构,其中LLM驱动的优化层生成了物理上合理的三维布局,并对其进行自我批评以优化场景拓扑,物理信息优化层确定了最佳的资产参数和选择最佳的资产组合。在三个不同规模的实验中,Phythesis相比于基于vanilla LLM的解决方案,实现了57.3%的生成成功率提高和11.5%的功耗效率(PUE)改善。

更新时间: 2025-12-11 13:04:44

领域: cs.AI,cs.NE

下载: http://arxiv.org/abs/2512.10611v1

Object-centric proto-symbolic behavioural reasoning from pixels

Autonomous intelligent agents must bridge computational challenges at disparate levels of abstraction, from the low-level spaces of sensory input and motor commands to the high-level domain of abstract reasoning and planning. A key question in designing such agents is how best to instantiate the representational space that will interface between these two levels -- ideally without requiring supervision in the form of expensive data annotations. These objectives can be efficiently achieved by representing the world in terms of objects (grounded in perception and action). In this work, we present a novel, brain-inspired, deep-learning architecture that learns from pixels to interpret, control, and reason about its environment, using object-centric representations. We show the utility of our approach through tasks in synthetic environments that require a combination of (high-level) logical reasoning and (low-level) continuous control. Results show that the agent can learn emergent conditional behavioural reasoning, such as $(A \to B) \land (\neg A \to C)$, as well as logical composition $(A \to B) \land (A \to C) \vdash A \to (B \land C)$ and XOR operations, and successfully controls its environment to satisfy objectives deduced from these logical rules. The agent can adapt online to unexpected changes in its environment and is robust to mild violations of its world model, thanks to dynamic internal desired goal generation. While the present results are limited to synthetic settings (2D and 3D activated versions of dSprites), which fall short of real-world levels of complexity, the proposed architecture shows how to manipulate grounded object representations, as a key inductive bias for unsupervised learning, to enable behavioral reasoning.
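
The quoted rules are ordinary propositional statements; a toy rendering makes the deduced goal sets explicit (this is plain symbolic logic, not the paper's learned mechanism):

    def behave(A: bool) -> set:
        # (A -> B) ^ (~A -> C): pursue B when A holds, C otherwise
        return {"B"} if A else {"C"}

    def compose(A: bool) -> set:
        # (A -> B) ^ (A -> C) entails A -> (B ^ C): both goals under A
        return {"B", "C"} if A else set()

    assert behave(True) == {"B"} and behave(False) == {"C"}
    assert compose(True) == {"B", "C"}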

Updated: 2025-12-11 12:52:59

标题: 从像素中的对象中心原型符号化行为推理

摘要: 自主智能代理必须在不同抽象级别的计算挑战之间建立桥梁,从感官输入和运动命令的低层空间到抽象推理和规划的高层领域。设计这种代理的一个关键问题是如何最好地实例化表示空间,使其在这两个级别之间进行接口 -- 理想情况下不需要昂贵的数据注释监督。通过以对象为基础(基于感知和行动)来表示世界,可以有效地实现这些目标。在这项工作中,我们提出了一种新颖的、受大脑启发的深度学习架构,它从像素中学习来解释、控制和推理其环境,使用以对象为中心的表示。我们通过合成环境中的任务展示了我们方法的实用性,这些任务需要组合(高级)逻辑推理和(低级)连续控制。结果表明,代理可以学习出现的条件行为推理,如$(A \to B) \land (\neg A \to C)$,以及逻辑组合$(A \to B) \land (A \to C) \vdash A \to (B \land C)$和异或运算,并成功控制其环境以满足从这些逻辑规则推导出的目标。代理可以在线适应环境中的意外变化,并且对其世界模型的轻微违规具有鲁棒性,这要归功于动态内部期望目标生成。尽管目前的结果仅限于合成环境(2D和3D激活版本的dSprites),这些环境还不足以达到真实世界的复杂程度,但所提出的架构展示了如何操纵基于对象的表示,作为无监督学习的一个关键归纳偏见,以实现行为推理。

更新时间: 2025-12-11 12:52:59

领域: cs.AI,cs.CV,cs.LG,cs.NE

下载: http://arxiv.org/abs/2411.17438v3

Uncertainty-Preserving QBNNs: Multi-Level Quantization of SVI-Based Bayesian Neural Networks for Image Classification

Bayesian Neural Networks (BNNs) provide principled uncertainty quantification but suffer from substantial computational and memory overhead compared to deterministic networks. While quantization techniques have successfully reduced resource requirements in standard deep learning models, their application to probabilistic models remains largely unexplored. We introduce a systematic multi-level quantization framework for Stochastic Variational Inference (SVI) based BNNs that distinguishes between three quantization strategies: Variational Parameter Quantization (VPQ), Sampled Parameter Quantization (SPQ), and Joint Quantization (JQ). Logarithmic quantization of the variance parameters and specialized activation functions that preserve the distributional structure prove essential for calibrated uncertainty estimation. Through comprehensive experiments on Dirty-MNIST, we demonstrate that BNNs can be quantized down to 4-bit precision while maintaining both classification accuracy and uncertainty disentanglement. At 4 bits, Joint Quantization achieves up to 8x memory reduction compared to floating-point implementations, with minimal degradation in epistemic and aleatoric uncertainty estimation. These results enable deployment of BNNs on resource-constrained edge devices and provide design guidelines for future analog "Bayesian Machines" operating at inherently low precision.
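
A minimal sketch of logarithmic quantization for the variational standard deviations, assuming uniform rounding in log space; the bit width, clipping range, and rounding mode are illustrative choices, not the paper's exact recipe:

    import numpy as np

    def log_quantize_sigma(sigma, bits=4, lo=1e-4, hi=1.0):
        levels = 2 ** bits
        log_sigma = np.clip(np.log(sigma), np.log(lo), np.log(hi))
        step = (np.log(hi) - np.log(lo)) / (levels - 1)
        code = np.round((log_sigma - np.log(lo)) / step)    # integer in 0..levels-1
        return np.exp(np.log(lo) + code * step)             # dequantized std-dev

    print(log_quantize_sigma(np.array([3e-4, 0.02, 0.3]), bits=4))

Quantizing in log space keeps relative resolution roughly constant across orders of magnitude, which matters because posterior variances typically span several decades.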

Updated: 2025-12-11 12:51:42

标题: 不确定性保留QBNNs:基于SVI的贝叶斯神经网络的多级量化用于图像分类

摘要: 贝叶斯神经网络(BNNs)提供了合理的不确定性量化,但与确定性网络相比,存在着相当大的计算和内存开销。虽然量化技术成功地降低了标准深度学习模型的资源要求,但它们在概率模型中的应用仍然很少被探索。我们引入了一种系统化的基于随机变分推断的BNNs的多级量化框架,区分了三种量化策略:变分参数量化(VPQ)、采样参数量化(SPQ)和联合量化(JQ)。我们的对方差参数进行对数量化的方法,以及专门设计的激活函数来保留分布结构,对于准确的不确定性估计是至关重要的。通过在Dirty-MNIST上进行全面实验,我们展示了BNNs可以被量化到4位精度,同时保持分类准确性和不确定性分离。在4位精度下,与浮点实现相比,联合量化实现了最多8倍的内存缩减,同时认知和偶然不确定性估计的退化最小。这些结果使得BNNs可以部署在资源受限的边缘设备上,并为未来以固有低精度运行的模拟“贝叶斯机器”的设计提供了指导。

更新时间: 2025-12-11 12:51:42

领域: cs.LG

下载: http://arxiv.org/abs/2512.10602v1

Multi-Objective Reward and Preference Optimization: Theory and Algorithms

This thesis develops theoretical frameworks and algorithms that advance constrained reinforcement learning (RL) across control, preference learning, and alignment of large language models. The first contribution addresses constrained Markov Decision Processes (CMDPs) under the average-cost criterion through the Average-Constrained Policy Optimization (ACPO) algorithm. ACPO integrates sensitivity analysis with trust-region updates to ensure stable constraint handling, achieving state-of-the-art empirical performance with theoretical guarantees. Constrained RL is then extended to finite-horizon settings via e-COP, the first policy optimization method for episodic CMDPs. Built on an episodic policy difference lemma, e-COP offers provable performance, simplicity, and scalability in safety-critical environments. The thesis then investigates reinforcement learning from human preferences. warmPref-PS introduces a posterior sampling strategy for linear bandits that integrates offline preference data from heterogeneous raters into online learning. Explicit modeling of rater competence yields substantial regret reduction and more efficient data collection for RLHF. The PSPL algorithm further advances preference-based RL by jointly sampling reward models and transition dynamics from pairwise trajectory comparisons, providing Bayesian simple-regret guarantees and robust empirical identification of optimal policies. The final contribution applies these methods to large-scale model alignment. A multi-objective constrained optimization view yields MOPO, an iterative algorithm with closed-form updates that scales to multi-billion-parameter language models and remains robust across alignment settings. Collectively, the thesis unifies constrained RL across average-cost, episodic, and preference-driven paradigms, delivering theoretical advances and practical tools for safe and aligned decision-making.

Updated: 2025-12-11 12:51:21

标题: 多目标奖励和偏好优化:理论与算法

摘要: 这篇论文发展了在控制、偏好学习和大型语言模型对齐方面推进受限制强化学习(RL)的理论框架和算法。第一个贡献通过平均成本准则下的平均约束策略优化(ACPO)算法解决了受限制的马尔可夫决策过程(CMDPs)。ACPO将敏感性分析与信任区域更新相结合,确保稳定的约束处理,实现了具有理论保证的最先进的实证性能。然后,通过e-COP将受限制的RL扩展到有限时间段的设置,这是用于叙事CMDPs的第一种策略优化方法。基于叙事策略差异引理,e-COP在安全关键环境中提供可证明的性能、简单性和可扩展性。接着,论文研究了基于人类偏好的强化学习。warmPref-PS为线性赌博机引入了一个后验采样策略,将来自异质评分者的离线偏好数据整合到在线学习中。对评分者能力的明确建模显著减少了遗憾,并为RLHF提供更有效的数据收集。PSPL算法通过从成对轨迹比较中同时采样奖励模型和转移动态,提供了贝叶斯简单遗憾保证和对最优策略的稳健实证识别,进一步推进了基于偏好的RL。最后的贡献将这些方法应用于大规模模型对齐。多目标受限制优化视图提供了MOPO,这是一个迭代算法,具有封闭形式的更新,可扩展到拥有数十亿参数的语言模型,并在对齐设置中保持稳健。总的来说,这篇论文统一了关于平均成本、叙事和偏好驱动范式的受限制RL,为安全和对齐决策提供了理论进展和实用工具。

更新时间: 2025-12-11 12:51:21

领域: cs.LG

下载: http://arxiv.org/abs/2512.10601v1

Authority Backdoor: A Certifiable Backdoor Mechanism for Authoring DNNs

Deep Neural Networks (DNNs), as valuable intellectual property, face unauthorized use. Existing protections, such as digital watermarking, are largely passive; they provide only post-hoc ownership verification and cannot actively prevent the illicit use of a stolen model. This work proposes a proactive protection scheme, dubbed "Authority Backdoor," which embeds access constraints directly into the model. In particular, the scheme utilizes a backdoor learning framework to intrinsically lock a model's utility, such that it performs normally only in the presence of a specific trigger (e.g., a hardware fingerprint); in the trigger's absence, the DNN's performance degrades to the point of being useless. To further enhance the security of the proposed authority scheme, certifiable robustness is integrated to prevent an adaptive attacker from removing the implanted backdoor. The resulting framework establishes a secure authority mechanism for DNNs, combining access control with certifiable robustness against adversarial attacks. Extensive experiments on diverse architectures and datasets validate the effectiveness and certifiable robustness of the proposed framework.

Updated: 2025-12-11 12:50:39

标题: 权威后门:一种用于编写DNN的可验证后门机制

摘要: 深度神经网络(DNNs)作为有价值的知识产权,面临未经授权的使用。现有的保护措施,如数字水印,主要是被动的;它们只能提供事后所有权验证,无法主动阻止盗窃模型的非法使用。本文提出了一种积极的保护方案,命名为“授权后门”,直接将访问限制嵌入到模型中。具体而言,该方案利用后门学习框架,固有地锁定模型的实用性,使其只有在特定触发器(例如硬件指纹)存在时才能正常运行。但在触发器不存在时,DNN的性能会下降到无用的程度。为了进一步增强所提出的授权方案的安全性,将可证明的强健性集成进来,以防止自适应攻击者移除植入的后门。所得到的框架建立了一个用于DNN的安全授权机制,将访问控制与针对对抗性攻击的可证明强健性相结合。对各种架构和数据集进行的广泛实验验证了所提出框架的有效性和可证明的强健性。

更新时间: 2025-12-11 12:50:39

领域: cs.CR,cs.LG

下载: http://arxiv.org/abs/2512.10600v1

Beyond Pixels: A Training-Free, Text-to-Text Framework for Remote Sensing Image Retrieval

Semantic retrieval of remote sensing (RS) images is a critical task fundamentally challenged by the "semantic gap", the discrepancy between a model's low-level visual features and high-level human concepts. While large Vision-Language Models (VLMs) offer a promising path to bridge this gap, existing methods often rely on costly, domain-specific training, and there is a lack of benchmarks to evaluate the practical utility of VLM-generated text in a zero-shot retrieval context. To address this research gap, we introduce the Remote Sensing Rich Text (RSRT) dataset, a new benchmark featuring multiple structured captions per image. Based on this dataset, we propose a fully training-free, text-only retrieval reference called TRSLLaVA. Our methodology reformulates cross-modal retrieval as a text-to-text (T2T) matching problem, leveraging rich text descriptions as queries against a database of VLM-generated captions within a unified textual embedding space. This approach completely bypasses model training or fine-tuning. Experiments on the RSITMD and RSICD benchmarks show our training-free method is highly competitive with state-of-the-art supervised models. For instance, on RSITMD, our method achieves a mean Recall of 42.62%, nearly doubling the 23.86% of the standard zero-shot CLIP baseline and surpassing several top supervised models. This validates that high-quality semantic representation through structured text provides a powerful and cost-effective paradigm for remote sensing image retrieval.
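
The core retrieval step reduces to nearest-neighbor search in a single textual embedding space. A minimal sketch, where embed stands in for any off-the-shelf sentence encoder and captions holds one VLM-generated caption per image:

    import numpy as np

    def retrieve(query, captions, embed, k=5):
        C = np.stack([embed(c) for c in captions])      # caption database
        C /= np.linalg.norm(C, axis=1, keepdims=True)
        q = embed(query)
        q /= np.linalg.norm(q)
        scores = C @ q                                  # cosine similarity
        return np.argsort(-scores)[:k]                  # indices of top-k images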

Updated: 2025-12-11 12:43:41

标题: 超越像素:一种无需训练的遥感图像检索文本-文本框架

摘要: 遥感图像的语义检索是一项关键任务,根本上受到“语义鸿沟”的挑战,即模型的低级视觉特征与高级人类概念之间的差异。尽管大型视觉-语言模型(VLMs)为弥合这一鸿沟提供了一条有希望的途径,但现有方法往往依赖昂贵的、领域特定的训练,并且缺乏用于评估在零样本检索环境中VLM生成的文本实际效用的基准。为了填补这一研究空白,我们引入了遥感丰富文本(RSRT)数据集,这是一个新的基准,每个图像具有多个结构化标题。基于这个数据集,我们提出了一个完全无需训练的、仅使用文本的检索参考方法称为TRSLLaVA。我们的方法重新构建了跨模态检索问题,将其视为文本到文本(T2T)匹配问题,利用丰富的文本描述作为查询,在一个统一的文本嵌入空间中与VLM生成的标题数据库进行匹配。这种方法完全绕过了模型训练或微调。在RSITMD和RSICD基准上进行的实验表明,我们的无需训练方法与最先进的监督模型竞争力强。例如,在RSITMD上,我们的方法实现了42.62%的平均召回率,几乎是标准零样本CLIP基线的23.86%的两倍,并超过了几个顶级监督模型。这证实了通过结构化文本提供高质量的语义表示是遥感图像检索的一种强大且经济有效的范式。

更新时间: 2025-12-11 12:43:41

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2512.10596v1

Lightweight Model Attribution and Detection of Synthetic Speech via Audio Residual Fingerprints

As speech generation technologies advance, so do risks of impersonation, misinformation, and spoofing. We present a lightweight, training-free approach for detecting synthetic speech and attributing it to its source model. Our method addresses three tasks: (1) single-model attribution in an open-world setting, (2) multi-model attribution in a closed-world setting, and (3) real vs. synthetic speech classification. The core idea is simple: we compute standardized average residuals--the difference between an audio signal and its filtered version--to extract model-agnostic fingerprints that capture synthesis artifacts. Experiments across multiple synthesis systems and languages show AUROC scores above 99%, with strong reliability even when only a subset of model outputs is available. The method maintains high performance under common audio distortions, including echo and moderate background noise, while data augmentation can improve results in more challenging conditions. In addition, out-of-domain detection is performed using Mahalanobis distances to in-domain residual fingerprints, achieving an F1 score of 0.91 on unseen models, reinforcing the method's efficiency, generalizability, and suitability for digital forensics and security applications.
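
A minimal sketch of the residual idea: subtract a filtered copy of the waveform from the waveform itself, standardize, and average across clips to form a fingerprint. The moving-average filter and spectral averaging here are illustrative stand-ins for the paper's exact filtering choice:

    import numpy as np

    def residual(x, win=16):
        smooth = np.convolve(x, np.ones(win) / win, mode="same")
        return x - smooth                               # signal minus filtered version

    def fingerprint(clips, win=16, n_fft=512):
        specs = []
        for x in clips:
            r = residual(np.asarray(x, dtype=float), win)
            r = (r - r.mean()) / (r.std() + 1e-8)       # standardize the residual
            specs.append(np.abs(np.fft.rfft(r, n_fft)))
        return np.mean(specs, axis=0)                   # average residual signature

Attribution then reduces to comparing a clip's signature against per-model references, e.g., cosine similarity in the closed-world case, or Mahalanobis distance to the in-domain fingerprints for out-of-domain detection as described above.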

Updated: 2025-12-11 12:41:32

标题: 通过音频残留指纹轻量级模型归因和检测合成语音

摘要: 随着语音生成技术的进步,冒充、虚假信息和欺骗的风险也在增加。我们提出了一种轻量级、无需训练的方法,用于检测合成语音并将其归因于其源模型。我们的方法涉及三个任务:(1)在开放世界环境中进行单模型归因,(2)在封闭世界环境中进行多模型归因,和(3)真实 vs. 合成语音分类。核心思想很简单:我们计算标准化的平均残差--音频信号与其经过滤版本之间的差异--以提取捕获合成伪影的模型无关指纹。跨多个合成系统和语言的实验显示AUROC分数超过99%,即使只有模型输出的子集可用,也具有很强的可靠性。该方法在常见的音频失真情况下保持高性能,包括回声和适度的背景噪音,而数据增强可以改善在更具挑战性的条件下的结果。此外,通过使用马氏距离来检测领域外的残差指纹,实现了对未知模型的F1分数为0.91的检测,进一步强调了该方法在数字取证和安全应用中的高效性、泛化能力和适用性。

更新时间: 2025-12-11 12:41:32

领域: eess.AS,cs.CR,cs.LG

下载: http://arxiv.org/abs/2411.14013v4

THeGAU: Type-Aware Heterogeneous Graph Autoencoder and Augmentation

Heterogeneous Graph Neural Networks (HGNNs) are effective for modeling Heterogeneous Information Networks (HINs), which encode complex multi-typed entities and relations. However, HGNNs often suffer from type information loss and structural noise, limiting their representational fidelity and generalization. We propose THeGAU, a model-agnostic framework that combines a type-aware graph autoencoder with guided graph augmentation to improve node classification. THeGAU reconstructs schema-valid edges as an auxiliary task to preserve node-type semantics and introduces a decoder-driven augmentation mechanism to selectively refine noisy structures. This joint design enhances robustness, accuracy, and efficiency while significantly reducing computational overhead. Extensive experiments on three benchmark HIN datasets (IMDB, ACM, and DBLP) demonstrate that THeGAU consistently outperforms existing HGNN methods, achieving state-of-the-art performance across multiple backbones.

Updated: 2025-12-11 12:30:42

标题: THeGAU:类型感知的异质图自动编码器和增强

摘要: 异质图神经网络(HGNNs)对建模异质信息网络(HINs)有效,这些网络编码复杂的多类型实体和关系。然而,HGNNs经常遭受类型信息丢失和结构噪声的困扰,限制了它们的表示保真度和泛化能力。我们提出了THeGAU,这是一个模型无关的框架,将类型感知图自编码器与引导图增强相结合,以改善节点分类。THeGAU将模式有效边缘重建作为辅助任务,以保留节点类型语义,并引入由解码器驱动的增强机制来选择性地优化嘈杂结构。这种联合设计增强了鲁棒性、准确性和效率,同时显著降低了计算开销。对三个基准HIN数据集(IMDB、ACM和DBLP)的广泛实验表明,THeGAU始终优于现有的HGNN方法,在多个主干网络上实现了最先进的性能。

更新时间: 2025-12-11 12:30:42

领域: cs.LG

下载: http://arxiv.org/abs/2512.10589v1

Towards Robust Assessment of Pathological Voices via Combined Low-Level Descriptors and Foundation Model Representations

Perceptual voice quality assessment plays a vital role in diagnosing and monitoring voice disorders. Traditional methods, such as the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V) and the Grade, Roughness, Breathiness, Asthenia, and Strain (GRBAS) scales, rely on expert raters and are prone to inter-rater variability, emphasizing the need for objective solutions. This study introduces the Voice Quality Assessment Network (VOQANet), a deep learning framework that employs an attention mechanism and Speech Foundation Model (SFM) embeddings to extract high-level features. To further enhance performance, we propose VOQANet+, which integrates self-supervised SFM embeddings with low-level acoustic descriptors, namely jitter, shimmer, and harmonics-to-noise ratio (HNR). Unlike previous approaches that focus solely on vowel-based phonation (PVQD-A), our models are evaluated on both vowel-level and sentence-level speech (PVQD-S) to assess generalizability. Experimental results demonstrate that sentence-based inputs yield higher accuracy, particularly at the patient level. Overall, VOQANet consistently outperforms baseline models in terms of root mean squared error (RMSE) and Pearson correlation coefficient across CAPE-V and GRBAS dimensions, with VOQANet+ achieving even greater performance gains. Additionally, VOQANet+ maintains consistent performance under noisy conditions, suggesting enhanced robustness for real-world and telehealth applications. This work highlights the value of combining SFM embeddings with low-level features for accurate and robust pathological voice assessment.

Updated: 2025-12-11 12:26:21

标题: 朝向通过低级描述符和基础模型表示结合的病理性声音的稳健评估

摘要: 感知声音质量评估在诊断和监测声音障碍中起着至关重要的作用。传统方法,如一致性听觉-感知声音评估 (CAPE-V) 和 Grade, Roughness, Breathiness, Asthenia, and Strain (GRBAS) 量表,依赖于专家评分员,并容易出现评分员之间的变异性,强调了需要客观解决方案的必要性。本研究介绍了 Voice Quality Assessment Network (VOQANet),这是一个深度学习框架,采用了注意机制和 Speech Foundation Model (SFM) 嵌入来提取高级特征。为了进一步提高性能,我们提出了 VOQANet+,它将自监督的 SFM 嵌入与低级声学描述符(即颤音、闪光和谐波噪声比)相结合。与之前专注于基于元音的发音(PVQD-A)的方法不同,我们的模型在元音级别和句子级别的语音(PVQD-S)上进行评估,以评估泛化能力。实验结果表明,基于句子的输入能够在患者级别特别提高准确性。总体而言,VOQANet 在 CAPE-V 和 GRBAS 维度上的均方根误差(RMSE)和皮尔逊相关系数方面一直优于基准模型,而 VOQANet+ 更进一步提高了性能。此外,VOQANet+ 在嘈杂环境下保持稳定的表现,表明在现实世界和远程医疗应用中具有增强的鲁棒性。这项工作突出了将 SFM 嵌入与低级特征结合以进行准确和稳健的病理性声音评估的价值。

更新时间: 2025-12-11 12:26:21

领域: cs.SD,cs.LG,eess.AS

下载: http://arxiv.org/abs/2505.21356v4

Topology-Guided Quantum GANs for Constrained Graph Generation

Quantum computing (QC) promises theoretical advantages, benefiting computational problems that are not efficiently simulatable on classical hardware. However, much of this theoretical speedup depends on the quantum circuit design solving the problem. We argue that QC literature has yet to explore more domain-specific ansatz-topologies, instead of relying on generic, one-size-fits-all architectures. In this work, we show that incorporating task-specific inductive biases -- specifically geometric priors -- into quantum circuit design can enhance the performance of hybrid Quantum Generative Adversarial Networks (QuGANs) on the task of generating geometrically constrained K4 graphs. We evaluate a portfolio of entanglement topologies and loss-function designs to assess their impact on both statistical fidelity and compliance with geometric constraints, including the Triangle and Ptolemaic inequalities. Our results show that aligning circuit topology with the underlying problem structure yields substantial benefits: the Triangle-topology QuGAN achieves the highest geometric validity among quantum models and matches the performance of classical Generative Adversarial Networks (GAN). Additionally, we showcase how specific architectural choices, such as entangling gate types, variance regularization and output-scaling govern the trade-off between geometric consistency and distributional accuracy, thus emphasizing the value of structured, task-aware quantum ansatz-topologies.
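
The geometric validity criteria are easy to state concretely: a generated K4 graph assigns a length to each of the six edges over four vertices, every triangle must satisfy the triangle inequality, and every vertex ordering must satisfy the Ptolemaic inequality. A small checker (the tolerance is an illustrative choice):

    from itertools import combinations, permutations

    def is_valid_k4(d, eps=1e-9):
        # d maps frozenset({i, j}) -> generated length of edge (i, j)
        V = [0, 1, 2, 3]
        for a, b, c in combinations(V, 3):              # triangle inequalities
            x = d[frozenset((a, b))]
            y = d[frozenset((b, c))]
            z = d[frozenset((a, c))]
            if x + y < z - eps or y + z < x - eps or x + z < y - eps:
                return False
        for a, b, c, e in permutations(V):              # Ptolemaic inequality
            if (d[frozenset((a, c))] * d[frozenset((b, e))] >
                    d[frozenset((a, b))] * d[frozenset((c, e))] +
                    d[frozenset((a, e))] * d[frozenset((b, c))] + eps):
                return False
        return True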

Updated: 2025-12-11 12:22:18

标题: 拓扑学引导的约束图生成的量子生成对抗网络

摘要: 量子计算(QC)承诺理论上的优势,有益于那些经典模拟效率低下的计算问题。然而,这种理论加速很大程度上取决于解决问题的量子电路设计。我们认为,QC文献尚未探索更多领域特定的ansatz拓扑结构,而是依赖于通用的、一刀切的架构。在这项工作中,我们展示了将任务特定的归纳偏差——特别是几何先验——纳入量子电路设计可以提高混合量子生成对抗网络(QuGANs)在生成受几何约束的K4图任务上的性能。我们评估了一系列纠缠拓扑结构和损失函数设计,以评估它们对统计保真度和与几何约束的符合度的影响,包括三角形和托勒密不等式。我们的结果显示,将电路拓扑结构与底层问题结构对齐具有显著的好处:三角形拓扑结构的QuGAN在量子模型中实现了最高的几何有效性,并与经典生成对抗网络(GAN)的性能相匹配。此外,我们展示了特定的架构选择,如纠缠门类型、方差正则化和输出缩放如何决定几何一致性和分布准确性之间的权衡,从而强调结构化、任务感知的量子ansatz拓扑结构的价值。

更新时间: 2025-12-11 12:22:18

领域: quant-ph,cs.LG

下载: http://arxiv.org/abs/2512.10582v1

Is the Information Bottleneck Robust Enough? Towards Label-Noise Resistant Information Bottleneck Learning

The Information Bottleneck (IB) principle facilitates effective representation learning by preserving label-relevant information while compressing irrelevant information. However, its strong reliance on accurate labels makes it inherently vulnerable to label noise, which is prevalent in real-world scenarios, resulting in significant performance degradation and overfitting. To address this issue, we propose LaT-IB, a novel Label-Noise ResistanT Information Bottleneck method which introduces a "Minimal-Sufficient-Clean" (MSC) criterion. Instantiated as a mutual information regularizer to retain task-relevant information while discarding noise, MSC addresses standard IB's vulnerability to noisy label supervision. To achieve this, LaT-IB employs a noise-aware latent disentanglement that decomposes the latent representation into components aligned with the clean label space and the noise space. Theoretically, we first derive mutual information bounds for each component of our objective, including prediction, compression, and disentanglement, and moreover prove that optimizing it encourages representations invariant to input noise and separates clean and noisy label information. Furthermore, we design a three-phase training framework: Warmup, Knowledge Injection, and Robust Training, to progressively guide the model toward noise-resistant representations. Extensive experiments demonstrate that LaT-IB achieves superior robustness and efficiency under label noise, significantly enhancing its applicability in real-world scenarios.

Updated: 2025-12-11 12:01:20

标题: 信息瓶颈是否足够强大?走向抗标签噪声的信息瓶颈学习

摘要: 信息瓶颈(IB)原则通过保留与标签相关的信息并压缩无关信息,促进了有效的表示学习。然而,其对准确标签的强烈依赖使其天生容易受到标签噪声的影响,在现实场景中普遍存在,导致性能严重下降和过拟合。为解决这一问题,我们提出了LaT-IB,一种新颖的抗标签噪声的信息瓶颈方法,引入了“最小-充分-干净”(MSC)标准。MSC作为互信息正则化器,用于保留任务相关信息同时丢弃噪声,解决了标准IB对嘈杂标签监督的脆弱性。为实现这一目标,LaT-IB采用了噪声感知的潜在解耦,将潜在表示分解为与干净标签空间和噪声空间对齐的组件。在理论上,我们首先推导了我们的目标的每个组件的互信息界限,包括预测、压缩和解耦,此外还证明了优化它鼓励对输入噪声不变的表示,并将干净和嘈杂的标签信息分离。此外,我们设计了一个三阶段训练框架:预热、知识注入和稳健训练,逐步引导模型朝向抗噪声表示。大量实验证明,LaT-IB在标签噪声下实现了卓越的鲁棒性和效率,显著提高了在现实场景中具有标签噪声的鲁棒性和适用性。

更新时间: 2025-12-11 12:01:20

领域: cs.LG

下载: http://arxiv.org/abs/2512.10573v1

Flexible Deep Neural Networks for Partially Linear Survival Data

We propose a flexible deep neural network (DNN) framework for modeling survival data within a partially linear regression structure. The approach preserves interpretability through a parametric linear component for covariates of primary interest, while a nonparametric DNN component captures complex time-covariate interactions among nuisance variables. We refer to the method as FLEXI-Haz, a flexible hazard model with a partially linear structure. In contrast to existing DNN approaches for partially linear Cox models, FLEXI-Haz does not rely on the proportional hazards assumption. We establish theoretical guarantees: the neural network component attains minimax-optimal convergence rates based on composite Hölder classes, and the linear estimator is root-n consistent, asymptotically normal, and semiparametrically efficient. Extensive simulations and real-data analyses demonstrate that FLEXI-Haz provides accurate estimation of the linear effect, offering a principled and interpretable alternative to modern methods based on proportional hazards. Code for implementing FLEXI-Haz, as well as scripts for reproducing data analyses and simulations, is available at: https://github.com/AsafBanana/FLEXI-Haz
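
A minimal sketch of the partially linear decomposition, assuming the log-hazard splits into a linear term in the covariates of interest x plus an unrestricted network term in time t and nuisance covariates z. This is an illustrative parameterization consistent with the abstract, not the paper's exact estimator; note that no proportional-hazards form is imposed, since t enters the network jointly with z:

    import torch
    import torch.nn as nn

    class PartiallyLinearHazard(nn.Module):
        def __init__(self, dim_x, dim_z, hidden=64):
            super().__init__()
            self.beta = nn.Linear(dim_x, 1, bias=False)   # interpretable linear effect
            self.g = nn.Sequential(                       # time-covariate DNN term
                nn.Linear(dim_z + 1, hidden), nn.ReLU(),
                nn.Linear(hidden, 1))

        def log_hazard(self, t, x, z):
            # t: (n, 1), x: (n, dim_x), z: (n, dim_z)
            return self.beta(x) + self.g(torch.cat([t, z], dim=-1))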

Updated: 2025-12-11 11:58:42

标题: 灵活的深度神经网络用于部分线性生存数据

摘要: 我们提出了一个灵活的深度神经网络(DNN)框架,用于在部分线性回归结构中建模生存数据。该方法通过对主要兴趣协变量的参数线性部分进行保留解释性,同时非参数DNN组件捕获了干扰变量之间复杂的时间-协变量交互作用。我们将该方法称为FLEXI-Haz,一个具有部分线性结构的灵活危险模型。与现有的用于部分线性Cox模型的DNN方法相比,FLEXI-Haz不依赖于比例风险假设。我们建立了理论保证:基于复合Holder类,神经网络组件实现了极小化最优收敛速率,线性估计器是根n一致的,渐进正态的,并且是半参数有效的。广泛的模拟和实际数据分析表明,FLEXI-Haz提供了线性效应的准确估计,为基于比例风险的现代方法提供了有原则和可解释的替代方案。实施FLEXI-Haz的代码,以及用于重现数据分析和模拟的脚本,可在以下网址获得:https://github.com/AsafBanana/FLEXI-Haz

更新时间: 2025-12-11 11:58:42

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2512.10570v1

NormCode: A Semi-Formal Language for Context-Isolated AI Planning

Multi-step workflows that chain large language model (LLM) calls suffer from context pollution: as information accumulates across steps, models hallucinate, confuse intermediate outputs, and lose track of task constraints. We present NormCode, a semi-formal language for constructing plans of inferences, structured decompositions where each step operates in data isolation and receives only explicitly passed inputs, which eliminates cross-step contamination by design. NormCode enforces a strict separation between semantic operations (LLM-driven reasoning, non-deterministic) and syntactic operations (deterministic data restructuring), enabling precise cost and reliability tracing. The language exists in three isomorphic formats: .ncds for human authoring, .ncd for machine execution, and .ncn for human verification, supporting progressive formalization from sketch to production. We validate NormCode through two demonstrations: (1) a base-X addition algorithm achieving 100 percent accuracy on arbitrary-length inputs, and (2) self-hosted execution of NormCode's own five-phase compiler pipeline. The working orchestrator provides dependency-driven scheduling, SQLite-backed checkpointing, and loop management, making AI workflows auditable by design and addressing a critical need for transparency in high-stakes domains such as legal reasoning, medical decision-making, and financial analysis.
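
Context isolation is mechanically simple to picture: each step names its inputs and receives nothing else. A toy executor in that spirit (the step functions stand in for NormCode's semantic and syntactic operations; this is not the actual orchestrator):

    def run_plan(plan, inputs):
        # plan: list of (output_name, fn, input_names);
        # fn sees ONLY the values named in input_names.
        values = dict(inputs)
        for output_name, fn, input_names in plan:
            isolated = {k: values[k] for k in input_names}   # explicit data passing
            values[output_name] = fn(**isolated)             # no shared context leaks in
        return values

    plan = [
        ("digits", lambda raw: [int(c) for c in raw], ["raw"]),   # syntactic step
        ("total",  lambda digits: sum(digits),        ["digits"]),
    ]
    print(run_plan(plan, {"raw": "1943"})["total"])   # -> 17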

Updated: 2025-12-11 11:50:50

标题: NormCode:一种半正式语言,用于上下文隔离的人工智能规划

摘要: 多步骤工作流链式大型语言模型(LLM)调用遭受上下文污染:随着信息在步骤之间累积,模型会产生幻觉,混淆中间输出,并丢失任务约束的跟踪。我们提出了NormCode,一种半正式语言,用于构建推理计划,结构化分解,其中每个步骤在数据隔离中运行,并仅接收明确传递的输入,从而通过设计消除了跨步骤污染。NormCode强制执行语义操作(LLM驱动的推理,非确定性)和语法操作(确定性数据重组)之间的严格分离,实现精确的成本和可靠性跟踪。该语言以三种同构格式存在:.ncds用于人类编写,.ncd用于机器执行,.ncn用于人工验证,支持从草图到生产的逐步形式化。我们通过两个演示验证了NormCode:(1)一个基本的X加法算法,在任意长度的输入上实现100%的准确性,以及(2)NormCode自己的五阶段编译器流水线的自托管执行。工作协调器提供依赖驱动调度,SQLite支持的检查点,以及循环管理,通过设计使AI工作流程具有审计功能,并满足高风险领域(如法律推理,医疗决策和金融分析)对透明度的关键需求。

更新时间: 2025-12-11 11:50:50

领域: cs.AI

下载: http://arxiv.org/abs/2512.10563v1

Causal Reasoning Favors Encoders: On The Limits of Decoder-Only Models

In-context learning (ICL) underpins recent advances in large language models (LLMs), although its role and performance in causal reasoning remain unclear. Causal reasoning demands multi-hop composition and strict conjunctive control, and reliance on spurious lexical relations in the input can produce misleading results. We hypothesize that, due to their ability to project the input into a latent space, encoder and encoder-decoder architectures are better suited to such multi-hop conjunctive reasoning than decoder-only models. To test this, we compare fine-tuned versions of all the aforementioned architectures with zero- and few-shot ICL in both natural-language and non-natural-language scenarios. We find that ICL alone is insufficient for reliable causal reasoning, often over-focusing on irrelevant input features. In particular, decoder-only models are noticeably brittle to distributional shifts, while fine-tuned encoder and encoder-decoder models generalize more robustly across our tests, including the non-natural-language split. Both architectures are only matched or surpassed by decoder-only architectures at large scales. We conclude by noting that for cost-effective, short-horizon, robust causal reasoning, encoder or encoder-decoder architectures with targeted fine-tuning are preferable.

Updated: 2025-12-11 11:46:48

标题: 因果推理偏向编码器:关于仅解码模型的局限性

摘要: 在语境学习(ICL)支持下,最近大型语言模型(LLMs)取得了进展,尽管它在因果推理中的作用和性能仍不清楚。因果推理需要多跳组合和严格的连接控制,依赖于输入的虚假词汇关系可能会导致误导性结果。我们假设,由于它们能够将输入投影到潜在空间中,编码器和编码器解码器架构更适合进行多跳连接推理,而非仅有解码器模型。为此,我们比较了所有上述架构的经过微调的版本,在自然语言和非自然语言场景下使用零和少量ICL。我们发现,仅仅依靠ICL是不足以进行可靠的因果推理的,往往会过分关注无关的输入特征。特别是,仅有解码器的模型在分布变化时明显脆弱,而微调的编码器和编码器解码器模型可以更稳健地泛化到我们的测试中,包括非自然语言部分。在大规模情况下,这两种架构仅在某些方面与仅有解码器的架构相匹敌或超过。我们总结指出,为了成本效益高、短期内稳健的因果推理,带有有针对性微调的编码器或编码器解码器架构更为可取。

更新时间: 2025-12-11 11:46:48

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2512.10561v1

SWEnergy: An Empirical Study on Energy Efficiency in Agentic Issue Resolution Frameworks with SLMs

Context. LLM-based autonomous agents in software engineering rely on large, proprietary models, limiting local deployment. This has spurred interest in Small Language Models (SLMs), but their practical effectiveness and efficiency within complex agentic frameworks for automated issue resolution remain poorly understood. Goal. We investigate the performance, energy efficiency, and resource consumption of four leading agentic issue resolution frameworks when deliberately constrained to using SLMs. We aim to assess the viability of these systems for this task in resource-limited settings and characterize the resulting trade-offs. Method. We conduct a controlled evaluation of four leading agentic frameworks (SWE-Agent, OpenHands, Mini SWE Agent, AutoCodeRover) using two SLMs (Gemma-3 4B, Qwen-3 1.7B) on the SWE-bench Verified Mini benchmark. On fixed hardware, we measure energy, duration, token usage, and memory over 150 runs per configuration. Results. We find that framework architecture is the primary driver of energy consumption. The most energy-intensive framework, AutoCodeRover (Gemma), consumed 9.4x more energy on average than the least energy-intensive, OpenHands (Gemma). However, this energy is largely wasted. Task resolution rates were near-zero, demonstrating that current frameworks, when paired with SLMs, consume significant energy on unproductive reasoning loops. The SLM's limited reasoning was the bottleneck for success, but the framework's design was the bottleneck for efficiency. Conclusions. Current agentic frameworks, designed for powerful LLMs, fail to operate efficiently with SLMs. We find that framework architecture is the primary driver of energy consumption, but this energy is largely wasted due to the SLMs' limited reasoning. Viable low-energy solutions require shifting from passive orchestration to architectures that actively manage SLM weaknesses.

Updated: 2025-12-11 11:33:34

标题: SWEnergy:基于SLM的主动问题解决框架能效的实证研究

摘要: 背景。基于LLM的软件工程自主代理依赖于大型专有模型,限制了本地部署。这引发了对小语言模型(SLMs)的兴趣,但它们在自动问题解决复杂代理框架中的实际效果和效率仍不为人所了解。 目标。我们调查了四个领先的代理问题解决框架在故意限制使用SLMs时的性能,能源效率和资源消耗。我们的目标是评估这些系统在资源有限的环境中执行此任务的可行性,并描述产生的权衡结果。 方法。我们对四个领先的代理框架(SWE-Agent,OpenHands,Mini SWE Agent,AutoCodeRover)在SWE-bench Verified Mini基准测试中使用两个SLMs(Gemma-3 4B,Qwen-3 1.7B)进行了受控评估。在固定硬件上,我们对每种配置进行了150次运行的能源消耗,持续时间,令牌使用和内存的测量。 结果。我们发现框架架构是能源消耗的主要驱动因素。能源消耗最高的框架AutoCodeRover(Gemma)平均比能源消耗最低的OpenHands(Gemma)高出9.4倍。然而,这种能源在很大程度上是浪费的。任务解决率接近零,表明当前框架在与SLMs配对时,会消耗大量能源用于无效的推理循环。SLM的有限推理成为成功的瓶颈,但框架的设计成为效率的瓶颈。 结论。当前为强大的LLMs设计的代理框架无法有效地与SLMs一起运行。我们发现框架架构是能源消耗的主要驱动因素,但由于SLMs的有限推理,这种能源在很大程度上是浪费的。可行的低能源解决方案需要从被动编排转变为积极管理SLM弱点的架构。

更新时间: 2025-12-11 11:33:34

领域: cs.SE,cs.AI

下载: http://arxiv.org/abs/2512.09543v2

LLM-Auction: Generative Auction towards LLM-Native Advertising

The rapid advancement of large language models (LLMs) necessitates novel monetization strategies, among which LLM-native advertising has emerged as a promising paradigm by naturally integrating advertisement within LLM-generated responses. However, this paradigm fundamentally shifts the auction object from discrete ad slots to the distribution over LLM outputs, posing new challenges for designing auction mechanisms. Existing mechanisms for LLM-native advertising adopt frameworks that decouple auction and generation, which either ignore externalities or require multiple LLM inferences for ad allocation, rendering them impractical for industrial scenarios. To address these challenges, we propose LLM-Auction, which to the best of our knowledge is the first learning-based generative auction mechanism that integrates auction and LLM generation for LLM-native advertising. By formulating the allocation optimization as a preference alignment problem between LLM outputs and the mechanism's objective which reflects both advertisers' expected value and user experience, we introduce Iterative Reward-Preference Optimization (IRPO) algorithm that alternately optimizes the reward model and the LLM. This approach enables the LLM to inherently model allocation externalities without any extra inference cost. We further identify the allocation monotonicity and continuity of LLM-Auction, which allows us to prove that a simple first-price payment rule exhibits favorable incentive properties. Additionally, we design an LLM-as-a-judge simulation environment to facilitate large-scale data construction and enable comprehensive quantitative evaluation of the mechanism's performance. Extensive quantitative and qualitative experiments demonstrate that LLM-Auction significantly outperforms existing baselines in allocation efficiency, while achieving the desired mechanism properties.

Updated: 2025-12-11 11:31:20

标题: LLM拍卖:面向LLM原生广告的生成式拍卖

摘要: 大型语言模型(LLMs)的快速发展需要新颖的货币化策略,其中LLM本地广告已经成为一种有前途的范式,通过自然地将广告集成到LLM生成的响应中。然而,这种范式从离散广告槽转变为LLM输出的分布,对设计拍卖机制提出了新的挑战。现有的LLM本地广告机制采用将拍卖和生成分离的框架,这要么忽略外部性,要么需要多个LLM推断来进行广告分配,使其在工业场景中不切实际。为了解决这些挑战,我们提出了LLM-Auction,据我们所知,这是第一个基于学习的生成式拍卖机制,将拍卖和LLM生成整合在一起,用于LLM本地广告。通过将分配优化问题建模为LLM输出和机制目标之间的偏好对齐问题,反映了广告商预期价值和用户体验,我们引入了交替优化奖励模型和LLM的迭代奖励-偏好优化(IRPO)算法。这种方法使LLM能够本质上建模分配外部性,而不需要额外的推断成本。我们进一步确定了LLM-Auction的分配单调性和连续性,这使我们能够证明一个简单的一价付款规则具有有利的激励特性。此外,我们设计了一个LLM作为评判者的仿真环境,以促进大规模数据构建,并实现机制性能的全面定量评估。广泛的定量和定性实验证明,LLM-Auction在分配效率上明显优于现有基线,同时实现了期望的机制属性。

更新时间: 2025-12-11 11:31:20

领域: cs.GT,cs.AI,cs.LG

下载: http://arxiv.org/abs/2512.10551v1

Unlocking the Address Book: Dissecting the Sparse Semantic Structure of LLM Key-Value Caches via Sparse Autoencoders

The Key-Value (KV) cache is the primary memory bottleneck in long-context Large Language Models, yet it is typically treated as an opaque numerical tensor. In this work, we propose STA-Attention, a framework that utilizes Top-K Sparse Autoencoders (SAEs) to decompose the KV cache into interpretable "semantic atoms." Unlike standard $L_1$-regularized SAEs, our Top-K approach eliminates shrinkage bias, preserving the precise dot-product geometry required for attention. Our analysis uncovers a fundamental Key-Value Asymmetry: while Key vectors serve as highly sparse routers dominated by a "Semantic Elbow," deep Value vectors carry dense content payloads requiring a larger budget. Based on this structure, we introduce a Dual-Budget Strategy that selectively preserves the most informative semantic components while filtering representational noise. Experiments on Yi-6B, Mistral-7B, Qwen2.5-32B, and others show that our semantic reconstructions maintain perplexity and zero-shot performance comparable to the original models, effectively bridging the gap between mechanistic interpretability and faithful attention modeling.
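
A minimal Top-K SAE over cached Key/Value vectors: only the K largest latent activations survive, so the retained coefficients suffer no $L_1$ shrinkage. Sizes are illustrative; the two budgets below echo the dual-budget idea of a sparse Key code and a denser Value code:

    import torch
    import torch.nn as nn

    class TopKSAE(nn.Module):
        def __init__(self, d_model=128, d_dict=1024, k=16):
            super().__init__()
            self.k = k
            self.enc = nn.Linear(d_model, d_dict)
            self.dec = nn.Linear(d_dict, d_model, bias=False)

        def forward(self, x):                       # x: (batch, d_model) K or V rows
            z = torch.relu(self.enc(x))
            top = torch.topk(z, self.k, dim=-1)     # K strongest semantic atoms
            code = torch.zeros_like(z).scatter_(-1, top.indices, top.values)
            return self.dec(code), code             # reconstruction + sparse code

    key_sae = TopKSAE(k=8)       # sparse "router" budget for Keys
    value_sae = TopKSAE(k=64)    # denser content budget for Values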

Updated: 2025-12-11 11:23:50

标题: 解锁地址簿:通过稀疏自动编码器解剖LLM键值缓存的稀疏语义结构

摘要: 键-值(KV)缓存是长上下文大型语言模型中的主要内存瓶颈,但通常被视为不透明的数值张量。在这项工作中,我们提出了\textbf{STA-Attention},这是一个利用Top-K稀疏自编码器(SAEs)将KV缓存分解为可解释的“语义原子”的框架。与标准的$L_1$正则化SAEs不同,我们的Top-K方法消除了收缩偏差,保留了注意力所需的精确点积几何结构。我们的分析揭示了一个基本的\textbf{键-值不对称性}:虽然键向量作为由“语义弯头”主导的高度稀疏路由器,深度值向量携带密集内容有效载荷,需要更大的预算。基于这种结构,我们引入了一种双预算策略,选择性地保留最具信息量的语义组件,同时过滤代表性噪声。在Yi-6B、Mistral-7B、Qwen2.5-32B等实验中,我们的语义重构维持了与原始模型相当的困惑度和零-shot性能,有效地弥合了机械解释性和忠实注意力建模之间的差距。

更新时间: 2025-12-11 11:23:50

领域: cs.LG

下载: http://arxiv.org/abs/2512.10547v1

LibIQ: Toward Real-Time Spectrum Classification in O-RAN dApps

The O-RAN architecture is transforming cellular networks by adopting RAN softwarization and disaggregation concepts to enable data-driven monitoring and control of the network. Such management is enabled by RICs, which facilitate near-real-time and non-real-time network control through xApps and rApps. However, they face limitations, including latency overhead in data exchange between the RAN and RIC, restricting real-time monitoring, and the inability to access user plane data due to privacy and security constraints, hindering use cases like beamforming and spectrum classification. In this paper, we leverage the dApps concept to enable real-time RF spectrum classification with LibIQ, a novel library for RF signals that facilitates efficient spectrum monitoring and signal classification by providing functionalities to read I/Q samples as time-series, create datasets and visualize time-series data through plots and spectrograms. Thanks to LibIQ, I/Q samples can be efficiently processed to detect external RF signals, which are subsequently classified using a CNN inside the library. To achieve accurate spectrum analysis, we created an extensive dataset of time-series-based I/Q samples, representing distinct signal types captured using a custom dApp running on a 5G deployment over the Colosseum network emulator and an OTA testbed. We evaluate our model by deploying LibIQ in heterogeneous scenarios with varying center frequencies, time windows, and external RF signals. In real-time analysis, the model classifies the processed I/Q samples, achieving an average accuracy of approximately 97.8% in identifying signal types across all scenarios. We pledge to release both LibIQ and the dataset created as a publicly available framework upon acceptance.
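
A minimal sketch of the I/Q handling described above: interleaved samples are read as a complex time series and converted to a spectrogram for the CNN. The int16 interleaved file format and the frame sizes are assumptions for illustration, not LibIQ's actual API:

    import numpy as np

    def read_iq(path):
        raw = np.fromfile(path, dtype=np.int16).astype(np.float32)
        return raw[0::2] + 1j * raw[1::2]           # I + jQ complex time series

    def spectrogram(iq, n_fft=256, hop=128):
        frames = [iq[i:i + n_fft] * np.hanning(n_fft)
                  for i in range(0, len(iq) - n_fft, hop)]
        return np.abs(np.fft.fftshift(np.fft.fft(frames, axis=1), axes=1))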

Updated: 2025-12-11 11:18:01

标题: LibIQ:面向O-RAN dApps的实时频谱分类

摘要: O-RAN架构通过采用RAN软件化和分解概念,改变了蜂窝网络,从而实现了对网络的数据驱动监控和控制。这种管理是通过RICs实现的,它们通过xApps和rApps实现了近实时和非实时的网络控制。然而,它们面临一些限制,包括在RAN和RIC之间的数据交换中的延迟开销,限制了实时监控,并且由于隐私和安全约束,无法访问用户面数据,从而阻碍了诸如波束成形和频谱分类等用例。在本文中,我们利用dApps概念,借助LibIQ实现了实时RF频谱分类,这是一个用于RF信号的新型库,通过提供读取I/Q样本作为时间序列、创建数据集并通过绘图和频谱图可视化时间序列数据的功能,促进了高效的频谱监测和信号分类。由于LibIQ,I/Q样本可以被高效地处理以检测外部RF信号,然后使用库内的CNN对其进行分类。为了实现准确的频谱分析,我们在Colosseum网络仿真器和OTA测试平台上运行的自定义dApp上创建了基于时间序列的I/Q样本的大量数据集,代表不同信号类型。我们通过在具有不同中心频率、时间窗口和外部RF信号的异构场景中部署LibIQ来评估我们的模型。在实时分析中,该模型对处理的I/Q样本进行分类,实现了在所有场景中识别信号类型的平均准确率约为97.8%。我们承诺在接受后发布LibIQ和创建的数据集作为一个公开可用的框架。

更新时间: 2025-12-11 11:18:01

领域: cs.NI,cs.AI

下载: http://arxiv.org/abs/2505.10537v3

IRG: Modular Synthetic Relational Database Generation with Complex Relational Schemas

Relational databases (RDBs) are widely used by corporations and governments to store multiple related tables. Their relational schemas pose unique challenges to synthetic data generation for privacy-preserving data sharing, e.g., for collaborative analytical and data mining tasks, as well as software testing at various scales. Relational schemas typically include a set of primary and foreign key constraints to specify the intra- and inter-table entity relations, which also imply crucial intra- and inter-table data correlations in the RDBs. Existing synthetic RDB generation approaches often focus on the relatively simple and basic parent-child relations, failing to address ubiquitous real-world complexities in relational schemas: key constraints such as composite keys, intra-table correlations such as sequential correlation, and inter-table correlations such as indirectly connected tables. In this paper, we introduce the incremental relational generator (IRG), a modular framework designed to handle these real-world challenges. In IRG, each table is generated by learning context from a depth-first traversal of relational connections to capture indirect inter-table relationships, and different parts of a table are constructed through several classical generative and predictive modules to preserve complex key constraints and data correlations. Compared to 3 prior-art algorithms across 10 real-world RDB datasets, IRG successfully handles the relational schemas and captures critical data relationships for all datasets, which prior works cannot. The generated synthetic data also demonstrates better fidelity and utility than prior works, implying greater potential as a basis for analytical tasks and data mining applications. Code is available at: https://github.com/li-jiayu-ljy/irg.

Updated: 2025-12-11 11:16:48

标题: IRG:具有复杂关系模式的模块化合成关系数据库生成

摘要: 关系数据库(RDBs)被广泛应用于企业和政府,用于存储多个相关表。它们的关系模式对于隐私保护数据共享的合成数据生成提出了独特挑战,例如用于协作分析和数据挖掘任务,以及各种规模的软件测试。关系模式通常包括一组主键和外键约束,用于指定表内和表间实体关系,这也意味着RDBs中关键的表内和表间数据相关性。现有的合成RDB生成方法通常侧重于相对简单和基本的父子关系,未能解决关系模式中的普遍现实世界复杂性,如复合键、表内相关性(如顺序相关性)和表间数据相关性(如间接连接表)。在本文中,我们介绍了增量关系生成器(IRG),这是一个设计用于处理这些现实世界挑战的模块化框架。在IRG中,通过学习深度优先遍历关系连接来生成每个表,以捕获间接的表间关系,并通过几个经典的生成和预测模块构建表的不同部分,以保留复杂的键约束和数据相关性。与10个真实世界的RDB数据集上的3种现有算法相比,IRG成功处理了关系模式并捕获了所有数据集的关键数据关系,而以前的工作则无法做到。生成的合成数据还表现出比以前的工作更好的忠实度和实用性,暗示其作为替代基础用于分析任务和数据挖掘应用的潜力更高。代码可在以下网址找到:https://github.com/li-jiayu-ljy/irg。

更新时间: 2025-12-11 11:16:48

领域: cs.DB,cs.LG

下载: http://arxiv.org/abs/2312.15187v3

Achieving Olympia-Level Geometry Large Language Model Agent via Complexity Boosting Reinforcement Learning

Large language model (LLM) agents exhibit strong mathematical problem-solving abilities and can even solve International Mathematical Olympiad (IMO) level problems with the assistance of formal proof systems. However, due to weak heuristics for auxiliary constructions, AI for geometry problem solving remains dominated by expert models such as AlphaGeometry 2, which rely heavily on large-scale data synthesis and search for both training and evaluation. In this work, we make the first attempt to build a medalist-level LLM agent for geometry and present InternGeometry. InternGeometry overcomes the heuristic limitations in geometry by iteratively proposing propositions and auxiliary constructions, verifying them with a symbolic engine, and reflecting on the engine's feedback to guide subsequent proposals. A dynamic memory mechanism enables InternGeometry to conduct more than two hundred interactions with the symbolic engine per problem. To further accelerate learning, we introduce Complexity-Boosting Reinforcement Learning (CBRL), which gradually increases the complexity of synthesized problems across training stages. Built on InternThinker-32B, InternGeometry solves 44 of 50 IMO geometry problems (2000-2024), exceeding the average gold medalist score (40.9), using only 13K training examples, just 0.004% of the data used by AlphaGeometry 2, demonstrating the potential of LLM agents on expert-level geometry tasks. InternGeometry can also propose novel auxiliary constructions for IMO problems that do not appear in human solutions. We will release the model, data, and symbolic engine to support future research.

Updated: 2025-12-11 11:05:04

标题: 通过复杂度提升强化学习实现奥林匹亚级几何大型语言模型代理

摘要: 大型语言模型(LLM)代理展示了强大的数学问题解决能力,甚至可以在形式证明系统的帮助下解决国际数学奥林匹克(IMO)水平的问题。然而,由于辅助构造的启发式较弱,人工智能在几何问题解决方面仍然被专家模型(如AlphaGeometry 2)所主导,这些模型在训练和评估中都严重依赖大规模数据合成和搜索。在这项工作中,我们首次尝试构建一个几何奖牌级别的LLM代理并呈现InternGeometry。InternGeometry通过迭代提出命题和辅助构造,用符号引擎验证它们,并反思引擎的反馈以指导后续提议,克服了几何中的启发式限制。一种动态记忆机制使InternGeometry能够在每个问题上与符号引擎进行超过两百次的互动。为了进一步加速学习,我们引入了复杂性增强强化学习(CBRL),逐渐增加了在训练阶段合成问题的复杂性。建立在InternThinker-32B基础上,InternGeometry解决了50个IMO几何问题(2000-2024年)中的44个,超过了平均金牌得分(40.9),仅使用了13K个训练示例,仅为AlphaGeometry 2使用数据的0.004%,展示了LLM代理在专家级几何任务上的潜力。InternGeometry还可以为IMO问题提出新颖的辅助构造,这些构造在人类解决方案中并不存在。我们将发布模型、数据和符号引擎以支持未来研究。

更新时间: 2025-12-11 11:05:04

领域: cs.AI

下载: http://arxiv.org/abs/2512.10534v1

LLMs in Interpreting Legal Documents

This chapter explores the application of Large Language Models in the legal domain, showcasing their potential to optimise and augment traditional legal tasks by analysing possible use cases, such as assisting in interpreting statutes, contracts, and case law, enhancing clarity in legal summarisation, contract negotiation, and information retrieval. There are several challenges that can arise from the application of such technologies, such as algorithmic monoculture, hallucinations, and compliance with existing regulations, including the EU's AI Act and recent U.S. initiatives, alongside the emerging approaches in China. Furthermore, two different benchmarks are presented.

Updated: 2025-12-11 11:01:35

标题: LLMs在解释法律文件中的应用

摘要: 本章探讨了大型语言模型在法律领域的应用,展示了它们通过分析可能的用例,如协助解释法规、合同和案例法,增强法律摘要、合同谈判和信息检索的清晰度,优化和增强传统法律任务的潜力。应用这种技术可能会面临一些挑战,如算法单一文化、幻觉和遵守现有法规,包括欧盟的AI法案和最近美国的倡议,以及中国新兴的方法。此外,文中还介绍了两种不同的基准测试。

更新时间: 2025-12-11 11:01:35

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2512.09830v2

Continuous Vision-Language-Action Co-Learning with Semantic-Physical Alignment for Behavioral Cloning

Language-conditioned manipulation facilitates human-robot interaction via behavioral cloning (BC), which learns control policies from human demonstrations and serves as a cornerstone of embodied AI. Overcoming compounding errors in sequential action decisions remains a central challenge to improving BC performance. Existing approaches mitigate compounding errors through data augmentation, expressive representation, or temporal abstraction. However, they suffer from physical discontinuities and semantic-physical misalignment, leading to inaccurate action cloning and intermittent execution. In this paper, we present Continuous vision-language-action Co-Learning with Semantic-Physical Alignment (CCoL), a novel BC framework that ensures temporally consistent execution and fine-grained semantic grounding. It generates robust and smooth action execution trajectories through continuous co-learning across vision, language, and proprioceptive inputs (e.g., robot internal states). Meanwhile, we anchor language semantics to visuomotor representations by a bidirectional cross-attention to learn contextual information for action generation, successfully overcoming the problem of semantic-physical misalignment. Extensive experiments show that CCoL achieves an average 8.0% relative improvement across three simulation suites, with up to 19.2% relative gain in human-demonstrated bimanual insertion tasks. Real-world tests on a 7-DoF robot further confirm CCoL's generalization under unseen and noisy object states.

Updated: 2025-12-11 10:52:27

标题: 连续视觉-语言-动作协同学习,通过语义-物理对齐进行行为克隆

摘要: 语言条件操纵通过行为克隆(BC)促进了人机交互,BC从人类示范中学习控制策略,是具有身体的人工智能的基石。克服连续动作决策中的复合错误仍然是改善BC性能的一个核心挑战。现有方法通过数据增强、表达形式或时间抽象来减轻复合错误,然而它们受到物理不连续性和语义-物理错位的影响,导致动作克隆不准确和间歇执行。本文提出了一种新颖的BC框架Continuous vision-language-action Co-Learning with Semantic-Physical Alignment(CCoL),确保时间上的一致执行和精细的语义基础。它通过连续的视觉、语言和本体感输入(例如机器人内部状态)之间的持续共同学习生成强大而平滑的动作执行轨迹。与此同时,我们通过双向交叉注意力将语言语义锚定到视觉运动表示中,学习动作生成的上下文信息,成功地克服了语义-物理错位问题。大量实验表明,CCoL在三个模拟套件中实现了平均8.0%的相对改进,双手示范的双手插入任务相对增益高达19.2%。在一个7自由度机器人上的真实世界测试进一步证实了CCoL在未见过和嘈杂的物体状态下的泛化能力。

更新时间: 2025-12-11 10:52:27

领域: cs.RO,cs.AI,cs.CV

下载: http://arxiv.org/abs/2511.14396v2

Mode-Seeking for Inverse Problems with Diffusion Models

A pre-trained unconditional diffusion model, combined with posterior sampling or maximum a posteriori (MAP) estimation techniques, can solve arbitrary inverse problems without task-specific training or fine-tuning. However, existing posterior sampling and MAP estimation methods often rely on modeling approximations and can be computationally demanding. In this work, we propose the variational mode-seeking loss (VML), which, when minimized during each reverse diffusion step, guides the generated sample towards the MAP estimate. VML arises from a novel perspective of minimizing the Kullback-Leibler (KL) divergence between the diffusion posterior $p(\mathbf{x}_0|\mathbf{x}_t)$ and the measurement posterior $p(\mathbf{x}_0|\mathbf{y})$, where $\mathbf{y}$ denotes the measurement. Importantly, for linear inverse problems, VML can be analytically derived and need not be approximated. Based on further theoretical insights, we propose VML-MAP, an empirically effective algorithm for solving inverse problems, and validate its efficacy over existing methods in both performance and computational time, through extensive experiments on diverse image-restoration tasks across multiple datasets.
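
To see why this KL admits a closed form for linear problems, one can expand the measurement posterior with Bayes' rule. The following is a sketch in the abstract's notation; the Gaussian likelihood is the usual linear-inverse-problem assumption $\mathbf{y} = \mathbf{A}\mathbf{x}_0 + \text{noise}$, not a detail stated in the abstract itself:

    \mathrm{VML}(\mathbf{x}_t)
      = \mathrm{KL}\!\left( p(\mathbf{x}_0 \mid \mathbf{x}_t) \,\|\, p(\mathbf{x}_0 \mid \mathbf{y}) \right)
      = \mathbb{E}_{p(\mathbf{x}_0 \mid \mathbf{x}_t)}\!\left[
          \log p(\mathbf{x}_0 \mid \mathbf{x}_t)
          - \log p(\mathbf{x}_0)
          - \log p(\mathbf{y} \mid \mathbf{x}_0) \right] + \log p(\mathbf{y})

where $p(\mathbf{y} \mid \mathbf{x}_0) = \mathcal{N}(\mathbf{y}; \mathbf{A}\mathbf{x}_0, \sigma^2 \mathbf{I})$, so the likelihood term is an explicit quadratic in $\mathbf{x}_0$ and the constant $\log p(\mathbf{y})$ drops out of the minimization.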

Updated: 2025-12-11 10:51:34

标题: 使用扩散模型进行逆问题的模式寻找

摘要: 一个预先训练的无条件扩散模型,结合后验采样或最大后验(MAP)估计技术,可以解决任意逆问题,无需特定任务的训练或微调。然而,现有的后验采样和MAP估计方法通常依赖于建模近似,并且可能需要大量计算。在这项工作中,我们提出了变分模式寻找损失(VML),在每个反向扩散步骤中最小化该损失,可以将生成的样本引导到MAP估计。VML源于一种新颖的视角,即最小化扩散后验$p(\mathbf{x}_0|\mathbf{x}_t)$与测量后验$p(\mathbf{x}_0|\mathbf{y})$之间的Kullback-Leibler(KL)散度,其中$\mathbf{y}$表示测量。重要的是,对于线性逆问题,VML可以进行解析推导,无需近似。基于进一步的理论洞察,我们提出了VML-MAP,这是一个在解决逆问题方面具有实证效果的算法,并通过对多个数据集上不同图像恢复任务的广泛实验,验证了其在性能和计算时间上的有效性,超过了现有方法。

更新时间: 2025-12-11 10:51:34

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2512.10524v1

Disentangled and Distilled Encoder for Out-of-Distribution Reasoning with Rademacher Guarantees

Recently, the disentangled latent space of a variational autoencoder (VAE) has been used to reason about multi-label out-of-distribution (OOD) test samples that are derived from different distributions than training samples. Disentangled latent space means having one-to-many maps between latent dimensions and generative factors or important characteristics of an image. This paper proposes a disentangled distilled encoder (DDE) framework to decrease the OOD reasoner size for deployment on resource-constrained devices while preserving disentanglement. DDE formalizes student-teacher distillation for model compression as a constrained optimization problem while preserving disentanglement with disentanglement constraints. Theoretical guarantees for disentanglement during distillation based on Rademacher complexity are established. The approach is evaluated empirically by deploying the compressed model on an NVIDIA

Updated: 2025-12-11 10:47:38

标题: 解耦蒸馏编码器:具有Rademacher保证的分布外推理

摘要: 最近,变分自动编码器(VAE)的解耦潜在空间已被用于推理多标签分布外(OOD)测试样本,这些样本来自与训练样本不同的分布。解耦的潜在空间意味着潜在维度与图像的生成因子或重要特征之间存在一对多的映射。本文提出了一种解耦蒸馏编码器(DDE)框架,在保持解耦性的同时减小OOD推理器的规模,以便部署在资源受限的设备上。DDE将用于模型压缩的师生蒸馏形式化为带有解耦约束的约束优化问题,从而在蒸馏过程中保持解耦性。基于Rademacher复杂度,建立了蒸馏过程中解耦性的理论保证。该方法通过在NVIDIA上部署压缩模型进行了实证评估。

更新时间: 2025-12-11 10:47:38

领域: cs.LG

下载: http://arxiv.org/abs/2512.10522v1

Beyond Log-Concavity and Score Regularity: Improved Convergence Bounds for Score-Based Generative Models in W2-distance

Score-based Generative Models (SGMs) aim to sample from a target distribution by learning score functions using samples perturbed by Gaussian noise. Existing convergence bounds for SGMs in the W2-distance rely on stringent assumptions about the data distribution. In this work, we present a novel framework for analyzing W2-convergence in SGMs, significantly relaxing traditional assumptions such as log-concavity and score regularity. Leveraging the regularization properties of the Ornstein--Uhlenbeck (OU) process, we show that weak log-concavity of the data distribution evolves into log-concavity over time. This transition is rigorously quantified through a PDE-based analysis of the Hamilton--Jacobi--Bellman equation governing the log-density of the forward process. Moreover, we establish that the drift of the time-reversed OU process alternates between contractive and non-contractive regimes, reflecting the dynamics of concavity. Our approach circumvents the need for stringent regularity conditions on the score function and its estimators, relying instead on milder, more practical assumptions. We demonstrate the wide applicability of this framework through explicit computations on Gaussian mixture models, illustrating its versatility and potential for broader classes of data distributions.

Updated: 2025-12-11 10:42:29

标题: 超越对数凹性和分数正则性:改进W2距离下基于分数的生成模型的收敛界限

摘要: 得分基于生成模型(SGMs)旨在通过学习使用受高斯噪声扰动的样本的得分函数来从目标分布中进行采样。现有的针对W2距离中SGMs的收敛性界限依赖于对数据分布的严格假设。在本工作中,我们提出了一个新颖的框架,用于分析SGMs中的W2收敛性,显著放宽了传统假设,如对数凹性和得分正则性。利用Ornstein-Uhlenbeck(OU)过程的正则化特性,我们表明数据分布的弱对数凹性随时间演变为对数凹性。通过对控制前向过程的对数密度的Hamilton-Jacobi-Bellman方程进行基于PDE的分析,这种转变得到了严格的量化。此外,我们建立了时间反转的OU过程的漂移在收缩和非收缩状态之间交替变化,反映了凹性的动态。我们的方法避免了对得分函数及其估计量进行严格的正则性条件,而是依赖于更温和、更实用的假设。通过对高斯混合模型进行明确计算,我们展示了这一框架的广泛适用性,展示了其灵活性和适用于更广泛数据分布类别的潜力。

更新时间: 2025-12-11 10:42:29

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2501.02298v5

Adaptive Replay Buffer for Offline-to-Online Reinforcement Learning

Offline-to-Online Reinforcement Learning (O2O RL) faces a critical dilemma in balancing the use of a fixed offline dataset with newly collected online experiences. Standard methods, often relying on a fixed data-mixing ratio, struggle to manage the trade-off between early learning stability and asymptotic performance. To overcome this, we introduce the Adaptive Replay Buffer (ARB), a novel approach that dynamically prioritizes data sampling based on a lightweight metric we call 'on-policyness'. Unlike prior methods that rely on complex learning procedures or fixed ratios, ARB is designed to be learning-free and simple to implement, seamlessly integrating into existing O2O RL algorithms. It assesses how closely collected trajectories align with the current policy's behavior and assigns a proportional sampling weight to each transition within that trajectory. This strategy effectively leverages offline data for initial stability while progressively focusing learning on the most relevant, high-rewarding online experiences. Our extensive experiments on D4RL benchmarks demonstrate that ARB consistently mitigates early performance degradation and significantly improves the final performance of various O2O RL algorithms, highlighting the importance of an adaptive, behavior-aware replay buffer design.
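
A sketch of a behavior-aware replay weight in the spirit of 'on-policyness': each trajectory is scored by how likely the current policy is to reproduce its actions, and every transition inherits a proportional sampling weight. The exact metric in ARB may differ; this is an illustrative stand-in:

    import numpy as np

    def transition_weights(trajectories, policy_logprob, temp=1.0):
        # trajectories: list of trajectories, each a list of (state, action) pairs;
        # policy_logprob(s, a) is the current policy's log-likelihood of action a.
        scores = np.array([
            np.mean([policy_logprob(s, a) for s, a in traj]) / temp
            for traj in trajectories])
        w = np.exp(scores - scores.max())            # softmax over trajectories
        w /= w.sum()
        # every transition inherits its trajectory's weight
        return [wi for wi, traj in zip(w, trajectories) for _ in traj]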

Updated: 2025-12-11 10:30:04

标题: 自适应重播缓冲区用于离线到在线强化学习

摘要: 离线到在线强化学习(O2O RL)面临着一个关键困境,即如何平衡使用固定的离线数据集和新收集的在线经验。标准方法通常依赖于固定的数据混合比率,很难在早期学习稳定性和渐近性能之间取得平衡。为了克服这一困境,我们引入了自适应回放缓冲区(ARB),这是一种基于我们称之为“在线策略性”的轻量级度量标准动态地优先考虑数据采样的新方法。与依赖复杂学习程序或固定比率的先前方法不同,ARB旨在无需学习并且易于实现,可以无缝地集成到现有的O2O RL算法中。它评估了收集的轨迹与当前策略行为的契合程度,并为每个轨迹中的每个转换分配一个比例采样权重。这一策略有效地利用离线数据实现了初始稳定性,同时逐渐将学习重点放在最相关且高奖励的在线经验上。我们在D4RL基准测试上进行的大量实验证明,ARB能够持续减轻早期性能下降,并显著提高各种O2O RL算法的最终性能,突显了自适应、行为感知回放缓冲区设计的重要性。

更新时间: 2025-12-11 10:30:04

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2512.10510v1

Hyperspectral Image Data Reduction for Endmember Extraction

Endmember extraction from hyperspectral images aims to identify the spectral signatures of materials present in a scene. Recent studies have shown that self-dictionary methods can achieve high extraction accuracy; however, their high computational cost limits their applicability to large-scale hyperspectral images. Although several approaches have been proposed to mitigate this issue, it remains a major challenge. Motivated by this situation, this paper pursues a data reduction approach. Assuming that the hyperspectral image follows the linear mixing model with the pure-pixel assumption, we develop a data reduction technique that removes pixels that do not contain endmembers. We analyze the theoretical properties of this reduction step and show that it preserves pixels that lie close to the endmembers. Building on this result, we propose a data-reduced self-dictionary method that integrates the data reduction with a self-dictionary method based on a linear programming formulation. Numerical experiments demonstrate that the proposed method can substantially reduce the computational time of the original self-dictionary method without sacrificing endmember extraction accuracy.
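
One way to picture such a reduction step (an illustrative screening rule, not the paper's exact criterion): under the pure-pixel assumption the endmembers are vertices of the convex hull of the pixel cloud, and a vertex maximizes some linear functional, so pixels that are never near-extremal along many random directions are unlikely to be endmembers and can be dropped before the LP-based self-dictionary step:

    import numpy as np

    def screen_pixels(X, n_dirs=500, top=5, seed=0):
        # X: (n_pixels, n_bands), one spectrum per row
        rng = np.random.default_rng(seed)
        keep = set()
        for _ in range(n_dirs):
            direction = rng.standard_normal(X.shape[1])
            proj = X @ direction
            keep.update(np.argsort(proj)[-top:].tolist())   # extremal pixels survive
        return np.array(sorted(keep))                       # candidate endmember pixels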

Updated: 2025-12-11 10:27:40

标题: 高光谱图像数据降维用于端元提取

摘要: 高光谱图像的端元提取旨在识别场景中存在的材料的光谱特征。最近的研究表明,自字典方法可以实现较高的提取准确性; 但是,它们的高计算成本限制了它们对大规模高光谱图像的适用性。尽管已经提出了几种方法来缓解这个问题,但它仍然是一个重大挑战。受到这种情况的启发,本文采用了一种数据减少的方法。假设高光谱图像遵循线性混合模型并具有纯像素假设,我们开发了一种数据减少技术,用于移除不包含端元的像素。我们分析了这种减少步骤的理论特性,并展示了它保留了接近端元的像素。基于这一结果,我们提出了一种数据减少的自字典方法,将数据减少与基于线性规划公式的自字典方法相结合。数值实验表明,所提出的方法可以显著减少原始自字典方法的计算时间,而不损害端元提取的准确性。

更新时间: 2025-12-11 10:27:40

领域: eess.IV,cs.LG,eess.SP

下载: http://arxiv.org/abs/2512.10506v1

V-VAE: A Variational Auto Encoding Framework Towards Fine-Grained Control over Human-Like Chat

With the continued proliferation of Large Language Model (LLM) based chatbots, there is a growing demand for generating responses that are not only linguistically fluent but also consistently aligned with persona-specific traits in conversations. However, existing role-play and persona-based chat approaches rely heavily on static role descriptions, a coarse-grained signal space, and low-quality synthetic data, which fail to capture dynamic fine-grained details in human-like chat. Human-like chat requires modeling subtle latent traits, such as emotional tone, situational awareness, and evolving personality, which are difficult to predefine and cannot be easily learned from synthetic or distillation-based data. To address these limitations, we propose a Verbal Variational Auto-Encoding (V-VAE) framework, containing a variational auto-encoding module and a fine-grained control space that dynamically adapts dialogue behaviour based on fine-grained, interpretable latent variables across talking style, interaction patterns, and personal attributes. We also construct a high-quality dataset, HumanChatData, and a benchmark, HumanChatBench, to address the scarcity of high-quality data in the human-like domain. Experiments show that LLMs based on V-VAE consistently outperform standard baselines on HumanChatBench and DialogBench, which further demonstrates the effectiveness of V-VAE and HumanChatData.

Updated: 2025-12-11 10:22:55

标题: V-VAE:一种变分自动编码框架,实现对类人聊天的精细控制

摘要: 随着基于大型语言模型(LLM)的聊天机器人的不断增加,人们越来越需要生成不仅在语言流畅性上表现出色,而且在对话中也与特定人物特征一致的响应。然而,现有的角色扮演和基于人物的聊天方法主要依赖于静态角色描述、粗粒度信号空间和低质量的合成数据,这些方法无法捕捉类似于人类对话中的动态细节。人类对话需要对微妙的潜在特征进行建模,如情感色调、情景意识和不断变化的个性,这些特征难以预定义,也无法轻松从合成或蒸馏数据中学习。为了解决这些限制,我们提出了一种语言变分自动编码(V-VAE)框架,包含一个变分自动编码模块和细粒度控制空间,根据细粒度、可解释的潜在变量在话语风格、互动模式和个人属性上动态调整对话行为。我们还构建了一个高质量的数据集HumanChatData,并使用HumanChatBench进行基准测试,以解决人类对话领域高质量数据的稀缺问题。实验证明,基于V-VAE的LLM在HumanChatBench和DialogBench上始终优于标准基线,进一步证明了V-VAE和HumanChatData的有效性。

更新时间: 2025-12-11 10:22:55

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2506.01524v2

Zero-shot 3D Map Generation with LLM Agents: A Dual-Agent Architecture for Procedural Content Generation

Procedural Content Generation (PCG) offers scalable methods for algorithmically creating complex, customizable worlds. However, controlling these pipelines requires the precise configuration of opaque technical parameters. We propose a training-free architecture that utilizes LLM agents for zero-shot PCG parameter configuration. While Large Language Models (LLMs) promise a natural language interface for PCG tools, off-the-shelf models often fail to bridge the semantic gap between abstract user instructions and strict parameter specifications. Our system pairs an Actor agent with a Critic agent, enabling an iterative workflow where the system autonomously reasons over tool parameters and refines configurations to progressively align with human design preferences. We validate this approach on the generation of various 3D maps, establishing a new benchmark for instruction-following in PCG. Experiments demonstrate that our approach outperforms single-agent baselines, producing diverse and structurally valid environments from natural language descriptions. These results demonstrate that off-the-shelf LLMs can be effectively repurposed as generalized agents for arbitrary PCG tools. By shifting the burden from model training to architectural reasoning, our method offers a scalable framework for mastering complex software without task-specific fine-tuning.
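
The Actor/Critic loop is straightforward to express; the sketch below assumes a hypothetical llm() chat-completion callable and a JSON parameter schema, neither of which is specified by the paper:

    import json

    def configure_pcg(llm, instruction, schema, rounds=3):
        """llm: any callable prompt -> text (hypothetical interface).
        schema: dict describing the PCG tool's parameters."""
        # Actor: propose an initial configuration from the user request.
        params = json.loads(llm(
            f"User wants: {instruction}\nPCG parameter schema: {json.dumps(schema)}\n"
            "Return a JSON object assigning a value to every parameter."))
        for _ in range(rounds):
            # Critic: compare the proposal against the stated intent.
            critique = llm(
                f"User wants: {instruction}\nProposed parameters: {json.dumps(params)}\n"
                "List concrete mismatches with the request, or reply exactly OK.")
            if critique.strip() == "OK":
                break
            # Actor: revise the configuration using the critique.
            params = json.loads(llm(
                f"Revise {json.dumps(params)} to address: {critique}\nReturn JSON only."))
        return params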

Updated: 2025-12-11 10:22:02

标题: 无需训练的LLM代理生成3D地图:一种用于程序内容生成的双代理架构

摘要: 程序内容生成(PCG)提供了可扩展的方法,用于通过算法创建复杂且可定制的世界。然而,控制这些流程需要精确配置不透明的技术参数。我们提出了一种无需训练的架构,利用LLM代理进行零射击PCG参数配置。尽管大型语言模型(LLMs)承诺为PCG工具提供自然语言界面,但现成的模型通常无法弥合抽象用户指令和严格参数规格之间的语义差距。我们的系统配对了一个演员代理和一个评论代理,实现了一个迭代工作流程,其中系统自主地对工具参数进行推理,并不断优化配置,以逐渐与人类设计偏好相一致。我们在生成各种3D地图方面验证了这种方法,为PCG中的指令遵循建立了一个新的基准。实验表明,我们的方法优于单一代理基线,能够从自然语言描述中产生多样且结构有效的环境。这些结果表明,现成的LLMs可以有效地重新用作任意PCG工具的通用代理。通过将负担从模型训练转移到架构推理,我们的方法为掌握复杂软件提供了可扩展的框架,而无需特定任务的微调。

更新时间: 2025-12-11 10:22:02

领域: cs.AI

下载: http://arxiv.org/abs/2512.10501v1

Compliant Residual DAgger: Improving Real-World Contact-Rich Manipulation with Human Corrections

We address key challenges in Dataset Aggregation (DAgger) for real-world contact-rich manipulation: how to collect informative human correction data and how to effectively update policies with this new data. We introduce Compliant Residual DAgger (CR-DAgger), which contains two novel components: 1) a Compliant Intervention Interface that leverages compliance control, allowing humans to provide gentle, accurate delta action corrections without interrupting the ongoing robot policy execution; and 2) a Compliant Residual Policy formulation that learns from human corrections while incorporating force feedback and force control. Our system significantly enhances performance on precise contact-rich manipulation tasks using minimal correction data, improving base policy success rates by over 50% on two challenging tasks (book flipping and belt assembly) while outperforming both retraining-from-scratch and finetuning approaches. Through extensive real-world experiments, we provide practical guidance for implementing effective DAgger in real-world robot learning tasks. Result videos are available at: https://compliant-residual-dagger.github.io/
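
As a rough illustration of the residual-policy idea (names, dimensions, and the bounded-correction design are assumptions, not the paper's architecture):

    import torch
    import torch.nn as nn

    class CompliantResidualPolicy(nn.Module):
        """Predicts a small delta action from observation plus force feedback;
        the executed command is base action + residual."""
        def __init__(self, obs_dim, force_dim, act_dim, hidden=128, max_delta=0.05):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim + force_dim + act_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, act_dim), nn.Tanh())
            self.max_delta = max_delta  # keeps corrections gentle, as in the interface

        def forward(self, obs, force, base_action):
            x = torch.cat([obs, force, base_action], dim=-1)
            return base_action + self.max_delta * self.net(x)  # bounded correction

Training would fit the residual network to the human delta corrections collected through the compliant interface, while the base policy stays frozen.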

Updated: 2025-12-11 10:15:43

标题: 《顺应性残余DAgger:通过人类纠正改进真实世界中接触丰富的操作》

摘要: 我们解决了实际接触丰富操作中数据集聚合(DAgger)面临的关键挑战:如何收集信息丰富的人类校正数据以及如何有效地利用这些新数据更新策略。我们引入了一种名为CR-DAgger的合规残差DAgger,它包含两个新颖的组件:1)一种合规干预界面,利用合规控制,允许人类提供温和、准确的增量动作校正,而不会中断正在进行的机器人策略执行;和2)一种合规残差策略制定,从人类校正中学习,并结合力反馈和力控制。我们的系统通过使用最少的校正数据显著提升了精确的接触丰富操作任务的性能,在两个具有挑战性的任务(翻书和皮带组装)上,将基础策略成功率提高了50\%以上,同时优于从头开始重新训练和微调方法。通过大量的实际实验,我们提供了实施实际机器人学习任务中有效DAgger的实用指导。相关结果视频可在以下网址查看:https://compliant-residual-dagger.github.io/

更新时间: 2025-12-11 10:15:43

领域: cs.RO,cs.LG

下载: http://arxiv.org/abs/2506.16685v4

UACER: An Uncertainty-Aware Critic Ensemble Framework for Robust Adversarial Reinforcement Learning

Robust adversarial reinforcement learning has emerged as an effective paradigm for training agents to handle uncertain disturbance in real environments, with critical applications in sequential decision-making domains such as autonomous driving and robotic control. Within this paradigm, agent training is typically formulated as a zero-sum Markov game between a protagonist and an adversary to enhance policy robustness. However, the trainable nature of the adversary inevitably induces non-stationarity in the learning dynamics, leading to exacerbated training instability and convergence difficulties, particularly in high-dimensional complex environments. In this paper, we propose a novel approach, Uncertainty-Aware Critic Ensemble for robust adversarial Reinforcement learning (UACER), which consists of two strategies: 1) Diversified critic ensemble: a diverse set of K critic networks is used in parallel, in place of the conventional single-critic architecture, to stabilize Q-value estimation, reducing variance and enhancing robustness. 2) Time-varying Decay Uncertainty (TDU) mechanism: advancing beyond simple linear combinations, we develop a variance-derived Q-value aggregation strategy that explicitly incorporates epistemic uncertainty to dynamically regulate the exploration-exploitation trade-off while simultaneously stabilizing the training process. Comprehensive experiments across several MuJoCo control problems validate the superior effectiveness of UACER, outperforming state-of-the-art methods in terms of overall performance, stability, and efficiency.
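
One plausible reading of the variance-derived aggregation with time-varying decay, sketched for a single state-action pair (the linear decay schedule is an assumption):

    import numpy as np

    def tdu_q_value(q_estimates, step, total_steps, beta0=1.0):
        """q_estimates: array of shape (K,) from K critics for one (s, a).
        The spread across critics serves as an epistemic-uncertainty proxy;
        its weight decays over training, encouraging pessimism early and
        near-plain averaging late."""
        beta = beta0 * (1.0 - step / total_steps)   # assumed linear decay
        return q_estimates.mean() - beta * q_estimates.std()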

Updated: 2025-12-11 10:14:13

标题: UACER:一种用于稳健对抗强化学习的不确定性感知评论家集成框架

摘要: Robust adversarial reinforcement learning has emerged as an effective method for training agents to deal with uncertain disturbances in real-world environments, particularly in areas such as autonomous driving and robotic control. However, the trainable nature of the adversary can lead to non-stationarity in learning dynamics, causing training instability and convergence issues, especially in complex environments. In this paper, we introduce a novel approach called Uncertainty-Aware Critic Ensemble for robust adversarial Reinforcement learning (UACER). This approach utilizes a diverse set of critic networks in parallel to stabilize Q-value estimation and incorporates a Time-varying Decay Uncertainty (TDU) mechanism to dynamically regulate exploration-exploitation trade-off. Experiments conducted on various MuJoCo control problems demonstrate the superior effectiveness of UACER compared to existing methods in terms of performance, stability, and efficiency.

更新时间: 2025-12-11 10:14:13

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2512.10492v1

LLM-Assisted AHP for Explainable Cyber Range Evaluation

Cyber Ranges (CRs) have emerged as prominent platforms for cybersecurity training and education, especially for Critical Infrastructure (CI) sectors that face rising cyber threats. One way to address these threats is through hands-on exercises that bridge IT and OT domains to improve defensive readiness. However, consistently evaluating whether a CR platform is suitable and effective remains a challenge. This paper proposes an evaluation framework for CRs, emphasizing mission-critical settings by using a multi-criteria decision-making approach. We define a set of evaluation criteria that capture technical fidelity, training and assessment capabilities, scalability, usability, and other relevant factors. To weight and aggregate these criteria, we employ the Analytic Hierarchy Process (AHP), supported by a simulated panel of multidisciplinary experts implemented through a Large Language Model (LLM). This LLM-assisted expert reasoning enables consistent and reproducible pairwise comparisons across criteria without requiring direct expert convening. The framework outputs quantitative scores that facilitate objective comparison of CR platforms and highlight areas for improvement. Overall, this work lays the foundation for a standardized and explainable evaluation methodology to guide both providers and end-users of CRs.
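
For reference, the AHP step itself is compact: given a pairwise-comparison matrix (whether elicited from humans or from LLM-simulated experts), weights follow from the row geometric means, and a consistency ratio flags contradictory judgments. A minimal sketch:

    import numpy as np

    RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41}

    def ahp_weights(P):
        """P: n x n positive reciprocal pairwise-comparison matrix
        (P[i, j] = how much more important criterion i is than j)."""
        P = np.asarray(P, dtype=float)
        n = P.shape[0]
        w = np.prod(P, axis=1) ** (1.0 / n)     # row geometric means
        w /= w.sum()
        lam_max = (P @ w / w).mean()            # principal-eigenvalue estimate
        ci = (lam_max - n) / (n - 1)            # consistency index
        cr = ci / RI[n]                         # consistency ratio (< 0.1 is acceptable)
        return w, cr

    # Illustrative example: fidelity judged twice as important as usability, etc.
    w, cr = ahp_weights([[1, 2, 4], [1/2, 1, 2], [1/4, 1/2, 1]])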

Updated: 2025-12-11 10:07:15

标题: LLM辅助的AHP用于可解释的网络安全范围评估

摘要: 网络范围(CRs)已成为网络安全培训和教育的重要平台,尤其是面临不断增加的网络威胁的关键基础设施(CI)部门。解决这些威胁的一种方法是通过实践演习,将IT和OT领域联系起来,以提高防御准备。然而,持续评估CR平台是否合适和有效仍然是一个挑战。本文提出了一个CR评估框架,强调使用多标准决策方法的任务关键设置。我们定义了一组评估标准,涵盖技术忠实度、培训和评估能力、可扩展性、易用性和其他相关因素。为了对这些标准进行加权和聚合,我们采用层次分析过程(AHP),通过一个通过大型语言模型(LLM)实现的模拟跨学科专家小组来支持。这种LLM辅助专家推理使得可以在不需要直接召集专家的情况下进行一致和可重复的两两比较。该框架的输出是定量分数,有助于客观比较CR平台,并突出改进的方向。总的来说,这项工作为指导CR的提供者和最终用户奠定了一个标准化和可解释的评估方法学的基础。

更新时间: 2025-12-11 10:07:15

领域: cs.CR

下载: http://arxiv.org/abs/2512.10487v1

Orbis: Overcoming Challenges of Long-Horizon Prediction in Driving World Models

Existing world models for autonomous driving struggle with long-horizon generation and generalization to challenging scenarios. In this work, we develop a model using simple design choices, and without additional supervision or sensors, such as maps, depth, or multiple cameras. We show that our model yields state-of-the-art performance, despite having only 469M parameters and being trained on 280h of video data. It particularly stands out in difficult scenarios like turning maneuvers and urban traffic. We test whether discrete-token models have advantages over continuous models based on flow matching. To this end, we set up a hybrid tokenizer that is compatible with both approaches and allows for a side-by-side comparison. Our study concludes in favor of the continuous autoregressive model, which is less brittle to individual design choices and more powerful than the model built on discrete tokens. Code, models and qualitative results are publicly available at https://lmb-freiburg.github.io/orbis.github.io/.

Updated: 2025-12-11 10:05:50

标题: Orbis:克服驾驶世界模型中长期预测的挑战

摘要: 现有的自动驾驶世界模型在长期规划和对具有挑战性场景的泛化方面存在困难。在这项工作中,我们开发了一个模型,采用简单的设计选择,并且没有额外的监督或传感器,如地图、深度或多个摄像头。我们展示了我们的模型在性能方面达到了最先进的水平,尽管它只有4.69亿个参数,并且是在280小时的视频数据上训练的。它在转弯操作和城市交通等困难场景中特别突出。我们测试了离散标记模型可能比基于流匹配的连续模型具有优势。为此,我们建立了一个混合标记器,既兼容这两种方法,又允许进行并行比较。我们的研究得出结论,支持连续自回归模型,该模型在个别设计选择上不太脆弱,比基于离散标记构建的模型更强大。代码、模型和定性结果可在https://lmb-freiburg.github.io/orbis.github.io/上公开获取。

更新时间: 2025-12-11 10:05:50

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2507.13162v2

From Lab to Reality: A Practical Evaluation of Deep Learning Models and LLMs for Vulnerability Detection

Vulnerability detection methods based on deep learning (DL) have shown strong performance on benchmark datasets, yet their real-world effectiveness remains underexplored. Recent work suggests that both graph neural network (GNN)-based and transformer-based models, including large language models (LLMs), yield promising results when evaluated on curated benchmark datasets. These datasets are typically characterized by consistent data distributions and heuristic or partially noisy labels. In this study, we systematically evaluate two representative DL models, ReVeal and LineVul, across four representative datasets: Juliet, Devign, BigVul, and ICVul. Each model is trained independently on each respective dataset, and their code representations are analyzed using t-SNE to uncover vulnerability-related patterns. To assess realistic applicability, we deploy these models along with four pretrained LLMs (Claude 3.5 Sonnet, GPT-o3-mini, GPT-4o, and GPT-5) on a curated dataset, VentiVul, comprising 20 recently (May 2025) fixed vulnerabilities from the Linux kernel. Our experiments reveal that current models struggle to distinguish vulnerable from non-vulnerable code in representation space and generalize poorly across datasets with differing distributions. When evaluated on VentiVul, our newly constructed time-wise out-of-distribution dataset, performance drops sharply, with most models failing to detect vulnerabilities reliably. These results expose a persistent gap between academic benchmarks and real-world deployment, emphasizing the value of our deployment-oriented evaluation framework and the need for more robust code representations and higher-quality datasets.

Updated: 2025-12-11 10:04:54

标题: 从实验室到现实:深度学习模型和LLMs在漏洞检测中的实际评估

摘要: 基于深度学习(DL)的漏洞检测方法在基准数据集上表现出色,但它们在现实世界中的有效性尚未得到充分探讨。最近的研究表明,基于图神经网络(GNN)和基于变压器的模型,包括大型语言模型(LLMs),在经过精心筛选的基准数据集上表现出有希望的结果。这些数据集通常具有一致的数据分布和启发式或部分嘈杂的标签。在这项研究中,我们系统地评估了两个代表性的DL模型-ReVeal和LineVul-在四个代表性数据集上:Juliet、Devign、BigVul和ICVul。每个模型在各自的数据集上独立训练,并使用t-SNE分析它们的代码表示,以揭示与漏洞相关的模式。为了评估实际应用性,我们在一个经过精心筛选的数据集VentiVul上部署了这些模型以及四个预训练的LLM,Claude 3.5 Sonnet、GPT-o3-mini、GPT-4o和GPT-5,该数据集包含来自Linux内核的20个最近(2025年5月)修复的漏洞。我们的实验结果显示,当前的模型在表示空间中很难区分易受攻击的代码和非易受攻击的代码,并且在具有不同分布的数据集之间泛化能力较差。在VentiVul上评估时,我们新构建的基于时间的分布外数据集,性能急剧下降,大多数模型无法可靠地检测漏洞。这些结果揭示了学术基准和实际部署之间的持续差距,强调了我们面向部署的评估框架的价值,以及对更强大的代码表示和更高质量数据集的需求。

更新时间: 2025-12-11 10:04:54

领域: cs.CR,cs.LG,cs.SE

下载: http://arxiv.org/abs/2512.10485v1

A Generation Framework with Strict Constraints for Crystal Materials Design

The design of crystal materials plays a critical role in areas such as new energy development, biomedical engineering, and semiconductors. Recent advances in data-driven methods have enabled the generation of diverse crystal structures. However, most existing approaches still rely on random sampling without strict constraints, requiring multiple post-processing steps to identify stable candidates with the desired physical and chemical properties. In this work, we present a new constrained generation framework that takes multiple constraints as input and enables the generation of crystal structures with specific chemical compositions and properties. In this framework, intermediate constraints, such as symmetry information and composition ratio, are generated by a constraint generator based on large language models (LLMs), which considers the target properties. These constraints are then used by a subsequent crystal structure generator to ensure that the structure generation process is under control. Our method generates crystal structures with a probability of meeting the target properties that is more than twice that of existing approaches. Furthermore, nearly 100% of the generated crystals strictly adhere to the predefined chemical composition, eliminating supply-chain risks during production.

Updated: 2025-12-11 09:59:41

标题: 一个具有严格约束的晶体材料设计的世代框架

摘要: 晶体材料的设计在新能源开发、生物医学工程和半导体等领域起着至关重要的作用。最近数据驱动方法的进展使得能够生成多样化的晶体结构。然而,大多数现有方法仍依赖于没有严格约束的随机取样,需要多个后处理步骤来识别具有所需物理和化学性质的稳定候选体。在这项工作中,我们提出了一个新的受限生成框架,将多个约束作为输入,并能够生成具有特定化学和物性的晶体结构。在这个框架中,基于大型语言模型(LLMs)的约束生成器生成中间约束,如对称信息和组成比,考虑目标性质。这些约束然后由后续的晶体结构生成器使用,以确保结构生成过程受到控制。我们的方法生成的晶体结构符合目标性质的概率是现有方法的两倍以上。此外,近乎100%的生成晶体严格遵守预定义的化学组成,消除了生产过程中供应链风险。

更新时间: 2025-12-11 09:59:41

领域: cs.AI,cond-mat.mtrl-sci

下载: http://arxiv.org/abs/2411.08464v3

DDFI: Diverse and Distribution-aware Missing Feature Imputation via Two-step Reconstruction

Incomplete node features are ubiquitous in real-world scenarios, e.g., the attributes of web users may be partly private, which causes the performance of Graph Neural Networks (GNNs) to decline significantly. Feature propagation (FP) is a well-known method that performs well for imputation of missing node features on graphs, but it still has the following three issues: 1) it struggles with graphs that are not fully connected, 2) imputed features face the over-smoothing problem, and 3) FP is tailored for transductive tasks, overlooking the feature distribution shift in inductive tasks. To address these challenges, we introduce DDFI, a Diverse and Distribution-aware Missing Feature Imputation method that combines feature propagation with a graph-based Masked AutoEncoder (MAE) in a nontrivial manner. It first designs a simple yet effective algorithm, namely Co-Label Linking (CLL), that randomly connects nodes in the training set with the same label to enhance the performance on graphs with numerous connected components. Then we develop a novel two-step representation generation process at the inference stage. Specifically, instead of directly using FP-imputed features as input during inference, DDFI further reconstructs the features through the whole MAE to reduce feature distribution shift in the inductive tasks and enhance the diversity of node features. Meanwhile, since existing feature imputation methods for graphs are evaluated only by manually masking features to simulate missing-data scenarios, we collect a new dataset called Sailing from the records of voyages that contains naturally missing features to help better evaluate the effectiveness. Extensive experiments conducted on six public datasets and Sailing show that DDFI outperforms the state-of-the-art methods under both transductive and inductive settings.
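
The Co-Label Linking step is simple enough to sketch directly; the per-node link budget below is an assumed knob, not the paper's exact sampling scheme:

    import numpy as np

    def co_label_linking(labels, train_idx, links_per_node=1, rng=None):
        """Return extra edges connecting random training nodes that share a
        label, so FP can reach otherwise isolated connected components."""
        rng = rng or np.random.default_rng()
        by_label, extra_edges = {}, []
        for i in train_idx:
            by_label.setdefault(labels[i], []).append(i)
        for nodes in by_label.values():
            for i in nodes:
                picks = rng.choice(nodes, size=min(links_per_node, len(nodes)),
                                   replace=False)
                for j in picks:
                    if i != j:
                        extra_edges.append((i, j))
        return extra_edges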

Updated: 2025-12-11 09:53:17

标题: DDFI:通过两步重建实现多样性和分布感知的缺失特征插值

摘要: 现实世界中不完整的节点特征是普遍存在的,例如,Web用户的属性可能部分私密,这会导致图神经网络(GNNs)的性能显着下降。特征传播(FP)是一种众所周知的方法,对图上缺失节点特征的插补效果很好,但仍然存在以下三个问题:1)它在非完全连接的图上表现不佳,2)插补特征面临过度平滑问题,3)FP专为传导任务而设计,忽视感知任务中特征分布的变化。为了解决这些挑战,我们引入了DDFI,一种多样性和分布感知的缺失特征插补方法,以一种非常规的方式将特征传播与基于图的掩码自动编码器(MAE)相结合。它首先设计了一种简单但有效的算法,即共标签连接(CLL),随机连接训练集中具有相同标签的节点,以增强在具有众多连接组件的图上的性能。然后,在推理阶段开发了一种新颖的两步表示生成过程。具体来说,DDFI在推理过程中不直接使用FP插补的特征作为输入,而是通过整个MAE进一步重构特征,以减少感知任务中特征分布的变化,并增强节点特征的多样性。同时,由于现有的图特征插补方法仅通过手动遮罩特征来模拟缺失场景进行评估,我们从航行记录中收集了一个名为Sailing的新数据集,其中包含自然缺失的特征,以帮助更好地评估有效性。在六个公共数据集和Sailing上进行的大量实验表明,DDFI在传导和感知设置下均优于最先进的方法。

更新时间: 2025-12-11 09:53:17

领域: cs.LG,cs.SI

下载: http://arxiv.org/abs/2512.06356v2

Stealth and Evasion in Rogue AP Attacks: An Analysis of Modern Detection and Bypass Techniques

Wireless networks act as the backbone of modern digital connectivity, making them a primary target for cyber adversaries. Rogue Access Point attacks, specifically the Evil Twin variant, enable attackers to clone legitimate wireless network identifiers to deceive users into connecting. Once a connection is established, the adversary can intercept traffic and harvest sensitive credentials. While modern defensive architectures often employ Network Intrusion Detection Systems (NIDS) to identify malicious activity, the effectiveness of these systems against Layer 2 wireless threats remains a subject of critical inquiry. This project aimed to design a stealth-capable Rogue AP and evaluate its detectability against Suricata, an open-source NIDS/IPS. The methodology initially focused on a hardware-based deployment using Raspberry Pi platforms but transitioned to a virtualized environment due to severe system compatibility issues. Using Wifipumpkin3, the research team successfully deployed a captive portal that harvested user credentials from connected devices. However, the Suricata NIDS failed to flag the attack, highlighting a significant blind spot in traditional intrusion detection regarding wireless management frame attacks. This paper details the construction of the attack, the evasion techniques employed, and the limitations of current NIDS solutions in detecting localized wireless threats.

Updated: 2025-12-11 09:45:48

标题: 流氓AP攻击中的隐匿和逃避:现代检测和绕过技术分析

摘要: 无线网络是现代数字连接的支柱,因此成为网络对手的主要目标。Rogue Access Point 攻击,特别是Evil Twin变种,使攻击者能够克隆合法无线网络标识符,欺骗用户连接。一旦建立连接,对手就可以拦截流量并窃取敏感凭证。虽然现代防御架构通常使用网络入侵检测系统(NIDS)来识别恶意活动,但这些系统对于第2层无线威胁的有效性仍然是一个重要课题。本项目旨在设计一个隐蔽的Rogue AP,并评估其对Suricata(一个开源NIDS/IPS)的可检测性。方法论最初集中在使用树莓派平台进行硬件部署,但由于严重的系统兼容性问题,转变为虚拟化环境。利用Wifipumpkin3,研究团队成功部署了一个捕获门户,从连接设备中窃取用户凭证。然而,Suricata NIDS未能标记这次攻击,突显出传统入侵检测在检测无线管理帧攻击方面存在重大盲点。本文详细介绍了攻击的构建、使用的规避技术以及目前NIDS解决方案在检测局部无线威胁方面的局限性。

更新时间: 2025-12-11 09:45:48

领域: cs.CR

下载: http://arxiv.org/abs/2512.10470v1

Can LLMs Reason Over Non-Text Modalities in a Training-Free Manner? A Case Study with In-Context Representation Learning

The remarkable performance of Large Language Models (LLMs) can be enhanced with test-time computation, which relies on external tools and even other deep learning models. However, existing approaches for integrating non-text modality representations into LLMs typically require additional costly supervised training, restricting on-the-fly adaptation to new domains and modalities. In this work, we explore the feasibility of integrating representations from non-text foundation models (FMs) into text-based LLMs in a training-free manner. We propose In-Context Representation Learning (ICRL) as a proof-of-concept to allow LLMs to adaptively utilize non-text modality representations with few-shot learning. Unlike traditional in-context learning, which incorporates text-label pairs, ICRL replaces text inputs with FM representations, enabling the LLM to perform multi-modal inference without fine-tuning. We evaluate ICRL on a suite of tasks in the molecular domain, investigating three core research questions: (i) how to map FM representations into LLMs in a training-free manner, (ii) what factors influence ICRL performance, and (iii) what mechanisms underlie the effectiveness of ICRL. To the best of our knowledge, ICRL is the first training-free framework for integrating non-text modality representations into text-based LLMs, presenting a promising direction for adaptable, multi-modal generalization.
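
How FM representations enter a text prompt is the crux; one naive, training-free injection scheme (purely illustrative; the paper's mapping may well differ) is to serialize the embedding into the few-shot examples:

    import numpy as np

    def verbalize(z, digits=2):
        # Turn an FM embedding into a short token sequence the LLM can read.
        return "[" + " ".join(f"{v:.{digits}f}" for v in z) + "]"

    def icrl_prompt(support, query_repr, task):
        """support: list of (embedding, label) pairs; query_repr: embedding
        of the item to classify. Builds a few-shot prompt where text inputs
        are replaced by verbalized FM representations."""
        lines = [f"Task: {task}. Each item is a molecular FM representation."]
        for z, y in support:
            lines.append(f"Representation: {verbalize(z)} -> Label: {y}")
        lines.append(f"Representation: {verbalize(query_repr)} -> Label:")
        return "\n".join(lines)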

Updated: 2025-12-11 09:40:22

标题: 机器学习模型能否在无需训练的情况下推理非文本模态?以上下文表示学习为例研究

摘要: 大型语言模型(LLMs)的显着性能可以通过测试时计算进行增强,这依赖于外部工具甚至其他深度学习模型。然而,现有的将非文本模态表示集成到LLMs中的方法通常需要额外昂贵的监督训练,限制了对新领域和模态的即时适应。在这项工作中,我们探讨了在无需培训的情况下将非文本基础模型(FMs)的表示集成到基于文本的LLMs中的可行性。我们提出了上下文表示学习(ICRL)作为一个概念验证,允许LLMs通过少量样本学习自适应地利用非文本模态表示。与传统的上下文学习不同,传统的上下文学习将文本-标签对合并起来,ICRL将文本输入替换为FM表示,使LLM能够进行多模态推理而无需微调。我们在分子领域的一系列任务中评估了ICRL,研究了三个核心研究问题:(i)如何在无需培训的情况下将FM表示映射到LLMs中,(ii)什么因素影响ICRL的性能,以及(iii)ICRL的有效性背后的机制是什么。据我们所知,ICRL是第一个无需培训的框架,用于将非文本模态表示集成到基于文本的LLMs中,为适应性强、多模态概括提供了一个有前途的方向。

更新时间: 2025-12-11 09:40:22

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2509.17552v3

Beyond Words and Pixels: A Benchmark for Implicit World Knowledge Reasoning in Generative Models

Text-to-image (T2I) models today are capable of producing photorealistic, instruction-following images, yet they still frequently fail on prompts that require implicit world knowledge. Existing evaluation protocols either emphasize compositional alignment or rely on single-round VQA-based scoring, leaving critical dimensions such as knowledge grounding, multi-physics interactions, and auditable evidence substantially undertested. To address these limitations, we introduce PicWorld, the first comprehensive benchmark that assesses the grasp of implicit world knowledge and physical causal reasoning of T2I models. This benchmark consists of 1,100 prompts across three core categories. To facilitate fine-grained evaluation, we propose PW-Agent, an evidence-grounded multi-agent evaluator that hierarchically assesses images on their physical realism and logical consistency by decomposing prompts into verifiable visual evidence. We conduct a thorough analysis of 17 mainstream T2I models on PicWorld, illustrating that they universally exhibit a fundamental limitation in their capacity for implicit world knowledge and physical causal reasoning to varying degrees. The findings highlight the need for reasoning-aware, knowledge-integrative architectures in future T2I systems.

Updated: 2025-12-11 09:39:01

标题: 超越语言和像素:生成模型中隐含世界知识推理的基准测试

摘要: 今天的文本到图像(T2I)模型能够生成逼真的、遵循指令的图像,但它们仍然经常在需要隐含世界知识的提示上失败。现有的评估协议要么强调组合对齐,要么依赖于单轮VQA评分,留下了知识基础、多物理相互作用和可审计证据等关键维度大量未经测试。为了解决这些限制,我们引入了PicWorld,这是第一个全面评估T2I模型隐含世界知识和物理因果推理掌握能力的基准。该基准包括三个核心类别的1,100个提示。为了便于细致评估,我们提出了PW-Agent,这是一个基于证据的多代理评估器,通过将提示分解为可验证的视觉证据,层次化地评估图像的物理现实性和逻辑一致性。我们对17个主流T2I模型在PicWorld上进行了深入分析,结果显示它们普遍在隐含世界知识和物理因果推理方面存在不同程度的基本限制。这些发现凸显了未来T2I系统需要具有推理意识、知识整合结构的需求。

更新时间: 2025-12-11 09:39:01

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2511.18271v3

T-SKM-Net: Trainable Neural Network Framework for Linear Constraint Satisfaction via Sampling Kaczmarz-Motzkin Method

Neural network constraint satisfaction is crucial for safety-critical applications such as power system optimization, robotic path planning, and autonomous driving. However, existing constraint satisfaction methods face efficiency-applicability trade-offs, with hard constraint methods suffering from either high computational complexity or restrictive assumptions on constraint structures. The Sampling Kaczmarz-Motzkin (SKM) method is a randomized iterative algorithm for solving large-scale linear inequality systems with favorable convergence properties, but its argmax operations introduce non-differentiability, posing challenges for neural network applications. This work proposes the Trainable Sampling Kaczmarz-Motzkin Network (T-SKM-Net) framework and, for the first time, systematically integrates SKM-type methods into neural network constraint satisfaction. The framework transforms mixed constraint problems into pure inequality problems through null space transformation, employs SKM for iterative solving, and maps solutions back to the original constraint space, efficiently handling both equality and inequality constraints. We provide theoretical proof of post-processing effectiveness in expectation and end-to-end trainability guarantees based on unbiased gradient estimators, demonstrating that despite non-differentiable operations, the framework supports standard backpropagation. On the DCOPF case118 benchmark, our method achieves 4.27 ms/item GPU serial forward inference with a 0.0025% max optimality gap in post-processing mode and 5.25 ms/item with a 0.0008% max optimality gap in joint training mode, delivering over a 25$\times$ speedup compared to the pandapower solver while maintaining zero constraint violations under the given tolerance.
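
For context, the underlying SKM iteration for a feasibility problem $Ax \le b$ is short: sample a block of rows and project toward the most violated sampled constraint. A minimal NumPy sketch (the relaxation parameter and sample-size defaults are arbitrary choices, not the paper's):

    import numpy as np

    def skm(A, b, x0, beta=50, lam=1.0, iters=10_000, tol=1e-8, rng=None):
        """Sampling Kaczmarz-Motzkin for A x <= b."""
        rng = rng or np.random.default_rng()
        x = np.asarray(x0, dtype=float).copy()
        m = A.shape[0]
        for _ in range(iters):
            rows = rng.choice(m, size=min(beta, m), replace=False)
            viol = A[rows] @ x - b[rows]          # positive entries are violations
            k = rows[np.argmax(viol)]             # Motzkin step: worst sampled row
            v = max(A[k] @ x - b[k], 0.0)
            if v <= tol:
                if np.all(A @ x - b <= tol):      # fully feasible: done
                    return x
                continue
            x -= lam * v / (A[k] @ A[k]) * A[k]   # relaxed projection onto row k
        return x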

Updated: 2025-12-11 09:35:13

标题: T-SKM-Net:通过采样Kaczmarz-Motzkin方法进行线性约束满足的可训练神经网络框架

摘要: 神经网络约束满足对于诸如电力系统优化、机器人路径规划和自动驾驶等安全关键应用至关重要。然而,现有的约束满足方法面临效率和适用性的折衷,硬约束方法要么计算复杂度高,要么对约束结构做出了限制性假设。采样卡兹玛兹金(SKM)方法是一种用于解决具有良好收敛特性的大规模线性不等式系统的随机迭代算法,但其argmax操作引入了不可微性,在神经网络应用中存在挑战。本研究提出了可训练采样卡兹玛兹金网络(T-SKM-Net)框架,并首次系统地将SKM类型方法集成到神经网络约束满足中。该框架通过零空间变换将混合约束问题转化为纯不等式问题,在迭代求解中采用SKM,并将解映射回原始约束空间,有效处理等式和不等式约束。我们提供了期望中后处理效果的理论证明,并基于无偏梯度估计器提供端到端可训练性保证,表明尽管存在不可微操作,该框架支持标准反向传播。在DCOPF case118基准测试中,我们的方法在后处理模式下实现了4.27ms/item的GPU串行前向推断,最优性差异为0.0025%,在联合训练模式下为5.25ms/item,最优性差异为0.0008%,与pandapower求解器相比,实现了超过25倍的加速,同时在给定容差下保持零约束违反。

更新时间: 2025-12-11 09:35:13

领域: cs.LG,cs.AI,math.OC

下载: http://arxiv.org/abs/2512.10461v1

Examining the Metrics for Document-Level Claim Extraction in Czech and Slovak

Document-level claim extraction remains an open challenge in the field of fact-checking, and subsequently, methods for evaluating extracted claims have received limited attention. In this work, we explore approaches to aligning two sets of claims pertaining to the same source document and computing their similarity through an alignment score. We investigate techniques to identify the best possible alignment and evaluation method between claim sets, with the aim of providing a reliable evaluation framework. Our approach enables comparison between model-extracted and human-annotated claim sets, serving as a metric for assessing the extraction performance of models and also as a possible measure of inter-annotator agreement. We conduct experiments on a newly collected dataset of claims extracted from comments under Czech and Slovak news articles, domains that pose additional challenges due to the informal language, strong local context, and subtleties of these closely related languages. The results draw attention to the limitations of current evaluation approaches when applied to document-level claim extraction and highlight the need for more advanced methods, ones able to correctly capture semantic similarity and evaluate essential claim properties such as atomicity, checkworthiness, and decontextualization.
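
A common way to realize such an alignment score is optimal one-to-one matching over a pairwise similarity matrix; the sketch below uses the Hungarian algorithm, with the threshold and the unmatched-claim penalty as assumed design choices rather than the paper's:

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def alignment_score(sim, threshold=0.5):
        """sim: |set A| x |set B| matrix of pairwise claim similarities
        (e.g., cosine similarity of sentence embeddings). Finds the optimal
        one-to-one alignment, keeps matches above the threshold, and divides
        by the larger set size so unmatched claims act as a penalty."""
        rows, cols = linear_sum_assignment(-sim)     # maximize total similarity
        matched = [sim[i, j] for i, j in zip(rows, cols) if sim[i, j] >= threshold]
        return sum(matched) / max(sim.shape)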

Updated: 2025-12-11 09:34:43

标题: 审查捷克语和斯洛伐克语文档级索赔提取的度量标准

摘要: 文档级别的索赔提取仍然是事实核查领域面临的一个挑战,因此,对提取的索赔进行评估的方法受到了有限的关注。在这项工作中,我们探讨了将与同一源文档相关的两组索赔进行对齐并通过对齐分数计算它们的相似性的方法。我们研究了识别索赔集之间最佳对齐和评估方法的技术,旨在提供一个可靠的评估框架。我们的方法使模型提取和人工注释的索赔集之间的比较成为可能,作为评估模型提取性能的度量标准,同时也可以作为评估者间协议的可能度量。我们对新收集的数据集进行实验-这些数据集是从捷克和斯洛伐克新闻文章的评论中提取的-这些领域由于非正式语言、强烈的地方背景和这些密切相关语言的微妙之处而带来了额外的挑战。结果引起了当前评估方法在应用于文档级别索赔提取时的限制的注意,并强调了更先进方法的需求-这些方法能够正确捕捉语义相似性并评估基本索赔属性,如原子性、值得检查性和去上下文化。

更新时间: 2025-12-11 09:34:43

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2511.14566v2

Hybrid Physics-ML Model for Forward Osmosis Flux with Complete Uncertainty Quantification

Forward Osmosis (FO) is a promising low-energy membrane separation technology, but challenges in accurately modelling its water flux ($J_w$) persist due to complex internal mass transfer phenomena. Traditional mechanistic models struggle with empirical parameter variability, while purely data-driven models lack physical consistency and rigorous uncertainty quantification (UQ). This study introduces a novel Robust Hybrid Physics-ML framework employing Gaussian Process Regression (GPR) for highly accurate, uncertainty-aware $J_w$ prediction. The core innovation lies in training the GPR on the residual error between the detailed, non-linear FO physical model prediction ($J_w^{\mathrm{physical}}$) and the experimental water flux ($J_w^{\mathrm{actual}}$). Crucially, we implement a full UQ methodology by decomposing the total predictive variance ($\sigma^2_{\mathrm{total}}$) into model uncertainty (epistemic, from GPR's posterior variance) and input uncertainty (aleatoric, analytically propagated via the Delta method for multi-variate correlated inputs). Leveraging the inherent strength of GPR in low-data regimes, the model, trained on a meagre 120 data points, achieved a state-of-the-art Mean Absolute Percentage Error (MAPE) of 0.26% and an $R^2$ of 0.999 on the independent test data, validating a truly robust and reliable surrogate model for advanced FO process optimization and digital twin development.
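
The residual-learning core is easy to sketch with scikit-learn (the kernel choice is illustrative; the Delta-method propagation of input variance is omitted here and would add an aleatoric term on top of the GPR's posterior std):

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, ConstantKernel, WhiteKernel

    def fit_residual_gpr(X, jw_actual, physics_model):
        """Train a GPR on the residual between measured flux and the
        mechanistic prediction; at inference, J_w = physics + GPR correction,
        and the GPR posterior std supplies the epistemic uncertainty."""
        residual = jw_actual - physics_model(X)
        kernel = ConstantKernel() * RBF(length_scale=np.ones(X.shape[1])) + WhiteKernel()
        gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, residual)

        def predict(Xq):
            corr, std_epistemic = gpr.predict(Xq, return_std=True)
            return physics_model(Xq) + corr, std_epistemic
        return predict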

Updated: 2025-12-11 09:27:44

标题: 混合物理-机器学习模型用于前向渗透通量的完全不确定性量化

摘要: 正渗透(FO)是一种具有前景的低能膜分离技术,但由于复杂的内部质量传递现象,对其水通量(Jw)进行准确建模仍然存在挑战。传统的机械模型在处理经验参数变异性方面存在困难,而纯数据驱动的模型缺乏物理一致性和严格的不确定性量化(UQ)。本研究介绍了一种新颖的坚固的混合物理-机器学习框架,采用高精度、具有不确定性意识的高斯过程回归(GPR)进行Jw预测。核心创新在于在详细的、非线性FO物理模型预测(Jw_physical)与实验水通量(Jw_actual)之间的残差误差上训练GPR。至关重要的是,我们通过将总预测方差(sigma2_total)分解为模型不确定性(认识性,来自GPR的后验方差)和输入不确定性(aleatoric,通过Delta方法在多变量相关输入上进行分析传播)来实施完整的UQ方法。利用GPR在低数据情况下的固有优势,该模型在仅有120个数据点的情况下,实现了独立测试数据的0.26%的最先进的平均绝对百分比误差(MAPE)和0.999的R2,验证了一个真正坚固可靠的替代模型,用于先进FO工艺优化和数字孪生开发。

更新时间: 2025-12-11 09:27:44

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2512.10457v1

Sublinear Variational Optimization of Gaussian Mixture Models with Millions to Billions of Parameters

Gaussian Mixture Models (GMMs) range among the most frequently used models in machine learning. However, training large, general GMMs becomes computationally prohibitive for datasets that have many data points $N$ of high dimensionality $D$. For GMMs with arbitrary covariances, we here derive a highly efficient variational approximation, which is then integrated with mixtures of factor analyzers (MFAs). For GMMs with $C$ components, our proposed algorithm substantially reduces runtime complexity from $\mathcal{O}(NCD^2)$ per iteration to a complexity scaling linearly with $D$ and sublinearly with $NC$. In numerical experiments, we first validate that the complexity reduction results in a sublinear scaling for the entire GMM optimization process. Second, we show on large-scale benchmarks that the sublinear algorithm results in speed-ups of an order of magnitude compared to the state of the art. Third, as a proof of concept, we train GMMs with over 10 billion parameters on about 100 million images, observing training times of less than nine hours on a single state-of-the-art CPU. Fourth and finally, we demonstrate the effectiveness of large-scale GMMs on the task of zero-shot image denoising, where sublinear training results in state-of-the-art denoising times while competitive denoising performance is maintained.

Updated: 2025-12-11 09:27:19

标题: 高斯混合模型参数数量达到百万至十亿时的次线性变分优化

摘要: 高斯混合模型(GMMs)是机器学习中最常用的模型之一。然而,对于具有许多高维数据点$N$的数据集来说,训练大型通用GMMs在计算上是不可行的。对于具有任意协方差的GMMs,我们提出了一种高效的变分逼近方法,并将其与因子分析器混合(MFAs)集成在一起。对于具有$C$个分量的GMMs,我们提出的算法将每次迭代的运行时间复杂度从$\mathcal{O}(NCD^2)$大幅降低到与$D$线性相关且与$NC$次线性相关。在数值实验中,我们首先验证了复杂度降低导致整个GMM优化过程的次线性缩放。其次,我们在大规模基准测试中展示,次线性算法与最先进技术相比可以实现数量级的加速。第三,作为概念验证,我们最终在大约1亿张图像上训练具有超过100亿参数的GMMs,在单个最先进的CPU上观察到不到9小时的训练时间。最后,我们展示了大规模GMMs在零样本图像去噪任务中的有效性,其中次线性训练导致最先进的去噪时间,同时保持竞争性的去噪性能。

更新时间: 2025-12-11 09:27:19

领域: stat.ML,cs.CV,cs.LG

下载: http://arxiv.org/abs/2501.12299v2

GTPO: Stabilizing Group Relative Policy Optimization via Gradient and Entropy Control

Group Relative Policy Optimization (GRPO) is a promising policy-based approach for Large Language Model alignment, yet its performance is often limited by training instability and suboptimal convergence. In this paper, we identify and analyze two main GRPO issues: (i) the token-level penalization, where valuable tokens shared across different responses receive contradictory feedback signals, leading to conflicting gradient updates that can reduce their likelihood; and (ii) the policy collapse, where negatively rewarded completions may penalize confident responses and shift model decisions toward unlikely tokens, destabilizing training process. To address these issues we introduce GTPO (Group-relative Trajectory-based Policy Optimization), which prevents conflicting gradients on valuable tokens by skipping negative updates while amplifying positive ones and filters out completions whose entropy exceeds a provable threshold, to prevent policy collapse. Unlike GRPO, GTPO does not rely on KL-divergence regularization, eliminating the need for a reference model during training, while still ensuring greater training stability and improved performance, as validated through multiple experiments on GSM8K, MATH, AIME 2024, AIME 2025 and AMC 2023.
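
A simplified reading of the two mechanisms, expressed as per-token loss weights over a group of sampled completions (identifying "shared tokens" by vocabulary membership, as below, is a coarse assumption; the paper's criterion may be position- or context-specific):

    import numpy as np

    def gtpo_token_weights(groups, entropy, entropy_max):
        """groups: list of (tokens, advantage) completions for one prompt;
        entropy: per-completion entropy. Completions above the entropy
        threshold are dropped (policy-collapse guard); tokens of negatively
        rewarded completions that also occur in a positively rewarded one
        get weight 0, skipping the conflicting negative update."""
        kept = [(t, a) for (t, a), h in zip(groups, entropy) if h <= entropy_max]
        positive_tokens = set()
        for tokens, adv in kept:
            if adv > 0:
                positive_tokens.update(tokens)
        weights = []
        for tokens, adv in kept:
            if adv > 0:
                weights.append(np.ones(len(tokens)))
            else:
                weights.append(np.array([0.0 if t in positive_tokens else 1.0
                                         for t in tokens]))
        return kept, weights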

Updated: 2025-12-11 09:23:26

标题: GTPO:通过梯度和熵控制稳定群体相对策略优化

摘要: Group Relative Policy Optimization(GRPO)是一种用于大型语言模型对齐的有前途的基于策略的方法,然而其性能通常受到训练不稳定性和次优收敛的限制。在本文中,我们确定并分析了两个主要的GRPO问题:(i)令牌级别的惩罚,其中在不同响应之间共享的有价值的令牌接收到矛盾的反馈信号,导致冲突的梯度更新,可能降低它们的可能性;和(ii)策略崩溃,其中受到负面奖励的完成可能惩罚自信的响应,并将模型决策转向不太可能的令牌,使训练过程不稳定。为了解决这些问题,我们引入了GTPO(基于轨迹的群体相对策略优化),通过跳过负面更新并放大正面更新来防止有价值的令牌上的冲突梯度,并过滤出熵超过可证明阈值的完成,以防止策略崩溃。与GRPO不同,GTPO不依赖于KL散度正则化,消除了训练过程中对参考模型的需求,同时仍保证了更大的训练稳定性和改进的性能,通过对GSM8K、MATH、AIME 2024、AIME 2025和AMC 2023进行多次实验证实。

更新时间: 2025-12-11 09:23:26

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2508.03772v5

Towards Personalized Deep Research: Benchmarks and Evaluations

Deep Research Agents (DRAs) can autonomously conduct complex investigations and generate comprehensive reports, demonstrating strong real-world potential. However, existing benchmarks primarily evaluate DRAs on generic quality metrics and overlook personalization, a critical dimension for individual users. Moreover, existing evaluations mostly rely on close-ended benchmarks, while open-ended deep research benchmarks remain scarce and typically neglect personalized scenarios. To bridge this gap, we introduce Personalized Deep Research Bench (PDR-Bench), the first benchmark for evaluating personalization in DRAs. It pairs 50 diverse research tasks across 10 domains with 25 authentic user profiles that combine structured persona attributes with dynamic real-world contexts, yielding 250 realistic user-task queries. To assess system performance, we propose the PQR Evaluation Framework, which jointly measures Personalization Alignment, Content Quality, and Factual Reliability. Our experiments on a range of systems highlight current capabilities and limitations in handling personalized deep research. This work establishes a rigorous foundation for developing and evaluating the next generation of truly personalized AI research assistants.

Updated: 2025-12-11 09:21:18

标题: 走向个性化深度研究:基准和评估

摘要: 深度研究代理人(DRAs)可以自主进行复杂的调查并生成综合报告,展示了强大的现实潜力。然而,现有的基准主要通过评估DRAs的通用质量指标,忽略了个性化,这是个人用户的关键维度。然而,现有的评估主要依赖于封闭式基准,而开放式深度研究基准很少,通常忽视个性化场景。为了弥补这一差距,我们引入了个性化深度研究基准(PDR-Bench),这是用于评估DRAs中个性化的第一个基准。它将10个领域中的50个不同研究任务与25个结构化人物属性和动态现实世界背景相结合的真实用户档案配对,产生250个真实用户任务查询。为了评估系统性能,我们提出了PQR评估框架,同时衡量了个性化对齐、内容质量和事实可靠性。我们在一系列系统上的实验突显了在处理个性化深度研究方面的当前能力和限制。这项工作为开发和评估下一代真正个性化的AI研究助理奠定了严格的基础。

更新时间: 2025-12-11 09:21:18

领域: cs.CL,cs.AI,cs.IR

下载: http://arxiv.org/abs/2509.25106v2

Metacognitive Sensitivity for Test-Time Dynamic Model Selection

A key aspect of human cognition is metacognition - the ability to assess one's own knowledge and judgment reliability. While deep learning models can express confidence in their predictions, they often suffer from poor calibration, a cognitive bias where expressed confidence does not reflect true competence. Do models truly know what they know? Drawing from human cognitive science, we propose a new framework for evaluating and leveraging AI metacognition. We introduce meta-d', a psychologically-grounded measure of metacognitive sensitivity, to characterise how reliably a model's confidence predicts its own accuracy. We then use this dynamic sensitivity score as context for a bandit-based arbiter that performs test-time model selection, learning which of several expert models to trust for a given task. Our experiments across multiple datasets and deep learning model combinations (including CNNs and VLMs) demonstrate that this metacognitive approach improves joint-inference accuracy over constituent models. This work provides a novel behavioural account of AI models, recasting ensemble selection as a problem of evaluating both short-term signals (confidence prediction scores) and medium-term traits (metacognitive sensitivity).
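
The arbiter itself can be as simple as a UCB bandit over expert models; the sketch below tracks empirical accuracy per model and leaves the meta-d' signal as extra context one could condition on (how it is integrated is not spelled out here):

    import numpy as np

    class ModelArbiter:
        """UCB bandit over expert models; each arm's reward is whether the
        chosen model answered correctly."""
        def __init__(self, n_models, c=1.0):
            self.n = np.zeros(n_models)   # pulls per model
            self.s = np.zeros(n_models)   # successes per model
            self.c = c

        def select(self):
            t = self.n.sum() + 1
            mean = self.s / np.maximum(self.n, 1)
            bonus = self.c * np.sqrt(np.log(t) / np.maximum(self.n, 1))
            ucb = np.where(self.n > 0, mean + bonus, np.inf)  # try each arm once
            return int(np.argmax(ucb))

        def update(self, arm, correct):
            self.n[arm] += 1
            self.s[arm] += float(correct)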

Updated: 2025-12-11 09:15:05

标题: 测试时间动态模型选择的元认知敏感性

摘要: 一个关键方面的人类认知是元认知——评估自己的知识和判断可靠性的能力。虽然深度学习模型可以表达对其预测的信心,但它们经常受到较差的校准的影响,即表达的信心并不反映真实的能力。模型是否真正知道自己知道什么?借鉴人类认知科学,我们提出了一个评估和利用AI元认知的新框架。我们引入了meta-d',一个基于心理学的元认知敏感度度量,来描述一个模型的信心如何可靠地预测自己的准确性。然后,我们使用这个动态敏感度分数作为一种基于赌徒算法的仲裁者的上下文,该算法在测试时进行模型选择,学习对于特定任务应该信任哪个专家模型。我们在多个数据集和深度学习模型组合(包括CNNs和VLMs)上进行的实验表明,这种元认知方法提高了整体推理准确性。这项工作提供了对AI模型的新颖行为描述,将集成选择重新构想为评估短期信号(信心预测分数)和中期特征(元认知敏感度)的问题。

更新时间: 2025-12-11 09:15:05

领域: cs.LG

下载: http://arxiv.org/abs/2512.10451v1

When Reject Turns into Accept: Quantifying the Vulnerability of LLM-Based Scientific Reviewers to Indirect Prompt Injection

The landscape of scientific peer review is rapidly evolving with the integration of Large Language Models (LLMs). This shift is driven by two parallel trends: the widespread individual adoption of LLMs by reviewers to manage workload (the "Lazy Reviewer" hypothesis) and the formal institutional deployment of AI-powered assessment systems by conferences like AAAI and Stanford's Agents4Science. This study investigates the robustness of these "LLM-as-a-Judge" systems (both illicit and sanctioned) to adversarial PDF manipulation. Unlike general jailbreaks, we focus on a distinct incentive: flipping "Reject" decisions to "Accept," for which we develop a novel evaluation metric which we term WAVS (Weighted Adversarial Vulnerability Score). We curated a dataset of 200 scientific papers and adapted 15 domain-specific attack strategies to this task, evaluating them across 13 Language Models, including GPT-5, Claude Haiku, and DeepSeek. Our results demonstrate that obfuscation strategies like "Maximum Mark Magyk" successfully manipulate scores, achieving alarming decision flip rates even in large-scale models. We will release our complete dataset and injection framework to facilitate more research on this topic.

Updated: 2025-12-11 09:13:36

标题: 当拒绝变成接受:量化基于LLM的科学审稿人对间接提示注入的脆弱性

摘要: 科学同行评审的格局正在迅速发展,与此同时,大型语言模型(LLMs)的整合也在加速。这种转变受到两个并行趋势的推动:广泛采用LLMs来管理工作量的审稿人(“懒惰审稿人”假说)和会议如AAAI和斯坦福的Agents4Science等正式机构部署AI驱动的评估系统。本研究调查了这些“LLM作为评判者”系统(包括非法和合法)对恶意PDF操作的稳健性。与一般性越狱不同,我们关注一种明确的动机:将“拒绝”决定翻转为“接受”,为此我们开发了一种新的评估指标,称为WAVS(加权恶意脆弱性评分)。我们收集了一个包含200篇科学论文的数据集,并针对此任务调整了15种领域特定的攻击策略,对这些策略在13个语言模型上进行评估,包括GPT-5、Claude Haiku和DeepSeek。我们的结果表明,像“最大标记魔法”这样的混淆策略成功地操纵了分数,即使在大规模模型中也实现了令人担忧的决策翻转率。我们将发布完整的数据集和注入框架,以促进更多关于这一主题的研究。

更新时间: 2025-12-11 09:13:36

领域: cs.AI,cs.CL,cs.CR

下载: http://arxiv.org/abs/2512.10449v1

Maximum Risk Minimization with Random Forests

We consider a regression setting where observations are collected in different environments modeled by different data distributions. The field of out-of-distribution (OOD) generalization aims to design methods that generalize better to test environments whose distributions differ from those observed during training. One line of work proposes minimizing the maximum risk across environments, a principle that we refer to as MaxRM (Maximum Risk Minimization). In this work, we introduce variants of random forests based on the principle of MaxRM. We provide computationally efficient algorithms and prove statistical consistency for our primary method. Our proposed method can be used with each of the following three risks: the mean squared error, the negative reward (which relates to the explained variance), and the regret (which quantifies the excess risk relative to the best predictor). For MaxRM with regret as the risk, we prove a novel out-of-sample guarantee over unseen test distributions. Finally, we evaluate the proposed methods on both simulated and real-world data.

Updated: 2025-12-11 09:10:52

标题: 随机森林最大风险最小化

摘要: 我们考虑一个回归设置,观测结果在不同环境中收集,由不同数据分布建模。超出分布(OOD)泛化领域旨在设计能够更好地泛化到与训练时观察到的分布不同的测试环境的方法。其中一种方法是提出最小化跨环境的最大风险,这一原则被称为MaxRM(最大风险最小化)。在这项工作中,我们基于MaxRM原则介绍了随机森林的变种。我们提供了计算效率高的算法,并证明了我们主要方法的统计一致性。我们提出的方法可以与以下三种风险一起使用:均方误差、负奖励(与解释方差有关)和遗憾(量化相对于最佳预测器的过度风险)。对于将遗憾作为风险的MaxRM,我们证明了在未见过的测试分布上的新颖样本外保证。最后,我们在模拟数据和真实数据上评估了提出的方法。

更新时间: 2025-12-11 09:10:52

领域: stat.ML,cs.AI,cs.LG,stat.ME

下载: http://arxiv.org/abs/2512.10445v1

Clustered Federated Learning with Hierarchical Knowledge Distillation

Clustered Federated Learning (CFL) has emerged as a powerful approach for addressing data heterogeneity and ensuring privacy in large distributed IoT environments. By clustering clients and training cluster-specific models, CFL enables personalized models tailored to groups of heterogeneous clients. However, conventional CFL approaches suffer from fragmented learning, training an independent global model for each cluster and failing to take advantage of collective cross-cluster insights. This paper advocates a shift to hierarchical CFL, allowing bi-level aggregation to train cluster-specific models at the edge and a unified global model at the cloud. This shift improves training efficiency yet might introduce communication challenges. To this end, we propose CFLHKD, a novel personalization scheme for integrating hierarchical cluster knowledge into CFL. Built upon multi-teacher knowledge distillation, CFLHKD enables inter-cluster knowledge sharing while preserving cluster-specific personalization. CFLHKD adopts a bi-level aggregation to bridge the gap between local and global learning. Extensive evaluations on standard benchmark datasets demonstrate that CFLHKD outperforms representative baselines in cluster-specific and global model accuracy and achieves a performance improvement of 3.32-7.57%.
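
The multi-teacher distillation term has a standard form; a generic PyTorch sketch (uniform teacher weighting and the temperature are assumptions, not the paper's exact scheme):

    import torch
    import torch.nn.functional as F

    def cflhkd_loss(student_logits, teacher_logits_list, labels, T=2.0, alpha=0.5):
        """Cross-entropy on local labels plus an averaged KL term distilling
        from each cluster teacher."""
        ce = F.cross_entropy(student_logits, labels)
        kd = sum(F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                          F.softmax(t / T, dim=-1),
                          reduction="batchmean") * (T * T)
                 for t in teacher_logits_list) / len(teacher_logits_list)
        return (1 - alpha) * ce + alpha * kd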

Updated: 2025-12-11 09:08:35

标题: 具有层次知识蒸馏的集群化联邦学习

摘要: 集群联邦学习(CFL)已经成为处理数据异质性和确保隐私的强大方法,特别是在大规模分布式物联网环境中。通过对客户进行聚类并训练特定于集群的模型,CFL实现了针对异质客户群体的个性化模型。然而,传统的CFL方法存在着为每个集群训练独立全局模型的碎片化学习问题,并且未能利用集群见解的集体优势。本文提倡转向分层CFL,允许双层聚合,在边缘训练特定于集群的模型,并在云端训练统一的全局模型。这种转变提高了训练效率,但可能引入通信挑战。为此,我们提出了CFLHKD,一种将分层集群知识整合到CFL中的新型个性化方案。基于多教师知识蒸馏,CFLHKD实现了集群之间的知识共享,同时保留了集群特定的个性化。CFLHKD采用双层聚合来弥合本地和全局学习之间的差距。对标准基准数据集的广泛评估表明,CFLHKD在集群特定和全局模型准确性方面优于代表性基线,并实现了3.32-7.57\%的性能提升。

更新时间: 2025-12-11 09:08:35

领域: cs.DC,cs.AI,cs.LG

下载: http://arxiv.org/abs/2512.10443v1

Enhanced Spatial Clustering of Single-Molecule Localizations with Graph Neural Networks

Single-molecule localization microscopy generates point clouds corresponding to fluorophore localizations. Spatial cluster identification and analysis of these point clouds are crucial for extracting insights about molecular organization. However, this task becomes challenging in the presence of localization noise, high point density, or complex biological structures. Here, we introduce MIRO (Multifunctional Integration through Relational Optimization), an algorithm that uses recurrent graph neural networks to transform the point clouds in order to improve clustering efficiency when applying conventional clustering techniques. We show that MIRO supports simultaneous processing of clusters of different shapes and at multiple scales, demonstrating improved performance across varied datasets. Our comprehensive evaluation demonstrates MIRO's transformative potential for single-molecule localization applications, showcasing its capability to revolutionize cluster analysis and provide accurate, reliable details of molecular architecture. In addition, MIRO's robust clustering capabilities hold promise for applications in various fields such as neuroscience, for the analysis of neural connectivity patterns, and environmental science, for studying spatial distributions of ecological data.

Updated: 2025-12-11 08:59:03

标题: 用图神经网络增强单分子定位的空间聚类

摘要: 单分子定位显微镜生成与荧光物质定位相对应的点云。对这些点云进行空间聚类识别和分析对于提取有关分子组织的见解至关重要。然而,在定位噪音、高点密度或复杂生物结构存在的情况下,这项任务变得具有挑战性。在这里,我们介绍了MIRO(多功能集成通过关系优化)算法,该算法利用循环图神经网络来转换点云,以提高应用传统聚类技术时的聚类效率。我们展示MIRO支持同时处理不同形状和多个尺度的聚类,并展示在各种数据集上表现出改进的性能。我们的全面评估展示了MIRO在单分子定位应用中的变革潜力,展示了其革新聚类分析并提供有关分子体系结构的准确、可靠细节的能力。此外,MIRO强大的聚类能力为在各个领域的应用提供了希望,如神经科学,用于分析神经连接模式,以及环境科学,用于研究生态数据的空间分布。

更新时间: 2025-12-11 08:59:03

领域: cs.LG,physics.bio-ph,physics.data-an,q-bio.QM

下载: http://arxiv.org/abs/2412.00173v2

An M-Health Algorithmic Approach to Identify and Assess Physiotherapy Exercises in Real Time

This work presents an efficient algorithmic framework for real-time identification, classification, and evaluation of human physiotherapy exercises using mobile devices. The proposed method interprets a kinetic movement as a sequence of static poses, which are estimated from camera input using a pose-estimation neural network. Extracted body keypoints are transformed into trigonometric angle-based features and classified with lightweight supervised models to generate frame-level pose predictions and accuracy scores. To recognize full exercise movements and detect deviations from prescribed patterns, we employ a dynamic-programming scheme based on a modified Levenshtein distance algorithm, enabling robust sequence matching and localization of inaccuracies. The system operates entirely on the client side, ensuring scalability and real-time performance. Experimental evaluation demonstrates the effectiveness of the methodology and highlights its applicability to remote physiotherapy supervision and m-health applications.
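
The sequence-matching core is a Levenshtein-style dynamic program in which the usual 0/1 substitution cost is replaced by a pose-similarity cost; a minimal sketch (the exact cost model is an assumption):

    def sequence_deviation(predicted, prescribed, pose_cost):
        """Edit distance between the recognized pose sequence and the
        prescribed one; pose_cost(a, b) in [0, 1] penalizes near-miss poses
        softly instead of the classic 0/1 substitution cost."""
        n, m = len(predicted), len(prescribed)
        d = [[0.0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            d[i][0] = i
        for j in range(1, m + 1):
            d[0][j] = j
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d[i][j] = min(d[i - 1][j] + 1,                      # deletion
                              d[i][j - 1] + 1,                      # insertion
                              d[i - 1][j - 1] + pose_cost(predicted[i - 1],
                                                          prescribed[j - 1]))
        return d[n][m]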

Updated: 2025-12-11 08:56:03

标题: 一种用于实时识别和评估理疗锻炼的M-Health算法方法

摘要: 这项工作提出了一个高效的算法框架,用于实时识别、分类和评估人体物理治疗练习,利用移动设备。所提出的方法将动态运动解释为一系列静态姿势,这些姿势是通过使用姿势估计神经网络从摄像机输入中估计出来的。提取的身体关键点被转换为基于三角函数角度的特征,并使用轻量级监督模型进行分类,以生成帧级姿势预测和准确性分数。为了识别完整的运动练习并检测偏离规定模式的情况,我们采用基于修改后的Levenshtein距离算法的动态规划方案,实现了强大的序列匹配和准确性定位。该系统完全在客户端上运行,确保可扩展性和实时性能。实验评估证明了该方法的有效性,并强调了其在远程物理治疗监督和移动健康应用中的适用性。

更新时间: 2025-12-11 08:56:03

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2512.10437v1

Generalized Kernelized Bandits: A Novel Self-Normalized Bernstein-Like Dimension-Free Inequality and Regret Bounds

We study the regret minimization problem in the novel setting of generalized kernelized bandits (GKBs), where we optimize an unknown function $f^*$ belonging to a reproducing kernel Hilbert space (RKHS), having access to samples generated by an exponential family (EF) reward model whose mean is a non-linear function $\mu(f^*)$. This setting extends both kernelized bandits (KBs) and generalized linear bandits (GLBs), providing a unified view of both settings. We propose an optimistic regret minimization algorithm, GKB-UCB, and we explain why existing self-normalized concentration inequalities used for KBs and GLBs do not allow for tight regret guarantees. For this reason, we devise a novel self-normalized Bernstein-like dimension-free inequality that applies to a Hilbert space of functions with bounded norm, representing a contribution of independent interest. Based on it, we analyze GKB-UCB, deriving a regret bound of order $\widetilde{O}(\gamma_T \sqrt{T/\kappa_*})$, where $T$ is the learning horizon, $\gamma_T$ the maximal information gain, and $\kappa_*$ a term characterizing the magnitude of the expected reward non-linearity. Our result is tight in its dependence on $T$, $\gamma_T$, and $\kappa_*$ for both KBs and GLBs. Finally, we present a tractable version of GKB-UCB, Trac-GKB-UCB, which attains similar regret guarantees, and we discuss its time and space complexity.

Updated: 2025-12-11 08:54:23

标题: 广义核化赌博机:一种新颖的自标准化伯恩斯坦样无维不等式和遗憾界限

摘要: 我们研究了在广义核化赌博机(GKBs)的新颖设置中的遗憾最小化问题,其中我们优化一个属于再生核希尔伯特空间(RKHS)的未知函数$f^*$,并且能够访问由指数族(EF)奖励模型生成的样本,其均值是一个非线性函数$μ(f^*)$。这个设置扩展了核化赌博机(KBs)和广义线性赌博机(GLBs),提供了对这两种设置的统一视角。我们提出了一种乐观的遗憾最小化算法,GKB-UCB,并解释为什么现有用于KBs和GLBs的自标准化集中不等式无法提供紧密的遗憾保证。因此,我们设计了一种新颖的自标准化类似Bernstein不等式,适用于具有有界范数的函数希尔伯特空间,代表了一个独立有趣的贡献。基于此,我们分析了GKB-UCB,推导出一个与$T$的阶数$\widetilde{O}( γ_T \sqrt{T/κ_*})$的遗憾界,其中$T$是学习视界,$γ_T$是最大信息增益,$κ_*$是表征期望奖励非线性程度的项。我们的结果对于KBs和GLBs在对$T$、$γ_T$和$κ_*$的依赖上是紧密的。最后,我们提出了一个可行版本的GKB-UCB,Trac-GKB-UCB,它具有类似的遗憾保证,我们讨论了它的时间和空间复杂度。

更新时间: 2025-12-11 08:54:23

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2508.01681v2

Machine Learning for Quantifier Selection in cvc5

In this work we considerably improve state-of-the-art SMT solving on first-order quantified problems through efficient machine-learning guidance of quantifier selection. Quantifiers represent a significant challenge for SMT and are technically a source of undecidability. In our approach, we train an efficient machine learning model that informs the solver which quantifiers should be instantiated and which should not. Each quantifier may be instantiated multiple times, and the set of active quantifiers changes as the solving progresses. Therefore, we invoke the ML predictor many times during the whole run of the solver. To make this efficient, we use fast ML models based on gradient boosting decision trees. We integrate our approach into the state-of-the-art cvc5 SMT solver and show a considerable increase in the system's holdout-set performance after training it on a large set of first-order problems collected from the Mizar Mathematical Library.
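
In outline, the predictor is an ordinary gradient-boosted classifier over per-quantifier features, queried repeatedly as solving proceeds; the sketch below uses scikit-learn as a stand-in, and the features named are illustrative rather than cvc5's actual ones:

    from sklearn.ensemble import GradientBoostingClassifier

    def train_quantifier_selector(X_train, y_train):
        """X_train: one feature vector per (quantifier, solver-state) pair,
        e.g. term depth, symbol counts, instantiation history; y_train: 1 if
        instantiating that quantifier proved useful, else 0."""
        clf = GradientBoostingClassifier(n_estimators=200, max_depth=4)
        clf.fit(X_train, y_train)
        return clf

    def should_instantiate(clf, features, threshold=0.5):
        # Invoked many times during a run: keep only promising quantifiers.
        return clf.predict_proba([features])[0, 1] >= threshold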

Updated: 2025-12-11 08:53:43

标题: 机器学习在cvc5中的量词选择

摘要: 在这项工作中,我们通过高效的机器学习指导量词选择,显著改进了一阶量化问题的最新SMT求解方法。量词对SMT构成了重大挑战,并且在技术上是不可判定性的根源。在我们的方法中,我们训练了一个高效的机器学习模型,指导求解器哪些量词应该被实例化,哪些不应该。每个量词可能被多次实例化,而活跃量词的集合会随着求解的进展而变化。因此,在求解器的整个运行过程中,我们多次调用ML预测器。为了使这个过程高效,我们使用基于梯度提升决策树的快速机器学习模型。我们将我们的方法集成到最新的cvc5 SMT求解器中,并展示了在Mizar数学库中收集的大量一阶问题上训练后,系统保留集性能显著提高。

更新时间: 2025-12-11 08:53:43

领域: cs.AI,cs.LG,cs.LO

下载: http://arxiv.org/abs/2408.14338v2

SCAN: Semantic Document Layout Analysis for Textual and Visual Retrieval-Augmented Generation

With the increasing adoption of Large Language Models (LLMs) and Vision-Language Models (VLMs), rich document analysis technologies for applications like Retrieval-Augmented Generation (RAG) and visual RAG are gaining significant attention. Recent research indicates that using VLMs yields better RAG performance, but processing rich documents remains a challenge since a single page contains large amounts of information. In this paper, we present SCAN (SemantiC Document Layout ANalysis), a novel approach that enhances both textual and visual Retrieval-Augmented Generation (RAG) systems that work with visually rich documents. It is a VLM-friendly approach that identifies document components with appropriate semantic granularity, balancing context preservation with processing efficiency. SCAN uses a coarse-grained semantic approach that divides documents into coherent regions covering contiguous components. We trained the SCAN model by fine-tuning object detection models on an annotated dataset. Our experimental results across English and Japanese datasets demonstrate that applying SCAN improves end-to-end textual RAG performance by up to 9.4 points and visual RAG performance by up to 10.4 points, outperforming conventional approaches and even commercial document processing solutions.

Updated: 2025-12-11 08:51:09

标题: 扫描:用于文本和视觉检索增强生成的语义文档布局分析

摘要: 随着大型语言模型(LLMs)和视觉语言模型(VLMs)的日益采用,用于检索增强生成(RAG)和视觉RAG等应用的丰富文档分析技术引起了广泛关注。最近的研究表明,使用VLMs可以提高RAG的性能,但处理丰富文档仍然是一个挑战,因为单个页面包含大量信息。在本文中,我们提出了一种名为SCAN(SemantiC Document Layout ANalysis)的新方法,可以增强文本和视觉RAG系统,这些系统可以处理视觉丰富的文档。这是一种适用于VLM的方法,可以识别具有适当语义粒度的文档组件,平衡上下文保留和处理效率。SCAN使用粗粒度语义方法,将文档划分为覆盖连续组件的连贯区域。我们通过在带有注释的数据集上微调对象检测模型来训练SCAN模型。我们在英语和日语数据集上的实验结果表明,应用SCAN可以将端到端文本RAG性能提高最多9.4个点,并将视觉RAG性能提高最多10.4个点,优于传统方法甚至商业文档处理解决方案。

更新时间: 2025-12-11 08:51:09

领域: cs.AI

下载: http://arxiv.org/abs/2505.14381v2

Targeted Data Protection for Diffusion Model by Matching Training Trajectory

Recent advancements in diffusion models have made fine-tuning text-to-image models for personalization increasingly accessible, but have also raised significant concerns regarding unauthorized data usage and privacy infringement. Current protection methods are limited to passively degrading image quality, failing to achieve stable control. While Targeted Data Protection (TDP) offers a promising paradigm for active redirection toward user-specified target concepts, existing TDP attempts suffer from poor controllability due to snapshot-matching approaches that fail to account for complete learning dynamics. We introduce TAFAP (Trajectory Alignment via Fine-tuning with Adversarial Perturbations), the first method to successfully achieve effective TDP by controlling the entire training trajectory. Unlike snapshot-based methods whose protective influence is easily diluted as training progresses, TAFAP employs trajectory-matching inspired by dataset distillation to enforce persistent, verifiable transformations throughout fine-tuning. We validate our method through extensive experiments, demonstrating the first successful targeted transformation in diffusion models with simultaneous control over both identity and visual patterns. TAFAP significantly outperforms existing TDP attempts, achieving robust redirection toward target concepts while maintaining high image quality. This work enables verifiable safeguards and provides a new framework for controlling and tracing alterations in diffusion model outputs.

Updated: 2025-12-11 08:47:41

标题: 匹配训练轨迹实现扩散模型的定向数据保护

摘要: 最近在扩散模型方面取得的进展使得对文本到图像模型进行个性化微调变得越来越容易,但也引发了关于未经授权数据使用和侵犯隐私的重大关注。目前的保护方法仅限于 passively 降低图像质量,无法实现稳定的控制。虽然目标数据保护(TDP)提供了一种向用户指定目标概念进行积极重定向的有前途的范式,但现有的 TDP 尝试由于采用快照匹配方法导致控制能力不足,无法考虑完整的学习动态。我们介绍了 TAFAP(通过对抗性扰动微调实现轨迹对齐),这是第一种成功实现有效 TDP 的方法,通过控制整个训练轨迹。与基于快照的方法不同,随着训练的进行,其保护作用容易被稀释,TAFAP 利用受数据集提炼启发的轨迹匹配,强制在整个微调过程中实施持续可验证的转换。我们通过大量实验验证了我们的方法,展示了在扩散模型中首次成功实现有目标的转换,同时控制身份和视觉模式。TAFAP 明显优于现有的 TDP 尝试,实现了强大的向目标概念的重定向,同时保持高图像质量。这项工作实现了可验证的保护措施,并为控制和跟踪扩散模型输出的变化提供了一个新框架。

更新时间: 2025-12-11 08:47:41

领域: cs.AI

下载: http://arxiv.org/abs/2512.10433v1

CogMCTS: A Novel Cognitive-Guided Monte Carlo Tree Search Framework for Iterative Heuristic Evolution with Large Language Models

Automatic Heuristic Design (AHD) is an effective framework for solving complex optimization problems. The development of large language models (LLMs) enables the automated generation of heuristics. Existing LLM-based evolutionary methods rely on population strategies and are prone to local optima. Integrating LLMs with Monte Carlo Tree Search (MCTS) improves the trade-off between exploration and exploitation, but multi-round cognitive integration remains limited and search diversity is constrained. To overcome these limitations, this paper proposes a novel cognitive-guided MCTS framework (CogMCTS). CogMCTS tightly integrates the cognitive guidance mechanism of LLMs with MCTS to achieve efficient automated heuristic optimization. The framework employs multi-round cognitive feedback to incorporate historical experience, node information, and negative outcomes, dynamically improving heuristic generation. Dual-track node expansion combined with elite heuristic management balances the exploration of diverse heuristics and the exploitation of high-quality experience. In addition, strategic mutation modifies the heuristic forms and parameters to further enhance the diversity of the solution and the overall optimization performance. The experimental results indicate that CogMCTS outperforms existing LLM-based AHD methods in stability, efficiency, and solution quality.

Updated: 2025-12-11 08:46:55

标题: CogMCTS:一种新颖的认知引导的蒙特卡洛树搜索框架,用于具有大型语言模型的迭代启发式演化

摘要: 自动启发式设计(AHD)是解决复杂优化问题的有效框架。大型语言模型(LLMs)的发展使得自动生成启发式成为可能。现有基于LLM的进化方法依赖于种群策略,并容易陷入局部最优解。将LLMs与蒙特卡洛树搜索(MCTS)结合可以改善探索和开发之间的平衡,但多轮认知整合仍受限制,搜索多样性受到限制。为了克服这些局限性,本文提出了一种新颖的认知引导MCTS框架(CogMCTS)。CogMCTS将LLMs的认知引导机制与MCTS紧密结合,实现高效的自动启发式优化。该框架利用多轮认知反馈来整合历史经验、节点信息和负面结果,动态改进启发式生成。双轨节点扩展结合精英启发式管理平衡了多样启发式的探索和高质量经验的开发。此外,战略变异修改启发式形式和参数以进一步提高解决方案的多样性和整体优化性能。实验结果表明,CogMCTS在稳定性、效率和解决方案质量方面优于现有基于LLM的AHD方法。

更新时间: 2025-12-11 08:46:55

领域: cs.AI

下载: http://arxiv.org/abs/2512.08609v2

Executable Epistemology: The Structured Cognitive Loop as an Architecture of Intentional Understanding

Large language models exhibit intelligence without genuine epistemic understanding, exposing a key gap: the absence of epistemic architecture. This paper introduces the Structured Cognitive Loop (SCL) as an executable epistemological framework for emergent intelligence. Unlike traditional AI research asking "what is intelligence?" (ontological), SCL asks "under what conditions does cognition emerge?" (epistemological). Grounded in philosophy of mind and cognitive phenomenology, SCL bridges conceptual philosophy and implementable cognition. Drawing on process philosophy, enactive cognition, and extended mind theory, we define intelligence not as a property but as a performed process -- a continuous loop of judgment, memory, control, action, and regulation. SCL makes three contributions. First, it operationalizes philosophical insights into computationally interpretable structures, enabling "executable epistemology" -- philosophy as structural experiment. Second, it shows that functional separation within cognitive architecture yields more coherent and interpretable behavior than monolithic prompt based systems, supported by agent evaluations. Third, it redefines intelligence: not representational accuracy but the capacity to reconstruct its own epistemic state through intentional understanding. This framework impacts philosophy of mind, epistemology, and AI. For philosophy, it allows theories of cognition to be enacted and tested. For AI, it grounds behavior in epistemic structure rather than statistical regularity. For epistemology, it frames knowledge not as truth possession but as continuous reconstruction within a phenomenologically coherent loop. We situate SCL within debates on cognitive phenomenology, emergence, normativity, and intentionality, arguing that real progress requires not larger models but architectures that realize cognitive principles structurally.

Updated: 2025-12-11 08:40:26

标题: 可执行认识论:结构化认知循环作为有意义理解的架构

摘要: 大型语言模型展示出智能,但缺乏真正的认识理解,暴露了一个关键差距:缺乏认知结构。本文介绍了结构化认知闭环(SCL)作为一种可执行的认识论框架,用于新兴智能。与传统人工智能研究问“智能是什么?”(本体论)不同,SCL问“在什么条件下认知会出现?”(认识论)。基于心灵哲学和认知现象学,SCL连接了概念哲学和实施认知。借鉴过程哲学、行动认知和扩展心理理论,我们将智能定义为一种执行过程而不是一种属性 -- 判断、记忆、控制、行动和调节的连续循环。SCL做出了三点贡献。首先,它将哲学洞见操作化为可计算的结构,实现了“可执行的认识论” -- 哲学作为结构性实验。其次,它表明认知结构内的功能分离比单一提示系统产生更一致和可解释的行为,得到了代理评估的支持。第三,它重新定义了智能:不是表征准确性,而是通过有意义的理解重建自身认识状态的能力。这个框架影响了心灵哲学、认识论和人工智能。对于哲学来说,它使认知理论得以实施和测试。对于人工智能来说,它将行为基于认知结构而不是统计规律。对于认识论来说,它将知识框定为不是真理的拥有,而是在一个现象学上连贯的循环中持续重建。我们将SCL置于有关认知现象学、出现、规范性和有意向性的争论中,认为真正的进步不是需要更大的模型,而是需要实现认知原则的结构化架构。

更新时间: 2025-12-11 08:40:26

领域: cs.AI

下载: http://arxiv.org/abs/2510.15952v4

Representation of the structure of graphs by sequences of instructions

The representation of graphs is commonly based on the adjacency matrix concept. This formulation is the foundation of most algebraic and computational approaches to graph processing. The advent of deep learning language models offers a wide range of powerful computational models that are specialized in the processing of text. However, current procedures to represent graphs are not amenable to processing by these models. In this work, a new method to represent graphs is proposed. It represents the adjacency matrix of a graph by a string of simple instructions. The instructions build the adjacency matrix step by step. The transformation is reversible, i.e. given a graph the string can be produced and vice versa. The proposed representation is compact and it maintains the local structural patterns of the graph. Therefore, it is envisaged that it could be useful to boost the processing of graphs by deep learning models. A tentative computational experiment is reported, with favorable results.
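
Since the abstract specifies only that the instruction string builds the adjacency matrix step by step and that the mapping is reversible, here is a minimal sketch of one such encoding. The two-token vocabulary ("N" adds a node, "Ek" connects the newest node to earlier node k) is our own illustrative choice, not the paper's instruction set.

    def encode(adj):
        """Turn a symmetric 0/1 adjacency matrix into an instruction string."""
        tokens = []
        for i in range(len(adj)):
            tokens.append("N")                 # introduce node i
            for j in range(i):                 # only look back: keeps it local
                if adj[i][j]:
                    tokens.append(f"E{j}")     # connect node i to earlier node j
        return " ".join(tokens)

    def decode(program):
        """Rebuild the adjacency matrix step by step from the instructions."""
        adj, current = [], -1
        for tok in program.split():
            if tok == "N":
                current += 1
                for row in adj:
                    row.append(0)
                adj.append([0] * (current + 1))
            else:                              # "Ek": edge between current and k
                k = int(tok[1:])
                adj[current][k] = adj[k][current] = 1
        return adj

    triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
    prog = encode(triangle)                    # "N N E0 N E0 E1"
    assert decode(prog) == triangle            # the mapping is reversible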

Updated: 2025-12-11 08:40:06

标题: 用指令序列表示图的结构

摘要: 图的表示通常基于邻接矩阵概念。这种公式是大多数代数和计算方法处理图形的基础。深度学习语言模型的出现提供了一系列强大的计算模型,专门用于处理文本。然而,目前用于表示图形的程序不适合这些模型的处理。在这项工作中,提出了一种表示图形的新方法。它通过一系列简单指令来表示图的邻接矩阵。这些指令逐步构建邻接矩阵。转换是可逆的,即给定一个图形,可以生成字符串,反之亦然。所提出的表示是紧凑的,并保持图的局部结构模式。因此,预计这种表示方式可以提高深度学习模型处理图形的效率。报告了一项初步的计算实验,结果良好。

更新时间: 2025-12-11 08:40:06

领域: cs.AI

下载: http://arxiv.org/abs/2512.10429v1

The Operator Origins of Neural Scaling Laws: A Generalized Spectral Transport Dynamics of Deep Learning

Modern deep networks operate in a rough, finite-regularity regime where Jacobian-induced operators exhibit heavy-tailed spectra and strong basis drift. In this work, we derive a unified operator-theoretic description of neural training dynamics directly from gradient descent. Starting from the exact evolution $\dot e_t = -M(t)e_t$ in function space, we apply Kato perturbation theory to obtain a rigorous system of coupled mode ODEs and show that, after coarse-graining, these dynamics converge to a spectral transport--dissipation PDE \[ \partial_t g + \partial_λ(v g) = -λg + S, \] where $v$ captures eigenbasis drift and $S$ encodes nonlocal spectral coupling. We prove that neural training preserves functional regularity, forcing the drift to take an asymptotic power-law form $v(λ,t)\sim -c(t)λ^b$. In the weak-coupling regime -- naturally induced by spectral locality and SGD noise -- the PDE admits self-similar solutions with a resolution frontier, polynomial amplitude growth, and power-law dissipation. This structure yields explicit scaling-law exponents, explains the geometry of double descent, and shows that the effective training time satisfies $τ(t)=t^αL(t)$ for slowly varying $L$. Finally, we show that NTK training and feature learning arise as two limits of the same PDE: $v\equiv 0$ recovers lazy dynamics, while $v\neq 0$ produces representation drift. Our results provide a unified spectral framework connecting operator geometry, optimization dynamics, and the universal scaling behavior of modern deep networks.

Updated: 2025-12-11 08:38:46

标题: 神经缩放定律的算子起源:深度学习的广义谱传输动力学

摘要: 现代深度网络在一个粗糙的有限正则性范围内运行,其中雅可比引起的算子表现出重尾谱和强基漂移。在这项工作中,我们直接从梯度下降中推导出神经训练动力学的统一算子理论描述。从精确的函数空间演化$\dot e_t = -M(t)e_t$开始,我们应用加藤摄动理论得到一组严格耦合模式ODE系统,并展示,在粗粒化后,这些动态收敛到一个谱传输-耗散PDE \[ \partial_t g + \partial_λ(v g) = -λg + S, \] 其中$v$捕捉特征基漂移,$S$编码非局部谱耦合。 我们证明神经训练保持函数正则性,迫使漂移采取渐近幂律形式$v(λ,t)\sim -c(t)λ^b$。在弱耦合区域——由谱局部性和SGD噪声自然诱导——PDE允许具有分辨率边界、多项式振幅增长和幂律耗散的自相似解。这种结构提供了显式的标度定律指数,解释了双下降(double descent)的几何结构,并展示了有效训练时间满足$τ(t)=t^αL(t)$,其中$L$缓慢变化。 最后,我们展示NTK训练和特征学习出现为同一PDE的两个极限:$v\equiv 0$恢复懒惰动态,而$v\neq 0$产生表示漂移。我们的结果提供了一个统一的谱框架,连接了算子几何、优化动力学和现代深度网络的通用标度行为。

更新时间: 2025-12-11 08:38:46

领域: cs.LG

下载: http://arxiv.org/abs/2512.10427v1

Differential Privacy for Secure Machine Learning in Healthcare IoT-Cloud Systems

Healthcare has become exceptionally sophisticated, as wearables and connected medical devices are revolutionising remote patient monitoring, emergency response, medication management, diagnosis, and predictive and prescriptive analytics. Internet of Things and Cloud computing integrated systems (IoT-Cloud) facilitate sensing, automation, and processing for these healthcare applications. While real-time response is crucial for alleviating patient emergencies, protecting patient privacy is extremely important in data-driven healthcare. In this paper, we propose a multi-layer IoT, Edge and Cloud architecture to enhance the speed of response for emergency healthcare by distributing tasks based on response criticality and permanence of storage. Privacy of patient data is assured by proposing a Differential Privacy framework across several machine learning models such as K-means, Logistic Regression, Random Forest and Naive Bayes. We establish a comprehensive threat model identifying three adversary classes and evaluate Laplace, Gaussian, and hybrid noise mechanisms across varying privacy budgets, with supervised algorithms achieving up to 86% accuracy. The proposed hybrid Laplace-Gaussian noise mechanism with adaptive budget allocation provides a balanced approach, offering moderate tails and better privacy-utility trade-offs for both low and high dimension datasets. At the practical threshold of $\varepsilon = 5.0$, supervised algorithms achieve 82-84% accuracy while reducing attribute inference attacks by up to 18% and data reconstruction correlation by 70%. Blockchain security further ensures trusted communication through time-stamping, traceability, and immutability for analytics applications. Edge computing demonstrates 8$\times$ latency reduction for emergency scenarios, validating the hierarchical architecture for time-critical operations.
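
To make the mechanism comparison concrete, here is a minimal sketch of Laplace, Gaussian, and a naive Laplace-Gaussian hybrid for a sensitivity-1 statistic. The even epsilon split is a placeholder for the paper's adaptive budget allocation, and the classic Gaussian calibration used here is only valid for eps < 1, so the demo runs at a total eps of 1 rather than the paper's threshold of 5.

    import numpy as np

    def laplace_mech(value, sensitivity, eps, rng):
        # (eps, 0)-DP: heavier tails, pure DP guarantee.
        return value + rng.laplace(scale=sensitivity / eps)

    def gaussian_mech(value, sensitivity, eps, delta, rng):
        # Classic (eps, delta)-DP calibration, valid for eps in (0, 1).
        sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
        return value + rng.normal(scale=sigma)

    def hybrid_mech(value, sensitivity, eps, delta, rng):
        # Sequential composition: Laplace at eps/2 plus Gaussian at (eps/2, delta)
        # yields an (eps, delta)-DP release overall. A real system would allocate
        # the budget adaptively rather than splitting it evenly.
        return gaussian_mech(
            laplace_mech(value, sensitivity, eps / 2.0, rng),
            sensitivity, eps / 2.0, delta, rng)

    rng = np.random.default_rng(0)
    true_mean = 0.7          # e.g. a bounded, rescaled vital-sign statistic
    print(hybrid_mech(true_mean, sensitivity=1.0, eps=1.0, delta=1e-5, rng=rng))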

Updated: 2025-12-11 08:37:37

标题: 医疗物联网-云系统中的安全机器学习差分隐私

摘要: 医疗保健变得异常复杂,穿戴设备和连接的医疗设备正在改变远程患者监测、急救响应、药物管理、诊断以及预测和处方分析。物联网和云计算集成系统(IoT-Cloud)促进了这些医疗应用的感知、自动化和处理。虽然实时响应对于缓解患者紧急情况至关重要,但在数据驱动的医疗保健中保护患者隐私至关重要。在本文中,我们提出了一个多层次的物联网、边缘和云架构,通过根据响应紧迫性和存储持久性来分配任务,提高应急医疗保健的响应速度。通过提出跨多个机器学习模型(如K-means、Logistic Regression、Random Forest和Naive Bayes)的差分隐私框架,保证了患者数据的隐私。我们建立了一个全面的威胁模型,确定了三个对手类别,并评估了在不同隐私预算下,Laplace、Gaussian和混合噪声机制,监督算法可以实现高达86%的准确性。提出的混合Laplace-Gaussian噪声机制具有自适应预算分配,提供了一种平衡的方法,为低维和高维数据集提供了适度的尾部和更好的隐私效用权衡。在隐私预算为 $\varepsilon = 5.0$ 的实际阈值下,监督算法实现了82-84%的准确性,同时将属性推断攻击减少了高达18%,数据重构相关性减少了70%。区块链安全进一步确保了经过时间戳、可追溯性和不可变性的受信任通信,用于分析应用。边缘计算在紧急情况下展示了8倍的延迟减少,验证了用于时间关键操作的分层架构。

更新时间: 2025-12-11 08:37:37

领域: cs.CR,cs.DC

下载: http://arxiv.org/abs/2512.10426v1

Cooperative Retrieval-Augmented Generation for Question Answering: Mutual Information Exchange and Ranking by Contrasting Layers

Since large language models (LLMs) have a tendency to generate factually inaccurate output, retrieval-augmented generation (RAG) has gained significant attention as a key means to mitigate this downside of harnessing only LLMs. However, existing RAG methods for simple and multi-hop question answering (QA) are still prone to incorrect retrievals and hallucinations. To address these limitations, we propose CoopRAG, a novel RAG framework for the question answering task in which a retriever and an LLM work cooperatively with each other by exchanging informative knowledge, and the earlier and later layers of the retriever model work cooperatively with each other to accurately rank the retrieved documents relevant to a given query. In this framework, we (i) unroll a question into sub-questions and a reasoning chain in which uncertain positions are masked, (ii) retrieve the documents relevant to the question augmented with the sub-questions and the reasoning chain, (iii) rerank the documents by contrasting layers of the retriever, and (iv) reconstruct the reasoning chain by filling the masked positions via the LLM. Our experiments demonstrate that CoopRAG consistently outperforms state-of-the-art QA methods on three multi-hop QA datasets as well as a simple QA dataset in terms of both the retrieval and QA performances. Our code is available at https://github.com/meaningful96/CoopRAG.

Updated: 2025-12-11 08:35:17

标题: 合作检索增强生成用于问答:通过对比层的互信息交换和排名

摘要: 由于大型语言模型(LLMs)有生成事实不准确输出的倾向,检索增强生成(RAG)已经引起了重要关注,作为减轻仅使用LLMs这一弊端的关键手段。然而,现有的简单和多跳问题回答(QA)的RAG方法仍然容易出现不正确的检索和幻觉。为了解决这些限制,我们提出了CoopRAG,这是一个新颖的RAG框架,用于问答任务,在这个框架中,一个检索器和一个LLM通过交换信息知识共同合作工作,检索器模型的早期和后期层次共同合作,以准确对与给定查询相关的检索文档进行排名。在这个框架中,我们(i)将一个问题展开成子问题和一个推理链,在其中不确定的位置被屏蔽,(ii)检索与问题相关的文档,增加了子问题和推理链,(iii)通过对比检索器的层次对文档进行重新排名,(iv)通过LLM填充掩码位置重构推理链。我们的实验表明,CoopRAG在三个多跳QA数据集以及一个简单QA数据集的检索和QA表现方面始终优于最先进的QA方法。我们的代码可供使用。

更新时间: 2025-12-11 08:35:17

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2512.10422v1

Beyond Classification Accuracy: Neural-MedBench and the Need for Deeper Reasoning Benchmarks

Recent advances in vision-language models (VLMs) have achieved remarkable performance on standard medical benchmarks, yet their true clinical reasoning ability remains unclear. Existing datasets predominantly emphasize classification accuracy, creating an evaluation illusion in which models appear proficient while still failing at high-stakes diagnostic reasoning. We introduce Neural-MedBench, a compact yet reasoning-intensive benchmark specifically designed to probe the limits of multimodal clinical reasoning in neurology. Neural-MedBench integrates multi-sequence MRI scans, structured electronic health records, and clinical notes, and encompasses three core task families: differential diagnosis, lesion recognition, and rationale generation. To ensure reliable evaluation, we develop a hybrid scoring pipeline that combines LLM-based graders, clinician validation, and semantic similarity metrics. Through systematic evaluation of state-of-the-art VLMs, including GPT-4o, Claude-4, and MedGemma, we observe a sharp performance drop compared to conventional datasets. Error analysis shows that reasoning failures, rather than perceptual errors, dominate model shortcomings. Our findings highlight the necessity of a Two-Axis Evaluation Framework: breadth-oriented large datasets for statistical generalization, and depth-oriented, compact benchmarks such as Neural-MedBench for reasoning fidelity. We release Neural-MedBench at https://neuromedbench.github.io/ as an open and extensible diagnostic testbed, which guides the expansion of future benchmarks and enables rigorous yet cost-effective assessment of clinically trustworthy AI.

Updated: 2025-12-11 08:31:33

标题: 超越分类准确性:神经-医学基准和对更深层推理基准的需求

摘要: 最近在视觉-语言模型(VLMs)方面取得了显著的进展,在标准医学基准测试中取得了出色的表现,然而它们真正的临床推理能力仍然不清楚。现有数据集主要强调分类准确性,导致了一种评估幻觉,即模型看起来熟练,但仍然在高风险诊断推理方面失败。我们介绍了神经-MedBench,这是一个紧凑但推理密集的基准测试,专门设计用于探索神经学中多模式临床推理的极限。神经-MedBench整合了多序列MRI扫描,结构化的电子健康记录和临床笔记,并包含三个核心任务系列:不同诊断,病变识别和理由生成。为了确保可靠的评估,我们开发了一个混合评分管道,结合了基于LLM的评分员,临床验证和语义相似性指标。通过对最先进的VLMs进行系统评估,包括GPT-4o、Claude-4和MedGemma,我们观察到与传统数据集相比,性能急剧下降。错误分析显示,推理失败而不是感知错误主导了模型的缺陷。我们的研究结果突出了两轴评估框架的必要性:面向广度的大型数据集用于统计概括,以及面向深度的紧凑基准测试,如神经-MedBench,用于推理的忠实度。我们将神经-MedBench发布在https://neuromedbench.github.io/,作为一个开放和可扩展的诊断测试平台,指导未来基准测试的扩展,并实现临床可信AI的严格而经济有效的评估。

更新时间: 2025-12-11 08:31:33

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2509.22258v2

Beyond Endpoints: Path-Centric Reasoning for Vectorized Off-Road Network Extraction

Deep learning has advanced vectorized road extraction in urban settings, yet off-road environments remain underexplored and challenging. A significant domain gap causes advanced models to fail in wild terrains due to two key issues: lack of large-scale vectorized datasets and structural weakness in prevailing methods. Models such as SAM-Road employ a node-centric paradigm that reasons at sparse endpoints, making them fragile to occlusions and ambiguous junctions in off-road scenes, leading to topological errors. This work addresses these limitations in two complementary ways. First, we release WildRoad, a global off-road road network dataset constructed efficiently with a dedicated interactive annotation tool tailored for road-network labeling. Second, we introduce MaGRoad (Mask-aware Geodesic Road network extractor), a path-centric framework that aggregates multi-scale visual evidence along candidate paths to infer connectivity robustly. Extensive experiments show that MaGRoad achieves state-of-the-art performance on our challenging WildRoad benchmark while generalizing well to urban datasets. A streamlined pipeline also yields roughly 2.5x faster inference, improving practical applicability. Together, the dataset and path-centric paradigm provide a stronger foundation for mapping roads in the wild.

Updated: 2025-12-11 08:29:27

标题: 超越端点:基于路径的向量化越野网络提取推理

摘要: 深度学习已经在城市环境中推进了向量化道路提取,但越野环境仍然是未被充分探索且具有挑战性的。一个重要的领域差距导致先进模型在野外地形中失败,这是由于两个关键问题:缺乏大规模向量化数据集和主流方法中的结构弱点。像SAM-Road这样的模型采用了一个以节点为中心的范式,在稀疏端点处推理,使其对遮挡和模糊的路口在越野场景中脆弱,导致了拓扑错误。这项工作以两种互补的方式解决了这些限制。首先,我们发布了WildRoad,一个高效构建的全球越野道路网络数据集,配备了专门定制的交互式标注工具,用于道路网络标注。其次,我们介绍了MaGRoad(Mask-aware Geodesic Road network extractor),这是一个以路径为中心的框架,沿着候选路径聚合多尺度视觉证据,以稳健地推断连接性。大量实验表明,MaGRoad在我们具有挑战性的WildRoad基准测试中取得了最先进的性能,同时很好地泛化到城市数据集。一个简化的流水线还使推断速度提高了约2.5倍,提高了实际应用性。总的来说,数据集和以路径为中心的范式为在野外地图上绘制道路提供了更加坚实的基础。

更新时间: 2025-12-11 08:29:27

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2512.10416v1

How to Trick Your AI TA: A Systematic Study of Academic Jailbreaking in LLM Code Evaluation

The use of Large Language Models (LLMs) as automatic judges for code evaluation is becoming increasingly prevalent in academic environments. But their reliability can be compromised by students who may employ adversarial prompting strategies in order to induce misgrading and secure undeserved academic advantages. In this paper, we present the first large-scale study of jailbreaking LLM-based automated code evaluators in academic context. Our contributions are: (i) We systematically adapt 20+ jailbreaking strategies for jailbreaking AI code evaluators in the academic context, defining a new class of attacks termed academic jailbreaking. (ii) We release a poisoned dataset of 25K adversarial student submissions, specifically designed for the academic code-evaluation setting, sourced from diverse real-world coursework and paired with rubrics and human-graded references, and (iii) In order to capture the multidimensional impact of academic jailbreaking, we systematically adapt and define three jailbreaking metrics (Jailbreak Success Rate, Score Inflation, and Harmfulness). (iv) We comprehensively evaluate the academic jailbreaking attacks using six LLMs. We find that these models exhibit significant vulnerability, particularly to persuasive and role-play-based attacks (up to 97% JSR). Our adversarial dataset and benchmark suite lay the groundwork for next-generation robust LLM-based evaluators in academic code assessment.

Updated: 2025-12-11 08:28:33

标题: 如何愚弄您的AI助教:关于LLM代码评估中学术越狱的系统研究

摘要: 大型语言模型(LLMs)作为代码评估的自动评判工具在学术环境中越来越普遍。但是它们的可靠性可能会受到学生的影响,学生可能会采用敌对提示策略,以诱使错误评分并获得不当的学术优势。在本文中,我们首次进行了关于在学术环境中越狱LLM自动代码评估器的大规模研究。我们的贡献包括:(i)我们系统地适应了20多种越狱策略,用于在学术环境中越狱人工智能代码评估器,定义了一种新的攻击类别,称为学术越狱。(ii)我们发布了一个包含25K恶意学生提交的毒化数据集,专门设计用于学术代码评估设置,从各种真实课程作业中获取,并配有评分标准和人工评分参考,(iii)为了捕捉学术越狱的多维影响,我们系统地适应和定义了三个越狱指标(越狱成功率、分数膨胀和有害性)。(iv)我们使用六种LLMs全面评估了学术越狱攻击。我们发现这些模型存在显著的漏洞,特别是对于有说服力和角色扮演为基础的攻击(高达97%的JSR)。我们的敌对数据集和基准套件为下一代强大的基于LLM的学术代码评估器奠定了基础。

更新时间: 2025-12-11 08:28:33

领域: cs.SE,cs.AI

下载: http://arxiv.org/abs/2512.10415v1

Boosting RL-Based Visual Reasoning with Selective Adversarial Entropy Intervention

Recently, reinforcement learning (RL) has become a common choice in enhancing the reasoning capabilities of vision-language models (VLMs). Considering existing RL-based finetuning methods, entropy intervention turns out to be an effective way to benefit exploratory ability, thereby improving policy performance. Notably, most existing studies intervene in entropy by simply controlling the update of specific tokens during policy optimization of RL. They ignore the entropy intervention during the RL sampling that can boost the performance of GRPO by improving the diversity of responses. In this paper, we propose Selective-adversarial Entropy Intervention, namely SaEI, which enhances policy entropy by distorting the visual input with the token-selective adversarial objective coming from the entropy of sampled responses. Specifically, we first propose entropy-guided adversarial sampling (EgAS) that formulates the entropy of sampled responses as an adversarial objective. Then, the corresponding adversarial gradient can be used to attack the visual input for producing adversarial samples, allowing the policy model to explore a larger answer space during RL sampling. Then, we propose token-selective entropy computation (TsEC) to maximize the effectiveness of adversarial attack in EgAS without distorting factual knowledge within VLMs. Extensive experiments on both in-domain and out-of-domain datasets show that our proposed method can greatly improve policy exploration via entropy intervention, to boost reasoning capabilities. Code will be released once the paper is accepted.

Updated: 2025-12-11 08:27:02

标题: 用选择性对抗熵干预提升基于强化学习的视觉推理

摘要: 最近,强化学习(RL)已经成为提升视觉语言模型(VLMs)推理能力的常见选择。考虑到现有基于RL的微调方法,熵干预被证明是一种有效的方法,可以提高探索能力,从而改善策略性能。值得注意的是,大多数现有研究在RL的策略优化过程中通过简单控制特定标记的更新来干预熵。他们忽略了在RL采样过程中干预熵的作用,这可以通过提高响应的多样性来提高GRPO的性能。在本文中,我们提出了选择性对抗性熵干预,即SaEI,通过使用来自采样响应熵的标记选择对抗性目标来扭曲视觉输入,从而增强策略熵。具体来说,我们首先提出了熵引导的对抗采样(EgAS),将采样响应的熵形式化为对抗目标。然后,对应的对抗梯度可用于攻击视觉输入,生成对抗样本,使策略模型能够在RL采样过程中探索更大的答案空间。接着,我们提出了标记选择性熵计算(TsEC),以最大化EgAS中对抗攻击的有效性,而不扭曲VLMs中的事实知识。对领域内和领域外数据集的广泛实验表明,我们提出的方法可以通过熵干预大大改善策略探索,从而提升推理能力。一旦论文被接受,代码将会发布。

更新时间: 2025-12-11 08:27:02

领域: cs.AI

下载: http://arxiv.org/abs/2512.10414v1

BioMedGPT-Mol: Multi-task Learning for Molecular Understanding and Generation

Molecules play a crucial role in biomedical research and discovery, particularly in the field of small molecule drug development. Given the rapid advancements in large language models, especially the recent emergence of reasoning models, it is natural to explore how a general-purpose language model can be efficiently adapted for molecular science applications. In this work, we introduce BioMedGPT-Mol, a molecular language model designed to support molecular understanding and generation tasks. By curating and unifying existing public instruction datasets, we have assembled a large-scale, comprehensive, and high-quality training dataset. The model is then fine-tuned through a meticulously designed multi-task learning framework. On a consolidated benchmark derived from LlaSMol, TOMG-Bench, and MuMOInstruct, BioMedGPT-Mol achieves remarkable performance. Our experimental results demonstrate that a general-purpose reasoning model can be effectively and efficiently post-trained into a professional molecular language model through a well-structured multi-task curriculum. Leveraging these capabilities, we further apply the model to multi-step retrosynthetic planning, achieving state-of-the-art performance on RetroBench and demonstrating its superior efficacy as an end-to-end retrosynthetic planner. We anticipate that our approach can be extended to other biomedical scientific domains.

Updated: 2025-12-11 08:24:32

标题: BioMedGPT-Mol:分子理解与生成的多任务学习

摘要: 分子在生物医学研究和发现中起着至关重要的作用,尤其是在小分子药物开发领域。鉴于大型语言模型的快速进展,尤其是最近出现的推理模型,自然而然地探索了如何有效地将通用语言模型应用于分子科学应用。在这项工作中,我们介绍了 BioMedGPT-Mol,这是一个旨在支持分子理解和生成任务的分子语言模型。通过整理和统一现有的公共指导数据集,我们已经组建了一个大规模、全面且高质量的训练数据集。然后,通过精心设计的多任务学习框架对模型进行微调。在从 LlaSMol、TOMG-Bench 和 MuMOInstruct 汇总的基准测试中,BioMedGPT-Mol 实现了卓越的性能。我们的实验结果表明,通过一个结构良好的多任务课程,通用推理模型可以有效且高效地被后训练成为专业的分子语言模型。利用这些能力,我们进一步将该模型应用于多步逆合成规划,在 RetroBench 上达到了最新性能,并展示了其作为端到端逆合成规划器的卓越有效性。我们预计我们的方法可以扩展到其他生物医学科学领域。

更新时间: 2025-12-11 08:24:32

领域: cs.AI

下载: http://arxiv.org/abs/2512.04629v2

Sliding Window Attention Adaptation

The self-attention mechanism in Transformer-based Large Language Models (LLMs) scales quadratically with input length, making long-context inference expensive. Sliding window attention (SWA) reduces this cost to linear complexity, but naively enabling complete SWA at inference-time for models pretrained with full attention (FA) causes severe long-context performance degradation due to training-inference mismatch. This makes us wonder: Can FA-pretrained LLMs be well adapted to SWA without pretraining? We investigate this by proposing Sliding Window Attention Adaptation (SWAA), a set of practical recipes that combine five methods for better adaptation: (1) applying SWA only during prefilling; (2) preserving "sink" tokens; (3) interleaving FA/SWA layers; (4) chain-of-thought (CoT); and (5) fine-tuning. Our experiments show that SWA adaptation is feasible while non-trivial: no single method suffices, yet specific synergistic combinations effectively recover the original long-context performance. We further analyze the performance-efficiency trade-offs of different SWAA configurations and provide recommended recipes for diverse scenarios. Our code is available at https://github.com/yuyijiong/sliding-window-attention-adaptation
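
Recipe details aside, the core objects here are just attention masks. Below is a minimal sketch, assuming a standard boolean mask convention (True = may attend), of plain causal full attention, a causal sliding window, the window with preserved sink tokens (recipe 2), and a per-layer FA/SWA interleaving (recipe 3); it is illustrative and not the repository's code.

    import torch

    def swa_mask(seq_len, window, n_sinks=0):
        # True = may attend. Causal window of size `window`, plus an
        # always-visible prefix of `n_sinks` "sink" tokens.
        i = torch.arange(seq_len).unsqueeze(1)   # query positions
        j = torch.arange(seq_len).unsqueeze(0)   # key positions
        causal = j <= i
        in_window = (i - j) < window
        is_sink = j < n_sinks
        return causal & (in_window | is_sink)

    full_mask = swa_mask(8, window=8)                 # reduces to plain causal FA
    swa_only = swa_mask(8, window=3)                  # naive SWA everywhere
    swa_sink = swa_mask(8, window=3, n_sinks=1)       # SWA + preserved sink token

    # Interleaving FA and SWA layers is then just a per-layer choice:
    layer_masks = [full_mask if k % 4 == 0 else swa_sink for k in range(12)]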

Updated: 2025-12-11 08:21:24

标题: 滑动窗口注意力自适应

摘要: 基于Transformer的大型语言模型(LLMs)中的自注意机制随着输入长度呈二次增长,使得长上下文推理变得昂贵。滑动窗口注意力(SWA)将这一成本降低到线性复杂度,但是在推理时简单启用完全SWA对于使用全注意力(FA)预训练的模型会导致严重的长上下文性能下降,这是由于训练-推理不匹配引起的。这让我们思考:FA预训练的LLMs是否可以在不进行预训练的情况下很好地适应SWA?我们通过提出Sliding Window Attention Adaptation(SWAA)进行研究,这是一组实用的配方,结合了五种方法以更好地适应:(1)仅在预填充期间应用SWA;(2)保留“sink”标记;(3)交错FA/SWA层;(4)思维链(CoT);和(5)微调。我们的实验表明,SWA适应是可行的但并非简单的过程:没有单一方法足够,然而特定的协同组合有效地恢复了原始的长上下文性能。我们进一步分析了不同SWAA配置的性能效率权衡,并为不同场景提供了推荐的配方。我们的代码可以在https://github.com/yuyijiong/sliding-window-attention-adaptation上找到。

更新时间: 2025-12-11 08:21:24

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2512.10411v1

Video4Spatial: Towards Visuospatial Intelligence with Context-Guided Video Generation

We investigate whether video generative models can exhibit visuospatial intelligence, a capability central to human cognition, using only visual data. To this end, we present Video4Spatial, a framework showing that video diffusion models conditioned solely on video-based scene context can perform complex spatial tasks. We validate on two tasks: scene navigation - following camera-pose instructions while remaining consistent with 3D geometry of the scene, and object grounding - which requires semantic localization, instruction following, and planning. Both tasks use video-only inputs, without auxiliary modalities such as depth or poses. With simple yet effective design choices in the framework and data curation, Video4Spatial demonstrates strong spatial understanding from video context: it plans navigation and grounds target objects end-to-end, follows camera-pose instructions while maintaining spatial consistency, and generalizes to long contexts and out-of-domain environments. Taken together, these results advance video generative models toward general visuospatial reasoning.

Updated: 2025-12-11 08:18:10

标题: Video4Spatial:基于上下文引导的视频生成实现视觉空间智能

摘要: 我们仅使用视觉数据,研究视频生成模型是否能够表现出视觉空间智能,这是人类认知的核心能力。为此,我们提出了Video4Spatial,一个框架表明仅基于视频场景上下文的视频扩散模型可以执行复杂的空间任务。我们在两个任务上进行验证:场景导航-遵循摄像机姿态指令并与场景的3D几何保持一致,以及对象定位-需要语义定位、遵循指令和规划。这两个任务仅使用视频输入,没有深度或姿势等辅助模态。通过框架和数据策划中简单而有效的设计选择,Video4Spatial展示了从视频上下文中对空间的强大理解能力:它以端到端的方式规划导航并定位目标对象,遵循摄像机姿态指令同时保持空间一致性,并且对长上下文和域外环境具有泛化能力。综上所述,这些结果推动了视频生成模型朝向通用视觉空间推理的方向发展。

更新时间: 2025-12-11 08:18:10

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2512.03040v2

Supervised Learning of Random Neural Architectures Structured by Latent Random Fields on Compact Boundaryless Multiply-Connected Manifolds

This paper introduces a new probabilistic framework for supervised learning in neural systems. It is designed to model complex, uncertain systems whose random outputs are strongly non-Gaussian given deterministic inputs. The architecture itself is a random object stochastically generated by a latent anisotropic Gaussian random field defined on a compact, boundaryless, multiply-connected manifold. The goal is to establish a novel conceptual and mathematical framework in which neural architectures are realizations of a geometry-aware, field-driven generative process. Both the neural topology and synaptic weights emerge jointly from a latent random field. A reduced-order parameterization governs the spatial intensity of an inhomogeneous Poisson process on the manifold, from which neuron locations are sampled. Input and output neurons are identified via extremal evaluations of the latent field, while connectivity is established through geodesic proximity and local field affinity. Synaptic weights are conditionally sampled from the field realization, inducing stochastic output responses even for deterministic inputs. To ensure scalability, the architecture is sparsified via percentile-based diffusion masking, yielding geometry-aware sparse connectivity without ad hoc structural assumptions. Supervised learning is formulated as inference on the generative hyperparameters of the latent field, using a negative log-likelihood loss estimated through Monte Carlo sampling from single-observation-per-input datasets. The paper initiates a mathematical analysis of the model, establishing foundational properties such as well-posedness, measurability, and a preliminary analysis of the expressive variability of the induced stochastic mappings, which support its internal coherence and lay the groundwork for a broader theory of geometry-driven stochastic learning.

Updated: 2025-12-11 08:17:12

标题: 在紧凑的无边界多连通流形上由潜在随机场结构化的随机神经结构的监督学习

摘要: 这篇论文介绍了一个新的概率框架,用于神经系统中的监督学习。该框架旨在对复杂、不确定的系统进行建模,这些系统的随机输出在确定性输入给定的情况下强烈非高斯。该体系结构本身是由定义在紧凑的、无边界的、多重连接流形上的隐各向异性高斯随机场随机生成的随机对象。目标是建立一个新颖的概念和数学框架,其中神经结构是几何感知、场驱动生成过程的实现。神经拓扑和突触权重均联合从隐随机场中产生。通过对流形上的不均匀泊松过程的空间强度进行降阶参数化,从中抽样神经元位置。通过对隐场的极端评估确定输入和输出神经元,而连接性则通过测地距离和局部场亲和性建立。突触权重是从场实现中有条件地抽样得到的,即使对于确定性输入也会引发随机输出响应。为了确保可扩展性,通过基于百分位的扩散屏蔽将体系结构稀疏化,实现几何感知的稀疏连接而无需任意假设。监督学习被制定为对隐场的生成超参数进行推断,通过单输入数据集的蒙特卡洛抽样估计负对数似然损失。该论文开始对模型进行数学分析,建立基本性质,比如良定性、可测性,以及对诱导的随机映射的表达变异性的初步分析,这些支持其内部一致性并为几何驱动的随机学习理论奠定基础。

更新时间: 2025-12-11 08:17:12

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2512.10407v1

LeMiCa: Lexicographic Minimax Path Caching for Efficient Diffusion-Based Video Generation

We present LeMiCa, a training-free and efficient acceleration framework for diffusion-based video generation. While existing caching strategies primarily focus on reducing local heuristic errors, they often overlook the accumulation of global errors, leading to noticeable content degradation between accelerated and original videos. To address this issue, we formulate cache scheduling as a directed graph with error-weighted edges and introduce a Lexicographic Minimax Path Optimization strategy that explicitly bounds the worst-case path error. This approach substantially improves the consistency of global content and style across generated frames. Extensive experiments on multiple text-to-video benchmarks demonstrate that LeMiCa delivers dual improvements in both inference speed and generation quality. Notably, our method achieves a 2.9x speedup on the Latte model and reaches an LPIPS score of 0.05 on Open-Sora, outperforming prior caching techniques. Importantly, these gains come with minimal perceptual quality degradation, making LeMiCa a robust and generalizable paradigm for accelerating diffusion-based video generation. We believe this approach can serve as a strong foundation for future research on efficient and reliable video synthesis. Our code is available at :https://github.com/UnicomAI/LeMiCa
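
To make the graph formulation concrete, the sketch below computes a minimax path over a toy denoising schedule under a fixed budget of cache refreshes. The quadratic skip-error model is a stand-in, and LeMiCa proper optimizes lexicographically (tie-breaking on the second-largest edge error and so on) rather than bounding only the single worst edge as done here.

    def minimax_path_with_budget(edges, n_steps, budget):
        # dp[k][v] = smallest achievable worst-edge error reaching step v using
        # exactly k cache refreshes (edges); parents recover the schedule.
        INF = float("inf")
        dp = [[INF] * (n_steps + 1) for _ in range(budget + 1)]
        parent = {}
        dp[0][0] = 0.0
        for k in range(budget):
            for u in range(n_steps + 1):
                if dp[k][u] == INF:
                    continue
                for v in range(u + 1, n_steps + 1):
                    cand = max(dp[k][u], edges[(u, v)])  # path cost = worst edge
                    if cand < dp[k + 1][v]:
                        dp[k + 1][v] = cand
                        parent[(k + 1, v)] = u
        path, k, v = [n_steps], budget, n_steps      # walk parents back
        while k > 0:
            v = parent[(k, v)]
            path.append(v)
            k -= 1
        return dp[budget][n_steps], path[::-1]

    # Toy 10-step schedule: skipping farther ahead reuses a staler cache, so the
    # (stand-in) error grows quadratically with the skip length.
    edges = {(i, j): 0.1 * (j - i) ** 2 for i in range(10) for j in range(i + 1, 10)}
    print(minimax_path_with_budget(edges, n_steps=9, budget=3))
    # -> roughly (0.9, [0, 3, 6, 9]): even skips minimize the worst-case error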

Updated: 2025-12-11 08:10:13

标题: LeMiCa:用于高效扩散式视频生成的词典最小最大路径缓存

摘要: 我们提出了LeMiCa,这是一个无需训练且高效的扩散视频生成加速框架。现有的缓存策略主要侧重于减少局部启发式错误,但它们常常忽视全局错误的积累,导致加速和原始视频之间的内容明显降级。为了解决这个问题,我们将缓存调度构建为一个带有错误加权边的有向图,并引入了一个词典最小最大路径优化策略,明确地限制了最坏情况路径错误。这种方法显著提高了生成帧之间全局内容和风格的一致性。对多个文本到视频基准的广泛实验表明,LeMiCa在推理速度和生成质量上实现了双重改进。值得注意的是,我们的方法在Latte模型上实现了2.9倍的加速,并在Open-Sora上达到了0.05的LPIPS分数,优于先前的缓存技术。重要的是,这些收益伴随着最小的感知质量降低,使LeMiCa成为加速基于扩散的视频生成的强大和可推广范例。我们相信这种方法可以作为未来研究高效可靠视频合成的坚实基础。我们的代码可在以下链接找到: https://github.com/UnicomAI/LeMiCa

更新时间: 2025-12-11 08:10:13

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2511.00090v3

The Eminence in Shadow: Exploiting Feature Boundary Ambiguity for Robust Backdoor Attacks

Deep neural networks (DNNs) underpin critical applications yet remain vulnerable to backdoor attacks, typically reliant on heuristic brute-force methods. Despite significant empirical advancements in backdoor research, the lack of rigorous theoretical analysis limits understanding of underlying mechanisms, constraining attack predictability and adaptability. Therefore, we provide a theoretical analysis targeting backdoor attacks, focusing on how sparse decision boundaries enable disproportionate model manipulation. Based on this finding, we derive a closed-form, ambiguous boundary region, wherein negligible relabeled samples induce substantial misclassification. Influence function analysis further quantifies significant parameter shifts caused by these margin samples, with minimal impact on clean accuracy, formally grounding why such low poison rates suffice for efficacious attacks. Leveraging these insights, we propose Eminence, an explainable and robust black-box backdoor framework with provable theoretical guarantees and inherent stealth properties. Eminence optimizes a universal, visually subtle trigger that strategically exploits vulnerable decision boundaries and effectively achieves robust misclassification with exceptionally low poison rates (< 0.1%, compared to SOTA methods typically requiring > 1%). Comprehensive experiments validate our theoretical discussions and demonstrate the effectiveness of Eminence, confirming an exponential relationship between margin poisoning and adversarial boundary manipulation. Eminence maintains > 90% attack success rate, exhibits negligible clean-accuracy loss, and demonstrates high transferability across diverse models, datasets and scenarios.

Updated: 2025-12-11 08:09:07

标题: 《暗影中的卓越:利用特征边界模糊性进行强大的后门攻击》

摘要: 深度神经网络(DNNs)支持关键应用,但仍然容易受到后门攻击的威胁,通常依赖于启发式蛮力方法。尽管后门研究取得了显著的实证进展,但缺乏严格的理论分析限制了对潜在机制的理解,限制了攻击的可预测性和适应性。因此,我们提供了一个针对后门攻击的理论分析,重点关注稀疏决策边界如何实现对模型的不成比例的操纵。基于这一发现,我们推导出一个封闭的、模糊的边界区域,其中可忽略的重新标记样本会导致显著的误分类。影响函数分析进一步量化了这些边界样本引起的显著参数偏移,对干净准确度影响微乎其微,正式证明了为什么这种低毒害率足以实现有效的攻击。利用这些见解,我们提出了Eminence,一个可解释且强大的黑盒后门框架,具有可证明的理论保证和固有的隐蔽特性。Eminence优化了一个通用的、视觉上微妙的触发器,策略性地利用脆弱的决策边界,有效地实现了具有极低毒害率的强大误分类(<0.1%,而SOTA方法通常需要>1%)。全面的实验证实了我们的理论讨论,并证明了Eminence的有效性,确认了边界毒害和对抗性边界操纵之间的指数关系。Eminence保持了>90%的攻击成功率,表现出微不足道的净准确度损失,并展示了在不同模型、数据集和场景之间的高可转移性。

更新时间: 2025-12-11 08:09:07

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2512.10402v1

Diffusion differentiable resampling

This paper is concerned with differentiable resampling in the context of sequential Monte Carlo (e.g., particle filtering). We propose a new informative resampling method that is instantly pathwise differentiable, based on an ensemble score diffusion model. We prove that our diffusion resampling method provides a consistent estimate to the resampling distribution, and we show by experiments that it outperforms the state-of-the-art differentiable resampling methods when used for stochastic filtering and parameter estimation.

Updated: 2025-12-11 08:08:55

标题: 扩散可微重采样

摘要: 本文关注于在顺序蒙特卡洛(例如,粒子滤波)中的可微重采样。我们提出了一种基于集合评分扩散模型的新型信息重采样方法,该方法即时路径可微。我们证明了我们的扩散重采样方法提供了对重采样分布的一致估计,并通过实验证明,在用于随机滤波和参数估计时,它优于当前最先进的可微重采样方法。

更新时间: 2025-12-11 08:08:55

领域: stat.ML,cs.LG,math.ST

下载: http://arxiv.org/abs/2512.10401v1

Confucius Code Agent: An Open-sourced AI Software Engineer at Industrial Scale

Real-world AI software engineering demands coding agents that can reason over massive repositories, maintain durable memory across and within long sessions, and robustly coordinate complex toolchains at test time. Existing open-source coding agents provide transparency but frequently fall short when pushed to these industrial-scale workloads, while proprietary coding agents offer strong practical performance but limited extensibility, interpretability, and controllability. We present the Confucius Code Agent (CCA), an open-sourced AI software engineer that can operate at an industrial scale. CCA is built atop the Confucius SDK, an open-sourced agent development platform designed around three complementary perspectives: Agent Experience (AX), User Experience (UX), and Developer Experience (DX). The SDK introduces a unified orchestrator with hierarchical working memory for long-context reasoning, a persistent note-taking system for cross-session continual learning, and a modular extension module for robust tool use. Moreover, a meta-agent automates the synthesis, evaluation, and refinement of agent configurations through a build-test-improve loop, enabling rapid agent development on new tasks, environments, and tool stacks. Instantiated on Confucius SDK with these mechanisms, CCA delivers strong performance on real-world software engineering tasks. On SWE-Bench-Pro, CCA achieves a state-of-the-art Resolve@1 performance of 54.3%, substantially improving over prior coding agents. Together, the Confucius SDK and CCA provide a transparent, extensible, and reproducible foundation for AI agents, bridge gaps between research prototypes and production-grade systems, and support agent development and deployment at industrial scale.

Updated: 2025-12-11 08:05:58

标题: 孔子编码代理:一个在工业规模上开源的人工智能软件工程师

摘要: 现实世界中的AI软件工程需要编码代理,这些代理可以在大型代码库上进行推理,跨越和在长时间会话中保持持久记忆,并在测试时强大地协调复杂的工具链。现有的开源编码代理提供了透明度,但在推动到这些工业规模的工作负载时往往表现不佳,而专有编码代理提供了强大的实用性能,但可扩展性、解释性和可控性有限。我们提出了孔子代码代理(CCA),这是一个可以在工业规模上运行的开源AI软件工程师。CCA建立在孔子SDK之上,这是一个围绕三个互补视角设计的开源代理开发平台:代理体验(AX)、用户体验(UX)和开发者体验(DX)。该SDK引入了一个统一的编排器,具有用于长文本推理的分层工作记忆,一个持久的记事系统,用于跨会话的持续学习,以及一个用于强大工具使用的模块化扩展模块。此外,一个元代理通过构建-测试-改进循环自动合成、评估和改进代理配置,使其能够在新任务、环境和工具堆栈上快速开发代理。通过这些机制在孔子SDK上实例化,CCA在现实世界的软件工程任务上表现出色。在SWE-Bench-Pro上,CCA实现了54.3%的Resolve@1性能,大大提高了先前编码代理的性能。孔子SDK和CCA共同为AI代理提供了透明、可扩展和可重现的基础,弥合了研究原型和生产级系统之间的差距,并支持在工业规模上进行代理开发和部署。

更新时间: 2025-12-11 08:05:58

领域: cs.CL,cs.AI,cs.LG,cs.SE

下载: http://arxiv.org/abs/2512.10398v1

Toward Intelligent and Secure Cloud: Large Language Model Empowered Proactive Defense

The rapid evolution of cloud computing technologies and the increasing number of cloud applications have provided numerous benefits in our daily lives. However, the diversity and complexity of different components pose a significant challenge to cloud security, especially when dealing with sophisticated and advanced cyberattacks such as Denial of Service (DoS). Recent advancements in the large language models (LLMs) offer promising solutions for security intelligence. By exploiting the powerful capabilities in language understanding, data analysis, task inference, action planning, and code generation, we present LLM-PD, a novel defense architecture that proactively mitigates various DoS threats in cloud networks. LLM-PD can efficiently make decisions through comprehensive data analysis and sequential reasoning, as well as dynamically create and deploy actionable defense mechanisms. Furthermore, it can flexibly self-evolve based on experience learned from previous interactions and adapt to new attack scenarios without additional training. Our case study on three distinct DoS attacks demonstrates its remarkable ability in terms of defense effectiveness and efficiency when compared with other existing methods.

Updated: 2025-12-11 08:02:12

标题: 走向智能安全的云:大型语言模型增强主动防御

摘要: 云计算技术的快速发展和云应用数量的增加为我们的日常生活提供了许多好处。然而,不同组件的多样性和复杂性给云安全带来了重大挑战,特别是在处理诸如拒绝服务(DoS)等复杂和高级网络攻击时。最近大型语言模型(LLMs)的进展为安全情报提供了有希望的解决方案。通过利用语言理解、数据分析、任务推断、行动规划和代码生成的强大能力,我们提出了LLM-PD,一种新颖的防御架构,可以主动缓解云网络中各种DoS威胁。LLM-PD可以通过全面的数据分析和序贯推理有效地做出决策,动态地创建和部署可操作的防御机制。此外,它可以灵活地根据从先前交互中学到的经验自我进化,并适应新的攻击场景而无需额外培训。我们对三种不同的DoS攻击进行的案例研究显示,在与其他现有方法相比,它在防御效果和效率方面具有显著的能力。

更新时间: 2025-12-11 08:02:12

领域: cs.CR,cs.AI,cs.NI

下载: http://arxiv.org/abs/2412.21051v4

Thinking Ahead: Foresight Intelligence in MLLMs and World Models

In this work, we define Foresight Intelligence as the capability to anticipate and interpret future events, an ability essential for applications such as autonomous driving, yet largely overlooked by existing research. To bridge this gap, we introduce FSU-QA, a new Visual Question-Answering (VQA) dataset specifically designed to elicit and evaluate Foresight Intelligence. Using FSU-QA, we conduct the first comprehensive study of state-of-the-art Vision-Language Models (VLMs) under foresight-oriented tasks, revealing that current models still struggle to reason about future situations. Beyond serving as a benchmark, FSU-QA also enables the assessment of world models by measuring the semantic coherence of their generated predictions, quantified through performance gains when VLMs are augmented with such outputs. Our experiments further demonstrate that FSU-QA can effectively enhance foresight reasoning: even small VLMs fine-tuned on FSU-QA surpass much larger, advanced models by a substantial margin. Together, these findings position FSU-QA as a principled foundation for developing next-generation models capable of truly anticipating and understanding future events.

Updated: 2025-12-11 08:02:01

标题: 提前思考:多模态大语言模型与世界模型中的先见智能

摘要: 在这项工作中,我们将预测智能定义为能够预测和解释未来事件的能力-这是自动驾驶等应用所必需的能力,但目前现有研究大多忽视了这一点。为了弥补这一差距,我们介绍了FSU-QA,这是一个新的专门设计用来引发和评估预测智能的视觉问答(VQA)数据集。利用FSU-QA,我们进行了第一次对最先进的视觉语言模型(VLMs)在面向预测的任务下的全面研究,发现当前模型仍然在推理未来情况方面存在困难。除了作为一个基准,FSU-QA还通过衡量生成的预测的语义连贯性来评估世界模型,通过VLMs增加这些输出的性能提升来量化。我们的实验进一步证明,FSU-QA可以有效地增强预测推理:即使是在FSU-QA上微调的小型VLMs也能明显超越更大、更先进的模型。综合这些发现,可以将FSU-QA定位为开发能够真正预测和理解未来事件的下一代模型的基础。

更新时间: 2025-12-11 08:02:01

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2511.18735v2

RoboNeuron: A Modular Framework Linking Foundation Models and ROS for Embodied AI

Current embodied AI systems face severe engineering impediments, primarily characterized by poor cross-scenario adaptability, rigid inter-module coupling, and fragmented inference acceleration. To overcome these limitations, we propose RoboNeuron, a universal deployment framework for embodied intelligence. RoboNeuron is the first framework to deeply integrate the cognitive capabilities of Large Language Models (LLMs) and Vision-Language-Action (VLA) models with the real-time execution backbone of the Robot Operating System (ROS). We utilize the Model Context Protocol (MCP) as a semantic bridge, enabling the LLM to dynamically orchestrate underlying robotic tools. The framework establishes a highly modular architecture that strictly decouples sensing, reasoning, and control by leveraging ROS's unified communication interfaces. Crucially, we introduce an automated tool to translate ROS messages into callable MCP functions, significantly streamlining development. RoboNeuron significantly enhances cross-scenario adaptability and component flexibility, while establishing a systematic platform for horizontal performance benchmarking, laying a robust foundation for scalable real-world embodied applications.

Updated: 2025-12-11 07:58:19

标题: RoboNeuron:一个模块化框架,用于连接基础模型和ROS以实现具身智能

摘要: 目前的具身人工智能系统面临严重的工程障碍,主要表现为跨场景适应性差、模块间耦合僵化、推理加速碎片化。为了克服这些限制,我们提出了RoboNeuron,一个通用的具身智能部署框架。RoboNeuron是第一个深度整合大型语言模型(LLMs)和视觉-语言-行动(VLA)模型与机器人操作系统(ROS)实时执行骨干的框架。我们利用模型上下文协议(MCP)作为语义桥梁,使LLM能够动态编排底层机器人工具。该框架建立了一个高度模块化的架构,通过利用ROS的统一通信接口严格解耦感知、推理和控制。关键是,我们引入了一个自动化工具,将ROS消息转换为可调用的MCP函数,极大地简化了开发流程。RoboNeuron显著增强了跨场景适应性和组件灵活性,同时建立了一个系统化平台,用于水平性能基准测试,为可伸缩的实际世界具身应用奠定了坚实基础。

更新时间: 2025-12-11 07:58:19

领域: cs.RO,cs.LG

下载: http://arxiv.org/abs/2512.10394v1

Cross-modal Retrieval Models for Stripped Binary Analysis

LLM-agent based binary code analysis has demonstrated significant potential across a wide range of software security scenarios, including vulnerability detection, malware analysis, etc. In agent workflows, however, retrieving the positive function from thousands of stripped binary functions based on a user query remains under-studied and challenging, as the absence of symbolic information distinguishes it from source code retrieval. In this paper, we introduce BinSeek, the first two-stage cross-modal retrieval framework for stripped binary code analysis. It consists of two models: BinSeek-Embedding is trained on a large-scale dataset to learn the semantic relevance of binary code and natural language descriptions; furthermore, BinSeek-Reranker learns to carefully judge the relevance of the candidate code to the description with context augmentation. To this end, we built an LLM-based data synthesis pipeline to automate training construction, also deriving a domain benchmark for future research. Our evaluation results show that BinSeek achieved state-of-the-art performance, surpassing same-scale models by 31.42% in Rec@3 and 27.17% in MRR@3, as well as leading advanced general-purpose models that have 16 times more parameters.
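
A minimal sketch of the two-stage control flow described above, with the trained BinSeek-Embedding and BinSeek-Reranker models replaced by runnable stand-ins: a cheap dot-product pass over precomputed embeddings for recall, then a slower per-candidate scoring pass for precision. All names and data here are placeholders.

    import numpy as np

    rng = np.random.default_rng(0)
    corpus = rng.normal(size=(10_000, 128))        # stand-in function embeddings
    corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)

    def embed_query(text):
        # Stand-in for the embedding model: deterministic per query text.
        q_rng = np.random.default_rng(abs(hash(text)) % (2**32))
        v = q_rng.normal(size=128)
        return v / np.linalg.norm(v)

    def rerank_score(query, idx):
        # Stand-in for the reranker, which in the real system re-reads the
        # query and candidate code together (with added context) and is slower
        # but more precise than a single dot product.
        return float(corpus[idx] @ embed_query(query))

    def search(query, k_coarse=100, k_final=3):
        q = embed_query(query)
        coarse = np.argsort(corpus @ q)[::-1][:k_coarse]   # stage 1: cheap recall
        ranked = sorted(coarse, key=lambda i: rerank_score(query, i), reverse=True)
        return ranked[:k_final]                            # stage 2: precise top-k

    print(search("parse a DNS packet header"))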

Updated: 2025-12-11 07:58:10

标题: 面向剥离二进制分析的跨模态检索模型

摘要: LLM-基于代理的二进制代码分析已经在广泛的软件安全场景中展示出了显著的潜力,包括漏洞检测、恶意软件分析等。然而,在代理工作流程中,基于用户查询从成千上万个剥离的二进制函数中检索出积极的结果仍然是一个未被研究和具有挑战性的问题,因为缺乏符号信息使其与源代码检索有所不同。在本文中,我们介绍了BinSeek,这是第一个用于剥离的二进制代码分析的两阶段跨模态检索框架。它由两个模型组成:BinSeekEmbedding在大规模数据集上训练,以学习二进制代码和自然语言描述的语义相关性,此外,BinSeek-Reranker学会了仔细判断候选代码与描述的相关性,并进行上下文增强。为此,我们建立了一个基于LLM的数据合成流水线,以自动化训练构建,并为未来研究提供了一个领域基准。我们的评估结果表明,BinSeek实现了最先进的性能,Rec@3超过了相同规模模型31.42%,MRR@3超过了27.17%,并且领先于具有16倍更大参数的先进通用模型。

更新时间: 2025-12-11 07:58:10

领域: cs.SE,cs.AI

下载: http://arxiv.org/abs/2512.10393v1

Fitting magnetization data using continued fraction of straight lines

Magnetization of a ferromagnetic substance in response to an externally applied magnetic field increases with the strength of the field. This is because at the microscopic level, magnetic moments in certain regions or domains of the substance increasingly align with the applied field, while the amount of misaligned domains decreases. The alignment of such magnetic domains with an applied magnetic field forms the physical basis for the nonlinearity of magnetization. In this paper, the nonlinear function is approximated as a combination of continued fraction of straight lines. The resulting fit is used to interpret the nonlinear behavior in both growing and shrinking magnetic domains. The continued fraction of straight lines used here is an algebraic expression which can be used to estimate parameters using nonlinear regression.
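
A minimal sketch of the fitting procedure, assuming a depth-2 continued fraction of straight lines; the paper's exact depth and parameterization may differ, and the data below are synthetic (generated from the model itself plus noise so the regression has a consistent ground truth).

    import numpy as np
    from scipy.optimize import curve_fit

    def cfsl(h, a0, b0, a1, b1, a2, b2):
        # Depth-2 continued fraction of straight lines:
        #   M(H) = a0*H + b0 + 1 / (a1*H + b1 + 1 / (a2*H + b2))
        return a0 * h + b0 + 1.0 / (a1 * h + b1 + 1.0 / (a2 * h + b2))

    rng = np.random.default_rng(0)
    h = np.linspace(0.1, 5.0, 200)
    true = (0.05, 1.0, 2.0, 0.5, 1.0, 1.0)
    m = cfsl(h, *true) + 0.005 * rng.normal(size=h.size)

    p0 = (0.0, 0.8, 1.5, 0.4, 0.8, 0.8)        # rough initial guess
    params, _ = curve_fit(cfsl, h, m, p0=p0, maxfev=20000)
    print(np.round(params, 3))                 # recovers values close to `true`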

Updated: 2025-12-11 07:57:17

标题: 使用直线的连分数拟合磁化数据

摘要: 在外加磁场作用下,铁磁物质的磁化随着磁场强度的增加而增加。这是因为在微观层面,物质的某些区域或磁域中的磁矩越来越多地与外加磁场对齐,而不对齐的磁域的数量减少。这种磁性磁域与外加磁场的对齐形成了磁化非线性的物理基础。在本文中,非线性函数被近似为直线的连分式组合。得到的拟合结果用于解释在增长和缩小的磁性磁域中的非线性行为。在这里使用的直线连分式是一个代数表达式,可以用于使用非线性回归来估计参数。

更新时间: 2025-12-11 07:57:17

领域: cs.LG,cond-mat.mtrl-sci,physics.class-ph

下载: http://arxiv.org/abs/2512.10390v1

Learning (Approximately) Equivariant Networks via Constrained Optimization

Equivariant neural networks are designed to respect symmetries through their architecture, boosting generalization and sample efficiency when those symmetries are present in the data distribution. Real-world data, however, often departs from perfect symmetry because of noise, structural variation, measurement bias, or other symmetry-breaking effects. Strictly equivariant models may struggle to fit the data, while unconstrained models lack a principled way to leverage partial symmetries. Even when the data is fully symmetric, enforcing equivariance can hurt training by limiting the model to a restricted region of the parameter space. Guided by homotopy principles, where an optimization problem is solved by gradually transforming a simpler problem into a complex one, we introduce Adaptive Constrained Equivariance (ACE), a constrained optimization approach that starts with a flexible, non-equivariant model and gradually reduces its deviation from equivariance. This gradual tightening smooths training early on and settles the model at a data-driven equilibrium, balancing between equivariance and non-equivariance. Across multiple architectures and tasks, our method consistently improves performance metrics, sample efficiency, and robustness to input perturbations compared with strictly equivariant models and heuristic equivariance relaxations.
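
A minimal sketch of the homotopy idea under stated assumptions: the symmetry group is a horizontal feature flip, the model is a toy MLP, and the constrained optimization is reduced to a penalty whose weight ramps up from zero, so training starts unconstrained and settles near equivariance. The augmented-Lagrangian machinery of the actual method is not reproduced here.

    import torch

    model = torch.nn.Sequential(torch.nn.Linear(8, 32), torch.nn.ReLU(),
                                torch.nn.Linear(32, 8))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    def flip(x):
        # The group action g: reverse the feature order (horizontal flip).
        return torch.flip(x, dims=[-1])

    for step in range(2000):
        x = torch.randn(64, 8)
        y = 0.9 * flip(x) + 0.1 * torch.randn(64, 8)  # nearly flip-equivariant task
        task_loss = ((model(x) - y) ** 2).mean()
        # Equivariance gap ||f(g x) - g f(x)||^2, driven toward zero over training.
        equiv_gap = ((model(flip(x)) - flip(model(x))) ** 2).mean()
        lam = 10.0 * min(1.0, step / 1000)            # ramp: loose early, tight late
        loss = task_loss + lam * equiv_gap
        opt.zero_grad()
        loss.backward()
        opt.step()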

Updated: 2025-12-11 07:51:20

标题: 通过约束优化学习(近似)等变网络

摘要: 等变神经网络通过其架构设计来尊重对称性,当数据分布中存在这些对称性时,可以提高泛化能力和样本效率。然而,现实世界的数据通常会因为噪音、结构变化、测量偏差或其他破坏对称性的影响而偏离完美对称性。严格的等变模型可能难以拟合数据,而不受限制的模型缺乏一种原则性的方法来利用部分对称性。即使数据是完全对称的,强制执行等变性也可能通过限制模型到参数空间的受限区域而损害训练。受同伦原理的启发,其中一个优化问题通过逐渐将一个简单问题转化为一个复杂问题来解决,我们引入了自适应约束等变性(ACE),这是一种受约束的优化方法,从一个灵活的非等变模型开始,逐渐减少其与等变性的偏差。这种逐渐收紧的方法在早期平滑训练,并将模型稳定在一个数据驱动的平衡点,平衡等变性和非等变性之间。在多种架构和任务中,我们的方法与严格的等变模型和启发式等变性放宽相比,始终提高性能指标、样本效率和对输入扰动的稳健性。

更新时间: 2025-12-11 07:51:20

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2505.13631v2

The Best of the Two Worlds: Harmonizing Semantic and Hash IDs for Sequential Recommendation

Conventional Sequential Recommender Systems (SRS) typically assign unique Hash IDs (HID) to construct item embeddings. These HID embeddings effectively learn collaborative information from historical user-item interactions, making them vulnerable to situations where most items are rarely consumed (the long-tail problem). Recent methods that incorporate auxiliary information often suffer from noisy collaborative sharing caused by co-occurrence signals or semantic homogeneity caused by flat dense embeddings. Semantic IDs (SIDs), with their capability of code sharing and multi-granular semantic modeling, provide a promising alternative. However, the collaborative overwhelming phenomenon hinders the further development of SID-based methods. The quantization mechanisms commonly compromise the uniqueness of identifiers required for modeling head items, creating a performance seesaw between head and tail items. To address this dilemma, we propose H2Rec, a novel framework that harmonizes the SID and HID. Specifically, we devise a dual-branch modeling architecture that enables the model to capture the multi-granular semantics within SID while preserving the unique collaborative identity of HID. Furthermore, we introduce a dual-level alignment strategy that bridges the two representations, facilitating knowledge transfer and supporting robust preference modeling. Extensive experiments on three real-world datasets show that H2Rec effectively balances recommendation quality for both head and tail items while surpassing the existing baselines. The implementation code can be found online at https://github.com/ziwliu8/H2Rec.

Updated: 2025-12-11 07:50:53

标题: 两个世界的精华:协调语义和哈希ID用于顺序推荐

摘要: 传统的顺序推荐系统(SRS)通常为构建物品嵌入分配唯一的哈希ID(HID)。这些HID嵌入有效地从历史用户-物品交互中学习协同信息,使它们容易受到大多数物品很少被消费的情况的影响(长尾问题)。最近的一些方法结合了辅助信息,但往往受到由共现信号引起的噪声协同共享或由平坦密集嵌入引起的语义同质性的困扰。语义ID(SIDs)具有代码共享和多粒度语义建模的能力,为其提供了一种有希望的替代方案。然而,协同压倒性现象阻碍了基于SID的方法的进一步发展。量化机制通常会损害建模头部物品所需的标识符的独特性,从而在头部和尾部物品之间创建绩效跷跷板。为了解决这一困境,我们提出了一种新颖的框架H2Rec,该框架协调了SID和HID。具体来说,我们设计了一个双分支建模架构,使模型能够捕捉SID内部的多粒度语义,同时保留HID的独特协同身份。此外,我们引入了一个双级别对齐策略,桥接了两种表示,促进了知识传递并支持稳健的偏好建模。对三个真实世界数据集的广泛实验表明,H2Rec在平衡头部和尾部物品的推荐质量方面表现出色,超过了现有基线。实现代码可以在 https://github.com/ziwliu8/H2Rec 在线找到。

更新时间: 2025-12-11 07:50:53

领域: cs.IR,cs.AI

下载: http://arxiv.org/abs/2512.10388v1

Beyond Lux thresholds: a systematic pipeline for classifying biologically relevant light contexts from wearable data

Background: Wearable spectrometers enable field quantification of biologically relevant light, yet reproducible pipelines for contextual classification remain under-specified. Objective: To establish and validate a subject-wise evaluated, reproducible pipeline and actionable design rules for classifying natural vs. artificial light from wearable spectral data. Methods: We analysed ActLumus recordings from 26 participants, each monitored for at least 7 days at 10-second sampling, paired with daily exposure diaries. The pipeline fixes the sequence: domain selection, log-base-10 transform, L2 normalisation excluding total intensity (to avoid brightness shortcuts), hour-level medoid aggregation, sine/cosine hour encoding, and MLP classifier, evaluated under participant-wise cross-validation. Results: The proposed sequence consistently achieved high performance on the primary task, with representative configurations reaching AUC = 0.938 (accuracy 88%) for natural vs. artificial classification on the held-out subject split. In contrast, indoor vs. outdoor classification remained at feasibility level due to spectral overlap and class imbalance (best AUC approximately 0.75; majority-class collapse without contextual sensors). Threshold baselines were insufficient on our data, supporting the need for spectral-temporal modelling beyond illuminance cut-offs. Conclusions: We provide a reproducible, auditable baseline pipeline and design rules for contextual light classification under subject-wise generalisation. All code, configuration files, and derived artefacts will be openly archived (GitHub + Zenodo DOI) to support reuse and benchmarking.
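
A minimal sketch of the fixed pipeline order on synthetic data, assuming scikit-learn: log-base-10 transform, L2 normalisation of the spectral shape (so total intensity is excluded), sine/cosine hour encoding, an MLP classifier, and participant-wise cross-validation via GroupKFold. The hour-level medoid aggregation step and all ActLumus specifics are omitted; channel names and labels are placeholders.

    import numpy as np
    import pandas as pd
    from sklearn.model_selection import GroupKFold, cross_val_score
    from sklearn.neural_network import MLPClassifier
    from sklearn.preprocessing import Normalizer

    rng = np.random.default_rng(0)
    n = 600
    df = pd.DataFrame({
        "subject": rng.integers(0, 26, n),      # participant id, for subject-wise CV
        "hour": rng.integers(0, 24, n),
        "ch_blue": rng.gamma(2.0, 50.0, n),     # placeholder spectral channels
        "ch_green": rng.gamma(2.0, 60.0, n),
        "ch_red": rng.gamma(2.0, 40.0, n),
    })
    # Placeholder label: "natural" when the blue share of the spectrum is high.
    y = (df["ch_blue"] / df[["ch_blue", "ch_green", "ch_red"]].sum(axis=1) > 0.33)

    # log10 transform, then L2 normalisation of the spectral shape (normalising
    # away total intensity avoids the brightness shortcut), then cyclic hours.
    spectral = np.log10(df[["ch_blue", "ch_green", "ch_red"]] + 1.0)
    hour = df["hour"].to_numpy()
    X = np.column_stack([
        Normalizer(norm="l2").fit_transform(spectral),
        np.sin(2 * np.pi * hour / 24),
        np.cos(2 * np.pi * hour / 24),
    ])

    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
    scores = cross_val_score(clf, X, y.astype(int), cv=GroupKFold(n_splits=5),
                             groups=df["subject"], scoring="roc_auc")
    print(scores.mean())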

Updated: 2025-12-11 07:50:25

标题: 超越 Lux 阈值:一个系统的流水线,用于从可穿戴数据中分类生物相关光环境

摘要: 背景:可穿戴光谱仪可以实现对生物相关光照的现场量化,但用于情境分类的可重复管道仍未明确定义。 目标:建立并验证一个按受试者划分评估的可重复管道以及可操作的设计规则,用于从可穿戴光谱数据中区分自然光与人造光。 方法:我们分析了来自26名参与者的ActLumus记录,每名参与者以10秒采样间隔至少被监测7天,并配有每日暴露日记。该管道固定了以下顺序:领域选择,以10为底的对数变换,L2归一化(不包括总强度,以避免亮度捷径),小时级medoid聚合,正弦/余弦小时编码,以及MLP分类器,并在参与者级别进行交叉验证评估。 结果:所提出的顺序在主要任务上始终取得高性能,代表性配置在按受试者保留的划分上对自然光与人造光分类达到AUC = 0.938(准确度88%)。相比之下,由于光谱重叠和类别不平衡,室内与室外分类仍处于可行性水平(最佳AUC约为0.75;在没有情境传感器的情况下出现多数类坍塌)。在我们的数据上,阈值基线不足,支持超越照度截止值的光谱-时间建模的必要性。 结论:我们提供了一个可重复、可审计的基线管道和设计规则,用于按受试者泛化的情境光照分类。所有代码、配置文件和衍生工件将被公开存档(GitHub + Zenodo DOI),以支持重用和基准测试。

更新时间: 2025-12-11 07:50:25

领域: q-bio.QM,cs.LG

下载: http://arxiv.org/abs/2512.06181v2

Towards Fine-Grained Recognition with Large Visual Language Models: Benchmark and Optimization Strategies

Large Vision Language Models (LVLMs) have made remarkable progress, enabling sophisticated vision-language interaction and dialogue applications. However, existing benchmarks primarily focus on reasoning tasks, often neglecting fine-grained recognition, which is crucial for practical application scenarios. To address this gap, we introduce the Fine-grained Recognition Open World (FROW) benchmark, designed for detailed evaluation of LVLMs with GPT-4o. On the basis of that, we propose a novel optimization strategy from two perspectives, data construction and training process, to improve the performance of LVLMs. Our dataset includes mosaic data, which combines multiple short-answer responses, and open-world data, generated from real-world questions and answers using GPT-4o, creating a comprehensive framework for evaluating fine-grained recognition in LVLMs. Experiments show that mosaic data improves category recognition accuracy by 1% and open-world data boosts FROW benchmark accuracy by 10%-20% and content accuracy by 6%-12%. Meanwhile, incorporating fine-grained data into the pre-training phase can improve the model's category recognition accuracy by up to 10%. The benchmark will be available at https://github.com/pc-inno/FROW.

Updated: 2025-12-11 07:48:34

标题: 朝着利用大型视觉语言模型进行细粒度识别:基准和优化策略

摘要: 大规模视觉语言模型(LVLMs)取得了显著进展,实现了复杂的视觉-语言交互和对话应用。然而,现有的基准主要集中在推理任务上,往往忽视了对细粒度识别的重要性,这对实际应用场景至关重要。为了填补这一空白,我们引入了细粒度识别开放世界(FROW)基准,旨在详细评估具有GPT-4o的LVLMs的性能。基于此,我们提出了一种新颖的优化策略,从数据构建和训练过程两个角度来改善LVLMs的性能。我们的数据集包括马赛克数据,将多个短答案响应结合在一起,以及开放世界数据,使用GPT-4o从现实世界的问题和答案生成,为LVLMs中的细粒度识别创建了一个全面的评估框架。实验证明,马赛克数据将类别识别准确率提高了1%,开放世界数据将FROW基准准确率提高了10%-20%,内容准确率提高了6%-12%。同时,将细粒度数据纳入预训练阶段可以将模型的类别识别准确率提高多达10%。该基准将在https://github.com/pc-inno/FROW上提供。

更新时间: 2025-12-11 07:48:34

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2512.10384v1

Toward a Unified Geometry Understanding: Riemannian Diffusion Framework for Graph Generation and Prediction

Graph diffusion models have made significant progress in learning structured graph data and have demonstrated strong potential for predictive tasks. Existing approaches typically embed node, edge, and graph-level features into a unified latent space, modeling prediction tasks including classification and regression as a form of conditional generation. However, due to the non-Euclidean nature of graph data, features of different curvatures are entangled in the same latent space without releasing their geometric potential. To address this issue, we aim to construct an ideal Riemannian diffusion model to capture distinct manifold signatures of complex graph data and learn their distribution. This goal faces two challenges: numerical instability caused by exponential mapping during the encoding process and manifold deviation during diffusion generation. To address these challenges, we propose GeoMancer: a novel Riemannian graph diffusion framework for both generation and prediction tasks. To mitigate numerical instability, we replace exponential mapping with an isometric-invariant Riemannian gyrokernel approach and decouple multi-level features onto their respective task-specific manifolds to learn optimal representations. To address manifold deviation, we introduce a manifold-constrained diffusion method and a self-guided strategy for unconditional generation, ensuring that the generated data remains aligned with the manifold signature. Extensive experiments validate the effectiveness of our approach, demonstrating superior performance across a variety of tasks.

Updated: 2025-12-11 07:48:20

标题: 走向统一的几何理解:利用黎曼扩散框架进行图生成和预测

摘要: 图扩散模型在学习结构化图数据方面取得了显著进展,并展现了对预测任务的强大潜力。现有方法通常将节点、边和图级特征嵌入统一的潜在空间,将分类和回归等预测任务建模为一种条件生成形式。然而,由于图数据的非欧几里得性质,不同曲率的特征在同一潜在空间中纠缠在一起,未释放其几何潜力。为解决这一问题,我们旨在构建一个理想的黎曼扩散模型,以捕捉复杂图数据的不同流形特征并学习它们的分布。这一目标面临两个挑战:编码过程中指数映射引起的数值不稳定性和扩散生成过程中的流形偏差。为解决这些挑战,我们提出了GeoMancer:一个新颖的黎曼图扩散框架,用于生成和预测任务。为了减轻数值不稳定性,我们用同构不变的黎曼陀螺核方法替换指数映射,并将多级特征解耦到各自的任务特定流形上以学习最佳表示。为了解决流形偏差,我们引入了一种受限于流形的扩散方法和一种自我引导策略用于无条件生成,确保生成的数据与流形特征保持一致。大量实验证实了我们方法的有效性,展示了在各种任务中卓越的性能。

更新时间: 2025-12-11 07:48:20

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2510.04522v2

Neural personal sound zones with flexible bright zone control

A personal sound zone (PSZ) reproduction system, which attempts to create distinct virtual acoustic scenes for different listeners at their respective positions within the same spatial area using one loudspeaker array, is a fundamental technology for virtual reality applications. For practical applications, the reconstruction targets must be measured on the same fixed receiver array used to record the local room impulse responses (RIRs) from the loudspeaker array to the control points in each PSZ, which makes the system inconvenient and costly for real-world use. In this paper, a 3D convolutional neural network (CNN) designed for PSZ reproduction with flexible control microphone grid and alternative reproduction target is presented, utilizing the virtual target scene as inputs and the PSZ pre-filters as output. Experimental results of the proposed method are compared with the traditional method, demonstrating that the proposed method is able to handle varied reproduction targets on flexible control point grid using only one training session. Furthermore, the proposed method also demonstrates the capability to learn global spatial information from sparse sampling points distributed in PSZs.

Updated: 2025-12-11 07:41:15

标题: 具有灵活明亮区域控制的神经个人声音区域

摘要: 个人声音区(PSZ)再现系统试图使用一个扬声器阵列在同一空间区域内为不同听众在其各自位置创建独特的虚拟声学场景,这是虚拟现实应用中的基本技术。对于实际应用,重建目标必须在用于记录从扬声器阵列到每个PSZ控制点的本地房间冲激响应(RIRs)的相同固定接收器阵列上进行测量,这使得系统在实际使用中不便且昂贵。本文提出了一种专为PSZ再现设计的具有灵活控制麦克风网格和替代再现目标的三维卷积神经网络(CNN),利用虚拟目标场景作为输入和PSZ预滤波器作为输出。所提出方法的实验结果与传统方法进行了比较,表明所提出的方法能够在灵活的控制点网格上处理各种再现目标,仅使用一个训练会话。此外,所提出的方法还展示了从分布在PSZ中的稀疏采样点学习全局空间信息的能力。

更新时间: 2025-12-11 07:41:15

领域: cs.SD,cs.AI

下载: http://arxiv.org/abs/2512.10375v1

D2M: A Decentralized, Privacy-Preserving, Incentive-Compatible Data Marketplace for Collaborative Learning

The rising demand for collaborative machine learning and data analytics calls for secure and decentralized data sharing frameworks that balance privacy, trust, and incentives. Existing approaches, including federated learning (FL) and blockchain-based data markets, fall short: FL often depends on trusted aggregators and lacks Byzantine robustness, while blockchain frameworks struggle with computation-intensive training and incentive integration. We present D2M, a decentralized data marketplace that unifies federated learning, blockchain arbitration, and economic incentives into a single framework for privacy-preserving data sharing. D2M enables data buyers to submit bid-based requests via blockchain smart contracts, which manage auctions, escrow, and dispute resolution. Computationally intensive training is delegated to CoNE (Compute Network for Execution), an off-chain distributed execution layer. To safeguard against adversarial behavior, D2M integrates a modified YODA protocol with exponentially growing execution sets for resilient consensus, and introduces Corrected OSMD to mitigate malicious or low-quality contributions from sellers. All protocols are incentive-compatible, and our game-theoretic analysis establishes honesty as the dominant strategy. We implement D2M on Ethereum and evaluate it over benchmark datasets -- MNIST, Fashion-MNIST, and CIFAR-10 -- under varying adversarial settings. D2M achieves up to 99% accuracy on MNIST and 90% on Fashion-MNIST, with less than 3% degradation up to 30% Byzantine nodes, and 56% accuracy on CIFAR-10 despite its complexity. Our results show that D2M ensures privacy, maintains robustness under adversarial conditions, and scales efficiently with the number of participants, making it a practical foundation for real-world decentralized data sharing.

Updated: 2025-12-11 07:38:05

标题: D2M:面向协作学习的去中心化、保护隐私、激励兼容的数据市场

摘要: 随着协作机器学习和数据分析需求的增加,对安全和分散化数据共享框架的需求也在增加,这些框架需要平衡隐私、信任和激励。现有方法,包括联邦学习(FL)和基于区块链的数据市场,存在不足:FL通常依赖于可信的聚合器,缺乏拜占庭容错性,而区块链框架则面临计算密集型训练和激励集成的挑战。 我们提出了D2M,这是一个统一的分散式数据市场,将联邦学习、区块链仲裁和经济激励融合到一个隐私保护数据共享框架中。D2M使数据买家可以通过区块链智能合约提交基于出价的请求,智能合约管理拍卖、托管和争议解决。计算密集型训练被委托给CoNE(Compute Network for Execution),这是一个离链分布式执行层。为了防范敌对行为,D2M集成了一个修改过的YODA协议,通过指数增长的执行集实现强大的共识,并引入了修正的OSMD以减轻卖家的恶意或低质量贡献。所有协议都是激励兼容的,我们的博弈论分析确定了诚实作为主导策略。 我们在以太坊上实现了D2M,并在不同敌对环境下对MNIST、Fashion-MNIST和CIFAR-10等基准数据集进行了评估。在MNIST上,D2M的准确率达到了99%,在Fashion-MNIST上达到了90%,在多达30%的拜占庭节点下准确率下降不到3%,尽管CIFAR-10复杂,准确率仍达到56%。我们的结果表明,D2M确保了隐私,在敌对条件下保持了稳健性,并且随着参与者数量的增加能够高效扩展,使其成为实际的基础,用于现实世界的分散式数据共享。

更新时间: 2025-12-11 07:38:05

领域: cs.CR,cs.AI,cs.DC,cs.LG

下载: http://arxiv.org/abs/2512.10372v1

AgentProg: Empowering Long-Horizon GUI Agents with Program-Guided Context Management

The rapid development of mobile GUI agents has stimulated growing research interest in long-horizon task automation. However, building agents for these tasks faces a critical bottleneck: the reliance on ever-expanding interaction history incurs substantial context overhead. Existing context management and compression techniques often fail to preserve vital semantic information, leading to degraded task performance. We propose AgentProg, a program-guided approach for agent context management that reframes the interaction history as a program with variables and control flow. By organizing information according to the structure of program, this structure provides a principled mechanism to determine which information should be retained and which can be discarded. We further integrate a global belief state mechanism inspired by Belief MDP framework to handle partial observability and adapt to unexpected environmental changes. Experiments on AndroidWorld and our extended long-horizon task suite demonstrate that AgentProg has achieved the state-of-the-art success rates on these benchmarks. More importantly, it maintains robust performance on long-horizon tasks while baseline methods experience catastrophic degradation. Our system is open-sourced at https://github.com/MobileLLM/AgentProg.
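As a rough illustration of reframing history as a program, the sketch below uses hypothetical field names (the abstract does not specify the representation); the point is that named variables persist while the raw trace can be compressed or discarded:

from dataclasses import dataclass, field

@dataclass
class Step:
    action: str            # e.g. "tap('Display')"
    observation: str       # salient outcome, not the raw screen dump

@dataclass
class ProgramContext:
    goal: str
    variables: dict = field(default_factory=dict)    # retained named state
    trace: list = field(default_factory=list)        # compressible history

ctx = ProgramContext(goal="enable dark mode")
ctx.variables["current_app"] = "Settings"            # survives compression
ctx.trace.append(Step("tap('Display')", "Display settings opened"))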

Updated: 2025-12-11 07:37:38

标题: AgentProg:通过程序引导的上下文管理,赋予长期展望的GUI代理程序更强大的能力

摘要: 移动GUI代理的快速发展刺激了对长期任务自动化日益增长的研究兴趣。然而,为这些任务构建代理面临一个关键瓶颈:依赖不断扩大的交互历史会带来实质性的上下文开销。现有的上下文管理和压缩技术通常无法保留重要的语义信息,导致任务性能下降。我们提出AgentProg,一种程序引导的代理上下文管理方法,将交互历史重新构建为具有变量和控制流的程序。通过根据程序结构组织信息,这种结构提供了一个原则性机制来确定哪些信息应该保留,哪些可以丢弃。我们进一步集成了受信念MDP框架启发的全局信念状态机制,以处理部分可观察性并适应意外的环境变化。在AndroidWorld和我们扩展的长期任务套件上的实验表明,AgentProg在这些基准测试中取得了最先进的成功率。更重要的是,它在长期任务上保持了稳健的性能,而基线方法经历了灾难性的退化。我们的系统在https://github.com/MobileLLM/AgentProg上开源。

更新时间: 2025-12-11 07:37:38

领域: cs.AI

下载: http://arxiv.org/abs/2512.10371v1

LLM-Empowered Representation Learning for Emerging Item Recommendation

In this work, we tackle the challenge of recommending emerging items, whose interactions gradually accumulate over time. Existing methods often overlook this dynamic process, typically assuming that emerging items have few or even no historical interactions. Such an assumption oversimplifies the problem, as a good model must preserve the uniqueness of emerging items while leveraging their shared patterns with established ones. To address this challenge, we propose EmerFlow, a novel LLM-empowered representation learning framework that generates distinctive embeddings for emerging items. It first enriches the raw features of emerging items through LLM reasoning, then aligns these representations with the embedding space of the existing recommendation model. Finally, new interactions are incorporated through meta-learning to refine the embeddings. This enables EmerFlow to learn expressive embeddings for emerging items from only limited interactions. Extensive experiments across diverse domains, including movies and pharmaceuticals, show that EmerFlow consistently outperforms existing methods.

Updated: 2025-12-11 07:36:44

标题: LLM增强的代表性学习用于新兴物品推荐

摘要: 在这项工作中,我们致力于解决推荐新兴项目的挑战,这些项目的互动会随着时间逐渐累积。现有方法通常忽视这一动态过程,通常假设新兴项目有很少甚至没有历史互动。这种假设过于简化了问题,因为一个好的模型必须保留新兴项目的独特性,同时利用它们与已建立项目的共享模式。为了解决这一挑战,我们提出了EmerFlow,一个新颖的LLM增强表示学习框架,为新兴项目生成独特的嵌入。它首先通过LLM推理丰富新兴项目的原始特征,然后将这些表示与现有推荐模型的嵌入空间对齐。最后,通过元学习将新的互动整合进来,以细化嵌入。这使得EmerFlow能够从有限的互动中学习出对新兴项目具有表现力的嵌入。在包括电影和制药在内的各个领域进行了大量实验,结果显示EmerFlow始终优于现有方法。

更新时间: 2025-12-11 07:36:44

领域: cs.AI

下载: http://arxiv.org/abs/2512.10370v1

Aligning ASR Evaluation with Human and LLM Judgments: Intelligibility Metrics Using Phonetic, Semantic, and NLI Approaches

Traditional ASR metrics like WER and CER fail to capture intelligibility, especially for dysarthric and dysphonic speech, where semantic alignment matters more than exact word matches. ASR systems struggle with these speech types, often producing errors like phoneme repetitions and imprecise consonants, yet the meaning remains clear to human listeners. We identify two key challenges: (1) Existing metrics do not adequately reflect intelligibility, and (2) while LLMs can refine ASR output, their effectiveness in correcting ASR transcripts of dysarthric speech remains underexplored. To address this, we propose a novel metric integrating Natural Language Inference (NLI) scores, semantic similarity, and phonetic similarity. Our ASR evaluation metric achieves a 0.890 correlation with human judgments on Speech Accessibility Project data, surpassing traditional methods and emphasizing the need to prioritize intelligibility over error-based measures.
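A minimal sketch of this kind of composite score, with illustrative weights and a crude character-level stand-in for phonetic similarity (the paper's actual components, models, and weighting are not reproduced here):

from difflib import SequenceMatcher

def phonetic_similarity(ref: str, hyp: str) -> float:
    # Crude proxy: character-level similarity between transcripts; a real
    # implementation would compare phoneme sequences.
    return SequenceMatcher(None, ref.lower(), hyp.lower()).ratio()

def intelligibility(nli: float, semantic: float, phonetic: float,
                    weights=(0.4, 0.4, 0.2)) -> float:
    # Weighted blend of three scores in [0, 1]; the weights are assumptions.
    return weights[0] * nli + weights[1] * semantic + weights[2] * phonetic

ref, hyp = "the patient needs water", "the p-patient need water"
print(intelligibility(nli=0.92, semantic=0.88,
                      phonetic=phonetic_similarity(ref, hyp)))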

Updated: 2025-12-11 07:36:11

标题: 将ASR评估与人类和LLM判断相吻合:使用音标、语义和NLI方法的可懂度度量

摘要: 传统的ASR指标如WER和CER未能捕捉到口语清晰度,特别是对于说话困难和声音嘶哑的语音,语义对齐比确切单词匹配更重要。ASR系统在处理这些语音类型时往往会产生错误,比如音素重复和不准确的辅音,但是对人类听众来说意思仍然清晰。我们确定了两个关键挑战:(1)现有的指标无法充分反映口语清晰度,(2)尽管LLMs可以改进ASR输出,但它们在纠正说话困难语音的ASR转录方面的有效性尚未得到充分探讨。为了解决这个问题,我们提出了一种新颖的指标,集成了自然语言推理(NLI)分数、语义相似性和语音相似性。我们的ASR评估指标在Speech Accessibility Project数据上与人类判断的相关性达到0.890,超过了传统方法,并强调了需要优先考虑口语清晰度而不是基于错误的度量。

更新时间: 2025-12-11 07:36:11

领域: cs.LG

下载: http://arxiv.org/abs/2506.16528v2

PaTAS: A Framework for Trust Propagation in Neural Networks Using Subjective Logic

Trustworthiness has become a key requirement for the deployment of artificial intelligence systems in safety-critical applications. Conventional evaluation metrics, such as accuracy and precision, fail to appropriately capture uncertainty or the reliability of model predictions, particularly under adversarial or degraded conditions. This paper introduces the Parallel Trust Assessment System (PaTAS), a framework for modeling and propagating trust in neural networks using Subjective Logic (SL). PaTAS operates in parallel with standard neural computation through Trust Nodes and Trust Functions that propagate input, parameter, and activation trust across the network. The framework defines a Parameter Trust Update mechanism to refine parameter reliability during training and an Inference-Path Trust Assessment (IPTA) method to compute instance-specific trust at inference. Experiments on real-world and adversarial datasets demonstrate that PaTAS produces interpretable, symmetric, and convergent trust estimates that complement accuracy and expose reliability gaps in poisoned, biased, or uncertain data scenarios. The results show that PaTAS effectively distinguishes between benign and adversarial inputs and identifies cases where model confidence diverges from actual reliability. By enabling transparent and quantifiable trust reasoning within neural architectures, PaTAS provides a foundation for evaluating model reliability across the AI lifecycle.
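For readers unfamiliar with Subjective Logic, an opinion is a tuple (belief, disbelief, uncertainty, base rate) with $b + d + u = 1$, whose projected probability is $E = b + a \cdot u$. The sketch below shows only this standard representation; how PaTAS maps activations onto opinions and propagates them is the paper's contribution and is not reproduced:

from dataclasses import dataclass

@dataclass
class Opinion:
    b: float        # belief
    d: float        # disbelief
    u: float        # uncertainty
    a: float = 0.5  # base rate (prior)

    def expected(self) -> float:
        # Projected probability of a subjective-logic opinion.
        assert abs(self.b + self.d + self.u - 1.0) < 1e-9
        return self.b + self.a * self.u

print(Opinion(b=0.7, d=0.1, u=0.2).expected())  # 0.80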

Updated: 2025-12-11 07:35:44

标题: PaTAS: 一种使用主观逻辑在神经网络中传播信任的框架

摘要: 信任度已经成为在安全关键应用中部署人工智能系统的关键要求。传统的评估指标,如准确度和精度,未能恰当地捕捉不确定性或模型预测的可靠性,特别是在敌对或降级条件下。本文介绍了并行信任评估系统(PaTAS),这是一个利用主观逻辑(SL)对神经网络建模和传播信任的框架。PaTAS通过信任节点和信任函数与标准神经计算并行运作,传播网络中的输入、参数和激活信任。该框架定义了一个参数信任更新机制,在训练过程中改进参数可靠性,并引入了推理路径信任评估(IPTA)方法,在推理时计算特定实例的信任。对真实世界和敌对数据集的实验表明,PaTAS产生可解释、对称和收敛的信任估计,补充准确性并揭示在受污染、偏见或不确定数据情景下的可靠性差距。结果显示,PaTAS有效区分良性和敌对输入,并识别模型信心与实际可靠性脱离的情况。通过在神经结构中实现透明和可量化的信任推理,PaTAS为在整个人工智能生命周期中评估模型可靠性提供了基础。

更新时间: 2025-12-11 07:35:44

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2511.20586v3

GPG: Generalized Policy Gradient Theorem for Transformer-based Policies

We present the Generalized Policy Gradient (GPG) Theorem, specifically designed for Transformer-based policies. Notably, we demonstrate that both standard Policy Gradient Theorem and GRPO emerge as special cases within our GPG framework. Furthermore, we explore its practical applications in training Large Language Models (LLMs), offering new insights into efficient policy optimization.
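For reference, the standard Policy Gradient Theorem that the abstract recovers as a special case is $\nabla_\theta J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\big[\sum_t \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, Q^{\pi_\theta}(s_t, a_t)\big]$; how GPG generalizes this expectation to Transformer-based policies (and recovers GRPO) is the paper's contribution and is not restated here.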

Updated: 2025-12-11 07:30:33

标题: GPG: 基于Transformer策略的广义策略梯度定理

摘要: 我们提出了广义策略梯度(GPG)定理,专门为基于Transformer的策略设计。值得注意的是,我们证明标准策略梯度定理和GRPO都可以在我们的GPG框架中作为特例出现。此外,我们探讨了其在训练大型语言模型(LLM)中的实际应用,为有效的策略优化提供了新的见解。

更新时间: 2025-12-11 07:30:33

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2512.10365v1

Visual Funnel: Resolving Contextual Blindness in Multimodal Large Language Models

Multimodal Large Language Models (MLLMs) demonstrate impressive reasoning capabilities, but often fail to perceive fine-grained visual details, limiting their applicability in precision-demanding tasks. While methods that crop salient regions of an image offer a partial solution, we identify a critical limitation they introduce: "Contextual Blindness". This failure occurs due to a structural disconnect between high-fidelity details (from the crop) and the broader global context (from the original image), even when all necessary visual information is present. We argue that this limitation stems not from a lack of information 'Quantity', but from a lack of 'Structural Diversity' in the model's input. To resolve this, we propose Visual Funnel, a training-free, two-step approach. Visual Funnel first performs Contextual Anchoring to identify the region of interest in a single forward pass. It then constructs an Entropy-Scaled Portfolio that preserves the hierarchical context - ranging from focal detail to broader surroundings - by dynamically determining crop sizes based on attention entropy and refining crop centers. Through extensive experiments, we demonstrate that Visual Funnel significantly outperforms naive single-crop and unstructured multi-crop baselines. Our results further validate that simply adding more unstructured crops provides limited or even detrimental benefits, confirming that the hierarchical structure of our portfolio is key to resolving Contextual Blindness.
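One plausible reading of "entropy-scaled" crop sizing is sketched below: diffuse attention (high entropy) yields a larger crop to keep more context, peaked attention a tighter one. The mapping and bounds are our assumptions, not the paper's formula:

import numpy as np

def crop_fraction(attn: np.ndarray, min_frac=0.2, max_frac=0.8) -> float:
    # Normalized attention entropy in [0, 1] interpolates the crop size.
    p = attn.flatten() / attn.sum()
    entropy = -(p * np.log(p + 1e-12)).sum()
    return min_frac + (max_frac - min_frac) * entropy / np.log(p.size)

attn = np.random.rand(24, 24)   # stand-in cross-attention map
print(crop_fraction(attn))      # fraction of the image side to crop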

Updated: 2025-12-11 07:22:54

标题: 视觉漏斗:解决多模态大型语言模型中的语境盲点

摘要: 多模态大型语言模型(MLLMs)展示了令人印象深刻的推理能力,但往往无法感知细粒度的视觉细节,限制了它们在需要精度的任务中的适用性。虽然裁剪图像的显著区域的方法提供了部分解决方案,但我们确定了它们引入的一个关键限制:“上下文盲目”。即使所有必要的视觉信息都存在时,这种失败是由于高保真度细节(来自裁剪)与更广泛的全局上下文(来自原始图像)之间的结构断开导致的。我们认为,这种限制不是由于信息“数量”的缺乏,而是由于模型输入中“结构多样性”的缺乏。为了解决这个问题,我们提出了Visual Funnel,这是一种无需训练的两步方法。Visual Funnel首先执行上下文定位,以在单个前向传递中识别感兴趣的区域。然后通过动态确定基于注意熵的裁剪尺寸并优化裁剪中心,构建一个熵缩放组合,以保留从焦点细节到更广泛周围环境的分层上下文。通过大量实验证明,Visual Funnel明显优于朴素的单个裁剪和非结构化的多裁剪基线。我们的结果进一步验证,简单地添加更多的非结构化裁剪提供了有限甚至有害的好处,从而证实我们组合的分层结构对解决上下文盲目至关重要。

更新时间: 2025-12-11 07:22:54

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2512.10362v1

Bit of a Close Talker: A Practical Guide to Serverless Cloud Co-Location Attacks

Serverless computing has revolutionized cloud computing by offering an efficient and cost-effective way for users to develop and deploy applications without managing infrastructure details. However, serverless cloud users remain vulnerable to various types of attacks, including micro-architectural side-channel attacks. These attacks typically rely on the physical co-location of victim and attacker instances, and attackers will need to exploit cloud schedulers to achieve co-location with victims. Therefore, it is crucial to study vulnerabilities in serverless cloud schedulers and assess the security of different serverless scheduling algorithms. This study addresses the gap in understanding and constructing co-location attacks in serverless clouds. We present a comprehensive methodology to uncover exploitable features in serverless scheduling algorithms and devise strategies for constructing co-location attacks through normal user interfaces. In our experiments, we successfully reveal exploitable vulnerabilities and achieve instance co-location on prevalent open-source infrastructures and Microsoft Azure Functions. We also present a mitigation strategy to defend against co-location attacks in serverless clouds. Our work highlights critical areas for security enhancements in current cloud schedulers, offering insights to fortify serverless computing environments against potential co-location attacks.

Updated: 2025-12-11 07:22:07

标题: 一个近距离接触者:无服务器云共存攻击实用指南

摘要: 无服务器计算已经通过为用户提供一种高效和具有成本效益的方式来开发和部署应用程序,而无需管理基础架构细节,从而改变了云计算。然而,无服务器云用户仍然容易受到各种类型的攻击,包括微体系结构侧信道攻击。这些攻击通常依赖于受害者和攻击者实例的物理共存,攻击者将需要利用云调度程序来实现与受害者的共存。因此,研究无服务器云调度程序的漏洞并评估不同无服务器调度算法的安全性至关重要。本研究填补了对无服务器云中共存攻击的理解和构建的空白。我们提出了一种全面的方法论,以揭示无服务器调度算法中的可利用特性,并通过正常用户界面制定构建共存攻击的策略。在我们的实验中,我们成功地揭示了可利用的漏洞,并在流行的开源基础设施和Microsoft Azure Functions上实现了实例共存。我们还提出了一种缓解策略,以抵御无服务器云中的共存攻击。我们的工作突出了当前云调度程序中需要加强安全性的关键领域,为加强无服务器计算环境抵御潜在共存攻击提供了见解。

更新时间: 2025-12-11 07:22:07

领域: cs.CR,cs.DC

下载: http://arxiv.org/abs/2512.10361v1

Better Prevent than Tackle: Valuing Defense in Soccer Based on Graph Neural Networks

Evaluating defensive performance in soccer remains challenging, as effective defending is often expressed not through visible on-ball actions such as interceptions and tackles, but through preventing dangerous opportunities before they arise. Existing approaches have largely focused on valuing on-ball actions, leaving much of defenders' true impact unmeasured. To address this gap, we propose DEFCON (DEFensive CONtribution evaluator), a comprehensive framework that quantifies player-level defensive contributions for every attacking situation in soccer. Leveraging Graph Attention Networks, DEFCON estimates the success probability and expected value of each attacking option, along with each defender's responsibility for stopping it. These components yield an Expected Possession Value (EPV) for the attacking team before and after each action, and DEFCON assigns positive or negative credits to defenders according to whether they reduced or increased the opponent's EPV. Trained on 2023-24 and evaluated on 2024-25 Eredivisie event and tracking data, DEFCON's aggregated player credits exhibit strong positive correlations with market valuations. Finally, we showcase several practical applications, including in-game timelines of defensive contributions, spatial analyses across pitch zones, and pairwise summaries of attacker-defender interactions.
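The credit-assignment step, as described, reduces to a simple rule: each defender's credit is their estimated responsibility times the change in the opponent's EPV. The sketch below uses made-up responsibility weights and EPV values:

def defensive_credits(epv_before, epv_after, responsibility):
    # Positive delta means the attack's expected value was reduced.
    delta = epv_before - epv_after
    return {name: share * delta for name, share in responsibility.items()}

resp = {"defender_3": 0.6, "defender_5": 0.3, "defender_8": 0.1}
print(defensive_credits(epv_before=0.12, epv_after=0.05, responsibility=resp))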

Updated: 2025-12-11 07:12:23

标题: 比解决更好的是预防:基于图神经网络对足球防守的价值进行评估

摘要: 在足球中评估防守表现仍然具有挑战性,因为有效的防守通常不是通过拦截和抢断等明显的球上动作表现出来,而是通过在危险机会出现之前预防它们。现有方法主要集中在评估球上动作的价值,使得大部分防守球员真实影响力无法被测量。为了填补这一空白,我们提出了DEFCON(DEFensive CONtribution evaluator),这是一个全面的框架,用于量化足球中每个进攻情况下球员级别的防守贡献。利用图注意力网络,DEFCON估计每个进攻选项的成功概率和预期价值,以及每个防守球员对阻止它的责任。这些组件产生了攻击团队在每个行动之前和之后的预期控球价值(EPV),DEFCON根据防守球员是否降低或增加对手的EPV来分配正面或负面的信用。在2023-24赛季训练并在2024-25赛季埃雷迪维西赛事和跟踪数据上评估后,DEFCON的总体球员信用与市场估值呈现强烈正相关。最后,我们展示了几个实际应用,包括防守贡献的比赛时间线,跨球场区域的空间分析,以及进攻者和防守者互动的成对摘要。

更新时间: 2025-12-11 07:12:23

领域: cs.LG,cs.MA

下载: http://arxiv.org/abs/2512.10355v1

RAD: Towards Trustworthy Retrieval-Augmented Multi-modal Clinical Diagnosis

Clinical diagnosis is a highly specialized discipline requiring both domain expertise and strict adherence to rigorous guidelines. While current AI-driven medical research predominantly focuses on knowledge graphs or natural text pretraining paradigms to incorporate medical knowledge, these approaches primarily rely on implicitly encoded knowledge within model parameters, neglecting task-specific knowledge required by diverse downstream tasks. To address this limitation, we propose Retrieval-Augmented Diagnosis (RAD), a novel framework that explicitly injects external knowledge into multimodal models directly on downstream tasks. Specifically, RAD operates through three key mechanisms: retrieval and refinement of disease-centered knowledge from multiple medical sources, a guideline-enhanced contrastive loss that constrains the latent distance between multi-modal features and guideline knowledge, and the dual transformer decoder that employs guidelines as queries to steer cross-modal fusion, aligning the models with clinical diagnostic workflows from guideline acquisition to feature extraction and decision-making. Moreover, recognizing the lack of quantitative evaluation of interpretability for multimodal diagnostic models, we introduce a set of criteria to assess the interpretability from both image and text perspectives. Extensive evaluations across four datasets with different anatomies demonstrate RAD's generalizability, achieving state-of-the-art performance. Furthermore, RAD enables the model to concentrate more precisely on abnormal regions and critical indicators, ensuring evidence-based, trustworthy diagnosis. Our code is available at https://github.com/tdlhl/RAD.

Updated: 2025-12-11 07:09:18

标题: RAD: 朝着值得信赖的检索增强多模式临床诊断

摘要: 临床诊断是一门高度专业化的学科,需要领域专业知识和严格遵守严格的指南。虽然当前由人工智能驱动的医学研究主要集中在知识图或自然文本预训练范式上,以整合医学知识,但这些方法主要依赖于模型参数中隐含编码的知识,忽略了多样化下游任务所需的任务特定知识。为了解决这一局限性,我们提出了检索增强诊断(RAD),这是一个新颖的框架,能够直接在下游任务中将外部知识显式地注入多模态模型中。具体而言,RAD通过三个关键机制运作:从多个医学来源中检索和细化以疾病为中心的知识,利用增强对比损失约束多模态特征与指南知识之间的潜在距离,以及使用指南作为查询的双变压器解码器,引导跨模态融合,使模型与临床诊断工作流程对齐,从指南获取到特征提取和决策制定。此外,鉴于多模态诊断模型的解释性缺乏定量评估,我们引入了一套标准来从图像和文本角度评估解释性。在不同解剖学的四个数据集上进行的广泛评估显示了RAD的泛化能力,实现了最先进的性能。此外,RAD使模型能够更精确地集中在异常区域和关键指标上,确保基于证据的可信诊断。我们的代码可在https://github.com/tdlhl/RAD 上找到。

更新时间: 2025-12-11 07:09:18

领域: cs.LG

下载: http://arxiv.org/abs/2509.19980v2

Dynamics of Agentic Loops in Large Language Models: A Geometric Theory of Trajectories

Agentic systems built on large language models operate through recursive feedback loops, where each output becomes the next input. Yet the geometric behavior of these agentic loops (whether they converge, diverge, or exhibit more complex dynamics) remains poorly understood. This paper introduces a geometric framework for analyzing agentic trajectories in semantic embedding space, treating iterative transformations as discrete dynamical systems. We distinguish the artifact space, where linguistic transformations occur, from the embedding space, where geometric measurements are performed. Because cosine similarity is biased by embedding anisotropy, we introduce an isotonic calibration that eliminates systematic bias and aligns similarities with human semantic judgments while preserving high local stability. This enables rigorous measurement of trajectories, clusters and attractors. Through controlled experiments on singular agentic loops, we identify two fundamental regimes. A contractive rewriting loop converges toward a stable attractor with decreasing dispersion, while an exploratory summarize and negate loop produces unbounded divergence with no cluster formation. These regimes display qualitatively distinct geometric signatures of contraction and expansion. Our results show that prompt design directly governs the dynamical regime of an agentic loop, enabling systematic control of convergence, divergence and trajectory structure in iterative LLM transformations.

Updated: 2025-12-11 07:06:14

标题: 大型语言模型中主动环路的动态:轨迹的几何理论

摘要: 建立在大型语言模型上的主体系统通过递归反馈循环运作,其中每个输出都成为下一个输入。然而,这些主体循环的几何行为(无论它们是收敛、发散还是表现出更复杂的动态)仍然知之甚少。本文介绍了一个用于分析语义嵌入空间中主体轨迹的几何框架,将迭代变换视为离散动态系统。 我们区分了艺术品空间,即语言变换发生的地方,与嵌入空间,即进行几何测量的地方。由于余弦相似性受嵌入各向异性的影响,我们引入了一种等温校准,消除了系统偏差,并将相似性与人类语义判断保持高局部稳定性。这使得能够严格测量轨迹、聚类和吸引子。 通过对特定主体循环的控制实验,我们确定了两种基本模式。一种收缩重写循环向稳定吸引子收敛,分散程度降低,而一种探索性总结和否定循环产生无界发散,没有聚类形成。这些模式展示了收缩和扩展的定性不同的几何特征。 我们的结果表明,提示设计直接决定了主体循环的动态模式,使得能够系统地控制迭代LLM变换中的收敛、发散和轨迹结构。

更新时间: 2025-12-11 07:06:14

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2512.10350v1

REMISVFU: Vertical Federated Unlearning via Representation Misdirection for Intermediate Output Feature

Data-protection regulations such as the GDPR grant every participant in a federated system a right to be forgotten. Federated unlearning has therefore emerged as a research frontier, aiming to remove a specific party's contribution from the learned model while preserving the utility of the remaining parties. However, most unlearning techniques focus on Horizontal Federated Learning (HFL), where data are partitioned by samples. In contrast, Vertical Federated Learning (VFL) allows organizations that possess complementary feature spaces to train a joint model without sharing raw data. The resulting feature-partitioned architecture renders HFL-oriented unlearning methods ineffective. In this paper, we propose REMISVFU, a plug-and-play representation misdirection framework that enables fast, client-level unlearning in splitVFL systems. When a deletion request arrives, the forgetting party collapses its encoder output to a randomly sampled anchor on the unit sphere, severing the statistical link between its features and the global model. To maintain utility for the remaining parties, the server jointly optimizes a retention loss and a forgetting loss, aligning their gradients via orthogonal projection to eliminate destructive interference. Evaluations on public benchmarks show that REMISVFU suppresses back-door attack success to the natural class-prior level and sacrifices only about 2.5 percentage points of clean accuracy, outperforming state-of-the-art baselines.
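The gradient-alignment idea can be sketched with the common projection recipe below: when the retention and forgetting gradients conflict, drop the conflicting component before combining. Whether REMISVFU uses exactly this rule is our assumption; the abstract specifies only that gradients are aligned via orthogonal projection:

import torch

def combine_gradients(g_retain: torch.Tensor, g_forget: torch.Tensor):
    # If the two objectives interfere destructively (negative dot product),
    # remove the forgetting gradient's component along the retention gradient.
    if torch.dot(g_retain, g_forget) < 0:
        g_forget = g_forget - (torch.dot(g_forget, g_retain)
                               / g_retain.norm() ** 2) * g_retain
    return g_retain + g_forget

g_r = torch.tensor([1.0, 0.0])
g_f = torch.tensor([-0.5, 1.0])          # conflicts with g_r
print(combine_gradients(g_r, g_f))       # tensor([1., 1.])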

Updated: 2025-12-11 07:05:36

标题: REMISVFU:通过表示误导进行中间输出特征的垂直联邦去学习

摘要: 数据保护法规,如《通用数据保护条例》(GDPR),授予联合系统中的每个参与者被遗忘的权利。因此,联合遗忘已经成为研究的前沿,旨在从学习的模型中删除特定方的贡献,同时保留其余方的效用。然而,大多数遗忘技术侧重于水平联合学习(HFL),其中数据按样本进行分区。相比之下,垂直联合学习(VFL)允许拥有互补特征空间的组织训练一个联合模型,而无需共享原始数据。由此产生的特征分区架构使得以HFL为导向的遗忘方法无效。在本文中,我们提出了REMISVFU,这是一个即插即用的表示误导框架,可以在splitVFL系统中实现快速、客户端级的遗忘。当删除请求到达时,忘记方将其编码器输出折叠到单位球上的随机采样锚点上,切断其特征与全局模型之间的统计联系。为了保持其余方的效用,服务器联合优化保留损失和遗忘损失,通过正交投影来对齐它们的梯度,消除破坏性干扰。对公共基准进行的评估显示,REMISVFU将后门攻击成功抑制到自然类先验水平,并且仅损失约2.5%的准确度,胜过最先进的基线模型。

更新时间: 2025-12-11 07:05:36

领域: cs.AI

下载: http://arxiv.org/abs/2512.10348v1

Think Before You Drive: World Model-Inspired Multimodal Grounding for Autonomous Vehicles

Interpreting natural-language commands to localize target objects is critical for autonomous driving (AD). Existing visual grounding (VG) methods for autonomous vehicles (AVs) typically struggle with ambiguous, context-dependent instructions, as they lack reasoning over 3D spatial relations and anticipated scene evolution. Grounded in the principles of world models, we propose ThinkDeeper, a framework that reasons about future spatial states before making grounding decisions. At its core is a Spatial-Aware World Model (SA-WM) that learns to reason ahead by distilling the current scene into a command-aware latent state and rolling out a sequence of future latent states, providing forward-looking cues for disambiguation. Complementing this, a hypergraph-guided decoder then hierarchically fuses these states with the multimodal input, capturing higher-order spatial dependencies for robust localization. In addition, we present DrivePilot, a multi-source VG dataset in AD, featuring semantic annotations generated by a Retrieval-Augmented Generation (RAG) and Chain-of-Thought (CoT)-prompted LLM pipeline. In extensive evaluations on six benchmarks, ThinkDeeper ranks #1 on the Talk2Car leaderboard and surpasses state-of-the-art baselines on DrivePilot, MoCAD, and RefCOCO/+/g benchmarks. Notably, it shows strong robustness and efficiency in challenging scenes (long-text, multi-agent, ambiguity) and retains superior performance even when trained on 50% of the data.

Updated: 2025-12-11 07:03:44

标题: 三思而后行:受世界模型启发的自动驾驶车辆多模态基础

摘要: 解释自然语言命令以定位目标对象对自动驾驶(AD)至关重要。现有的自动驾驶车辆(AVs)的视觉基础(VG)方法通常在处理模糊、依赖上下文的指令时遇到困难,因为它们缺乏对3D空间关系和预期场景演变的推理。基于世界模型的原则,我们提出了ThinkDeeper,这是一个在做出基础决策之前推理未来空间状态的框架。其核心是一个空间感知世界模型(SA-WM),通过将当前场景提炼成一个命令感知的潜在状态并展开一系列未来潜在状态,提供前瞻性线索以消除歧义。此外,一个超图引导的解码器随后会层次性地融合这些状态和多模态输入,捕捉高阶空间依赖关系以实现稳健的定位。此外,我们提出了DrivePilot,一个在AD中的多源VG数据集,其中包含通过检索增强生成(RAG)和思维链(CoT)提示的LLM管道生成的语义注释。在六个基准测试上进行了广泛评估,ThinkDeeper在Talk2Car排行榜上排名第一,并在DrivePilot、MoCAD和RefCOCO / + / g基准测试上超越了最先进的基线。值得注意的是,它在具有挑战性场景(长文本、多代理、模糊)中表现出强大的鲁棒性和高效性,即使在仅使用50%的数据进行训练时,仍然保持着优越的性能。

更新时间: 2025-12-11 07:03:44

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2512.03454v2

When Alignment Fails: Multimodal Adversarial Attacks on Vision-Language-Action Models

Vision-Language-Action models (VLAs) have recently demonstrated remarkable progress in embodied environments, enabling robots to perceive, reason, and act through unified multimodal understanding. Despite their impressive capabilities, the adversarial robustness of these systems remains largely unexplored, especially under realistic multimodal and black-box conditions. Existing studies mainly focus on single-modality perturbations and overlook the cross-modal misalignment that fundamentally affects embodied reasoning and decision-making. In this paper, we introduce VLA-Fool, a comprehensive study of multimodal adversarial robustness in embodied VLA models under both white-box and black-box settings. VLA-Fool unifies three levels of multimodal adversarial attacks: (1) textual perturbations through gradient-based and prompt-based manipulations, (2) visual perturbations via patch and noise distortions, and (3) cross-modal misalignment attacks that intentionally disrupt the semantic correspondence between perception and instruction. We further incorporate a VLA-aware semantic space into linguistic prompts, developing the first automatically crafted and semantically guided prompting framework. Experiments on the LIBERO benchmark using a fine-tuned OpenVLA model reveal that even minor multimodal perturbations can cause significant behavioral deviations, demonstrating the fragility of embodied multimodal alignment.
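For concreteness, the simplest member of the gradient-based visual attack family reads as below (an FGSM-style one-step perturbation; the paper's multimodal and black-box attacks are more involved, and `model` and `loss_fn` here are hypothetical placeholders):

import torch

def fgsm_perturb(image: torch.Tensor, model, loss_fn, eps: float = 4 / 255):
    # One-step sign-of-gradient perturbation of the visual input.
    image = image.detach().clone().requires_grad_(True)
    loss_fn(model(image)).backward()
    return (image + eps * image.grad.sign()).clamp(0.0, 1.0).detach()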

Updated: 2025-12-11 06:59:01

标题: 当对齐失败:对视觉-语言-动作模型进行多模态对抗攻击

摘要: 视觉-语言-行动模型(VLAs)最近在具体环境中取得了显著进展,使得机器人能够通过统一的多模态理解来感知、推理和行动。尽管这些系统具有令人印象深刻的能力,但它们的对抗鲁棒性在现实多模态和黑盒条件下仍然很少被探究。现有研究主要集中在单模态扰动上,并忽视了跨模态不对齐对具体推理和决策产生根本影响。在本文中,我们介绍了VLA-Fool,这是对在白盒和黑盒设置下的具体VLA模型进行多模态对抗鲁棒性全面研究。VLA-Fool统一了三个级别的多模态对抗攻击:(1)文本扰动通过基于梯度和基于提示的操作,(2)视觉扰动通过补丁和噪声失真,以及(3)跨模态不对齐攻击,有意破坏感知和指示之间的语义对应关系。我们进一步将VLA感知空间纳入语言提示中,开发了第一个自动制作和语义引导的提示框架。在使用经过微调的OpenVLA模型进行的LIBERO基准测试中,实验证明,即使是轻微的多模态扰动也可能导致显著的行为偏差,展示了具体多模态对齐的脆弱性。

更新时间: 2025-12-11 06:59:01

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2511.16203v3

OutSafe-Bench: A Benchmark for Multimodal Offensive Content Detection in Large Language Models

Since Multimodal Large Language Models (MLLMs) are increasingly being integrated into everyday tools and intelligent agents, growing concerns have arisen regarding their possible output of unsafe contents, ranging from toxic language and biased imagery to privacy violations and harmful misinformation. Current safety benchmarks remain highly limited in both modality coverage and performance evaluations, often neglecting the extensive landscape of content safety. In this work, we introduce OutSafe-Bench, the first most comprehensive content safety evaluation test suite designed for the multimodal era. OutSafe-Bench includes a large-scale dataset that spans four modalities, featuring over 18,000 bilingual (Chinese and English) text prompts, 4,500 images, 450 audio clips and 450 videos, all systematically annotated across nine critical content risk categories. In addition to the dataset, we introduce a Multidimensional Cross Risk Score (MCRS), a novel metric designed to model and assess overlapping and correlated content risks across different categories. To ensure fair and robust evaluation, we propose FairScore, an explainable automated multi-reviewer weighted aggregation framework. FairScore selects top-performing models as adaptive juries, thereby mitigating biases from single-model judgments and enhancing overall evaluation reliability. Our evaluation of nine state-of-the-art MLLMs reveals persistent and substantial safety vulnerabilities, underscoring the pressing need for robust safeguards in MLLMs.

Updated: 2025-12-11 06:53:44

标题: OutSafe-Bench:大型语言模型中多模态攻击性内容检测的基准测试

摘要: 鉴于多模态大型语言模型(MLLMs)日益被整合到日常工具和智能代理中,人们对其可能输出不安全内容的担忧日益加剧,范围涵盖有毒语言、偏见形象、隐私侵犯和有害错误信息。目前的安全基准在模态覆盖和性能评估方面仍然非常有限,通常忽视了广泛的内容安全领域。在这项工作中,我们介绍了OutSafe-Bench,这是第一个专为多模态时代设计的最全面的内容安全评估测试套件。OutSafe-Bench包括一个大规模数据集,涵盖了四种模态,包括超过18,000个双语(中文和英文)文本提示、4,500个图像、450个音频剪辑和450个视频,所有这些内容都在九个关键内容风险类别下进行了系统注释。除了数据集,我们还引入了一种多维交叉风险评分(MCRS),这是一种新颖的度量标准,旨在对不同类别之间的重叠和相关内容风险进行建模和评估。为了确保公平且健壮的评估,我们提出了FairScore,这是一个可解释的自动化多审阅者加权聚合框架。FairScore选择表现最佳的模型作为自适应陪审团,从而减轻了单一模型判断的偏见,并增强了整体评估的可靠性。我们对九种最先进的MLLM进行的评估显示出持续且重大的安全漏洞,强调了在MLLM中加强健全保障的紧迫需求。

更新时间: 2025-12-11 06:53:44

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2511.10287v3

Learning Generalizable Shape Completion with SIM(3) Equivariance

3D shape completion methods typically assume scans are pre-aligned to a canonical frame. This leaks pose and scale cues that networks may exploit to memorize absolute positions rather than inferring intrinsic geometry. When such alignment is absent in real data, performance collapses. We argue that robust generalization demands architectural equivariance to the similarity group, SIM(3), so the model remains agnostic to pose and scale. Following this principle, we introduce the first SIM(3)-equivariant shape completion network, whose modular layers successively canonicalize features, reason over similarity-invariant geometry, and restore the original frame. Under a de-biased evaluation protocol that removes the hidden cues, our model outperforms both equivariant and augmentation baselines on the PCN benchmark. It also sets new cross-domain records on real driving and indoor scans, lowering minimal matching distance on KITTI by 17% and Chamfer distance $\ell_1$ on OmniObject3D by 14%. Perhaps surprisingly, our model under the stricter protocol still outperforms competitors under their biased settings. These results establish full SIM(3) equivariance as an effective route to truly generalizable shape completion. Project page: https://sime-completion.github.io.
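The translation and scale parts of SIM(3) canonicalization are easy to sketch: subtract the centroid and normalize by mean radius, so absolute pose and scale cues vanish. Handling rotation equivariantly is the hard, architecture-specific part and is omitted here:

import numpy as np

def canonicalize(points: np.ndarray):
    # Remove translation and scale; rotation handling is omitted here.
    centroid = points.mean(axis=0)
    centered = points - centroid
    scale = np.linalg.norm(centered, axis=1).mean()
    return centered / scale, (centroid, scale)   # frame kept to restore output

pts = np.random.rand(1024, 3) * 5.0 + 2.0        # arbitrary pose and scale
canon, frame = canonicalize(pts)
print(np.linalg.norm(canon, axis=1).mean())      # ~1.0: scale cue removed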

Updated: 2025-12-11 06:52:51

标题: 学习具有SIM(3)等变性的可泛化形状完成

摘要: 3D形状补全方法通常假设扫描已经预先对齐到一个规范框架。这会泄露姿势和尺度线索,网络可能会利用这些线索来记忆绝对位置,而不是推断内在几何。当真实数据中缺少这种对齐时,性能会下降。我们认为,强大的泛化性需要对相似性群SIM(3)具有架构等变性,因此模型保持对姿势和尺度的不可知性。遵循这一原则,我们引入了第一个SIM(3)-等变形状完成网络,其模块化层逐步使特征规范化,对相似性不变的几何进行推理,并恢复原始框架。在一个去除隐藏线索的去偏评估协议下,我们的模型在PCN基准上表现优于等变和数据增强基线。它还在真实驾驶和室内扫描的跨域记录上取得了新的成绩,将在KITTI上的最小匹配距离降低了17%,在OmniObject3D上的Chamfer距离$\ell1$降低了14%。或许令人惊讶的是,在更严格的协议下,我们的模型仍然优于在其偏见设置下的竞争对手。这些结果确立了完全的SIM(3)等变性作为真正通用形状完成的有效途径。项目页面:https://sime-completion.github.io。

更新时间: 2025-12-11 06:52:51

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2509.26631v3

A Privacy-Preserving Cloud Architecture for Distributed Machine Learning at Scale

Distributed machine learning systems require strong privacy guarantees, verifiable compliance, and scalable deployment across heterogeneous and multi-cloud environments. This work introduces a cloud-native privacy-preserving architecture that integrates federated learning, differential privacy, zero-knowledge compliance proofs, and adaptive governance powered by reinforcement learning. The framework supports secure model training and inference without centralizing sensitive data, while enabling cryptographically verifiable policy enforcement across institutions and cloud platforms. A full prototype deployed across hybrid Kubernetes clusters demonstrates reduced membership-inference risk, consistent enforcement of formal privacy budgets, and stable model performance under differential privacy. Experimental evaluation across multi-institution workloads shows that the architecture maintains utility with minimal overhead while providing continuous, risk-aware governance. The proposed framework establishes a practical foundation for deploying trustworthy and compliant distributed machine learning systems at scale.
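One ingredient named above, enforcing a differential privacy budget, is commonly realized with the DP-SGD recipe sketched below (per-example gradient clipping plus Gaussian noise); the clip norm and noise multiplier are illustrative assumptions, not the paper's settings:

import torch

def privatize(grads: list, clip: float = 1.0, sigma: float = 1.2):
    # Clip each per-example gradient to norm <= clip, average, add noise.
    clipped = []
    for g in grads:
        scale = min(1.0, clip / (g.norm() + 1e-12))
        clipped.append(g * scale)
    avg = torch.stack(clipped).mean(dim=0)
    return avg + torch.randn_like(avg) * sigma * clip / len(grads)

grads = [torch.randn(10) for _ in range(32)]  # stand-in per-example gradients
print(privatize(grads).shape)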

Updated: 2025-12-11 06:46:46

标题: 一个用于大规模分布式机器学习的隐私保护云架构

摘要: 分布式机器学习系统需要强有力的隐私保障、可验证的合规性,并且可以在异构和多云环境中进行可扩展部署。本研究介绍了一种云原生隐私保护架构,该架构集成了联邦学习、差分隐私、零知识合规性证明,并且由强化学习驱动的自适应治理。该框架支持安全的模型训练和推断,而不会将敏感数据集中,同时实现了在机构和云平台间进行加密可验证的策略执行。在混合Kubernetes集群上部署的完整原型展示了降低成员推理风险、一致执行形式隐私预算以及在差分隐私下稳定模型性能。跨多机构工作负载的实验评估显示,该架构在提供持续、风险感知的治理的同时,保持了最小的开销。提出的框架为规模部署可信赖和合规的分布式机器学习系统奠定了实践基础。

更新时间: 2025-12-11 06:46:46

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2512.10341v1

Rethinking Driving World Model as Synthetic Data Generator for Perception Tasks

Recent advancements in driving world models enable controllable generation of high-quality RGB videos or multimodal videos. Existing methods primarily focus on metrics related to generation quality and controllability. However, they often overlook the evaluation of downstream perception tasks, which are really crucial for the performance of autonomous driving. Existing methods usually leverage a training strategy that first pretrains on synthetic data and finetunes on real data, resulting in twice the epochs compared to the baseline (real data only). When we double the epochs in the baseline, the benefit of synthetic data becomes negligible. To thoroughly demonstrate the benefit of synthetic data, we introduce Dream4Drive, a novel synthetic data generation framework designed for enhancing the downstream perception tasks. Dream4Drive first decomposes the input video into several 3D-aware guidance maps and subsequently renders the 3D assets onto these guidance maps. Finally, the driving world model is fine-tuned to produce the edited, multi-view photorealistic videos, which can be used to train the downstream perception models. Dream4Drive enables unprecedented flexibility in generating multi-view corner cases at scale, significantly boosting corner case perception in autonomous driving. To facilitate future research, we also contribute a large-scale 3D asset dataset named DriveObj3D, covering the typical categories in driving scenarios and enabling diverse 3D-aware video editing. We conduct comprehensive experiments to show that Dream4Drive can effectively boost the performance of downstream perception models under various training epochs. Page: https://wm-research.github.io/Dream4Drive/ GitHub Link: https://github.com/wm-research/Dream4Drive

Updated: 2025-12-11 06:46:07

标题: 重新思考将驾驶世界模型作为感知任务的合成数据生成器

摘要: 最近在驾驶世界模型方面取得的进展使得可以可控生成高质量的RGB视频或多模态视频。现有方法主要关注与生成质量和可控性相关的指标。然而,它们经常忽视下游感知任务的评估,这对自动驾驶的性能非常关键。现有方法通常利用先在合成数据上进行预训练,然后在真实数据上进行微调的训练策略,导致所需的训练轮次是基准(仅使用真实数据)的两倍。当我们在基准中将训练轮次加倍时,合成数据的好处变得微不足道。为了全面展示合成数据的好处,我们引入了Dream4Drive,这是一个专为增强下游感知任务而设计的新颖合成数据生成框架。Dream4Drive首先将输入视频分解为几个3D感知的引导地图,然后将3D资产渲染到这些引导地图上。最后,驾驶世界模型被微调以生成编辑过的、多视角逼真的视频,这些视频可用于训练下游感知模型。Dream4Drive实现了在规模上生成多视角边缘案例的前所未有的灵活性,显著提升了自动驾驶中的边缘案例感知。为促进未来研究,我们还提供了一个名为DriveObj3D的大规模3D资产数据集,涵盖了驾驶场景中的典型类别,并支持多样化的3D感知视频编辑。我们进行了全面的实验,展示了Dream4Drive在各种训练轮次下有效提升下游感知模型性能的能力。页面链接:https://wm-research.github.io/Dream4Drive/ GitHub链接:https://github.com/wm-research/Dream4Drive

更新时间: 2025-12-11 06:46:07

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2510.19195v3

On the Collapse of Generative Paths: A Criterion and Correction for Diffusion Steering

Inference-time steering enables pretrained diffusion/flow models to be adapted to new tasks without retraining. A widely used approach is the ratio-of-densities method, which defines a time-indexed target path by reweighting probability-density trajectories from multiple models with positive, or in some cases, negative exponents. This construction, however, harbors a critical and previously unformalized failure mode: Marginal Path Collapse, where intermediate densities become non-normalizable even though endpoints remain valid. Collapse arises systematically when composing heterogeneous models trained on different noise schedules or datasets, including a common setting in molecular design where de-novo, conformer, and pocket-conditioned models must be combined for tasks such as flexible-pose scaffold decoration. We provide a novel and complete solution for the problem. First, we derive a simple path existence criterion that predicts exactly when collapse occurs from noise schedules and exponents alone. Second, we introduce Adaptive path Correction with Exponents (ACE), which extends Feynman-Kac steering to time-varying exponents and guarantees a valid probability path. On a synthetic 2D benchmark and on flexible-pose scaffold decoration, ACE eliminates collapse and enables high-guidance compositional generation, improving distributional and docking metrics over constant-exponent baselines and even specialized task-specific scaffold decoration models. Our work turns ratio-of-densities steering with heterogeneous experts from an unstable heuristic into a reliable tool for controllable generation.
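The construction at issue can be written down directly: given component marginals $p_t^{(i)}$ and exponents $\gamma_i$ (positive or negative), the ratio-of-densities path is $\tilde p_t(x) \propto \prod_i p_t^{(i)}(x)^{\gamma_i}$. The collapse described above is precisely that this product can fail to be normalizable at intermediate $t$ even when it is proper at the endpoints; the paper's criterion predicts this from the noise schedules and exponents alone.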

Updated: 2025-12-11 06:44:08

标题: 关于生成路径崩溃:扩散导向的标准和修正

摘要: 推理时间导向使预训练扩散/流模型能够适应新任务而无需重新训练。一种广泛使用的方法是密度比方法,它通过使用正指数或在某些情况下使用负指数对来自多个模型的概率密度轨迹进行重新加权,从而定义了一个时间索引的目标路径。然而,这种构建存在一个关键的、以前没有形式化的失败模式:边际路径坍塌,即使端点仍然有效,中间密度也变得不可归一化。当组合在不同的噪声时间表或数据集上训练的异质模型时,包括在分子设计中的常见设置,其中必须将全新的、构象和口袋条件模型结合在一起以执行灵活姿态支架装饰等任务时,坍塌会系统地出现。我们提出了这个问题的一种新颖且完整的解决方案。首先,我们推导出一个简单的路径存在准则,仅从噪声时间表和指数就可以准确预测坍塌发生的时机。其次,我们引入了带指数的自适应路径校正(ACE),它将 Feynman-Kac 导向扩展到时变指数,并保证一个有效的概率路径。在一个合成的2D基准测试和灵活姿态支架装饰上,ACE消除了坍塌,并实现了高引导的组合生成,改进了分布和对接指标,超过了恒定指数基线甚至专门针对特定任务的支架装饰模型。我们的工作将具有不稳定性启发式的异质专家的密度比导向转变为一个可靠的可控生成工具。

更新时间: 2025-12-11 06:44:08

领域: cs.AI

下载: http://arxiv.org/abs/2512.10339v1

Noisy Spiking Actor Network for Exploration

As a general method for exploration in deep reinforcement learning (RL), NoisyNet can produce problem-specific exploration strategies. Spiking neural networks (SNNs), due to their binary firing mechanism, have strong robustness to noise, making it difficult to realize efficient exploration with local disturbances. To solve this exploration problem, we propose a noisy spiking actor network (NoisySAN) that introduces time-correlated noise during charging and transmission. Moreover, a noise reduction method is proposed to find a stable policy for the agent. Extensive experimental results demonstrate that our method outperforms the state-of-the-art performance on a wide range of continuous control tasks from OpenAI gym.
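Time-correlated noise of the kind described can be realized with a simple AR(1) process, sketched below; whether NoisySAN uses exactly this process, and the values of rho and sigma, are our assumptions:

import torch

def correlated_noise(steps: int, shape, rho: float = 0.9, sigma: float = 0.1):
    # AR(1): n_t = rho * n_{t-1} + sigma * sqrt(1 - rho^2) * eps_t, which is
    # correlated across timesteps with (asymptotically) constant variance.
    n = torch.zeros(shape)
    out = []
    for _ in range(steps):
        n = rho * n + sigma * (1.0 - rho ** 2) ** 0.5 * torch.randn(shape)
        out.append(n)
    return torch.stack(out)

noise = correlated_noise(steps=16, shape=(128,))  # one draw per neuron per step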

Updated: 2025-12-11 06:42:28

标题: 噪声尖峰演员网络用于探索

摘要: 作为深度强化学习(RL)中探索的一般方法,NoisyNet可以生成特定于问题的探索策略。由于脉冲神经网络(SNN)具有二进制发射机制,对噪声具有较强的鲁棒性,因此难以在局部扰动中实现高效的探索。为了解决这一探索问题,我们提出了一种引入时间相关噪声的有噪脉冲演员网络(NoisySAN)。此外,提出了一种噪声减少方法来为代理找到稳定的策略。大量实验结果表明,我们的方法在从OpenAI gym中的各种连续控制任务中表现优于最先进的性能。

更新时间: 2025-12-11 06:42:28

领域: cs.LG,cs.NE

下载: http://arxiv.org/abs/2403.04162v2

Guiding What Not to Generate: Automated Negative Prompting for Text-Image Alignment

Despite substantial progress in text-to-image generation, achieving precise text-image alignment remains challenging, particularly for prompts with rich compositional structure or imaginative elements. To address this, we introduce Negative Prompting for Image Correction (NPC), an automated pipeline that improves alignment by identifying and applying negative prompts that suppress unintended content. We begin by analyzing cross-attention patterns to explain why both targeted negatives (those directly tied to the prompt's alignment error) and untargeted negatives (tokens unrelated to the prompt but present in the generated image) can enhance alignment. To discover useful negatives, NPC generates candidate prompts using a verifier-captioner-proposer framework and ranks them with a salient text-space score, enabling effective selection without requiring additional image synthesis. On GenEval++ and Imagine-Bench, NPC outperforms strong baselines, achieving 0.571 vs. 0.371 on GenEval++ and the best overall performance on Imagine-Bench. By guiding what not to generate, NPC provides a principled, fully automated route to stronger text-image alignment in diffusion models. Code is released at https://github.com/wiarae/NPC.
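The mechanism NPC automates is ordinary negative prompting in a diffusion pipeline, as in the sketch below; the checkpoint name and the hand-written negative prompt are placeholders, and NPC itself discovers the negative text automatically:

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")  # requires a GPU

image = pipe(
    prompt="a red cube on top of a blue sphere",
    negative_prompt="extra cubes, floating objects",  # content to suppress
).images[0]
image.save("corrected.png")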

Updated: 2025-12-11 06:42:25

标题: 引导生成什么不生成:文本-图像对齐的自动负向提示

摘要: 尽管在文本到图像生成方面取得了重大进展,但实现精确的文本-图像对齐仍然具有挑战性,特别是对于具有丰富构成结构或想象元素的提示。为了解决这个问题,我们引入了一种名为Negative Prompting for Image Correction(NPC)的自动化流程,通过识别并应用抑制意外内容的负面提示来改善对齐。我们首先通过分析交叉注意力模式来解释为什么针对性负面(直接与提示的对齐错误相关的)和非针对性负面(与提示无关但出现在生成的图像中的标记)都可以增强对齐。为了发现有用的负面内容,NPC使用验证器-字幕生成器-提议者框架生成候选提示,并根据显著的文本空间分数对其进行排名,从而能够在不需要额外图像合成的情况下有效选择。在GenEval++和Imagine-Bench上,NPC优于强基线,分别在GenEval++上达到0.571比0.371,同时在Imagine-Bench上取得最佳综合表现。通过引导生成什么不应该生成,NPC提供了一条有原则的、完全自动化的路径,可以在扩散模型中实现更强的文本-图像对齐。代码已发布在https://github.com/wiarae/NPC。

更新时间: 2025-12-11 06:42:25

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2512.07702v2

Multilingual VLM Training: Adapting an English-Trained VLM to French

Artificial intelligence has made great progress in recent years, particularly in the development of Vision-Language Models (VLMs) that understand both visual and textual data. However, these advancements remain largely limited to English, reducing their accessibility for non-English speakers. It is essential to extend these capabilities to a broader range of languages. This paper explores the challenges of adapting an English-trained VLM to different languages. To this end, we explore and compare different methods in terms of their performance and computational cost. We consider a translation-based pipeline, LoRA finetuning, and a two-stage finetuning strategy that separates vision adaptation from language adaptation. To evaluate these methods, we use a combination of standard multimodal benchmarks translated into the target language and manual assessments by native experts. The results reveal that dataset translation remains a major bottleneck in multilingual VLM performance, with data quality limiting the effectiveness of training and evaluation. These findings suggest that future efforts should focus on native-language dataset collection and improved translation strategies.
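Of the three strategies compared, LoRA finetuning is the easiest to sketch; below is a minimal example with the peft library, using a small text-only model as a stand-in (the paper adapts a VLM, and the rank and target modules here are assumptions):

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")  # small text-only stand-in
config = LoraConfig(r=16, lora_alpha=32, target_modules=["c_attn"],
                    lora_dropout=0.05)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the low-rank adapters train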

Updated: 2025-12-11 06:38:51

标题: 多语种VLM培训:将接受英语培训的VLM适应法语

摘要: 人工智能在近年来取得了巨大进展,特别是在视觉-语言模型(VLMs)的发展方面,这些模型能够理解视觉和文本数据。然而,这些进展在很大程度上仍然局限于英语,降低了非英语使用者的可访问性。将这些能力扩展到更广泛的语言是至关重要的。本文探讨了将基于英语训练的VLM适应到不同语言的挑战。为此,我们将探讨并比较不同方法的性能和计算成本。我们考虑了基于翻译的流程、LoRA微调以及将视觉适应和语言适应分开的两阶段微调策略。为了评估这些方法,我们使用了一组标准的多模态基准数据集翻译成目标语言,并由本地专家进行手动评估。结果显示,数据集翻译仍然是多语言VLM性能的一个主要瓶颈,数据质量限制了训练和评估的有效性。这些发现表明,未来的努力应该集中在本地语言数据集收集和改进翻译策略上。

更新时间: 2025-12-11 06:38:51

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2512.10336v1

SAVE: Sparse Autoencoder-Driven Visual Information Enhancement for Mitigating Object Hallucination

Although Multimodal Large Language Models (MLLMs) have advanced substantially, they remain vulnerable to object hallucination caused by language priors and visual information loss. To address this, we propose SAVE (Sparse Autoencoder-Driven Visual Information Enhancement), a framework that mitigates hallucination by steering the model along Sparse Autoencoder (SAE) latent features. A binary object-presence question-answering probe identifies the SAE features most indicative of the model's visual information processing, referred to as visual understanding features. Steering the model along these identified features reinforces grounded visual understanding and effectively reduces hallucination. With its simple design, SAVE outperforms state-of-the-art training-free methods on standard benchmarks, achieving a 10 percentage-point improvement in CHAIR_S and consistent gains on POPE and MMHal-Bench. Extensive evaluations across multiple models and layers confirm the robustness and generalizability of our approach. Further analysis reveals that steering along visual understanding features suppresses the generation of uncertain object tokens and increases attention to image tokens, mitigating hallucination. Code is released at https://github.com/wiarae/SAVE.
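Steering along an SAE feature amounts to adding a scaled decoder direction to a hidden activation, as sketched below; the SAE weights, feature index, and strength are stand-ins, and selecting which features count as "visual understanding" is the paper's probing procedure:

import torch

d_model, d_sae = 64, 512
W_dec = torch.randn(d_sae, d_model)                  # stand-in decoder weights
W_dec = W_dec / W_dec.norm(dim=-1, keepdim=True)     # unit feature directions

def steer(hidden: torch.Tensor, feature_idx: int, alpha: float = 4.0):
    # Add a scaled SAE decoder direction to the activation.
    return hidden + alpha * W_dec[feature_idx]

h = torch.randn(d_model)
h_steered = steer(h, feature_idx=42)  # push along one "visual" feature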

Updated: 2025-12-11 06:37:50

标题: SAVE:用于减轻物体幻觉的稀疏自编码器驱动的视觉信息增强

摘要: 虽然多模态大型语言模型(MLLMs)已经取得了显著进展,但它们仍然容易受到由语言先验和视觉信息丢失引起的对象幻觉的影响。为了解决这个问题,我们提出了SAVE(稀疏自动编码器驱动的视觉信息增强)框架,通过引导模型沿着稀疏自动编码器(SAE)潜在特征来减轻幻觉。一个二进制对象存在的问答探针识别了最能表明模型视觉信息处理的SAE特征,称为视觉理解特征。沿着这些识别出的特征引导模型可以加强基于视觉的理解,并有效减少幻觉。通过其简单的设计,SAVE在标准基准测试中表现优于最先进的无需训练的方法,在CHAIR_S上取得了10个百分点的改进,并在POPE和MMHal-Bench上取得了持续的增益。对多个模型和层次的广泛评估证实了我们方法的鲁棒性和通用性。进一步的分析显示,沿着视觉理解特征引导可以抑制不确定对象令牌的生成,并增加对图像令牌的关注,从而减轻幻觉。代码发布在https://github.com/wiarae/SAVE。

更新时间: 2025-12-11 06:37:50

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2512.07730v2

JITServe: SLO-aware LLM Serving with Imprecise Request Information

The integration of Large Language Models (LLMs) into applications ranging from interactive chatbots to multi-agent systems has introduced a wide spectrum of service-level objectives (SLOs) for responsiveness. These include latency-sensitive requests emphasizing per-token latency in streaming chat, deadline-sensitive requests requiring rapid full responses to trigger external tools, and compound requests with evolving dependencies across multiple LLM calls. Despite, or perhaps because of, this workload diversity and unpredictable request information (e.g., response lengths and dependencies), existing request schedulers have focused on aggregate performance and are unable to ensure application-level SLO needs. This paper presents JITServe, the first SLO-aware LLM serving system designed to maximize service goodput (e.g., the number of tokens meeting request SLOs) across diverse workloads. JITServe novelly schedules requests using imprecise request information and gradually relaxes this conservatism by refining request information estimates as generation progresses. It applies a grouped margin goodput maximization algorithm to allocate just enough serving bandwidth to satisfy each request's SLO just-in-time (JIT), maximizing residual capacity for others, while deciding the composition of requests in a batch to maximize efficiency and goodput with provable guarantees. Our evaluation across diverse realistic workloads, including chat, deep research, and agentic pipelines, shows that JITServe improves service goodput by 1.4x-6.3x, alternatively achieving 28.5%-83.2% resource savings, compared to state-of-the-art designs.
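The just-in-time intuition can be sketched as follows: give each request the minimal decode rate that still meets its deadline, and admit requests in order of required rate until capacity is exhausted. This toy rule is our simplification; the paper's grouped margin goodput maximization and its estimators are not reproduced:

def admit(requests, capacity_tokens_per_s):
    # requests: list of (id, est_remaining_tokens, seconds_to_deadline)
    need = [(rid, rem / max(t, 1e-6)) for rid, rem, t in requests]
    batch, used = [], 0.0
    for rid, rate in sorted(need, key=lambda x: x[1]):  # cheapest SLOs first
        if used + rate <= capacity_tokens_per_s:
            batch.append(rid)
            used += rate
    return batch

print(admit([("a", 200, 2.0), ("b", 500, 1.0), ("c", 100, 5.0)], 400.0))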

Updated: 2025-12-11 06:24:21

标题: JITServe:具有不精确请求信息的SLO感知LLM服务

摘要: 将大型语言模型(LLMs)集成到从交互式聊天机器人到多智能体系统的应用程序中,引入了一系列服务级别目标(SLOs)以确保响应性。这些目标包括对延迟敏感的请求,强调流式聊天中每个标记的延迟,对截止日期敏感的请求,需要快速完整响应以触发外部工具,并且具有跨多个LLM调用的演进依赖的复合请求。尽管(或许正因为如此)存在工作负载的多样性和不可预测的请求信息(例如,响应长度和依赖关系),现有的请求调度程序主要关注聚合性能,无法确保应用级别的SLO需求。 本文介绍了JITServe,这是第一个面向SLO的LLM服务系统,旨在通过不同的工作负载最大化服务吞吐量(例如,符合请求SLO的标记数量)。JITServe通过使用不精确的请求信息来安排请求,并随着生成的进行逐渐放松这种保守性,通过细化请求信息估计。它应用了一种分组边界吞吐量最大化算法,为每个请求分配足够的服务带宽,以及时满足每个请求的SLO,同时最大化其他请求的剩余容量,同时决定批处理中请求的组成,以最大化效率和吞吐量,并提供可证明的保证。我们在各种现实工作负载(包括聊天、深度研究和智能管道)上进行的评估显示,与最先进的设计相比,JITServe将服务吞吐量提高了1.4倍至6.3倍,交替实现了28.5%至83.2%的资源节约。

更新时间: 2025-12-11 06:24:21

领域: cs.DC,cs.LG,eess.SY

下载: http://arxiv.org/abs/2504.20068v2

Bidirectional Representations Augmented Autoregressive Biological Sequence Generation

Autoregressive (AR) models, common in sequence generation, are limited in many biological tasks such as de novo peptide sequencing and protein modeling by their unidirectional nature, failing to capture crucial global bidirectional token dependencies. Non-Autoregressive (NAR) models offer holistic, bidirectional representations but face challenges with generative coherence and scalability. To transcend this, we propose a hybrid framework enhancing AR generation by dynamically integrating rich contextual information from non-autoregressive mechanisms. Our approach couples a shared input encoder with two decoders: a non-autoregressive one learning latent bidirectional biological features, and an AR decoder synthesizing the biological sequence by leveraging these bidirectional features. A novel cross-decoder attention module enables the AR decoder to iteratively query and integrate these bidirectional features, enriching its predictions. This synergy is cultivated via a tailored training strategy with importance annealing for balanced objectives and cross-decoder gradient blocking for stable, focused learning. Evaluations on a demanding nine-species benchmark of de novo peptide sequencing show that our model substantially surpasses AR and NAR baselines. It uniquely harmonizes AR stability with NAR contextual awareness, delivering robust, superior performance on diverse downstream data. This research advances biological sequence modeling techniques and contributes a novel architectural paradigm for augmenting AR models with enhanced bidirectional understanding for complex sequence generation. Code is available at https://github.com/BEAM-Labs/denovo.
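The cross-decoder attention module, as described, has the AR decoder's states query the NAR decoder's bidirectional features; a minimal sketch with stand-in dimensions:

import torch
from torch import nn

attn = nn.MultiheadAttention(embed_dim=256, num_heads=8, batch_first=True)
ar_hidden = torch.randn(2, 10, 256)   # AR decoder states act as queries
nar_feats = torch.randn(2, 32, 256)   # bidirectional NAR features
fused, _ = attn(ar_hidden, nar_feats, nar_feats)  # keys/values from NAR side
print(fused.shape)                    # torch.Size([2, 10, 256])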

Updated: 2025-12-11 06:21:38

标题: 双向表示增强的自回归生物序列生成

摘要: 自回归(AR)模型在序列生成中很常见,但在许多生物学任务中存在限制,比如全新肽段测序和蛋白建模,因为它们是单向的,不能捕捉到关键的全局双向标记依赖关系。非自回归(NAR)模型提供了整体的、双向的表示,但在生成连贯性和可扩展性方面面临挑战。为了突破这一局限,我们提出了一个混合框架,通过动态地整合非自回归机制中丰富的上下文信息来增强AR生成。我们的方法将一个共享的输入编码器与两个解码器相结合:一个学习潜在的双向生物特征的非自回归解码器,以及一个利用这些双向特征合成生物序列的自回归解码器。一个新颖的跨解码器注意力模块使得自回归解码器能够迭代地查询和整合这些双向特征,从而丰富其预测。通过一个定制的训练策略,包括重要性退火以平衡目标和交叉解码器梯度阻塞以稳定、集中学习,实现了这种协同效应。对一个具有挑战性的九种物种基准的全新肽段测序进行评估表明,我们的模型明显超越了AR和NAR的基线。它独特地将AR的稳定性与NAR的上下文意识相融合,为各种下游数据提供了稳健、优越的性能。这项研究推进了生物序列建模技术,并为增强复杂序列生成的自回归模型提供了一种新颖的架构范式。源代码可在https://github.com/BEAM-Labs/denovo中找到。

更新时间: 2025-12-11 06:21:38

领域: cs.LG

下载: http://arxiv.org/abs/2510.08169v3

Residual subspace evolution strategies for nonlinear inverse problems

Nonlinear inverse problems often feature noisy, non-differentiable, or expensive residual evaluations that make Jacobian-based solvers unreliable. Popular derivative-free optimizers such as natural evolution strategies (NES) or Powell's NEWUOA still assume smoothness or expend many evaluations to maintain stability. Ensemble Kalman inversion (EKI) relies on empirical covariances that require preconditioning and scale poorly with residual dimension. We introduce residual subspace evolution strategies (RSES), a derivative-free solver that samples Gaussian probes around the current iterate, builds a residual-only surrogate from their differences, and recombines the probes through a least-squares solve yielding an optimal update without forming Jacobians or covariances. Each iteration costs $k+1$ residual evaluations, where $k \ll n$ for $n$-dimensional problems, with $O(k^3)$ linear algebra overhead. Benchmarks on calibration, regression, and deconvolution problems demonstrate consistent misfit reduction in both deterministic and stochastic settings. RSES matches or surpasses xNES and NEWUOA while staying competitive with EKI under matched evaluation budgets, particularly when smoothness or covariance assumptions fail.
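The abstract is concrete enough to sketch one iteration: sample k Gaussian probes, difference their residuals against the current one, and solve a small least-squares problem to recombine the probes. The toy problem, probe scale, and iteration count below are illustrative:

import numpy as np

def rses_step(residual, x, k=5, sigma=0.1):
    r0 = residual(x)                       # residual at current iterate
    D = sigma * np.random.randn(len(x), k) # k Gaussian probe directions
    R = np.stack([residual(x + D[:, i]) - r0 for i in range(k)], axis=1)
    c, *_ = np.linalg.lstsq(R, -r0, rcond=None)  # residual-only surrogate
    return x + D @ c                       # recombined probe update

# Toy nonlinear inverse problem: recover x from r(x) = x**2 - y.
y = np.array([4.0, 9.0])
x = np.array([1.0, 1.0])
for _ in range(50):
    x = rses_step(lambda z: z ** 2 - y, x)
print(x)  # approaches [2, 3] from this positive start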

Updated: 2025-12-11 06:20:13

标题: 非线性反问题的残余子空间演化策略

摘要: 非线性反问题通常具有噪声、不可微分或昂贵的残差评估,这使得基于雅可比矩阵的求解器不可靠。流行的无导数优化器(如自然进化策略(NES)或Powell的NEWUOA)仍然假设平滑性或需要进行许多评估以维持稳定性。集合卡尔曼反演(EKI)依赖于需要预处理并随着残差维度的增加而缩放的经验协方差。 我们介绍了残差子空间进化策略(RSES),这是一种无导数求解器,它围绕当前迭代点对高斯探测器进行采样,从它们的差异构建仅残差的代理,并通过最小二乘求解重新组合这些探测器,从而产生一个最优更新,而无需形成雅可比矩阵或协方差。每次迭代需要$k+1$个残差评估,其中$k \ll n$适用于$n$维问题,线性代数开销为$O(k^3)$。 在校准、回归和解卷积问题上的基准测试显示,在确定性和随机设置中,RSES可以一致降低误差,而且在匹配的评估预算下,RSES与xNES和NEWUOA相匹配甚至超过它们,在平滑性或协方差假设失败时,与EKI保持竞争力。

更新时间: 2025-12-11 06:20:13

领域: math.OC,cs.LG

下载: http://arxiv.org/abs/2512.10325v1

User-Feedback-Driven Continual Adaptation for Vision-and-Language Navigation

Vision-and-Language Navigation (VLN) requires agents to navigate complex environments by following natural-language instructions. General Scene Adaptation for VLN (GSA-VLN) shifts the focus from zero-shot generalization to continual, environment-specific adaptation, narrowing the gap between static benchmarks and real-world deployment. However, current GSA-VLN frameworks exclude user feedback, relying solely on unsupervised adaptation from repeated environmental exposure. In practice, user feedback offers natural and valuable supervision that can significantly enhance adaptation quality. We introduce a user-feedback-driven adaptation framework that extends GSA-VLN by systematically integrating human interactions into continual learning. Our approach converts user feedback (navigation instructions and corrective signals) into high-quality, environment-aligned training data, enabling efficient and realistic adaptation. A memory-bank warm-start mechanism further reuses previously acquired environmental knowledge, mitigating cold-start degradation and ensuring stable redeployment. Experiments on the GSA-R2R benchmark show that our method consistently surpasses strong baselines such as GR-DUET, improving navigation success and path efficiency. The memory-bank warm start stabilizes early navigation and reduces performance drops after updates. Results under both continual and hybrid adaptation settings confirm the robustness and generality of our framework, demonstrating sustained improvement across diverse deployment conditions.

Updated: 2025-12-11 06:11:45

标题: 用户反馈驱动的视觉与语言导航的持续适应

摘要: 视觉与语言导航(VLN)要求代理根据自然语言指令在复杂环境中导航。用于VLN的普适场景适应(GSA-VLN)将焦点从零样本泛化转向连续的、特定于环境的适应,缩小了静态基准和实际部署之间的差距。然而,当前的GSA-VLN框架排除了用户反馈,仅依赖于从重复环境暴露中的无监督适应。在实践中,用户反馈提供了自然且有价值的监督,可以显著提升适应质量。我们引入了一个用户反馈驱动的适应框架,通过将人类交互系统地整合到连续学习中,扩展了GSA-VLN。我们的方法将用户反馈(导航指令和校正信号)转换成高质量、与环境对齐的训练数据,实现了高效和现实的适应。一个内存库热启动机制进一步重用先前获得的环境知识,缓解了冷启动退化,并确保了稳定的重新部署。在GSA-R2R基准上的实验证明,我们的方法始终优于强基线,如GR-DUET,提高了导航成功率和路径效率。内存库热启动稳定了早期的导航,并减少了更新后的性能下降。在连续和混合适应设置下的结果证实了我们框架的稳健性和普适性,展示了在不同部署条件下持续改进。

更新时间: 2025-12-11 06:11:45

领域: cs.AI

下载: http://arxiv.org/abs/2512.10322v1

Translating Informal Proofs into Formal Proofs Using a Chain of States

We address the problem of translating informal mathematical proofs expressed in natural language into formal proofs in Lean4 under a constrained computational budget. Our approach is grounded in two key insights. First, informal proofs tend to proceed via a sequence of logical transitions - often implications or equivalences - without explicitly specifying intermediate results or auxiliary lemmas. In contrast, formal systems like Lean require an explicit representation of each proof state and the tactics that connect them. Second, each informal reasoning step can be viewed as an abstract transformation between proof states, but identifying the corresponding formal tactics often requires nontrivial domain knowledge and precise control over proof context. To bridge this gap, we propose a two stage framework. Rather than generating formal tactics directly, we first extract a Chain of States (CoS), a sequence of intermediate formal proof states aligned with the logical structure of the informal argument. We then generate tactics to transition between adjacent states in the CoS, thereby constructing the full formal proof. This intermediate representation significantly reduces the complexity of tactic generation and improves alignment with informal reasoning patterns. We build dedicated datasets and benchmarks for training and evaluation, and introduce an interactive framework to support tactic generation from formal states. Empirical results show that our method substantially outperforms existing baselines, achieving higher proof success rates.
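The two-stage pipeline can be paraphrased in a few lines, with `llm` as a hypothetical completion function standing in for the paper's trained models and all Lean interaction omitted:

def translate(informal_proof: str, theorem: str, llm) -> list:
    # Stage 1: extract a chain of intermediate Lean4 proof states aligned
    # with the informal argument's logical steps.
    states = llm(f"List the Lean4 proof states for:\n{theorem}\n"
                 f"{informal_proof}").split("\n")
    tactics = []
    # Stage 2: generate a tactic for each adjacent pair of states.
    for before, after in zip(states, states[1:]):
        tactics.append(llm(f"Give a Lean4 tactic taking\n{before}\nto\n{after}"))
    return tactics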

Updated: 2025-12-11 06:08:34

标题: 将非正式证明转化为形式证明:使用状态链

摘要: 我们研究了将用自然语言表达的非正式数学证明转化为在有限计算预算下在Lean4中的形式证明的问题。我们的方法基于两个关键见解。首先,非正式证明往往通过一系列逻辑转换进行(通常是蕴含或等价关系),而不明确指定中间结果或辅助引理。相比之下,像Lean这样的形式系统需要明确表示每个证明状态及连接它们的策略。其次,每个非正式推理步骤可以被视为证明状态之间的抽象转换,但识别相应的形式策略通常需要非平凡的领域知识和对证明上下文的精确控制。为了弥合这一差距,我们提出了一个两阶段框架。我们没有直接生成形式策略,而是首先提取一个状态链(CoS),这是一个与非正式论证的逻辑结构对齐的中间形式证明状态序列。然后我们生成策略来在CoS中的相邻状态之间进行转换,从而构建完整的形式证明。这种中间表示显著降低了策略生成的复杂性,并提高了与非正式推理模式的对齐度。我们构建了专门的数据集和基准用于训练和评估,并引入了一个交互式框架来支持从形式状态生成策略。实证结果表明我们的方法明显优于现有基线,实现了更高的证明成功率。

更新时间: 2025-12-11 06:08:34

领域: cs.LO,cs.AI

下载: http://arxiv.org/abs/2512.10317v1

EpiPlanAgent: Agentic Automated Epidemic Response Planning

Epidemic response planning is essential yet traditionally reliant on labor-intensive manual methods. This study aimed to design and evaluate EpiPlanAgent, an agent-based system using large language models (LLMs) to automate the generation and validation of digital emergency response plans. The multi-agent framework integrated task decomposition, knowledge grounding, and simulation modules. Public health professionals tested the system using real-world outbreak scenarios in a controlled evaluation. Results demonstrated that EpiPlanAgent significantly improved the completeness and guideline alignment of plans while drastically reducing development time compared to manual workflows. Expert evaluation confirmed high consistency between AI-generated and human-authored content. User feedback indicated strong perceived utility. In conclusion, EpiPlanAgent provides an effective, scalable solution for intelligent epidemic response planning, demonstrating the potential of agentic AI to transform public health preparedness.

Updated: 2025-12-11 06:03:17

标题: EpiPlanAgent:主动自动化疫情应对规划

摘要: 流行病应对计划至关重要,但传统上依赖于繁重的手工方法。本研究旨在设计和评估EpiPlanAgent,一个使用大型语言模型(LLMs)自动化生成和验证数字紧急应对计划的基于代理的系统。这个多代理框架集成了任务分解、知识基础和模拟模块。公共卫生专业人员在受控评估中使用真实世界的爆发场景测试了系统。结果表明,与手工工作流相比,EpiPlanAgent显著提高了计划的完整性和指南对齐性,同时大幅缩短了开发时间。专家评估确认了AI生成和人工编写内容之间高度一致性。用户反馈表明了强烈的感知实用性。总之,EpiPlanAgent提供了一个有效、可扩展的智能流行病应对计划解决方案,展示了代理AI改变公共卫生应急准备的潜力。

更新时间: 2025-12-11 06:03:17

领域: cs.AI,cs.CY

下载: http://arxiv.org/abs/2512.10313v1

High-Dimensional Data Processing: Benchmarking Machine Learning and Deep Learning Architectures in Local and Distributed Environments

This document reports the sequence of practices and methodologies implemented during the Big Data course. It details the workflow beginning with the processing of the Epsilon dataset through group and individual strategies, followed by text analysis and classification with RestMex and movie feature analysis with IMDb. Finally, it describes the technical implementation of a distributed computing cluster with Apache Spark on Linux using Scala.

Updated: 2025-12-11 06:02:13

标题: 高维数据处理:在本地和分布式环境中对机器学习和深度学习架构进行基准测试

摘要: 本文报告了在大数据课程中实施的一系列实践和方法。它详细描述了从通过小组和个人策略处理Epsilon数据集开始的工作流程,随后是使用RestMex进行的文本分析和分类,以及基于IMDb的电影特征分析。最后,它描述了在Linux上使用Scala搭建Apache Spark分布式计算集群的技术实现。

更新时间: 2025-12-11 06:02:13

领域: cs.DC,cs.AI

下载: http://arxiv.org/abs/2512.10312v1

TranSimHub: A Unified Air-Ground Simulation Platform for Multi-Modal Perception and Decision-Making

Air-ground collaborative intelligence is becoming a key approach for next-generation urban intelligent transportation management, where aerial and ground systems work together on perception, communication, and decision-making. However, the lack of a unified multi-modal simulation environment has limited progress in studying cross-domain perception, coordination under communication constraints, and joint decision optimization. To address this gap, we present TranSimHub, a unified simulation platform for air-ground collaborative intelligence. TranSimHub offers synchronized multi-view rendering across RGB, depth, and semantic segmentation modalities, ensuring consistent perception between aerial and ground viewpoints. It also supports information exchange between the two domains and includes a causal scene editor that enables controllable scenario creation and counterfactual analysis under diverse conditions such as different weather, emergency events, and dynamic obstacles. We release TranSimHub as an open-source platform that supports end-to-end research on perception, fusion, and control across realistic air and ground traffic scenes. Our code is available at https://github.com/Traffic-Alpha/TransSimHub.

Updated: 2025-12-11 05:55:53

标题: TranSimHub:一个统一的空中地面模拟平台,用于多模态感知和决策

摘要: 空地协同智能正成为下一代城市智能交通管理的关键方法,其中空中和地面系统在感知、通信和决策方面协同工作。然而,缺乏统一的多模态模拟环境限制了研究跨领域感知、在通信约束下的协调以及联合决策优化的进展。为了填补这一空白,我们提出了TranSimHub,一个用于空地协同智能的统一模拟平台。TranSimHub提供了跨RGB、深度和语义分割模态的同步多视图渲染,确保了空中和地面视点之间的一致感知。它还支持两个领域之间的信息交换,并包括一个因果场景编辑器,可以在不同天气、紧急事件和动态障碍物等多种条件下进行可控的场景创建和反事实分析。我们将TranSimHub作为一个开源平台发布,支持对现实空中和地面交通场景中的感知、融合和控制进行端到端研究。我们的代码可以在https://github.com/Traffic-Alpha/TransSimHub 上找到。

更新时间: 2025-12-11 05:55:53

领域: eess.SY,cs.LG,cs.MA

下载: http://arxiv.org/abs/2510.15365v2

Tracking large chemical reaction networks and rare events by neural networks

Chemical reaction networks are widely used to model stochastic dynamics in chemical kinetics, systems biology and epidemiology. Solving the chemical master equation that governs these systems poses a significant challenge due to the large state space growing exponentially with system size. The development of autoregressive neural networks offers a flexible framework for this problem; however, its efficiency is limited, especially for high-dimensional systems and in scenarios with rare events. Here, we push the frontier of the neural-network approach by exploiting faster optimizations such as natural gradient descent and the time-dependent variational principle, achieving a 5- to 22-fold speedup, and by leveraging enhanced-sampling strategies to capture rare events. We demonstrate reduced computational cost and higher accuracy over the previous neural-network method on challenging reaction networks, including the mitogen-activated protein kinase (MAPK) cascade network, the largest biological network handled to date by approaches that solve the chemical master equation. We further apply the approach to spatially extended reaction-diffusion systems on two-dimensional lattices, namely the Schlögl model with rare events, going beyond the recent tensor-network approach that handles one-dimensional lattices. The present approach thus enables efficient modeling of chemical reaction networks in general.
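
For reference, the chemical master equation being solved has the standard textbook form below, where x is the vector of molecule counts, nu_r the stoichiometric change of reaction r, and a_r its propensity; this formulation is included for context and is not specific to the paper.

    \frac{\partial P(\mathbf{x}, t)}{\partial t}
      \;=\; \sum_{r} \Big[ a_r(\mathbf{x} - \boldsymbol{\nu}_r)\, P(\mathbf{x} - \boldsymbol{\nu}_r, t)
      \;-\; a_r(\mathbf{x})\, P(\mathbf{x}, t) \Big]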

Updated: 2025-12-11 05:55:44

标题: 使用神经网络跟踪大型化学反应网络和稀有事件

摘要: 化学反应网络被广泛用于模拟化学动力学、系统生物学和流行病学中的随机动态。解决控制这些系统的化学主方程的问题由于随着系统规模的增大,状态空间呈指数增长而面临重大挑战。自回归神经网络的发展为这一问题提供了一个灵活的框架;然而,其效率受到限制,特别是对于高维系统和罕见事件的情况。在这里,我们通过利用更快的优化方法,如自然梯度下降和时间依赖变分原理,实现了5到22倍的加速,并利用增强采样策略捕捉罕见事件。我们证明了在具有挑战性的反应网络中,包括迄今为止由解决化学主方程的先前方法处理的最大生物网络——有丝分裂原活化蛋白激酶(MAPK)级联网络,我们的方法比先前的神经网络方法具有更低的计算成本和更高的准确性。我们进一步将这种方法应用于空间扩展的反应扩散系统,Schlögl模型中的罕见事件,在二维点阵上,超出了最近处理一维点阵的张量网络方法。因此,目前的方法使得一般化学反应网络的有效建模成为可能。

更新时间: 2025-12-11 05:55:44

领域: q-bio.MN,cs.LG,physics.bio-ph

下载: http://arxiv.org/abs/2512.10309v1

Optimizing Drivers' Discount Order Acceptance Strategies: A Policy-Improved Deep Deterministic Policy Gradient Framework

The rapid expansion of platform integration has emerged as an effective solution to mitigate market fragmentation by consolidating multiple ride-hailing platforms into a single application. To address heterogeneous passenger preferences, third-party integrators provide Discount Express service delivered by express drivers at lower trip fares. For the individual platform, encouraging broader participation of drivers in Discount Express services has the potential to expand the accessible demand pool and improve matching efficiency, but often at the cost of reduced profit margins. This study aims to dynamically manage drivers' acceptance of Discount Express from the perspective of an individual platform. The lack of historical data under the new business model necessitates online learning. However, early-stage exploration through trial and error can be costly in practice, highlighting the need for reliable early-stage performance in real-world deployment. To address these challenges, this study formulates the decision regarding the proportion of drivers accepting discount orders as a continuous control task. In response to the high stochasticity and the opaque matching mechanisms employed by third-party integrator, we propose an innovative policy-improved deep deterministic policy gradient (pi-DDPG) framework. The proposed framework incorporates a refiner module to boost policy performance during the early training phase. A customized simulator based on a real-world dataset is developed to validate the effectiveness of the proposed pi-DDPG. Numerical experiments demonstrate that pi-DDPG achieves superior learning efficiency and significantly reduces early-stage training losses, enhancing its applicability to practical ride-hailing scenarios.
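
Since the abstract does not spell out the update rule, the sketch below shows a standard DDPG step, the base algorithm that pi-DDPG extends; the paper's refiner module, which boosts the policy during early training, is omitted here, so treat this as background rather than the proposed method.

    import torch
    import torch.nn.functional as F

    def ddpg_step(actor, critic, target_actor, target_critic,
                  batch, actor_opt, critic_opt, gamma=0.99, tau=0.005):
        s, a, r, s2, done = batch  # tensors sampled from a replay buffer

        # Critic: regress Q(s, a) onto the bootstrapped one-step target.
        with torch.no_grad():
            q_target = r + gamma * (1 - done) * target_critic(s2, target_actor(s2))
        critic_loss = F.mse_loss(critic(s, a), q_target)
        critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

        # Actor: ascend the critic's estimate of Q(s, pi(s)).
        actor_loss = -critic(s, actor(s)).mean()
        actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

        # Polyak-average the target networks for stability.
        for net, tgt in ((actor, target_actor), (critic, target_critic)):
            for p, tp in zip(net.parameters(), tgt.parameters()):
                tp.data.mul_(1 - tau).add_(tau * p.data)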

Updated: 2025-12-11 05:55:14

标题: 优化司机的折扣订单接受策略:一种政策改进的深度确定性策略梯度框架

摘要: 平台整合的迅速扩张已成为缓解市场碎片化的有效解决方案,通过将多个网约车平台整合为单一应用程序。为了满足异质乘客偏好,第三方集成商提供由快递司机提供的 Discount Express 服务,以更低的行程费用。对于单个平台来说,鼓励更多的司机参与 Discount Express 服务有扩大可访问需求池和提高匹配效率的潜力,但往往会以降低利润率为代价。本研究旨在从单个平台的角度动态管理司机对 Discount Express 的接受度。在新商业模式下缺乏历史数据,需要进行在线学习。然而,在实践中,通过试错的早期探索可能成本高昂,突显了在真实部署中需要可靠的早期性能。为了解决这些挑战,本研究将司机接受折扣订单比例的决策形式化为连续控制任务。针对第三方集成商采用的高随机性和不透明的匹配机制,我们提出了一种创新的策略改进深度确定性策略梯度(pi-DDPG)框架。所提出的框架包括一个优化模块,用于在早期训练阶段提升策略性能。基于真实数据集开发了一个定制的模拟器,用于验证所提出的 pi-DDPG 的有效性。数值实验表明,pi-DDPG 实现了优越的学习效率,并显著减少了早期阶段的培训损失,增强了其在实际网约车场景中的适用性。

更新时间: 2025-12-11 05:55:14

领域: cs.LG

下载: http://arxiv.org/abs/2507.11865v2

An Interpretable AI Tool for SAVR vs TAVR in Low to Intermediate Risk Patients with Severe Aortic Stenosis

Background. Treatment selection for low to intermediate risk patients with severe aortic stenosis between surgical (SAVR) and transcatheter (TAVR) aortic valve replacement remains variable in clinical practice, driven by patient heterogeneity and institutional preferences. While existing models predict postprocedural risk, there is a lack of interpretable, individualized treatment recommendations that directly optimize long-term outcomes. Methods. We introduce an interpretable prescriptive framework that integrates prognostic matching, counterfactual outcome modeling, and an Optimal Policy Tree (OPT) to recommend the treatment minimizing expected 5-year mortality. Using data from Hartford Hospital and St. Vincent's Hospital, we emulate randomization via prognostic matching and sample weighting and estimate counterfactual mortality under both SAVR and TAVR. The policy model, trained on these counterfactual predictions, partitions patients into clinically coherent subgroups and prescribes the treatment associated with lower estimated risk. Findings. If the OPT prescriptions are applied, counterfactual evaluation showed an estimated reduction in 5-year mortality of 20.3% in Hartford and 13.8% in St. Vincent's relative to real-life prescriptions, showing promising generalizability to unseen data from a different institution. The learned decision boundaries aligned with real-world outcomes and clinical observations. Interpretation. Our interpretable prescriptive framework is, to the best of our knowledge, the first to provide transparent, data-driven recommendations for TAVR versus SAVR that improve estimated long-term outcomes both in an internal and external cohort, while remaining clinically grounded and contributing toward a more systematic and evidence-based approach to precision medicine in structural heart disease.
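
A minimal sketch of the counterfactual step, in the spirit of a T-learner (the scikit-learn models below and the direct argmin prescription are simplifying assumptions; the study additionally fits an Optimal Policy Tree on such counterfactual predictions for interpretability):

    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier

    def fit_counterfactual_models(X, treatment, died_5y):
        """Fit one 5-year mortality model per treatment arm."""
        m_savr = GradientBoostingClassifier().fit(X[treatment == "SAVR"],
                                                  died_5y[treatment == "SAVR"])
        m_tavr = GradientBoostingClassifier().fit(X[treatment == "TAVR"],
                                                  died_5y[treatment == "TAVR"])
        return m_savr, m_tavr

    def prescribe(m_savr, m_tavr, X_new):
        """Recommend the arm with the lower estimated 5-year mortality."""
        risk_savr = m_savr.predict_proba(X_new)[:, 1]
        risk_tavr = m_tavr.predict_proba(X_new)[:, 1]
        return np.where(risk_savr <= risk_tavr, "SAVR", "TAVR")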

Updated: 2025-12-11 05:54:22

标题: 一种可解释的人工智能工具:用于重度主动脉瓣狭窄低中风险患者的SAVR与TAVR比较

摘要: 背景。在临床实践中,对于低到中度风险严重主动脉瓣狭窄患者的治疗选择在外科主动脉瓣置换术(SAVR)和经导管主动脉瓣置换术(TAVR)之间仍存在差异,这取决于患者异质性和医疗机构的偏好。虽然现有模型可以预测术后风险,但缺乏可解释的、个性化的治疗建议,直接优化长期结果。 方法。我们引入了一个可解释的处方框架,将预后匹配、反事实结果建模和最佳策略树(OPT)相结合,以推荐最小化预期5年死亡率的治疗方案。利用哈特福德医院和圣文森特医院的数据,通过预后匹配和样本加权来模拟随机化,并估计SAVR和TAVR下的反事实死亡率。在这些反事实预测上训练的策略模型,将患者分成临床上连贯的亚组,并推荐与较低估计风险相关的治疗。 发现。如果应用OPT处方,反事实评估显示,在哈特福德和圣文森特相对于实际处方,预计5年死亡率分别减少20.3%和13.8%,显示出对来自不同机构的未见数据具有良好的概括性。学习到的决策边界与现实世界的结果和临床观察一致。 解释。据我们所知,我们的可解释的处方框架是首个提供透明、数据驱动的TAVR与SAVR建议,可以改善内部和外部队列中估计长期结果,同时保持临床基础,并促进结构性心脏疾病精准医学更系统和基于证据的方法。

更新时间: 2025-12-11 05:54:22

领域: cs.LG

下载: http://arxiv.org/abs/2512.10308v1

InfoCom: Kilobyte-Scale Communication-Efficient Collaborative Perception with Information Bottleneck

Precise environmental perception is critical for the reliability of autonomous driving systems. While collaborative perception mitigates the limitations of single-agent perception through information sharing, it encounters a fundamental communication-performance trade-off. Existing communication-efficient approaches typically assume MB-level data transmission per collaboration, which may fail due to practical network constraints. To address these issues, we propose InfoCom, an information-aware framework establishing the pioneering theoretical foundation for communication-efficient collaborative perception via extended Information Bottleneck principles. Departing from mainstream feature manipulation, InfoCom introduces a novel information purification paradigm that theoretically optimizes the extraction of minimal sufficient task-critical information under Information Bottleneck constraints. Its core innovations include: i) An Information-Aware Encoding condensing features into minimal messages while preserving perception-relevant information; ii) A Sparse Mask Generation identifying spatial cues with negligible communication cost; and iii) A Multi-Scale Decoding that progressively recovers perceptual information through mask-guided mechanisms rather than simple feature reconstruction. Comprehensive experiments across multiple datasets demonstrate that InfoCom achieves near-lossless perception while reducing communication overhead from megabyte to kilobyte-scale, representing 440-fold and 90-fold reductions per agent compared to Where2comm and ERMVP, respectively.
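
For reference, the classical Information Bottleneck objective that InfoCom extends reads as follows, with X the raw observation, Z the transmitted message, Y the perception target, and beta trading compression against task relevance; this is the textbook form, not the paper's extended variant.

    \min_{p(z \mid x)} \;\; I(X; Z) \;-\; \beta \, I(Z; Y)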

Updated: 2025-12-11 05:51:02

标题: InfoCom:使用信息瓶颈的千字节级通信高效协作感知

摘要: 精确的环境感知对于自动驾驶系统的可靠性至关重要。虽然协作感知通过信息共享减轻了单一代理的感知限制,但它遇到了基本的通信性能权衡。现有的通信高效方法通常假设每次协作传输MB级数据,但由于实际网络约束可能会失败。为了解决这些问题,我们提出了InfoCom,这是一个信息感知框架,通过扩展信息瓶颈原理建立了通信高效的协作感知的开创性理论基础。InfoCom摒弃了主流的特征处理方式,引入了一种新颖的信息净化范式,理论上优化了在信息瓶颈约束下提取最小充分的任务关键信息。其核心创新包括:i)信息感知编码将特征压缩成最小信息,同时保留感知相关信息;ii)稀疏掩码生成识别具有可忽略通信成本的空间线索;和iii)多尺度解码,通过掩码引导机制逐步恢复感知信息,而不是简单的特征重建。跨多个数据集的综合实验表明,InfoCom实现了接近无损的感知,同时将通信开销从兆字节降低到千字节规模,分别与Where2comm和ERMVP相比,每个代理的降低幅度分别为440倍和90倍。

更新时间: 2025-12-11 05:51:02

领域: cs.AI

下载: http://arxiv.org/abs/2512.10305v1

Trustworthy Orchestration Artificial Intelligence by the Ten Criteria with Control-Plane Governance

As Artificial Intelligence (AI) systems increasingly assume consequential decision-making roles, a widening gap has emerged between technical capabilities and institutional accountability. Ethical guidance alone is insufficient to counter this challenge; it demands architectures that embed governance into the execution fabric of the ecosystem. This paper presents the Ten Criteria for Trustworthy Orchestration AI, a comprehensive assurance framework that integrates human input, semantic coherence, and audit and provenance integrity into a unified Control-Plane architecture. Unlike conventional agentic AI initiatives that focus primarily on AI-to-AI coordination, the proposed framework provides an umbrella of governance over all AI components, their consumers, and human participants. Taking inspiration from international standards and Australia's National Framework for AI Assurance initiative, this work demonstrates that trustworthiness can be systematically engineered into AI systems, ensuring the execution fabric remains verifiable, transparent, reproducible, and under meaningful human control.

Updated: 2025-12-11 05:49:26

标题: 基于十项标准与控制平面治理的可信编排人工智能

摘要: 随着人工智能(AI)系统越来越多地承担重要决策角色,技术能力和机构问责之间出现了日益扩大的差距。仅靠道德指导是不足以应对这一挑战的;这需要将治理嵌入到生态系统执行结构中的架构。本文提出了值得信赖的编排AI的十项标准,这是一个综合保证框架,将人类输入、语义的一致性、审计和出处完整性整合到一个统一的控制平面架构中。与传统的以AI-AI协调为主要焦点的智能体AI计划不同,所提出的框架为整个AI组件、其消费者和人类参与者提供了一个治理的保护伞。通过借鉴国际标准和澳大利亚AI保障框架倡议,这项工作表明,可信度可以通过工程化的方式系统地纳入AI系统中,确保执行结构保持可验证、透明、可复现,并在有意义的人类掌控之下。

更新时间: 2025-12-11 05:49:26

领域: cs.AI,cs.ET

下载: http://arxiv.org/abs/2512.10304v1

Investigating The Functional Roles of Attention Heads in Vision Language Models: Evidence for Reasoning Modules

Despite excelling on multimodal benchmarks, vision-language models (VLMs) largely remain a black box. In this paper, we propose a novel interpretability framework to systematically analyze the internal mechanisms of VLMs, focusing on the functional roles of attention heads in multimodal reasoning. To this end, we introduce CogVision, a dataset that decomposes complex multimodal questions into step-by-step subquestions designed to simulate human reasoning through a chain-of-thought paradigm, with each subquestion associated with specific receptive or cognitive functions such as high-level visual reception and inference. Using a probing-based methodology, we identify attention heads that specialize in these functions and characterize them as functional heads. Our analysis across diverse VLM families reveals that these functional heads are universally sparse, vary in number and distribution across functions, and mediate interactions and hierarchical organization. Furthermore, intervention experiments demonstrate their critical role in multimodal reasoning: removing functional heads leads to performance degradation, while emphasizing them enhances accuracy. These findings provide new insights into the cognitive organization of VLMs and suggest promising directions for designing models with more human-aligned perceptual and reasoning abilities.

Updated: 2025-12-11 05:42:53

标题: 研究视觉语言模型中注意力头的功能角色:推理模块的证据

摘要: 尽管在多模态基准测试中表现出色,但视觉语言模型(VLMs)在很大程度上仍然是一个黑盒子。在本文中,我们提出了一个新颖的可解释性框架,系统地分析VLMs的内部机制,重点关注多模态推理中注意力头的功能角色。为此,我们引入了CogVision,一个将复杂的多模态问题分解为逐步子问题的数据集,旨在通过一种思维链范式模拟人类推理,每个子问题与特定的接收或认知功能相关联,如高级视觉接收和推理。使用基于探测的方法,我们识别出专门从事这些功能的注意力头,并将它们描述为功能性头。我们在不同VLM家族中的分析显示,这些功能性头普遍稀疏,数量和分布在功能之间有所变化,并调节交互和层次组织。此外,干预实验证明了它们在多模态推理中的关键作用:移除功能性头会导致性能下降,而强调它们则提高准确性。这些发现为VLMs的认知组织提供了新的见解,并为设计具有更符合人类感知和推理能力的模型提供了有希望的方向。

更新时间: 2025-12-11 05:42:53

领域: cs.AI

下载: http://arxiv.org/abs/2512.10300v1

Meta-learning three-factor plasticity rules for structured credit assignment with sparse feedback

Biological neural networks learn complex behaviors from sparse, delayed feedback using local synaptic plasticity, yet the mechanisms enabling structured credit assignment remain elusive. In contrast, artificial recurrent networks solving similar tasks typically rely on biologically implausible global learning rules or hand-crafted local updates. The space of local plasticity rules capable of supporting learning from delayed reinforcement remains largely unexplored. Here, we present a meta-learning framework that discovers local learning rules for structured credit assignment in recurrent networks trained with sparse feedback. Our approach interleaves local neo-Hebbian-like updates during task execution with an outer loop that optimizes plasticity parameters via tangent-propagation through learning. The resulting three-factor learning rules enable long-timescale credit assignment using only local information and delayed rewards, offering new insights into biologically grounded mechanisms for learning in recurrent circuits.
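
The generic shape of a three-factor rule with an eligibility trace is easy to state; the NumPy sketch below is schematic, and the gate and trace parameterizations actually meta-learned in the paper are not specified in the abstract.

    import numpy as np

    def three_factor_update(w, e, pre, post, reward, eta=1e-3, decay=0.9):
        """Hebbian co-activity accumulates in a decaying eligibility trace `e`;
        a delayed scalar reward later converts the trace into a weight change."""
        e = decay * e + np.outer(post, pre)  # factors 1 and 2: pre/post-synaptic activity
        w = w + eta * reward * e             # factor 3: neuromodulatory reward signal
        return w, e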

Updated: 2025-12-11 05:38:49

标题: 元学习三因素可塑性规则用于稀疏反馈下的结构化信用分配

摘要: 生物神经网络通过局部突触可塑性从稀疏的延迟反馈中学习复杂行为,然而支持结构化信用分配的机制仍然是神秘的。相反,解决类似任务的人工递归网络通常依赖于生物不可信的全局学习规则或手工制作的局部更新。能够支持从延迟强化中学习的局部可塑性规则空间仍然大部分未被探索。在这里,我们提出了一个元学习框架,发现了适用于稀疏反馈训练的递归网络中结构化信用分配的局部学习规则。我们的方法在任务执行过程中交替进行局部新赫布式更新,同时通过“切线传播学习”优化可塑性参数的外部循环。由此产生的三因素学习规则仅使用本地信息和延迟奖励就能实现长时间尺度的信用分配,为递归电路中的学习提供了新的生物基础机制洞见。

更新时间: 2025-12-11 05:38:49

领域: q-bio.NC,cond-mat.dis-nn,cs.LG,physics.bio-ph

下载: http://arxiv.org/abs/2512.09366v2

Human or AI? Comparing Design Thinking Assessments by Teaching Assistants and Bots

As design thinking education grows in secondary and tertiary contexts, educators face the challenge of evaluating creative artefacts that combine visual and textual elements. Traditional rubric-based assessment is laborious, time-consuming, and inconsistent due to reliance on Teaching Assistants (TA) in large, multi-section cohorts. This paper presents an exploratory study investigating the reliability and perceived accuracy of AI-assisted assessment compared to TA-assisted assessment in evaluating student posters in design thinking education. Two activities were conducted with 33 Ministry of Education (MOE) Singapore school teachers to (1) compare AI-generated scores with TA grading across three key dimensions: empathy and user understanding, identification of pain points and opportunities, and visual communication, and (2) examine teacher preferences for AI-assigned, TA-assigned, and hybrid scores. Results showed low statistical agreement between instructor and AI scores for empathy and pain points, with slightly higher alignment for visual communication. Teachers preferred TA-assigned scores in six of ten samples. Qualitative feedback highlighted the potential of AI for formative feedback, consistency, and student self-reflection, but raised concerns about its limitations in capturing contextual nuance and creative insight. The study underscores the need for hybrid assessment models that integrate computational efficiency with human insights. This research contributes to the evolving conversation on responsible AI adoption in creative disciplines, emphasizing the balance between automation and human judgment for scalable and pedagogically sound assessment.

Updated: 2025-12-11 05:38:34

标题: 人类还是人工智能?通过助教和机器人比较设计思维评估

摘要: 随着设计思维教育在中学和高等教育领域的发展,教育工作者面临着评估结合视觉和文本元素的创意作品的挑战。传统的基于评分表的评估方式繁琐、耗时且不一致,因为依赖于大规模、多部分班的助教。本文介绍了一项探索性研究,研究了AI辅助评估与助教辅助评估在评估设计思维教育中学生海报时的可靠性和感知准确性。通过与新加坡教育部(MOE)的33名学校教师进行两项活动,旨在(1)比较AI生成的分数与助教评分在共情和用户理解、痛点和机会的识别以及视觉传达三个关键维度上的一致性,以及(2)检查教师对AI分配的分数、助教分配的分数和混合分数的偏好。结果显示,教师和AI分数在共情和痛点方面的统计一致性较低,而在视觉传达方面略高。教师在十个样本中有六个偏好助教分配的分数。定性反馈突显了AI在形成反馈、一致性和学生自我反思方面的潜力,但也提出了对其在捕捉情境细微差别和创意洞察方面的限制。该研究强调了需要将计算效率与人类洞察力相结合的混合评估模型。这项研究对创意学科中负责任的AI采用进行了贡献,强调了在可扩展且教学合理的评估中自动化和人类判断之间的平衡。

更新时间: 2025-12-11 05:38:34

领域: cs.CY,cs.AI

下载: http://arxiv.org/abs/2510.16069v2

FLARE: A Wireless Side-Channel Fingerprinting Attack on Federated Learning

Federated Learning (FL) enables collaborative model training across distributed devices while safeguarding data and user privacy. However, FL remains susceptible to privacy threats that can compromise data via direct means. That said, indirectly compromising the confidentiality of the FL model architecture (e.g., a convolutional neural network (CNN) or a recurrent neural network (RNN)) on a client device by an outsider remains unexplored. If leaked, this information can enable next-level attacks tailored to the architecture. This paper proposes a novel side-channel fingerprinting attack, leveraging flow-level and packet-level statistics of encrypted wireless traffic from an FL client to infer its deep learning model architecture. We name it FLARE, a fingerprinting framework based on FL Architecture REconnaissance. Evaluation across various CNN and RNN variants (including pre-trained and custom models trained over IEEE 802.11 Wi-Fi) shows that FLARE achieves over 98% F1-score in closed-world and up to 91% in open-world scenarios. These results reveal that CNN and RNN models leak distinguishable traffic patterns, enabling architecture fingerprinting even under realistic FL settings with hardware, software, and data heterogeneity. To our knowledge, this is the first work to fingerprint FL model architectures by sniffing encrypted wireless traffic, exposing a critical side-channel vulnerability in current FL systems.
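
At its core, the attack is supervised classification over side-channel traffic statistics; the toy sketch below uses scikit-learn, and the feature set and classifier choice are assumptions for illustration, not FLARE's exact pipeline.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def flow_features(pkt_sizes, pkt_times):
        """Illustrative flow/packet-level statistics for one sniffed FL training round."""
        iat = np.diff(pkt_times)  # inter-arrival times
        return [np.sum(pkt_sizes), np.mean(pkt_sizes), np.std(pkt_sizes),
                len(pkt_sizes), np.mean(iat), np.std(iat)]

    # X: one feature row per training round; y: architecture label ("cnn", "rnn", ...)
    clf = RandomForestClassifier(n_estimators=200)
    # clf.fit(X_train, y_train); predictions = clf.predict(X_test)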

Updated: 2025-12-11 05:32:34

标题: FLARE:一种针对联邦学习的无线侧信道指纹攻击

摘要: 联邦学习(FL)可以在分布式设备间进行合作模型训练,同时保护数据和用户隐私。然而,FL仍然容易受到可能通过直接手段泄露数据的隐私威胁。尽管如此,通过外部人员间接地危害FL模型架构(例如卷积神经网络(CNN)或循环神经网络(RNN))的机密性在客户端设备上仍未被探索。如果泄露,这些信息可以使攻击者根据架构进行下一级别的攻击。本文提出了一种新颖的侧信道指纹攻击,利用来自FL客户端的加密无线流量的流级和数据包级统计信息来推断其深度学习模型架构。我们将其命名为FLARE,这是一个基于FL架构侦察的指纹框架。对各种CNN和RNN变体进行评估,包括在IEEE 802.11 Wi-Fi上训练的预训练和自定义模型,结果显示FLARE在封闭世界中达到了超过98%的F1得分,在开放世界情景中最高达到91%。这些结果表明CNN和RNN模型泄露出可区分的流量模式,即使在具有硬件、软件和数据异构性的真实FL设置中,也能实现架构指纹识别。据我们所知,这是首个通过嗅探加密无线流量来指纹FL模型架构的研究,揭示了当前FL系统中的关键侧信道漏洞。

更新时间: 2025-12-11 05:32:34

领域: cs.CR,cs.AI,cs.LG

下载: http://arxiv.org/abs/2512.10296v1

Balanced Online Class-Incremental Learning via Dual Classifiers

Online class-incremental learning (OCIL) focuses on gradually learning new classes (called plasticity) from a stream of data in a single pass, while concurrently preserving knowledge of previously learned classes (called stability). The primary challenge in OCIL lies in maintaining a good balance between the knowledge of old and new classes within the continually updated model. Most existing methods rely on explicit knowledge interaction through experience replay, and often employ exclusive training separation to address bias problems. Nevertheless, it remains a major challenge to achieve a well-balanced learner, as these methods often exhibit either reduced plasticity or limited stability due to difficulties in continually integrating knowledge in the OCIL setting. In this paper, we propose a novel replay-based method, called Balanced Inclusive Separation for Online iNcremental learning (BISON), which can achieve both high plasticity and stability, thus ensuring more balanced performance in OCIL. Our BISON method proposes an inclusive training separation strategy using dual classifiers so that knowledge from both old and new classes can effectively be integrated into the model, while introducing implicit approaches for transferring knowledge across the two classifiers. Extensive experimental evaluations over three widely-used OCIL benchmark datasets demonstrate the superiority of BISON, showing more balanced yet better performance compared to state-of-the-art replay-based OCIL methods.

Updated: 2025-12-11 05:18:44

标题: 基于双分类器的平衡在线课堂增量学习

摘要: 在线课程增量学习(OCIL)专注于在单次传递中逐渐学习新类(称为可塑性)的同时,同时保留先前学习类(称为稳定性)的知识。 OCIL中的主要挑战在于在不断更新的模型中保持对旧类和新类知识之间的良好平衡。大多数现有方法依赖于通过经验重播进行的显式知识交互,并经常采用独立训练分离来解决偏见问题。然而,要实现一个良好平衡的学习者仍然是一个巨大的挑战,因为这些方法往往由于在OCIL设置中不断整合知识而表现出要么降低的可塑性,要么有限的稳定性。在本文中,我们提出了一种新颖的基于重播的方法,称为在线增量学习的平衡包容分离(BISON),该方法可以同时实现高可塑性和稳定性,从而确保在OCIL中更平衡的表现。我们的BISON方法提出了一种包容性训练分离策略,使用双分类器,以便来自旧类和新类的知识可以有效地集成到模型中,同时引入了用于在两个分类器之间传递知识的隐式方法。对三个广泛使用的OCIL基准数据集进行的广泛实验评估显示了BISON的优越性,与最先进的基于重播的OCIL方法相比,表现出更平衡且更好的性能。

更新时间: 2025-12-11 05:18:44

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2504.20566v2

Variational analysis of determinantal varieties

Determinantal varieties (the sets of bounded-rank matrices or tensors) have attracted growing interest in low-rank optimization. The tangent cone to low-rank sets is widely studied and underpins a range of geometric methods. The second-order geometry, which encodes curvature information, is more intricate. In this work, we develop a unified framework to derive explicit formulas for both first- and second-order tangent sets to various low-rank sets, including low-rank matrices, tensors, symmetric matrices, and positive semidefinite matrices. The framework also accommodates the intersection of a low-rank set and another set satisfying mild assumptions, thereby yielding a tangent intersection rule. Through the lens of tangent sets, we establish a necessary and sufficient condition under which a nonsmooth problem and its smooth parameterization share equivalent second-order stationary points. Moreover, we exploit tangent sets to characterize optimality conditions for low-rank optimization and prove that verifying second-order optimality is NP-hard. In a separate line of analysis, we investigate variational geometry of the graph of the normal cone to matrix varieties, deriving the explicit Bouligand tangent cone, Fréchet and Mordukhovich normal cones to the graph. These results are further applied to develop optimality conditions for low-rank bilevel programs.
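
As a concrete instance of the first-order objects involved, the Bouligand tangent cone to the variety of matrices of rank at most r, at a point X of rank s, admits the classical description below, where P_{U^\perp} and P_{V^\perp} project onto the orthogonal complements of the column and row spaces of X; this is background from the low-rank literature, not one of the paper's new results.

    T_{\mathcal{R}_{\le r}}(X) \;=\; \Big\{ \xi \;:\; \operatorname{rank}\big( P_{U^{\perp}} \, \xi \, P_{V^{\perp}} \big) \,\le\, r - s \Big\}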

Updated: 2025-12-11 05:13:53

标题: 行列式簇的变分分析

摘要: 行列式簇,即有界秩矩阵或张量的集合,在低秩优化中引起了越来越多的关注。低秩集合的切锥已被广泛研究,并支撑着一系列几何方法。编码曲率信息的二阶几何则更为复杂。在这项工作中,我们建立了一个统一框架,为各类低秩集合(包括低秩矩阵、张量、对称矩阵和半正定矩阵)推导一阶和二阶切集的显式公式。该框架还适用于低秩集合与另一个满足温和假设的集合的交集,从而得到切集的交集法则。通过切集的视角,我们建立了非光滑问题与其光滑参数化共享等价二阶稳定点的充要条件。此外,我们利用切集刻画低秩优化的最优性条件,并证明验证二阶最优性是NP难的。在另一条分析思路中,我们研究了矩阵簇法锥图像的变分几何,推导出该图像的显式Bouligand切锥、Fréchet法锥和Mordukhovich法锥。这些结果进一步被应用于建立低秩双层规划的最优性条件。

更新时间: 2025-12-11 05:13:53

领域: math.OC,cs.AI,cs.LG

下载: http://arxiv.org/abs/2511.22613v2

RoFt-Mol: Benchmarking Robust Fine-Tuning with Molecular Graph Foundation Models

In the era of foundation models, fine-tuning pre-trained models for specific downstream tasks has become crucial. This drives the need for robust fine-tuning methods to address challenges such as model overfitting and sparse labeling. Molecular graph foundation models (MGFMs) face unique difficulties that complicate fine-tuning. These models are limited by smaller pre-training datasets and more severe data scarcity for downstream tasks, both of which require enhanced model generalization. Moreover, MGFMs must accommodate diverse objectives, including both regression and classification tasks. To better understand and improve fine-tuning techniques under these conditions, we classify eight fine-tuning methods into three mechanisms: weight-based, representation-based, and partial fine-tuning. We benchmark these methods on downstream regression and classification tasks across supervised and self-supervised pre-trained models in diverse labeling settings. This extensive evaluation provides valuable insights and informs the design of a refined robust fine-tuning method, ROFT-MOL. This approach combines the strengths of simple post-hoc weight interpolation with more complex weight ensemble fine-tuning methods, delivering improved performance across both task types while maintaining the ease of use inherent in post-hoc weight interpolation.
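
The post-hoc weight interpolation ingredient is simple enough to state directly; the sketch below operates on PyTorch state dicts, while the mixing coefficient and the weight-ensemble component of ROFT-MOL are design details not shown here.

    def interpolate_weights(pre_sd, ft_sd, lam=0.5):
        """Post-hoc interpolation, per parameter tensor:
        theta = lam * theta_finetuned + (1 - lam) * theta_pretrained."""
        return {k: lam * ft_sd[k] + (1 - lam) * pre_sd[k] for k in ft_sd}

    # model.load_state_dict(interpolate_weights(pretrained.state_dict(),
    #                                           finetuned.state_dict(), lam=0.7))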

Updated: 2025-12-11 05:11:29

标题: RoFt-Mol:使用分子图基础模型进行鲁棒微调的基准测试

摘要: 在基础模型时代,微调预训练模型以适用特定下游任务变得至关重要。这推动了对强大的微调方法的需求,以应对模型过拟合和稀疏标记等挑战。分子图基础模型(MGFMs)面临独特困难,使微调变得复杂。这些模型受限于较小的预训练数据集,并对下游任务更加严重的数据稀缺情况,这两者都需要增强模型的泛化能力。此外,MGFMs必须适应多样的目标,包括回归和分类任务。为了更好地理解和改进在这些条件下的微调技术,我们将八种微调方法归类为三种机制:基于权重、基于表示和部分微调。我们在各种标签设置中对这些方法在下游回归和分类任务上进行了基准测试,包括监督和自监督预训练模型。这种广泛的评估提供了宝贵的见解,并为设计一种精细的强大微调方法ROFT-MOL提供了信息。这种方法将简单的后期权重插值的优势与更复杂的权重整合微调方法相结合,提供了在两种任务类型上都提高性能的效果,同时保持了后期权重插值固有的易用性。

更新时间: 2025-12-11 05:11:29

领域: cs.LG,physics.chem-ph

下载: http://arxiv.org/abs/2509.00614v3

Enforcing hidden physics in physics-informed neural networks

Physics-informed neural networks (PINNs) represent a new paradigm for solving partial differential equations (PDEs) by integrating physical laws into the learning process of neural networks. However, ensuring that such frameworks fully reflect the physical structure embedded in the governing equations remains an open challenge, particularly for maintaining robustness across diverse scientific problems. In this work, we address this issue by introducing a simple, generalized, yet robust irreversibility-regularized strategy that enforces hidden physical laws as soft constraints during training, thereby recovering the missing physics associated with irreversible processes in the conventional PINN. This approach ensures that the learned solutions consistently respect the intrinsic one-way nature of irreversible physical processes. Across a wide range of benchmarks spanning traveling wave propagation, steady combustion, ice melting, corrosion evolution, and crack growth, we observe substantial performance improvements over the conventional PINN, demonstrating that our regularization scheme reduces predictive errors by more than an order of magnitude, while requiring only minimal modification to existing PINN frameworks.
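
One common way to impose such a hidden law softly is to penalize violations of monotonicity in time; the PyTorch sketch below does this for a scalar field phi(x, t) that should never decrease, and it is a generic illustration rather than the paper's exact regularizer.

    import torch

    def irreversibility_penalty(model, x, t):
        """Softly enforce d(phi)/dt >= 0 by penalizing negative time-derivatives."""
        t = t.clone().requires_grad_(True)
        phi = model(x, t)
        dphi_dt = torch.autograd.grad(phi.sum(), t, create_graph=True)[0]
        return torch.relu(-dphi_dt).pow(2).mean()

    # total_loss = pde_residual + boundary_terms + lam * irreversibility_penalty(model, x, t)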

Updated: 2025-12-11 05:08:35

标题: 强化物理学隐含性在物理信息神经网络中

摘要: 物理信息神经网络(PINNs)代表了一种通过将物理定律整合到神经网络的学习过程中来解决偏微分方程(PDEs)的新范式。然而,确保这种框架完全反映嵌入在控制方程中的物理结构仍然是一个开放的挑战,特别是对于在不同科学问题中保持稳健性。在这项工作中,我们通过引入一种简单、广义但稳健的不可逆性正则化策略来解决这个问题,该策略在训练过程中将隐藏的物理定律作为软约束来执行,从而恢复传统PINN中与不可逆过程相关的缺失物理。这种方法确保学习到的解决方案始终尊重不可逆物理过程的固有单向性质。在横跨传播行波、稳定燃烧、冰融化、腐蚀演变和裂纹扩展等广泛基准中,我们观察到与传统PINN相比,我们的正则化方案显著提高了性能,表明我们的正则化方案将预测错误降低了一个数量级以上,同时对现有PINN框架只需要进行最少的修改。

更新时间: 2025-12-11 05:08:35

领域: cs.LG,physics.comp-ph

下载: http://arxiv.org/abs/2511.14348v2

$\mathrm{D}^\mathrm{3}$-Predictor: Noise-Free Deterministic Diffusion for Dense Prediction

Although diffusion models with strong visual priors have emerged as powerful dense prediction backbones, they overlook a core limitation: the stochastic noise at the core of diffusion sampling is inherently misaligned with dense prediction, which requires a deterministic mapping from image to geometry. In this paper, we show that this stochastic noise corrupts fine-grained spatial cues and pushes the model toward timestep-specific noise objectives, consequently destroying meaningful geometric structure mappings. To address this, we introduce $\mathrm{D}^\mathrm{3}$-Predictor, a noise-free deterministic framework built by reformulating a pretrained diffusion model without stochastic noise. Instead of relying on noisy inputs to leverage diffusion priors, $\mathrm{D}^\mathrm{3}$-Predictor views the pretrained diffusion network as an ensemble of timestep-dependent visual experts and aggregates their heterogeneous priors, in a self-supervised manner, into a single, clean, and complete geometric prior. Meanwhile, we utilize task-specific supervision to seamlessly adapt this noise-free prior to dense prediction tasks. Extensive experiments on various dense prediction tasks demonstrate that $\mathrm{D}^\mathrm{3}$-Predictor achieves competitive or state-of-the-art performance in diverse scenarios. In addition, it requires less than half the training data used previously and performs inference efficiently in a single step. Our code, data, and checkpoints are publicly available at https://x-gengroup.github.io/HomePage_D3-Predictor/.

Updated: 2025-12-11 05:07:55

标题: $\mathrm{D}^\mathrm{3}$-预测器:用于密集预测的无噪声确定性扩散

摘要: 尽管具有强大视觉先验的扩散模型已经成为强大的密集预测骨干,但它们忽视了一个核心限制:扩散采样核心的随机噪声与需要从图像到几何形态进行确定性映射的密集预测本质上不一致。在本文中,我们展示了这种随机噪声如何破坏了细粒度空间线索,推动模型朝向特定时间步的噪声目标,从而破坏了有意义的几何结构映射。为了解决这个问题,我们引入了D^3-Predictor,这是一个无噪声确定性框架,通过重新构建一个预训练的扩散模型而没有随机噪声。D^3-Predictor不依赖于噪声输入来利用扩散先验,而是将预训练的扩散网络视为一组时间步相关的视觉专家,并通过自监督方式将它们的异质先验汇总为一个单一、干净且完整的几何先验。同时,我们利用任务特定的监督来无缝地将这个无噪声先验调整到密集预测任务中。对各种密集预测任务的广泛实验表明,D^3-Predictor在不同场景下实现了竞争性或最先进的性能。此外,它所需的训练数据量不到以前的一半,并且可以在一步内高效地执行推理。我们的代码、数据和检查点可在https://x-gengroup.github.io/HomePage_D3-Predictor/ 上公开获取。

更新时间: 2025-12-11 05:07:55

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2512.07062v2

A Kernel-based Resource-efficient Neural Surrogate for Multi-fidelity Prediction of Aerodynamic Field

Surrogate models provide fast alternatives to costly aerodynamic simulations and are extremely useful in design and optimization applications. This study proposes the use of a recent kernel-based neural surrogate, KHRONOS. In this work, we blend sparse high-fidelity (HF) data with low-fidelity (LF) information to predict aerodynamic fields under varying constraints in computational resources. Unlike traditional approaches, KHRONOS is built upon variational principles, interpolation theory, and tensor decomposition. These elements provide a mathematical basis for heavy pruning compared to dense neural networks. Using the AirfRANS dataset as a high-fidelity benchmark and NeuralFoil to generate low-fidelity counterparts, this work compares the performance of KHRONOS with three contemporary model architectures: a multilayer perceptron (MLP), a graph neural network (GNN), and a physics-informed neural network (PINN). We consider varying levels of high-fidelity data availability (0%, 10%, and 30%) and increasingly complex geometry parameterizations. These are used to predict the surface pressure coefficient distribution over the airfoil. Results indicate that, whilst all models eventually achieve comparable predictive accuracy, KHRONOS excels in resource-constrained conditions. In this domain, KHRONOS consistently requires orders of magnitude fewer trainable parameters and delivers much faster training and inference than contemporary dense neural networks at comparable accuracy. These findings highlight the potential of KHRONOS and similar architectures to balance accuracy and efficiency in multi-fidelity aerodynamic field prediction.

Updated: 2025-12-11 05:05:10

标题: 基于核的资源高效的神经代理模型用于多保真度气动场预测

摘要: 代理模型为昂贵的气动仿真提供了快速替代方案,在设计和优化应用中极为有用。本研究提出使用一种新近的基于核的神经代理模型KHRONOS。在这项工作中,我们将稀疏的高保真(HF)数据与低保真(LF)信息相融合,以在不同计算资源约束下预测气动场。与传统方法不同,KHRONOS建立在变分原理、插值理论和张量分解之上。与稠密神经网络相比,这些要素为大幅剪枝提供了数学基础。本工作以AirfRANS数据集作为高保真基准,并使用NeuralFoil生成低保真数据,将KHRONOS与三种当代模型架构进行比较:多层感知机(MLP)、图神经网络(GNN)和物理信息神经网络(PINN)。我们考虑了不同水平的高保真数据可用性(0%、10%和30%)以及日益复杂的几何参数化,用于预测翼型表面压力系数分布。结果表明,虽然所有模型最终都能达到相当的预测精度,但KHRONOS在资源受限条件下表现突出。在该情形下,KHRONOS所需的可训练参数始终少几个数量级,并且在精度相当的情况下,其训练和推理速度远快于当代稠密神经网络。这些发现突显了KHRONOS及类似架构在多保真气动场预测中平衡精度与效率的潜力。

更新时间: 2025-12-11 05:05:10

领域: cs.LG,physics.flu-dyn

下载: http://arxiv.org/abs/2512.10287v1

MotionEdit: Benchmarking and Learning Motion-Centric Image Editing

We introduce MotionEdit, a novel dataset for motion-centric image editing: the task of modifying subject actions and interactions while preserving identity, structure, and physical plausibility. Unlike existing image editing datasets that focus on static appearance changes or contain only sparse, low-quality motion edits, MotionEdit provides high-fidelity image pairs depicting realistic motion transformations extracted and verified from continuous videos. This new task is not only scientifically challenging but also practically significant, powering downstream applications such as frame-controlled video synthesis and animation. To evaluate model performance on the novel task, we introduce MotionEdit-Bench, a benchmark that challenges models on motion-centric edits and measures model performance with generative, discriminative, and preference-based metrics. Benchmark results reveal that motion editing remains highly challenging for existing state-of-the-art diffusion-based editing models. To address this gap, we propose MotionNFT (Motion-guided Negative-aware Fine Tuning), a post-training framework that computes motion alignment rewards based on how well the motion flow between input and model-edited images matches the ground-truth motion, guiding models toward accurate motion transformations. Extensive experiments on FLUX.1 Kontext and Qwen-Image-Edit show that MotionNFT consistently improves editing quality and motion fidelity of both base models on the motion editing task without sacrificing general editing ability, demonstrating its effectiveness.
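
A schematic of the motion-alignment reward idea follows; estimate_flow is a hypothetical stand-in for any off-the-shelf optical-flow estimator, and the exact reward shaping used in MotionNFT may differ.

    import numpy as np

    def motion_alignment_reward(src, edited, gt_edit, estimate_flow):
        """Score how well the motion implied by the model's edit matches the
        ground-truth motion, via cosine similarity of the two flow fields."""
        f_pred = estimate_flow(src, edited)  # (H, W, 2): source -> model edit
        f_gt = estimate_flow(src, gt_edit)   # (H, W, 2): source -> ground truth
        num = float(np.sum(f_pred * f_gt))
        den = np.linalg.norm(f_pred) * np.linalg.norm(f_gt) + 1e-8
        return num / den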

Updated: 2025-12-11 04:53:58

标题: MotionEdit:基于动作的图像编辑的基准测试和学习

摘要: 我们介绍了MotionEdit,这是一个新颖的以动作为中心的图像编辑数据集,旨在修改主体动作和互动,同时保持身份、结构和物理可信度。与现有的图像编辑数据集不同,这些数据集侧重于静态外观变化或仅包含稀疏、低质量的动作编辑,MotionEdit提供了高保真度的图像对,展示了从连续视频中提取和验证的逼真动作变换。这个新任务不仅在科学上具有挑战性,而且在实际中具有重要意义,为下游应用如帧控制视频合成和动画提供动力。 为了评估模型在这一新任务上的表现,我们引入了MotionEdit-Bench,一个基准测试,挑战模型进行以动作为中心的编辑,并用生成、判别和基于偏好的指标来衡量模型的性能。基准测试结果显示,对于现有最先进的基于扩散的编辑模型来说,动作编辑仍然具有很高的挑战性。为了解决这一差距,我们提出了MotionNFT(Motion-guided Negative-aware Fine Tuning),一个后训练框架,根据输入图像和模型编辑图像之间的动作流与地面真实动作匹配程度来计算动作对齐奖励,引导模型朝向准确的动作变换。在FLUX.1 Kontext和Qwen-Image-Edit上进行的大量实验表明,MotionNFT在不牺牲一般编辑能力的情况下,持续提高了基础模型在动作编辑任务上的编辑质量和动作保真度,证明了其有效性。

更新时间: 2025-12-11 04:53:58

领域: cs.CV,cs.AI,cs.CL

下载: http://arxiv.org/abs/2512.10284v1

Semantic Trajectory Generation for Goal-Oriented Spacecraft Rendezvous

Reliable real-time trajectory generation is essential for future autonomous spacecraft. While recent progress in nonconvex guidance and control is paving the way for onboard autonomous trajectory optimization, these methods still rely on extensive expert input (e.g., waypoints, constraints, mission timelines, etc.), which limits the operational scalability in real rendezvous missions. This paper introduces SAGES (Semantic Autonomous Guidance Engine for Space), a trajectory-generation framework that translates natural-language commands into spacecraft trajectories that reflect high-level intent while respecting nonconvex constraints. Experiments in two settings (fault-tolerant proximity operations with continuous-time constraint enforcement and a free-flying robotic platform) demonstrate that SAGES reliably produces trajectories aligned with human commands, achieving over 90% semantic-behavioral consistency across diverse behavior modes. Ultimately, this work marks an initial step toward language-conditioned, constraint-aware spacecraft trajectory generation, enabling operators to interactively guide both safety and behavior through intuitive natural-language commands with reduced expert burden.

Updated: 2025-12-11 04:52:52

标题: 面向目标导向的航天器交会的语义轨迹生成

摘要: 可靠的实时轨迹生成对于未来的自主航天器至关重要。尽管最近在非凸导航和控制方面取得了进展,为机载自主轨迹优化铺平了道路,但这些方法仍然依赖于广泛的专家输入(例如航点、约束条件、任务时间表等),这限制了在真实交会任务中的操作可扩展性。本文介绍了SAGES(用于空间的语义自主引导引擎),一种将自然语言命令转化为反映高层意图的航天器轨迹的轨迹生成框架,同时尊重非凸约束。在两个环境中的实验(具有连续时间约束执行的容错近距操作和自由飞行机器人平台)证明了SAGES可靠地生成与人类命令一致的轨迹,在各种行为模式下实现超过90%的语义行为一致性。最终,这项工作标志着朝着以语言为条件、具有约束意识的航天器轨迹生成迈出了初步的一步,使操作人员能够通过直观的自然语言命令交互地引导安全和行为,减轻专家负担。

更新时间: 2025-12-11 04:52:52

领域: cs.RO,cs.AI,math.OC

下载: http://arxiv.org/abs/2512.09111v2

Neuronal Attention Circuit (NAC) for Representation Learning

Attention improves representation learning over RNNs, but its discrete nature limits continuous-time (CT) modeling. We introduce the Neuronal Attention Circuit (NAC), a novel, biologically plausible CT-Attention mechanism that reformulates attention logits computation as the solution to a linear first-order ODE with nonlinear interlinked gates, derived by repurposing the wiring mechanism of C. elegans Neuronal Circuit Policies (NCPs). NAC replaces dense projections with sparse sensory gates for key-query projections and a sparse backbone network with two heads for computing content-target and learnable time-constant gates, enabling efficient adaptive dynamics. NAC supports three attention logit computation modes: (i) explicit Euler integration, (ii) an exact closed-form solution, and (iii) a steady-state approximation. To reduce memory intensity, we implemented a sparse Top-K pairwise concatenation scheme that selectively curates key-query interactions. We provide rigorous theoretical guarantees, including state stability, bounded approximation errors, and universal approximation. Empirically, we evaluated NAC in diverse domains, including irregular time-series classification, lane-keeping for autonomous vehicles, and industrial prognostics. We observed that NAC matches or outperforms competing baselines in accuracy and occupies an intermediate position in runtime and memory efficiency compared with several CT baselines.
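
The three modes can be read as three evaluations of one relaxation ODE; assuming gated dynamics dz/dt = (c - z) / tau for logits z, content target c, and time constants tau (a concrete parameterization the abstract does not spell out), they look like:

    import numpy as np

    def nac_logits(z0, c, tau, dt=0.1, mode="closed_form"):
        """Attention logits governed by dz/dt = (c - z) / tau."""
        if mode == "euler":          # (i) one explicit Euler step
            return z0 + dt * (c - z0) / tau
        if mode == "closed_form":    # (ii) exact solution at time dt
            return c + (z0 - c) * np.exp(-dt / tau)
        if mode == "steady_state":   # (iii) t -> infinity limit
            return c
        raise ValueError(mode)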

Updated: 2025-12-11 04:49:44

标题: 神经元注意力电路(NAC)用于表示学习

摘要: 注意力可以改善相较于循环神经网络(RNNs)的表示学习,但其离散性限制了连续时间(CT)建模。我们引入了神经元注意力电路(NAC),这是一种新颖的、生物学上合理的连续时间注意力机制,它将注意力logits的计算重新表述为一个带有非线性相互关联门的线性一阶ODE的解,这些门源自对C. elegans神经回路策略(NCPs)布线机制的重新利用。NAC用稀疏感觉门替代键-查询投影中的密集投影,并采用带有两个头的稀疏骨干网络来计算内容-目标门和可学习时间常数门,从而实现了高效的自适应动态。NAC支持三种注意力logits计算模式:(i)显式欧拉积分,(ii)精确的闭式解,(iii)稳态近似。为了降低内存开销,我们实现了一种稀疏的Top-K成对连接方案,有选择地筛选键-查询交互。我们提供了严格的理论保证,包括状态稳定性、有界逼近误差和通用逼近。在实证方面,我们在不同领域评估了NAC,包括不规则时间序列分类、自动驾驶车辆的车道保持和工业预测。我们观察到,NAC在准确性方面与竞争基线相当或更好,并且与若干连续时间基线相比,在运行时间和内存效率方面处于中间位置。

更新时间: 2025-12-11 04:49:44

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2512.10282v1

RegMean++: Enhancing Effectiveness and Generalization of Regression Mean for Model Merging

Regression Mean (RegMean), an approach that formulates model merging as a linear regression problem, aims to find the optimal weights for each linear layer in the merge model by minimizing the discrepancy in predictions between the merge and candidate models. RegMean provides a precise closed-form solution for the merging problem; therefore, it offers explainability and computational efficiency. However, RegMean merges each linear layer independently, overlooking how the features and information in the earlier layers propagate through the layers and influence the final prediction in the merge model. In this paper, we introduce RegMean++, a simple yet effective alternative to RegMean, that explicitly incorporates both intra- and cross-layer dependencies between merge models' layers into RegMean's objective. By accounting for these dependencies, RegMean++ better captures the behaviors of the merge model. Extensive experiments demonstrate that RegMean++ consistently outperforms RegMean across diverse settings, including in-domain (ID) and out-of-domain (OOD) generalization, sequential merging, large-scale tasks, and robustness under several types of distribution shifts. Furthermore, RegMean++ achieves competitive or state-of-the-art performance compared to various recent advanced model merging methods. Our code is available at https://github.com/nthehai01/RegMean-plusplus.
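
For context, RegMean's per-layer closed form merges the linear weights W_i using the Gram matrices G_i = X_i^T X_i of each model's layer inputs, i.e. W* = (sum_i G_i)^{-1} sum_i G_i W_i; a NumPy rendering follows (the small ridge term is a common numerical safeguard, not part of the formula):

    import numpy as np

    def regmean_merge(grams, weights, eps=1e-6):
        """Closed-form merge of one linear layer: W* = (sum G_i)^(-1) sum G_i W_i."""
        G_sum = sum(grams)
        GW_sum = sum(G @ W for G, W in zip(grams, weights))
        d = G_sum.shape[0]
        return np.linalg.solve(G_sum + eps * np.eye(d), GW_sum)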

Updated: 2025-12-11 04:49:06

标题: RegMean++:增强回归均值的效果和泛化性,用于模型合并

摘要: 回归均值(RegMean)是一种将模型合并视为线性回归问题的方法,旨在通过最小化合并模型和候选模型之间预测差异来找到合并模型中每个线性层的最佳权重。RegMean为合并问题提供了精确的闭式解决方案;因此,它提供了可解释性和计算效率。然而,RegMean独立合并每个线性层,忽视了早期层中的特征和信息如何传播并影响合并模型的最终预测。在本文中,我们介绍了RegMean++,这是RegMean的一个简单而有效的替代方案,它明确地将合并模型层之间的内部和跨层依赖性纳入RegMean的目标中。通过考虑这些依赖性,RegMean++更好地捕捉了合并模型的行为。大量实验证明,RegMean++在各种设置下始终优于RegMean,包括领域内(ID)和领域外(OOD)泛化、顺序合并、大规模任务以及在几种分布转移下的稳健性。此外,与各种最近先进的模型合并方法相比,RegMean++实现了竞争性或最新技术性能。我们的代码可在https://github.com/nthehai01/RegMean-plusplus上找到。

更新时间: 2025-12-11 04:49:06

领域: cs.LG

下载: http://arxiv.org/abs/2508.03121v2

Graph Neural Network Based Adaptive Threat Detection for Cloud Identity and Access Management Logs

The rapid expansion of cloud infrastructures and distributed identity systems has significantly increased the complexity and attack surface of modern enterprises. Traditional rule-based or signature-driven detection systems are often inadequate for identifying novel or evolving threats within Identity and Access Management (IAM) logs, where anomalous behavior may appear statistically benign but contextually malicious. This paper presents a Graph Neural Network-based adaptive threat detection framework designed to learn latent user-resource interaction patterns from IAM audit trails in real time. By modeling IAM logs as heterogeneous dynamic graphs, the proposed system captures temporal, relational, and contextual dependencies across entities such as users, roles, sessions, and access actions. The model incorporates attention-based aggregation and graph embedding updates to enable continual adaptation to changing cloud environments. Experimental evaluation on synthesized and real-world IAM datasets demonstrates that the proposed method achieves higher detection precision and recall than baseline LSTM and GCN classifiers, while maintaining scalability across multi-tenant cloud environments. The framework's adaptability enables proactive mitigation of insider threats, privilege escalation, and lateral movement attacks, contributing to the foundation of AI-driven zero-trust access analytics. This work bridges the gap between graph-based machine learning and operational cloud security intelligence.
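
For reference, attention-based neighbor aggregation of the kind described here typically follows the standard graph-attention update below (the generic GAT formulation, not necessarily the paper's exact variant), where W projects node features h_i and the vector a scores each edge:

    \alpha_{ij} \;=\; \operatorname{softmax}_{j}\!\Big( \operatorname{LeakyReLU}\big( \mathbf{a}^{\top} [\, \mathbf{W}\mathbf{h}_i \,\Vert\, \mathbf{W}\mathbf{h}_j \,] \big) \Big),
    \qquad
    \mathbf{h}_i' \;=\; \sigma\!\Big( \sum_{j \in \mathcal{N}(i)} \alpha_{ij}\, \mathbf{W}\mathbf{h}_j \Big)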

Updated: 2025-12-11 04:44:02

标题: 基于图神经网络的云身份和访问管理日志自适应威胁检测

摘要: 云基础设施和分布式身份系统的快速扩展显著增加了现代企业的复杂性和攻击面。传统的基于规则或签名的检测系统通常无法识别身份和访问管理日志中的新颖或不断演变的威胁,其中异常行为可能在统计上看似无害,但在上下文中是恶意的。本文提出了一种基于图神经网络的自适应威胁检测框架,旨在实时从IAM审计日志中学习潜在用户资源交互模式。通过将IAM日志建模为异构动态图,所提出的系统捕捉了用户、角色、会话和访问操作等实体之间的时间、关系和上下文依赖关系。该模型结合了基于注意力的聚合和图嵌入更新,以实现对不断变化的云环境的持续适应。对合成和真实世界的IAM数据集进行的实验评估表明,所提出的方法比基准LSTM和GCN分类器实现了更高的检测精度和召回率,同时在多租户云环境中保持了可扩展性。该框架的可适应性使得能够积极应对内部威胁、权限提升和横向移动攻击,为AI驱动的零信任访问分析奠定了基础。这项工作弥合了基于图的机器学习和运营云安全情报之间的差距。

更新时间: 2025-12-11 04:44:02

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2512.10280v1

Computing Evolutionarily Stable Strategies in Imperfect-Information Games

We present an algorithm for computing evolutionarily stable strategies (ESSs) in symmetric perfect-recall extensive-form games of imperfect information. Our main algorithm is for two-player games, and we describe how it can be extended to multiplayer games. The algorithm is sound and computes all ESSs in nondegenerate games and a subset of them in degenerate games which contain an infinite continuum of symmetric Nash equilibria. The algorithm is anytime and can be stopped early to find one or more ESSs. We experiment on an imperfect-information cancer signaling game as well as random games to demonstrate scalability.
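
For reference, the condition being certified is the standard one: with u(sigma, tau) the payoff of playing sigma against tau, a symmetric strategy sigma is an ESS if and only if, for every mutant tau != sigma,

    u(\sigma, \sigma) > u(\tau, \sigma)
    \quad \text{or} \quad
    \big(\, u(\sigma, \sigma) = u(\tau, \sigma) \ \text{ and } \ u(\sigma, \tau) > u(\tau, \tau) \,\big).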

Updated: 2025-12-11 04:38:55

标题: 在信息不完全的游戏中计算进化稳定策略

摘要: 我们提出了一种算法,用于计算不完全信息对称完美回忆广泛形式博弈中的进化稳定策略(ESSs)。我们的主要算法适用于两人游戏,并描述了如何将其扩展到多人游戏。该算法是可靠的,并在非退化游戏中计算所有ESSs,在包含无限对称纳什均衡的退化游戏中计算其中的一个子集。该算法随时可行,并可提前停止以找到一个或多个ESSs。我们在不完全信息的癌症信号传导游戏以及随机游戏上进行实验,以展示其可扩展性。

更新时间: 2025-12-11 04:38:55

领域: cs.GT,cs.AI,cs.MA,econ.TH,q-bio.PE

下载: http://arxiv.org/abs/2512.10279v1

Geometric Regularity in Deterministic Sampling Dynamics of Diffusion-based Generative Models

Diffusion-based generative models employ stochastic differential equations (SDEs) and their equivalent probability flow ordinary differential equations (ODEs) to establish a smooth transformation between complex high-dimensional data distributions and tractable prior distributions. In this paper, we reveal a striking geometric regularity in the deterministic sampling dynamics of diffusion generative models: each simulated sampling trajectory along the gradient field lies within an extremely low-dimensional subspace, and all trajectories exhibit an almost identical boomerang shape, regardless of the model architecture, applied conditions, or generated content. We characterize several intriguing properties of these trajectories, particularly under closed-form solutions based on kernel-estimated data modeling. We also demonstrate a practical application of the discovered trajectory regularity by proposing a dynamic programming-based scheme to better align the sampling time schedule with the underlying trajectory structure. This simple strategy requires minimal modification to existing deterministic numerical solvers, incurs negligible computational overhead, and achieves superior image generation performance, especially in regions with only 5 - 10 function evaluations.

Updated: 2025-12-11 04:38:09

标题: 扩散型生成模型确定性采样动态中的几何规律

摘要: 扩散型生成模型利用随机微分方程(SDEs)及其等价的概率流常微分方程(ODEs)来建立复杂高维数据分布与可处理的先验分布之间的平滑转换。本文揭示了扩散生成模型确定性采样动态中的引人注目的几何规律:沿着梯度场的每个模拟采样轨迹都位于一个极低维度的子空间内,并且所有轨迹展现出几乎相同的回旋形状,不论模型架构、应用条件或生成内容如何。我们对这些轨迹的几个有趣属性进行了表征,特别是基于核估计数据建模的封闭形式解决方案下。我们还通过提出基于动态规划的方案展示了发现的轨迹规律的实际应用,以更好地使采样时间表与底层轨迹结构对齐。这一简单策略对现有确定性数值求解器的修改需求最小,计算开销微乎其微,并在图像生成性能方面取得卓越成果,尤其是在仅需5-10次函数评估的区域。

更新时间: 2025-12-11 04:38:09

领域: cs.LG,cond-mat.stat-mech,cs.CV,stat.ML

下载: http://arxiv.org/abs/2506.10177v3

Token Is All You Price

We build a mechanism design framework where a platform designs GenAI models to screen users who obtain instrumental value from the generated conversation and privately differ in their preference for latency. We show that the revenue-optimal mechanism is simple: deploy a single aligned (user-optimal) model and use a token cap as the only instrument to screen users. The design decouples model training from pricing, is readily implemented with token metering, and mitigates misalignment pressures.

Updated: 2025-12-11 04:25:41

标题: Token Is All You Price 令牌就是你的价格

摘要: 我们建立了一个机制设计框架,平台设计了GenAI模型来筛选那些从生成的对话中获得实际价值并在延迟偏好上私下有差异的用户。我们展示了收入最优机制是简单的:部署一个单一的对齐(用户最优)模型,并使用令牌上限作为唯一的筛选用户的工具。该设计将模型训练与定价分离开来,可以通过令牌计量轻松实现,并减轻了不一致的压力。

更新时间: 2025-12-11 04:25:41

领域: econ.TH,cs.AI

下载: http://arxiv.org/abs/2510.09859v3

Reverse Thinking Enhances Missing Information Detection in Large Language Models

Large Language Models (LLMs) have demonstrated remarkable capabilities in various reasoning tasks, yet they often struggle with problems involving missing information, exhibiting issues such as incomplete responses, factual errors, and hallucinations. While forward reasoning approaches like Chain-of-Thought (CoT) and Tree-of-Thought (ToT) have shown success in structured problem-solving, they frequently fail to systematically identify and recover omitted information. In this paper, we explore the potential of reverse thinking methodologies to enhance LLMs' performance on missing information detection tasks. Drawing inspiration from recent work on backward reasoning, we propose a novel framework that guides LLMs through reverse thinking to identify necessary conditions and pinpoint missing elements. Our approach transforms the challenging task of missing information identification into a more manageable backward reasoning problem, significantly improving model accuracy. Experimental results demonstrate that our reverse thinking approach achieves substantial performance gains compared to traditional forward reasoning methods, providing a promising direction for enhancing LLMs' logical completeness and reasoning robustness.

Updated: 2025-12-11 04:25:17

标题: 反向思维增强大型语言模型中缺失信息的检测

摘要: 大语言模型(LLMs)在各种推理任务中展现出卓越的能力,但它们经常在涉及缺失信息的问题上遇到困难,表现出不完整的响应、事实错误和幻觉等问题。虽然前向推理方法如“思维链”(CoT)和“思维树”(ToT)在结构化问题解决中取得成功,但它们经常未能系统地识别和恢复遗漏的信息。在本文中,我们探讨了逆向思维方法对增强LLMs在缺失信息检测任务中表现的潜力。受最近关于反向推理的工作启发,我们提出了一个新颖的框架,通过逆向思维引导LLMs识别必要条件并找出缺失的元素。我们的方法将具有挑战性的缺失信息识别任务转化为一个更易管理的逆向推理问题,显著提高了模型的准确性。实验结果表明,与传统的前向推理方法相比,我们的逆向思维方法取得了显著的性能提升,为增强LLMs的逻辑完整性和推理鲁棒性提供了一个有希望的方向。

更新时间: 2025-12-11 04:25:17

领域: cs.AI

下载: http://arxiv.org/abs/2512.10273v1

Differential Smoothing Mitigates Sharpening and Improves LLM Reasoning

It is widely recognized that reinforcement learning (RL) fine-tuning of large language models often leads to diversity collapse, where outputs lack variety. Prior work has proposed a range of heuristics to counteract this effect, but these methods are ad hoc: they frequently trade off correctness for diversity, their effectiveness varies across tasks, and in some cases they even contradict one another. In this work, we place these observations on a rigorous foundation. We first provide a formal proof of why RL fine-tuning exhibits diversity collapse via a selection and reinforcement bias. Next, we make a key observation that any reward modification to address diversity collapse only needs to be applied on the correct trajectories. Building directly on this analysis, we introduce a principled method -- differential smoothing -- that provably improves both correctness and diversity, outperforming vanilla RL as well as widely used entropy-based heuristics. Our theory precisely characterizes when existing heuristics help and why they fail, while showing that differential smoothing is universally superior. Extensive experiments with models from 1B to 7B parameters, across domains including CountDown and real-world mathematical reasoning, demonstrate consistent gains. Differential smoothing improves both Pass@1 and Pass@k, with up to 6.7% improvements on the AIME24 dataset.
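
The structural point, that the modification touches only correct trajectories, fits in a few lines; the specific bonus below (a rarity bonus proportional to negative log-probability) is an illustrative stand-in, not the paper's exact smoothing term.

    def shaped_reward(base_reward, is_correct, logprob, alpha=0.1):
        """Apply the diversity-promoting modification only on correct trajectories:
        incorrect answers keep the plain reward, while rarer correct answers
        (lower log-probability) receive a larger bonus."""
        if not is_correct:
            return base_reward
        return base_reward - alpha * logprob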

Updated: 2025-12-11 04:20:51

Categories: cs.LG

Download: http://arxiv.org/abs/2511.19942v2

Hybrid Learning and Optimization-Based Dynamic Scheduling for DL Workloads on Heterogeneous GPU Clusters

Modern cloud platforms increasingly host large-scale deep learning (DL) workloads, demanding high-throughput, low-latency GPU scheduling. However, the growing heterogeneity of GPU clusters and limited visibility into application characteristics pose major challenges for existing schedulers, which often rely on offline profiling or application-specific assumptions. We present RLTune, an application-agnostic reinforcement learning (RL)-based scheduling framework that dynamically prioritizes and allocates DL jobs on heterogeneous GPU clusters. RLTune integrates RL-driven prioritization with MILP-based job-to-node mapping to optimize system-wide objectives such as job completion time (JCT), queueing delay, and resource utilization. Trained on large-scale production traces from Microsoft Philly, Helios, and Alibaba, RLTune improves GPU utilization by up to 20%, reduces queueing delay by up to 81%, and shortens JCT by as much as 70%. Unlike prior approaches, RLTune generalizes across diverse workloads without requiring per-job profiling, making it practical for cloud providers to deploy at scale for more efficient, fair, and sustainable DL workload management.

Updated: 2025-12-11 04:19:44

Categories: cs.DC,cs.AI,cs.LG

Download: http://arxiv.org/abs/2512.10271v1

HarnessAgent: Scaling Automatic Fuzzing Harness Construction with Tool-Augmented LLM Pipelines

Large language model (LLM)-based techniques have achieved notable progress in generating harnesses for program fuzzing. However, applying them to arbitrary functions (especially internal functions) at scale remains challenging due to the requirement of sophisticated contextual information, such as specifications, dependencies, and usage examples. State-of-the-art methods heavily rely on static or incomplete context provisioning, causing them to fail to generate functional harnesses. Furthermore, LLMs tend to exploit harness validation metrics, producing plausible yet logically useless code. Harness generation across large and diverse projects therefore continues to face challenges in reliable compilation, robust code retrieval, and comprehensive validation. To address these challenges, we present HarnessAgent, a tool-augmented agentic framework that achieves fully automated, scalable harness construction over hundreds of OSS-Fuzz targets. HarnessAgent introduces three key innovations: 1) a rule-based strategy to identify and minimize various compilation errors; 2) a hybrid tool pool for precise and robust symbol source code retrieval; and 3) an enhanced harness validation pipeline that detects fake definitions. We evaluate HarnessAgent on 243 target functions from OSS-Fuzz projects (65 C projects and 178 C++ projects). It improves the three-shot success rate by approximately 20% compared to state-of-the-art techniques, reaching 87% for C and 81% for C++. Our one-hour fuzzing results show that more than 75% of the harnesses generated by HarnessAgent increase the target function coverage, surpassing the baselines by over 10%. In addition, the hybrid tool-pool system of HarnessAgent achieves a response rate of over 90% for source code retrieval, outperforming Fuzz Introspector by more than 30%.

Updated: 2025-12-11 04:13:33

Categories: cs.CR,cs.SE

Download: http://arxiv.org/abs/2512.03420v3

On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning

Policy gradient algorithms have been successfully applied to enhance the reasoning capabilities of large language models (LLMs). KL regularization is ubiquitous, yet its design surface, namely the choice of KL direction (forward vs. reverse), normalization (normalized vs. unnormalized), and estimator ($k_1/k_2/k_3$), is scattered across the literature and often intertwined with off-policy estimation. We ask a focused question: under the off-policy setting, what weighting is required for each KL variant so that the surrogate we optimize yields the exact gradient of the intended KL-regularized objective? We answer this with a compact, unified derivation we call the Regularized Policy Gradient (RPG) view. RPG (i) unifies normalized and unnormalized KL variants and shows that the widely-used $k_3$ penalty is exactly the unnormalized KL; (ii) specifies conditions under which REINFORCE-style losses with stop-gradient are gradient-equivalent to fully differentiable surrogates; (iii) identifies and corrects an off-policy importance-weighting mismatch in GRPO's KL term; and (iv) introduces RPG-Style Clip, a clipped-importance-sampling step within RPG-REINFORCE that enables stable, off-policy policy-gradient training at scale. On mathematical reasoning benchmarks (AIME24, AIME25), RPG-REINFORCE with RPG-Style Clip improves accuracy by up to $+6$ absolute percentage points over DAPO. We extend our experiments to 8K context length, and RPG-REINFORCE with RPG-Style Clip achieves 52% accuracy on AIME25, surpassing the official Qwen3-4B-Instruct model (47%). Notably, RPG is a stable and scalable RL algorithm for LLM reasoning, realized via (a) a KL-correct objective, (b) clipped importance sampling, and (c) an iterative reference-policy update scheme.
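
For reference, the three per-token Monte Carlo KL estimators the abstract names are commonly written as follows (the standard $k_1/k_2/k_3$ formulation with samples drawn from the policy, independent of any paper-specific weighting):

```python
import torch

def kl_estimators(logp_policy, logp_ref):
    # x ~ pi; log importance ratio log r = log pi_ref(x) - log pi(x).
    log_r = logp_ref - logp_policy
    r = log_r.exp()
    k1 = -log_r                # unbiased for KL(pi || pi_ref), high variance
    k2 = 0.5 * log_r ** 2      # biased, low variance
    k3 = (r - 1.0) - log_r     # low variance and always non-negative
    return k1, k2, k3
```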

Updated: 2025-12-11 04:08:28

Categories: cs.LG,cs.AI,cs.CL

Download: http://arxiv.org/abs/2505.17508v3

Towards Foundation Models with Native Multi-Agent Intelligence

Foundation models (FMs) are increasingly assuming the role of the "brain" of AI agents. While recent efforts have begun to equip FMs with native single-agent abilities -- such as GUI interaction or integrated tool use -- we argue that the next frontier is endowing FMs with native multi-agent intelligence. We identify four core capabilities of FMs in multi-agent contexts: understanding, planning, efficient communication, and adaptation. Contrary to assumptions about the spontaneous emergence of such abilities, we provide extensive empirical evidence across 41 large language models showing that strong single-agent performance alone does not automatically yield robust multi-agent intelligence. To address this gap, we outline key research directions -- spanning dataset construction, evaluation, training paradigms, and safety considerations -- for building FMs with native multi-agent intelligence.

Updated: 2025-12-11 04:06:53

Categories: cs.AI,cs.MA

Download: http://arxiv.org/abs/2512.08743v2

HunyuanOCR Technical Report

This paper presents HunyuanOCR, a commercial-grade, open-source, and lightweight (1B parameters) Vision-Language Model (VLM) dedicated to OCR tasks. The architecture comprises a Native Vision Transformer (ViT) and a lightweight LLM connected via an MLP adapter. HunyuanOCR demonstrates superior performance, outperforming commercial APIs, traditional pipelines, and larger models (e.g., Qwen3-VL-4B). Specifically, it surpasses current public solutions in perception tasks (Text Spotting, Parsing) and excels in semantic tasks (IE, Text Image Translation), securing first place in the ICDAR 2025 DIMT Challenge (Small Model Track). Furthermore, it achieves state-of-the-art (SOTA) results on OCRBench among VLMs with fewer than 3B parameters. HunyuanOCR achieves breakthroughs in three key aspects: 1) Unifying Versatility and Efficiency: We implement comprehensive support for core capabilities including spotting, parsing, IE, VQA, and translation within a lightweight framework. This addresses the limitations of narrow "OCR expert models" and inefficient "General VLMs". 2) Streamlined End-to-End Architecture: Adopting a pure end-to-end paradigm eliminates dependencies on pre-processing modules (e.g., layout analysis). This fundamentally resolves error propagation common in traditional pipelines and simplifies system deployment. 3) Data-Driven and RL Strategies: We confirm the critical role of high-quality data and, for the first time in the industry, demonstrate that Reinforcement Learning (RL) strategies yield significant performance gains in OCR tasks. HunyuanOCR is officially open-sourced on HuggingFace. We also provide a high-performance deployment solution based on vLLM, placing its production efficiency in the top tier. We hope this model will advance frontier research and provide a solid foundation for industrial applications.
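
The ViT-to-LLM connection described above is a standard MLP projector; a generic sketch, with placeholder dimensions rather than HunyuanOCR's actual configuration, looks like:

```python
import torch.nn as nn

class VisionToLLMAdapter(nn.Module):
    """Two-layer MLP that projects ViT patch features into the LLM's
    embedding space; 1024 and 2048 are illustrative dimensions."""
    def __init__(self, vit_dim=1024, llm_dim=2048):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vit_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, vit_tokens):    # (batch, num_patches, vit_dim)
        return self.proj(vit_tokens)  # (batch, num_patches, llm_dim)
```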

Updated: 2025-12-11 04:04:16

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2511.19575v2

AlphaOPT: Formulating Optimization Programs with Self-Improving LLM Experience Library

Optimization modeling enables critical decisions across industries but remains difficult to automate: informal language must be mapped to precise mathematical formulations and executable solver code. Prior LLM approaches either rely on brittle prompting or costly retraining with limited generalization. We present AlphaOPT, a self-improving experience library that enables an LLM to learn from limited demonstrations (even answers alone, without gold-standard programs) and solver feedback - without annotated reasoning traces or parameter updates. AlphaOPT operates in a continual two-phase cycle: (i) a Library Learning phase that reflects on failed attempts, extracting solver-verified, structured insights as {taxonomy, condition, explanation, example}; and (ii) a Library Evolution phase that diagnoses retrieval misalignments and refines the applicability conditions of stored insights, improving transfer across tasks. This design (1) learns efficiently from limited demonstrations without curated rationales, (2) expands continually without costly retraining by updating the library rather than model weights, and (3) makes knowledge explicit and interpretable for human inspection and intervention. Experiments show that AlphaOPT steadily improves with more data (65% to 72% from 100 to 300 training items) and surpasses the strongest baseline by 7.7% on the out-of-distribution OptiBench dataset when trained only on answers. Code and data are available at: https://github.com/Minw913/AlphaOPT.
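
The insight schema is stated explicitly in the abstract; a minimal sketch of a library entry, with a naive keyword match standing in for the paper's actual retrieval and evolution logic, could be:

```python
from dataclasses import dataclass

@dataclass
class Insight:
    taxonomy: str     # e.g. "constraint-modeling/big-M" (hypothetical label)
    condition: str    # applicability condition, refined during Library Evolution
    explanation: str  # why the fix works, verified against solver feedback
    example: str      # a worked formulation snippet

def retrieve(library: list[Insight], task: str) -> list[Insight]:
    # Placeholder retrieval: keyword overlap between task and conditions.
    words = set(task.lower().split())
    return [i for i in library if words & set(i.condition.lower().split())]
```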

Updated: 2025-12-11 03:59:43

Categories: cs.AI

Download: http://arxiv.org/abs/2510.18428v2

Attention is All You Need to Defend Against Indirect Prompt Injection Attacks in LLMs

Large Language Models (LLMs) have been integrated into many applications (e.g., web agents) to perform more sophisticated tasks. However, LLM-empowered applications are vulnerable to Indirect Prompt Injection (IPI) attacks, where instructions are injected via untrustworthy external data sources. This paper presents Rennervate, a defense framework to detect and prevent IPI attacks. Rennervate leverages attention features to detect the covert injection at a fine-grained token level, enabling precise sanitization that neutralizes IPI attacks while maintaining LLM functionalities. Specifically, the token-level detector is materialized with a 2-step attentive pooling mechanism, which aggregates attention heads and response tokens for IPI detection and sanitization. Moreover, we establish a fine-grained IPI dataset, FIPI, to be open-sourced to support further research. Extensive experiments verify that Rennervate outperforms 15 commercial and academic IPI defense methods, achieving high precision on 5 LLMs and 6 datasets. We also demonstrate that Rennervate is transferable to unseen attacks and robust against adaptive adversaries.
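
A sketch of the 2-step attentive pooling idea, aggregating attention heads first and response tokens second to score each input token, is shown below; the shapes and layer choices are assumptions, not the paper's design:

```python
import torch
import torch.nn as nn

class TwoStepAttentivePooling(nn.Module):
    """Pools (heads, response tokens) attention into per-input-token scores."""
    def __init__(self, num_heads, hidden=64):
        super().__init__()
        self.head_pool = nn.Linear(num_heads, 1)              # step 1: pool heads
        self.token_proj = nn.Linear(1, hidden)
        self.token_query = nn.Parameter(torch.randn(hidden))  # step 2: pool tokens

    def forward(self, attn):  # attn: (num_heads, resp_len, input_len)
        a = self.head_pool(attn.permute(1, 2, 0)).squeeze(-1)  # (resp, input)
        h = torch.tanh(self.token_proj(a.unsqueeze(-1)))       # (resp, input, hidden)
        w = torch.softmax(h @ self.token_query, dim=0)         # weights over resp tokens
        return (w * a).sum(dim=0)                              # (input,) per-token score
```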

Updated: 2025-12-11 03:47:12

Categories: cs.CR

Download: http://arxiv.org/abs/2512.08417v2

GFM-RAG: Graph Foundation Model for Retrieval Augmented Generation

Retrieval-augmented generation (RAG) has proven effective in integrating knowledge into large language models (LLMs). However, conventional RAGs struggle to capture complex relationships between pieces of knowledge, limiting their performance in intricate reasoning that requires integrating knowledge from multiple sources. Recently, graph-enhanced retrieval augmented generation (GraphRAG) builds graph structure to explicitly model these relationships, enabling more effective and efficient retrievers. Nevertheless, its performance is still hindered by the noise and incompleteness within the graph structure. To address this, we introduce GFM-RAG, a novel graph foundation model (GFM) for retrieval augmented generation. GFM-RAG is powered by an innovative graph neural network that reasons over graph structure to capture complex query-knowledge relationships. The GFM with 8M parameters undergoes a two-stage training process on large-scale datasets, comprising 60 knowledge graphs with over 14M triples and 700k documents. This results in impressive performance and generalizability for GFM-RAG, making it the first graph foundation model applicable to unseen datasets for retrieval without any fine-tuning required. Extensive experiments on three multi-hop QA datasets and seven domain-specific RAG datasets demonstrate that GFM-RAG achieves state-of-the-art performance while maintaining efficiency and alignment with neural scaling laws, highlighting its potential for further improvement.

Updated: 2025-12-11 03:45:04

Categories: cs.IR,cs.AI,cs.CL

Download: http://arxiv.org/abs/2502.01113v3

It Hears, It Sees too: Multi-Modal LLM for Depression Detection By Integrating Visual Understanding into Audio Language Models

Depression is one of the most prevalent mental health disorders globally. In recent years, multi-modal data, such as speech, video, and transcripts, has been increasingly used to develop AI-assisted depression assessment systems. Large language models have further advanced this field due to their strong language understanding and generalization capabilities. However, conventional LLMs remain text-centric and cannot process the rich non-verbal cues found in audio and visual modalities, which are critical components in mental health evaluation. While multi-modal LLMs offer a promising direction, few are tailored for psychological applications. In this study, we propose a novel multi-modal LLM framework for depression detection. Our approach augments an audio language model with visual understanding and aligns audio-visual features at the timestamp level. This fine-grained alignment improves modeling of temporal dynamics across modalities while reducing the need for extensive training data and computational resources. Experiments on the DAIC-WoZ dataset demonstrate that our model outperforms both single-modality approaches and previous multi-modal methods. Moreover, the proposed framework can be extended to incorporate additional physiological signals, paving the way for broader clinical applications beyond mental health.

Updated: 2025-12-11 03:40:00

Categories: cs.MM,cs.CV,cs.LG,eess.AS

Download: http://arxiv.org/abs/2511.19877v2

R^2-HGP: A Double-Regularized Gaussian Process for Heterogeneous Transfer Learning

Multi-output Gaussian process (MGP) models have attracted significant attention for their flexibility and uncertainty-quantification capabilities, and have been widely adopted in multi-source transfer learning scenarios due to their ability to capture inter-task correlations. However, they still face several challenges in transfer learning. First, the input spaces of the source and target domains are often heterogeneous, which makes direct knowledge transfer difficult. Second, potential prior knowledge and physical information are typically ignored during heterogeneous transfer, hampering the utilization of domain-specific insights and leading to unstable mappings. Third, inappropriate information sharing between the target and sources can easily lead to negative transfer. Traditional models fail to address these issues in a unified way. To overcome these limitations, this paper proposes a Double-Regularized Heterogeneous Gaussian Process framework (R^2-HGP). Specifically, a trainable prior probability mapping model is first proposed to align the heterogeneous input domains. The resulting aligned inputs are treated as latent variables, upon which a multi-source transfer GP model is constructed, and the entire structure is integrated into a novel conditional variational autoencoder (CVAE) based framework. Physical insight is further incorporated as a regularization term to ensure that the alignment results adhere to known physical knowledge. Next, within the multi-source transfer GP model, a sparsity penalty is imposed on the transfer coefficients, enabling the model to adaptively select the most informative source outputs and suppress negative transfer. Extensive simulations and real-world engineering case studies validate the effectiveness of our R^2-HGP, demonstrating consistent superiority over state-of-the-art benchmarks across diverse evaluation metrics.

Updated: 2025-12-11 03:38:20

Categories: cs.LG

Download: http://arxiv.org/abs/2512.10258v1

Error Analysis of Generalized Langevin Equations with Approximated Memory Kernels

We analyze prediction error in stochastic dynamical systems with memory, focusing on generalized Langevin equations (GLEs) formulated as stochastic Volterra equations. We establish that, under a strongly convex potential, trajectory discrepancies decay at a rate determined by the decay of the memory kernel and are quantitatively bounded by the estimation error of the kernel in a weighted norm. Our analysis integrates synchronized noise coupling with a Volterra comparison theorem, encompassing both subexponential and exponential kernel classes. For first-order models, we derive moment and perturbation bounds using resolvent estimates in weighted spaces. For second-order models with confining potentials, we prove contraction and stability under kernel perturbations using a hypocoercive Lyapunov-type distance. This framework accommodates non-translation-invariant kernels and white-noise forcing, explicitly linking improved kernel estimation to enhanced trajectory prediction. Numerical examples validate these theoretical findings.
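
For context, a second-order GLE with memory kernel $K$ is typically written in the standard form below; the paper's bounds control the trajectory discrepancy by the estimation error of $K$ in a weighted norm:

```latex
\dot{x}(t) = v(t), \qquad
m\,\dot{v}(t) = -\nabla U(x(t)) - \int_0^t K(t-s)\, v(s)\,\mathrm{d}s + \eta(t)
```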

Updated: 2025-12-11 03:27:58

Categories: stat.ML,cs.LG,math.DS,math.NA,math.PR

Download: http://arxiv.org/abs/2512.10256v1

Keep It on a Leash: Controllable Pseudo-label Generation Towards Realistic Long-Tailed Semi-Supervised Learning

Current long-tailed semi-supervised learning methods assume that labeled data exhibit a long-tailed distribution, and unlabeled data adhere to a typical predefined distribution (i.e., long-tailed, uniform, or inverse long-tailed). However, the distribution of the unlabeled data is generally unknown and may follow an arbitrary distribution. To tackle this challenge, we propose a Controllable Pseudo-label Generation (CPG) framework, expanding the labeled dataset with the progressively identified reliable pseudo-labels from the unlabeled dataset and training the model on the updated labeled dataset with a known distribution, making it unaffected by the unlabeled data distribution. Specifically, CPG operates through a controllable self-reinforcing optimization cycle: (i) at each training step, our dynamic controllable filtering mechanism selectively incorporates reliable pseudo-labels from the unlabeled dataset into the labeled dataset, ensuring that the updated labeled dataset follows a known distribution; (ii) we then construct a Bayes-optimal classifier using logit adjustment based on the updated labeled data distribution; (iii) this improved classifier subsequently helps identify more reliable pseudo-labels in the next training step. We further theoretically prove that this optimization cycle can significantly reduce the generalization error under some conditions. Additionally, we propose a class-aware adaptive augmentation module to further improve the representation of minority classes, and an auxiliary branch to maximize data utilization by leveraging all labeled and unlabeled samples. Comprehensive evaluations on various commonly used benchmark datasets show that CPG achieves consistent improvements, surpassing state-of-the-art methods by up to 15.97% in accuracy. The code is available at https://github.com/yaxinhou/CPG.
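
The Bayes-optimal classifier in step (ii) uses standard logit adjustment; because CPG keeps the updated labeled distribution known, the correction is just a prior term (a minimal sketch, with `tau` as the usual scaling knob):

```python
import torch

def logit_adjusted_predict(logits, class_counts, tau=1.0):
    # class_counts: label frequencies of the (known) updated labeled set.
    prior = class_counts.float() / class_counts.sum()
    return (logits - tau * prior.log()).argmax(dim=-1)
```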

Updated: 2025-12-11 03:26:54

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2510.03993v6

Explain First, Trust Later: LLM-Augmented Explanations for Graph-Based Crypto Anomaly Detection

The decentralized finance (DeFi) community has grown rapidly in recent years, pushed forward by cryptocurrency enthusiasts interested in the vast untapped potential of new markets. The surge in popularity of cryptocurrency has ushered in a new era of financial crime. Unfortunately, the novelty of the technology makes the task of catching and prosecuting offenders particularly challenging. Thus, it is necessary to implement automated detection tools related to policies to address the growing criminality in the cryptocurrency realm.

Updated: 2025-12-11 03:15:22

Categories: cs.CE,cs.AI,cs.CR

Download: http://arxiv.org/abs/2506.14933v2

RobustSora: De-Watermarked Benchmark for Robust AI-Generated Video Detection

The proliferation of AI-generated video technologies poses challenges to information integrity. While recent benchmarks advance AIGC video detection, they overlook a critical factor: many state-of-the-art generative models embed digital watermarks in outputs, and detectors may partially rely on these patterns. To evaluate this influence, we present RobustSora, the benchmark designed to assess watermark robustness in AIGC video detection. We systematically construct a dataset of 6,500 videos comprising four types: Authentic-Clean (A-C), Authentic-Spoofed with fake watermarks (A-S), Generated-Watermarked (G-W), and Generated-DeWatermarked (G-DeW). Our benchmark introduces two evaluation tasks: Task-I tests performance on watermark-removed AI videos, while Task-II assesses false alarm rates on authentic videos with fake watermarks. Experiments with ten models spanning specialized AIGC detectors, transformer architectures, and MLLM approaches reveal performance variations of 2-8pp under watermark manipulation. Transformer-based models show consistent moderate dependency (6-8pp), while MLLMs exhibit diverse patterns (2-8pp). These findings indicate partial watermark dependency and highlight the need for watermark-aware training strategies. RobustSora provides essential tools to advance robust AIGC detection research.

Updated: 2025-12-11 03:12:56

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2512.10248v1

Solving Semi-Supervised Few-Shot Learning from an Auto-Annotation Perspective

Semi-supervised few-shot learning (SSFSL) formulates real-world applications like ''auto-annotation'', as it aims to learn a model over a few labeled and abundant unlabeled examples to annotate the unlabeled ones. Despite the availability of powerful open-source Vision-Language Models (VLMs) and their pretraining data, the SSFSL literature largely neglects these open-source resources. In contrast, the related area few-shot learning (FSL) has already exploited them to boost performance. Arguably, to achieve auto-annotation in the real world, SSFSL should leverage such open-source resources. To this end, we start by applying established SSL methods to finetune a VLM. Counterintuitively, they significantly underperform FSL baselines. Our in-depth analysis reveals the root cause: VLMs produce rather ''flat'' distributions of softmax probabilities. This results in zero utilization of unlabeled data and weak supervision signals. We address this issue with embarrassingly simple techniques: classifier initialization and temperature tuning. They jointly increase the confidence scores of pseudo-labels, improving the utilization rate of unlabeled data, and strengthening supervision signals. Building on this, we propose: Stage-Wise Finetuning with Temperature Tuning (SWIFT), which enables existing SSL methods to effectively finetune a VLM on limited labeled data, abundant unlabeled data, and task-relevant but noisy data retrieved from the VLM's pretraining set. Extensive experiments on five SSFSL benchmarks show that SWIFT outperforms recent FSL and SSL methods by $\sim$5 accuracy points. SWIFT even rivals supervised learning, which finetunes VLMs with the unlabeled data being labeled with ground truth!
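
Both fixes are indeed simple to express; a minimal sketch of temperature tuning combined with a FixMatch-style confidence threshold (0.95 is the conventional choice, not necessarily SWIFT's setting):

```python
import torch

def pseudo_labels(logits, T=0.05, threshold=0.95):
    # A small temperature sharpens the VLM's flat softmax distributions so
    # that confident pseudo-labels actually clear the threshold.
    probs = torch.softmax(logits / T, dim=-1)
    conf, labels = probs.max(dim=-1)
    mask = conf >= threshold  # only these unlabeled samples contribute
    return labels, mask
```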

Updated: 2025-12-11 03:06:16

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2512.10244v1

Functional Percolation: A Perspective on Criticality of Form and Function

Understanding the physical constraints and minimal conditions that enable information processing in extended systems remains a central challenge across disciplines, from neuroscience and artificial intelligence to social and physical networks. Here we study how network connectivity both limits and enables information processing by analyzing random networks across the structural percolation transition. Using cascade-mediated dynamics as a minimal and universal mechanism for propagating state-dependent responses, we examine structural, functional, and information-theoretic observables as functions of mean degree in Erdos-Renyi networks. We find that the emergence of a giant connected component coincides with a sharp transition in realizable information processing: complex input-output response functions become accessible, functional diversity increases rapidly, output entropy rises, and directed information flow quantified by transfer entropy extends beyond local neighborhoods. These coincident transitions define a regime of functional percolation, referring to a sharp expansion of the space of realizable input-output functions at the structural percolation transition. Near criticality, networks exhibit a Pareto-optimal tradeoff between functional complexity and diversity, suggesting that percolation criticality provides a universal organizing principle for information processing in systems with local interactions and propagating influences.

Updated: 2025-12-11 02:55:57

Categories: physics.soc-ph,cond-mat.stat-mech,cs.AI,physics.comp-ph

Download: http://arxiv.org/abs/2512.09317v2

Magnitude-Modulated Equivariant Adapter for Parameter-Efficient Fine-Tuning of Equivariant Graph Neural Networks

Pretrained equivariant graph neural networks based on spherical harmonics offer efficient and accurate alternatives to computationally expensive ab-initio methods, yet adapting them to new tasks and chemical environments still requires fine-tuning. Conventional parameter-efficient fine-tuning (PEFT) techniques, such as Adapters and LoRA, typically break symmetry, making them incompatible with those equivariant architectures. ELoRA, recently proposed, is the first equivariant PEFT method. It achieves improved parameter efficiency and performance on many benchmarks. However, the relatively high degrees of freedom it retains within each tensor order can still perturb pretrained feature distributions and ultimately degrade performance. To address this, we present Magnitude-Modulated Equivariant Adapter (MMEA), a novel equivariant fine-tuning method which employs lightweight scalar gating to modulate feature magnitudes on a per-order and per-multiplicity basis. We demonstrate that MMEA preserves strict equivariance and, across multiple benchmarks, consistently improves energy and force predictions to state-of-the-art levels while training fewer parameters than competing approaches. These results suggest that, in many practical scenarios, modulating channel magnitudes is sufficient to adapt equivariant models to new chemical environments without breaking symmetry, pointing toward a new paradigm for equivariant PEFT design.
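
The trick preserves equivariance because multiplying an entire order-$\ell$ vector by a scalar commutes with rotations; a minimal sketch over a dict of irrep features (shapes and initialization are assumptions):

```python
import torch
import torch.nn as nn

class MagnitudeGate(nn.Module):
    """One learned scalar per (order, multiplicity) channel; scaling whole
    order-l vectors leaves the network exactly equivariant."""
    def __init__(self, multiplicities):  # e.g. {0: 128, 1: 64, 2: 32}
        super().__init__()
        self.gates = nn.ParameterDict(
            {str(l): nn.Parameter(torch.ones(m)) for l, m in multiplicities.items()}
        )

    def forward(self, feats):  # feats[l]: (batch, m_l, 2l+1)
        return {l: f * self.gates[str(l)][None, :, None] for l, f in feats.items()}
```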

Updated: 2025-12-11 02:46:34

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2511.06696v2

Latency-Response Theory Model: Evaluating Large Language Models via Response Accuracy and Chain-of-Thought Length

The proliferation of Large Language Models (LLMs) necessitates valid evaluation methods to guide downstream applications and actionable future improvements. The Item Response Theory (IRT) has recently emerged as a promising framework for evaluating LLMs via their response accuracy. Beyond simple response accuracy, LLMs' chain of thought (CoT) lengths serve as a vital indicator of their reasoning ability. To leverage the CoT length information to assist the evaluation of LLMs, we propose Latency-Response Theory (LaRT) to jointly model the response accuracy and CoT length by introducing the latent ability, latent speed, and a key correlation parameter between them. We derive an efficient estimation algorithm and establish rigorous identifiability results for the population parameters to ensure the statistical validity of estimation. Theoretical asymptotic analyses and simulation studies demonstrate LaRT's advantages over IRT in terms of higher estimation accuracy and shorter confidence intervals for latent traits. A key finding is that the asymptotic estimation precision of the latent ability under LaRT exceeds that of IRT whenever the latent ability and latent speed are correlated. We collect real responses from diverse LLMs on popular benchmark datasets. The application of LaRT reveals a strong negative correlation between the latent ability and latent speed in all benchmarks, with stronger correlation for more difficult benchmarks. This finding supports the intuition that higher reasoning ability correlates with slower speed and longer response latency. LaRT yields different LLM rankings than IRT and outperforms IRT across multiple key evaluation metrics including predictive power, item efficiency, ranking validity, and LLM evaluation efficiency. Code and data are available at https://github.com/Toby-X/Latency-Response-Theory-Model.

Updated: 2025-12-11 02:45:56

Categories: stat.ME,cs.AI,stat.AP,stat.ML

Download: http://arxiv.org/abs/2512.07019v2

Design Space Exploration of DMA based Finer-Grain Compute Communication Overlap

As both ML training and inference are increasingly distributed, parallelization techniques that shard (divide) an ML model across the GPUs of a distributed system are often deployed. With such techniques, there is a high prevalence of data-dependent communication and computation operations where communication is exposed, leaving up to 1.7x ideal performance on the table. Prior works harness the fact that ML model state and inputs are already sharded, and employ careful overlap of individual computation/communication shards. While such coarse-grain overlap is promising, in this work we instead make a case for finer-grain compute-communication overlap, which we term FiCCO: overlap one level deeper than shard granularity, unlocking compute/communication overlap for a wider set of network topologies, finer-grain dataflow, and more. We show that FiCCO opens up a wider design space of execution schedules than is possible at shard level alone. At the same time, decomposition of ML operations into smaller operations (done in both shard-based and finer-grain techniques) causes operation-level inefficiency losses. To balance the two, we first present a detailed characterization of these inefficiency losses, then present a design space of FiCCO schedules, and finally overlay the schedules with concomitant inefficiency signatures. Doing so helps us design heuristics that frameworks and runtimes can harness to select bespoke FiCCO schedules based on the nature of the underlying ML operations. Finally, to further minimize the contention inefficiencies inherent in operation overlap, we offload communication to GPU DMA engines. We evaluate several scenarios from realistic ML deployments and demonstrate that our proposed bespoke schedules deliver up to 1.6x speedup and our heuristics provide accurate guidance in 81% of unseen scenarios.

Updated: 2025-12-11 02:43:27

Categories: cs.DC,cs.AR,cs.LG

Download: http://arxiv.org/abs/2512.10236v1

Trustworthy scientific inference with generative models

Generative artificial intelligence (AI) excels at producing complex data structures (text, images, videos) by learning patterns from training examples. Across scientific disciplines, researchers are now applying generative models to "inverse problems" to directly predict hidden parameters from observed data along with measures of uncertainty. While these predictive or posterior-based methods can handle intractable likelihoods and large-scale studies, they can also produce biased or overconfident conclusions even without model misspecifications. We present a solution with Frequentist-Bayes (FreB), a mathematically rigorous protocol that reshapes AI-generated posterior probability distributions into (locally valid) confidence regions that consistently include true parameters with the expected probability, while achieving minimum size when training and target data align. We demonstrate FreB's effectiveness by tackling diverse case studies in the physical sciences: identifying unknown sources under dataset shift, reconciling competing theoretical models, and mitigating selection bias and systematics in observational studies. By providing validity guarantees with interpretable diagnostics, FreB enables trustworthy scientific inference across fields where direct likelihood evaluation remains impossible or prohibitively expensive.

Updated: 2025-12-11 02:41:40

Categories: stat.ML,astro-ph.IM,cs.LG,stat.AP,stat.ME

Download: http://arxiv.org/abs/2508.02602v2

InFerActive: Towards Scalable Human Evaluation of Large Language Models through Interactive Inference

Human evaluation remains the gold standard for evaluating outputs of Large Language Models (LLMs). The current evaluation paradigm reviews numerous individual responses, leading to significant scalability challenges. LLM outputs can be more efficiently represented as a tree structure, reflecting their autoregressive generation process and stochastic token selection. However, conventional tree visualization cannot scale to the exponentially large trees generated by modern sampling methods of LLMs. To address this problem, we present InFerActive, an interactive inference system for scalable human evaluation. InFerActive enables on-demand exploration through probability-based filtering and evaluation features, while bridging the semantic gap between computational tokens and human-readable text through adaptive visualization techniques. Through a technical evaluation and user study (N=12), we demonstrate that InFerActive significantly improves evaluation efficiency and enables more comprehensive assessment of model behavior. We further conduct expert case studies that demonstrate InFerActive's practical applicability and potential for transforming LLM evaluation workflows.

Updated: 2025-12-11 02:41:14

Categories: cs.HC,cs.AI

Download: http://arxiv.org/abs/2512.10234v1

RLAX: Large-Scale, Distributed Reinforcement Learning for Large Language Models on TPUs

Reinforcement learning (RL) has emerged as the de-facto paradigm for improving the reasoning capabilities of large language models (LLMs). We have developed RLAX, a scalable RL framework on TPUs. RLAX employs a parameter-server architecture. A master trainer periodically pushes updated model weights to the parameter server while a fleet of inference workers pull the latest weights and generates new rollouts. We introduce a suite of system techniques to enable scalable and preemptible RL for a diverse set of state-of-art RL algorithms. To accelerate convergence and improve model quality, we have devised new dataset curation and alignment techniques. Large-scale evaluations show that RLAX improves QwQ-32B's pass@8 accuracy by 12.8% in just 12 hours 48 minutes on 1024 v5p TPUs, while remaining robust to preemptions during training.

Updated: 2025-12-11 02:38:26

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2512.06392v2

Emotional Support with LLM-based Empathetic Dialogue Generation

Emotional Support Conversation (ESC) aims to provide empathetic and effective emotional assistance through dialogue, addressing the growing demand for mental health support. This paper presents our solution for the NLPCC 2025 Task 8 ESC evaluation, where we leverage large-scale language models enhanced by prompt engineering and finetuning techniques. We explore both parameter-efficient Low-Rank Adaptation and full-parameter fine-tuning strategies to improve the model's ability to generate supportive and contextually appropriate responses. Our best model ranked second in the competition, highlighting the potential of combining LLMs with effective adaptation methods for ESC tasks. Future work will focus on further enhancing emotional understanding and response personalization to build more practical and reliable emotional support systems.
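
Of the two strategies, Low-Rank Adaptation is easy to show in miniature; a standard LoRA linear layer in its generic formulation (not the competition configuration) looks like:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Freeze the pretrained weight and learn a rank-r update scaled by alpha/r."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no-op at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```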

Updated: 2025-12-11 02:38:17

Categories: cs.AI,cs.CL

Download: http://arxiv.org/abs/2507.12820v2

DS FedProxGrad: Asymptotic Stationarity Without Noise Floor in Fair Federated Learning

Recent work [arifgroup] introduced Federated Proximal Gradient (FedProxGrad) for solving non-convex composite optimization problems in group-fair federated learning. However, the original analysis established convergence only to a noise-dominated neighborhood of stationarity, with explicit dependence on a variance-induced noise floor. In this work, we provide an improved asymptotic convergence analysis for a generalized FedProxGrad-type analytical framework with inexact local proximal solutions and explicit fairness regularization. We call this extended analytical framework DS FedProxGrad (Decay Step Size FedProxGrad). Under a Robbins-Monro step-size schedule [robbins1951stochastic] and a mild decay condition on local inexactness, we prove that $\liminf_{r\to\infty} \mathbb{E}[\|\nabla F(\mathbf{x}^r)\|^2] = 0$, i.e., the algorithm is asymptotically stationary and the convergence rate does not depend on a variance-induced noise floor.
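
The Robbins-Monro conditions require $\sum_r \eta_r = \infty$ and $\sum_r \eta_r^2 < \infty$; any schedule of the form $\eta_r = a/(b+r)^\gamma$ with $\gamma \in (0.5, 1]$ satisfies them, for example:

```python
def robbins_monro_steps(a=1.0, b=1.0, gamma=0.6):
    """Yields eta_r = a / (b + r)**gamma; for 0.5 < gamma <= 1 the steps
    sum to infinity while their squares sum to a finite value."""
    r = 0
    while True:
        yield a / (b + r) ** gamma
        r += 1
```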

Updated: 2025-12-11 02:35:40

Categories: cs.LG,stat.ML

Download: http://arxiv.org/abs/2512.08671v3

Deferred Poisoning: Making the Model More Vulnerable via Hessian Singularization

Recent studies have shown that deep learning models are very vulnerable to poisoning attacks. Many defense methods have been proposed to address this issue. However, traditional poisoning attacks are not as threatening as commonly believed. This is because they often cause differences in how the model performs on the training set compared to the validation set. Such inconsistency can alert defenders that their data has been poisoned, allowing them to take the necessary defensive actions. In this paper, we introduce a more threatening type of poisoning attack called the Deferred Poisoning Attack. This new attack allows the model to function normally during the training and validation phases but makes it very sensitive to evasion attacks or even natural noise. We achieve this by ensuring the poisoned model's loss function has a similar value as a normally trained model at each input sample but with a large local curvature. A similar model loss ensures that there is no obvious inconsistency between the training and validation accuracy, demonstrating high stealthiness. On the other hand, the large curvature implies that a small perturbation may cause a significant increase in model loss, leading to substantial performance degradation, which reflects a worse robustness. We fulfill this purpose by making the model have singular Hessian information at the optimal point via our proposed Singularization Regularization term. We have conducted both theoretical and empirical analyses of the proposed method and validated its effectiveness through experiments on image classification tasks. Furthermore, we have confirmed the hazards of this form of poisoning attack under more general scenarios using natural noise, offering a new perspective for research in the field of security.
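
The paper's Singularization Regularization term is not reproduced in the abstract, but the property it targets, low loss with large local curvature, can be probed with a symmetric finite-difference proxy (an illustrative diagnostic, assuming a differentiable `loss_fn`):

```python
import torch

def curvature_proxy(model, loss_fn, x, y, eps=1e-2):
    # Central second difference: approximates delta^T H delta around x.
    delta = eps * torch.randn_like(x)
    clean = loss_fn(model(x), y)
    bumped = loss_fn(model(x + delta), y) + loss_fn(model(x - delta), y)
    return bumped - 2.0 * clean  # large value => sharp, fragile loss surface
```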

Updated: 2025-12-11 02:35:39

Categories: cs.LG,cs.CR,cs.CV

Download: http://arxiv.org/abs/2411.03752v3

When Worse is Better: Navigating the compression-generation tradeoff in visual tokenization

Current image generation methods are based on a two-stage training approach. In stage 1, an auto-encoder is trained to compress an image into a latent space; in stage 2, a generative model is trained to learn a distribution over that latent space. This reveals a fundamental trade-off: do we compress more aggressively to make the latent distribution easier for the stage 2 model to learn, even if it makes reconstruction worse? We study this problem in the context of discrete, auto-regressive image generation. Through the lens of scaling laws, we show that smaller stage 2 models can benefit from more compressed stage 1 latents even if reconstruction performance worsens, demonstrating that generation modeling capacity plays a role in this trade-off. Diving deeper, we rigorously study the connection between compute scaling and the stage 1 rate-distortion trade-off. Next, we introduce Causally Regularized Tokenization (CRT), which uses knowledge of the stage 2 generation modeling procedure to embed useful inductive biases in stage 1 latents. This regularization improves stage 2 generation performance by making the tokens easier to model, without affecting the stage 1 compression rate and while only marginally affecting distortion: we are able to improve compute efficiency 2-3$\times$ over baseline. Finally, we use CRT with further optimizations to the visual tokenizer setup, resulting in a generative pipeline that matches LlamaGen-3B generation performance (2.18 FID) with half the tokens per image (256 vs. 576) and a fourth of the total model parameters (775M vs. 3.1B) while using the same architecture and inference procedure.

Updated: 2025-12-11 02:35:08

Categories: cs.CV,cs.LG

Download: http://arxiv.org/abs/2412.16326v2

Benchmarking Multimodal LLMs on Recognition and Understanding over Chemical Tables

With the widespread application of multimodal large language models in scientific intelligence, there is an urgent need for more challenging evaluation benchmarks to assess their ability to understand complex scientific data. Scientific tables, as core carriers of knowledge representation, combine text, symbols, and graphics, forming a typical multimodal reasoning scenario. However, existing benchmarks are mostly focused on general domains, failing to reflect the unique structural complexity and domain-specific semantics inherent in scientific research. Chemical tables are particularly representative: they intertwine structured variables such as reagents, conditions, and yields with visual symbols like molecular structures and chemical formulas, posing significant challenges to models in cross-modal alignment and semantic parsing. To address this, we propose ChemTable-a large scale benchmark of chemical tables constructed from real-world literature, containing expert-annotated cell layouts, logical structures, and domain-specific labels. It supports two core tasks: (1) table recognition (structure and content extraction); and (2) table understanding (descriptive and reasoning-based question answering). Evaluation on ChemTable shows that while mainstream multimodal models perform reasonably well in layout parsing, they still face significant limitations when handling critical elements such as molecular structures and symbolic conventions. Closed-source models lead overall but still fall short of human-level performance. This work provides a realistic testing platform for evaluating scientific multimodal understanding, revealing the current bottlenecks in domain-specific reasoning and advancing the development of intelligent systems for scientific research.

Updated: 2025-12-11 02:34:29

Domains: cs.AI,cs.CL

Download: http://arxiv.org/abs/2506.11375v2

Adaptive Information Routing for Multimodal Time Series Forecasting

Time series forecasting is a critical task for artificial intelligence with numerous real-world applications. Traditional approaches primarily rely on historical time series data to predict the future values. However, in practical scenarios, this is often insufficient for accurate predictions due to the limited information available. To address this challenge, multimodal time series forecasting methods which incorporate additional data modalities, mainly text data, alongside time series data have been explored. In this work, we introduce the Adaptive Information Routing (AIR) framework, a novel approach for multimodal time series forecasting. Unlike existing methods that treat text data on par with time series data as interchangeable auxiliary features for forecasting, AIR leverages text information to dynamically guide the time series model by controlling how and to what extent multivariate time series information should be combined. We also present a text-refinement pipeline that employs a large language model to convert raw text data into a form suitable for multimodal forecasting, and we introduce a benchmark that facilitates multimodal forecasting experiments based on this pipeline. Experiment results with the real world market data such as crude oil price and exchange rates demonstrate that AIR effectively modulates the behavior of the time series model using textual inputs, significantly enhancing forecasting accuracy in various time series forecasting tasks.
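
A minimal PyTorch sketch of the routing idea: a text embedding produces per-channel gates that decide how strongly each series contributes before the forecasting backbone sees the input. Module and parameter names are hypothetical and the GRU backbone is a stand-in; this is not the authors' AIR architecture.

import torch
import torch.nn as nn

class TextRoutedForecaster(nn.Module):
    def __init__(self, n_series, text_dim, hidden=64, horizon=1):
        super().__init__()
        self.router = nn.Sequential(nn.Linear(text_dim, hidden), nn.ReLU(),
                                    nn.Linear(hidden, n_series))
        self.backbone = nn.GRU(n_series, hidden, batch_first=True)
        self.head = nn.Linear(hidden, horizon)

    def forward(self, series, text_emb):
        # series: (B, T, n_series); text_emb: (B, text_dim)
        gate = torch.sigmoid(self.router(text_emb))   # per-channel weights in [0, 1]
        routed = series * gate.unsqueeze(1)           # text controls the channel mix
        h, _ = self.backbone(routed)
        return self.head(h[:, -1])                    # forecast from the last state

model = TextRoutedForecaster(n_series=4, text_dim=16)
print(model(torch.randn(2, 48, 4), torch.randn(2, 16)).shape)  # (2, 1)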

Updated: 2025-12-11 02:25:27

Domains: cs.LG,cs.AI

Download: http://arxiv.org/abs/2512.10229v1

MedReasoner: Reinforcement Learning Drives Reasoning Grounding from Clinical Thought to Pixel-Level Precision

Accurately grounding regions of interest (ROIs) is critical for diagnosis and treatment planning in medical imaging. While multimodal large language models (MLLMs) combine visual perception with natural language, current medical-grounding pipelines still rely on supervised fine-tuning with explicit spatial hints, making them ill-equipped to handle the implicit queries common in clinical practice. This work makes three core contributions. We first define Unified Medical Reasoning Grounding (UMRG), a novel vision-language task that demands clinical reasoning and pixel-level grounding. Second, we release U-MRG-14K, a dataset of 14K samples featuring pixel-level masks alongside implicit clinical queries and reasoning traces, spanning 10 modalities, 15 super-categories, and 108 specific categories. Finally, we introduce MedReasoner, a modular framework that distinctly separates reasoning from segmentation: an MLLM reasoner is optimized with reinforcement learning, while a frozen segmentation expert converts spatial prompts into masks, with alignment achieved through format and accuracy rewards. MedReasoner achieves state-of-the-art performance on U-MRG-14K and demonstrates strong generalization to unseen clinical queries, underscoring the significant promise of reinforcement learning for interpretable medical grounding.
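
A minimal sketch of how format and accuracy rewards of this kind can be combined. The '<point>x,y</point>' output convention and the reward weights are hypothetical, and IoU stands in for the accuracy signal; the paper's exact prompt format and reward design may differ.

import re
import numpy as np

def format_reward(response: str) -> float:
    """1.0 if the reasoner emitted a parseable spatial prompt (hypothetical tag format)."""
    return 1.0 if re.search(r"<point>\s*\d+\s*,\s*\d+\s*</point>", response) else 0.0

def accuracy_reward(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """IoU between the frozen segmenter's mask and the reference mask."""
    inter = np.logical_and(pred_mask, gt_mask).sum()
    union = np.logical_or(pred_mask, gt_mask).sum()
    return float(inter / union) if union > 0 else 0.0

def total_reward(response, pred_mask, gt_mask, w_fmt=0.2, w_acc=0.8):
    return w_fmt * format_reward(response) + w_acc * accuracy_reward(pred_mask, gt_mask)

pred = np.zeros((8, 8), bool); pred[2:6, 2:6] = True
gt = np.zeros((8, 8), bool); gt[3:7, 3:7] = True
print(total_reward("<point>12,34</point>", pred, gt))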

Updated: 2025-12-11 02:20:50

Domains: cs.CV,cs.AI

Download: http://arxiv.org/abs/2508.08177v2

Federated Domain Generalization with Latent Space Inversion

Federated domain generalization (FedDG) addresses distribution shifts among clients in a federated learning framework. FedDG methods aggregate the parameters of locally trained client models to form a global model that generalizes to unseen clients while preserving data privacy. While improving the generalization capability of the global model, many existing approaches in FedDG jeopardize privacy by sharing statistics of client data among themselves. Our solution addresses this problem by contributing new ways to perform local client training and model aggregation. To improve local client training, we enforce (domain) invariance across local models with the help of a novel technique, latent space inversion, which enables better client privacy. When clients are not i.i.d., aggregating their local models may discard certain local adaptations. To overcome this, we propose an important-weight aggregation strategy to prioritize parameters that significantly influence predictions of local models during aggregation. Our extensive experiments show that our approach achieves superior results over state-of-the-art methods with less communication overhead.
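
The aggregation side is easy to sketch: instead of a plain parameter average, each client's contribution to every individual parameter is weighted by a nonnegative importance score (squared gradients are one plausible score; the paper's exact importance measure may differ). A minimal sketch in PyTorch:

import torch

def important_weight_aggregate(client_params, client_importance, eps=1e-8):
    """Per-parameter weighted average across clients.
    client_params / client_importance: one dict (name -> tensor) per client."""
    out = {}
    for name in client_params[0]:
        w = torch.stack([imp[name] for imp in client_importance])  # (C, ...)
        w = w / (w.sum(dim=0, keepdim=True) + eps)                 # normalize over clients
        p = torch.stack([cp[name] for cp in client_params])
        out[name] = (w * p).sum(dim=0)                             # important params dominate
    return out

clients = [{"fc.weight": torch.randn(4, 4)} for _ in range(3)]
scores = [{"fc.weight": torch.rand(4, 4)} for _ in range(3)]       # e.g., squared gradients
print(important_weight_aggregate(clients, scores)["fc.weight"].shape)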

Updated: 2025-12-11 02:17:03

Domains: cs.LG,cs.AI,cs.CV

Download: http://arxiv.org/abs/2512.10224v1

Galaxy Phase-Space and Field-Level Cosmology: The Strength of Semi-Analytic Models

Semi-analytic models are a widely used approach to simulate galaxy properties within a cosmological framework, relying on simplified yet physically motivated prescriptions. They have also proven to be an efficient alternative for generating accurate galaxy catalogs, offering a faster and less computationally expensive option compared to full hydrodynamical simulations. In this paper, we demonstrate that using only galaxy $3$D positions and radial velocities, we can train a graph neural network coupled to a moment neural network to obtain a robust machine learning based model capable of estimating the matter density parameter, $\Omega_{\rm m}$, with a precision of approximately 10%. The network is trained on ($25 h^{-1}$Mpc)$^3$ volumes of galaxy catalogs from L-Galaxies and can successfully extrapolate its predictions to other semi-analytic models (GAEA, SC-SAM, and Shark) and, more remarkably, to hydrodynamical simulations (Astrid, SIMBA, IllustrisTNG, and SWIFT-EAGLE). Our results show that the network is robust to variations in astrophysical and subgrid physics, cosmological and astrophysical parameters, and the different halo-profile treatments used across simulations. This suggests that the physical relationships encoded in the phase-space of semi-analytic models are largely independent of their specific physical prescriptions, reinforcing their potential as tools for the generation of realistic mock catalogs for cosmological parameter inference.
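
The input representation is simple to sketch: each galaxy becomes a node carrying its 3D position and radial velocity, with edges to its nearest neighbours. The k-nearest-neighbour construction below is a generic choice for illustration, not necessarily the authors' exact graph recipe.

import numpy as np

def knn_galaxy_graph(positions, radial_velocities, k=8):
    """Node features (x, y, z, v_r) and a COO edge list over k nearest neighbours."""
    n = positions.shape[0]
    d2 = ((positions[:, None, :] - positions[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)                     # no self-loops
    neigh = np.argsort(d2, axis=1)[:, :k]            # (n, k)
    edges = np.stack([np.repeat(np.arange(n), k), neigh.ravel()])
    feats = np.concatenate([positions, radial_velocities[:, None]], axis=1)
    return feats, edges

pos = np.random.rand(100, 3) * 25.0                  # toy (25 Mpc/h)^3 box
vr = np.random.randn(100)
feats, edges = knn_galaxy_graph(pos, vr)
print(feats.shape, edges.shape)                      # (100, 4) (2, 800)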

Updated: 2025-12-11 02:13:38

Domains: astro-ph.CO,astro-ph.GA,cs.LG

Download: http://arxiv.org/abs/2512.10222v1

On Learning-Curve Monotonicity for Maximum Likelihood Estimators

The property of learning-curve monotonicity, highlighted in a recent series of work by Loog, Mey and Viering, describes algorithms which only improve in average performance given more data, for any underlying data distribution within a given family. We establish the first nontrivial monotonicity guarantees for the maximum likelihood estimator in a variety of well-specified parametric settings. For sequential prediction with log loss, we show monotonicity (in fact complete monotonicity) of the forward KL divergence for Gaussian vectors with unknown covariance and either known or unknown mean, as well as for Gamma variables with unknown scale parameter. The Gaussian setting was explicitly highlighted as open in the aforementioned works, even in dimension 1. Finally we observe that for reverse KL divergence, a folklore trick yields monotonicity for very general exponential families. All results in this paper were derived by variants of GPT-5.2 Pro. Humans did not provide any proof strategies or intermediate arguments, but only prompted the model to continue developing additional results, and verified and transcribed its proofs.
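
In symbols, one standard way to formalize the property for the MLE under forward KL (a sketch consistent with the abstract; the papers' precise setup may differ in details) is

\[
\mathbb{E}_{S_{n+1}}\left[\mathrm{KL}\left(p_{\theta^\star}\,\|\,p_{\hat\theta(S_{n+1})}\right)\right]
\;\le\;
\mathbb{E}_{S_{n}}\left[\mathrm{KL}\left(p_{\theta^\star}\,\|\,p_{\hat\theta(S_{n})}\right)\right]
\qquad \text{for all } n,
\]

where $S_n$ is an i.i.d. sample of size $n$ from $p_{\theta^\star}$ and $\hat\theta(\cdot)$ is the maximum likelihood estimator. Complete monotonicity is the stronger statement that the sequence $a_n = \mathbb{E}_{S_n}[\mathrm{KL}]$ satisfies $(-1)^k (\Delta^k a)_n \ge 0$ for all $k \ge 0$ and all $n$, with $\Delta$ the forward difference operator.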

Updated: 2025-12-11 02:12:12

Domains: math.ST,cs.LG,stat.ML

Download: http://arxiv.org/abs/2512.10220v1

Multi-Robot Path Planning Combining Heuristics and Multi-Agent Reinforcement Learning

Multi-robot path finding in dynamic environments is a highly challenging classic problem. In the movement process, robots need to avoid collisions with other moving robots while minimizing their travel distance. Previous methods for this problem either continuously replan paths using heuristic search methods to avoid conflicts or choose appropriate collision avoidance strategies based on learning approaches. The former may result in long travel distances due to frequent replanning, while the latter may suffer from low learning efficiency due to poor sample exploration and utilization, leading to high training costs for the model. To address these issues, we propose a path planning method, MAPPOHR, which combines heuristic search, empirical rules, and multi-agent reinforcement learning. The method consists of two layers: a real-time planner based on the multi-agent reinforcement learning algorithm MAPPO, which embeds empirical rules in the action output layer and reward functions, and a heuristic search planner used to create a global guiding path. During movement, the heuristic search planner replans new paths based on the instructions of the real-time planner. We tested our method in 10 different conflict scenarios. The experiments show that the planning performance of MAPPOHR is better than that of existing learning and heuristic methods. Due to the utilization of empirical knowledge and heuristic search, the learning efficiency of MAPPOHR is higher than that of existing learning methods.
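
The two-layer structure is easy to sketch: a heuristic planner supplies a global guide path, and a learned policy follows it while applying local collision-avoidance rules. The toy below replaces A* with greedy Manhattan steps and the MAPPO policy with a wait-if-occupied rule; every interface here is a hypothetical stand-in, not the paper's implementation.

from dataclasses import dataclass, field

@dataclass
class Robot:
    pos: tuple
    goal: tuple
    guide_path: list = field(default_factory=list)

def heuristic_replan(pos, goal):
    """Stand-in for the heuristic global planner (e.g., A*): greedy steps, x then y."""
    path, cur = [], pos
    while cur != goal:
        if cur[0] != goal[0]:
            cur = (cur[0] + (1 if goal[0] > cur[0] else -1), cur[1])
        else:
            cur = (cur[0], cur[1] + (1 if goal[1] > cur[1] else -1))
        path.append(cur)
    return path

def policy(robot, occupied):
    """Stand-in for the learned policy: follow the guide path, wait if blocked
    (an embedded empirical rule)."""
    nxt = robot.guide_path[0]
    return ("wait", robot.pos) if nxt in occupied else ("move", nxt)

def step_all(robots):
    occupied = {r.pos for r in robots}
    for r in robots:
        if r.pos == r.goal:
            continue
        if not r.guide_path:
            r.guide_path = heuristic_replan(r.pos, r.goal)   # global guiding path
        act, cell = policy(r, occupied - {r.pos})
        if act == "move":
            occupied.discard(r.pos)
            r.pos = cell
            occupied.add(cell)
            r.guide_path.pop(0)

robots = [Robot((0, 0), (3, 3)), Robot((3, 3), (0, 0))]
for _ in range(12):
    step_all(robots)
print([r.pos for r in robots])   # both reach their goals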

Updated: 2025-12-11 02:08:19

Domains: cs.AI,cs.LG,cs.RO

Download: http://arxiv.org/abs/2306.01270v2

ID-PaS : Identity-Aware Predict-and-Search for General Mixed-Integer Linear Programs

Mixed-Integer Linear Programs (MIPs) are powerful and flexible tools for modeling a wide range of real-world combinatorial optimization problems. Predict-and-Search methods operate by using a predictive model to estimate promising variable assignments and then guiding a search procedure toward high-quality solutions. Recent research has demonstrated that incorporating machine learning (ML) into the Predict-and-Search framework significantly enhances its performance. Still, prior work is restricted to binary problems and overlooks the fixed variables that commonly arise in practical settings. This work extends the Predict-and-Search (PaS) framework to parametric MIPs and introduces ID-PaS, an identity-aware learning framework that enables the ML model to handle heterogeneous variables more effectively. Experiments on several real-world large-scale problems demonstrate that ID-PaS consistently achieves superior performance compared to the state-of-the-art solver Gurobi and PaS.
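
The predict-and-search pattern itself is easy to sketch. The toy below hard-fixes the most confidently predicted binaries and brute-forces the rest of a small 0/1 program; real Predict-and-Search instead lets a MIP solver search a bounded neighborhood of the prediction, and an infeasible fixing would be relaxed rather than failed. All data and the confidence scores are synthetic stand-ins for an ML model's output.

import itertools
import numpy as np

def predict_and_search(c, A, b, probs, k_fix):
    """Toy PaS for max c.x s.t. Ax <= b, x binary: fix the k_fix most
    confident variables at their rounded prediction, enumerate the rest."""
    n = len(c)
    fixed = np.argsort(-np.abs(probs - 0.5))[:k_fix]
    free = [i for i in range(n) if i not in set(fixed)]
    x = np.zeros(n, dtype=int)
    x[fixed] = (probs[fixed] > 0.5).astype(int)
    best, best_x = -np.inf, None
    for bits in itertools.product([0, 1], repeat=len(free)):
        x[free] = bits
        if np.all(A @ x <= b):
            val = c @ x
            if val > best:
                best, best_x = val, x.copy()
    return best, best_x

rng = np.random.default_rng(0)
c = rng.integers(1, 10, size=10)
A = rng.integers(0, 5, size=(3, 10))
b = A.sum(axis=1) // 2
probs = rng.random(10)                    # stand-in for learned predictions
print(predict_and_search(c, A, b, probs, k_fix=4))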

Updated: 2025-12-11 01:58:28

Domains: cs.AI

Download: http://arxiv.org/abs/2512.10211v1

An exploration for higher efficiency in multi objective optimisation with reinforcement learning

Efficiency in optimisation and search processes remains one of the key challenges affecting the performance and adoption of optimisation algorithms. Utilising a pool of operators instead of a single operator to handle move operations within a neighbourhood remains promising, but finding an optimal or near-optimal sequence of operators requires further investigation. One promising idea is to generalise past experience and investigate how to reuse it. Although numerous works address this issue for single-objective optimisation, multi-objective cases have received little attention in this regard. A generalised approach based on multi-objective reinforcement learning appears to offer a remedy and good solutions for this issue. This paper overviews the proposed generalisation approach, with certain stages completed and other phases outstanding, aimed at demonstrating the efficiency of using multi-objective reinforcement learning.

Updated: 2025-12-11 01:58:04

Domains: cs.AI,cs.NE

Download: http://arxiv.org/abs/2512.10208v1

CP-Env: Evaluating Large Language Models on Clinical Pathways in a Controllable Hospital Environment

Medical care follows complex clinical pathways that extend beyond isolated physician-patient encounters, emphasizing decision-making and transitions between different stages. Current benchmarks focusing on static exams or isolated dialogues inadequately evaluate large language models (LLMs) in dynamic clinical scenarios. We introduce CP-Env, a controllable agentic hospital environment designed to evaluate LLMs across end-to-end clinical pathways. CP-Env simulates a hospital ecosystem with patient and physician agents, constructing scenarios ranging from triage and specialist consultation to diagnostic testing and multidisciplinary team meetings for agent interaction. Following the adaptive flow of care in real hospitals, it enables branching, long-horizon task execution. We propose a three-tiered evaluation framework encompassing Clinical Efficacy, Process Competency, and Professional Ethics. Results reveal that most models struggle with pathway complexity, exhibiting hallucinations and losing critical diagnostic details. Interestingly, excessive reasoning steps can sometimes prove counterproductive, while top models tend to exhibit reduced tool dependency through internalized knowledge. CP-Env advances the development of medical AI agents through comprehensive end-to-end clinical evaluation. We provide the benchmark and evaluation tools for further research and development at https://github.com/SPIRAL-MED/CP-Env.

Updated: 2025-12-11 01:54:55

Domains: cs.AI

Download: http://arxiv.org/abs/2512.10206v1

On Sybil Proofness in Competitive Combinatorial Exchanges

We study Sybil manipulation in BRACE, a competitive equilibrium mechanism for combinatorial exchanges, by treating identity creation as a finite perturbation of the empirical distribution of reported types. Under standard regularity assumptions on the excess demand map and smoothness of principal utilities, we obtain explicit linear bounds on price and welfare deviations induced by bounded Sybil invasion. Using these bounds, we prove a sharp contrast: strategyproofness in the large holds if and only if each principal's share of identities vanishes, whereas any principal with a persistent positive share can construct deviations yielding strictly positive limiting gains. We further show that the feasibility of BRACE fails in the event of an unbounded population of Sybils and provide a precise cost threshold that ensures disincentivization of such attacks in large markets.

Updated: 2025-12-11 01:53:04

Domains: econ.TH,cs.CR

Download: http://arxiv.org/abs/2512.10203v1

How to Bridge Spatial and Temporal Heterogeneity in Link Prediction? A Contrastive Method

Temporal Heterogeneous Networks play a crucial role in capturing the dynamics and heterogeneity inherent in various real-world complex systems, rendering them a noteworthy research avenue for link prediction. However, existing methods fail to capture the fine-grained differential distribution patterns and temporal dynamic characteristics, which we refer to as spatial heterogeneity and temporal heterogeneity. To overcome such limitations, we propose a novel Contrastive Learning-based Link Prediction model, CLP, which employs a multi-view hierarchical self-supervised architecture to encode spatial and temporal heterogeneity. Specifically, aiming at spatial heterogeneity, we develop a spatial feature modeling layer to capture the fine-grained topological distribution patterns from node- and edge-level representations, respectively. Furthermore, aiming at temporal heterogeneity, we devise a temporal information modeling layer to perceive the evolutionary dependencies of dynamic graph topologies from time-level representations. Finally, we encode the spatial and temporal distribution heterogeneity from a contrastive learning perspective, enabling a comprehensive self-supervised hierarchical relation modeling for the link prediction task. Extensive experiments conducted on four real-world dynamic heterogeneous network datasets verify that CLP consistently outperforms the state-of-the-art models, demonstrating average improvements of 10.10% and 13.44% in terms of AUC and AP, respectively.
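
As a concrete reference point, the standard InfoNCE objective below contrasts two views of the same batch of representations (for example, a spatial view against a temporal view); it is a generic stand-in, not necessarily CLP's exact loss.

import torch
import torch.nn.functional as F

def info_nce(z1, z2, tau=0.2):
    """Cross-entropy over a similarity matrix; matching rows are positives."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / tau                 # (B, B) cosine similarities
    targets = torch.arange(z1.size(0))         # positives on the diagonal
    return F.cross_entropy(logits, targets)

z_spatial = torch.randn(32, 64)                # e.g., node-level view
z_temporal = torch.randn(32, 64)               # e.g., time-level view
print(info_nce(z_spatial, z_temporal))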

Updated: 2025-12-11 01:43:48

Domains: cs.SI,cs.AI

Download: http://arxiv.org/abs/2411.00612v3

AutoMedic: An Automated Evaluation Framework for Clinical Conversational Agents with Medical Dataset Grounding

Evaluating large language models (LLMs) has recently emerged as a critical issue for safe and trustworthy application of LLMs in the medical domain. Although a variety of static medical question-answering (QA) benchmarks have been proposed, many aspects remain underexplored, such as the effectiveness of LLMs in generating responses in dynamic, interactive clinical multi-turn conversation situations and the identification of multi-faceted evaluation strategies beyond simple accuracy. However, formally evaluating a dynamic, interactive clinical situation is hindered by its vast combinatorial space of possible patient states and interaction trajectories, making it difficult to standardize and quantitatively measure such scenarios. Here, we introduce AutoMedic, a multi-agent simulation framework that enables automated evaluation of LLMs as clinical conversational agents. AutoMedic transforms off-the-shelf static QA datasets into virtual patient profiles, enabling realistic and clinically grounded multi-turn clinical dialogues between LLM agents. The performance of various clinical conversational agents is then assessed based on our CARE metric, which provides a multi-faceted evaluation standard of clinical conversational accuracy, efficiency/strategy, empathy, and robustness. Our findings, validated by human experts, demonstrate the validity of AutoMedic as an automated evaluation framework for clinical conversational agents, offering practical guidelines for the effective development of LLMs in conversational medical applications.

Updated: 2025-12-11 01:25:36

Domains: cs.CL,cs.LG,cs.MA

Download: http://arxiv.org/abs/2512.10195v1

Brain-like emergent properties in deep networks: impact of network architecture, datasets and training

Despite the rapid pace at which deep networks are improving on standardized vision benchmarks, they are still outperformed by humans on real-world vision tasks. One solution to this problem is to make deep networks more brain-like. Although there are several benchmarks that compare the ability of deep networks to predict brain responses on natural images, they do not capture subtle but important emergent properties present in brains. It is also unclear which design principle -- architecture, training data, or training regime -- would have the greatest impact on these emergent properties. To investigate these issues, we systematically evaluated over 30 state-of-the-art networks with varying network architectures, training datasets, and training regimes for the presence or absence of brain-like properties. Our main findings are as follows. First, network architecture had the strongest impact on brain-like properties compared to dataset and training regime variations. Second, networks varied widely in their alignment to the brain with no single network outperforming all others. Taken together, our results offer a principled and interpretable path toward closing the gap between artificial and human vision.

Updated: 2025-12-11 01:23:40

Domains: cs.CV,cs.AI

Download: http://arxiv.org/abs/2411.16326v3

Optimizing the non-Clifford-count in unitary synthesis using Reinforcement Learning

In this paper we study the potential of using reinforcement learning (RL) in order to synthesize quantum circuits, while optimizing the T-count and CS-count, of unitaries that are exactly implementable by the Clifford+T and Clifford+CS gate sets, respectively. We have designed our RL framework to work with channel representation of unitaries, that enables us to perform matrix operations efficiently, using integers only. We have also incorporated pruning heuristics and a canonicalization of operators, in order to reduce the search complexity. As a result, compared to previous works, we are able to implement significantly larger unitaries, in less time, with much better success rate and improvement factor. Our results for Clifford+T synthesis on two qubit unitaries achieve close-to-optimal decompositions for up to 100 T gates, 5 times more than previous RL algorithms and to the best of our knowledge, the largest instances achieved with any method to date. Our RL algorithm is able to recover previously-known optimal linear complexity algorithm for T-count-optimal decomposition of 1 qubit unitaries. We illustrate significant reduction in the asymptotic T-count estimate of important primitives like controlled cyclic shift (43%), controlled adder (14.3%) and multiplier (14%), without adding any extra ancilla. For 2-qubit Clifford+CS unitaries, our algorithm achieves a linear complexity, something that could only be accomplished by a previous algorithm using SO(6) representation.

Updated: 2025-12-11 01:09:52

Domains: quant-ph,cs.AI

Download: http://arxiv.org/abs/2509.21709v2

Exact Recovery of Non-Random Missing Multidimensional Time Series via Temporal Isometric Delay-Embedding Transform

Non-random missing data is a ubiquitous yet undertreated flaw in multidimensional time series, fundamentally threatening the reliability of data-driven analysis and decision-making. Pure low-rank tensor completion, as a classical data recovery method, falls short in handling non-random missingness, both methodologically and theoretically. Hankel-structured tensor completion models provide a feasible approach for recovering multidimensional time series with non-random missing patterns. However, most Hankel-based multidimensional data recovery methods both suffer from unclear sources of Hankel tensor low-rankness and lack an exact recovery theory for non-random missing data. To address these issues, we propose the temporal isometric delay-embedding transform, which constructs a Hankel tensor whose low-rankness is naturally induced by the smoothness and periodicity of the underlying time series. Leveraging this property, we develop the Low-Rank Tensor Completion with Temporal Isometric Delay-embedding Transform (LRTC-TIDT) model, which characterizes the low-rank structure under the Tensor Singular Value Decomposition (t-SVD) framework. Once the prescribed non-random sampling conditions and mild incoherence assumptions are satisfied, the proposed LRTC-TIDT model achieves exact recovery, as confirmed by simulation experiments under various non-random missing patterns. Furthermore, LRTC-TIDT consistently outperforms existing tensor-based methods across multiple real-world tasks, including network flow reconstruction, urban traffic estimation, and temperature field prediction. Our implementation is publicly available at https://github.com/HaoShu2000/LRTC-TIDT.
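
The classical delay embedding underlying the transform is easy to demonstrate: stacking overlapping windows of a series yields a Hankel-structured matrix, and smoothness or periodicity of the series makes that matrix (numerically) low-rank. The sketch below shows the vanilla construction; the paper's temporal isometric variant adds structure beyond this.

import numpy as np

def delay_embed(x, dim, tau=1):
    """Hankel matrix of overlapping windows (rows) from a 1D series."""
    n = len(x) - (dim - 1) * tau
    return np.stack([x[i : i + (dim - 1) * tau + 1 : tau] for i in range(n)])

t = np.arange(200)
x = np.sin(2 * np.pi * t / 20)                 # periodic signal
H = delay_embed(x, dim=32)
print(H.shape, np.linalg.matrix_rank(H, tol=1e-8))  # (169, 32), rank 2 for one sinusoid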

Updated: 2025-12-11 01:04:27

Domains: cs.LG

Download: http://arxiv.org/abs/2512.10191v1

The Interplay of Statistics and Noisy Optimization: Learning Linear Predictors with Random Data Weights

We analyze gradient descent with randomly weighted data points in a linear regression model, under a generic weighting distribution. This includes various forms of stochastic gradient descent, importance sampling, but also extends to weighting distributions with arbitrary continuous values, thereby providing a unified framework to analyze the impact of various kinds of noise on the training trajectory. We characterize the implicit regularization induced through the random weighting, connect it with weighted linear regression, and derive non-asymptotic bounds for convergence in first and second moments. Leveraging geometric moment contraction, we also investigate the stationary distribution induced by the added noise. Based on these results, we discuss how specific choices of weighting distribution influence both the underlying optimization problem and statistical properties of the resulting estimator, as well as some examples for which weightings that lead to fast convergence cause bad statistical performance.
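
A minimal sketch of the object under study: SGD on squared loss where each sampled point is multiplied by a random weight drawn from a generic distribution (exponential with mean 1 here, purely for illustration). In expectation the update targets a weighted least-squares objective; the weight distribution's variance shows up as extra gradient noise.

import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 5
X = rng.normal(size=(n, d))
theta_star = rng.normal(size=d)
y = X @ theta_star + 0.1 * rng.normal(size=n)

theta = np.zeros(d)
for step in range(5000):
    i = rng.integers(n)
    w = rng.exponential(1.0)                    # random data weight, E[w] = 1
    grad = w * (X[i] @ theta - y[i]) * X[i]     # weighted squared-loss gradient
    theta -= 0.05 / (1 + step / 1000) * grad    # decaying step size
print(np.linalg.norm(theta - theta_star))       # close to the truth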

Updated: 2025-12-11 00:55:29

Domains: stat.ML,cs.LG,stat.CO

Download: http://arxiv.org/abs/2512.10188v1

MiniF2F-Dafny: LLM-Guided Mathematical Theorem Proving via Auto-Active Verification

We present miniF2F-Dafny, the first translation of the mathematical reasoning benchmark miniF2F to an automated theorem prover: Dafny. Previously, the benchmark existed only in interactive theorem provers (Lean, Isabelle, HOL Light, Metamath). We find that Dafny's automation verifies 99/244 (40.6%) of the test set and 109/244 (44.7%) of the validation set with empty proofs, requiring no manual proof steps. For problems where empty proofs fail, we evaluate 12 off-the-shelf LLMs on providing proof hints. The best model we test achieves a 55.7% pass@4 success rate when employing iterative error correction. These preliminary results highlight an effective division of labor: LLMs provide high-level guidance while automation handles low-level details. Our benchmark can be found on GitHub at http://github.com/dafny-lang/miniF2F .

Updated: 2025-12-11 00:52:19

Domains: cs.LG

Download: http://arxiv.org/abs/2512.10187v1

Watermarks for Language Models via Probabilistic Automata

A recent watermarking scheme for language models achieves distortion-free embedding and robustness to edit-distance attacks. However, it suffers from limited generation diversity and high detection overhead. In parallel, recent research has focused on undetectability, a property ensuring that watermarks remain difficult for adversaries to detect and spoof. In this work, we introduce a new class of watermarking schemes constructed through probabilistic automata. We present two instantiations: (i) a practical scheme with exponential generation diversity and computational efficiency, and (ii) a theoretical construction with formal undetectability guarantees under cryptographic assumptions. Extensive experiments on LLaMA-3B and Mistral-7B validate the superior performance of our scheme in terms of robustness and efficiency.

Updated: 2025-12-11 00:49:06

Domains: cs.CR,cs.CL

Download: http://arxiv.org/abs/2512.10185v1

Less is More: Data-Efficient Adaptation for Controllable Text-to-Video Generation

Fine-tuning large-scale text-to-video diffusion models to add new generative controls, such as those over physical camera parameters (e.g., shutter speed or aperture), typically requires vast, high-fidelity datasets that are difficult to acquire. In this work, we propose a data-efficient fine-tuning strategy that learns these controls from sparse, low-quality synthetic data. We show that not only does fine-tuning on such simple data enable the desired controls, it actually yields superior results to models fine-tuned on photorealistic "real" data. Beyond demonstrating these results, we provide a framework that justifies this phenomenon both intuitively and quantitatively.

Updated: 2025-12-11 00:46:50

Domains: cs.CV,cs.AI

Download: http://arxiv.org/abs/2511.17844v2

Improved Segmentation of Polyps and Visual Explainability Analysis

Colorectal cancer (CRC) remains one of the leading causes of cancer-related morbidity and mortality worldwide, with gastrointestinal (GI) polyps serving as critical precursors according to the World Health Organization (WHO). Early and accurate segmentation of polyps during colonoscopy is essential for reducing CRC progression, yet manual delineation is labor-intensive and prone to observer variability. Deep learning methods have demonstrated strong potential for automated polyp analysis, but their limited interpretability remains a barrier to clinical adoption. In this study, we present PolypSeg-GradCAM, an explainable deep learning framework that integrates a U-Net architecture with a pre-trained ResNet-34 backbone and Gradient-weighted Class Activation Mapping (Grad-CAM) for transparent polyp segmentation. To ensure rigorous benchmarking, the model was trained and evaluated using 5-Fold Cross-Validation on the Kvasir-SEG dataset of 1,000 annotated endoscopic images. Experimental results show a mean Dice coefficient of 0.8902 +/- 0.0125, a mean Intersection-over-Union (IoU) of 0.8023, and an Area Under the Receiver Operating Characteristic Curve (AUC-ROC) of 0.9722. Advanced quantitative analysis using an optimal threshold yielded a Sensitivity of 0.9058 and Precision of 0.9083. Additionally, Grad-CAM visualizations confirmed that predictions were guided by clinically relevant regions, offering insight into the model's decision-making process. This study demonstrates that integrating segmentation accuracy with interpretability can support the development of trustworthy AI-assisted colonoscopy tools.
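
For reference, the two overlap metrics reported above are computed as follows for binary masks; the smoothing epsilon guards against empty masks.

import numpy as np

def dice_and_iou(pred, target, eps=1e-7):
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    dice = (2 * inter + eps) / (pred.sum() + target.sum() + eps)
    iou = (inter + eps) / (np.logical_or(pred, target).sum() + eps)
    return float(dice), float(iou)

pred = np.zeros((64, 64)); pred[10:40, 10:40] = 1
gt = np.zeros((64, 64)); gt[15:45, 15:45] = 1
print(dice_and_iou(pred, gt))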

Updated: 2025-12-11 00:35:38

Domains: cs.CV,cs.LG

Download: http://arxiv.org/abs/2509.18159v3

Assessing Neuromorphic Computing for Fingertip Force Decoding from Electromyography

High-density surface electromyography (HD-sEMG) provides a noninvasive neural interface for assistive and rehabilitation control, but mapping neural activity to user motor intent remains challenging. We assess a spiking neural network (SNN) as a neuromorphic architecture against a temporal convolutional network (TCN) for decoding fingertip force from motor-unit (MU) firing derived from HD-sEMG. Data were collected from a single participant (10 trials) with two forearm electrode arrays; MU activity was obtained via FastICA-based decomposition, and models were trained on overlapping windows with end-to-end causal convolutions. On held-out trials, the TCN achieved 4.44% MVC RMSE (Pearson r = 0.974) while the SNN achieved 8.25% MVC (r = 0.922). While the TCN was more accurate, we view the SNN as a realistic neuromorphic baseline that could close much of this gap with modest architectural and hyperparameter refinements.
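
The TCN side of the comparison rests on causal convolutions: left-padding makes the output at time t depend only on inputs up to t, which is what lets the decoder run online. The stack below is a generic sketch (channel counts and dilations are illustrative, not the paper's configuration).

import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    """Conv1d with left padding so output[t] sees only inputs <= t."""
    def __init__(self, c_in, c_out, kernel_size, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(c_in, c_out, kernel_size, dilation=dilation)

    def forward(self, x):                        # x: (B, C, T)
        return self.conv(nn.functional.pad(x, (self.pad, 0)))

tcn = nn.Sequential(                             # motor-unit channels -> force
    CausalConv1d(64, 32, 3, dilation=1), nn.ReLU(),
    CausalConv1d(32, 32, 3, dilation=2), nn.ReLU(),
    CausalConv1d(32, 1, 3, dilation=4),
)
spikes = torch.randn(4, 64, 200)                 # (batch, motor units, time)
print(tcn(spikes).shape)                         # torch.Size([4, 1, 200])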

Updated: 2025-12-11 00:33:31

Domains: cs.LG,eess.SP

Download: http://arxiv.org/abs/2512.10179v1

CIEGAD: Cluster-Conditioned Interpolative and Extrapolative Framework for Geometry-Aware and Domain-Aligned Data Augmentation

In practical deep learning deployment, the scarcity of data and the imbalance of label distributions often lead to semantically uncovered regions within the real-world data distribution, hindering model training and causing misclassification near class boundaries as well as unstable behaviors in peripheral areas. Although recent large language models (LLMs) show promise for data augmentation, an integrated framework that simultaneously achieves directional control of generation, domain alignment, and quality control has not yet been fully established. To address these challenges, we propose a Cluster-conditioned Interpolative and Extrapolative framework for Geometry-Aware and Domain-aligned data augmentation (CIEGAD), which systematically complements both in-distribution and out-of-distribution semantically uncovered regions. CIEGAD constructs domain profiles through cluster conditioning, allocates generation with a hierarchical frequency-geometric allocation integrating class frequency and geometric indicators, and finely controls generation directions via the coexistence of interpolative and extrapolative synthesis. It further performs quality control through geometry-constrained filtering combined with an LLM-as-a-Judge mechanism. Experiments on multiple classification tasks demonstrate that CIEGAD effectively extends the periphery of real-world data distributions while maintaining high alignment between generated and real-world data as well as semantic diversity. In particular, for long-tailed and multi-class classification tasks, CIEGAD consistently improves F1 and recall, validating the triple harmony of distributional consistency, diversity, and quality. These results indicate that CIEGAD serves as a practically oriented data augmentation framework that complements underrepresented regions while preserving alignment with real-world data.
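
Geometrically, the interpolative/extrapolative distinction reduces to where a synthesized point sits on the ray from a cluster centroid through a sample: inside the segment (in-distribution in-fill) or beyond it (peripheral coverage). The toy below illustrates that geometry only; the paper's actual generation step is LLM-based.

import numpy as np

def synth(center, x, lam):
    """0 < lam < 1 interpolates between centroid and sample;
    lam > 1 extrapolates past the sample toward the periphery."""
    return center + lam * (x - center)

center = np.array([0.0, 0.0])
x = np.array([1.0, 2.0])
print(synth(center, x, 0.5))   # interpolation  -> [0.5 1. ]
print(synth(center, x, 1.5))   # extrapolation  -> [1.5 3. ]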

Updated: 2025-12-11 00:32:37

Domains: cs.LG,cs.CL

Download: http://arxiv.org/abs/2512.10178v1

Unveiling the Latent Directions of Reflection in Large Language Models

Reflection, the ability of large language models (LLMs) to evaluate and revise their own reasoning, has been widely used to improve performance on complex reasoning tasks. Yet, most prior work emphasizes designing reflective prompting strategies or reinforcement learning objectives, leaving the inner mechanisms of reflection underexplored. In this paper, we investigate reflection through the lens of latent directions in model activations. We propose a methodology based on activation steering to characterize instructions with three levels of reflective intention: no reflection, intrinsic reflection, and triggered reflection. By constructing steering vectors between these reflection levels, we demonstrate that (1) new reflection-inducing instructions can be systematically identified, (2) reflective behavior can be directly enhanced or suppressed through activation interventions, and (3) suppressing reflection is considerably easier than stimulating it. Experiments on GSM8k-adv and Cruxeval-o-adv with Qwen2.5-3B and Gemma3-4B-IT reveal clear stratification across reflection levels, and steering interventions confirm the controllability of reflection. Our findings highlight both opportunities (e.g., reflection-enhancing defenses) and risks (e.g., adversarial inhibition of reflection in jailbreak attacks). This work opens a path toward mechanistic understanding of reflective reasoning in LLMs.
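
The basic activation-steering mechanics look like this: a steering vector is the difference of mean activations under the two instruction conditions, and it is added to a layer's output at inference time via a forward hook (positive strength to stimulate reflection, negative to suppress it). The linear layer and random activations below are placeholders for a transformer block and cached activations.

import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(16, 16)                     # stand-in for a transformer block

acts_reflect = torch.randn(100, 16) + 1.0     # placeholder cached activations
acts_plain = torch.randn(100, 16)
v = acts_reflect.mean(0) - acts_plain.mean(0)
v = v / v.norm()                              # steering direction

alpha = 4.0                                   # strength; flip the sign to suppress
def steer(module, inputs, output):
    return output + alpha * v                 # add the direction to the output

handle = layer.register_forward_hook(steer)
out = layer(torch.randn(2, 16))
handle.remove()
print(out.shape)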

Updated: 2025-12-11 00:22:34

Domains: cs.LG

Download: http://arxiv.org/abs/2508.16989v2

Offscript: Automated Auditing of Instruction Adherence in LLMs

Large Language Models (LLMs) and generative search systems are increasingly used for information seeking by diverse populations with varying preferences for knowledge sourcing and presentation. While users can customize LLM behavior through custom instructions and behavioral prompts, no mechanism exists to evaluate whether these instructions are being followed effectively. We present Offscript, an automated auditing tool that efficiently identifies potential instruction following failures in LLMs. In a pilot study analyzing custom instructions sourced from Reddit, Offscript detected potential deviations from instructed behavior in 86.4% of conversations, 22.2% of which were confirmed as material violations through human review. Our findings suggest that automated auditing serves as a viable approach for evaluating compliance to behavioral instructions related to information seeking.

Updated: 2025-12-11 00:11:50

Domains: cs.HC,cs.AI,cs.CL

Download: http://arxiv.org/abs/2512.10172v1

Semantic-Aware Confidence Calibration for Automated Audio Captioning

Automated audio captioning models frequently produce overconfident predictions regardless of semantic accuracy, limiting their reliability in deployment. This deficiency stems from two factors: evaluation metrics based on n-gram overlap that fail to capture semantic correctness, and the absence of calibrated confidence estimation. We present a framework that addresses both limitations by integrating confidence prediction into audio captioning and redefining correctness through semantic similarity. Our approach augments a Whisper-based audio captioning model with a learned confidence prediction head that estimates uncertainty from decoder hidden states. We employ CLAP audio-text embeddings and sentence transformer similarities (FENSE) to define semantic correctness, enabling Expected Calibration Error (ECE) computation that reflects true caption quality rather than surface-level text overlap. Experiments on Clotho v2 demonstrate that confidence-guided beam search with semantic evaluation achieves dramatically improved calibration (CLAP-based ECE of 0.071) compared to greedy decoding baselines (ECE of 0.488), while simultaneously improving caption quality across standard metrics. Our results establish that semantic similarity provides a more meaningful foundation for confidence calibration in audio captioning than traditional n-gram metrics.
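
The key change to ECE is only in the correctness labels: a caption counts as correct when its embedding similarity to the reference clears a threshold, rather than when n-grams overlap. A minimal sketch (the 0.5 threshold and the synthetic scores are illustrative):

import numpy as np

def semantic_ece(conf, sim, sim_thresh=0.5, n_bins=10):
    """ECE with semantic correctness: correct iff similarity >= threshold."""
    correct = (sim >= sim_thresh).astype(float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        m = (conf > lo) & (conf <= hi)
        if m.any():
            ece += m.mean() * abs(conf[m].mean() - correct[m].mean())
    return ece

rng = np.random.default_rng(0)
conf = rng.random(1000)                                      # model confidences
sim = np.clip(conf + 0.1 * rng.normal(size=1000), 0, 1)      # e.g., CLAP/FENSE scores
print(semantic_ece(conf, sim))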

Updated: 2025-12-11 00:09:23

Domains: cs.SD,cs.LG

Download: http://arxiv.org/abs/2512.10170v1

An Introduction to Deep Reinforcement and Imitation Learning

Embodied agents, such as robots and virtual characters, must continuously select actions to execute tasks effectively, solving complex sequential decision-making problems. Given the difficulty of designing such controllers manually, learning-based approaches have emerged as promising alternatives, most notably Deep Reinforcement Learning (DRL) and Deep Imitation Learning (DIL). DRL leverages reward signals to optimize behavior, while DIL uses expert demonstrations to guide learning. This document introduces DRL and DIL in the context of embodied agents, adopting a concise, depth-first approach to the literature. It is self-contained, presenting all necessary mathematical and machine learning concepts as they are needed. It is not intended as a survey of the field; rather, it focuses on a small set of foundational algorithms and techniques, prioritizing in-depth understanding over broad coverage. The material ranges from Markov Decision Processes to REINFORCE and Proximal Policy Optimization (PPO) for DRL, and from Behavioral Cloning to Dataset Aggregation (DAgger) and Generative Adversarial Imitation Learning (GAIL) for DIL.
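
As a taste of the material, REINFORCE fits in a few lines: sample actions, score them by reward minus a baseline, and descend the negative log-probability-weighted return. The toy two-armed bandit below is the smallest instance of the policy-gradient recipe the text covers.

import torch

torch.manual_seed(0)
logits = torch.zeros(2, requires_grad=True)     # policy over two actions
opt = torch.optim.Adam([logits], lr=0.1)
true_reward = torch.tensor([0.2, 0.8])          # arm 1 pays off more often

for _ in range(300):
    dist = torch.distributions.Categorical(logits=logits)
    a = dist.sample((64,))                      # a batch of one-step episodes
    r = torch.bernoulli(true_reward[a])         # stochastic rewards
    baseline = r.mean()                         # variance-reduction baseline
    loss = -(dist.log_prob(a) * (r - baseline)).mean()   # REINFORCE objective
    opt.zero_grad(); loss.backward(); opt.step()

print(torch.softmax(logits, dim=0))             # mass shifts to the better arm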

Updated: 2025-12-11 00:05:29

Domains: cs.RO,cs.LG

Download: http://arxiv.org/abs/2512.08052v2

The 2025 Foundation Model Transparency Index

Foundation model developers are among the world's most important companies. As these companies become increasingly consequential, how do their transparency practices evolve? The 2025 Foundation Model Transparency Index is the third edition of an annual effort to characterize and quantify the transparency of foundation model developers. The 2025 FMTI introduces new indicators related to data acquisition, usage data, and monitoring and evaluates companies like Alibaba, DeepSeek, and xAI for the first time. The 2024 FMTI reported that transparency was improving, but the 2025 FMTI finds this progress has deteriorated: the average score out of 100 fell from 58 in 2024 to 40 in 2025. Companies are most opaque about their training data and training compute as well as the post-deployment usage and impact of their flagship models. In spite of this general trend, IBM stands out as a positive outlier, scoring 95, in contrast to the lowest scorers, xAI and Midjourney, at just 14. The five members of the Frontier Model Forum we score end up in the middle of the Index: we posit that these companies avoid reputational harms from low scores but lack incentives to be transparency leaders. As policymakers around the world increasingly mandate certain types of transparency, this work reveals the current state of transparency for foundation model developers, how it may change given newly enacted policy, and where more aggressive policy interventions are necessary to address critical information deficits.

Updated: 2025-12-11 00:01:53

Domains: cs.AI,cs.CY,cs.LG

Download: http://arxiv.org/abs/2512.10169v1

By Xinhai (Sean) Zou.