👍 73
06/04 00:00
Code language models need repository-level context to resolve imports, APIs, and project conventions. Existing methods inject this knowledge as long inputs (retrieved through RAG or dependency analysis) or through per-repository fine-tuning and LoRA -- costly at repository scale and brittle to evolv
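Summary: To address the high cost and brittleness of adapting code language models to evolving software, Code2LoRA uses a hypernetwork to generate a LoRA adapter for each repository. This avoids full fine-tuning and long retrieved inputs and enables low-cost, evolvable repository-level adaptation.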
👍 38
06/03 00:00
Agents are widely deployed as assistants over documents, tools, and code. However, they typically act only on explicit user requests, which surface only the problems the user has noticed, while many other important problems coexist, hidden in plain sight, within the broader user context, with their
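Summary: Existing agents respond only to explicit user requests and overlook implicit problems. TIDE proactively discovers multiple problems through template-guided iterative exploration, improving problem coverage across document, tool, and code scenarios.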
👍 38
06/04 00:00
Planning for real-world problems by language models often involves both world and user constraints, which may not be fully specified upfront and are progressively disclosed through interaction. However, existing benchmarks still underexplore adaptive planning under such progressively revealed dual c
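Summary: Real-world planning must satisfy both world constraints and user constraints, which are revealed progressively. AdaPlanBench evaluates the adaptive planning ability of LLM agents under progressively disclosed dual constraints, filling a gap in existing benchmarks.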
👍 31
06/05 00:00
We introduce MMAE, a Massive Multitask Audio Editing benchmark, serving as the first comprehensive evaluation testbed designed for general-purpose instruction-based audio editing. Spurred by the shift toward intelligent creation, interactive editing has rapidly expanded from visual domains, pioneere
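Summary: MMAE is the first large-scale multitask benchmark for general-purpose instruction-based audio editing. It covers a range of editing operations and audio types and provides a systematic testbed for evaluating interactive audio editing.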
👍 31
06/05 00:00
Large language models exhibit impressive zero-shot capabilities across a wide range of downstream tasks. However, they struggle to function as off-the-shelf embedding models, leading to suboptimal performance on massive text embedding benchmarks. In this paper, we identify a potential cause underlyi
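Summary: The paper finds that an LLM's unembedding matrix can be viewed as a feature lens for text embeddings. A simple transformation turns the LLM into an effective embedding model, substantially improving zero-shot text embedding performance.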
👍 25
06/02 00:00
While household robots are often evaluated based on task completion, everyday domestic environments involve value-conflicting situations in which robots are expected to choose actions that prioritize values other than task success, such as human autonomy, efficiency, or social appropriateness. Yet,
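Summary: Household robots often face value-conflicting situations (for example, efficiency vs. autonomy). RobotValues evaluates how robots balance human values beyond task completion and provides behavioral guidelines for conflict situations.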
👍 20
06/05 00:00
Despite being a pivotal frontier, interactive world modeling remains underexplored in terms of the versatile controllability required by practical scenarios. To bridge this gap, we present AnchorWorld, a framework that advances egocentric simulation through enhanced interaction integrity and a flexi
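Summary: AnchorWorld is an embodied egocentric world-simulation framework built on customizable view evolution. It strengthens interaction integrity and flexibility and lets users customize how the simulation evolves.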
👍 16
06/04 00:00
Video generation models have made impressive strides in synthesizing visually compelling content, yet their outputs remain confined to the virtual domain. A natural question follows: how well do these models reflect the physical world when their generated videos leave the screen and enter reality? W
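Summary: The work tests the physical feasibility of video generation model outputs by converting generated videos directly into robot-executable actions, assessing how well the models understand the physical world.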
👍 14
06/04 00:00
Persistent AI assistants, such as OpenClaw, accumulate large collections of related memories over long-term interactions. As these memories grow, they may reinforce one another, diverge across contexts, or directly conflict, making correct assistance depend on memory relations rather than isolated r
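Summary: Persistent AI assistants must manage large collections of related memories. The SubtleMemory benchmark tests agents' ability to discriminate fine-grained relations among memories (reinforcement, divergence, conflict), going beyond isolated retrieval.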
👍 13
06/04 00:00
Existing benchmarks evaluate Tool-Integrated Reasoning (TIR) in LLMs on idealized "happy paths", largely overlooking real-world tool failures. We introduce ToolMaze, a benchmark for dynamic path discovery and error recovery in TIR agents. To separate systematic replanning from blind trial-and-erro
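Summary: Existing tool-integrated reasoning benchmarks consider only idealized paths. ToolMaze adds evaluation of dynamic path discovery and error recovery, using maze tasks to separate systematic replanning from blind trial and error.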
👍 12
06/05 00:00
Video understanding is being rapidly transformed by multimodal large language models (MLLMs), as research moves from short clips to long, multimodal, and knowledge-intensive video scenarios. These scenarios require models to handle sparse evidence, long-range dependencies, multimodal alignment, and
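Summary: Video understanding with multimodal large language models (MLLMs) is expanding from short clips to long, multimodal, knowledge-intensive video scenarios. The paper proposes an observe-memorize-reason framework for handling sparse evidence and long-range dependencies.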
👍 10
06/04 00:00
Vision-Language-Action (VLA) models leverage the rich world knowledge of pretrained vision-language models (VLMs) to enable instruction-following robotic manipulation. However, the structural mismatch between VLM semantic spaces and embodied control policies often hinders the learning of precise per
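Summary: VLA models draw on VLM world knowledge for instruction-following manipulation, but the semantic space and the control policy are structurally mismatched. AffordanceVLA generates precise actions through affordance-aware understanding, bridging the semantic-action gap.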
👍 9
06/04 00:00
While Vision-Language Models (VLMs) have shown strong visual reasoning capabilities, their spatial reasoning abilities remain largely constrained to the observed images and text-oriented chain-of-thought. They often struggle to infer unobserved layouts, maintain cross-view consistency, and reason fr
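Summary: VLM spatial reasoning is limited to observed images and textual chain-of-thought. The paper proposes "thinking with imagination", which couples the model with a world simulator for concrete spatial reasoning, enabling inference of unobserved layouts and cross-view consistent reasoning.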
👍 9
06/02 00:00
Selection is a core operation in interactive image editing. To be practical, a user should be able to specify and disambiguate the desired selection region through either text or click-based interactions, and the system should support selecting not only objects but also other criteria, such as mater
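Summary: Interactive image editing needs unified object and material selection. MAOAM uses a VLM to support both text and click input, enabling flexible selection of objects and materials (such as metal or fabric).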
👍 8
06/05 00:00
We present dots.tts, a 2B-parameter continuous autoregressive text-to-speech (TTS) foundation model that models speech in a continuous latent space. Compared with existing continuous autoregressive models, our key innovations are threefold. First, we train an AudioVAE with multiple objectives to bui
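Summary: dots.tts is a 2B-parameter continuous autoregressive text-to-speech foundation model that models speech in a continuous latent space. Its innovations include a multi-objective AudioVAE and a composite training strategy that improve generation quality.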
👍 8
06/04 00:00
Large language models can reproduce training data, but existing memorization evaluations mostly measure whether models can be forced to do so, rather than whether they do so under ordinary use. We introduce PropMe, a propensity-aware framework for memorization evaluation that contrasts prefix-based
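Summary: Existing memorization evaluations mostly measure data leakage under forced prompting rather than under ordinary use. PropMe proposes a propensity-aware framework that contrasts prefix-based and prefix-free generation to evaluate a model's real propensity to leak.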
👍 7
06/04 00:00
Causal graphs provide a high-level language for making mechanisms transparent. Recent work uses Large Language Models (LLMs) to recover causal graphs of external-world processes. Instead, in this paper, we use causal graphs to model LLM inference itself, providing stakeholders with a transparent vie
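Summary: The paper uses causal graphs to model the LLM inference process itself, giving stakeholders a transparent explanation. It traces reasoning paths through counterfactual chains to reveal causal relationships in model decisions.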
👍 7
06/04 00:00
Temporal Grounding (TG) aims to localize video segments corresponding to a textual query. Prior research predominantly focuses on single-segment retrieval. Real-world scenarios, however, often require localizing multiple disjoint segments for a single query -- a setting we term One-to-Many Temporal
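Summary: Temporal grounding usually retrieves only a single segment. The paper proposes One-to-Many Temporal Grounding, which localizes multiple disjoint segments for a single query, and builds a corresponding benchmark and method.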
👍 5
06/04 00:00
Self-evolving agents require adaptation after deployment, but existing approaches assume a usable learning loop, such as curated skills, successful trajectories, or verifier signals. Real open-world deployments may provide none of these, offering only a task prompt. In this work, we study open-worl
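Summary: Self-evolving agents must adapt after open-world deployment, but curated skills, trajectories, or verifier signals may be unavailable. OpenSkill achieves open-world self-evolution from the task prompt alone, without external supervision.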
👍 5
06/01 00:00
Video is temporally redundant: adjacent frames usually share most objects, background, and layout. Yet existing video multimodal large language models (video MLLMs) usually encode each sampled frame as an independent RGB image, causing visual tokens to repeat content already present in earlier frame
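Summary: Video MLLMs encode each frame independently, producing many redundant visual tokens. AdaCodec proposes predictive visual compression coding that exploits inter-frame redundancy to reduce the token count and improve video processing efficiency.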