👍 29
06/03 00:00
Agents are widely deployed as assistants over documents, tools, and code. However, they typically act only on explicit user requests, which surface only the problems the user has noticed, while many other important problems coexist, hidden in plain sight, within the broader user context, with their
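Summary: Proposes the TIDE framework, which uses a template-guided iterative process to proactively discover multiple hidden problems in the user's context rather than passively responding to explicit requests, improving agents' ability to surface problems comprehensively.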
👍 27
06/04 00:00
Role-playing language agents (RPLAs) should play characters whose values and behavior evolve as the story progresses, not maintain a fixed persona. Existing benchmarks measure factual recall at a given chapter, not whether responses align with the character's psychological trajectory, especially in
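Summary: Evaluates whether role-playing language agents (RPLAs) stay consistent with a character's psychological trajectory as the story progresses. Proposes ArcANE, which tests whether an agent tracks the character's evolution at the appropriate point in the story rather than merely recalling facts.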
👍 25
06/03 00:00
We introduce VideoKR, the first large-scale training corpus specifically designed to strengthen knowledge- and reasoning-intensive video understanding. It comprises 315K video reasoning examples over 145K newly collected, CC-licensed, expert-domain videos. We develop a human-in-the-loop, skill-orien
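Summary: Builds VideoKR, the first large-scale training corpus for knowledge- and reasoning-intensive video understanding, with 315K reasoning examples and 145K expert-domain videos, annotated through a human-in-the-loop, skill-oriented process to improve deep video understanding.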
👍 22
06/04 00:00
Planning for real-world problems by language models often involves both world and user constraints, which may not be fully specified upfront and are progressively disclosed through interaction. However, existing benchmarks still underexplore adaptive planning under such progressively revealed dual c
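Summary: Proposes AdaPlanBench, which evaluates the adaptive planning ability of LLM agents when world constraints and user constraints are progressively disclosed, filling a gap in existing benchmarks for dynamic dual-constraint scenarios.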
👍 19
06/04 00:00
Prior work has shown that large language models (LLMs) can translate unseen or low-resource languages by undergoing continued training or even by encoding a grammar book in their context. However, both methods typically overfit specific languages, with limited zero-shot transfer at test time. To tra
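Summary: Proposes a reinforcement learning method that enables LLMs to translate unseen languages through in-context learning, avoiding overfitting to specific languages and improving zero-shot transfer.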
👍 19
06/03 00:00
Feed-forward 3D Gaussian Splatting methods reconstruct a scene from posed or pose-free images in a single forward pass, yet current approaches predict one Gaussian per input pixel, tying the representation budget to camera resolution rather than scene complexity. A flat wall and a richly textured ob
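Summary: Proposes ZipSplat, which reduces the number of redundant Gaussians in 3D Gaussian Splatting so that the representation budget matches scene complexity, achieving better reconstruction quality with fewer Gaussians.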
👍 18
06/02 00:00
While household robots are often evaluated based on task completion, everyday domestic environments involve value-conflicting situations in which robots are expected to choose actions that prioritize other values than task success, such as human autonomy, efficiency, or social appropriateness. Yet,
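Summary: Proposes the RobotValues benchmark, which evaluates household robots' decisions in value-conflicting situations (e.g., human autonomy, efficiency, social appropriateness), going beyond task-completion metrics alone.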
👍 15
06/03 00:00
Experience internalization converts contextual experience from past interactions into reusable parametric capability, offering a promising path toward continual learning in large language models (LLMs). While prior work has predominantly focused on single-iteration transfer, we discover that under m
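Summary: Proposes a multi-iteration experience internalization method in which LLM agents evolve continually by repeatedly internalizing contextual experience, significantly improving the accumulation of parametric capability compared with single-iteration internalization.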
👍 15
06/04 00:00
Developing unified video generation and editing models capable of interpreting interleaved multimodal inputs is a promising yet challenging frontier field. Existing unified frameworks predominantly rely on massive models (typically 13B parameters or more) and incorporate source video conditions for
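Summary: Proposes LoomVideo, a lightweight unified framework for video generation and editing from multimodal inputs that does not rely on models of 13B parameters or more, supports interleaved multimodal conditions, and improves efficiency.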
👍 14
06/03 00:00
We study the personal camera roll visual question answering setting. In this setting, a conversational AI assistant can access a user's personal camera roll and retrieve relevant photos to answer queries, ranging from simple factual questions (e.g., "Name of the food I tried yesterday?") to more o
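Summary: Targets visual question answering over a personal camera roll: an AI assistant accesses the user's camera roll and retrieves relevant photos to answer diverse queries, ranging from factual questions to complex reasoning.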
👍 13
06/02 15:28
High-quality pretraining data is a central ingredient in modern language models, but German-language resources remain far less developed than their English counterparts: they are often smaller, less carefully curated, weakly documented, and rarely validated through controlled training experiments. W
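Summary: Builds KletterMix, a high-quality German pretraining dataset that addresses the small scale, weak curation, and limited validation of German-language resources, and verifies through controlled training experiments that it improves model performance.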
👍 9
06/04 00:00
Inference-time skill augmentation provides a lightweight way to improve data-analytic agents by injecting reusable procedural knowledge without updating model parameters. However, discovering effective skills for data analysis remains challenging, as reliable supervision is expensive and success cri
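Summary: Proposes an unsupervised skill discovery method that automatically extracts reusable procedural knowledge from inference experience to strengthen data-analysis agents, with no human annotation or model parameter updates.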
👍 6
06/04 00:00
Large language models can reproduce training data, but existing memorization evaluations mostly measure whether models can be forced to do so, rather than whether they do so under ordinary use. We introduce PropMe, a propensity-aware framework for memorization evaluation that contrasts prefix-based
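Summary: Proposes PropMe, a propensity-aware framework that evaluates how much LLMs unintentionally leak training data under ordinary use rather than under forced extraction, which better reflects real deployment risk.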
👍 4
06/02 00:00
Inference-time scaling has emerged as a critical avenue for enhancing Large Language Models' performance, yet real-world deployment is constrained by strict computational budgets. In this work, we formulate inference budget allocation as a global constrained optimization problem governed by economic
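Summary: Formulates inference budget allocation as a global constrained optimization problem from an economic perspective and proposes an optimal allocation strategy that maximizes LLM performance under strict computational budgets.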
👍 4
06/03 00:00
Deontic reasoning is the task of answering questions by applying explicit rules and policies to case-specific facts, for example computing tax liability under a statute or determining the outcome of an immigration appeal. A key technical challenge for LLM-based deontic reasoning is that the relevant
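Summary: Proposes DAR (Deontic Reasoning with Agentic Harnesses), which uses agentic harnesses to strengthen LLMs' ability to apply rules in deontic reasoning for settings such as taxation and immigration.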
👍 3
05/28 00:00
Memory-augmented LLM agents tackle complex long-horizon tasks by recursively summarizing interaction trajectories into compact memory. However, existing approaches typically train these memory policies using outcome-based reinforcement learning, failing to localize where intermediate memory quality
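Summary: Proposes metacognitive memory policy optimization, which uses fine-grained intermediate rewards to localize memory quality and improve the recursive memory policies of LLM agents on long-horizon tasks, outperforming purely outcome-based reinforcement learning.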
👍 3
06/04 00:00
We propose world-language-action (WLA) models as a new class of embodied foundation models. WLA takes textual instructions, images, and robot states as inputs to jointly predict textual subtasks, subgoal images, and robot actions, conjoining the world modeling interface to learn from extensive egoce
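Summary: Proposes world-language-action (WLA) models, which unify world modeling, language reasoning, and action synthesis, jointly predicting textual subtasks, subgoal images, and robot actions while learning from egocentric video.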
👍 3
06/03 00:00
Large language models (LLMs) are increasingly proposed as clinical agents, yet static, single-turn benchmarks cannot capture how a model dynamically delivers care across an encounter: gathering information, planning treatment, and adapting longitudinal management across successive patient states. Me
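Summary: Proposes dynamically evaluating LLMs on clinical decision-making with standardized patient cases, covering information gathering, treatment planning, and longitudinal management, going beyond static benchmarks.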
👍 3
06/02 00:00
Large language model (LLM) agents are evolving from request-response assistants into long-running software actors: they maintain state across model calls, fork subtasks, wait for external events, request human authority, generate tools, and perform side effects that must be resumed and audited. This
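Summary: Proposes the Agent libOS runtime, inspired by library operating systems, which supports long-running LLM agents with state maintenance, subtask forking, event waiting, and permission control, improving reliability and auditability.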
👍 2
06/02 00:00
Recent progress in Large Language Model (LLM) agents has enabled promising advances in automated data science. However, existing approaches remain fundamentally limited by their static action sets and lack of principled long-horizon context management, hindering their ability to accumulate reusable
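Summary: Proposes EvoDS, a self-evolving data science agent that combines skill learning with long-horizon context management to dynamically accumulate reusable actions, overcoming the limits of static action sets and improving automated data science.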