Deployment-Time Memorization in Foundation-Model Agents
arXiv:2606.10062v1 Announce Type: new
Abstract: Foundation-model agents are increasingly long-lived systems that remember users across interactions, making memorization an explicit deployment-time function rather than solely a property of model weights. Existing work addresses parametric memorizati...
From Senses to Decisions: The Information Flow of Auditory and Visual Perception in Multimodal LLMs
arXiv:2606.10147v1 Announce Type: new
Abstract: Multimodal Large Language Models (MLLMs) can listen and see, but how do audio and visual signals actually travel through the network to shape an answer? Despite their growing role in research and real-world applications, the internal pathways through ...
Syll: Open-Source Personal Automation with Cross-Surface Execution
arXiv:2606.07594v1 Announce Type: new
Abstract: Personal AI agents must increasingly operate across APIs, shells, web surfaces, and desktop GUIs, yet many systems remain tuned to a single interface and offer limited support for user teaching and auditability. We present Syll, an open-source, self-h...
OmniMem: Perturbation-aware Memory Compression for Streaming Audio-Visual LLMs
arXiv:2606.07577v1 Announce Type: new
Abstract: Audio-visual large language models (LLMs) hold strong promise for long-form video understanding, yet their long-video inference is fundamentally limited by the linear growth of video tokens and key-value (KV) caches. We present OmniMem, a memory-effic...
PathoSage: Towards Multi-Source Evidence Adjudication in Pathology via Experience-Aware Agentic Workflow
arXiv:2606.07549v1 Announce Type: new
Abstract: Recent advances in Multimodal Large Language Models (MLLMs) and agent workflows have shown strong promise for computational pathology, yet reliable patch-level reasoning remains challenging. End-to-end pathology MLLMs often hallucinate morphological f...
Emergence via Phase Transitions: Mechanism Landscapes and Universal Convergence Across Complex Systems
arXiv:2606.07563v1 Announce Type: new
Abstract: Across machine learning, biology, and physics, independently evolving systems often converge toward strikingly similar high-level structures despite radically different microscopic details. Grokking circuits converge across random seeds, evolutionary ...
Boundary Variance Inflation Causes Acquisition Bias in Gaussian Processes
arXiv:2606.07561v1 Announce Type: new
Abstract: Gaussian processes with stationary kernels on bounded domains exhibit inflated posterior variance near the boundary. Despite being a long-recognized artifact in geostatistics and a source of over-exploration in Bayesian optimization, the causes and ef...
SPIN: Decentralized Swarm Control via Tensorized Policy Coordination
arXiv:2606.07557v1 Announce Type: new
Abstract: Decentralized multi-agent swarm coordination on resource-constrained edge platforms remains fundamentally bottlenecked by the exponential scaling of joint action spaces and high-latency communication overhead. This paper introduces the Swarm Policy In...
MedicalRec: Medical recommender system for image classification without retraining
arXiv:2606.07553v1 Announce Type: new
Abstract: The emergence of machine learning and deep learning has revolutionized the efficiency of diagnostic, therapeutic, and administrative systems in healthcare. However, this rapid adoption has come at the cost of requiring significant computing power and ...
FAIR-Calib: Frontier-Aware Instability-Reweighted Calibration for Post-Training Quantization of Diffusion Large Language Models
arXiv:2606.06547v1 Announce Type: new
Abstract: Diffusion Large Language Models (dLLMs) refine tokens iteratively but commit them irreversibly, leading to a "stability lag" where early decisions remain fragile even after being written. We reveal that Post-Training Quantization (PTQ) error easily fl...
MacArena: Benchmarking Computer Use Agents on an Online macOS Environment
arXiv:2606.06560v1 Announce Type: new
Abstract: Computer-use agents (CUAs) operate graphical user interfaces (GUIs) through vision and control primitives, and their capabilities have advanced rapidly, driven in part by standardized online evaluation benchmarks such as OSWorld, which serve both as e...
arXiv:2606.06518v1 Announce Type: new
Abstract: Sudoku is a representative constraint satisfaction problem that requires global structural reasoning under strict discrete constraints. Existing work on solving Sudoku mainly focuses on two dominant approaches, i.e., traditional heuristic and deep ...
SafeGene: Reusable Adapters for Transferable Safety Alignment
arXiv:2606.06519v1 Announce Type: new
Abstract: Open-weight LLMs are increasingly fine-tuned into customized assistants, but downstream fine-tuning can weaken safety alignment and make models more vulnerable to malicious prompts, even when the training data is not intentionally harmful. This create...
Lean4Agent: Formal Modeling and Verification for Agent Workflow and Trajectory
arXiv:2606.06523v1 Announce Type: new
Abstract: Equipping Large Language Models (LLMs) to execute reliable multi-step workflows has become a central challenge in artificial intelligence. Despite recent advances in LLMs' agentic capabilities, most agent systems still lack formal methods for specifyi...
CrowdMath: A Dataset of Crowdsourced Mathematical Research Discussions
arXiv:2606.06526v1 Announce Type: new
Abstract: Large language models have made substantial progress on mathematical reasoning, but existing benchmarks typically evaluate well-specified problems with final answers, step-by-step solutions, or complete proofs. They do not capture collaborative open-p...
Elmes*: Automated Construction of Fine-Grained Evaluation Rubrics for Large Language Models in Long-Tail Educational Scenarios
arXiv:2606.06546v1 Announce Type: new
Abstract: Evaluating large language models (LLMs) for education requires measuring how models teach, not only what they know. Existing benchmarks emphasize domain-general correctness or depend on manually designed rubrics that scale poorly to long-tail pedagogi...
Introducing the Third Generation of Apple’s Foundation Models
Our next generation of Apple Intelligence is centered around our users, integrated deeply into our operating systems, and powered by a bold new architecture with privacy at its core.
At the heart of this architecture is our third generation of Apple Foundation Models (AFM), a family of five foundati...