LLM-in-Sandbox Elicits General Agentic Intelligence

Daixuan Cheng^αβ, Shaohan Huang^β, Yuxian Gu^γ, Huatong Song^α, Guoxin Chen^α

Li Dong^β, Wayne Xin Zhao^α, Ji-Rong Wen^α, Furu Wei^β

^αGSAI, Renmin University of China ^βMicrosoft Research ^γTsinghua University

We introduce LLM-in-Sandbox, which enables LLMs to explore within a code sandbox (i.e., a virtual computer) in order to elicit general intelligence in non-code domains. We first demonstrate that strong LLMs, without additional training, generalize to using the code sandbox for non-code tasks. For example, LLMs spontaneously access external resources to acquire new knowledge, use the file system to handle long contexts, and execute scripts to satisfy formatting requirements. We further show that these agentic capabilities can be enhanced through LLM-in-Sandbox Reinforcement Learning, which trains models for sandbox exploration using only non-agentic data. Experiments demonstrate that LLM-in-Sandbox, in both training-free and post-trained settings, generalizes robustly across mathematics, physics, chemistry, biomedicine, long-context understanding, and instruction following. Finally, we analyze the efficiency of LLM-in-Sandbox from computational and system perspectives and open-source it as a Python package to facilitate real-world deployment.
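The abstract describes a model exploring a code sandbox by issuing actions and reading back their results. As a rough illustration only, the loop below alternates between a policy that proposes shell commands and a sandbox that executes them and returns the output. This is a minimal sketch under stated assumptions, not the LLM-in-Sandbox package's API: `Policy`, `run_command`, `explore`, and `scripted_policy` are hypothetical names, a scripted function stands in for the LLM, and a plain temporary directory stands in for the isolated virtual computer.

```python
import subprocess
import tempfile
from typing import Callable, List, Optional

# Hypothetical policy interface (not the package's API): given the transcript so
# far, return the next shell command, or None once the model is ready to answer.
Policy = Callable[[List[str]], Optional[str]]


def run_command(command: str, workdir: str, timeout: float = 30.0) -> str:
    """Run one shell command in `workdir` and return its combined stdout and stderr.

    A plain temporary directory gives NO isolation; a real sandbox would use a
    container or VM instead (assumption for illustration).
    """
    result = subprocess.run(
        command,
        shell=True,
        cwd=workdir,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout + result.stderr


def explore(task: str, policy: Policy, max_turns: int = 10) -> List[str]:
    """Alternate between policy actions and sandbox observations."""
    transcript = [f"TASK: {task}"]
    with tempfile.TemporaryDirectory() as workdir:
        for _ in range(max_turns):
            command = policy(transcript)
            if command is None:  # the policy decides it has enough information
                break
            transcript.append(f"$ {command}")
            transcript.append(run_command(command, workdir))
    return transcript


# Usage with a scripted stand-in for the LLM (hypothetical).
def scripted_policy(transcript: List[str]) -> Optional[str]:
    steps = ["echo 'hello sandbox' > notes.txt", "cat notes.txt"]
    issued = sum(1 for line in transcript if line.startswith("$ "))
    return steps[issued] if issued < len(steps) else None


transcript = explore("demo: write a file, then read it back", scripted_policy)
print(transcript[-1].strip())  # hello sandbox
```

Keeping the transcript as a flat list of actions and observations mirrors how an agent's context grows turn by turn; a real deployment would replace `scripted_policy` with model calls and the temporary directory with a properly isolated environment.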

Demo Video

Watch LLM-in-Sandbox solve a chemistry problem: converting IUPAC names to SMILES notation

Task: Given a chemical compound's IUPAC name, identify the correct SMILES representation from multiple choices. The agent downloads a PubChem package and uses it to convert the name to SMILES. Gold answer: A.

Example Gallery

🧪

Chemical Analysis

Convert IUPAC chemical names to SMILES notation

Input: None | Output: Answer

📐

Math Problem

Solve complex geometry problems with code assistance

Input: None | Output: Answer

📝

Instruction Following

Generate text following strict formatting constraints

Input: None | Output: Answer

📚

Long Context QA

Analyze multiple documents to answer complex questions

Input: Documents | Output: Answer

🗺️

Travel Planning

Create a 3-day Tokyo trip itinerary with an interactive map

Input: None | Output: HTML Map

🎨

Poster Design

Design a promotional poster for a tech conference

Input: Event JSON | Output: SVG + PNG

🎬

Video Creation

Create a birthday countdown video with animations

Input: Theme JSON | Output: MP4 Video

🎵

Music Composition

Compose original ambient piano music with MIDI

Input: None | Output: MIDI + Audio
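The abstract notes that models use the file system to handle long contexts, and the Long Context QA example takes documents as input. One plausible way to picture this, sketched here under assumptions: the documents are written to files and searched by keyword, so only matching lines need to be read back instead of the full text. The helpers `dump_documents` and `search_documents`, the file layout, and the sample documents are hypothetical illustrations, not taken from the paper or its package.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple


def dump_documents(docs: Dict[str, str], root: Path) -> None:
    """Write each document to its own file under `root` (hypothetical layout)."""
    for name, text in docs.items():
        (root / name).write_text(text, encoding="utf-8")


def search_documents(root: Path, keyword: str) -> List[Tuple[str, int, str]]:
    """Return (file name, 1-based line number, line) for each case-insensitive match,
    similar in spirit to running `grep -in keyword *.txt` inside the sandbox."""
    hits = []
    for path in sorted(root.glob("*.txt")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if keyword.lower() in line.lower():
                hits.append((path.name, lineno, line.strip()))
    return hits


# Usage with two toy documents (hypothetical data).
docs = {
    "report_a.txt": "Introduction\nThe survey was conducted in Kyoto.\nMethods follow.\n",
    "report_b.txt": "Summary\nNo location is mentioned here.\n",
}
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    dump_documents(docs, root)
    hits = search_documents(root, "kyoto")

print(hits)  # [('report_a.txt', 2, 'The survey was conducted in Kyoto.')]
```

Returning file names and line numbers lets a follow-up step read just the surrounding region of a long document rather than loading every document into the prompt.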

Citation

If you find our work helpful, please cite us:

@article{cheng2026llm,
  title={{LLM-in-Sandbox} Elicits General Agentic Intelligence},
  author={Cheng, Daixuan and Huang, Shaohan and Gu, Yuxian and Song, Huatong and Chen, Guoxin and Dong, Li and Zhao, Wayne Xin and Wen, Ji-Rong and Wei, Furu},
  journal={arXiv preprint arXiv:2601.16206},
  year={2026}
}
