Training Data for Computer Use Agents

High-fidelity human trajectories with full key, click, touch, and pixel traces for BC, reward modeling, and RL fine-tuning.

Sample trajectory (15 frames):

Frame  Action           Time       Detail
   1   Mouse Click        9.90 s   at (61, 206)
   2   Mouse Click       18.56 s   at (1063, 548)
   3   Mouse Click       27.00 s   at (859, 63)
   4   Mouse Click       47.30 s   at (1394, 870)
   5   Combo Key Press   51.45 s   CommandLeft + "H"
   6   Mouse Click       58.51 s   at (1300, 666)
   7   Mouse Click       61.00 s   at (1153, 418)
   8   Mouse Click       64.58 s   at (950, 445)
   9   Mouse Click       74.41 s   at (264, 313)
  10   Mouse Click       77.31 s   at (847, 352)
  11   Mouse Click       90.25 s   at (1322, 877)
  12   Mouse Click       98.28 s   at (1126, 298)
  13   Mouse Click      103.22 s   at (1348, 151)
  14   Mouse Move       107.80 s   (1279, 67) → (1182, 248)
  15   Mouse Click      115.84 s   at (1058, 598)
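A trajectory like the one above can be flattened into (observation, action) pairs for behavior cloning. A minimal sketch, assuming a hypothetical `Event` record (field names are illustrative, not the product's actual schema):

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical event record mirroring the sample trajectory above;
# field names are illustrative, not the product's actual schema.
@dataclass
class Event:
    frame: int                             # 1-based frame index
    action: str                            # e.g. "mouse_click", "combo_key_press"
    t: float                               # seconds since session start
    pos: Optional[Tuple[int, int]] = None  # click coordinates, if any
    keys: Optional[str] = None             # key combo, if any

def to_bc_pairs(events, frame_paths):
    """Pair each action with the screenshot captured at that step,
    yielding (observation, action) tuples for behavior cloning."""
    pairs = []
    for ev in events:
        obs = frame_paths[ev.frame - 1]
        pairs.append((obs, (ev.action, ev.pos or ev.keys)))
    return pairs

events = [
    Event(1, "mouse_click", 9.90, pos=(61, 206)),
    Event(5, "combo_key_press", 51.45, keys='CommandLeft + "H"'),
]
frames = [f"frame_{i:02d}.jpg" for i in range(1, 16)]
print(to_bc_pairs(events, frames))
```

Click actions keep their coordinates and key combos keep their key string, so a single action tuple covers both event types.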

Used by teams developing computer use agents

ADEPT
Perplexity
rabbit
Twin

6M+ Actions Across 150+ Tools

Use off-the-shelf datasets or request custom data by task, domain, and environment.

About This Benchmark

This benchmark includes 2,500 questions created by subject-matter experts across multiple domains, such as mathematics, the humanities, and the natural sciences.

Each question has a clear and verifiable solution, but requires advanced web retrieval and reasoning.

Methodology

  • Evaluation:

Results are based on tests run using official Search MCP servers, provided as an MCP tool to OpenAI's GPT-5 model via the Responses API. In all cases, the MCP tools were limited to the appropriate web search tool only. Answers were evaluated with an LLM judge (GPT-4.1).

  • Cost Calculation:

    Cost reflects the average cost per query across all questions run, including both the search API call and the LLM token cost.
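The averaging described above can be sketched as follows; the per-unit prices are made-up placeholders, not actual vendor rates:

```python
def avg_cost_per_query(search_calls, tokens_in, tokens_out,
                       search_price=0.005,   # $ per search API call (placeholder)
                       price_in=2.0e-6,      # $ per input token (placeholder)
                       price_out=8.0e-6):    # $ per output token (placeholder)
    """Average cost per query: search API cost plus LLM token cost,
    summed over all queries and divided by the query count."""
    total = (sum(c * search_price for c in search_calls)
             + sum(t * price_in for t in tokens_in)
             + sum(t * price_out for t in tokens_out))
    return total / len(search_calls)

# Two example queries: (search calls, input tokens, output tokens) per query
print(avg_cost_per_query([2, 1], [1000, 2000], [500, 500]))
```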


Dataset Overview

  • Demonstrations: 160.9K
  • Actions: 6.04 million
  • Length: 1,593 hours
  • Tools: 128

Domain Distribution

Human Actions

Every click, scroll, and keypress — logged at millisecond granularity.

  • Tap (Mobile): Tap on Login button
  • Double Tap (Mobile): Double tap to zoom image
  • Long Press (Mobile): Long press on message to open options
  • Scroll (Mobile): Scroll down product list
  • Pinch Zoom (Mobile): Pinch zoom in on image
  • Drag (Mobile): Drag card to reorder
  • Text Entry (Mobile): Typed "features" in search field
  • Orientation Change (Mobile): Rotated screen to landscape
  • Back Navigation (Mobile): Navigated back to previous screen
  • Mouse Movement (Desktop): Mouse moved across navigation bar
  • Mouse Clicks (Desktop): Left click on Pricing tab
  • Mouse Scroll (Desktop): Scroll down landing page
  • Drag and Drop (Desktop): Drag file into upload area
  • Text Input (Desktop): Typed "pricing" in search bar
  • Keypress (Desktop): Pressed Ctrl+S to save

Verifiable Computer Use Trajectories

Verifiable Data Quality

Every demo includes screen recordings and screenshots, enabling full traceability.

Action-Level Reasoning Data

Demonstrations include step-level actions paired with human decision context.

Built to Scale

We can create 10,000 hours of human demonstrations per week.

What Files You Get

Each demonstration includes a complete set of synchronized files for training and evaluation.

  • Event Logs (.txt)
  • Screen Recording (.mp4)
  • Structured Trajectory (.json)
  • Frames (.jpg)
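Because the files are synchronized by timestamp, a structured trajectory can be queried by time window to clip matching spans out of the recording and frames. A minimal sketch, assuming a hypothetical trajectory.json layout (the actual schema may differ):

```python
import json

# Hypothetical trajectory.json layout; the actual schema may differ.
sample = json.loads("""
{
  "demo_id": "demo_0001",
  "steps": [
    {"t": 9.90,  "action": "mouse_click", "x": 61, "y": 206,
     "frame": "frames/0001.jpg"},
    {"t": 51.45, "action": "combo_key_press", "keys": ["CommandLeft", "H"],
     "frame": "frames/0005.jpg"}
  ]
}
""")

def steps_between(traj, t0, t1):
    """Return steps whose timestamps fall in [t0, t1), for clipping a
    window out of the synchronized .mp4/.jpg/.json file set."""
    return [s for s in traj["steps"] if t0 <= s["t"] < t1]

print([s["action"] for s in steps_between(sample, 0, 30)])
```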

Raw Data View

Displaying raw mouse and keyboard event data with timestamps and coordinates.
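Such a raw event stream can be parsed line by line. A minimal sketch, assuming a hypothetical "timestamp kind (x, y)" line format (real event logs may differ):

```python
import re

# Hypothetical .txt event-log line format; real logs may differ.
LINE = r"(?P<ts>\d+\.\d+)\s+(?P<kind>\w+)\s+\((?P<x>\d+),\s*(?P<y>\d+)\)"

def parse_event(line):
    """Parse one 'timestamp kind (x, y)' line into a dict, or None."""
    m = re.match(LINE, line)
    if not m:
        return None
    d = m.groupdict()
    return {"ts": float(d["ts"]), "kind": d["kind"],
            "pos": (int(d["x"]), int(d["y"]))}

print(parse_event("9.90 click (61, 206)"))
```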

Pricing

Per-task pricing with clear visibility. Custom data priced by scope.

Licensing

Purchased datasets are owned by you and licensed for model training.

Privacy

PII is excluded by default. When present, explicit user consent is obtained.

Common Questions

Ready to train better AI agents?