Training Data for Computer Use Agents

High-fidelity human trajectories with full key, click, touch, and pixel traces for behavior cloning (BC), reward modeling, and RL fine-tuning.

Explore Datasets

Mouse Click | 9.90 s | at (61, 206)
Mouse Click | 18.56 s | at (1063, 548)
Mouse Click | 27.00 s | at (859, 63)
Mouse Click | 47.30 s | at (1394, 870)
Combo Key Press | 51.45 s | CommandLeft + "H"
Mouse Click | 58.51 s | at (1300, 666)
Mouse Click | 61.00 s | at (1153, 418)
Mouse Click | 64.58 s | at (950, 445)
Mouse Click | 74.41 s | at (264, 313)
Mouse Click | 77.31 s | at (847, 352)
Mouse Click | 90.25 s | at (1322, 877)
Mouse Click | 98.28 s | at (1126, 298)
Mouse Click | 103.22 s | at (1348, 151)
Mouse Move | 107.80 s | Start: (1279, 67) → End: (1182, 248)
Mouse Click | 115.84 s | at (1058, 598)

Used by teams developing computer use agents

ADEPT

Perplexity

rabbit

Twin

6M+ Actions Across 150+ Tools

Use off-the-shelf datasets or request custom data by task, domain, and environment.

About This Benchmark

This benchmark includes 2,500 questions created by subject-matter experts across multiple domains, such as mathematics, humanities, and natural sciences.

Each question has a clear and verifiable solution, but requires advanced web retrieval and reasoning.

Methodology

Evaluation:
Results come from runs in which each official search MCP server was provided as an MCP tool to OpenAI's GPT-5 model through the Responses API. In every run, the MCP tools were restricted to the appropriate web search tool. Answers were graded by an LLM judge (GPT-4.1).
Cost Calculation:
Cost is the average cost per query across all questions run. It includes both the search API call and the LLM token cost.
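
As a rough illustration of the cost definition above, the sketch below averages a per-query total made of a search API cost and an LLM token cost. The `QueryCost` class, the `average_cost_per_query` function, and all dollar amounts are hypothetical and chosen only to show the arithmetic; they are not figures from the benchmark.

```python
from dataclasses import dataclass


@dataclass
class QueryCost:
    """Cost components for one benchmark question (hypothetical structure)."""
    search_api_usd: float  # cost of the search API call(s) for this question
    llm_tokens_usd: float  # cost of the LLM tokens used for this question

    @property
    def total_usd(self) -> float:
        # Per-query cost is the sum of both components.
        return self.search_api_usd + self.llm_tokens_usd


def average_cost_per_query(costs: list[QueryCost]) -> float:
    """Mean total cost across all questions that were run."""
    if not costs:
        raise ValueError("need at least one query to average")
    return sum(c.total_usd for c in costs) / len(costs)


# Made-up numbers, for illustration only.
runs = [
    QueryCost(search_api_usd=0.005, llm_tokens_usd=0.020),
    QueryCost(search_api_usd=0.005, llm_tokens_usd=0.035),
    QueryCost(search_api_usd=0.010, llm_tokens_usd=0.030),
]
print(f"average cost per query: ${average_cost_per_query(runs):.4f}")
```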

Dataset Overview

Demonstrations: 160.9K
Actions: 6.04 million
Length: 1,593 hours
Tools: 128
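
For a sense of scale, the headline figures above can be turned into per-demonstration averages. The short sketch below does only that arithmetic. It assumes the published numbers are rounded totals, so the results are approximate, and the variable names are illustrative.

```python
# Headline figures from the Dataset Overview (rounded as published).
demonstrations = 160_900   # 160.9K demonstrations
actions = 6_040_000        # 6.04 million actions
hours = 1593               # total recorded length in hours

total_seconds = hours * 3600

# Simple averages; real demonstrations will vary widely around these.
actions_per_demo = actions / demonstrations
seconds_per_demo = total_seconds / demonstrations

print(f"about {actions_per_demo:.1f} actions per demonstration")
print(f"about {seconds_per_demo:.1f} seconds per demonstration")
```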

Domain Distribution

Human Actions

Every click, scroll, and keypress is logged at millisecond granularity.

Tap | Mobile | Tap on Login button
Double Tap | Mobile | Double tap to zoom image
Long Press | Mobile | Long press on message to open options
Scroll | Mobile | Scroll down product list
Pinch Zoom | Mobile | Pinch zoom in on image
Drag | Mobile | Drag card to reorder
Text Entry | Mobile | Typed "features" in search field
Orientation Change | Mobile | Rotated screen to landscape
Back Navigation | Mobile | Navigated back to previous screen
Mouse Movement | Desktop | Mouse moved across navigation bar
Mouse Clicks | Desktop | Left click on Pricing tab
Mouse Scroll | Desktop | Scroll down landing page
Drag and Drop | Desktop | Drag file into upload area
Text Input | Desktop | Typed "pricing" in search bar
Keypress | Desktop | Pressed Ctrl+S to save

Verifiable Computer Use Trajectories

Verifiable Data Quality

Every demo includes screen recordings and screenshots, enabling full traceability.

Action-Level Reasoning Data

Demonstrations include step-level actions paired with human decision context.

Built to Scale

We can create 10,000 hours of human demonstrations per week.

What Files You Get

Each demonstration includes a complete set of synchronized files for training and evaluation.

Event Logs (.txt)
Screen Recording (.mp4)
Structured Trajectory (.json)
Frames (.jpg)

Raw Data View

Displaying raw mouse and keyboard event data with timestamps and coordinates.
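
To make the idea of timestamped events with coordinates concrete, here is a minimal sketch that loads a few events and computes the time gaps between them. The JSON layout and the field names (`kind`, `t`, `x`, `y`, `keys`) are assumptions made for this example, not the actual schema of the delivered `.json` or `.txt` files. The three sample events reuse values from the trace shown near the top of the page.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    """One logged action. Field names are hypothetical, not the real schema."""
    kind: str                   # e.g. "mouse_click", "combo_key_press"
    t: float                    # seconds since the start of the recording
    x: Optional[int] = None     # screen x coordinate, if the event has one
    y: Optional[int] = None     # screen y coordinate, if the event has one
    keys: Optional[str] = None  # key combination, for keyboard events


# Three events taken from the sample trace, in an assumed JSON layout.
SAMPLE = """[
  {"kind": "mouse_click", "t": 47.30, "x": 1394, "y": 870},
  {"kind": "combo_key_press", "t": 51.45, "keys": "CommandLeft+H"},
  {"kind": "mouse_click", "t": 58.51, "x": 1300, "y": 666}
]"""


def load_events(raw: str) -> list[Event]:
    """Parse a JSON array of events and sort them by timestamp."""
    events = [Event(**item) for item in json.loads(raw)]
    return sorted(events, key=lambda e: e.t)


def gaps(events: list[Event]) -> list[float]:
    """Seconds between consecutive events, rounded to 10 ms."""
    return [round(b.t - a.t, 2) for a, b in zip(events, events[1:])]


events = load_events(SAMPLE)
print(gaps(events))
```

Inter-event gaps like these are one simple way to look at pacing, for example to spot long pauses where a person may be reading or deciding before acting.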

Pricing

Per-task pricing with clear visibility. Custom data priced by scope.

Licensing

Purchased datasets are owned by you and licensed for model training.

Privacy

PII is excluded by default. When present, explicit user consent is obtained.

Common Questions

Have more questions? Contact us

Ready to train better AI agents?

Explore Our Data Library

Common Questions

Is this data collected in real tools or simulated environments?

How fine-grained is the action and timing information?

Does the data include mistakes and recovery behavior?

How do you ensure consistency and quality at scale?
