Stay tuned for more updates!
Computer Activity Dataset
A dataset for training a multimodal computer agent that can perform a variety of tasks, from interacting with web pages to playing computer games. We collect screen video, audio, keyboard and touchpad inputs, and even eye-tracking data, to better emulate how a human interacts with a computer.
An OCR model with LaTeX support, comparable in quality to proprietary OCR solutions.
Building higher-quality non-English datasets and tokenizers, and fine-tuning non-English GPTs from a pretrained English GPT. Our goal is to improve the performance of non-English language models and make them accessible to a wider audience.
Exploring efficient video pretraining methods to improve the downstream performance of generalist agents.