Benjamin Schneider
Graduate Student at UWaterloo, affiliated with the Vector Institute, Benjamin.Schneider@uwaterloo.ca
Hi! I’m Ben, a CS master’s student at the University of Waterloo, advised by Wenhu Chen (TIGER-Lab) and Florian Kerschbaum.
Currently, I’m working on methods for training open-ended embodied agents. The analogy I always draw on is that if you sit a kid in front of a computer and have them play an open-world game like Minecraft, they quickly become generalist experts (able to accomplish arbitrary tasks) without needing specific objectives. I am interested in machine learning algorithms that can emulate that process of learning. I am usually working on a combination of the following problems:
- Embodied learning in open-ended environments without explicit objectives.
- Continual/Lifelong learning for embodied agents.
- Unified methods for representation learning across modalities.
I’m pretty terrible about keeping my website updated. 😅
So, for an up-to-date list of publications, please check my Google Scholar; my code/projects are hosted on GitHub:
Toolbox-HQ (Embodied Agents work) and TIGER-Lab (Multimodal Learning projects).
Fun fact about me: I try to sneak an image of my cat (pictured right) into my papers.
News
| Sep 03, 2025 | I have been battling to teach an agent to play Pokemon Emerald for a few months! Check out our work (in progress) here. |
|---|---|
| May 15, 2025 | First public release of QuickVideo, our library for efficient (long) VideoLLM inference. QuickVideo is an ongoing project focused on improving systems and models for VideoLLMs; please provide feedback if there are features you want implemented! |
| Mar 04, 2025 | We release ABC, a model for fine-grained multimodal retrieval. |
Publications
- QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design (2025)
- StructEval: Benchmarking LLMs’ Capabilities to Generate Structural Outputs (2025)
- ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations (2025)
- ABC: Achieving Better Control of Multimodal Embeddings using VLMs (2025)