MRSDs Winning the 2025 Embodied AI Hackathon

Kwangkyun Kim (Bruce), Indraneel Patil

The 2025 Embodied AI Hackathon, hosted by NVIDIA, Hugging Face, and Seeed Studio, brought together over a hundred participants across 15+ teams to tackle home-and-cooking challenges using VLA models on Jetson Thor. Our team of four, including two MRSD alumni, joined to push the boundaries of contact-rich manipulation. The hackathon appealed to us because real-world tasks often go beyond simple pick-and-place: robots need contact-rich manipulation, long-horizon reasoning, and generalization. Our system took second place, but the journey began months earlier.

Going through the MRSD program, we learned that MRSDs make great teammates—and surprisingly great roommates! Earlier this year, we moved in together and, with the goal of working on fun and ambitious projects, turned our garage into a mini robotics lab inspired by the MRSD lab in NSH. That space became the seed of a shared mission: advance low-cost manipulation by building the entire stack—hardware, software, and infrastructure—from scratch. We decided to build our own robot to clean the house instead of buying an off-the-shelf unit. A robot vacuum gave us a finite, achievable goal and served as the perfect project to ramp up our capabilities and prepare the lab (and ourselves) for more advanced future work, including manipulation tasks.

Let us introduce our low-cost kitchen-helper robot arm called Sprinkle! It can pick up spice containers and dispense precise amounts of seasoning onto dishes. For the hackathon we adapted it for a Halloween-themed sugar-sprinkling challenge. Sprinkle had to pick up a sugar container, tilt it evenly over cookies, press a custom pumpkin-shaped button to rotate the plate, sprinkle the other half, and finally place the container upright on the table.

We defined a low-cost robot as one that uses minimal hardware while still delivering comparable task performance. To keep the system lightweight enough to run on a Jetson Orin Nano, we chose Hugging Face’s SmolVLA, a 450M-parameter model. We avoided adding a mobile base or designing a new gripper, instead making minimal modifications such as attaching a strip of Velcro to the robot’s existing gripper, and tasked the robot with pushing a button to rotate a dish, a fixture common in many Asian restaurants. This low-cost approach introduced two main challenges: SmolVLA’s small size is not ideal for long-horizon behaviors, and pushing a button while holding a container demands precise contact-rich manipulation.

To address these challenges, we structured the task so a small model could execute it reliably. The first hurdle was long-horizon behavior: we broke the overall action into four clear steps—Grab, Tilt, Press, and Place—and trained Sprinkle so that transitions between stages were seamless. The second challenge was contact-rich manipulation: the button controlled a rotating plate that only moved after the press was released, creating ambiguity for the model since there was no immediate visual cue. To solve this, we added a proximity-triggered red LED “eye glow” to the pumpkin-shaped button, giving the policy a clear signal that the press had succeeded and allowing Sprinkle to complete the sequence reliably on both Jetson Thor and Orin Nano.
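The staging logic above can be pictured as a small state machine: the policy works on one subtask at a time, and the Press stage only completes once the LED cue confirms a successful press. The sketch below is illustrative, not our actual code; all names (`SubtaskSequencer`, `led_glowing`, and so on) are hypothetical.

```python
from enum import Enum, auto

class Stage(Enum):
    GRAB = auto()
    TILT = auto()
    PRESS = auto()
    PLACE = auto()
    DONE = auto()

class SubtaskSequencer:
    """Advances through Grab -> Tilt -> Press -> Place.

    The Press stage only finishes once the button's LED "eye glow"
    is observed, mirroring the cue we added so the policy has a
    visible signal that the press succeeded (the plate itself only
    rotates after release, too late to serve as feedback).
    """

    ORDER = [Stage.GRAB, Stage.TILT, Stage.PRESS, Stage.PLACE, Stage.DONE]

    def __init__(self):
        self.stage = Stage.GRAB

    def step(self, stage_finished: bool, led_glowing: bool = False) -> Stage:
        if self.stage is Stage.PRESS:
            # Gate the transition on the LED cue, not just motion completion.
            if stage_finished and led_glowing:
                self.stage = Stage.PLACE
        elif stage_finished and self.stage is not Stage.DONE:
            self.stage = self.ORDER[self.ORDER.index(self.stage) + 1]
        return self.stage
```

In practice the "stage finished" signal would come from the policy or a perception check; the point of the sketch is that gating one transition on an unambiguous cue keeps a small model from getting lost mid-sequence.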

We also refined the setup to reduce noise and improve robustness—diversifying container poses, removing visual distractions, stabilizing lighting, and leveraging motion cues from the rotating plate. A playful yet effective tweak reversed the typical grasping order: keeping the gripper open until alignment naturally occurred, simplifying the motion and improving consistency. Judges highlighted our subtask decomposition as highly effective and praised the glowing-eye cue as a creative, practical solution. By the end, Sprinkle executed the full sequence reliably, demonstrating how weeks of iteration coalesced into smooth, contact-rich performance.

In summary, Sprinkle grew out of a practical kitchen use case, powered by lightweight VLAs, systematic data collection, and a few empirical tricks to help the model learn. Despite its success, challenges remain: visual cues alone are still limited for depth perception, and it isn’t feasible to outfit a kitchen with cameras everywhere. Moreover, truly reliable contact-rich manipulation may require tactile feedback rather than purely vision-based cues. Therefore, our next goal is a more tactile task: inserting wires into an electrical harness while interpreting rich sensory feedback. We’re eager to push the system further in both software and hardware, building on lessons from the hackathon and the MRSD program that first inspired us to tackle ambitious robotics challenges. We also hope to connect with others exploring similar problems or contributing to contact-rich manipulation research—if this sparks your curiosity, we’d love to collaborate.