UniT: Human Video Trains Humanoid Robots Without Kinematic Matching
UniT introduces a unified latent action tokenizer that maps human video to humanoid actions, bypassing kinematic matching. Simulation results are promising, but the method has not yet been validated in the real world.