Learning 3D Object-Oriented World Models from Unlabeled Videos