Learning Dialog Policies from Weak Demonstrations