Learning in Structured MDPs with Convex Cost Functions: Improved Regret Bounds for Inventory Management