تگ: Non-convex Optimization in Deep Learning Training