Merge "Support R3 model for OOF Policy Optimization" into casablanca