
🚀 Deploy Your Policy

To deploy and evaluate your policy, you need to modify the following three files:

In deploy_policy.py, the following components are defined: get_model for loading the policy model, encode_obs for observation preprocessing (which may not need modification), and eval, the control loop that acquires observations, queries the model for actions via get_action, and executes them.

The deploy_policy.yml file specifies the input parameters, which are passed to the get_model function as usr_args and used to locate, define, and load your model.

In eval.sh, the parameters listed after --overrides override the corresponding values in deploy_policy.yml, letting you change settings without manually editing the YAML file each time.

# policy/Your_Policy/deploy_policy.py
# import packages and module here


def encode_obs(observation):  # Post-Process Observation
    obs = observation
    # ...
    return obs


def get_model(usr_args):  # from deploy_policy.yml and eval.sh (overrides)
    Your_Model = None
    # ...
    return Your_Model  # return your policy model


def eval(TASK_ENV, model, observation):
    """
    All the function interfaces below are just examples
    You can modify them according to your implementation
    But we strongly recommend keeping the code logic unchanged
    """
    obs = encode_obs(observation)  # Post-Process Observation
    instruction = TASK_ENV.get_instruction()

    if len(model.obs_cache) == 0:
        # Force an observation update at the first frame to avoid an empty
        # observation window; `obs_cache` here can be modified
        model.update_obs(obs)

    actions = model.get_action()  # Get Action according to observation chunk

    for action in actions:  # Execute each step of the action
        # see https://robotwin-platform.github.io/doc/control-robot.md for more details
        TASK_ENV.take_action(action, action_type='qpos') # joint control: [left_arm_joints + left_gripper + right_arm_joints + right_gripper]
        # TASK_ENV.take_action(action, action_type='ee') # endpose control: [left_end_effector_pose (xyz + quaternion) + left_gripper + right_end_effector_pose + right_gripper]
        # TASK_ENV.take_action(action, action_type='delta_ee') # delta endpose control: [left_end_effector_delta (xyz + quaternion) + left_gripper + right_end_effector_delta + right_gripper]
        observation = TASK_ENV.get_obs()
        obs = encode_obs(observation)
        model.update_obs(obs)  # Update Observation, `update_obs` here can be modified


def reset_model(model):
    # Clear the model's cached state (e.g., the observation window) at the start of every evaluation episode
    pass

1. 🔧 deploy_policy.yml

You are free to add any parameters needed in deploy_policy.yml to specify your model setup (e.g., checkpoint path, model type, architecture details). The entire YAML content will be passed to deploy_policy.py as usr_args, which will be available in the get_model() function.
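For illustration, suppose deploy_policy.yml defines a checkpoint path and an observation-history length (both made-up parameter names). Inside get_model(), usr_args would then arrive roughly as the dictionary sketched below, with task_name, task_config, ckpt_setting, seed, and policy_name filled in (or overridden) by eval.sh:

# Hypothetical illustration of `usr_args` as received by get_model().
# `ckpt_path` and `obs_horizon` are example custom keys; the remaining
# entries mirror the overrides passed by eval.sh.
usr_args = {
    "ckpt_path": "policy/Your_Policy/checkpoints/latest.ckpt",  # example custom key
    "obs_horizon": 2,                                           # example custom key
    "task_name": "your_task_name",
    "task_config": "your_task_config",
    "ckpt_setting": "your_ckpt_setting",
    "seed": 0,
    "policy_name": "Your_Policy",
}

ckpt_path = usr_args["ckpt_path"]  # access your custom parameters as plain dictionary keys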


2. 🖥️ eval.sh

Update the script to pass additional arguments to override default values in deploy_policy.yml.

#!/bin/bash

policy_name=Your_Policy
task_name=${1}
task_config=${2}
ckpt_setting=${3}
seed=${4}
gpu_id=${5}
# [TODO] Add your custom command-line arguments here

export CUDA_VISIBLE_DEVICES=${gpu_id}
echo -e "\033[33mgpu id (to use): ${gpu_id}\033[0m"

cd ../.. # move to project root

python script/eval_policy.py --config policy/$policy_name/deploy_policy.yml \
    --overrides \
    --task_name ${task_name} \
    --task_config ${task_config} \
    --ckpt_setting ${ckpt_setting} \
    --seed ${seed} \
    --policy_name ${policy_name} 
    # [TODO] Add your custom arguments here

3. 🧠 deploy_policy.py

You need to implement the following functions in deploy_policy.py (some of them are methods on your model object, as shown in the template above):

3.1 encode_obs(obs: dict) -> dict

Optional. This function is used to preprocess the raw environment observation (e.g., color channel normalization, reshaping, etc.). If not needed, it can be left unchanged.
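As a minimal sketch, assuming the raw observation exposes an RGB image and joint positions under the hypothetical keys used below (adapt them to the dictionary your simulator actually returns), encode_obs might normalize the image and cast the state:

import numpy as np


def encode_obs(observation):  # Post-Process Observation
    # Hypothetical keys -- replace with the keys your environment actually provides
    rgb = observation["head_camera_rgb"]      # e.g. HxWx3, uint8
    qpos = observation["joint_positions"]     # e.g. joint state vector

    obs = {
        "image": rgb.astype(np.float32) / 255.0,          # scale pixels to [0, 1]
        "state": np.asarray(qpos, dtype=np.float32),
    }
    return obs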


3.2 get_model(usr_args: dict) -> Any

Required. This function receives the full configuration from deploy_policy.yml via usr_args and must return the initialized model. You can define your own loading logic here, including parsing checkpoints and network parameters.
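A minimal sketch of get_model(), assuming a PyTorch checkpoint and a made-up YourPolicy wrapper class (neither is part of the framework; substitute your own loading logic and parameter names):

import torch

from your_policy_package import YourPolicy  # hypothetical import


def get_model(usr_args):
    # `usr_args` holds everything from deploy_policy.yml plus the eval.sh overrides
    model = YourPolicy(obs_horizon=usr_args.get("obs_horizon", 1))  # example parameter
    state_dict = torch.load(usr_args["ckpt_path"], map_location="cpu")  # example key
    model.load_state_dict(state_dict)
    model.eval()
    return model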


3.3 eval(TASK_ENV, model, observation) -> None

Required. The main evaluation loop. Given the environment instance (TASK_ENV), the model, and the current observation (a dictionary), this function must compute the next action(s) and execute them in the environment. The natural language instruction (a string) is available via TASK_ENV.get_instruction().


3.4 update_obs(obs: dict) -> None

Optional. Used to update any internal state of the model or observation buffer. Useful if your model requires a history of frames or a memory-based context.


3.5 get_action() -> Any

Optional. A method on your model that returns the action (or action chunk) to execute, computed from the observations cached via update_obs. This is useful if action computation is separated from the evaluation loop, as in the template above, which calls model.get_action() with no arguments.


3.6 reset_model(model) -> None

Optional but recommended. This function is called before the evaluation of each episode, allowing you to reset model states such as recurrent memory, history buffers, or context encodings.
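To make the relationship between update_obs, get_action, and reset_model concrete, here is a minimal sketch of a model wrapper built around a fixed-length observation window. The class name, obs_cache, obs_horizon, and the infer() placeholder are illustrative assumptions, not framework requirements:

from collections import deque


class YourPolicy:  # hypothetical wrapper, shown only to illustrate the hooks above
    def __init__(self, obs_horizon=2):
        self.obs_horizon = obs_horizon
        self.obs_cache = deque(maxlen=obs_horizon)  # observation window used in eval()

    def update_obs(self, obs):
        self.obs_cache.append(obs)  # 3.4: push the newest processed observation

    def get_action(self):
        # 3.5: run inference on the cached window and return an action chunk
        obs_window = list(self.obs_cache)
        return self.infer(obs_window)

    def reset_obs(self):
        self.obs_cache.clear()  # drop history between episodes

    def infer(self, obs_window):
        raise NotImplementedError  # replace with your policy's inference code


def reset_model(model):  # 3.6: called before every evaluation episode
    model.reset_obs()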


4. ✔️ Run eval.sh

bash eval.sh <task_name> <task_config> <ckpt_setting> <seed> <gpu_id> [any custom parameters you define]

5. 📌 Notes

  • The variable instruction is a string containing the language command describing the task. You can choose how (or whether) to use it.
  • Your policy should be compatible with the input/output format expected by the simulator.