$$ \begin{align} J(\theta) &= \mathbb{E}_{\pi_\theta}\left[\sum_{t=0}^{T} \gamma^t r(s_t, a_t)\right] \\ \theta_{k+1} &= \theta_k + \alpha \nabla_\theta J(\theta_k) \end{align} $$
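As a rough illustration of the two formulas, here is a minimal Python sketch. It assumes the rewards of one sampled trajectory are given as a plain list and that an estimate of the gradient is supplied by the caller; the function names `discounted_return` and `gradient_ascent_step` are hypothetical and not taken from any library. Averaging `discounted_return` over many trajectories sampled from the policy would give a Monte Carlo estimate of the objective.

```python
def discounted_return(rewards, gamma):
    """Inner sum of J for one trajectory: sum over t of gamma**t * r_t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def gradient_ascent_step(theta, grad, alpha):
    """One update theta_{k+1} = theta_k + alpha * grad, element-wise.

    `grad` is assumed to be an estimate of the gradient of J at theta_k;
    how it is obtained is outside the scope of this sketch.
    """
    return [th + alpha * g for th, g in zip(theta, grad)]


if __name__ == "__main__":
    # Rewards 1, 1, 1 with gamma = 0.5 give 1 + 0.5 + 0.25.
    print(discounted_return([1.0, 1.0, 1.0], 0.5))
    # Parameters move in the direction of the gradient, scaled by alpha.
    print(gradient_ascent_step([0.0, 1.0], [2.0, -4.0], 0.5))
```

Note the plus sign in the update: the objective is maximized, so the parameters move along the gradient rather than against it.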