Last Updated: 2025/11/30

During training, the agent adjusted its Q-values to converge on an optimal policy.


Source Sentence

訓練中、エージェントは最適な方策に収束するためにQ値を調整した。
