AI Dictionary & Vocabulary App "DiQt"

Last Updated: 2025/11/30

Source Sentence

During training, the agent adjusted its Q-values to converge on an optimal policy.

訓練中、エージェントは最適な方策に収束するためにQ値を調整した。