Self-Guided Framework for Improving Arithmetic Reasoning in Large Language Models with Reinforcement Learning
Status: Archived
Authors: Jiwoo Hong, Huzama Ahmad, Minsu Kim, James Thorne
Large language models have demonstrated multi-step reasoning on complex arithmetic problems when prompted with chain-of-thought instructions. This paper introduces a novel self-guided framework that uses reinforcement learning to improve the reasoning capabilities of large language models. Our framework encourages the generation of logical explanations by actively exploring and refining various reasoning paths, with self-logicality serving as the reward signal. Experimental results show that the approach is effective on both encoder-decoder and autoregressive models. Quantitative evaluations on four arithmetic reasoning datasets show that language models can achieve precise reasoning abilities through our framework. Additionally, evaluations by human experts and automated systems confirm that our framework improves the logicality and coherence of chain-of-thought reasoning.
- reasoning-rl
- self-guided