Finding resources is crucial for animals to survive and reproduce, but we still lack a clear understanding of the decision-making that underlies when foragers explore new resources and when they exploit known ones. Theory predicts an ‘exploration-exploitation trade-off’ in which animals must balance their effort between staying to exploit a seemingly good resource and moving to explore the environment. To date, however, it has been challenging to generate flexible yet tractable statistical models that capture this trade-off, which has limited our understanding of foraging decisions. Here, I suggest that foraging decisions can be seen as multi-armed bandit problems, and apply a deterministic algorithm (the Upper-Confidence-Bound or ‘UCB’) and a Bayesian algorithm (Thompson Sampling or ‘TS’) to demonstrate, using simulated data, how these algorithms generate testable a priori predictions. Next, I use UCB and TS to analyse empirical foraging data from larvae of the tephritid fruit fly Bactrocera tryoni, providing a qualitative and quantitative framework for describing animal foraging behaviour. Qualitative analysis revealed that TS displays a shorter exploration period than UCB, although both converged to similar qualitative results. Quantitative analysis demonstrated that, overall, UCB predicts the observed foraging patterns more accurately than TS, even though both algorithms failed to quantitatively estimate the empirical foraging patterns in high-density groups (i.e., groups with 50 larvae and, more strikingly, groups with 100 larvae), likely due to the influence of intraspecific competition on animal behaviour. The framework proposed here demonstrates how reinforcement learning algorithms can be used to model animal foraging decisions.
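
To make the bandit framing concrete, the following is a minimal sketch of the two algorithm families named above, assuming each food patch is an arm that yields a Bernoulli (success or failure) reward. It is not the implementation used in the study: the function names, the reward probabilities, the UCB1 form of the confidence bonus, and the uniform Beta(1, 1) prior for TS are all illustrative assumptions.

```python
import math
import random


def ucb1(reward_probs, n_rounds, rng):
    """UCB1 sketch: choose the patch with the highest mean reward plus bonus.

    reward_probs are hypothetical per-patch success probabilities.
    Returns the sequence of chosen patch indices.
    """
    k = len(reward_probs)
    counts = [0] * k      # visits per patch
    totals = [0.0] * k    # summed reward per patch
    choices = []
    for t in range(1, n_rounds + 1):
        if t <= k:
            arm = t - 1   # visit every patch once before scoring
        else:
            scores = [
                totals[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a])
                for a in range(k)
            ]
            arm = max(range(k), key=scores.__getitem__)
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
        choices.append(arm)
    return choices


def thompson_sampling(reward_probs, n_rounds, rng):
    """Thompson Sampling sketch with an assumed Beta(1, 1) prior per patch."""
    k = len(reward_probs)
    alpha = [1] * k       # 1 + observed successes
    beta = [1] * k        # 1 + observed failures
    choices = []
    for _ in range(n_rounds):
        # Draw one plausible success rate per patch from its posterior.
        draws = [rng.betavariate(alpha[a], beta[a]) for a in range(k)]
        arm = max(range(k), key=draws.__getitem__)
        if rng.random() < reward_probs[arm]:
            alpha[arm] += 1
        else:
            beta[arm] += 1
        choices.append(arm)
    return choices


if __name__ == "__main__":
    patches = [0.2, 0.8]  # hypothetical patch qualities
    rng = random.Random(1)
    for name, algo in [("UCB", ucb1), ("TS", thompson_sampling)]:
        visits = algo(patches, 2000, rng)
        print(name, [visits.count(a) for a in range(len(patches))])
```

In this sketch, the per-patch visit counts play the role of the predicted foraging pattern: the proportion of visits each patch receives over time is the kind of a priori prediction that could be compared against observed patch choices.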