Recursive meta-Reinforcement Learning for Personalized Sequential Dynamic Treatment Policies
Main features, outcomes, and decisions of the OPC treatment sequence
Authors: Tardini, E.
Publication: Submitted as partial fulfillment of the requirements for the degree of Master of Science in Computer Science, Graduate College of the University of Illinois at Chicago

In recent years, deep meta-reinforcement learning has extended the applicability of reinforcement learning (RL) algorithms: by integrating recurrent networks, trained models can quickly adapt to new, unseen environments without further backpropagation. These models, however, cannot adapt without information on past rewards, and are therefore not directly applicable to sequential decision-making settings in which multiple steps are required before the final reward is observed. One of the main applications affected by this limitation is dynamic treatment regimes, i.e., the problem of selecting the optimal medical treatment for a patient at each step of a treatment sequence, taking into account the complete past treatment history. By extending deep meta-reinforcement learning to handle sequential decisions, a model would be able to prescribe the optimal treatment for each patient even if the patient's (or physician's) preferences on the outcome were never encountered by the model in training. We propose a recursive deep meta-reinforcement learning approach in which the model for each decision in the sequential process learns from and adapts to unseen circumstances by recursively integrating the feedback of the models for the other decisions in the process. We evaluate our approach on synthetic two-step processes with fixed transition probabilities but varying reward functions, to test the models' ability to propagate environment information from the final reward to intermediate steps. Finally, we train our model on a dataset of three-step chemo-radiotherapeutic and surgical treatment of oropharyngeal squamous cell carcinoma patients, demonstrating our approach's ability to optimally handle previously unseen patient preferences on survival and toxicity outcomes.
Date: May 1, 2021