Abstract: The aim of transfer learning is to reduce the sample complexity required to solve a learning task by using information gained from solving related tasks. Transfer learning has in general been motivated by the observation that when people solve problems, they almost always use information gained from previously solving related problems. Indeed, the thought of even children trying to solve problems tabula rasa seems absurd to us. Despite this fairly obvious observation, typical machine learning algorithms solve one task at a time and so do not take advantage of information that has become available from previously solved related tasks. Transfer methods aim to rectify this rather serious oversight and so have the potential to make a huge impact on how successful and widespread the use of machine learning is.
Practical methods to transfer information have been developed and applied successfully to difficult real-life problems. In addition, theoretical analyses of these methods have been developed. However, one fundamental problem remains unsolved: how to measure similarity between tasks. This problem is troubling from a conceptual point of view, as the notion of relatedness seems central to the objective of transferring information between related tasks. Furthermore, it has been shown in experiments that transferring from 'unrelated' tasks hurts the generalization performance of learning algorithms. So an appropriate notion of similarity between tasks seems necessary to design algorithms that can determine when to transfer information, when not to, and how much information to transfer. In this dissertation we give a formal solution to the problem of measuring task relatedness and all its associated problems.
We derive a very general measure of relatedness between tasks. We show that this measure is universal -- i.e., no other measure of relatedness can uncover much more similarity than our measure. We then use this measure to derive universally optimal transfer learning algorithms in a Bayesian setting. Universal optimality means that no other transfer learning algorithm can perform much better than ours. The methods we develop automatically solve the problems of determining when to transfer information and how much information to transfer. Indeed, we show that transferring information is always justified -- i.e., it never hurts too much to transfer information. This latter result is quite surprising, as the commonly held belief in the transfer learning community is that it should hurt to transfer from unrelated tasks. We also show how our transfer learning methods may be used for transfer in Prediction with Expert Advice systems and in Reinforcement Learning agents.
Our distance measures and learning algorithms are based on powerful, elegant and beautiful ideas from the field of Algorithmic Information Theory. While developing our transfer learning mechanisms we also derive results that are interesting in and of themselves. We also develop practical approximations to our formally optimal method for Bayesian decision trees, and apply them to transfer information between 7 arbitrarily chosen data sets in the UCI machine learning repository through a battery of 144 experiments. The arbitrary choice of data sets makes our experiments the most general transfer experiments to date. The experiments also bear out our result that transfer should never hurt too much.