Transfer Learning: An Omics Prediction Solution?
Of late, training deep neural networks has become increasingly easy. Whether the task is image recognition, natural language processing, or gene expression prediction, given a large amount of data one can train a model to map inputs to outputs with very high accuracy. The question then becomes: how much data is enough?
Problems arise when the models built for these tasks fail to generalize relationships and instead opt to “memorize” the conditions they were given. We call this overfitting: the model achieves high accuracy on the training set but does poorly on the test set. Many factors contribute to this phenomenon, such as a surplus of features in the dataset, too little regularization, or training for too long without early stopping, but we will focus on one of the simplest: a lack of data. Without ample data to train our model to, for example, recognize cats and dogs, we could expect the same behavior we would get from a child who, one, had no prior understanding of what a cat or dog was, and two, had only been shown a handful of pictures of cats and dogs: horrible classification results. So how do we remedy this situation in problems where little to no data is available? Enter transfer learning.
Suppose we have a model A, trained to classify pictures of dogs and cats, as well as a model B, trained to classify X-rays with and without tumors. One would be correct in assuming that model A will do far better, simply because the dog and cat pictures available to train it far outnumber the available labeled X-rays. Transfer learning allows us to leverage model A's performance to improve model B. By transferring the weights of the trained model A to model B, we give B a good starting point from which to begin training. This helps address the limited-data problem posed above, and it is broadly applicable simply because many problems today lack the data required to produce accurate results on their own. There is, however, a caveat that comes with transfer learning: we need to decide how much of model A to transfer to model B.
To address this, we first define what it means to “transfer.” For general deep neural networks, transferring most succinctly means fixing (freezing) layers of model A and reusing those layers while training model B. The simplest implementation of this method is to fix all of model A's layers and replace only the output layer with model B's output layer, as sketched below. Beyond that, the number of layers one should transfer depends largely on the problem. For example, in image recognition tasks, convolutional neural networks (CNNs) are known to capture low-level information like angles and lines in the earlier layers, whereas later layers capture high-level information like shapes and orientation. With this intuition, one might choose which layers to transfer based on the kind of information model B needs in order to perform well (e.g. if B's task depends on recognizing high-level features similar to A's, it may pay to transfer all layers; if not, only the first few). There are also other “engineering” ways to transfer, such as remapping node values in model B to match those of model A when a certain data-similarity criterion is met.
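To make this concrete, below is a minimal sketch of the freeze-and-replace approach. The framework (PyTorch), layer sizes, and class counts are illustrative assumptions rather than code from any actual model; the point is simply that A's layers are fixed and only a new output layer is trained for B.

```python
import torch
import torch.nn as nn

# "Model A": a small feed-forward classifier standing in for a network
# pre-trained on the data-rich task (e.g. cats vs. dogs, 2 classes).
model_a = nn.Sequential(
    nn.Linear(256, 128), nn.ReLU(),   # earlier layers: lower-level features
    nn.Linear(128, 64), nn.ReLU(),    # later layers: higher-level features
    nn.Linear(64, 2),                 # task-A output layer
)
# ... assume model_a has already been trained on the large dataset here ...

# Transfer: fix (freeze) all of model A's layers ...
for param in model_a.parameters():
    param.requires_grad = False

# ... then reuse them and swap in model B's own output layer
# (a hypothetical 3-class task with little data).
model_b = nn.Sequential(
    *list(model_a.children())[:-1],   # A's frozen feature layers
    nn.Linear(64, 3),                 # new, trainable output layer for task B
)

# Only the new output layer's parameters are updated during training.
optimizer = torch.optim.Adam(
    (p for p in model_b.parameters() if p.requires_grad), lr=1e-3,
)
```

Because only the new output layer is updated, model B can often be fit with far less data than training the whole network from scratch would require.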
With that very brief introduction to transfer learning and some of its benefits, we turn our attention to a lab problem: omics prediction. Previous work by two other members of the lab, Minseung Kim and Cheng-En Tan, has yielded compendia for the bacteria Escherichia coli and Salmonella enterica, respectively. With approximately 4000 gene profiles and 650 conditions, the E. coli compendium is a multi-omic compendium that is far larger than the Salmonella compendium, which has approximately 2500 gene profiles and 700 conditions. Since these two bacteria are similar, it is very likely that exploiting transfer learning here for gene expression prediction could be extremely beneficial.
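As a rough sketch of how this could look for the compendia (the condition encoding and hidden-layer sizes below are invented for illustration; only the approximate gene counts come from the numbers above, assuming an expression profile is predicted as one value per gene), a model trained on the larger E. coli compendium could have its hidden layers frozen and reused, with a new output layer sized for the Salmonella genes:

```python
import torch.nn as nn

# Illustrative dimensions, not taken from the actual compendia.
N_CONDITION_FEATURES = 50     # assumed numeric encoding of a growth condition
N_ECOLI_GENES = 4000          # ~4000 E. coli gene profiles
N_SALMONELLA_GENES = 2500     # ~2500 Salmonella gene profiles

# "Model A": condition features -> E. coli expression profile.
ecoli_model = nn.Sequential(
    nn.Linear(N_CONDITION_FEATURES, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, N_ECOLI_GENES),
)
# ... assume ecoli_model is trained on the E. coli compendium here ...

# Freeze the shared hidden layers and attach a Salmonella-sized output layer.
shared_layers = list(ecoli_model.children())[:-1]
for layer in shared_layers:
    for param in layer.parameters():
        param.requires_grad = False

salmonella_model = nn.Sequential(
    *shared_layers,
    nn.Linear(512, N_SALMONELLA_GENES),   # trained on the smaller compendium
)
```

How many of those layers should actually be transferred, and how the two compendia's conditions should be encoded so that they are comparable, are precisely the open questions raised above.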