Input noise injection for supervised machine learning, with applications on genomic and image data

Abstract : Overfitting is a general and important issue in machine learning that has been addressed in several ways as the field has progressed. We first illustrate the importance of this issue in a collaborative challenge that provided genotype and clinical data to assess the response of rheumatoid arthritis patients to anti-TNF treatments. We then re-formalise Input Noise Injection (INI) as a set of increasingly popular regularisation methods. We provide a brief taxonomy of its use in supervised learning, describe its intuitive and theoretical benefits in preventing overfitting, and show how it can be incorporated in the learning problem. In this context we focus on the dropout trick, review related lines of work on understanding and adapting it, and provide a novel approximation that can be leveraged to understand how dropout works in general non-linear models. We then present the DropLasso method, a generalisation of dropout that incorporates a sparsity penalty, and apply it to single-cell RNA-seq data, where we show that it can improve on the accuracy of both Lasso and dropout while performing biologically meaningful feature selection. Finally, we build another generalisation of noise injection in which the noise variable follows a structure that can be fixed, adapted or learnt during training. We present Adaptive Structured Noise Injection as a regularisation method for shallow and deep networks, where the structure of the noise applied to the input of a hidden layer follows the covariance of its activations. We provide a fast algorithm for this particular adaptive scheme, study the regularisation properties of our method on linear and multilayer networks using a quadratic approximation, and show improved generalisation performance and representation disentanglement in experiments on real datasets.
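For readers unfamiliar with the dropout form of input noise injection mentioned in the abstract, the following is a minimal illustrative sketch and is not taken from the thesis. It assumes the common "inverted dropout" convention (each input feature is kept with probability keep_prob and rescaled by 1/keep_prob, so the noisy input equals the clean input in expectation). The function names dropout_noise and mean_noisy_copy are hypothetical.

```python
import random


def dropout_noise(x, keep_prob, rng):
    """Return a noisy copy of x: each feature is kept with probability
    keep_prob (and rescaled by 1/keep_prob) or set to zero otherwise.
    Assumption: inverted-dropout scaling, chosen so that E[noisy] = x."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    return [xi / keep_prob if rng.random() < keep_prob else 0.0 for xi in x]


def mean_noisy_copy(x, keep_prob, n_draws, seed=0):
    """Average n_draws independent noisy copies of x; with enough draws
    the average approaches x, illustrating that the noise is unbiased."""
    rng = random.Random(seed)
    totals = [0.0] * len(x)
    for _ in range(n_draws):
        for j, value in enumerate(dropout_noise(x, keep_prob, rng)):
            totals[j] += value
    return [total / n_draws for total in totals]
```

In a training loop, a fresh noisy copy of each input would typically be drawn at every step and the loss evaluated on it; how the thesis combines this with a sparsity penalty (DropLasso) or with structured noise is described in the full text, not in this sketch.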
Document type :
Theses

https://pastel.archives-ouvertes.fr/tel-03255379
Contributor : Abes Star
Submitted on : Wednesday, June 9, 2021 - 3:01:16 PM
Last modification on : Wednesday, November 17, 2021 - 12:31:10 PM
Long-term archiving on : Friday, September 10, 2021 - 6:46:16 PM

File

2019PSLEM081_archivage.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-03255379, version 1

Citation

Beyrem Khalfaoui. Input noise injection for supervised machine learning, with applications on genomic and image data. Bioinformatics [q-bio.QM]. Université Paris sciences et lettres, 2019. English. ⟨NNT : 2019PSLEM081⟩. ⟨tel-03255379⟩

Metrics

Record views : 121
File downloads : 90