Model-based Clustering with Missing Not At Random Data - 3IA Côte d’Azur – Interdisciplinary Institute for Artificial Intelligence
Preprints, Working Papers, ... Year: 2023

Model-based unsupervised learning, like any learning task, stalls as soon as missing data occur. This is even more true when the missing data are informative, that is, missing not at random (MNAR). In this paper, we propose model-based clustering algorithms designed to handle very general types of missing data, including MNAR data. To do so, we introduce a mixture model for different types of data (continuous, count, categorical and mixed) that jointly models the data distribution and the MNAR mechanism, while paying close attention to the degrees of freedom of each. We propose eight different MNAR models, in which missingness depends on the class membership and/or on the values of the missing variables themselves. For a particular type of MNAR model, in which missingness depends on the class membership, we show that statistical inference can instead be carried out on the data matrix concatenated with the missing-data mask under a missing-at-random (MAR) mechanism; this underlines the versatility of the studied MNAR models. We then establish sufficient conditions for the identifiability of the parameters of both the data distribution and the missingness mechanism. Regardless of the data type and the mechanism, we perform clustering with EM or stochastic EM algorithms developed specifically for this purpose. Finally, we assess the numerical performance of the proposed methods on synthetic data and on the real medical registry TraumaBase®.
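The class-membership case can be illustrated with a minimal sketch. It assumes, for illustration only, continuous data modeled by a Gaussian mixture with diagonal covariances, and a probability `rho[k, j]` that variable `j` is missing which depends only on the class `k`. The paper covers more data types, more mechanisms and stochastic EM; the function `em_mnarz` and its parameters are hypothetical. Under these assumptions the likelihood of a row given class `k` factorizes into Gaussian terms for its observed entries times Bernoulli terms for its mask, so plain EM on the data concatenated with the mask fits the MNAR model.

```python
import numpy as np


def em_mnarz(X, K, n_iter=200, n_init=5, seed=0, eps=1e-6):
    """Hypothetical sketch: EM for a diagonal Gaussian mixture where the
    probability that variable j is missing depends only on the class k.

    Assumption: p(x_obs, m | z=k) = prod_{j observed} N(x_j; mu[k, j], var[k, j])
                                    * prod_j rho[k, j]^m_j (1 - rho[k, j])^(1 - m_j),
    i.e. a MAR-style fit on the data matrix concatenated with its mask.
    Initialisation draws K fully observed rows, so at least K must exist.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    M = np.isnan(X).astype(float)              # mask: 1 = missing
    O = 1.0 - M                                # 1 = observed
    X0 = np.where(O == 1.0, X, 0.0)            # missing entries zeroed, then masked out by O
    complete = np.flatnonzero(M.sum(axis=1) == 0)
    best = None

    for _ in range(n_init):                    # random restarts: EM finds local maxima only
        mu = X[rng.choice(complete, K, replace=False)]
        var = np.tile(np.nanvar(X, axis=0) + eps, (K, 1))
        rho = np.full((K, d), 0.5)
        pi = np.full(K, 1.0 / K)

        for _ in range(n_iter):
            # E-step: log pi_k + Gaussian terms on observed entries + Bernoulli mask terms
            log_p = np.empty((n, K))
            for k in range(K):
                gauss = -0.5 * (np.log(2 * np.pi * var[k]) + (X0 - mu[k]) ** 2 / var[k])
                bern = M * np.log(rho[k]) + O * np.log(1.0 - rho[k])
                log_p[:, k] = np.log(pi[k]) + (O * gauss + bern).sum(axis=1)
            mx = log_p.max(axis=1, keepdims=True)
            T = np.exp(log_p - mx)
            s = T.sum(axis=1, keepdims=True)
            loglik = float((mx + np.log(s)).sum())   # observed-data log-likelihood
            T /= s                                   # posterior class probabilities

            # M-step: weighted updates; mu and var use observed entries only
            Nk = T.sum(axis=0) + eps
            pi = Nk / Nk.sum()
            W = np.maximum(T.T @ O, eps)             # (K, d) observed weight per class/variable
            mu = (T.T @ X0) / W
            for k in range(K):
                var[k] = (T[:, [k]] * O * (X0 - mu[k]) ** 2).sum(axis=0) / W[k]
            var = np.maximum(var, eps)
            rho = np.clip((T.T @ M) / Nk[:, None], eps, 1.0 - eps)

        if best is None or loglik > best["loglik"]:
            best = dict(pi=pi, mu=mu, var=var, rho=rho, T=T, loglik=loglik)
    return best
```

With `X` a NumPy array holding `np.nan` at missing entries, `fit = em_mnarz(X, K=2)` returns the fitted parameters, and `fit["T"].argmax(axis=1)` gives a hard partition. Because the mask enters the likelihood, even rows that are entirely missing are assigned to a class, through their missingness pattern alone.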
Main file: main.pdf (637.7 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03494674, version 1 (19-12-2021)
hal-03494674, version 2 (13-05-2022)
hal-03494674, version 3 (13-02-2023)



Aude Sportisse, Matthieu Marbac, Christophe Biernacki, Claire Boyer, Gilles Celeux, et al. Model-based Clustering with Missing Not At Random Data. 2023. ⟨hal-03494674v3⟩