
Longitudinal, large-scale and unbiased Internet measurements

Abstract : Today, a world without the Internet is unimaginable. By interconnecting billions of people worldwide and offering a vast number of services, it is now fully embedded in modern society. Yet, despite the evolution of technology, its pervasiveness and heterogeneity still raise new challenges, such as security concerns, monitoring of users' Quality of Experience (QoE), and the need for transparency and fairness. Accordingly, the goal of this thesis is to shed new light on some of the challenges that have emerged in recent years. In particular, we provide an in-depth analysis of some of the most prominent aspects of the modern Internet. Particular emphasis is placed on the World Wide Web, which is undoubtedly one of the most popular Internet applications, and on its interaction with machine learning.
The first part of this work studies the Quality of Experience of users browsing the Web, with measurements conducted both in the wild and in controlled environments. Our contributions continue with an original analysis of both subjective user feedback and objective QoE metrics, showing how hard it is to build accurate supervised data-driven models capable of predicting user satisfaction, along with an in-depth discussion of the multi-modal nature of subjective user opinions.
In the second part of this work, we analyze and discuss the fairness of state-of-the-art transformer-based language models, which are pre-trained on Web-based corpora and are typically used to solve a wide variety of Natural Language Processing (NLP) tasks. Here, we question whether the sheer size and heterogeneity of the Web guarantee diversity in the models. The core of our contributions lies in measuring the bias embedded in the models, which we discuss from different angles.
Finally, the last part of this dissertation addresses the classification of objects generated by machines, using some of the simplest state-of-the-art supervised machine learning algorithms. Through a minimally intrusive, robust and lightweight framework, we show that the different behaviors of one field of the IP packet, the IP identification (IP-ID), can be easily classified using a few highly discriminative features. We then apply our technique to an Internet-wide census and provide an updated view of the adoption of the different implementations across the Internet.
Contributor : ABES STAR
Submitted on : Monday, December 20, 2021 - 4:31:08 PM
Last modification on : Tuesday, December 21, 2021 - 3:06:44 AM


Version validated by the jury (STAR)


  • HAL Id : tel-03497586, version 1



Flavia Salutari. Longitudinal, large-scale and unbiased Internet measurements. Networking and Internet Architecture [cs.NI]. Institut Polytechnique de Paris, 2021. English. ⟨NNT : 2021IPPAT023⟩. ⟨tel-03497586⟩


