DATA MINIMIZATION – when simplifying becomes an issue

Data minimization is a simple data protection principle. It is a reaction to the disproportionate proliferation of Big Data and aims, first and foremost, to protect personal information.

Initially published in


Data acquisition has become a gold rush. Systems store more and more of it, and maximizing this information is sometimes a business goal for organizations.

Faced with the explosion in the amount of accessible data, a protective reflex is to leave as few “fingerprints” of our personal information as possible. The first step therefore seems to be the destruction of data, which is one stage of the data life cycle. To protect personal information, companies are encouraged not to retain or back up the data of their customers or users indefinitely.

Moreover, the principle of data minimization should also be applied at the time of collection: companies are encouraged to collect only the data most relevant to their goals.

For example, if the objective of a department is to recommend an exercise routine, it should refrain from inferring user locations without explicit permission from customers.

Smart Cities Cybersecurity and Privacy
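
As a minimal sketch of minimization at collection time, the Python snippet below keeps only the fields that a stated purpose requires and drops everything else before storage. The purpose name, the field names and the ALLOWED_FIELDS mapping are hypothetical, chosen to echo the exercise-routine example; they do not describe any real system.

```python
# Hypothetical mapping from a processing purpose to the fields it needs.
# In practice this list would come from the organization's own analysis.
ALLOWED_FIELDS = {
    "exercise_recommendation": {"age_range", "fitness_level", "weekly_goal"},
}


def minimize_record(record: dict, purpose: str) -> dict:
    """Return a copy of the record with only the fields allowed for the purpose."""
    # An unknown purpose allows no fields, so nothing is kept by default.
    allowed = ALLOWED_FIELDS.get(purpose, set())
    return {key: value for key, value in record.items() if key in allowed}


# Illustrative input: what a form or device might submit.
submitted = {
    "age_range": "30-39",
    "fitness_level": "beginner",
    "weekly_goal": 3,
    "gps_location": (45.50, -73.56),  # not needed to recommend a routine
    "marital_status": "single",       # not needed either
}

stored = minimize_record(submitted, "exercise_recommendation")
```

Here the location and marital status never reach storage. The design choice is an allowlist rather than a blocklist: a field that nobody has justified is dropped by default.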

The use of data must also be framed by the principle of minimization. Again, the limit on data use is usually established by reference to the purposes of the application. Data transfer and, ultimately, access to data should respect the same principle.

Data minimization requires the following: 1) the possibility of collecting personal data about other people should be minimized; 2) within the possibilities that remain after 1), the personal data actually collected should be minimized; 3) the storage time of the personal data collected must be kept to a minimum.

Wenlin Han, Yang Xiao

Minimization is therefore a process of cleaning data that should take place throughout the data life cycle.
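
The storage-time requirement can be illustrated with a short Python sketch that discards records older than a retention period. The one-year period, the record layout and the purge_expired function are assumptions made for illustration only; a real policy would also have to cover backups and copies held elsewhere.

```python
from datetime import datetime, timedelta, timezone

# Assumed retention policy: keep personal records for at most one year.
RETENTION = timedelta(days=365)


def purge_expired(records, now=None, retention=RETENTION):
    """Return only the records collected within the retention period."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - retention
    # Records collected before the cutoff are dropped from the result.
    return [r for r in records if r["collected_at"] >= cutoff]


# Illustrative data with a fixed reference date so the result is reproducible.
now = datetime(2024, 1, 1, tzinfo=timezone.utc)
records = [
    {"user_id": 1, "collected_at": datetime(2023, 6, 1, tzinfo=timezone.utc)},
    {"user_id": 2, "collected_at": datetime(2021, 6, 1, tzinfo=timezone.utc)},
]

kept = purge_expired(records, now=now)
```

With these sample dates, only the record for user 1 remains. Running such a purge on a schedule is one way to make the retention limit an automatic step rather than an occasional clean-up.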


Why is this “cleaning” effective for data protection? Above all, because it responds to the great fear raised by Big Data: that information about our private lives will leak.

Protection against theft

A large database is more attractive to hackers. Limiting the information collected over communication networks is therefore an obvious obstacle to data theft. In some cases, the designers of these networks may also be able to recommend limits on the use or retention of data. Collecting less is also the easiest and most effective way to reduce the risk of a leak.

In this way, limiting data according to the principle of minimization protects against leaks and misuse of personal data. A major leak of this information can easily destroy a business or even lead to criminal negligence charges.

Improved data management

The more the Internet of Things grows, the more businesses and individuals face data management issues. Cloud solutions are often a very accessible way to store data, and the data stored there is often private and identifiable.

Data minimization also reduces management costs. Data storage can represent a significant budget for companies, which cannot continue to collect and keep data indefinitely.


Most privacy concerns seem to be resolved by this principle of data minimization.

Lack of concrete processes

But few organizations offer concrete methods for preserving privacy in this cleaning process. Companies are entitled to determine for themselves which data is relevant enough to retain, and nothing constrains how they make that selection. For example, will a human resources application, which only needs to consider relevant information, choose to record an individual’s marital status?

A weak principle

Usually, data is collected with little filtering and is only cleaned afterwards. But leaks can happen at the time of collection.

But few, if any, systems built around the concepts of the Internet of Things and Big Data adhere to this principle of minimization. They are designed to suck up and disseminate as much data as possible, and security is generally weak or almost entirely absent.

Stuart Sumner


Data minimization mitigates important threats such as surveillance, identification, secondary use, and disclosure. But its application is arbitrary, or at least dependent on the particular objectives of each company. This is why ethics is at the center of decisions about minimization and becomes part of determining business objectives.



Becoming Goldilocks, F. Peters, in Perspectives on Data Science for Software Engineering, 2016

Privacy-Enhancing Technologies, Simone Fischer-Hübner, Stefan Berthold, in Computer and Information Security Handbook (Third Edition), 2017

Privacy preservation for V2G networks in smart grid: A survey, Wenlin Han, Yang Xiao, in Computer Communications, 2016

Privacy Preservation in Smart Cities, Danda B. Rawat and Kayhan Zrar Ghafoor, in Smart Cities Cybersecurity and Privacy, 2019

The Internet of Things and the (not so) Smart Grid, Stuart Sumner, British Library Cataloguing-in-Publication Data, 2016

Published by Patricia Gautrin

Entrepreneurial spirit - Passion for IT - Constant creativity - AI Ethics