By now you’ve read numerous articles about how the European Union (EU) will enforce its General Data Protection Regulation (GDPR). They tell you how to figure out your exposure, establish processes for dealing with data, appoint a data protection officer, and figure out what “clear affirmative action” means for you. These are great ideas in a world where Facebook, credit reporting agencies, retailers and other companies leak personal information wholesale.
When you read the regulations, they tell you what to do, but not how to do it.
Data Residency and Processing Requirements
The GDPR’s data residency requirements stipulate that organizations can neither store personal data of EU data subjects in, nor transfer it through, countries that do not ensure an adequate level of data protection. [1, 2] The United States does not impose equivalent protections by default, so the safest approach is to store and process data on EU data subjects in the EU. This means that if you have a US and international customer base, you will need to store and process data in multiple countries. Further complicating matters, China and Russia are imposing similar restrictions. This has major implications for your processing architecture.
GDPRchitecture: Bringing Processing to Distributed Data
One definition of big data that we like is data that is too large to move. Regardless of how much data you have, if you do business in the EU you now have “big data”. It’s become too expensive to move. So let’s look at how architectures typically handle big data.
Perhaps you’re already using Apache Hadoop or something else that uses a network of compute/storage nodes to process data in parallel. Grossly oversimplified, the MapReduce algorithm works like this:
- Distribute the incoming data across a set of nodes
- Process the data on each node in parallel (map)
- Combine the results (reduce)
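The three steps above can be sketched in a few lines of Python. This is a toy illustration, not how Hadoop itself is implemented: the function names (`distribute`, `process_node`, `combine`, `run_job`) and the record layout are assumptions made for the example, and threads stand in for separate machines.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def distribute(records, num_nodes):
    """Split records across num_nodes buckets, round-robin."""
    nodes = [[] for _ in range(num_nodes)]
    for i, record in enumerate(records):
        nodes[i % num_nodes].append(record)
    return nodes


def process_node(records):
    """Work done on one node: count its records per country."""
    return Counter(record["country"] for record in records)


def combine(partial_counts):
    """Merge the per-node counters into a single result."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


def run_job(records, num_nodes=3):
    nodes = distribute(records, num_nodes)
    # Threads stand in for separate machines in this sketch.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partial_counts = list(pool.map(process_node, nodes))
    return combine(partial_counts)


if __name__ == "__main__":
    orders = [{"country": "DE"}, {"country": "FR"}, {"country": "DE"}, {"country": "US"}]
    print(run_job(orders, num_nodes=2))
```

Each node only ever sees its own slice of the data, and only the small per-node counters travel to the combining step.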
A parallel processing pattern similar to Hadoop can be used to meet data residency and processing requirements. Every affected region must have an isolated data set and application architecture to host and process its data.
A geographically distributed “MapReduce” pattern uses these independent data storage and processing nodes. Route or distribute incoming data to the appropriate node, process it in parallel, and then send aggregated and/or pseudonymised results back to a single location to be combined. This approach lets you keep user data in the appropriate administrative domain in order to meet data residency requirements, while still running local operations on the data and detailed analytics globally.
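As a rough sketch of this pattern, the Python below routes each record to a regional node, summarises it there, and combines only the summaries centrally. Everything named here is hypothetical: the `REGION_BY_COUNTRY` table, the `RegionalNode` class, the record fields, and the choice of a keyed hash (HMAC-SHA256) as the pseudonymisation method. It also assumes that aggregated and pseudonymised results are permitted to leave the region, which is a question for your legal review, not for the code.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical routing table; a real one would come out of a legal review.
REGION_BY_COUNTRY = {"DE": "eu", "FR": "eu", "US": "us"}
DEFAULT_REGION = "us"  # assumption: unlisted countries are handled in the US


class RegionalNode:
    """Isolated store and processor for one administrative domain."""

    def __init__(self, region, pseudonym_key):
        self.region = region
        self._pseudonym_key = pseudonym_key  # kept inside the region
        self._records = []                   # raw personal data stays here

    def store(self, record):
        self._records.append(record)

    def _pseudonym(self, user_id):
        # Keyed hash: the same user maps to the same token within a region.
        digest = hmac.new(self._pseudonym_key, user_id.encode("utf-8"), hashlib.sha256)
        return digest.hexdigest()[:16]

    def local_summary(self):
        """Run locally; return only aggregated and pseudonymised results."""
        spend = defaultdict(int)
        for record in self._records:
            spend[self._pseudonym(record["user_id"])] += record["amount_cents"]
        return {
            "region": self.region,
            "users": len(spend),
            "revenue_cents": sum(spend.values()),
            "spend_by_pseudonym": dict(spend),
        }


def route(record, nodes):
    """Store the record on the node for its region and return that region."""
    region = REGION_BY_COUNTRY.get(record["country"], DEFAULT_REGION)
    nodes[region].store(record)
    return region


def global_report(nodes):
    """Combine the regional summaries in a single location."""
    summaries = [node.local_summary() for node in nodes.values()]
    return {
        "total_users": sum(s["users"] for s in summaries),
        "total_revenue_cents": sum(s["revenue_cents"] for s in summaries),
        "by_region": {s["region"]: s for s in summaries},
    }
```

In this sketch the central report never sees a raw `user_id`. Each region holds its own key, so a pseudonym cannot be linked back to a person, or to the same person in another region, without information that stays inside the region.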
Of course, there’s a lot more to the architecture than this brief introduction. Please contact us if you want to go into more detail.
1 Article 4, EU GDPR, “Definitions”:
(1) ‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;
(2) ‘processing’ means any operation or set of operations which is performed on personal data or on sets of personal data, whether or not by automated means, such as collection, recording, organisation, structuring, storage, adaptation or alteration, retrieval, consultation, use, disclosure by transmission, dissemination or otherwise making available, alignment or combination, restriction, erasure or destruction;
(5) ‘pseudonymisation’ means the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person;
2 Article 45, EU GDPR, “Transfers on the basis of an adequacy decision”: 1. A transfer of personal data to a third country or an international organisation may take place where the Commission has decided that the third country, a territory or one or more specified sectors within that third country, or the international organisation in question ensures an adequate level of protection.