I know it is unusual to start a blog with a brief introduction of yourself. However, given the topic I want to cover, I thought a small introduction would make sense. I am the Product Architect of Infosys Enterprise Data Privacy Suite (iEDPS), a data privacy product of Infosys. Data masking is one of the significant features that iEDPS supports.
Often, I have been asked, “By masking, you mean you change the data to XXXX, right?” My reply has been, “People always associate masking with changing the data to XXXX, but that is not correct. It is only one of the techniques for masking. Data masking goes much beyond that.” In this blog, I want to share a quick overview of the often misinterpreted and interchangeably used terms that I have come across in various privacy implementations.
So, what is Data Masking?
Data masking is the process of hiding your original data by replacing it with other data. It can be a simple replacement of your data with ‘XXXX’, or it can mean changing your data to random but realistic-looking data so that your business processes continue to behave in the same manner even after the data is masked. A few examples from iEDPS are given below (iEDPS provides 180+ masking techniques):
i. Realistic Data Masking: Change data to realistic data.
ii. Encryption: Encrypt the data using a key. The data can be decrypted later, provided the same key and product (proprietary FPE) are being used.
iii. Shuffling: Shuffle the data within a data set.
iv. Substitution: Substitute the data with a fixed value.
v. Deterministic masking: A given data element is always masked to the same value. iEDPS’s algorithms can mask data deterministically, and the same element is masked consistently across various data sources.
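To make a few of these techniques concrete, here is a minimal Python sketch of substitution, shuffling and deterministic realistic masking. It is only an illustration and not how iEDPS implements them: the function names, the small `FIRST_NAMES` list and the demo key are all assumptions of mine, and the deterministic variant simply uses a keyed hash (HMAC-SHA256) to pick a replacement name.

```python
import hashlib
import hmac
import random
from typing import List, Optional

# Hypothetical pool of realistic replacement values (illustration only).
FIRST_NAMES = ["Alice", "Bharat", "Chen", "Divya", "Emeka", "Fatima"]


def substitute(value: str, fixed: str = "XXXX") -> str:
    """Substitution: every input is replaced by the same fixed value."""
    return fixed


def shuffle_column(values: List[str], seed: Optional[int] = None) -> List[str]:
    """Shuffling: reorder the values of a column; the set of values is unchanged."""
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def deterministic_mask(value: str, key: bytes) -> str:
    """Deterministic realistic masking: a keyed hash selects the replacement,
    so the same input and key always give the same masked value."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(FIRST_NAMES)
    return FIRST_NAMES[index]


if __name__ == "__main__":
    names = ["John", "Mary", "Ravi", "John"]
    key = b"demo-key-not-for-production"  # assumed key, for the demo only
    print([substitute(n) for n in names])
    print(shuffle_column(names, seed=7))
    print([deterministic_mask(n, key) for n in names])
```

Because the deterministic variant depends only on the input and the key, both occurrences of "John" get the same replacement, and the same would hold in any other data source masked with that key. With such a small pool, different inputs can collide on the same replacement; a real implementation would need a much larger value space.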
So, unlike what many people say, data masking techniques can be chosen, based on the requirement, so that your masked data is, for example, realistic, reversible with the right key, or consistent across data sources.
Do you Anonymize or Pseudonymize data?
GDPR has become a holy grail for all regulations across the globe. However, even before GDPR, there have been regulations, like HIPAA, that talk about de-identification of data, which means removing identifiers from a data set. It is done so that personal information cannot be extracted from the data set.
With the onset of GDPR, people have asked me many times, “Can one anonymize or pseudonymize data using iEDPS?” The answer is yes for both.
Anonymization is the process of removing Personally Identifiable Information (PII) from a data source so that the data source is protected and cannot be used to derive information about a person. The process is irreversible. Data masking with irreversible techniques can be one way of achieving anonymization. Another way is to augment the data set while removing the PII. iEDPS provides various anonymization options.
K-anonymity: An anonymized data set is k-anonymous if every combination of attributes that could be used to identify a person is shared by at least k individuals in the data set. Let us see an example of a k-anonymized data set (3-anonymous) using iEDPS.
Note: For demo purposes, details like name, phone and city are not protected in the above example.
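For readers who want to check the definition themselves, here is a small Python sketch that computes the k of a data set. It is not iEDPS output: the records, the generalized values and the choice of `age` and `zip` as identifying attributes are made up for illustration.

```python
from collections import Counter
from typing import Dict, List, Sequence


def k_anonymity(rows: List[Dict[str, str]], quasi_identifiers: Sequence[str]) -> int:
    """Return the size of the smallest group of rows that share the same
    values for the given identifying attributes (0 for an empty data set)."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0


def is_k_anonymous(rows: List[Dict[str, str]], quasi_identifiers: Sequence[str], k: int) -> bool:
    """A data set is k-anonymous if its smallest group has at least k rows."""
    return k_anonymity(rows, quasi_identifiers) >= k


# Made-up records with age and zip already generalized into ranges/prefixes.
records = [
    {"age": "20-29", "zip": "560*", "condition": "Flu"},
    {"age": "20-29", "zip": "560*", "condition": "Asthma"},
    {"age": "20-29", "zip": "560*", "condition": "Flu"},
    {"age": "30-39", "zip": "411*", "condition": "Diabetes"},
    {"age": "30-39", "zip": "411*", "condition": "Flu"},
    {"age": "30-39", "zip": "411*", "condition": "Asthma"},
]

print(k_anonymity(records, ["age", "zip"]))        # prints 3
print(is_k_anonymous(records, ["age", "zip"], 3))  # prints True
```

Each (age, zip) combination appears three times, so the smallest group has three rows and the data set is 3-anonymous with respect to those two attributes.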
Pseudonymization is the process in which the data is replaced with pseudonyms or pseudo-identifiers, ensuring it stays protected. The process, however, is not foolproof, since personal information may still be derived by combining the pseudonymized data with additional data sets.
I hope you found this blog interesting and that it has helped you learn or confirm your understanding of the various terminologies used in Data Privacy engagements.
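The sketch below illustrates both points in Python: replacing an identifier with a pseudonym, and why that alone is not enough. It is my own simplified example, not the iEDPS approach. The `Pseudonymizer` class, the `P-` token format and both data sets are hypothetical.

```python
import secrets
from typing import Dict


class Pseudonymizer:
    """Replaces identifiers with random tokens; the mapping is kept separately
    and is the only way to get back from a token to the original value."""

    def __init__(self) -> None:
        self._forward: Dict[str, str] = {}
        self._reverse: Dict[str, str] = {}

    def pseudonymize(self, identifier: str) -> str:
        if identifier not in self._forward:
            token = "P-" + secrets.token_hex(4)
            while token in self._reverse:  # regenerate on an unlikely collision
                token = "P-" + secrets.token_hex(4)
            self._forward[identifier] = token
            self._reverse[token] = identifier
        return self._forward[identifier]

    def reidentify(self, token: str) -> str:
        return self._reverse[token]


p = Pseudonymizer()
patients = [
    {"name": "Asha Rao", "zip": "560001", "birth_year": "1984", "condition": "Asthma"},
    {"name": "Tom Lee", "zip": "411014", "birth_year": "1990", "condition": "Flu"},
]
# Only the name is replaced; zip and birth year are released unchanged.
released = [{**row, "name": p.pseudonymize(row["name"])} for row in patients]

# A separate, hypothetical public data set that shares zip and birth year.
public = [{"name": "Asha Rao", "zip": "560001", "birth_year": "1984"}]

# Linking the two data sets on the shared attributes exposes the person.
for pub in public:
    for row in released:
        if (row["zip"], row["birth_year"]) == (pub["zip"], pub["birth_year"]):
            print(pub["name"], "is likely", row["name"], "with condition", row["condition"])
```

The name is hidden in the released data, yet matching on zip and birth year is enough to tie the pseudonym back to a person without ever touching the mapping table.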
Please do reach out to team iEDPS for anything related to data privacy and visit the social media handles below: