Data privacy is now a necessity for every organization. Over time, an organization creates, and then sits on, a large pile of data. For newer systems, privacy by design has become increasingly viable, because privacy controls are built into many of the technologies now reaching the market. The problem arises when privacy controls must be applied to legacy data, which remains an ordeal for many organizations. To understand why this is difficult, we first need to understand legacy data.
Legacy data accumulates over time and lies unmanaged as old flat files in many different formats. Much of it was produced by legacy applications that have long passed their end-of-life (EOL), meaning the vendor no longer supports them. It can be anything: images, charts, audit reports, and so on. If it contains sensitive elements, the organization is still obliged to protect it.
Data Privacy Controls
Data privacy controls need to be implemented and enforced according to governance rules. They fall into the following categories:
- Identity and Access Management: Determine who can access what, and enforce least-privilege access models
- Data Loss Prevention: Guard against breaches and data leakage
- Encryption & Pseudonymization: Encrypt or mask sensitive data so that it is not exposed when the data is accessed
- Incident Response Plan: An operational plan to recover from, or minimize the damage of, a leak or breach
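To make the pseudonymization control a little more concrete, here is a minimal sketch that replaces sensitive values with stable tokens using a keyed hash (HMAC-SHA256 from Python's standard library). The function names `pseudonymize` and `pseudonymize_fields` and the `tok_` prefix are hypothetical, and keyed hashing is only one way to pseudonymize; it is one-way, so if the original values must be recoverable, encryption or a token vault would be needed instead.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes) -> str:
    """Replace a sensitive value with a stable keyed token (HMAC-SHA256).

    The same value and key always yield the same token, so joins and
    counts still work, but the value cannot be read back from the token.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncated for readability; keep the full digest if collisions matter.
    return "tok_" + digest[:16]


def pseudonymize_fields(record: dict, fields: set, key: bytes) -> dict:
    """Return a copy of `record` with the named fields pseudonymized."""
    return {
        name: pseudonymize(str(value), key) if name in fields else value
        for name, value in record.items()
    }
```

In practice the key would come from a secrets manager rather than the code, since anyone holding it can test guesses against the tokens.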
Data classification and protection remain the main challenge for all organizations. The complexity is manifold, especially with legacy data. Because legacy data spans a wide range of formats and storage types, no single method can handle all of it effectively. One could argue that, since legacy data consists of old flat files stored redundantly in many places across an organization, it can simply be purged or shredded. It is not that simple: organizations often still need that old data, sometimes until the very last minute. So why not simply migrate all of it to modern, manageable data systems that adhere to privacy governance laws? It may seem like an appealing solution, but doing so would cost an organization considerable resources and money for essentially no return, since this data is rarely accessed.
Organizations are understandably reluctant to spend a great deal of time, money, and other resources on it. The challenge, then, is to deal with legacy data while still complying with every data governance policy. There is no settled answer, and no single solution will fit every type of legacy data. However, we can outline a general approach for handling such data.
So far, we have seen what legacy data is and what problems it poses. What approach should an organization follow to protect it? An organization is likely to prioritize protecting the data over reorganizing or re-managing it efficiently. Classifying legacy data into categories makes it easier to manage. However, classification cannot be based only on the organization's internal business domains, such as employee data, survey data, or finance data. To apply privacy controls effectively, we also need to classify the data at a more basic level: by its file type.
Let us look at the main categories of legacy data, which give a clearer picture of what needs to be handled:
- Flat Files: Logs, plain text, XML, JSON, and similar formats
- Well-Known Document Types: Word documents, Excel spreadsheets, presentation files, PDFs, and so on
- Images & Charts: Scanned documents and pictures
- Other Binary Types: Data generated by third-party software that has long passed its end-of-life (EOL), meaning the vendor no longer supports it
All of the above data is unstructured, so applying data privacy controls to it is a real challenge. It cannot be done manually at scale, so organizations need to look to the market for a utility or program that can tackle it.
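As a first automated step, files can be sorted into the four categories described above. The sketch below does this by file extension only; the `CATEGORIES` mapping and the `classify` and `group_by_category` helpers are hypothetical illustrations, not part of any product, and the extension lists are assumptions to adapt to your own estate.

```python
from pathlib import Path

# Hypothetical extension-to-category mapping; extend it for your own data estate.
CATEGORIES = {
    "flat_file": {".log", ".txt", ".csv", ".xml", ".json"},
    "well_known_document": {".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx", ".pdf"},
    "image_or_chart": {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp", ".gif"},
}


def classify(path: str) -> str:
    """Return the legacy-data category for a file, judged by its extension."""
    suffix = Path(path).suffix.lower()
    for category, suffixes in CATEGORIES.items():
        if suffix in suffixes:
            return category
    # Anything unrecognized is treated as an opaque, possibly proprietary binary.
    return "other_binary"


def group_by_category(paths):
    """Group file paths by category, preserving the input order within each group."""
    groups = {}
    for p in paths:
        groups.setdefault(classify(p), []).append(p)
    return groups
```

For example, `group_by_category(["app.log", "Q3.xlsx", "scan.TIF", "export.dat"])` puts one file in each category, with `export.dat` landing in `other_binary`. Extensions on old files are not always trustworthy, so checking a file's leading "magic" bytes would be a more reliable, if more involved, way to classify.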
One such utility is iEDPS (Infosys Enterprise Data Privacy Suite). The suite offers a broad set of data privacy capabilities, including an "Unstructured Data Discovery" (UDD) feature that scans legacy data for personally identifiable information (PII) and HRCI (highly restricted confidential information), and generates a report identifying the files and data that contain sensitive information. Its scope is limited to flat files and well-known document types. Other binary types, such as data generated by a vendor's custom software, cannot be handled, because the format is proprietary, specific to the client's requirements, and not readable by other tools.
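To give a feel for what discovery over flat files involves, here is a minimal regex-based sketch. It is not how iEDPS or UDD works internally; the `PATTERNS` table, the `find_sensitive` function, and the specific patterns (email, US Social Security number, payment card number validated with the Luhn checksum) are illustrative assumptions, and real discovery tools use far richer rules.

```python
import re

# Illustrative patterns only; real discovery tools combine many more rules and context.
PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # 13 to 19 digits, optionally separated by spaces or dashes.
    "card_number": re.compile(r"\b(?:\d[ -]?){13,19}\b"),
}


def luhn_ok(number: str) -> bool:
    """Validate a card number with the Luhn checksum, ignoring separators."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    checksum = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return checksum % 10 == 0


def find_sensitive(text: str):
    """Return (label, value) pairs for every sensitive-looking value in `text`."""
    findings = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            value = match.group()
            # Discard digit runs that merely look like card numbers.
            if label == "card_number" and not luhn_ok(value):
                continue
            findings.append((label, value))
    return findings
```

The Luhn check illustrates a common design choice: a cheap validation step after the pattern match cuts down the false positives that bare regular expressions produce on arbitrary digit strings in logs.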
Why iEDPS?
Many competing tools are available, and the right choice depends on each organization's requirements. Because organizations create everything from structured to unstructured data, handling every format is a demanding job. iEDPS provides a wide range of database connectors, which makes it easier to work with diverse sources. Even so, the benefit varies: no tool can support every type of data, since some organizations build their own database engines for custom requirements, which limits the available API interfaces. Nevertheless, iEDPS has been on the market for over a decade, has improved significantly, and keeps adding support for emerging technologies that need data privacy tooling, to serve its diverse clientele.
Summary
We have taken a brief look at how to apply privacy controls to legacy data and at the challenges involved. In short, achieving privacy control over legacy data comes down to a few steps:
- Classify the files into categories by type.
- Run a privacy assessment tool on the dataset; a tool such as iEDPS can help discover and flag sensitive fields.
- Identify which sets contain sensitive data and apply the appropriate controls to them.
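The three steps can be strung together in a small script, sketched below under several assumptions: only files with the hypothetical `TEXT_SUFFIXES` extensions are treated as readable flat files, the two `DETECTORS` patterns stand in for a real assessment tool such as iEDPS, and `assess` is an illustrative name rather than any product's API. Everything that is not a flat file is set aside for a document parser, OCR, or vendor tooling.

```python
import re
from pathlib import Path

# Hypothetical: extensions treated as readable plain text (the "flat file" category).
TEXT_SUFFIXES = {".log", ".txt", ".csv", ".xml", ".json"}

# Illustrative detectors only; a real assessment would use a dedicated discovery tool.
DETECTORS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def assess(root: str) -> dict:
    """Walk `root` and report which flat files contain sensitive data.

    Returns {"sensitive": {path: [labels]}, "needs_other_tool": [paths]}.
    """
    report = {"sensitive": {}, "needs_other_tool": []}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # Step 1: classify. Anything that is not a flat file is set aside.
        if path.suffix.lower() not in TEXT_SUFFIXES:
            report["needs_other_tool"].append(str(path))
            continue
        # Step 2: assess. errors="replace" stops odd encodings in old files from aborting the scan.
        text = path.read_text(encoding="utf-8", errors="replace")
        labels = sorted(label for label, rx in DETECTORS.items() if rx.search(text))
        # Step 3: record the files that need privacy controls applied.
        if labels:
            report["sensitive"][str(path)] = labels
    return report
```

The resulting report answers the last step's question of which sets contain sensitive data; the actual handling (masking, pseudonymizing, restricting access, or purging) would then be decided per file according to the organization's governance rules.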
Author: Amit Sinha