Contextual Data Generation for Secure Quality Assurance

Why do you need Contextual Data Generation in your testing process?

Quality assurance is an integral part of the IT delivery process: it ensures that the final product is ready to be shipped to the customer. Testing in production-like test environments is an essential part of it.

While production data is the best data for testing an application, many organizations don’t allow it to be used for testing because of privacy concerns and key regulations such as GDPR and CCPA. The alternatives are anonymized data or synthetically generated data.

The key to a successful testing exercise is using contextual test data that enables the organization to simulate production-like use cases, devoid of PII (Personally Identifiable Information) / SI (Sensitive Information), to ensure there is no data privacy or regulation breach.


Contextual Test Data as a Pivot of Data Privacy in Application Development and Testing

Test data can be generated through one of the following methods:

  • Manually
  • Mass copying data from production to the testing environment
  • Mass copying test data from legacy client systems
  • Automated test data generation using tools

Synthetic data generation falls under the umbrella of automated test data generation. Here, emerging technologies such as machine learning are used to train models that identify and differentiate between the available fields by reading through the schema details supplied with the requirement. Once the fields are categorized, we can select, for each category, an algorithm designed to generate data that mimics the production data for that category. Repeating this procedure for all the fields and tables in the schema produces the mock data required for testing.
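The categorize-then-generate flow described above can be sketched in a few lines of Python. This is only an illustration, not how any particular product works: the schema, the category names, and the generator functions are all hypothetical, and simple keyword rules stand in for the trained model that the text describes.

```python
import random
import string

# Hypothetical schema: column name -> declared SQL type.
SCHEMA = {
    "customer_id": "INTEGER",
    "full_name": "VARCHAR(60)",
    "email": "VARCHAR(120)",
    "signup_date": "DATE",
    "balance": "DECIMAL(10,2)",
}

FIRST_NAMES = ["Alex", "Sam", "Priya", "Chen"]
LAST_NAMES = ["Rao", "Smith", "Garcia", "Kim"]


def categorize(column, sql_type):
    """Assign a category to a field.

    Assumption: keyword rules stand in for a trained classifier.
    """
    name = column.lower()
    if "email" in name:
        return "email"
    if "name" in name:
        return "person_name"
    if sql_type.startswith("DATE"):
        return "date"
    if sql_type.startswith("DECIMAL"):
        return "amount"
    if sql_type.startswith("INT"):
        return "integer"
    return "text"


# One generator per category; each takes a random.Random instance.
GENERATORS = {
    "email": lambda rng: f"user{rng.randrange(10**6)}@example.com",
    "person_name": lambda rng: f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}",
    "date": lambda rng: f"{rng.randint(2015, 2023)}-{rng.randint(1, 12):02d}-{rng.randint(1, 28):02d}",
    "amount": lambda rng: round(rng.uniform(0, 10000), 2),
    "integer": lambda rng: rng.randint(1, 10**6),
    "text": lambda rng: "".join(rng.choices(string.ascii_lowercase, k=8)),
}


def generate_rows(schema, count, seed=0):
    """Categorize every field once, then generate `count` mock rows."""
    rng = random.Random(seed)
    categories = {col: categorize(col, sql_type) for col, sql_type in schema.items()}
    return [
        {col: GENERATORS[cat](rng) for col, cat in categories.items()}
        for _ in range(count)
    ]


rows = generate_rows(SCHEMA, 3)
for row in rows:
    print(row)
```

Keeping the categorization step separate from the generators means a better classifier can be swapped in later without touching the per-category generation logic.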

We can also train the model to follow references between tables: data generated for a parent field is reused when generating the fields that reference it, which avoids referential integrity errors across the tables in the schema. For PII/SI fields, we can follow the notation conventions of each field and generate dummy mock data, which helps comply with the regulations in place.
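As a rough illustration of both ideas, the sketch below generates a parent table first and then draws every foreign key in the child table from the parent keys that were actually generated. The table names, column names, and phone format are hypothetical choices made for this example only.

```python
import random


def generate_customers(count, rng):
    """Parent table: unique primary keys plus a dummy PII value."""
    return [
        {
            "customer_id": i,
            # Dummy phone number that follows the NNN-NNN-NNNN notation.
            # Assumption: the 555-01XX line range is used because it is
            # set aside for fictional use in North America.
            "phone": f"{rng.randint(200, 999)}-555-01{rng.randint(0, 99):02d}",
        }
        for i in range(1, count + 1)
    ]


def generate_orders(count, customers, rng):
    """Child table: every customer_id is picked from existing parent keys."""
    parent_keys = [c["customer_id"] for c in customers]
    return [
        {
            "order_id": i,
            "customer_id": rng.choice(parent_keys),
            "amount": round(rng.uniform(5, 500), 2),
        }
        for i in range(1, count + 1)
    ]


rng = random.Random(42)
customers = generate_customers(5, rng)
orders = generate_orders(20, customers, rng)

# Check referential integrity: no order may point at a missing customer.
known_ids = {c["customer_id"] for c in customers}
orphans = [o for o in orders if o["customer_id"] not in known_ids]
print(len(orphans))  # 0
```

Generating tables in parent-before-child order is the simplest way to guarantee valid references; a schema with many tables would need its tables sorted by their dependencies first.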


How do we use contextual data generation for our testing activities?

A wide range of data generation products is available in the market:

  • Mockaroo (supports generation in SQL, delimited files, JSON, and Excel)
  • SQL Data Generator by Redgate (used with SQL Server Management Studio)
  • Test Database Generator by IBM (for DB2)
  • Generate Data (MySQL 4 or higher)

One of the key products for contextual data generation is the Infosys solution, Infosys Enterprise Data Privacy Suite, or iEDPS. iEDPS is an intelligent data generation product that caters to a wide range of test data generation requirements. The solution needs only minimal input: details about the schema and the number of records. It offers more than 35 algorithms designed specifically for data generation.

iEDPS is an easy-to-use, high-performance, scalable, and cost-effective data privacy and protection solution that automates data protection and privacy across an enterprise. It combines deterministic, selective, dynamic, and static masking tools with the data generation tool, and it can be deployed on any platform, both on-premises systems and cloud environments, for organization-wide use. It supports all major databases and file systems. Before choosing a data generation tool, we should diligently assess critical aspects such as the data generation methods offered and the data types, databases, and operating systems supported. iEDPS checks most, if not all, of these boxes. Explore more about iEDPS and its product suite at the iEDPS Microsite.

Author: Pranay Sharma R

Author Details

Vijayalaxmi Suvarna

Vijayalaxmi Suvarna is a Senior System Engineer at the Infosys Center for Emerging Technology Solutions. She leads the marketing initiatives for the PrivacyNext iEDPS Platform. Her focus includes user experience and online branding of Infosys Data Privacy offerings.
