What is Data Masking?
Data masking is a data security technique that creates a structurally similar but obfuscated copy of sensitive data, suitable for purposes such as software testing and user training. It ensures that sensitive information is replaced with non-sensitive content and remains unavailable outside the permitted, protected production environment.
Why Mask Data?
- Masked data is worthless to an attacker; nothing sensitive can be inferred from it
- Masking protects Personally Identifiable Information (PII) and other organizational data, making it safe to expose or share for testing, application development, and training
- Data breaches have increased dramatically in frequency, yet many organizations lack the capabilities, tools, resources, and infrastructure to prevent or contain them efficiently, despite strict regulations and rules that require the protection of sensitive data
- Masking helps companies comply with major compliance standards such as Sarbanes-Oxley, the Payment Card Industry Data Security Standard (PCI DSS), and the Health Insurance Portability and Accountability Act (HIPAA), along with other data privacy mandates
What type of Data is Sensitive?
Any data that is protected against unwarranted disclosure is sensitive data. Examples include:
- Personally identifiable information (PII)
- Protected health information (PHI)
- Payment card information (subject to PCI-DSS regulation)
- Intellectual property (subject to ITAR and EAR regulations)
Static Data Masking vs. Dynamic Data Masking
Static data masking: Static data masking is a commonly used method when working with outsourced contractors or developers.
- In static data masking, the real data is first extracted as-is from the table by an ETL tool and then masked. You have to trust that the developer has deleted the real data and is working on a protected, uncompromised platform before the data is sent outside.
- The live database is not protected from those who do have permission to access it. There are always administrators, QA engineers, developers, and others with access to the actual live database, and these personnel can still access the real, unmasked records.
- The cost of extra hardware and maintenance is a burden for organizations, since a full replica of the real database, minus identifying information, has to be maintained.
Dynamic data masking: Dynamic data masking is a data security technique designed to secure data in real time in both production and non-production systems. All sensitive data is masked as it is accessed, so DBAs, developers, and anyone else with permission to access the actual live database (for example, to debug) cannot see the unmasked sensitive data at any point in time.
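As an illustration, dynamic masking can be thought of as a read-time filter applied between the database and the requester. The sketch below is a hypothetical role-based read proxy, not any specific product's implementation; the field names, role names, and masking formats are illustrative assumptions.

```python
# Hypothetical role-based read proxy: a minimal sketch of dynamic data
# masking, where values are obfuscated at query time rather than at rest.
PRIVILEGED_ROLES = {"security_officer"}  # assumed role allowed to see real data

def mask_email(value: str) -> str:
    """Partially mask an email address, e.g. jdoe@example.com -> j***@example.com."""
    local, _, domain = value.partition("@")
    return (local[:1] + "***@" + domain) if domain else "***"

def mask_ssn(value: str) -> str:
    """Show only the last four digits of a US SSN."""
    return "XXX-XX-" + value[-4:]

# Map of sensitive column names to their masking functions (illustrative).
MASKERS = {"email": mask_email, "ssn": mask_ssn}

def read_record(record: dict, role: str) -> dict:
    """Return the record unmasked only for privileged roles; mask otherwise."""
    if role in PRIVILEGED_ROLES:
        return dict(record)
    return {k: MASKERS[k](v) if k in MASKERS else v for k, v in record.items()}
```

With this shape, a developer querying a customer row would receive `j***@example.com` and `XXX-XX-6789` instead of the real values, while the stored data itself is never altered.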
Deciding on Data Masking Features
Data masking offers multiple data manipulation capabilities, but not all of them preserve valid business context. These capabilities include:
- Randomization: The sensitive field is replaced with a randomly generated value, subject to parameters that keep the data valid (for example, not producing an impossible date such as February 30)
- Blurring: Replacing sensitive data with a visually indistinct value
- Nulling: Replacing sensitive data with a null value
- Shuffling: Reordering the values within a column to mask the data
- Substitution: Replacing original values with values drawn randomly from a pre-defined list
- Tokenization: A masking technique in which an algorithm replaces the data with tokens, and the mapping is maintained so the information can later be restored to its original value
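To make a few of the capabilities above concrete, here is a minimal sketch of randomization, nulling, shuffling, and substitution. The date format, column shapes, and value pools are illustrative assumptions, not requirements of any particular tool.

```python
import random

def randomize_date(rng: random.Random) -> str:
    """Randomization: generate a valid calendar date (never e.g. February 30)."""
    days_in_month = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
    month = rng.randint(1, 12)
    day = rng.randint(1, days_in_month[month - 1])
    return f"2023-{month:02d}-{day:02d}"  # hypothetical fixed year for simplicity

def null_out(values: list) -> list:
    """Nulling: replace every sensitive value with None."""
    return [None] * len(values)

def shuffle_column(values: list, rng: random.Random) -> list:
    """Shuffling: reorder values within a column so rows no longer line up."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled

def substitute(values: list, pool: list, rng: random.Random) -> list:
    """Substitution: replace each value with a random pick from a pre-defined pool."""
    return [rng.choice(pool) for _ in values]
```

Note that shuffling preserves the column's overall value distribution (useful for analytics), while nulling destroys it; which capability fits depends on how much business context the masked data must retain.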
Approach for implementing Data Masking
To identify the right set of solutions, we have to consider multiple factors specific to a given organization.
Holistic Approach and Planning
When it comes to technology and vendor selection, most big companies' departments and business units operate independently of one another. For better optimization and cost savings, data masking should remain a common platform across the company. A holistic plan will:
- Create a clear data security policy and hierarchy, with well-defined roles and responsibilities for stakeholders across business units and departments
- Demarcate production and non-production environments. Production environments require access to real, sensitive data at runtime, which distinguishes them from non-production environments.
Sequence and Prioritize the Implementation
A combination of refined processes, data flows, applications, and data stores helps in creating a successful implementation roadmap. Identify the data source systems and prioritize implementation based on an optimal categorization across applications and data stores.
Solution Evaluation with a Focus on Seamless Integration
There are thousands of solutions out there, but no single solution may address all of your challenges. Choose a solution that can be reused in terms of people, processes, and technology for seamless integration. A simple guideline for establishing the right mix of solution options should include:
- Requirements assessment – Assess how much of the requirements can be handled in-house versus the work that requires customization and external help
- Assessment of cost and effort for implementation – While cost is a key differentiator, the effort required for implementation should be in line with the organization's needs
- Skills required for execution and maintenance – The ability to reuse existing organizational skills across data stores and file systems is critical for seamless integration
Guidelines for Algorithm Selection
The evaluation of masking algorithms should prioritize first the risk associated with information exposure, and second the application and system dependencies. In some scenarios, it may be wise to mandate specific algorithm choices even if applications or processes have to undergo small changes.
Automating the Data Validation Process
The implementation can be made more effective by automating the data validation process. Data validation scripts can be reused as often as needed, because they are based on the same rules and conditions that drove the data masking process.
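As a sketch of such a reusable validation script, the example below assumes the masking rules are expressed as per-column predicates; the column names, formats, and rule set are hypothetical.

```python
# A minimal sketch of an automated validation pass over masked output,
# assuming masking rules can be expressed as named per-column predicates.

def no_overlap(original: list, masked: list) -> bool:
    """No masked value may equal its original counterpart."""
    return all(o != m for o, m in zip(original, masked))

def validate(original_rows: dict, masked_rows: dict, rules: dict) -> list:
    """Run every rule against the masked data; return the names of failed rules."""
    failures = []
    for name, check in rules.items():
        if not check(original_rows, masked_rows):
            failures.append(name)
    return failures

# Reusable rule set derived from the masking conditions (illustrative).
RULES = {
    "ssn_changed": lambda orig, masked: no_overlap(orig["ssn"], masked["ssn"]),
    "ssn_format": lambda orig, masked: all(
        len(v) == 11 and v[3] == "-" and v[6] == "-" for v in masked["ssn"]
    ),
    "row_count_preserved": lambda orig, masked: len(orig["ssn"]) == len(masked["ssn"]),
}
```

Because the rules mirror the masking conditions, the same `validate` call can run after every refresh of a non-production copy, flagging any column the masking routine missed.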
In non-production environments, the process of creating data changes once a data masking routine is introduced. The operational framework should include:
- Handling new data stores and applications
- The process of selecting an algorithm for sensitive data
- Guidelines for creating new algorithms
- Identifying data validation related requirements
- Identifying integration touch points for executing data masking as an offline process alongside the current refresh process
Data Masking Best Practices
- By some estimates, 90% of the data in the world today was created in the last two years. Companies need to keep up with this pace and ensure all confidential data is masked and delivered securely.
- Data masking tools should mask data from different sources consistently, so that the relationships between values remain the same after transformation
- Any data masking plan should concentrate on data obfuscation techniques in non-production (development, testing, backup, and analytics) environments to achieve full coverage.
- Any good data masking process should use irreversible transformations for confidential data, so that the data cannot be recovered even if the environment is breached or compromised.
- Organizations should have a concrete procedure to identify sensitive data, apply the right data masking technique, and then repeatedly assess all sources to ensure the masking techniques are foolproof.
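Two of these practices, consistency across sources and irreversibility, can be combined in a single transformation. The sketch below uses a keyed one-way hash (HMAC-SHA256) as one possible approach; the key, its storage, and the truncation length are illustrative assumptions rather than any product's behavior.

```python
import hmac
import hashlib

# Consistent, irreversible masking: the same input always maps to the same
# masked output (preserving joins across data sources), while the keyed
# one-way hash prevents recovering the original even if the masked data leaks.
# The key below is a placeholder; in practice it would live in a secrets
# vault and never ship alongside the masked data set.
SECRET_KEY = b"replace-with-vaulted-key"

def mask_consistent(value: str) -> str:
    """Deterministically mask a value with HMAC-SHA256, truncated for readability."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]
```

Because the mapping is deterministic, a `customer_id` masked in two different source systems still joins correctly after transformation, yet without the key the transformation cannot be reversed.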