Data Protection

A practice that involves safeguarding data from possible loss, corruption, or compromise by natural or exogenous threats

What is Data Protection?

Data protection is the practice of safeguarding data from loss, corruption, or compromise while preserving its privacy. It covers data integrity, data privacy, protection from errors and corruption, and guidance on how businesses may use data.

The concept of data protection applies to personal data, business data, and the data of public, political, and international entities. It is becoming more important as the volume of data collected and stored digitally on online platforms rises.

 

 

Why Data Protection?

  • Protect personal data from being misused, mishandled, or exploited
  • Ensure that the fundamental rights and freedoms of providers of data are being upheld
  • Ensure fair and friendly practices in commercial activities between consumers and businesses
  • Place responsibility on organizations for how they handle personal data
  • Provide greater control and understanding to individuals over how their data is collected and used

 

Data Protection Principles

Data protection principles fall under the General Data Protection Regulation (GDPR), an EU regulation on data protection and privacy. The GDPR’s main aim is to give people control of their data and simplify the regulatory environment for international business.

The GDPR’s seven principles deal with the lawful processing of personal data. Data processing encompasses the following operations: collection, organization, structuring, storage, alteration, consultation, use, communication, restriction, erasure, and destruction.

 

 

1. Lawfulness, Fairness, and Transparency

The processing of data by organizations should be lawful, fair, and transparent. Organizations should ensure that their data processing activities do not break the law. Lawfulness requires knowing the GDPR's rules on data collection.

Transparency means that what is being done to the data is not hidden from data subjects, i.e., all information and communication concerning the use of personal data should be easily accessible, clear, and easy to understand. Hence, it is strongly recommended that an organization publicly state its privacy policy, indicating the nature of the data collected and the reasons for collecting it.

 

2. Purpose Limitation

Data should be collected for a specific, explicit, and legitimate purpose. The collection purpose should be clearly articulated, and no additional processing should be done that is incompatible with that purpose.

However, data processing for archiving purposes in the public interest or for scientific, historical, or statistical purposes is not deemed incompatible with initial purposes and hence is given more freedom.

 

3. Data Minimization

The processing of personal data must be adequate, relevant, and limited to what is necessary for the purpose of the collection. This ensures that in the event of a data breach, cyber-criminals can access only a limited amount of data. Holding less data also makes it easier to keep that data accurate and up to date.

The processing of personal data should only be conducted if the purpose cannot reasonably be achieved by other means. Data minimization also entails limiting the period for which personal data is stored to a bare minimum.
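One way to picture data minimization is an allow-list of fields per processing purpose. The sketch below is only an illustration: the purposes, field names, and the `minimize` helper are hypothetical and do not come from the GDPR text.

```python
# Hypothetical mapping from a processing purpose to the fields it needs.
ALLOWED_FIELDS = {
    "newsletter": {"email"},
    "shipping": {"name", "street", "city", "postal_code"},
}


def minimize(record: dict, purpose: str) -> dict:
    """Return a copy of record holding only the fields allowed for purpose."""
    allowed = ALLOWED_FIELDS[purpose]
    return {key: value for key, value in record.items() if key in allowed}


# Example submission with more fields than a newsletter sign-up needs.
submitted = {
    "name": "Ada",
    "email": "ada@example.com",
    "birth_date": "1990-01-01",
    "city": "Paris",
}
stored = minimize(submitted, "newsletter")
print(stored)  # {'email': 'ada@example.com'}
```

Anything not on the allow-list is dropped before storage, so a breach of the newsletter data would expose email addresses only.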

 

4. Accuracy

Organizations must ensure that personal data is accurate, and reasonable steps must be taken to erase or rectify inaccurate or incomplete personal data. Data that is not updated can become inaccurate, and the level of accuracy required depends on the purpose for which the data is being processed.

Therefore, inaccurate data should be erased or rectified within the shortest possible time. Furthermore, the data record created at the point of collection should include the data itself, its source, and when it was collected.
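A data record that carries its own source and collection time could look like the sketch below. The class name, field names, and the `rectify` helper are assumptions made for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class PersonalDataRecord:
    value: str              # the data itself
    source: str             # where the data came from
    collected_at: datetime  # when it was collected


def rectify(new_value: str, source: str,
            now: Optional[datetime] = None) -> PersonalDataRecord:
    """Build the corrected record that replaces an inaccurate one."""
    return PersonalDataRecord(new_value, source, now or datetime.now(timezone.utc))


old = PersonalDataRecord(
    "12 Old Street", "web form", datetime(2023, 5, 1, tzinfo=timezone.utc)
)
new = rectify("34 New Avenue", "customer phone call",
              datetime(2024, 2, 1, tzinfo=timezone.utc))
```

Because the record is immutable, a correction produces a new record with its own source and timestamp instead of silently overwriting the old value.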

 

5. Storage Limitation

Personal data should only be kept in the system for as long as it is necessary and should be deleted after it serves the purpose for which it was collected. Hence, it is necessary to regularly review collected data to see if it is still required for its initial purpose.

In addition, organizations should adopt a retention policy under which personal data is kept only for a definite length of time. The time limit for storing personal data should depend on the reason for which it was collected and the industry in which the organization operates.
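The regular review described above can be sketched as a check of each record's age against a retention period for its purpose. The purposes and periods below are made-up examples, not legal guidance.

```python
from datetime import date, timedelta

# Hypothetical retention periods per collection purpose.
RETENTION = {
    "job_application": timedelta(days=180),
    "invoice": timedelta(days=365 * 7),
}


def expired(records: list, today: date) -> list:
    """Return the records that have outlived their retention period."""
    return [
        record for record in records
        if today - record["collected_on"] > RETENTION[record["purpose"]]
    ]


records = [
    {"id": 1, "purpose": "job_application", "collected_on": date(2024, 1, 1)},
    {"id": 2, "purpose": "invoice", "collected_on": date(2020, 1, 1)},
]
to_delete = expired(records, today=date(2024, 9, 1))
print([record["id"] for record in to_delete])  # [1]
```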

 

6. Integrity and Confidentiality

Integrity and confidentiality deal with data security. The principle states that data should be processed in a way that ensures its security, including protection against unlawful or unauthorized access, accidental loss, and destruction or damage. Suitable technical or organizational measures should be taken to ensure data security.

The GDPR does not mandate any specific technical or organizational measures but leaves the choice to the organizations themselves, in light of ever-changing technological and organizational best practices. One common technique is to encrypt personal data.
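Encryption itself needs a vetted cryptographic library, which is outside Python's standard library, so the sketch below illustrates only the integrity side of this principle. It uses a keyed hash (HMAC) to detect whether stored data has been altered. The key and record contents are placeholders; key management is assumed to be handled elsewhere.

```python
import hashlib
import hmac


def sign(data: bytes, key: bytes) -> str:
    """Compute an HMAC-SHA256 tag for data."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def is_intact(data: bytes, key: bytes, tag: str) -> bool:
    """Check data against a previously stored tag in constant time."""
    return hmac.compare_digest(sign(data, key), tag)


demo_key = b"demo-key-not-for-production"  # placeholder key
record = b"name=Ada;city=Paris"
tag = sign(record, demo_key)

print(is_intact(record, demo_key, tag))                  # True
print(is_intact(b"name=Ada;city=Rome", demo_key, tag))   # False
```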

 

7. Accountability

Organizations must demonstrate compliance with the GDPR principles and show accountability in how they protect personal data. Hence, they must take responsibility for the way they process personal data and be able to show it. In addition, if organizations are not clear on how to comply with the GDPR principles, they should take appropriate training courses or consult a legal practitioner.

 

Data Protection Strategies

Organizations use a variety of strategies to protect and secure their data. Below are various ways in which data can be lost or stolen and the strategies deployed to address each kind of failure.

 

1. Failure of Storage Systems

Media that store data can fail or become corrupted, leading to data loss. The strategy here is to ensure that data is available even in the event of media failure.

Synchronous Mirroring: One strategy to counter data loss is to use synchronous mirroring, where data is stored on both on-site storage and a remote site simultaneously. Mirroring ensures the two sites are identical.
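The defining trait of synchronous mirroring is that a write is acknowledged only once both sites hold the data. The sketch below models that behavior in memory; the `Site` class and `mirrored_write` function are hypothetical and stand in for real storage systems.

```python
class Site:
    """In-memory stand-in for a storage site."""

    def __init__(self, name: str):
        self.name = name
        self.blocks = {}
        self.online = True

    def write(self, key: str, value: bytes) -> None:
        if not self.online:
            raise ConnectionError(f"{self.name} is unreachable")
        self.blocks[key] = value


_MISSING = object()


def mirrored_write(primary: Site, remote: Site, key: str, value: bytes) -> bool:
    """Acknowledge (return True) only when both sites stored the value."""
    old = primary.blocks.get(key, _MISSING)
    try:
        primary.write(key, value)
        remote.write(key, value)
    except ConnectionError:
        # Undo the primary write so the two sites stay identical.
        if old is _MISSING:
            primary.blocks.pop(key, None)
        else:
            primary.blocks[key] = old
        return False
    return True


on_site, off_site = Site("on-site"), Site("remote")
print(mirrored_write(on_site, off_site, "order-1", b"paid"))      # True
off_site.online = False
print(mirrored_write(on_site, off_site, "order-1", b"refunded"))  # False
```

After the failed second write, both sites still hold the same earlier value, which is the property mirroring is meant to guarantee.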

RAID Protection: RAID (Redundant Array of Inexpensive Disks, or Redundant Array of Independent Disks) is a good alternative to mirroring as it requires less overhead capacity. Physical drives are combined into a logical unit that appears to the operating system as a single drive. RAID stores data in different areas across multiple disks. Performance and protection increase because input/output operations overlap in a balanced way.
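Some RAID levels protect data by storing parity alongside the data. The toy sketch below splits one piece of data across two "disks", stores an XOR parity block on a third, and rebuilds a lost block from the two survivors. It covers a single stripe only and ignores details of real arrays such as parity rotation; the function names are illustrative.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length blocks byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))


def write_stripe(data: bytes) -> list:
    """Split data into two blocks and append their XOR parity block."""
    if len(data) % 2:
        data += b"\x00"  # pad to an even length
    half = len(data) // 2
    first, second = data[:half], data[half:]
    return [first, second, xor_blocks(first, second)]


def rebuild(stripe: list, failed: int) -> bytes:
    """Recompute the block at index failed from the two surviving blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != failed]
    return xor_blocks(survivors[0], survivors[1])


stripe = write_stripe(b"customer-records")
print(rebuild(stripe, failed=1))  # b'-records'
```

The parity block costs one extra disk for two data disks, compared with one extra disk per data disk for a full mirror.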

Erasure Coding: Erasure coding is used in scale-out storage environments. It is a parity-based form of data protection that writes both data and parity across a cluster of storage nodes. All the nodes in a cluster help each other in replacing a failed node.

Replication: Replication is another strategy for scale-out storage in which data is mirrored to multiple nodes. It is simpler than erasure coding but consumes at least twice the capacity of the protected data.
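The capacity difference between the two approaches is simple arithmetic. The sketch below compares raw capacity for 100 TB of protected data, assuming two full copies for replication and a 4 data + 2 parity layout for erasure coding. Both configurations are example values, not recommendations.

```python
def replication_raw_tb(usable_tb: float, copies: int) -> float:
    """Raw capacity needed when every byte is stored `copies` times."""
    return usable_tb * copies


def erasure_coding_raw_tb(usable_tb: float, data_fragments: int,
                          parity_fragments: int) -> float:
    """Raw capacity needed for a data + parity fragment layout."""
    return usable_tb * (data_fragments + parity_fragments) / data_fragments


print(replication_raw_tb(100, copies=2))      # 200
print(erasure_coding_raw_tb(100, 4, 2))       # 150.0
```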

 

2. Data Corruption

Snapshots: Snapshots can be used to restore data that is accidentally deleted or corrupted. Snapshots can be used with a variety of systems, including databases such as SQL Server and Oracle. A snapshot captures a clean copy of the data at the moment it is taken, and snapshots can be taken on a recurring schedule and retained for longer periods. Therefore, when data is corrupted or deleted accidentally, a snapshot can be loaded and the data copied back in place of the damaged version. Snapshots allow near-instant recovery with minimal data loss.
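The take-then-restore cycle can be sketched with an in-memory store. This version copies all data on every snapshot for simplicity, whereas real storage systems usually track changes more efficiently. The `SnapshotStore` class is hypothetical.

```python
import copy


class SnapshotStore:
    """Key-value store that can capture and restore point-in-time copies."""

    def __init__(self):
        self.data = {}
        self.snapshots = []

    def snapshot(self) -> int:
        """Capture the current state and return the snapshot's id."""
        self.snapshots.append(copy.deepcopy(self.data))
        return len(self.snapshots) - 1

    def restore(self, snapshot_id: int) -> None:
        """Replace the current data with the chosen snapshot's contents."""
        self.data = copy.deepcopy(self.snapshots[snapshot_id])


store = SnapshotStore()
store.data["customers"] = ["Ada", "Grace"]
clean = store.snapshot()

del store.data["customers"]   # accidental deletion
store.restore(clean)
print(store.data)  # {'customers': ['Ada', 'Grace']}
```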

 

3. Failure of Storage System

Snapshot Replication: Snapshot replication, where replication technology is built on top of snapshots, protects against the failure of multiple drives in data centers. It copies only the data that has changed from the primary storage system to an off-premises secondary storage system. The replicated data is then available for recovery if the primary storage system fails.
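Copying only what changed between two snapshots might be sketched as follows. Snapshots are modeled as plain dictionaries and the function names are illustrative; real products work at the block or file-system level.

```python
_MISSING = object()


def changed_entries(previous: dict, current: dict):
    """Return (updates, deletes) that turn `previous` into `current`."""
    updates = {
        key: value for key, value in current.items()
        if previous.get(key, _MISSING) != value
    }
    deletes = [key for key in previous if key not in current]
    return updates, deletes


def apply_delta(replica: dict, updates: dict, deletes: list) -> None:
    """Bring the secondary copy up to date using only the changes."""
    replica.update(updates)
    for key in deletes:
        replica.pop(key, None)


snapshot_1 = {"a.txt": "v1", "b.txt": "v1", "c.txt": "v1"}
snapshot_2 = {"a.txt": "v1", "b.txt": "v2", "d.txt": "v1"}

secondary = dict(snapshot_1)  # off-premises copy, already in sync with snapshot 1
updates, deletes = changed_entries(snapshot_1, snapshot_2)
apply_delta(secondary, updates, deletes)
print(updates)  # {'b.txt': 'v2', 'd.txt': 'v1'}
```

Only two of the four files cross the network, yet the secondary copy ends up identical to the second snapshot.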

 

4. Data Center Failure

Snapshot Replication: In snapshot replication, data is replicated to a secondary site. The main drawback is the high cost of maintaining a secondary storage site. In addition, the loss of a data center usually calls for a disaster recovery plan so that operations can be restored in a timely way.

Cloud Services: Cloud services combine replication with cloud backup services. They enable speedy recovery when a data center breaks down by keeping the most recent copies of the data needed in a disaster.

 

Trends in Data Protection

 

Ransomware

Ransomware holds consumers' personal data hostage and forces them to pay a ransom to get it back. Over time, ransomware has become more sophisticated. It can infiltrate a system over a period of time, so that by the time a backup is made, the backup contains the ransomware as well.

IT experts are constantly working on countering ransomware. If the malware is not dealt with, organizations will not be able to roll back to clean backup data, leaving data unprotected. It is crucial to ensure backup data is protected.
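Rolling back to clean data means choosing a backup taken before the infection began. The sketch below assumes the infection start time is already known from an investigation, which is often the hard part in practice. The backup records and function name are hypothetical.

```python
from datetime import datetime


def latest_clean_backup(backups: list, infected_since: datetime):
    """Return the newest backup taken before the infection, or None."""
    clean = [b for b in backups if b["taken_at"] < infected_since]
    return max(clean, key=lambda b: b["taken_at"], default=None)


backups = [
    {"name": "weekly-01", "taken_at": datetime(2024, 1, 1)},
    {"name": "weekly-02", "taken_at": datetime(2024, 1, 8)},
    {"name": "weekly-03", "taken_at": datetime(2024, 1, 15)},
]
choice = latest_clean_backup(backups, infected_since=datetime(2024, 1, 10))
print(choice["name"])  # weekly-02
```

If every retained backup postdates the infection, the function returns `None`, which is the situation the paragraph above warns about.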

 

Hyper-Converged Infrastructure (HCI)

Hyper-converged infrastructure (HCI) is a unified system that combines traditional data center elements: storage, compute, networking, and management. Data protection capabilities integrated into hyper-converged infrastructure are slowly replacing dedicated data center equipment, decreasing data center complexity and increasing scalability.

Vendors offer backup and recovery equipment that supports both hyper-converged and non-hyper-converged environments. With HCI, you can build a private cloud, extend to a public cloud, or achieve a true hybrid cloud.

 

Copy Data Management (CDM)

Copy Data Management (CDM) cuts down on the number of copies of data that need to be saved, which reduces the overhead for storage and data management. CDM also simplifies data protection, increases productivity, and lowers administrative costs.
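The idea of keeping fewer physical copies can be illustrated with a catalog that stores identical content once and lets several logical names point to it. This is a simplified, content-hash-based sketch, not a description of how any particular CDM product works; the `CopyCatalog` class is hypothetical.

```python
import hashlib


class CopyCatalog:
    """Stores each distinct content once, however many names refer to it."""

    def __init__(self):
        self.blobs = {}   # content digest -> bytes
        self.names = {}   # logical copy name -> content digest

    def save(self, name: str, content: bytes) -> None:
        digest = hashlib.sha256(content).hexdigest()
        self.blobs.setdefault(digest, content)  # keep only the first physical copy
        self.names[name] = digest

    def load(self, name: str) -> bytes:
        return self.blobs[self.names[name]]


catalog = CopyCatalog()
report = b"Q1 sales figures"
for copy_name in ("backup", "test-env", "analytics"):
    catalog.save(copy_name, report)

print(len(catalog.names), len(catalog.blobs))  # 3 1
```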

 

Artificial Intelligence and Machine Learning

Artificial Intelligence and Machine Learning are being adopted to detect potential attacks before they materialize. They can play a big role in data protection measures.

 

Internet of Things (IoT)

Extra data protection measures are needed for IoT because the interconnectedness of devices can make it easier for attackers to penetrate a system. Protecting data on interconnected devices is crucial to ensuring the privacy of users.

 

This has been CFI’s guide to Data Protection.