Bill C-11, the now-defunct privacy bill that was set to replace the current Personal Information Protection and Electronic Documents Act (PIPEDA), proposed various new concepts for Canadian law in an effort to “modernize” the rules. One of the concepts introduced in Bill C-11 was related to de-identified information.
In this blog, we outline what “de-identification” means in law and practice, how it differs from anonymization, and how it may contribute to effective data management practices.
Currently, the main sources in Canadian law that inform the definition of “personal information” in this context include the definitions of de-identified and anonymized information under Québec’s Bill 64, the definition of “de-identify” under the Personal Health Information Protection Act, 2004 (PHIPA), and the “serious possibility” test from Gordon v. Canada (Minister of Health), 2008 FC 258, which is widely cited by Canadian regulators and courts.
What is de-identified information?
Generally, de-identification means the modification of personal information so that the information does not identify an individual, or cannot be used in reasonably foreseeable circumstances, alone or in combination with other information, to identify an individual. De-identifying information will usually only bring the information and associated uses of data outside the scope of Canadian privacy regulation if the target dataset is made anonymous, meaning that it is permanently stripped of identifying information such that it is no longer possible to identify (or re-identify) an individual. Unlike anonymous information, de-identified information is information that can possibly be used to re-identify an individual, and the practice of de-identifying information refers to a variety of scenarios in which data transformation techniques are applied to increase data security and protect privacy. Some examples of de-identification practices include:
- Redaction – Erasing or expunging sensitive data from a record;
- Suppression – Removing data (e.g., from a cell or row in a table, or data element(s) in a record) prior to dissemination, to prevent the identification of individuals in small groups or those with unique characteristics;
- Blurring – Reducing precision of data by combining one or more data elements. This includes:
  - Aggregation: Combining individual subject data with a sufficient number of other subjects to disguise the attributes of a single subject (e.g., reporting a group average instead of an individual value).
  - Generalization: Collecting or reporting values in a given range (e.g., using age or age-range instead of date of birth); including individual data as a member of a set (e.g., creating categories that incorporate unique cases); or reporting rounded values instead of exact amounts.
- Masking – Replacing one data element with either a random or made-up value, or with another value in the data set. This can be done manually or by using an algorithm, and is often associated with the following techniques:
  - Pseudonymization/Coding: Replacing a real name with a made-up name, or a real value with a made-up value.
  - Perturbation: Replacing sensitive information with realistic but inauthentic data (synthetic data), or modifying original data based on predetermined masking rules.
  - Scrambling/Encryption: Algorithmically scrambling data so that only those with access to the appropriate key can view it.
  - Noise and Differential Privacy: Statistical techniques that introduce random error into the data, for example by randomly misclassifying values of categorical variable(s).
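To make a few of these techniques concrete, the following is a minimal Python sketch, not a recommended implementation. The records, field names, bucket width, noise scale, and helper functions are all hypothetical and chosen only for illustration. Note that the output is de-identified rather than anonymous: anyone holding the secret key could regenerate the codes and re-link them to names.

```python
import hashlib
import hmac
import random
from statistics import mean

# Hypothetical records; every name and value here is made up for illustration.
RECORDS = [
    {"name": "Alice Tremblay", "age": 34, "postal_code": "M5V 2T6", "salary": 71000},
    {"name": "Bob Singh", "age": 37, "postal_code": "M5V 1J1", "salary": 68000},
    {"name": "Chloe Roy", "age": 52, "postal_code": "H2X 1Y4", "salary": 90000},
]


def generalize_age(age, width=10):
    """Generalization: report an age range instead of an exact age."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def pseudonymize(value, secret_key):
    """Pseudonymization: replace a real value with a keyed, made-up code."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:12]


def add_noise(amount, scale, rng):
    """Noise: shift a numeric value by a random offset in [-scale, scale]."""
    return amount + rng.uniform(-scale, scale)


def de_identify(record, secret_key, rng):
    """Build a new record that omits the name (suppression) and blurs the rest."""
    return {
        "person_code": pseudonymize(record["name"], secret_key),
        "age_range": generalize_age(record["age"]),
        # Generalization: keep only the first three characters of the postal code.
        "region": record["postal_code"][:3],
        # Noise plus rounding to the nearest thousand.
        "salary": round(add_noise(record["salary"], 2000, rng), -3),
    }


def average_salary(records):
    """Aggregation: report a group average instead of individual values."""
    return mean(r["salary"] for r in records)


if __name__ == "__main__":
    key = b"replace-with-a-secret-key"  # assumption: stored separately from the data
    rng = random.Random()
    for row in RECORDS:
        print(de_identify(row, key, rng))
    print("Average salary:", average_salary(RECORDS))
```

Whether output like this is de-identified enough for a given purpose depends on the surrounding data and controls, not on the code alone.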
The decision to use such techniques to modify data is generally a strategic business decision. While these techniques may confer benefits for the protection of this information and potentially minimize liability related to the information, businesses will try to ensure that the resulting information in its de-identified form remains valuable and useful for secondary purposes.
How should organizations employ de-identification techniques?
As we allude to above, the identifiability of information lives on a spectrum, with one end representing information in its identifiable form (i.e., information about an identifiable individual), and the other end representing scenarios in which information can no longer identify a person (anonymous information). De-identified information exists in between. To satisfy privacy obligations, organizations should employ de-identification techniques in consideration of this spectrum in a way that is commensurate with the privacy risks associated with the sensitivity of the data they collect, and the ways in which such data is used and disclosed. While this is a general principle that organizations should consider when employing de-identification techniques, different contractual considerations will also need to be implemented to address nuances in Canadian legislation; specifically, statutory differences set out in Québec’s Bill 64, and federal legislation (assuming the next iteration of Bill C-11 is similar to the now-defunct version).
Bill C-11 only included a definition for de-identification, whereas Bill 64 defines both de-identification and anonymization. This difference is complicated by the fact that Bill C-11’s definition of de-identification corresponds with Bill 64’s definition of anonymization. The manner in which Bill C-11 addressed de-identified information suggested that this information would have fallen within the scope of the proposed bill, which is similar to the Québec model with Bill 64. In Bill 64, information is de-identified when it no longer allows the person concerned to be directly identified, whereas information is anonymized when it no longer allows the person concerned to be directly or indirectly identified, irreversibly. There is also a fine imposed on organizations that attempt to re-identify a natural person using anonymized data.
Ontario has also begun its effort to implement a private-sector privacy law. In its White Paper, the government says it intends to distinguish the two concepts, de-identified and anonymous information, and that it wants to ensure that anonymized information is removed from privacy rules altogether in order to expand risk-based approaches to data use and incentivize the use of anonymized data. One of the main reasons for this approach is to promote the use of data transformation techniques, which could help reduce residual risks of harm to individuals while enabling organizations to realize the value of their data.
What should organizations consider as part of their data strategies?
While articulated differently in each, both Bill C-11 and Bill 64 carve out exceptions for organizations to use or disclose de-identified information for certain purposes without the knowledge and consent of the individual whose information has been de-identified. These purposes are generally confined to internal research and development, socially beneficial purposes, or prospective business transactions. Accordingly, there may be significant business incentives for transforming personal information to non-personal information, for example, in order to use it for secondary purposes such as research or statistics.
In addition, de-identification may act as a valuable safeguard for the organization to protect information and to avoid potential breaches, for example, to protect against attacks from malicious actors. Finally, in most cases, anonymous information falls beyond the reach of the law, which makes it free to use for any purpose.
To be clear, de-identified information remains personal information and is therefore within the scope of privacy law, simply benefiting from a specific safeguard, while anonymized information is either outside the scope of privacy law because it no longer relates to an identifiable individual, or subject to very high-level conditions such as those under Bill 64 (i.e., that it be used for “legitimate and serious purposes”).
While de-identification is a valuable safeguard that organizations can implement to lower privacy risks related to the misuse or unauthorized access and disclosure of personal information, it is important to remember that de-identification is not anonymization, and the risk that the information may be re-identified could render your organization liable in certain circumstances. Therefore, it is important that organizations apply appropriate de-identification techniques to their data, once it has served the purpose for which it was originally collected, and also if the data will be used in the pursuit of secondary purposes (e.g., R&D, socially beneficial purposes, etc., as set out above). Data transformation techniques such as generalizing, suppressing, or adding noise, coupled with privacy, security and contractual controls, can mitigate the re-identification risk.
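As a rough illustration of how suppression can address one source of re-identification risk, the sketch below flags combinations of attributes shared by only a few records and drops those records before release. The function names, field names, and the minimum group size are hypothetical; the threshold is an illustrative assumption, not a legal standard, and a check like this does not by itself make a data set anonymous.

```python
from collections import Counter


def small_groups(records, quasi_identifiers, min_group_size):
    """Return each combination of attribute values shared by fewer than
    min_group_size records, mapped to the number of records sharing it."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return {combo: n for combo, n in counts.items() if n < min_group_size}


def suppress_small_groups(records, quasi_identifiers, min_group_size):
    """Suppression: drop records that belong to a small, easily singled-out group."""
    risky = small_groups(records, quasi_identifiers, min_group_size)
    return [
        r for r in records
        if tuple(r[q] for q in quasi_identifiers) not in risky
    ]


if __name__ == "__main__":
    # Hypothetical, already-generalized rows.
    rows = [
        {"age_range": "30-39", "region": "M5V"},
        {"age_range": "30-39", "region": "M5V"},
        {"age_range": "50-59", "region": "H2X"},
    ]
    fields = ["age_range", "region"]
    print(small_groups(rows, fields, min_group_size=2))
    print(suppress_small_groups(rows, fields, min_group_size=2))
```

In practice, a check of this kind would sit alongside the privacy, security and contractual controls mentioned above rather than replace them.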
Moreover, when an organization engages in data transformations in an attempt to de-identify or anonymize the personal information, the organization may be subject to the regulators’ scrutiny regarding the purpose and the methodology used. There is a risk that regulators do not agree with the methodology or the acceptance of risk employed by organizations in their de-identification and anonymization practices. For example, an organization may consider a data set anonymous and not subject to Canadian privacy laws, while a regulator may disagree and consider the data set as de-identified, but still subject to Canadian privacy law, and as a result, the organization would be obligated to comply with statutory requirements regarding this data set. In addition, organizations would be well-advised to carefully document their due diligence when using anonymized information for “legitimate and serious purposes”, as the uses may come under scrutiny by the regulator.
Ultimately, organizations need to think strategically about how they de-identify information and whether the application of de-identification techniques will provide value for either compliance or business purposes.