“Data Provenance”: Navigating Ownership, Authenticity, and Rights in the Digital Age
Privacy Plus+
Privacy, Technology and Perspective
This week, let’s concentrate on an increasingly crucial concept: Data Provenance. This involves tracing the lineage of data (its origins, transformations, and ownership) and perhaps also the permissions that its original owners have given for further use.
What Exactly is Data Provenance? The phrase “Data Provenance” appears in at least three (3) specialties. (We will propose a fourth at the end of this post.)
In art law, “provenance” is the documented lineage of ownership, custody, or location of an artifact or work of art. Tracing a piece’s “provenance” is akin to tracing its chain of title in property law, but with added layers of cultural and historical significance. In particular, museums, art dealers, and others study “provenance” to ensure their pieces have not been stolen or unlawfully removed from foreign lands.
In technical fields, governmental organizations like the Network of the National Library of Medicine (NNLM) and the National Institute of Standards and Technology (NIST) use “data provenance” to describe the important process of maintaining the integrity of research data as the data moves from place to place, which is again akin to a “chain of title”:
The term “data provenance,” sometimes called “data lineage,” refers to a documented trail that accounts for the origin of a piece of data and where it has moved from to where it is presently. The purpose of data provenance is to tell researchers the origin, changes, and details supporting the confidence or validity of research data. The concept of provenance guarantees that data creators are transparent about their work and where it came from and provides a chain of information where data can be tracked as researchers use other researchers’ data and adapt it for their own purposes.
See https://www.nnlm.gov/guides/data-glossary/data-provenance
In law enforcement, the term “data provenance” is also very close (if not identical) to “chain of custody.” It involves tracing the origin of a piece of information processed by community resources.
See https://csrc.nist.gov/glossary/term/data_provenance
In essence, the term “data provenance” captures the comprehensive history of digital data, detailing its origin, movements, and any changes made along the way. Think of it as the digital counterpart to traditional “provenance,” ensuring the reliability and authenticity of data.
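To make the idea of a “documented trail” concrete, here is a minimal sketch in Python of what a provenance record might hold: an origin plus an ordered list of movements and changes. The class names, fields, and sample data are our own illustrative assumptions, not a format prescribed by NNLM or NIST.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ProvenanceEvent:
    """One step in a data set's history (hypothetical structure)."""
    action: str   # e.g. "created", "transferred", "transformed"
    actor: str    # who performed the step
    detail: str   # free-text description of what happened
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


@dataclass
class ProvenanceRecord:
    """Origin plus an ordered trail of events for one data set."""
    dataset_id: str
    origin: str
    events: list = field(default_factory=list)

    def add_event(self, action, actor, detail):
        # Events are only ever appended, so the list stays in time order.
        self.events.append(ProvenanceEvent(action, actor, detail))

    def lineage(self):
        """Return the trail as readable lines, oldest first."""
        lines = [f"origin: {self.origin}"]
        for e in self.events:
            lines.append(f"{e.action} by {e.actor}: {e.detail}")
        return lines


# Illustrative data only; the organizations are invented.
record = ProvenanceRecord("survey-2023", origin="Clinic A intake forms")
record.add_event("transferred", "Clinic A", "sent to University Lab B")
record.add_event("transformed", "University Lab B", "removed names and addresses")
for line in record.lineage():
    print(line)
```

Read from top to bottom, the output is the data set’s “chain of title”: where it started, who handled it, and what each handler did to it.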
Legal Implications of Data Provenance: The concept of data provenance implicates numerous legal issues, which are taking on increased importance as artificial intelligence develops. These include:
Authenticity & Misrepresentation: Just as with art, a detailed data provenance helps verify the authenticity of a data set, protecting stakeholders from erroneous or falsified information and shielding entities from potential claims of misrepresentation.
Tagging & Due Diligence: Tagging, or the practice of annotating data, aids in providing context, ensuring proper data lineage and enabling more effective data governance. This assists in the assessment of data during transactions or investigations.
Access Rights & Data Protection: Properly documented data provenance aids in determining who has had access to particular data sets and/or data elements. This becomes pivotal where there is controversy about ownership or creation and in cases of breaches involving personal data, trade secrets, intellectual property, or confidential business information.
Challenges in Data Provenance from a Legal Standpoint: While provenance is important, tracing and proving data “provenance” is easier said than done:
- Volume & Complexity: The sheer volume of data and its dynamic nature can make comprehensive tracking a Herculean task.
- Interoperability: As data moves across platforms and systems, ensuring consistent tagging and lineage can become a complex endeavor.
- Trade Secrets & Confidentiality: Balancing transparency in data provenance with the need to protect trade secrets and confidential business information remains a tightrope walk.
- Protecting against Bad Actors: Throughout its journey, data must be secured in ways that prevent bad actors from changing not only the data itself but also its history (including, for example, making up “consent” which the original data owner did not give).
- Expense: In some cases proving provenance may be simple, but all of the above may combine to make tagging or other means of proving provenance extremely expensive.
Other Issues:
- The GDPR in Europe, and similar data protection regulations globally, emphasize the importance of data mapping to understand data lineage, especially when personal data is concerned;
- Intellectual Property issues may come into play, especially when data is derived from proprietary algorithms or contains trade secrets;
- In the U.S., state laws, FTC enforcement, and some other federal privacy requirements are increasingly tightening the “consents” that must be obtained before personal data may be further used or disclosed, beyond first-person permission for limited purposes;
- We see growing frustration with regulators trying to govern this increasingly complex area with detailed prescriptions that don’t fully answer today’s questions before they’re obsolete tomorrow; and
- Perhaps as a result, we see more and more contracts that try to “pass the hot potato” to the other party, by disclaiming any warranties not only of accuracy, but also of title, permission, consents, and lawfulness and placing the entire risk of use of personal data on the other party.
Our thoughts: We propose, therefore, a fourth (4th) specialized application for “Data Provenance”: regulating the privacy of personal data. Specifically, “Data Provenance” concepts can be used to ensure that data has been lawfully collected, that further use or disclosure has received the consent of the data subject (original owner), and that the data still carries the contractual and technical protections provided by the original data controller. In this context, personal data would likely be tagged by attaching metadata or labels to data elements to track their origin, any modifications made to them, and their journey through various systems and processes. Such tagging could be integrated with blockchain technology, so that Data Provenance could become not only a tool for tracking the journey of personal data but also a transparent and tamper-proof system that can help maintain the integrity and trustworthiness of personal data. Yes, it would be expensive, complex, and take time to become accepted practice. But we also suspect that once accepted, it could make databases of verified personal data much, much more valuable.
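For readers who want to see how such tagging might work mechanically, the sketch below links each tag to a hash of the one before it, so that rewriting an earlier entry (for example, to invent a consent) becomes detectable. This is a simplified hash chain, not a full blockchain: there is no distributed ledger or consensus, and the function names, field names, and purposes are hypothetical.

```python
import hashlib
import json


def entry_hash(body: dict) -> str:
    """Hash an entry's contents in a stable (sorted-key) JSON form."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_entry(chain: list, action: str, consented_purposes: list) -> None:
    """Append a tag that is linked to the hash of the previous tag."""
    prev = chain[-1]["hash"] if chain else "0" * 64
    body = {
        "action": action,
        "consented_purposes": sorted(consented_purposes),
        "prev_hash": prev,
    }
    chain.append({**body, "hash": entry_hash(body)})


def verify_chain(chain: list) -> bool:
    """Recompute every hash and link; any edited entry breaks the chain."""
    prev = "0" * 64
    for entry in chain:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev_hash"] != prev or entry_hash(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True


def use_is_permitted(chain: list, purpose: str) -> bool:
    """Allow a use only if the history is intact and the latest tag covers it."""
    return (
        bool(chain)
        and verify_chain(chain)
        and purpose in chain[-1]["consented_purposes"]
    )


chain = []
append_entry(chain, "collected from data subject", ["billing"])
append_entry(chain, "shared with processor", ["billing"])

print(verify_chain(chain))                    # True
print(use_is_permitted(chain, "marketing"))   # False

# A bad actor edits the first tag to add a consent that was never given.
chain[0]["consented_purposes"] = ["billing", "marketing"]
print(verify_chain(chain))                    # False
```

A chain like this only shows that something was altered. Someone who controls the whole record could recompute every hash, which is why the most recent hash would need to be anchored somewhere the parties trust, and that is the role a blockchain or similar shared ledger might play.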
---
Hosch & Morris, PLLC is a boutique law firm dedicated to data privacy and protection, cybersecurity, the Internet and technology. Open the Future℠.