The value of data governance, both to the enterprise and to data management as a whole, is evident in two of the most discernible trends shaping this discipline in 2023.
The first is that the term itself has been all but appropriated by vendors specializing in access management, data controls, and the internal aspects of enterprise security. These providers concentrate on regulatory compliance, data privacy, and data protection, concerns that can make or break contemporary businesses in today's hyper-regulated environment. Even a cursory examination of the capital dedicated to these companies bears this out.
The second trend is the real-time applicability of data governance to a widening array of circumstances, use cases, and market conditions, all of which require it to become more mutable than ever to fit these needs as they arise. Organizations are acknowledging the difficulty of attempting to exhaustively determine—in advance—every possible data governance contingency and prepare for it accordingly.
Instead, they’re now attempting to tailor data governance constructs so they can dynamically adjust to such situations as they occur.
“There’s a distinction between knowing what could happen (often data governance is about expressing that), what did happen, and what is happening,” observed Ralph Hodgson, TopQuadrant CTO. “What is happening is a very difficult problem to solve on the operational side of data governance. If you have the possibility of a digital twin of a business, that’s where the future is with the ‘what is happening’ idea of data governance.”
Although digital twins of entire organizations aren’t yet pervasive throughout the data landscape, many of the fundamental aspects of data governance—when applied through the lens of real-time access controls and situational adaptability—can simulate their ability to detail what’s happening in the moment.
It’s just a logical progression from that knowledge to controlling and exploiting it to achieve governance objectives.
Metadata management will likely always remain the nucleus of data governance. Organizations can choose from a multitude of tools designed to optimize this task. There's an abundance of data catalogs, Master Data Management tools, and what Privacera CEO Balaji Ganesan termed "sensitive data catalogs" that automate aspects of data discovery and classification via various metadata models. According to Hodgson, there are six principal forms of metadata that correspond to prominent areas of data governance:
- Data Expression: This dimension pertains to “how is the data expressed; what data types; what data expressions; is it quantifiable; does it have units of measure, etc.” Hodgson revealed.
- Data Quality: Quality data is integral to trusting data and encouraging business adoption of it.
- Usage: This area deals with data’s importance, data security, and data confidentiality.
- Data Stewardship: Data stewardship entails “ownership, metrics, accessibility,” Hodgson detailed.
- Regulatory Compliance: Metadata about regulatory compliance oftentimes informs policies and standards.
- Data Provenance: Data lineage or data provenance indicates data’s origination and enterprise journey.
According to Gartner, metadata has gone from passive to active, informing real-time use cases like data integration for data fabrics. Thus, one of the most critical facets of the metadata (and accompanying data governance constructs) Hodgson described is that they "all share a common need to express relationships between things." A clear understanding of how the elements within these areas of data governance interrelate enables organizations to adjust them to meet new requirements, data sources, or use cases.
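The idea of metadata that expresses relationships between things can be illustrated with a minimal sketch. The snippet below stores metadata facts as subject-relation-object triples, so links across stewardship, quality, provenance, and compliance stay explicit and queryable; the entity and relation names are invented for illustration, not drawn from any product.

```python
from collections import defaultdict

class MetadataGraph:
    """Toy store of metadata relationships as subject-relation-object triples."""

    def __init__(self):
        self.by_subject = defaultdict(list)

    def add(self, subject, relation, obj):
        self.by_subject[subject].append((relation, obj))

    def related(self, subject, relation=None):
        """Return objects linked to a subject, optionally filtered by relation."""
        return [o for r, o in self.by_subject[subject]
                if relation is None or r == relation]

g = MetadataGraph()
g.add("customer_table", "hasSteward", "jane.doe")        # data stewardship
g.add("customer_table", "hasQualityRule", "no_null_ids") # data quality
g.add("customer_table", "derivedFrom", "crm_extract")    # data provenance
g.add("customer_table", "subjectTo", "GDPR")             # regulatory compliance

print(g.related("customer_table"))               # all relationships for the asset
print(g.related("customer_table", "subjectTo"))  # → ['GDPR']
```

Because every governance dimension hangs off the same relationship structure, adding a new requirement (say, a retention policy) is just another triple rather than a schema change.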
The precept of what Hodgson called "meta-relations" is integral to dynamically modifying data governance components to meet emerging circumstances and business conditions. Conceptual data models consist of these relationships, their definitions, and the semantics that disambiguate them between departments or applications, if necessary. Such data models assist with everything related to data governance, from rendering data access controls to facilitating lifecycle management necessities (like retention policies). Well-defined conceptual data models are perhaps the starting point for adapting data governance protocols to arising situations. To that end, such models comprise a specific domain (what the model is about) and a discipline. "You could have chemistry for nurses or electrical engineering for computer science," Hodgson pointed out. "There's a distinction between domain and discipline."
Other dimensions include the enterprise viewpoint on the subject, the model's level of specificity, its aspect, and temporal information. Specifying these elements of a model with the requisite data identifiers, terminology systems, and schema makes it easier to combine models for inter-departmental analytics, integrations between source systems, customer 360 initiatives, data privacy needs, and more. TopQuadrant CEO Nimit Mehta articulated a use case in which several governmental agencies dedicated to wildfire prevention each had their own terminology. "When you've got all those different religions, how do you get them all talking the same language?" Mehta asked. "Instead of centralizing it and doing a 'thou shall speak the same way', graph allows you, in a standards-driven way, to create a meta-model and enable those federated linguistics to remain where they are."
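The "federated linguistics" approach Mehta describes can be sketched as a meta-model that maps each agency's local terms onto shared concepts, so every vocabulary stays where it is. The agency names, terms, and concepts below are hypothetical, invented purely to illustrate the mapping pattern.

```python
# Shared concepts defined once in the meta-model.
SHARED_CONCEPTS = {"ignition_point", "containment_line"}

# Each (hypothetical) agency keeps its own terminology; only the
# mapping to shared concepts is governed centrally.
agency_vocabularies = {
    "forestry":  {"fire origin": "ignition_point",
                  "firebreak": "containment_line"},
    "emergency": {"start location": "ignition_point",
                  "control line": "containment_line"},
}

def to_shared(agency, local_term):
    """Translate an agency-specific term to the shared concept."""
    concept = agency_vocabularies[agency].get(local_term.lower())
    if concept not in SHARED_CONCEPTS:
        raise KeyError(f"no mapping for {local_term!r} in {agency}")
    return concept

# Both agencies' records now align on the same concept without
# either one abandoning its local vocabulary:
assert to_shared("forestry", "fire origin") == to_shared("emergency", "start location")
```

In practice this meta-model would live in a standards-driven graph (e.g. using shared identifiers rather than strings), but the federation principle is the same: translate at query time instead of centralizing the language.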
Distributed Data Stewardship
The distribution of the data landscape, and its effect on data stewardship, will continue to be the foremost challenge for this realm of data governance next year. In addition to the increasing prevalence of cloud and multi-cloud computing, architectures such as data fabric and data mesh are compounding this issue. Granted, distribution impacts all aspects of data governance, from lifecycle management to metadata management. Nonetheless, as Ganesan rightfully noted, "this is where the data governance comes in: inside the company. How are they viewing and treating that data?" To that end, investment in solutions that specialize in scaling governed data access, while simultaneously reducing the number of policy enforcement measures across sources, isn't likely to abate in the coming year.
These gains are achieved in a variety of ways, including by “giving a single pane of glass where you can manage all policies [and the platform] enforces them,” Ganesan indicated. Policy reduction is achieved via Attribute Based Access Control (ABAC) and its corollary, Purpose Based Access Control (PBAC), both of which are influential for maintaining fluid, responsive data governance. According to Immuta CTO Steve Touw, “Tagging data and pushing policies based on data tags is not ABAC. That’s a component of ABAC. The real power of ABAC is making [access] a dynamic runtime decision instead of a pre-computed, role-based decision.” PBAC builds on this advantage by granting access only for specific purposes, such as working on a certain report, for example.
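Touw's distinction between a pre-computed role check and a dynamic runtime decision can be made concrete with a small sketch. Here the decision is evaluated per request from attributes of the user, the tags on the data, and the stated purpose; all attribute names, tags, and the policy logic itself are assumptions for illustration, not any vendor's actual rules.

```python
from dataclasses import dataclass

@dataclass
class AccessRequest:
    user_attrs: dict   # attributes of the requesting user
    column_tags: set   # tags discovered on the requested data
    purpose: str       # the purpose the user is acting under

def decide(request: AccessRequest) -> str:
    """Runtime ABAC/PBAC decision: allow, mask, or deny."""
    # PBAC layer: the stated purpose must be one the user has agreed to.
    if request.purpose not in request.user_attrs.get("approved_purposes", []):
        return "deny"
    # ABAC layer: evaluate data tags against user attributes at request time,
    # rather than consulting a pre-assigned role.
    if "pii" in request.column_tags and not request.user_attrs.get("privacy_trained"):
        return "mask"  # serve masked values instead of raw data
    return "allow"

req = AccessRequest(
    user_attrs={"privacy_trained": False,
                "approved_purposes": ["fraud_report"]},
    column_tags={"pii"},
    purpose="fraud_report",
)
print(decide(req))  # → mask
```

Note that nothing here is computed in advance: retagging a column or revoking a purpose agreement changes the outcome of the very next request, which is what makes the approach responsive.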
Data Privacy and Regulatory Compliance
Both ABAC and PBAC are foundational to fulfilling regulatory compliance, particularly in relation to horizontal data privacy mandates, and demonstrating these facts to regulators. “You’re not only getting the controls and the masking, but you’re also getting legal oversight as far as getting your users to agree they’re only going to use things for certain purposes, and acting under that purpose when they access the data,” Touw stipulated. The dynamic runtime decision-making he mentioned is primed for modifying data governance constructs for situations such as complying with simultaneous regulations for a specific use case, or even facilitating data access during mergers and acquisitions.
The demonstration of regulatory adherence for this and other such use cases is bolstered by data provenance, which applies universally to numerous facets of data governance. Log files record who accessed which data, where, when, and, with PBAC and other methods, for what purpose. Perhaps the broader ramification of data lineage as applied to regulatory compliance and other dimensions of governance is the context it delivers, which informs the ability to modify governance concepts to meet changing circumstances. "It's not just where did the data come from, from a lineage perspective, but who are the users that are using the data; why are they using it?" indicated Matt Vogt, Immuta VP of Global Solution Architecture. "All those bits of context are important."
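An access-log record of this kind, capturing the who, what, when, and, via PBAC, the why, might look like the sketch below. The field names are assumptions for illustration rather than any product's actual log schema.

```python
import json
from datetime import datetime, timezone

def audit_record(user, dataset, purpose, decision):
    """Build one audit-log entry for a data access decision."""
    return {
        "user": user,            # who accessed the data
        "dataset": dataset,      # which data was accessed
        "purpose": purpose,      # PBAC supplies the "why"
        "decision": decision,    # allow / mask / deny
        "timestamp": datetime.now(timezone.utc).isoformat(),  # when
    }

entry = audit_record("jane.doe", "claims_2023", "fraud_report", "allow")
print(json.dumps(entry))  # one line per access, ready for regulator review
```

Because the purpose travels with every entry, the resulting log can answer not just lineage questions but the "why are they using it" question Vogt raises.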
Quality of Data
The increasing distribution of the data sphere, in addition to the varieties of largely unstructured data available, makes data quality a prerequisite for well-governed data. "The governance part is around metadata, quality, and the access part to reduce friction for users to find and use that data," Ganesan reflected. Although there is a plenitude of metrics pertaining to data quality, the core ones inevitably relate to "completeness, correctness, clarity, consistency, and things like that," Hodgson explained.
Just as there is an influx of automation involving statistical and non-statistical Artificial Intelligence for discovering and classifying data, there are mechanisms to aid in pinpointing areas in which data quality is deficient, and to rectify them. Fuzzy matching and exact matching can also provide these benefits. Consequently, modern data quality mechanisms "use machine learning where we can to suggest mappings to a vocabulary and then the vocabulary, in a form of glossary, for example, can express rules for consistency," Hodgson said.
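Two of the checks mentioned above, a completeness metric and fuzzy matching against a governed vocabulary, can be sketched with the standard library alone. The sample data and the similarity threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

def completeness(values):
    """Fraction of non-null, non-empty values in a column."""
    filled = [v for v in values if v not in (None, "")]
    return len(filled) / len(values) if values else 0.0

def fuzzy_match(value, vocabulary, threshold=0.8):
    """Return the closest vocabulary term if similar enough, else None."""
    def score(term):
        return SequenceMatcher(None, value.lower(), term.lower()).ratio()
    best = max(vocabulary, key=score)
    return best if score(best) >= threshold else None

# A column with a typo, a null, and an empty string:
states = ["California", "Califronia", None, "Texas", ""]

print(completeness(states))                                # → 0.6
print(fuzzy_match("Califronia", ["California", "Texas"]))  # → California
```

A deficiency flagged this way (completeness below a target, or a near-miss against the glossary) can then feed the kind of suggested vocabulary mappings Hodgson describes, with a steward confirming the fix.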
Situational Awareness, Real-time Responses
As Hodgson previously noted, the ideal for data governance is a real-time model of the business and its data processes to ensure the longstanding value of the latter while enriching the former. Implicit to this paradigm is the ability to modify data models, permissions, terminology, and even governance policies, if need be, in a cohesive manner that provides business value while diminishing risk. Dynamic access control capabilities, active metadata, and fluid, distributed data stewardship can facilitate these gains.
Sooner than one thinks, it will be obligatory to implement such mutable forms of data governance.
About the Author
Jelani Harper is an editorial consultant servicing the information technology market. He specializes in data-driven applications focused on semantic technologies, data governance and analytics.