Accelerating time-to-insight for the modern digital enterprise

The enterprise data catalog is a powerful tool for any data-driven organization. It empowers data consumers to quickly discover and understand data so they can generate insights that drive business value. It also supports the governance team, which is responsible for documenting, curating, and classifying data so that the organization’s governance policies and security classification standards can be enforced correctly.

The actual enforcement of these data access policies, however, currently falls to IT or the data platform team, who are the only ones with the technical knowledge or the access needed to interact with the source systems. Because of the proliferation of users, data, storage systems, and analytics tools in the modern enterprise analytics platform, as well as a growing number of privacy regulations governing sensitive data and its use cases, IT is forced either to engineer bespoke enforcement solutions for each tool or to create multiple copies of each dataset (one per use case). Ultimately, this results in more of the data being locked down in an attempt to standardize access to the lowest common denominator.

This disconnect between the data governance and platform teams results in inconsistent policy enforcement and an operational bottleneck that slows down the work of data analysts and data scientists. To generate business value, data consumers need to be able to find the right dataset, understand its context, trust its quality, and access it in the tool of their choice with the right data access and governance policies applied.

Organizations seeking to accelerate their time-to-insight on new data need to close this gap and ensure that data consumers can easily get access to data regardless of where it’s stored, with all access policies dynamically and consistently applied. Even with all the value that the enterprise data catalog provides, the organization is still stuck at the “last mile” trying to enable secure data access.

Solving for the Last Mile: Okera and The Collibra Data Catalog

The Okera and Collibra integration closes the gap between governance and central IT, enabling faster time-to-insight and increased agility.

Okera’s Collibra Connector offers a two-way automated sync:

  1. Imports business metadata, such as data classification attributes, from Collibra into Okera, where it is used to enforce attribute-based access control (ABAC) policies.
  2. Exports technical metadata from Okera’s unified metadata layer back into Collibra to enhance the data catalog and help the governance teams understand their data.
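
The two directions of the sync can be pictured with a minimal Python sketch. This is not the connector's actual API: the dictionaries, column names, field names, and both function names are hypothetical stand-ins for the two catalogs, assumed only for illustration.

```python
# Hypothetical in-memory stand-ins for the two catalogs (illustration only).
collibra = {
    "transactions.card_number": {"classification": "highly confidential"},
    "transactions.amount": {"classification": "confidential"},
}
okera = {
    "transactions.card_number": {"type": "string", "auto_tags": ["credit_card"]},
    "transactions.amount": {"type": "decimal", "auto_tags": []},
}


def import_classifications(business, technical):
    """Direction 1: copy classification attributes onto the technical entries."""
    for column, attrs in business.items():
        if column in technical:
            technical[column]["classification"] = attrs["classification"]


def export_technical_metadata(technical, business):
    """Direction 2: copy types and auto-discovered tags back to the catalog."""
    for column, meta in technical.items():
        entry = business.setdefault(column, {})
        entry["type"] = meta["type"]
        entry["auto_tags"] = list(meta["auto_tags"])


import_classifications(collibra, okera)
export_technical_metadata(okera, collibra)
```

After both calls, each side holds the business classification alongside the technical type and tags for every column, which is the combined view the two-way sync is meant to provide.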

The integrated solution improves governance, saves time, and eliminates repetitive work by:

  • Automating policy enforcement: Policies can be defined in Okera using classification attributes from Collibra. These policies are then enforced automatically at query time. This allows the enterprise to reuse its existing workflows and attributes, and saves the governance team and data stewards from doing repetitive work.
  • Creating a unified metadata layer: Collibra is a rich source of manually curated business metadata, while Okera focuses on technical metadata with automatic discovery and tagging of sensitive data. The two-way sync brings both kinds of metadata together.
  • Improving responsiveness to regulatory change: Because policy enforcement happens dynamically within Okera, any policy changes made in Collibra because of changes in regulations or internal governance standards are transparently enforced by Okera the next time an analyst runs a query.
  • Scaling data classification: Okera’s automated tagging of sensitive data provides data stewards with a head start in classifying data, enabling them to quickly understand whether a dataset contains sensitive data and what type, so they can map it back to the organization’s governance standards.
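
To make the query-time behavior concrete, here is a small Python sketch of an attribute-based check. It is an illustration under stated assumptions, not Okera's policy engine: the role name, column names, and the `readable_columns` function are all hypothetical.

```python
# Hypothetical policy: which classification levels a role may read.
POLICY = {"analyst": {"public", "confidential"}}

# Classification attributes as they might arrive from the catalog sync.
classifications = {"merchant": "public", "amount": "confidential"}


def readable_columns(role, classifications, policy=POLICY):
    """Evaluate the policy against the current attributes at query time."""
    allowed = policy.get(role, set())
    return sorted(col for col, level in classifications.items() if level in allowed)


before = readable_columns("analyst", classifications)

# A steward reclassifies a column; the policy itself is untouched.
classifications["amount"] = "highly confidential"
after = readable_columns("analyst", classifications)
```

Because the policy refers to classification levels rather than column names, the reclassification takes effect on the next evaluation without anyone editing the policy.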

Case Study

One of our joint customers, a leading US financial institution, was able to significantly shorten the time-to-insight for their cloud-native data lake by integrating Okera and the Collibra Data Catalog. Their governance team was already leveraging Collibra’s catalog to classify data according to their governance levels (e.g. highly confidential, confidential, sensitive, public) across all their existing data stores. As they built out their modern data platform, they deployed Okera as the foundational unified access layer across data sources to ensure consistent policy and access management at scale.

Instead of having to create a new governance workflow for the data lake, they were able to use Okera’s Collibra Connector to automatically sync data classification from Collibra to Okera in order to define and enforce dynamic ABAC policies across all their datasets and analytics tools. This customer also chose AWS Glue as their central technical metadata repository, and Okera integrated with Glue to handle the access portion.

Here’s what the end-to-end flow of adding a new dataset looks like:

  1. New data lands in the Amazon S3 data lake as the result of some ETL.
  2. Technical metadata is automatically registered inside the AWS Glue catalog.
  3. Technical metadata is synced from Glue to Collibra via API.
  4. A data steward begins the approval workflow in Collibra and adds data classification attributes to the various fields, for example the fields of a sample credit card transactions table.
  5. Once this dataset is approved in Collibra, those classification attributes are automatically synced to Okera. Users with the correct policy will now be able to query that dataset with ABAC policies enforced.

As an example of these access policies, all users can see columns with the public classification (see the card_public role). If they need confidential data, they can apply to be added to the correct Active Directory group and receive access to the columns classified as confidential, while data classified as highly confidential is masked (see the card_confidential role).

[Image: Okera_transactions]

This is how the data would look for an analyst with the card_public role querying this dataset in Tableau through Okera. They only see columns that were classified as public in Collibra.

[Image: Tableau_public]

Now this is how the same data would look for that same analyst once they’ve been added to the card_confidential role. Notice that they now see confidential columns, and anything highly confidential is masked.

[Image: Tableau_confidential]
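
The difference between the two roles can also be sketched in a few lines of Python. This is a simplified illustration rather than how Okera implements masking: the column names, the sample row, the masking style, and the `apply_policy` function are assumptions, and only the role names and classification levels come from the example above.

```python
# Hypothetical column classifications for a sample transactions table.
COLUMN_LEVELS = {
    "merchant": "public",
    "amount": "confidential",
    "card_number": "highly confidential",
}

# Role behavior as described above: what each role sees or gets masked.
ROLE_RULES = {
    "card_public": {"show": {"public"}, "mask": set()},
    "card_confidential": {
        "show": {"public", "confidential"},
        "mask": {"highly confidential"},
    },
}


def apply_policy(row, role):
    """Return the row as the given role would see it."""
    rules = ROLE_RULES[role]
    result = {}
    for column, value in row.items():
        level = COLUMN_LEVELS[column]
        if level in rules["show"]:
            result[column] = value
        elif level in rules["mask"]:
            result[column] = "X" * len(str(value))  # simple placeholder mask
        # any other column is dropped from the result entirely
    return result


row = {"merchant": "Acme", "amount": 42.5, "card_number": "4111111111111111"}
public_view = apply_policy(row, "card_public")
confidential_view = apply_policy(row, "card_confidential")
```

In this sketch the card_public view contains only the public column, while the card_confidential view adds the confidential column and returns the highly confidential one in masked form.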

A process to add a new dataset that previously may have taken weeks to months – registering the technical metadata, classifying it in the business catalog, and then getting stuck in the gap between the governance and platform teams figuring out how to provision secure access – can now be enabled in a matter of hours.

Together, Okera and Collibra delivered agile data governance with minimal change management, which allowed the customer to hit the initial milestones for the data lake initiative significantly faster and accelerate adoption of the data lake. The customer got the best of both worlds: a robust business catalog in Collibra, and data access and policy enforcement through Okera.

Okera’s Collibra Connector is now available to new and existing customers. We’re excited to enable agile data governance and automation of security and privacy enforcement at enterprise scale.