Attribute-Based Access Control (ABAC) for Data Lakes at Scale

Okera recently released Okera Active Data Access Platform (ODAP) 1.4. In this post, we’ll look at the following new capabilities in this release:

  • Business metadata support
  • Attribute-based access control (ABAC)
  • Easy access for everyone with native JDBC support

Understanding Attribute-Based Access Control

Just as business context needs to be stored explicitly in the data catalog, access policies are easier to understand when they’re based on this business context. To understand the concept of ABAC, compare the following two ways of granting access:

  • Option 1: Implicit security context that’s technically oriented
    • Grant users in “NA Sales” LDAP group access to:
      • Databases: Sales01, SFDCExport, sfnaenf01
      • Tables: mkto.leads
      • Views: mkto.leads_usa
  • Option 2: Explicit business context
    • Grant users in “NA Sales” LDAP group all data classified as:
      • Department = Sales
      • Year = 2019
      • Security = Low

As data lakes have grown, security professionals can no longer rely solely on implicit business context to enforce access control. Unless that context is stored explicitly in the catalog, it’s too difficult to remember, for example, which columns are sensitive. That makes Option 1 very difficult to manage for petabyte-scale data lakes.

Now that the data lake catalog has become the single-source-of-truth for business context, ODAP 1.4 can rely on this business context for defining access policies. This is called attribute-based access control, or ABAC for short.
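To make the difference between the two options concrete, here is a minimal Python sketch of Option 2. It is an illustration only and not Okera's policy syntax or API: the `Dataset` and `AttributePolicy` classes, the `accessible` helper, the attribute values assigned to each dataset, and the `hr.payroll` entry are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    attributes: dict  # business context, as it might be stored in a catalog


@dataclass
class AttributePolicy:
    group: str
    required: dict  # every key-value pair must match the dataset's attributes

    def allows(self, user_groups: set, dataset: Dataset) -> bool:
        # The user must be in the policy's group...
        if self.group not in user_groups:
            return False
        # ...and the dataset must carry every required attribute value.
        return all(
            dataset.attributes.get(key) == value
            for key, value in self.required.items()
        )


# Hypothetical catalog contents; the attribute values are assumptions.
CATALOG = [
    Dataset("Sales01", {"Department": "Sales", "Year": "2019", "Security": "Low"}),
    Dataset("mkto.leads", {"Department": "Sales", "Year": "2019", "Security": "Low"}),
    Dataset("hr.payroll", {"Department": "HR", "Year": "2019", "Security": "High"}),
]

# Option 2 from the list above, expressed as data instead of a list of objects.
NA_SALES_POLICY = AttributePolicy(
    group="NA Sales",
    required={"Department": "Sales", "Year": "2019", "Security": "Low"},
)


def accessible(user_groups: set, catalog: list) -> list:
    """Return the names of datasets the user's groups may read."""
    return [d.name for d in catalog if NA_SALES_POLICY.allows(user_groups, d)]


if __name__ == "__main__":
    print(accessible({"NA Sales"}, CATALOG))  # ['Sales01', 'mkto.leads']
```

Notice that the policy never names a database, table, or view. In this sketch, a new dataset tagged with the same three attributes becomes readable by “NA Sales” without anyone touching the policy, which is the property that makes Option 2 easier to manage as the lake grows.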

ODAP 1.4 New Capabilities: Business metadata, Attribute-based Access Control & Native JDBC Connectivity

Okera’s ODAP 1.4 extends the data lake security model to support attribute-based access control. There are several components to this new capability, and we’ll look at each component individually.

Business metadata

ODAP 1.4’s schema registry now allows you to tag tables and columns with business context, such as “PII” or “sales.” It’s easy to assign these tags in a variety of ways:

  • You can assign tags manually in the schema catalog
  • You can assign tags automatically on schema registration through the dataset registration wizard
  • You can use Okera’s auto-tagging capability that scans data sets and applies tags based on whether columns are credit card numbers, social security numbers, phone numbers, email addresses, and so on
  • Okera is also building rich APIs to make it easy to integrate with all the leading catalog vendors
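To give a feel for what auto-tagging involves, here is a rough Python sketch that samples column values and suggests a tag when most of them match a detector. This is not how ODAP implements auto-tagging: the regular expressions are deliberately simplified, and the `DETECTORS` table, the `suggest_tags` function, the 0.9 threshold, and the sample columns are all assumptions for illustration.

```python
import re


def luhn_valid(value: str) -> bool:
    """Luhn checksum, commonly used to sanity-check card numbers."""
    digits = [int(c) for c in value if c.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    checksum = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return checksum % 10 == 0


def looks_like_credit_card(value: str) -> bool:
    # Only digits, spaces, and hyphens are allowed before the checksum test.
    return bool(re.fullmatch(r"[\d -]+", value)) and luhn_valid(value)


# Simplified, US-centric patterns; real detectors would need to be stricter.
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
SSN = re.compile(r"\d{3}-\d{2}-\d{4}")
PHONE = re.compile(r"\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}")

DETECTORS = {
    "email": lambda v: bool(EMAIL.fullmatch(v)),
    "ssn": lambda v: bool(SSN.fullmatch(v)),
    "phone": lambda v: bool(PHONE.fullmatch(v)),
    "credit_card": looks_like_credit_card,
}


def suggest_tags(columns: dict, threshold: float = 0.9) -> dict:
    """Map column name -> suggested tags, based on sampled values."""
    suggestions = {}
    for column, values in columns.items():
        samples = [v.strip() for v in values if v and v.strip()]
        if not samples:
            continue
        for tag, detector in DETECTORS.items():
            hits = sum(1 for v in samples if detector(v))
            if hits / len(samples) >= threshold:
                suggestions.setdefault(column, []).append(tag)
    return suggestions


if __name__ == "__main__":
    sample = {
        "contact": ["ana@example.com", "li@example.org"],
        "TaxID": ["123-45-6789", "987-65-4321"],
        "cc": ["4111111111111111", "4111 1111 1111 1111"],
        "notes": ["call back Tuesday", "prefers email"],
    }
    print(suggest_tags(sample))
```

The detectors look only at the values, never at the column name, so a column called “TaxID” is still flagged as a social security number in this sketch. The threshold is there to tolerate a few malformed values in otherwise consistent columns.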

Defining policies with ABAC

Policies are defined using Okera’s access control syntax. Starting in ODAP 1.4, you can define attribute-based policies in addition to role-based ones.

ODAP 1.4 is available today. If you’d like to learn more about this release—including a host of other new capabilities—please contact us at sales@okera.com or sign up for our webinar “What’s new in ODAP 1.4: How to control access to sensitive data at scale in your data lake.”

ODAP 1.4 Capabilities Based on Industry Trends

  1. Data lake security has become a business imperative

Businesses can no longer afford to treat data lake security as an afterthought. With increased regulation such as GDPR and CCPA, organizations risk stiff penalties for mishandling sensitive data. Yet the task of protecting data lakes is becoming more and more daunting for security professionals for the following reasons:

  • Data lakes are getting larger. A typical data lake contains several petabytes of data or more. With such massive amounts of data, it has become increasingly difficult for security professionals to remember where sensitive data is located, let alone keep data organized for end users to find, trust, and analyze.
  • Legacy approaches to security don’t scale to the diversity or size of petabyte-scale data lakes. It used to be enough to remember that all the sales-related data is stored in the database called “SalesDB,” and if a field was called “SSN” it contained a social security number. Today, data sets are coming from thousands of data sources, both internal and external to the organization, so it’s next to impossible to enforce (let alone remember) such rigid naming conventions in modern data platforms. Data lakes require much more sophisticated ways of explicitly curating and storing business context for both regulatory and discovery purposes.
  • Privacy regulation is getting more complex: each new regulation introduces new kinds of requirements. GDPR introduced a “right to erasure,” whereas CCPA introduces a “look back provision.” To avoid heavy penalties, data lake security professionals need to stay abreast of emerging regulations and develop a holistic security approach that keeps them ready to tackle whatever is coming next.

 

  2. Catalog has become the single-source-of-truth for business context in the data lake

To address the challenge of providing a way to explicitly curate petabyte-scale data lakes, companies such as Alation, Collibra, and Informatica have built data lake catalogs. These products are useful for end users and security professionals alike: end users rely on the data catalog to find, trust, and learn how to use data sets, whereas security professionals use the catalog as the single-source-of-truth for security-related context, such as tagging a column as “PII” (personally identifiable information), whether the column is called “ssn,” “SocialSec,” or “TaxID.”

Most data catalogs store many different kinds of business context:

  • Tags: such as “PII” or “sensitive”
  • Key-value pairs: such as “Department = Sales,” “RetainUntil = 5/15/2020,” or “Steward = data.stewart@okerainc.com”
  • Glossary definitions: free-form text that provides deeper guidance for data consumers on the meaning of the table, column, etc.
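As a rough picture of how these three kinds of context might sit side by side, here is a small Python sketch of a catalog entry. It does not reflect the data model of any particular catalog product: the `CatalogEntry` class, the `columns_with_tag` helper, and the table names are assumptions invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    table: str
    column: str
    tags: set = field(default_factory=set)         # e.g. "PII", "sensitive"
    properties: dict = field(default_factory=dict)  # key-value pairs
    glossary: str = ""                              # free-form guidance


# Hypothetical entries: three differently named columns share one tag.
ENTRIES = [
    CatalogEntry(
        "hr.employees", "ssn", {"PII", "sensitive"},
        {"Department": "HR"}, "US social security number of the employee.",
    ),
    CatalogEntry("payroll.contractors", "SocialSec", {"PII"}),
    CatalogEntry("finance.vendors", "TaxID", {"PII"}, {"RetainUntil": "5/15/2020"}),
    CatalogEntry("sales.orders", "amount", set(), {"Department": "Sales"}),
]


def columns_with_tag(entries: list, tag: str) -> list:
    """Return fully qualified column names that carry the given tag."""
    return [f"{e.table}.{e.column}" for e in entries if tag in e.tags]


if __name__ == "__main__":
    print(columns_with_tag(ENTRIES, "PII"))
```

Looking up “PII” in this sketch returns all three columns even though none of them share a name, which is the kind of explicit context that naming conventions alone can’t provide.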

Rather than remembering the implicit business context of each database, table, and column, wouldn’t it be nice to simply define access policies based on the business context that’s explicitly defined in the data catalog? That’s exactly what ODAP 1.4 lets you do!