Okera recently released Okera Active Data Access Platform (ODAP) 1.4. In this post, we’ll look at the following new data access control capabilities in this release:
- Business metadata support
- Attribute-based access control (ABAC)
- Easy access for everyone with native JDBC support
Understanding Attribute-Based Access Control
Option 1: Implicit, technically oriented security context
Grant users in “NA Sales” LDAP group access to:
- Databases: Sales01, SFDCExport, sfnaenf01
- Tables: mkto.leads
- Views: mkto.leads_usa
Option 2: Explicit business context
Grant users in “NA Sales” LDAP group all data classified as:
- Department = Sales
- Year = 2019
- Security = Low
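To make Option 2 concrete, here is a minimal Python sketch of how an attribute-based check could work. It is not Okera's implementation or API: the `Dataset` and `AttributePolicy` classes, the `accessible` helper, and the attributes attached to each sample dataset are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A catalog entry with business attributes attached (hypothetical model)."""
    name: str
    attributes: dict = field(default_factory=dict)


@dataclass
class AttributePolicy:
    """Grants a group access to every dataset carrying all required attributes."""
    group: str
    required: dict

    def allows(self, user_groups, dataset):
        # The user must belong to the group named in the policy.
        if self.group not in user_groups:
            return False
        # Every required attribute must be present with the expected value.
        return all(
            dataset.attributes.get(key) == value
            for key, value in self.required.items()
        )


def accessible(policy, user_groups, catalog):
    """Return the names of the datasets the policy opens up for these groups."""
    return [d.name for d in catalog if policy.allows(user_groups, d)]


# The policy from Option 2, expressed as data rather than as a list of objects.
policy = AttributePolicy(
    group="NA Sales",
    required={"Department": "Sales", "Year": "2019", "Security": "Low"},
)

# Sample catalog; the attribute values here are assumptions for the example.
catalog = [
    Dataset("mkto.leads_usa", {"Department": "Sales", "Year": "2019", "Security": "Low"}),
    Dataset("hr.salaries", {"Department": "HR", "Year": "2019", "Security": "High"}),
    Dataset("sales.forecast_2020", {"Department": "Sales", "Year": "2020", "Security": "Low"}),
]

visible = accessible(policy, {"NA Sales"}, catalog)
print(visible)
```

The point of the design is that the policy never names a database, table, or view. A dataset registered later with the same three attributes would be covered without touching the policy.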
- You can assign tags manually in the schema catalog
- You can assign tags automatically on schema registration through the dataset registration wizard
- You can use Okera’s auto-tagging capability that scans data sets and applies tags based on whether columns are credit card numbers, social security numbers, phone numbers, email addresses, and so on.
- Okera is also building rich APIs to make it easy to integrate with all the leading catalog vendors
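The auto-tagging idea can be illustrated with a small, pattern-based sketch. This shows the general technique only, under simplifying assumptions (US-style formats, a fixed match threshold), and is not a description of how Okera's scanner works; `PATTERNS`, `luhn_valid`, and `suggest_tags` are hypothetical names.

```python
import re

# Simplified, US-centric patterns; a production scanner would need far more care.
PATTERNS = {
    "ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "phone": re.compile(r"(\+1[ -]?)?\(?\d{3}\)?[ -]?\d{3}-\d{4}"),
}


def luhn_valid(value):
    """Check a candidate card number with the Luhn checksum."""
    # Only digits, spaces, and dashes are allowed in a card number.
    if any(not (c.isdigit() or c in " -") for c in value):
        return False
    digits = [int(c) for c in value if c.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def matches(tag, value):
    if tag == "credit_card":
        return luhn_valid(value)
    return PATTERNS[tag].fullmatch(value) is not None


def suggest_tags(samples, threshold=0.8):
    """Suggest tags for a column when enough sampled values match a pattern."""
    values = [v.strip() for v in samples if v and v.strip()]
    if not values:
        return set()
    tags = set()
    for tag in [*PATTERNS, "credit_card"]:
        hits = sum(matches(tag, v) for v in values)
        if hits / len(values) >= threshold:
            tags.add(tag)
    return tags


print(suggest_tags(["123-45-6789", "987-65-4321"]))
print(suggest_tags(["(415) 555-0100", "415-555-0100"]))
print(suggest_tags(["4111 1111 1111 1111"]))
```

Sampling values and applying a threshold, rather than requiring every value to match, tolerates a few dirty rows such as "n/a" while still avoiding a tag on a column that only occasionally contains a matching value.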
Defining policies with ABAC
ODAP 1.4 Capabilities Based on Industry Trends
1. Data lake security has become a business imperative
Businesses can no longer afford to treat data lake security as an afterthought. With increased regulation such as GDPR and CCPA, organizations risk stiff penalties for mishandling sensitive data. Yet the task of protecting data lakes is becoming more and more daunting for security professionals for the following reasons:
- Data lakes are getting larger. A typical data lake now holds several petabytes of data or more. At that scale, it is increasingly difficult for security professionals to keep track of where sensitive data is located, let alone keep data organized so that end users can find, trust, and analyze it.
- Legacy approaches to security don’t scale to the diversity or size of petabyte-scale data lakes. It used to be enough to remember that all the sales-related data is stored in the database called “SalesDB,” and that a field called “SSN” contained a social security number. Today, data sets come from thousands of data sources, both internal and external to the organization, so it’s next to impossible to enforce (let alone remember) such rigid naming conventions in modern data platforms. Data lakes require much more sophisticated ways of explicitly curating and storing business context for both regulatory and discovery purposes.
- Privacy regulation is getting more complex. Each new regulation introduces new kinds of requirements: GDPR introduced the “right to erasure,” whereas CCPA introduces a “look-back provision.” To avoid heavy penalties, data lake security professionals need to stay abreast of emerging regulations and develop a holistic security approach that keeps them ready to tackle whatever is coming next.
2. The catalog has become the single source of truth for business context in the data lake
- Tags: such as “PII” or “sensitive”
- Key-value pairs: such as “Department = Sales,” “RetainUntil = 5/15/2020,” or “Steward = email@example.com”
- Glossary definitions: free-form text that provides deeper guidance for data consumers on the meaning of the table, column, etc.
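As a rough illustration, the three kinds of business metadata listed above could be modeled as follows. This is a hypothetical sketch, not the ODAP catalog schema: the class names, field names, helper functions, and the sample entry's values are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class BusinessMetadata:
    """Business context attached to a table or column (hypothetical model)."""
    tags: set = field(default_factory=set)          # e.g. "PII", "sensitive"
    properties: dict = field(default_factory=dict)  # key-value pairs
    glossary: str = ""                              # free-form guidance text


@dataclass
class CatalogEntry:
    name: str
    metadata: BusinessMetadata = field(default_factory=BusinessMetadata)


def find_by_tag(entries, tag):
    """Names of all entries carrying the given tag."""
    return [e.name for e in entries if tag in e.metadata.tags]


def find_by_property(entries, key, value):
    """Names of all entries whose key-value pair matches exactly."""
    return [e.name for e in entries if e.metadata.properties.get(key) == value]


# Sample entries; the metadata values are illustrative only.
entries = [
    CatalogEntry(
        "mkto.leads",
        BusinessMetadata(
            tags={"PII"},
            properties={"Department": "Sales", "RetainUntil": "5/15/2020"},
            glossary="Marketing leads, one row per prospect.",
        ),
    ),
    CatalogEntry("web.page_views"),
]

print(find_by_tag(entries, "PII"))
print(find_by_property(entries, "Department", "Sales"))
```

Because all three kinds of metadata live on the catalog entry, the same record can serve both discovery (search by tag or property) and access control (policies that match on those values).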