Simplify fine-grained data access control for Amazon EMR
Maximize the value of Amazon EMR while protecting confidential, personally identifiable, and regulated data
Okera’s data authorization policies are managed, enforced, and audited on a unified platform. That means no enforcement gaps, no complex policy synchronization to manage, and no plugin inconsistencies to work around.
Okera’s data authorization policies are easy to write and manage because they are agnostic to the data platform. Define no-code policies through an intuitive web UI or programmatically via API. Once you have a policy in place, simply register the data to be governed.
All data access requests are automatically logged for every individual user, down to the exact query, timestamp, access method, sensitive data attribute, and whether requests are approved or denied. Every administrative task is also audited, so you know when policy and platform changes are made, and by whom.
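To make the audit fields listed above concrete, here is a minimal sketch of what one access record might look like. It is illustrative only: `AuditRecord`, `make_record`, and `to_json_line` are hypothetical names and do not reflect Okera's actual audit log schema or API.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class AuditRecord:
    """Hypothetical audit entry; the field names are assumptions, not Okera's schema."""
    user: str
    query: str
    timestamp: str                   # ISO 8601, UTC
    access_method: str               # e.g. "spark", "hive", "presto"
    sensitive_attributes: List[str]  # sensitive data attributes touched by the query
    approved: bool                   # True if the request was authorized


def make_record(user: str, query: str, access_method: str,
                sensitive_attributes: List[str], approved: bool,
                now: Optional[datetime] = None) -> AuditRecord:
    """Build one record, stamping it with the current UTC time unless `now` is given."""
    now = now or datetime.now(timezone.utc)
    return AuditRecord(user, query, now.isoformat(), access_method,
                       list(sensitive_attributes), approved)


def to_json_line(record: AuditRecord) -> str:
    """Serialize a record as one JSON line, a common format for append-only logs."""
    return json.dumps(asdict(record), sort_keys=True)


if __name__ == "__main__":
    rec = make_record("alice", "SELECT ssn FROM customers", "spark", ["ssn"], approved=False)
    print(to_json_line(rec))
```

Recording denied requests alongside approved ones, as in the usage lines at the bottom, is what lets an auditor answer both "who saw this data" and "who tried to".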
Solution At A Glance
Take an agile approach to data authorization. Start with the end in mind. Complete one use case and accelerate your expansion as you learn from the Data Usage Intelligence that Okera provides.
1. Set business requirements: The security and governance teams contribute sensitive data attributes and data authorization policies. Learn about Distributed Stewardship.
2. Classify and register: Leverage machine learning to classify your S3 data. Then the data owner registers data with the policies defined in step 1.
3. Bootstrap: The platform owner configures Amazon EMR to enforce data authorization policies using Okera nScale.
4. Dynamic authorization: Users work with the Amazon EMR cluster as normal, using their preferred tools, and Okera dynamically authorizes each request.
5. Iterate: Optimize, improve, and expand with real-world data usage intelligence.
How to enforce and scale fine-grained access control on Amazon EMR: demo and real-life use cases
OCT 21, 2020 | 10 AM PDT / 1 PM EDT
How nScale Policy Enforcement Works on Amazon EMR
nScale Process Isolation for Security
Okera intercepts data requests sent to Spark, Hive, or Presto and authorizes queries off-cluster using metadata stored in AWS Glue Data Catalog or Hive Metastore.
Authorized data access requests are then delegated to Okera nScale. Okera nScale is a high-performance, distributed data access control layer that runs co-located on the EMR cluster in a secure, isolated process. Okera nScale receives temporary privileges to directly access data in S3 for the exclusive purpose of applying data authorization policies, such as row-level filters, column hiding, and data masking.
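The three policy actions named above (row-level filters, hidden columns, masked data) can be illustrated with a small sketch. This is a conceptual model only, not nScale's implementation or API: `Policy`, `mask`, and `apply_policy` are hypothetical names, and the masking rule (keep the last four characters) is an assumption chosen for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet, List

Row = Dict[str, Any]


def allow_all(row: Row) -> bool:
    """Default row filter: every row is visible."""
    return True


@dataclass(frozen=True)
class Policy:
    """Hypothetical per-role policy; not Okera's actual policy model."""
    row_filter: Callable[[Row], bool] = allow_all
    hidden_columns: FrozenSet[str] = frozenset()
    masked_columns: FrozenSet[str] = frozenset()


def mask(value: Any) -> str:
    """Replace all but the last four characters with asterisks (assumed masking rule)."""
    text = str(value)
    return "*" * max(len(text) - 4, 0) + text[-4:]


def apply_policy(rows: List[Row], policy: Policy) -> List[Row]:
    """Return only the rows and columns the policy permits, masking where required."""
    result = []
    for row in rows:
        if not policy.row_filter(row):
            continue  # row-level filter: drop rows the user may not see
        visible = {}
        for column, value in row.items():
            if column in policy.hidden_columns:
                continue  # hidden column: omit it entirely
            visible[column] = mask(value) if column in policy.masked_columns else value
        result.append(visible)
    return result


if __name__ == "__main__":
    data = [
        {"name": "Ana", "region": "EU", "ssn": "123-45-6789", "salary": 100},
        {"name": "Bo", "region": "US", "ssn": "987-65-4321", "salary": 90},
    ]
    eu_analyst = Policy(
        row_filter=lambda r: r["region"] == "EU",
        hidden_columns=frozenset({"salary"}),
        masked_columns=frozenset({"ssn"}),
    )
    print(apply_policy(data, eu_analyst))
```

The point of the sketch is that the caller only ever receives the output of `apply_policy`; the unfiltered rows stay inside the enforcement layer, which mirrors the process isolation described above.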
nScale Co-Location for Elasticity and Performance
Okera nScale co-location provides the elastic scalability needed for big data environments. As the secure data access control layer, it’s important that nScale remains in perfect sync on each node with your compute frameworks, such as Spark, Hive, or Presto. Simply bootstrap nScale to load as your Amazon EMR cluster scales up and down. Okera nScale handles secure data access and your compute framework takes care of the rest. It’s that easy.
Zero Trust Approach
With nScale process isolation, user code does not access S3 and therefore never sees full-fidelity data. This lets organizations implement a Zero Trust approach to data access, in which the cluster itself has no IAM access to S3. Instead, Okera provisions data dynamically and in the approved format according to current policies.
Cost Savings & Reduced Attack Surface
In practical economic terms, Okera can deliver significant reductions in AWS storage and processing fees. Okera can also significantly reduce the attack surface.
How? Instead of replicating and managing hundreds or even tens of thousands of curated data sets to support different users and projects, you maintain a single, current version of the data securely on S3, dynamically provisioned and centrally audited. You pay less because you no longer store redundant versions of data, and you reduce the chance of data copies going rogue and falling into the wrong hands.
For Amazon EMR specifically, Okera can yield significant cost savings by making it practical to run one multi-tenant EMR cluster that supports a wide variety of users and tools, with the confidence that data is securely accessed and used responsibly.