Simplify fine-grained data access control for Amazon EMR

Enjoy the cost savings and flexibility you want from Amazon EMR and the sensitive data protection you need from Okera.

Maximize the value of Amazon EMR while protecting confidential, personally identifiable, and regulated data

Dynamically filter, hide, mask, tokenize, and anonymize sensitive data as users work with Amazon S3 data. Okera universal data authorization is complete, clear, and consistent across big data frameworks such as Spark, Hive, and Presto.


Implementing data authorization with Okera becomes easier, not harder, over time. Okera’s focus on Attribute-Based Access Control (ABAC) and dynamic row-level filtering dramatically reduces complexity. With Okera, for example, you can reduce hundreds of policies to a handful. Our commitment to API-first development also makes Okera well suited to automation. Okera delivers an economy of scale that you can’t achieve with alternative approaches.
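To make the ABAC idea concrete, here is a minimal Python sketch of how one attribute-based rule can stand in for many per-table policies. It is an illustration only, not Okera's API: the `Column` and `User` types, the `pii` tag, and the `pii_approved` attribute are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    tags: frozenset  # hypothetical attribute tags, e.g. {"pii"}


@dataclass(frozen=True)
class User:
    name: str
    attributes: frozenset  # hypothetical user attributes, e.g. {"pii_approved"}


def visible_columns(user, columns):
    """One attribute-based rule: columns tagged 'pii' require 'pii_approved'."""
    return [
        c.name
        for c in columns
        if "pii" not in c.tags or "pii_approved" in user.attributes
    ]


columns = [
    Column("order_id", frozenset()),
    Column("email", frozenset({"pii"})),
    Column("total", frozenset()),
]
analyst = User("ana", frozenset({"analyst"}))
steward = User("sam", frozenset({"analyst", "pii_approved"}))

print(visible_columns(analyst, columns))  # ['order_id', 'total']
print(visible_columns(steward, columns))  # ['order_id', 'email', 'total']
```

Because the rule keys on tags rather than on table or column names, it keeps applying as new tagged data is registered, which is where the reduction in policy count comes from.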



Dynamically apply row-level filters, and mask, minimize, and de-identify data so each query complies with personal privacy regulations and security mandates.
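As a rough illustration of what row-level filtering, masking, and de-identification mean for a single query, consider the Python sketch below. It does not show how Okera implements these operations; the function names, the `region` column, and the salted-hash token are assumptions made for the example, and a production tokenization scheme would be more robust.

```python
import hashlib


def mask_email(value: str) -> str:
    """Hide the local part of an email address but keep the domain."""
    _, _, domain = value.partition("@")
    return "***@" + domain


def tokenize(value: str, salt: str = "demo-salt") -> str:
    """Replace a value with a deterministic token (illustrative only)."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:12]


def apply_policy(rows, user_region):
    """Filter rows by region, then mask and tokenize sensitive columns."""
    result = []
    for row in rows:
        if row["region"] != user_region:
            continue  # row-level filter: drop rows outside the user's region
        result.append(
            {**row, "email": mask_email(row["email"]), "ssn": tokenize(row["ssn"])}
        )
    return result


rows = [
    {"region": "EU", "email": "ana@example.com", "ssn": "123-45-6789"},
    {"region": "US", "email": "bob@example.org", "ssn": "987-65-4321"},
]
eu_view = apply_policy(rows, "EU")
print(len(eu_view))         # 1
print(eu_view[0]["email"])  # ***@example.com
```

The stored data is never changed; each user simply receives a view shaped by the policy that applies at query time.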



Okera’s data authorization policies are managed, enforced, and audited on a unified platform. There are no gaps or inconsistencies, no complex policy synchronization to manage, and no plugin inconsistencies to work around.


Okera’s data authorization policies are easy to write and manage because they are agnostic to the data platform. Define no-code policies through an intuitive web UI or programmatically via API. Once you have a policy in place, simply register the data to be governed.


All data access requests are automatically logged for every individual user, down to the exact query, timestamp, access method, sensitive data attribute, and whether requests are approved or denied. Every administrative task is also audited, so you know when policy and platform changes are made, and by whom.
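For readers who want to picture such a log entry, the Python sketch below models one record with the fields listed above. The field names and JSON layout are assumptions for illustration and do not represent Okera's actual audit schema.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class AuditRecord:
    user: str
    query: str
    timestamp: str
    access_method: str
    sensitive_attributes: list
    decision: str  # "approved" or "denied"


def record_access(user, query, access_method, sensitive_attributes, approved):
    """Build one audit entry for a data access request (hypothetical schema)."""
    return AuditRecord(
        user=user,
        query=query,
        timestamp=datetime.now(timezone.utc).isoformat(),
        access_method=access_method,
        sensitive_attributes=list(sensitive_attributes),
        decision="approved" if approved else "denied",
    )


entry = record_access(
    "ana", "SELECT email FROM sales.orders", "spark", ["pii"], approved=False
)
print(json.dumps(asdict(entry), indent=2))
```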

Solution At A Glance

Take an agile approach to data authorization. Start with the end in mind. Complete one use case and accelerate your expansion as you learn from the Data Usage Intelligence that Okera provides.

1. Set business requirements: The security and governance teams contribute sensitive data attributes and data authorization policies. Learn about Distributed Stewardship.

2. Classify and register: Leverage machine learning to classify your S3 data. Then the data owner registers data with the policies defined in step 1.

3. Bootstrap: The platform owner configures Amazon EMR to enforce data authorization policies using Okera nScale.

4. Dynamic authorization: Users work with the Amazon EMR cluster as normal, using their preferred tools, and Okera dynamically authorizes each request.

5. Iterate: Optimize, improve, and expand with real-world data usage intelligence.


How to enforce and scale fine-grained access control on Amazon EMR: demo and real-life use cases

How nScale Policy Enforcement Works on Amazon EMR

nScale Process Isolation for Security

Okera intercepts data requests sent to Spark, Hive, or Presto and authorizes queries off-cluster using metadata stored in AWS Glue Data Catalog or Hive Metastore.

Authorized data access requests are then delegated to Okera nScale. Okera nScale is a high-performance, distributed data access control layer that runs co-located on the EMR cluster in a secure, isolated process. Okera nScale receives temporary privileges to directly access data in S3 for the exclusive purpose of applying data authorization policies, such as row-level filters, column hiding, and data masking.

Okera nScale streams the authorized data on-cluster to the Spark, Hive, or Presto framework for compute processing.
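The three steps above can be sketched in Python as an authorize-then-enforce pipeline. This is a conceptual model under stated assumptions, not nScale's implementation: `AccessPlan`, `POLICIES`, `authorize`, and `enforce` are hypothetical names, and an in-memory list stands in for data read from S3.

```python
from typing import Callable, Dict, Iterable, Iterator, List, NamedTuple, Optional


class AccessPlan(NamedTuple):
    columns: List[str]                  # columns the caller may see
    row_filter: Callable[[Dict], bool]  # rows the caller may see


# Hypothetical policy lookup keyed by (role, table); stands in for the
# off-cluster authorization decision made from catalog metadata.
POLICIES = {
    ("analyst", "sales.orders"): AccessPlan(
        columns=["order_id", "total"],
        row_filter=lambda row: row["region"] == "EU",
    ),
}


def authorize(role: str, table: str) -> Optional[AccessPlan]:
    """Step 1: decide from metadata alone whether and how the request proceeds."""
    return POLICIES.get((role, table))


def enforce(plan: AccessPlan, raw_rows: Iterable[Dict]) -> Iterator[Dict]:
    """Steps 2 and 3: read raw rows, apply the plan, stream authorized data."""
    for row in raw_rows:
        if plan.row_filter(row):
            yield {col: row[col] for col in plan.columns}


# Stand-in for full-fidelity data that only the enforcement layer reads.
raw = [
    {"order_id": 1, "total": 40.0, "region": "EU", "email": "ana@example.com"},
    {"order_id": 2, "total": 15.5, "region": "US", "email": "bob@example.org"},
]

plan = authorize("analyst", "sales.orders")
if plan is None:
    raise PermissionError("request denied")
authorized = list(enforce(plan, raw))
print(authorized)  # [{'order_id': 1, 'total': 40.0}]
```

The compute framework only ever consumes the output of `enforce`, which mirrors the separation between the isolated access layer and user code described here.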

nScale Co-Location for Elasticity and Performance

Okera nScale co-location provides the elastic scalability needed for big data environments. As the secure data access control layer, it’s important that nScale remains in perfect sync on each node with your compute frameworks, such as Spark, Hive, or Presto. Simply bootstrap nScale to load as your Amazon EMR cluster scales up and down. Okera nScale handles secure data access and your compute framework takes care of the rest. It’s that easy.

Zero Trust Approach

With nScale process isolation, user code does not access S3 and therefore never sees full-fidelity data. Okera allows organizations to implement a Zero Trust approach to data access, where the cluster does not have IAM access to S3. Instead, Okera provisions data dynamically and in the approved format according to current policies.

Cost Savings & Reduced Attack Surface

In practical economic terms, Okera can deliver significant cost reductions in AWS storage and processing fees. Okera can also significantly reduce the attack surface.

How? Instead of replicating and managing hundreds or even tens of thousands of curated data sets to support different users and projects, a single, current version of the data can be maintained securely on S3, dynamically provisioned, and centrally audited. You pay less because you no longer store redundant versions of data, and you reduce the chance of data copies going rogue and falling into the wrong hands.

For Amazon EMR specifically, Okera can yield significant cost savings by making it practical to run one multi-tenant EMR cluster that supports a wide variety of users and tools, with the confidence that data is securely accessed and used responsibly.

Complexity is the Enemy of Security: Why Choose Okera for Amazon EMR

Complexity is the enemy of security. If managing data authorization gets harder over time, you have a security gap, not a working solution. Alternative approaches are unnecessarily complicated to set up, and become increasingly unwieldy when adding more data, users, use cases, and tools.

Okera’s approach becomes easier to use over time. Registering new data, onboarding new users, and updating policies can be automated and validated within minutes or even seconds. Choose Okera for complete, consistent, and clear universal data authorization that works at scale as demand increases and business requirements evolve.