Simplify fine-grained data access control for Amazon EMR
Enjoy the cost savings and flexibility you want from Amazon EMR and the sensitive data protection you need from Okera.
Maximize the value of Amazon EMR while protecting confidential, personally identifiable, and regulated data
Dynamically filter, hide, mask, tokenize, and anonymize sensitive data as users work with Amazon S3 data. Okera universal data authorization is complete, clear, and consistent across big data frameworks such as Spark, Hive, and Presto.
Implementing data authorization with Okera becomes easier, not harder, over time. Okera’s focus on Attribute-Based Access Control (ABAC) and dynamic row-level filtering dramatically reduces complexity. With Okera, for example, you can reduce hundreds of policies to a handful. Our commitment to API-first development also makes Okera well suited to automation. Okera delivers an economy of scale that you simply can’t achieve with alternative approaches.
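To show why attribute-based rules need far fewer policies than per-table grants, here is a minimal, generic ABAC sketch. It is not Okera's policy language or API: the `Policy` and `Column` classes, the tag names such as `pii.email`, and the role names are all hypothetical. Each rule is keyed on a column attribute, so it covers every table whose columns carry that attribute, and registering a new table adds no new policy.

```python
from dataclasses import dataclass, field

# Hypothetical ABAC model for illustration only; not Okera's API.

@dataclass(frozen=True)
class Policy:
    """Rule that applies to any column carrying `tag`."""
    tag: str
    allowed_roles: frozenset   # roles that see the raw value
    action_if_denied: str      # "mask" or "hide"

@dataclass
class Column:
    name: str
    tags: set = field(default_factory=set)

def decide(column: Column, user_roles: set, policies: list) -> str:
    """Return 'allow', 'mask', or 'hide' for one column and one user."""
    decision = "allow"
    for policy in policies:
        applies = policy.tag in column.tags
        if applies and not (user_roles & policy.allowed_roles):
            # Most restrictive outcome wins: hide > mask > allow.
            if policy.action_if_denied == "hide":
                return "hide"
            decision = "mask"
    return decision

# Two rules cover every table whose columns carry these (hypothetical) tags.
POLICIES = [
    Policy("pii.email", frozenset({"support"}), "mask"),
    Policy("pii.ssn", frozenset({"compliance"}), "hide"),
]

customers = [Column("id"), Column("email", {"pii.email"}), Column("ssn", {"pii.ssn"})]
orders = [Column("order_id"), Column("buyer_email", {"pii.email"})]
for table in (customers, orders):
    print({c.name: decide(c, {"analyst"}, POLICIES) for c in table})
```

For an `analyst` user, `email` and `buyer_email` come back masked and `ssn` hidden, even though no rule names either table.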
Solution at a Glance
Take an agile approach to data authorization. Start with the end in mind. Complete one use case and accelerate your expansion as you learn from the Data Usage Intelligence that Okera provides.
1. Set business requirements: The security and governance teams contribute sensitive data attributes and data authorization policies.
2. Classify and register: Leverage machine learning to classify your S3 data. Then the data owner registers data with the policies defined in step 1.
3. Bootstrap: The platform owner configures Amazon EMR to enforce data authorization policies using Okera nScale.
4. Dynamic authorization: Users work with the Amazon EMR cluster as normal, using their preferred tools, and Okera dynamically authorizes each request.
5. Iterate: Optimize, improve, and expand with real-world data usage intelligence.
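Step 2 uses machine learning to classify S3 data. As a much simpler stand-in (a swapped-in technique, not Okera's classifier), the sketch below tags columns by matching sample values against regular expressions. The tag names, patterns, and the 80% match threshold are hypothetical. Its output, a set of attributes per column, is the kind of metadata that attribute-based policies from step 1 can key on.

```python
import re

# Hypothetical tag -> pattern rules; a stand-in for ML-based classification.
PATTERNS = {
    "pii.email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "pii.ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}

def classify_column(samples, threshold=0.8):
    """Return tags whose pattern matches at least `threshold` of non-empty samples."""
    values = [str(v).strip() for v in samples if v not in (None, "")]
    if not values:
        return set()
    tags = set()
    for tag, pattern in PATTERNS.items():
        hits = sum(1 for v in values if pattern.match(v))
        if hits / len(values) >= threshold:
            tags.add(tag)
    return tags

def classify_table(table):
    """Map each column name to inferred tags; `table` maps column -> sample values."""
    return {name: classify_column(samples) for name, samples in table.items()}

# Hypothetical sample rows pulled from an S3 dataset.
sample = {
    "email": ["a@example.com", "b@example.org", None],
    "ssn": ["123-45-6789", "987-65-4321"],
    "city": ["Austin", "Boston"],
}
print(classify_table(sample))
```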
How nScale Policy Enforcement Works on Amazon EMR
nScale Process Isolation for Security
Okera intercepts data requests sent to Spark, Hive, or Presto and authorizes queries off-cluster using metadata stored in the AWS Glue Data Catalog or a Hive Metastore.
Authorized data access requests are then delegated to Okera nScale, a high-performance, distributed data access control layer that runs co-located on the EMR cluster in a secure, isolated process. Okera nScale receives temporary privileges to access data in S3 directly, for the exclusive purpose of applying data authorization policies such as filtering rows, hiding columns, and masking values.
Okera nScale streams the authorized data on-cluster to the Spark, Hive, or Presto framework for compute processing.
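The policy-application stage of this flow can be sketched as a Python generator, so each row is filtered, trimmed, and masked as it streams through rather than being written out as a filtered copy. This is an illustrative model, not nScale's implementation: reading from S3 and the off-cluster authorization call are out of scope, and the `AccessPlan` structure, the HMAC-based masking, and the placeholder key are hypothetical choices.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, Set

Row = Dict[str, object]

# Placeholder secret for illustration; a real system would manage keys securely.
MASK_KEY = b"example-placeholder-key"

@dataclass
class AccessPlan:
    """Hypothetical result of authorizing one user's query on one table."""
    row_filter: Callable[[Row], bool]  # keep a row only if this returns True
    hidden: Set[str]                   # columns removed from the output
    masked: Set[str]                   # columns replaced by a keyed hash

def mask(value: object) -> str:
    # HMAC-SHA256 is deterministic for a given key, so equal inputs still match
    # (joins on masked columns keep working) without exposing the raw value.
    digest = hmac.new(MASK_KEY, str(value).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]

def enforce(rows: Iterable[Row], plan: AccessPlan) -> Iterator[Row]:
    """Apply the plan lazily, yielding authorized rows one at a time."""
    for row in rows:
        if not plan.row_filter(row):
            continue  # row-level filter
        out = {}
        for col, val in row.items():
            if col in plan.hidden:
                continue  # column hiding
            out[col] = mask(val) if col in plan.masked else val
        yield out

rows = [
    {"id": 1, "region": "EU", "email": "a@example.com", "ssn": "123-45-6789"},
    {"id": 2, "region": "US", "email": "b@example.com", "ssn": "987-65-4321"},
]
plan = AccessPlan(row_filter=lambda r: r["region"] == "EU", hidden={"ssn"}, masked={"email"})
for r in enforce(rows, plan):
    print(r)
```

A keyed hash is used instead of a plain one because unsalted hashes of low-entropy values such as SSNs can be reversed by brute force.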
nScale Co-Location for Elasticity and Performance
Okera nScale co-location provides the elastic scalability needed for big data environments. Because nScale is the secure data access control layer, it must run alongside your compute frameworks, such as Spark, Hive, or Presto, on every node. Simply install nScale with a bootstrap action so that it loads on each node as your Amazon EMR cluster scales up and down. Okera nScale handles secure data access, and your compute framework takes care of the rest. It’s that easy.
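Amazon EMR bootstrap actions run on each node as it is provisioned, including nodes added when a running cluster is resized, which is what lets a co-located process follow the cluster as it scales. The sketch below only builds the `BootstrapActions` argument in the shape accepted by boto3's EMR `run_job_flow`. The script URI, its arguments, and the action name are hypothetical placeholders, not Okera's published bootstrap script.

```python
from typing import Dict, List

def bootstrap_actions(script_uri: str, args: List[str]) -> List[Dict]:
    """Build a BootstrapActions list for boto3's EMR run_job_flow.

    EMR runs each listed script on every node when it is provisioned,
    including nodes added later when the cluster scales out.
    """
    return [
        {
            "Name": "Install data access layer",  # hypothetical label
            "ScriptBootstrapAction": {"Path": script_uri, "Args": list(args)},
        }
    ]

# Hypothetical bucket, script, and arguments; substitute your own.
actions = bootstrap_actions(
    "s3://example-bucket/bootstrap/install-nscale.sh",
    ["--config", "s3://example-bucket/bootstrap/config.json"],
)
print(actions)
```

The list would be passed as `boto3.client("emr").run_job_flow(..., BootstrapActions=actions)` together with the other cluster settings, such as `Name` and `Instances`.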
Zero Trust Approach
With nScale process isolation, user code does not access S3 and therefore never sees full-fidelity data. Okera allows organizations to implement a Zero Trust approach to data access, in which the cluster itself has no IAM access to S3. Instead, Okera provisions data dynamically, in the approved format, according to current policies.
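One generic AWS pattern for temporary, narrowly scoped S3 access is an STS `AssumeRole` call with a session policy: the resulting credentials carry only the intersection of the role's permissions and the session policy. Whether Okera uses this exact mechanism is not stated here; the sketch is an assumption for illustration, and the bucket and prefix are hypothetical. It builds a read-only session policy for a single dataset prefix.

```python
import json

def scoped_read_policy(bucket: str, prefix: str) -> str:
    """Session policy allowing read-only access to one S3 prefix.

    Passed as the Policy argument of STS AssumeRole, it narrows the
    temporary credentials to this policy intersected with the role's own.
    """
    prefix = prefix.strip("/")
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {   # read objects under the dataset prefix only
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}/*"],
            },
            {   # list keys, restricted to the same prefix
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
        ],
    }
    return json.dumps(policy)

# Hypothetical bucket and dataset prefix.
doc = scoped_read_policy("example-datalake", "/sales/orders/")
print(doc)
```

The document would be supplied as `boto3.client("sts").assume_role(RoleArn=..., RoleSessionName=..., Policy=doc, DurationSeconds=900)`, while the cluster's own instance profile carries no S3 permissions at all.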
Cost Savings & Reduced Attack Surface
In practical economic terms, Okera can deliver significant cost reductions in AWS storage and processing fees. Okera can also significantly reduce the attack surface.
How? Instead of replicating and managing hundreds or even tens of thousands of curated data sets to support different users and projects, you can maintain a single, current version of the data securely on S3, provision it dynamically, and audit it centrally. You pay less because you no longer store redundant versions of data, and you reduce the chance of data copies going rogue and falling into the wrong hands.
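The storage side of this argument is simple arithmetic. If you keep k curated copies of a dataset of size S alongside the original, you store (k + 1) × S; a single governed copy stores only S, so the saving is k × S times your per-GB price. The sketch treats every curated copy as a full copy, which makes it an upper bound (real curated copies are often subsets). The price is a placeholder you supply, not a quoted AWS rate, and the example figures are illustrative only.

```python
def monthly_storage_saving(dataset_gb: float, curated_copies: int,
                           price_per_gb_month: float) -> float:
    """Storage cost avoided by serving one governed copy instead of
    keeping `curated_copies` extra full copies (an upper bound)."""
    if dataset_gb < 0 or curated_copies < 0 or price_per_gb_month < 0:
        raise ValueError("inputs must be non-negative")
    with_copies = (curated_copies + 1) * dataset_gb * price_per_gb_month
    single_copy = dataset_gb * price_per_gb_month
    return with_copies - single_copy  # equals curated_copies * dataset_gb * price

# Illustrative inputs: 500 GB dataset, 20 curated copies, placeholder price.
print(round(monthly_storage_saving(500, 20, 0.02), 2))
```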
For Amazon EMR specifically, using Okera can present significant cost savings by making it practical to run one multi-tenant EMR cluster to support a wide variety of users and tools with the confidence that data is securely accessed and used responsibly.
Why Choose Okera for Amazon EMR
Complexity is the enemy of security. If managing data authorization gets harder over time, you have a security gap, not a working solution. Alternative approaches are unnecessarily complicated to set up, and become increasingly unwieldy when adding more data, users, use cases, and tools.
Okera’s approach gets easier to use over time. Registering new data, onboarding new users, and updating policies can be automated and validated within minutes or even seconds. Choose Okera for complete, consistent, and clear universal data authorization that works at scale as demand increases and business requirements evolve.