Simplify attribute-based and fine-grained data access control
Enjoy the cost savings and flexibility you want from Amazon EMR and the sensitive data protection you need from Okera.
Okera helps the world’s largest organizations analyze big data safely, securely, and responsibly.
Struggling to use big data responsibly at the scale and velocity required to innovate?
Okera nScale co-locates on your Amazon EMR cluster, so no matter how big your data lake, or how many compute nodes spin up, Okera hums along to protect every query.
Big Data Presents Unique Security Challenges
The separation of storage and compute is one of the most impactful and consequential innovations in modern computing. But the separation introduces a data security gap. Without an integrated database, where do you define data access controls? Nowhere? Everywhere?
Ambiguity is risk. It holds back companies that want to migrate sensitive data workloads to the cloud but are reluctant because they don’t know whether they’ll be able to comply with data security and privacy regulations.
Solve Your Hardest Data Access Governance Problem
With Okera, you can have it all: the agility of cloud computing, cost benefits of separation of storage and compute, collaboration with non-technical data stakeholders to accelerate compliance with data privacy regulations, and better security at lower effort.
- Advanced yet simple-to-manage data access management
- Centralized IT control with the ability to delegate authority and accountability to business, security, and privacy stakeholders
- Powerfully simple row-level security dramatically reduces policy complexity
- Fine-grained access control (FGAC) down to the column, row, and cell level
- Attribute-based access control (ABAC) reduces errors and enables economy of scale
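To make the ABAC and fine-grained ideas above concrete, here is a minimal, self-contained sketch. It is not Okera's policy language or API. Every name in it (`User`, `COLUMN_TAGS`, `TAG_RULES`, `visible_columns`, `authorize`) is hypothetical. It shows the core pattern: columns carry tags, users carry attributes, one rule per tag decides column visibility, and a row-level filter narrows the rows.

```python
from dataclasses import dataclass

# Hypothetical ABAC sketch; none of these names come from Okera.
# Columns carry tags, users carry attributes, and one rule per tag
# decides who may read columns carrying that tag.


@dataclass(frozen=True)
class User:
    name: str
    attributes: frozenset  # e.g. frozenset({"pii_reader"})
    region: str            # drives the row-level filter below


# Column -> tags. Tagging a new column is all it takes to bring it
# under the existing rules; no per-user grants are needed.
COLUMN_TAGS = {
    "customer_id": set(),
    "email": {"pii"},
    "ssn": {"pii", "restricted"},
    "region": set(),
    "balance": set(),
}

# Tag -> user attribute required to read columns carrying that tag.
TAG_RULES = {"pii": "pii_reader", "restricted": "compliance_officer"}


def visible_columns(user):
    """Column-level control: keep a column only if the user satisfies every tag on it."""
    return [
        col for col, tags in COLUMN_TAGS.items()
        if all(TAG_RULES[tag] in user.attributes for tag in tags)
    ]


def authorize(user, rows):
    """Row-level filter (own region only), then project to the allowed columns."""
    cols = visible_columns(user)
    return [
        {c: row[c] for c in cols}
        for row in rows
        if row["region"] == user.region
    ]


if __name__ == "__main__":
    rows = [
        {"customer_id": 1, "email": "a@example.com", "ssn": "111-11-1111",
         "region": "us", "balance": 10},
        {"customer_id": 2, "email": "b@example.com", "ssn": "222-22-2222",
         "region": "eu", "balance": 20},
    ]
    analyst = User("ana", frozenset({"pii_reader"}), "us")
    print(authorize(analyst, rows))  # one US row, no ssn column
```

The design point is that the number of rules grows with the number of tags, not with users multiplied by columns. That is where ABAC's "economy of scale" comes from.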
With Okera, data policies are separate from data compute, which is separate from data storage. Create and manage platform-agnostic policies in Okera, configure EMR to bootstrap the nScale enforcement fleet, and you’re done!
Okera nScale for Amazon EMR
Okera nScale is a distributed data policy enforcement fleet that runs on Amazon EMR. It is a data security control layer that operates between your S3 data lake and popular compute frameworks such as Spark, Hive, and Presto.
With OkeraEnsemble, FINRA can look to enable self-service data analytics by providing highly secure access to a wide variety of structured and unstructured data through latest generation analytical platforms. And, we can envision protecting the data from our enterprise data lake and gain the benefits of centralized entitlement policies and audit trails across hundreds of petabytes of data without having to write complex IAM policies.
Senior Principal Architect, FINRA
Strategic companies like Okera are providing a tremendous value-add for our customers who rely on Amazon EMR for petabyte-scale data processing, interactive analytics, and machine learning using open-source frameworks such as Apache Spark, Apache Hive, and Presto. Thanks to solutions like OkeraEnsemble on Amazon EMR, our customers can accelerate time to value with Amazon EMR and Amazon SageMaker by leveraging both structured and unstructured Amazon S3 data files quickly and easily while ensuring security and governance at scale that work well with existing AWS services.
General Manager, Amazon EMR at AWS
Prior to Okera, data authorizations were organic and inconsistent. Okera helps us bring everything to the center.
Senior Director of Data Management, FINRA - Financial Industry Regulatory Authority
Zero Trust in Practice
With Okera, you can implement zero trust: simply deny EMR clusters all access to S3. No more managing complex IAM roles for each cluster or reconciling user roles.
Fewer configuration requirements mean fewer opportunities for error.
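One way to express "deny EMR clusters all access to S3" is an explicit-deny IAM policy. The sketch below uses standard IAM policy JSON. Attaching it to the cluster's EC2 instance profile role is an assumption about deployment, not Okera's documented setup. Also, a real cluster may still need S3 access for its own logs or bootstrap scripts, which would require narrowing the Deny's scope. The `is_allowed` helper is a deliberately simplified model of IAM evaluation. It covers actions only and ignores resources, conditions, and list-valued `Action` fields. It illustrates why this approach is robust: an explicit Deny overrides any Allow.

```python
import fnmatch
import json

# Standard IAM policy JSON: explicitly deny every S3 action on every resource.
# Attaching it to the EMR instance profile role is an assumed deployment choice.
DENY_DIRECT_S3 = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyDirectS3FromCluster",
            "Effect": "Deny",
            "Action": "s3:*",
            "Resource": "*",
        }
    ],
}


def is_allowed(statements, action):
    """Simplified IAM-style evaluation over actions only.

    An explicit Deny always wins, then an explicit Allow; anything else is an
    implicit deny. IAM action names are case-insensitive, so compare lowercased.
    """
    matching = [
        s for s in statements
        if fnmatch.fnmatchcase(action.lower(), s["Action"].lower())
    ]
    if any(s["Effect"] == "Deny" for s in matching):
        return False
    return any(s["Effect"] == "Allow" for s in matching)


if __name__ == "__main__":
    print(json.dumps(DENY_DIRECT_S3, indent=2))
    # Even if another policy on the role still allows reads, the explicit Deny wins.
    legacy_allow = {"Effect": "Allow", "Action": "s3:GetObject", "Resource": "*"}
    print(is_allowed(DENY_DIRECT_S3["Statement"] + [legacy_allow], "s3:GetObject"))  # False
```

With the cluster unable to read S3 directly, the only path to the data is through the enforcement layer that holds its own credentials.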
Secure Data Access Isolation
Your compute engine (Spark, Hive, Presto) receives user query requests, and through a lightweight plugin reaches out to the Okera policy engine off-cluster for authorization.
The Okera policy engine vends temporary credentials to nScale, not to the compute engine. nScale processes are co-located on each node with the Spark, Hive, or Presto workers. User code, including custom UDFs, never touches the data lake.
Data access is delegated to Okera nScale so that it can securely retrieve data from specific S3 buckets. Within this isolated process, nScale applies data authorization policies such as dynamic row-level filters, column hiding, and data tokenization and masking.
nScale then streams cleaned, authorized data to the compute workers for analytics and business logic processing.
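The isolation flow described above can be simulated in memory to show who holds what. This is a hypothetical sketch, not Okera code. `PolicyEngine`, `TempCredential`, `nscale_read`, and the policy dictionary shape are all illustrative names. A plain dict stands in for S3, and the tokenization key is a hard-coded demo value where a real system would use a managed secret. The key properties are that credentials go only to the nScale role and that compute receives only already-filtered, masked, and tokenized rows.

```python
import hashlib
import hmac
from dataclasses import dataclass

# Hypothetical in-memory simulation; names are illustrative, not Okera APIs.


@dataclass(frozen=True)
class TempCredential:
    holder: str  # process the credential was vended to
    bucket: str  # the only bucket it can read


class PolicyEngine:
    """Off-cluster policy engine: authorizes requests, vends credentials to nScale only."""

    def __init__(self, policies):
        # user -> {"filter": callable, "drop": set, "mask": list, "tokenize": list}
        self.policies = policies

    def authorize(self, user, bucket, requester):
        if requester != "nscale":
            raise PermissionError("credentials are vended to nScale, never to compute")
        if user not in self.policies:
            raise PermissionError(f"{user} has no access")
        return TempCredential(holder=requester, bucket=bucket), self.policies[user]


def tokenize(value, key=b"demo-key"):
    """Deterministic token: equal inputs yield equal tokens, so joins still work.
    The hard-coded key is for the demo only."""
    return hmac.new(key, str(value).encode(), hashlib.sha256).hexdigest()[:12]


def nscale_read(engine, user, bucket, storage):
    """Runs in the isolated nScale process: fetch, enforce, then stream cleaned rows."""
    cred, policy = engine.authorize(user, bucket, requester="nscale")
    for row in storage[cred.bucket]:
        if not policy["filter"](row):
            continue  # dynamic row-level filter
        clean = {k: v for k, v in row.items() if k not in policy["drop"]}  # hide columns
        for col in policy["mask"]:
            clean[col] = "****"  # masking
        for col in policy["tokenize"]:
            clean[col] = tokenize(clean[col])  # tokenization
        yield clean  # compute workers only ever see rows like this


if __name__ == "__main__":
    storage = {"claims-bucket": [
        {"id": 1, "state": "NY", "ssn": "111-11-1111", "email": "a@example.com", "amount": 100},
        {"id": 2, "state": "CA", "ssn": "222-22-2222", "email": "b@example.com", "amount": 250},
    ]}
    engine = PolicyEngine({"analyst": {
        "filter": lambda r: r["state"] == "NY",
        "drop": {"ssn"}, "mask": [], "tokenize": ["email"],
    }})
    for row in nscale_read(engine, "analyst", "claims-bucket", storage):
        print(row)
```

Using a generator mirrors the "streams cleaned, authorized data" step. Rows are enforced one at a time as they are read rather than being staged in a separate, sanitized copy.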
Co-Location for Extreme Elasticity and Performance
Okera nScale co-location provides the elastic scalability needed for big data environments.
Simply bootstrap nScale so that it loads on each new node as your Amazon EMR cluster scales up and terminates along with the nodes that scale down. nScale stays in sync with the cluster's nodes, delivering exceptional performance and supporting extreme elasticity.
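EMR runs bootstrap actions on each node as it joins the cluster, including nodes added later by resizing. That is what lets an installer keep pace with scaling. The sketch below builds the `BootstrapActions` portion of a boto3 EMR `run_job_flow` request. The installer path, the `--policy-engine` argument, and the endpoint are hypothetical placeholders. Take the real values from Okera's own documentation.

```python
# Builds the BootstrapActions parameter in the shape boto3's EMR run_job_flow expects.
# The script path, its arguments, and the endpoint are hypothetical placeholders.


def nscale_bootstrap_actions(script_path, policy_engine_endpoint):
    """Return a BootstrapActions list that installs an enforcement process on every node."""
    return [
        {
            "Name": "Install Okera nScale",  # label shown in the EMR console
            "ScriptBootstrapAction": {
                "Path": script_path,  # S3 location of the installer script
                "Args": ["--policy-engine", policy_engine_endpoint],  # hypothetical flag
            },
        }
    ]


if __name__ == "__main__":
    actions = nscale_bootstrap_actions(
        "s3://example-bucket/okera/install-nscale.sh",  # hypothetical path
        "https://okera.example.internal",               # hypothetical endpoint
    )
    print(actions)
    # With AWS credentials configured, this list would be passed as
    # boto3.client("emr").run_job_flow(..., BootstrapActions=actions)
```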
Cost Savings & Reduced Attack Surface
Instead of replicating data into multiple security zones, with Okera you can maintain a single authoritative version of your data.
You pay less because you reduce redundancy and operating costs.
You also minimize risk because fewer data copies mean a smaller attack surface and less opportunity for data to get into the wrong hands.
Compare Okera nScale with EMR Record Server
Okera is an AWS Advanced Technology Partner.
Okera nScale and Amazon EMR Record Server address the problem of secure data access at scale.
Both use a distributed enforcement fleet that is purpose-built to enforce data policies. The fleet receives temporary credentials to retrieve data from S3 buckets, then preprocesses the data for security before sending cleaned data to the compute engine.
See how Okera nScale and Amazon EMR Record Server are different.
BLOG: Extend Policies to Amazon S3 and Elastically Scale Provisioning with Amazon EMR Clusters
Grant access to data analysts and scientists while restricting access to sensitive and governed data
WHITEPAPER: File Access Control with ABAC and nScale on Amazon EMR
Use one tool to authorize access to files with structured and unstructured data on an object store (S3) through attribute-based access control (ABAC). Natively scale on Amazon EMR.
WEBINAR: Simplify Fine-grained Data Access Control for Amazon EMR
Learn what works and what doesn't, plus watch a live demo using Spark and Presto!