Simplify fine-grained data access control for Amazon EMR
Enjoy the cost savings and flexibility you want from Amazon EMR and the sensitive data protection you need from Okera.
Okera helps the world’s largest organizations analyze big data safely, securely, and responsibly.
Struggling to use big data responsibly at the scale and velocity required to innovate?
Okera nScale co-locates on your Amazon EMR cluster, so no matter how big your data lake is or how many compute nodes spin up, Okera hums along to protect every query.
AIRSIDE 2022 KEYNOTE
FINRA’s mission: safeguard financial markets.
Their challenges: expand analytics, manage costs, and enforce data security and privacy at big data scale.
Big Data Presents Unique Security Challenges
The separation of storage and compute is one of the most impactful and consequential innovations in modern computing. But the separation introduces a data security gap. Without an integrated database, where do you define data access controls? Nowhere? Everywhere?
Ambiguity is risk. It holds back companies that want to migrate sensitive data workloads to the cloud but are reluctant because they don’t know whether they’ll be able to comply with data security and privacy regulations.
Solve Your Hardest Data Access Governance Problem
With Okera, you can have it all: the agility of cloud computing, cost benefits of separation of storage and compute, collaboration with non-technical data stakeholders to accelerate compliance with data privacy regulations, and better security at lower effort.
- Advanced yet simple-to-manage data access management
- Centralized IT control with the ability to delegate authority and accountability to business, security, and privacy stakeholders
- Powerfully simple row-level security dramatically reduces policy complexity
- Fine-grained access control (FGAC) down to the column, row, and cell level
- Attribute-based access control (ABAC) reduces errors and enables economies of scale
With Okera, data policies are separate from data compute, which is separate from data storage. Create and manage platform-agnostic policies in Okera, configure EMR to bootstrap the nScale enforcement fleet, and you’re done!
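To make the attribute-based idea from the list above concrete, here is a minimal Python sketch. It is not Okera's policy language or API; the `Policy` class, the column tags, and the role names are all hypothetical, chosen only to show how one tag-based rule can cover many columns without naming each one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical rule: users holding `role` may read columns tagged `tag`."""
    role: str
    tag: str


# Hypothetical catalog: each column carries a set of attribute tags.
COLUMN_TAGS = {
    "customer_id": {"identifier"},
    "email": {"pii"},
    "order_total": {"financial"},
}

# Three rules cover the whole catalog; a new column tagged "pii"
# would be governed without writing a new rule.
POLICIES = [
    Policy("analyst", "financial"),
    Policy("analyst", "identifier"),
    Policy("privacy_officer", "pii"),
]


def readable_columns(user_roles, column_tags, policies):
    """Return the sorted column names the given roles are allowed to read."""
    allowed_tags = {p.tag for p in policies if p.role in user_roles}
    return sorted(
        col for col, tags in column_tags.items() if tags & allowed_tags
    )


analyst_view = readable_columns({"analyst"}, COLUMN_TAGS, POLICIES)
```

In this sketch an analyst sees `customer_id` and `order_total` but not `email`, because no rule connects the analyst role to the `pii` tag.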
Okera nScale for Amazon EMR
Okera nScale is a distributed data policy enforcement fleet that runs on Amazon EMR. It is a data security control layer that operates between your S3 data lake and popular compute frameworks such as Spark, Hive, and Presto.
Prior to Okera, data authorizations were organic and inconsistent. Okera helps us bring everything to the center.
Senior Director of Data Management, FINRA - Financial Industry Regulatory Authority
VIDEO: Okera nScale for Amazon EMR
See Okera nScale run colocated on an EMR cluster:
- S3 buckets are locked down following a zero-trust model
- Queries are issued using Spark and Presto
- PII (personally identifiable information) is protected through filters and transformations
- Run time: 00:05:50
Zero Trust in Practice
With Okera, you can implement zero trust: simply deny EMR clusters all access to S3. No more managing complex IAM roles for each cluster or reconciling user roles.
Fewer configuration requirements mean fewer opportunities for error.
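As a rough illustration of what denying a cluster all access to S3 could look like, the Python sketch below builds a standard IAM policy document with a single explicit deny. This is an assumption about one way to express the idea, not Okera's documented configuration; the statement ID is made up, and a real cluster may still need narrowly scoped S3 access for things such as logs or bootstrap scripts.

```python
import json


def deny_all_s3_policy():
    """Build an IAM policy document that explicitly denies every S3 action.

    Illustrative only: the Sid is a hypothetical label, and attaching this
    to a cluster's instance role is an assumption, not a vendor instruction.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyAllS3FromClusterRole",
                "Effect": "Deny",
                "Action": "s3:*",
                "Resource": "*",
            }
        ],
    }


policy_json = json.dumps(deny_all_s3_policy(), indent=2)
print(policy_json)
```

Because an explicit deny in IAM overrides any allow, a single statement like this replaces per-cluster allow lists, which is the simplification the section describes.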
Secure Data Access Isolation
Your compute engine (Spark, Hive, Presto) receives user query requests, and through a lightweight plugin reaches out to the Okera policy engine off-cluster for authorization.
The Okera policy engine vends temporary credentials to nScale, not to the compute engine. nScale processes are co-located on-node with Spark, Hive, or Presto workers. User code, including custom UDFs, never touches the data lake.
Data access is delegated to Okera nScale so it can securely retrieve data from specific S3 buckets. Within this isolated process, nScale applies data authorization policies, such as dynamic row-level filters, hiding columns, and data tokenization and masking.
nScale then streams cleaned, authorized data to the compute workers for analytics and business logic processing.
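The enforcement step described above (filter rows, hide columns, mask or tokenize values, then stream the result) can be sketched in a few lines of Python. This is a toy model under stated assumptions, not nScale's implementation: `enforce`, `mask_email`, `tokenize`, the salt, and the sample rows are all hypothetical.

```python
import hashlib


def tokenize(value, salt="demo-salt"):
    """Replace a value with a deterministic token (truncated salted SHA-256)."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


def mask_email(value):
    """Keep the first character and the domain; mask the rest of the local part."""
    local, _, domain = value.partition("@")
    return local[:1] + "***@" + domain


def enforce(rows, row_filter, hidden_columns, transforms):
    """Yield only authorized rows, with hidden columns dropped and
    per-column transforms (masking, tokenization) applied."""
    for row in rows:
        if not row_filter(row):
            continue  # row-level filter: the caller never sees this row
        cleaned = {}
        for column, value in row.items():
            if column in hidden_columns:
                continue  # column hiding
            transform = transforms.get(column)
            cleaned[column] = transform(value) if transform else value
        yield cleaned


raw_rows = [
    {"region": "EU", "email": "ann@example.com", "ssn": "123-45-6789", "total": 10},
    {"region": "US", "email": "bob@example.com", "ssn": "987-65-4321", "total": 20},
]

authorized = list(
    enforce(
        raw_rows,
        row_filter=lambda r: r["region"] == "EU",
        hidden_columns={"ssn"},
        transforms={"email": mask_email},
    )
)
```

Writing `enforce` as a generator mirrors the streaming behavior in the text: each row is cleaned as it passes through, so the compute worker only ever receives the authorized view.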
Co-Location for Extreme Elasticity and Performance
Okera nScale co-location provides the elastic scalability needed for big data environments.
Simply bootstrap nScale so that it loads as your Amazon EMR cluster scales up and terminates along with nodes that scale down. nScale remains in perfect sync on each node for exceptional performance and to support extreme elasticity.
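For readers unfamiliar with EMR bootstrapping, the Python sketch below builds a bootstrap action entry in the shape that the EMR `RunJobFlow` API (for example, boto3's `run_job_flow`) accepts in its `BootstrapActions` parameter. The script location, the arguments, and the action name are hypothetical placeholders; the actual nScale installer and its options are not given in this text.

```python
def nscale_bootstrap_action(script_path, args=None):
    """Build one EMR bootstrap action entry.

    The Name/ScriptBootstrapAction/Path/Args layout follows the EMR
    RunJobFlow API; the values passed in are placeholders.
    """
    return {
        "Name": "Install Okera nScale (hypothetical)",
        "ScriptBootstrapAction": {
            "Path": script_path,
            "Args": list(args or []),
        },
    }


# Hypothetical script location and arguments, for illustration only.
BOOTSTRAP_ACTIONS = [
    nscale_bootstrap_action(
        "s3://example-bucket/bootstrap/install-nscale.sh",
        ["--policy-endpoint", "https://okera.example.internal"],
    )
]
```

EMR runs bootstrap actions on each node as it is provisioned, including nodes added when the cluster grows, which is why this mechanism suits a component that must follow the cluster's size.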
Cost Savings & Reduced Attack Surface
Instead of replicating data into multiple security zones, with Okera you can maintain a single authoritative version of your data.
You pay less because you reduce redundancy and operating costs.
You also minimize risk because fewer data copies mean a smaller attack surface and less opportunity for data to get into the wrong hands.
Compare Okera nScale with EMR Record Server
Okera is an AWS Advanced Technology Partner.
Okera nScale and Amazon EMR Record Server address the problem of secure data access at scale.
Both use a distributed enforcement fleet that is purpose-built to enforce data policies. The fleet receives temporary credentials to retrieve data from S3 buckets, then pre-processes the data for security before sending cleaned data to the compute engine.
See how Okera nScale and Amazon EMR Record Server are different.
WEBINAR: Simplify Fine-grained Data Access Control for Amazon EMR
Learn what works and what doesn't, plus watch a live demo using Spark and Presto!
IN THE NEWS: Big Data Analytics: Top Three Data Security Mistakes
Properly securing data lakes and complying with privacy regulations are common Fortune 500 board-level concerns, and for good reason. However, most organizations still struggle to use data responsibly at the scale and velocity required to innovate. I spoke recently with a C-suite technologist in the financial services industry who flat-out said, “data lakes scare me.”
MIT CDOIQ 2021: An Inside Look at FINRA’s Massive Data & Analytics Architecture
Aaron Carreras, Vice President of Data Management and Transparency Services Technology, and Nate Weisz, Senior Director of Data Management, shared an in-depth look at FINRA's data and analytics architecture.