Simplify fine-grained data access control for Amazon EMR

Enjoy the cost savings and flexibility you want from Amazon EMR and the sensitive data protection you need from Okera.


Okera helps the world’s largest organizations analyze big data safely, securely, and responsibly.

Struggling to use big data responsibly at the scale and velocity required to innovate?

Okera nScale co-locates on your Amazon EMR cluster, so no matter how big your data lake, or how many compute nodes spin up, Okera hums along to protect every query.

AIRSIDE 2022 KEYNOTE

FINRA’s mission: safeguard financial markets.

Their challenges: expand analytics, manage costs, and enforce data security and privacy at big data scale.

Big Data Presents Unique Security Challenges

The separation of storage and compute is one of the most impactful and consequential innovations in modern computing. But the separation introduces a data security gap. Without an integrated database, where do you define data access controls? Nowhere? Everywhere?

Ambiguity is risk. It holds back companies that want to migrate sensitive data workloads to the cloud but don’t know whether they’ll be able to comply with data security and privacy regulations.

Solve Your Hardest Data Access Governance Problem

With Okera, you can have it all: the agility of cloud computing, cost benefits of separation of storage and compute, collaboration with non-technical data stakeholders to accelerate compliance with data privacy regulations, and better security at lower effort.

  • Advanced yet simple-to-manage data access management
  • Centralized IT control with the ability to delegate authority and accountability to business, security, and privacy stakeholders
  • Powerfully simple row-level security dramatically reduces policy complexity
  • Fine-grained access control (FGAC) down to the column, row, and cell level
  • Attribute-based access control (ABAC) reduces errors and enables economy of scale
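To make the ABAC idea concrete, here is a minimal sketch in plain Python. It is not Okera's API or policy syntax. The `Column` type, the tag names, and the `visible_columns` helper are all hypothetical, chosen only to show how a single rule keyed on a tag can cover every column that carries that tag.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Column:
    """A column plus the classification tags attached to it (hypothetical model)."""
    name: str
    tags: frozenset = field(default_factory=frozenset)


def visible_columns(columns, user_attributes, required_attribute_by_tag):
    """Return the names of the columns a user may read.

    A column is visible only if the user holds the attribute required
    by every restricted tag on that column. Untagged columns are open.
    """
    held = set(user_attributes)
    visible = []
    for col in columns:
        needed = {
            required_attribute_by_tag[tag]
            for tag in col.tags
            if tag in required_attribute_by_tag
        }
        if needed <= held:
            visible.append(col.name)
    return visible


# Illustrative data: tags and attribute names are assumptions.
columns = [
    Column("order_id"),
    Column("email", frozenset({"pii"})),
    Column("ssn", frozenset({"pii", "regulated"})),
]
rules = {"pii": "privacy_trained", "regulated": "compliance_officer"}

analyst = visible_columns(columns, {"privacy_trained"}, rules)
intern = visible_columns(columns, set(), rules)
```

Because the rules refer to tags rather than to individual columns or users, a newly tagged column is covered without writing a new rule.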

With Okera, data policies are separate from data compute, which is separate from data storage. Create and manage platform-agnostic policies in Okera, configure EMR to bootstrap the nScale enforcement fleet, and you’re done!

Okera nScale for Amazon EMR

Okera nScale is a distributed data policy enforcement fleet that runs on Amazon EMR. It is a data security control layer that operates between your S3 data lake and popular compute frameworks such as Spark, Hive, and Presto.

Transparent to users

Seamlessly supports the most demanding EMR workloads

Bootstraps with your cluster

Ideal solution for ephemeral workloads and those with unpredictable scaling requirements

Broad compatibility and version support

Enables more business use cases on a cost-effective and powerful platform

Logos: Amazon EMR, Apache Spark, Apache Hive, Presto, Trino

Prior to Okera, data authorizations were organic and inconsistent. Okera helps us bring everything to the center.

Nate Weisz

Senior Director of Data Management, FINRA - Financial Industry Regulatory Authority

VIDEO: Okera nScale for Amazon EMR

See Okera nScale run co-located on an EMR cluster:

  • S3 buckets are locked down following a zero-trust model
  • Queries are issued using Spark and Presto
  • PII (personally identifiable information) is protected through filters and transformations
  • Run time: 00:05:50

Zero Trust in Practice

With Okera, you can implement zero trust: simply deny EMR clusters all access to S3. No more managing complex IAM roles for each cluster or reconciling user roles.

Fewer configuration requirements means fewer opportunities for error.
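As an illustration of what "deny EMR clusters all access to S3" could look like, the sketch below builds a standard IAM policy document with one explicit deny statement. This is an assumption about how the idea might be expressed, not a configuration taken from Okera's documentation. The `Sid` value is made up, and a real deployment will probably need narrower statements (for example for cluster logs or bootstrap scripts).

```python
import json


def deny_all_s3_policy():
    """Build an IAM policy document that explicitly denies every S3 action.

    Intended for attachment to the cluster's instance role, so that compute
    nodes cannot read the data lake directly (illustrative only).
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyAllS3FromClusterRole",  # hypothetical identifier
                "Effect": "Deny",
                "Action": "s3:*",
                "Resource": "*",
            }
        ],
    }


policy_json = json.dumps(deny_all_s3_policy(), indent=2)
print(policy_json)
```

In IAM, an explicit deny overrides any allow, so this single statement replaces per-cluster allow lists.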

Secure Data Access Isolation

Your compute engine (Spark, Hive, Presto) receives user query requests and, through a lightweight plugin, reaches out to the off-cluster Okera policy engine for authorization.

The Okera policy engine vends temporary credentials to nScale, not to the compute engine. nScale processes are co-located on-node with Spark, Hive, or Presto workers. User code, including custom UDFs, never touches the data lake.

Data access is delegated to Okera nScale so it can securely retrieve data from specific S3 buckets. Within this isolated process, nScale applies data authorization policies, such as dynamic row-level filters, column hiding, and data tokenization and masking.

nScale then streams cleaned, authorized data to the compute workers for analytics and business logic processing.
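The enforcement step described above (filter rows, hide columns, transform sensitive values, then stream the result) can be sketched in a few lines of plain Python. This is only a conceptual model, not nScale's implementation. The function names, the sample records, and the hash-based `tokenize` helper are hypothetical, and a production tokenizer would not use an unsalted hash.

```python
import hashlib


def mask_email(value):
    """Keep the domain and hide the local part of an email address."""
    _local, _, domain = value.partition("@")
    return "***@" + domain


def tokenize(value):
    """Replace a value with a stable token (illustrative, not secure)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def enforce(rows, row_filter, hidden_columns, transforms):
    """Yield only authorized rows, with hidden columns dropped and
    per-column transforms applied, before anything reaches compute."""
    for row in rows:
        if not row_filter(row):
            continue  # row-level filter: skip rows the user may not see
        cleaned = {}
        for name, value in row.items():
            if name in hidden_columns:
                continue  # column hiding
            transform = transforms.get(name)
            cleaned[name] = transform(value) if transform else value
        yield cleaned


# Made-up records standing in for data read from S3.
rows = [
    {"region": "EU", "email": "ana@example.com", "ssn": "123-45-6789", "card": "4111"},
    {"region": "US", "email": "bob@example.com", "ssn": "987-65-4321", "card": "4242"},
]

authorized = list(
    enforce(
        rows,
        row_filter=lambda r: r["region"] == "EU",
        hidden_columns={"ssn"},
        transforms={"email": mask_email, "card": tokenize},
    )
)
```

Because `enforce` is a generator, rows are cleaned one at a time as they are consumed, which mirrors the streaming hand-off to the compute workers.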

Co-Location for Extreme Elasticity and Performance

Okera nScale co-location provides the elastic scalability needed for big data environments.

Simply bootstrap nScale so that it loads as your Amazon EMR cluster scales up and terminates along with nodes that scale down. nScale stays in sync on each node, which supports both high performance and extreme elasticity.
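On Amazon EMR, a bootstrap action is a script that runs on cluster nodes as they are provisioned. The sketch below builds one entry in the shape expected by the `BootstrapActions` parameter of boto3's `run_job_flow`. The action name, script path, and arguments are hypothetical placeholders, since the source does not give Okera's actual installer location or flags.

```python
def nscale_bootstrap_action(script_s3_path, args=()):
    """Return one EMR bootstrap action entry.

    The dictionary shape follows the BootstrapActions parameter of
    boto3's EMR run_job_flow call; the name and arguments are placeholders.
    """
    return {
        "Name": "install-nscale",  # hypothetical action name
        "ScriptBootstrapAction": {
            "Path": script_s3_path,
            "Args": list(args),
        },
    }


# Hypothetical script location and flags, for illustration only.
bootstrap_actions = [
    nscale_bootstrap_action(
        "s3://example-bucket/bootstrap/install-nscale.sh",
        ["--policy-engine-host", "okera.example.internal"],
    )
]
```

This list would be passed as `BootstrapActions=bootstrap_actions` when creating the cluster, alongside the usual instance and application settings.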

Cost Savings & Reduced Attack Surface

Instead of replicating data into multiple security zones, with Okera you can maintain a single authoritative version of your data.

You pay less because you reduce redundancy and operating costs.

You also minimize risk because fewer data copies means a smaller attack surface and less opportunity for data to get into the wrong hands.

Compare Okera nScale with EMR Record Server

Okera is an AWS Advanced Technology Partner.

Okera nScale and Amazon EMR Record Server both address the problem of secure data access at scale.

Both use a distributed enforcement fleet that is purpose-built to enforce data policies. The fleet receives temporary credentials to retrieve data from S3 buckets, then pre-processes the data for security before sending cleaned data to the compute engine.

See how Okera nScale and Amazon EMR Record Server are different.