Who needs control?
This blog post continues the topic of Schema Design Best Practices and focuses on the access control view layer. Many of us at Okera have years to decades of IT and data processing backgrounds. Some of us were early Hadoop adopters all the way back to 2007. Initially the clusters were wide open and there was little in regards to access control available. You would use network boundaries and firewalls to restrict access to the data owners. Inside the cluster they could see and do whatever they wanted. And it was OK for the time being as those early users where hands-on and dev-ops savvy, owning the infrastructure and the data, delivering results at their convenience.
Unified Access Control
Schema Design
We recommend DBAs to follow our schema design best practices, which organizes the various dataset related tasks into layers. The lowest layer maps the raw data from their storage location into our registry as a base table. On top of that we recommend views that disconnects the user from the base tables, and which can be used to clean up issues and bring the dataset in line with corporate standards.The DBA should also use an access control view to implement any filtering or masking that needs to take place for users to see their expected results.
Access Control
Okera provides the commands to create roles which are then granted privileges, such as read access to a dataset. We also provide a set of built-in functions that can be used to further implement more complex access scenarios, which are evaluated at query time. Using the basic RBAC mechanisms, you can define roles and views, and subsequently grant access as necessary.
For example, you can create a base table and directly grant access to it.
CREATE ROLE analyst_role;
GRANT ROLE analyst_role TO GROUP analysts;
CREATE EXTERNAL TABLE salesdb.transactions_raw(
txnid BIGINT,
dt_time STRING,
sku STRING,
userid INT,
price FLOAT,
creditcard STRING,
ip STRING,
region STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ","
LOCATION "s3://acme-data/sales/transactions";
GRANT SELECT(txnid, dt_time, sku, price)
ON TABLE salesdb.transactions_raw TO ROLE analyst_role'; # Not recommended!
Example: Create a base table over files in S3 and grant limited access to it
CREATE VIEW salesdb.transactions AS
SELECT
txnid,
dt_time,
sku,
if (has_access('salesdb.transactions_raw'), userid, tokenize(userid)) as userid,
price,
if (has_access('salesdb.transactions_raw'), creditcard,mask_ccn(creditcard)) as creditcard,
if (has_access('salesdb.transactions_raw'), ip, cast(tokenize(ip) as STRING)) as ip,
region
FROM salesdb.transactions_raw;
GRANT SELECT ON TABLE salesdb.transactions TO ROLE analyst_role;
Example: Create an access control that filters based on access rights to the base table
CREATE VIEW salesdb.transactions AS
SELECT
txnid,
dt_time,
sku,
if (user() = "admin", userid, tokenize(userid)) AS userid,
price,
if (user() = "admin", creditcard, mask_ccn(creditcard)) AS creditcard,
if (user() = "admin", ip, cast(tokenize(ip) AS STRING)) AS ip
FROM salesdb.transactions_raw;
GRANT SELECT ON TABLE saledb.transactions TO ROLE analyst_role;
Example: Create a view that masks data based on the user accessing it
CREATE VIEW salesdb.transactions AS
SELECT
txnid,
dt_time,
sku,
if (has_roles('sales_admins'), userid, tokenize(userid)) AS userid,
price,
if (has_roles('sales_admins'), creditcard, mask_ccn(creditcard)) AS creditcard,
if (has_roles('sales_admins'), ip, cast(tokenize(ip) AS STRING)) AS ip
FROM salesdb.transactions_raw;
CREATE VIEW salesdb.sales_per_region AS
SELECT * FROM salesdb.transactions
WHERE has_roles('de_role') AND region = 'de' OR
has_roles('gbr_role') AND region = 'gbr';
Advanced Functionality
CREATE ROLE de;
CREATE ROLE gbr;
CREATE VIEW salesdb.sales_per_region AS
SELECT * FROM salesdb.transactions
WHERE has_roles(lower(region));
GRANT ROLE de TO GROUP analysts_de;
presto> select * from jdbc_demo.sales_per_region limit 5;
txnid | dt_time | sku | userid | price | creditcard | ip | region
------+---------------------+-------+--------+-------+---------------------+----------------------+--------
483 | 08/21/2016 09:49 AM | sku87 | 82364 | 810.0 | XXXX-XXXX-XXXX-6331 | -8418315997702349334 | DE
579 | 06/15/2016 05:05 AM | sku52 | 59072 | 400.0 | XXXX-XXXX-XXXX-5804 | -5706330635178565995 | DE
948 | 05/16/2016 02:05 PM | sku73 | 34515 | 670.0 | XXXX-XXXX-XXXX-2537 | 2315370975016102511 | DE
1465 | 03/17/2016 04:18 AM | sku50 | 46605 | 250.0 | XXXX-XXXX-XXXX-5932 | 2326400379827501408 | DE
2137 | 11/05/2016 07:23 AM | sku06 | 20626 | 540.0 | XXXX-XXXX-XXXX-9349 | -5031361145655993176 | DE
(5 rows)
Query 20180817_133516_00001_iixrd, FINISHED, 1 node
Splits: 18 total, 18 done (100.00%)
0:01 [1.95K rows, 0B] [1.47K rows/s, 0B/s]