Who needs control?
This blog post continues the topic of Schema Design Best Practices and focuses on the access control view layer. Many of us at Okera have years to decades of IT and data processing backgrounds. Some of us were early Hadoop adopters all the way back to 2007. Initially the clusters were wide open and there was little in regards to access control available. You would use network boundaries and firewalls to restrict access to the data owners. Inside the cluster they could see and do whatever they wanted. And it was OK for the time being as those early users where hands-on and dev-ops savvy, owning the infrastructure and the data, delivering results at their convenience.
Unified Access Control
Schema Design
Access Control
CREATE
ROLE analyst_role;
GRANT
ROLE analyst_role
TO
GROUP
analysts;
CREATE
EXTERNAL
TABLE
salesdb.transactions_raw(
txnid
BIGINT
,
dt_time STRING,
sku STRING,
userid
INT
,
price
FLOAT
,
creditcard STRING,
ip STRING,
region STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED
BY
","
GRANT
SELECT
(txnid, dt_time, sku, price)
ON
TABLE
salesdb.transactions_raw
TO
ROLE analyst_role'; #
Not
recommended!
GRANT
SELECT
(txnid, dt_time, sku, price)
ON
TABLE
salesdb.transactions_raw
TO
ROLE analyst_role'; #
Not
recommended!
has_access()
function to check a related resource, like the underlying dataset. Combined with the SQL if()
function, you can dynamically return the original column value or an obfuscated version of it.CREATE
VIEW
salesdb.transactions
AS
SELECT
txnid,
dt_time,
sku,
if (has_access(
'salesdb.transactions_raw'
), userid, tokenize(userid))
as
userid,
price,
if (has_access(
'salesdb.transactions_raw'
), creditcard,mask_ccn(creditcard))
as
creditcard,
if (has_access(
'salesdb.transactions_raw'
), ip,
cast
(tokenize(ip)
as
STRING))
as
ip,
region
FROM
salesdb.transactions_raw;
GRANT
SELECT
ON
TABLE
salesdb.transactions
TO
ROLE analyst_role;
Example: Create an access control that filters based on access rights to the base table
CREATE
VIEW
salesdb.transactions
AS
SELECT
txnid,
dt_time,
sku,
if (
user
() =
"admin"
, userid, tokenize(userid))
AS
userid,
price,
if (
user
() =
"admin"
, creditcard, mask_ccn(creditcard))
AS
creditcard,
if (
user
() =
"admin"
, ip,
cast
(tokenize(ip)
AS
STRING))
AS
ip
FROM
salesdb.transactions_raw;
GRANT
SELECT
ON
TABLE
saledb.transactions
TO
ROLE analyst_role;
Example: Create a view that masks data based on the user accessing it
has_roles()
function and make decision based on the role instead.CREATE
VIEW
salesdb.transactions
AS
SELECT
txnid,
dt_time,
sku,
if (has_roles(
'sales_admins'
), userid, tokenize(userid))
AS
userid,
price,
if (has_roles(
'sales_admins'
), creditcard, mask_ccn(creditcard))
AS
creditcard,
if (has_roles(
'sales_admins'
), ip,
cast
(tokenize(ip)
AS
STRING))
AS
ip
FROM
salesdb.transactions_raw;
has_roles()
really helps is in selections. For example, you can combine the above view with another that filters out rows that are outside of the roles region.CREATE
VIEW
salesdb.sales_per_region
AS
SELECT
*
FROM
salesdb.transactions
WHERE
has_roles(
'de_role'
)
AND
region =
'de'
OR
has_roles(
'gbr_role'
)
AND
region =
'gbr'
;
Advanced Functionality
region
, which includes a country code.We can make use of that fact and filter rows based on roles that match those regions.CREATE
ROLE de;
CREATE
ROLE gbr;
CREATE
VIEW
salesdb.sales_per_region
AS
SELECT
*
FROM
salesdb.transactions
WHERE
has_roles(
lower
(region));
GRANT
ROLE de
TO
GROUP
analysts_de;
presto>
select
*
from
jdbc_demo.sales_per_region limit 5;
txnid | dt_time | sku | userid | price | creditcard | ip | region
------+---------------------+-------+--------+-------+---------------------+----------------------+--------
483 | 08/21/2016 09:49 AM | sku87 | 82364 | 810.0 | XXXX-XXXX-XXXX-6331 | -8418315997702349334 | DE
579 | 06/15/2016 05:05 AM | sku52 | 59072 | 400.0 | XXXX-XXXX-XXXX-5804 | -5706330635178565995 | DE
948 | 05/16/2016 02:05 PM | sku73 | 34515 | 670.0 | XXXX-XXXX-XXXX-2537 | 2315370975016102511 | DE
1465 | 03/17/2016 04:18 AM | sku50 | 46605 | 250.0 | XXXX-XXXX-XXXX-5932 | 2326400379827501408 | DE
2137 | 11/05/2016 07:23 AM | sku06 | 20626 | 540.0 | XXXX-XXXX-XXXX-9349 | -5031361145655993176 | DE
(5
rows
)
Query 20180817_133516_00001_iixrd, FINISHED, 1 node
Splits: 18 total, 18 done (100.00%)
0:01 [1.95K
rows
, 0B] [1.47K
rows
/s, 0B/s]