In this two part series we will focus on GDPR challenges facing organizations and how Okera can help solve them. This first post will cover Consent and the Right to be Forgotten. The second post will cover Pseudonymization.
- Consent and Right to be forgotten
Consent and Right to be forgotten
- Whitelists – A list of all record IDs of subjects that have given consent to the use of their data.
- Blacklists – A list of record IDs of subjects that have opted out of the use of their data.
1. First, we create datasets that point to transaction and activity data:
CREATE EXTERNAL TABLE demo.transactions(txnid BIGINT, dt_time STRING, sku STRING, userid INT, price FLOAT, creditcard STRING, ip STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY "," LOCATION "s3://cerebro-datasets/transactions" CREATE EXTERNAL TABLE demo.user_activity(dt_time STRING, userid INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY "," LOCATION "s3://cerebro-datasets/user_activity"
2. Next, we create a VIEW that filters out all inactive users:
CREATE VIEW demo.active_users AS SELECT * FROM demo.user_activity WHERE unix_timestamp(dt_time, "M/dd/yy H:mm") > unix_timestamp(months_sub(now(), 18))
3. Now, we JOIN the two datasets to filter the transactions by user activity and return only those where the respective user was active.
Note the GRANT statement to only allow members of the analyst role access to the VIEW:
CREATE VIEW demo.transactions_active_users AS SELECT t.txnid, u.dt_time AS last_active_time, t.dt_time AS transaction_time, t.sku, t.userid, t.price, t.creditcard, t.ip FROM demo.transactions t JOIN demo.active_users u ON t.userid = u.userid GRANT SELECT ON TABLE demo.transactions_active_users TO ROLE analyst_role
4. Lastly, we define a view that JOINs the transaction and user activity datasets to filter by inactive users. Here we use an OUTER JOIN to also include all transactions that had a user with no recorded activity at all. The VIEW also redacts all sensitive columns to make the result easier to share:
CREATE VIEW demo.transactions_anonymize_inactive_users AS SELECT t.txnid, decode(u.userid, NULL, "INACTIVE", u.dt_time) AS last_active_time, t.dt_time AS transaction_time, t.sku, decode(u.userid, NULL, "REDACTED", cast(t.userid AS STRING)) AS userid, t.price, decode(u.userid, NULL, "REDACTED", t.creditcard) AS ccn, decode(u.userid, NULL, "REDACTED", t.ip) AS ip FROM demo.transactions t LEFT OUTER JOIN demo.active_users u on t.userid = u.userid GRANT SELECT ON TABLE demo.transactions_anonymize_inactive_users TO ROLE Analyst_role