Sunday, March 29, 2020

Row Level Security for Data Bricks Metastore Database



Problem Statement:

Team decided to maintain a common metastore database for all the source deployed in Azure Data Lake. Since the database is common across all the compute, it will allow developer to create\drop\modify tables in a database which team doesn’t own it. We need to ensure proper governance is maintained to prevent developer not to make any change to database which team doesn’t own it.

Solution Diagram:


Understand how Metastore Table are used internally:

Metastore is the central repository. It stores metadata for tables (like their schema and location) and partitions in a relational database. It provides client access to this information by using metastore service API.

                      dbo.DBS and dbo.TBLS are the two tables used by metastore whenever we create any table from any compute. DB_ID is the foreign key which used to validate between DBS and TBLS table.


What is ROW Level Security?


Row-Level Security enables you to use group membership or execution context to control access to rows in a database table. Row-Level Security (RLS) simplifies the design and coding of security in your application. RLS helps you implement restrictions on data row access.



RLS supports two types of security predicates:
·       Filter predicates silently filter the rows available to read operations (SELECT, UPDATE, and DELETE).
We’re not going to implement this predicate because we want everyone to read other source system tables.

·       Block predicates explicitly block write operations (AFTER INSERT, AFTER UPDATE, BEFORE UPDATE, BEFORE DELETE) that violate the predicate.
We’re going to implement this block predicate and it will prevent developer to create table which they don’t own it.


What is Databricks Cluster Username?

By Default, Databricks cluster use ROOT as a user and all the session request to metastore will get established with ROOT only.




Procedure to apply this security:


Step 1: Create Schema to make sure it is separate from other purpose
                               create schema security;
go
Step 2: Create a function which accept UserName and DataBase ID as a parameter. And validate those value in DBS table. If it is valid it will return 1 else NULL
       create function [security].[fn_DBsecuritypredicate](@User as                  sysname,@db_id bigint)
RETURNS TABLE
with schemabinding
as
return select 1 as fn_DBsecuritypredicate_result from [dbo].[DBS]
where @user = [OWNER_NAME] and @db_id = [DB_ID];
GO

Step 3: Create a security policy to block the insert when the above condition is not satisfied
CREATE SECURITY POLICY DB_CHECK
ADD BLOCK PREDICATE security.fn_DBsecuritypredicate(owner,DB_ID) ON [dbo].[TBLS] AFTER INSERT
with (state=on)





Conclusion:

Post security policy implementation. I'm able to prevent use to create table in database which they don't own.



No comments:

Call Data bricks Job Using REST API

Below power shell will help to call Data bricks Job with parameter  [Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]...