How to Read Azure Key Vault Data from Databricks

Data Geek
4 min read · Apr 10, 2023

Azure Key Vault is a cloud-based service that provides a secure way to store and manage cryptographic keys, secrets, and certificates. It is widely used to protect sensitive information such as passwords, API keys, and other credentials in cloud applications. On the other hand, Databricks is a cloud-based analytics platform that provides a collaborative workspace for data scientists, engineers, and analysts to build, train, and deploy machine learning models.

In this blog post, we will explore how to read Azure Key Vault data from Databricks. This will allow us to securely retrieve and use secrets in our Databricks notebooks and jobs.

Step 1: Create an Azure Key Vault

First, we need to create an Azure Key Vault in the Azure portal. This can be done by following these steps:

  1. Go to the Azure portal and log in with your credentials.
  2. Click on the “Create a resource” button and search for “Key Vault”.
  3. Click on the “Create” button and fill in the required fields such as the vault name, subscription, resource group, and location.
  4. Once the vault is created, go to the “Access policies” tab and add the appropriate access policies for your users and applications.
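If you prefer to script this step, here is a minimal sketch using the azure-identity and azure-keyvault-secrets Python SDKs (assuming the vault already exists and your identity has “Set” and “Get” secret permissions; the vault name is a placeholder):

# pip install azure-identity azure-keyvault-secrets
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# "<vault-name>" is a placeholder for your own Key Vault name
client = SecretClient(
    vault_url="https://<vault-name>.vault.azure.net/",
    credential=DefaultAzureCredential(),
)

# Store a secret, then read it back to confirm the vault is reachable
client.set_secret("example-secret", "example-secret-value")
print(client.get_secret("example-secret").name)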

Step 2: Set up Azure Databricks

Next, we need to set up our Azure Databricks workspace to access the Azure Key Vault. This can be done by following these steps:

  1. Go to the Azure portal and navigate to your Databricks workspace.
  2. Click on the “Launch Workspace” button to open your Databricks workspace.
  3. Once the workspace is open, browse to https://<databricks-instance>#secrets/createScope to open the “Create Secret Scope” page (this page is not linked from the workspace UI).
  4. Fill in the required fields: the scope name, the manage principal, and the Key Vault’s DNS name (vault URI) and resource ID, both of which are shown on the vault’s “Properties” page in the Azure portal.
  5. Click on “Create” to create the secret scope.
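Once the scope exists, a quick way to confirm the workspace can see it (a sketch to run in any notebook cell) is the secrets utility:

# List all secret scopes visible to the workspace
print(dbutils.secrets.listScopes())

# List the secrets inside the new scope (names only, never values)
print(dbutils.secrets.list("<scope-name>"))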

Step 3: Access Azure Key Vault data from Databricks

Now that we have set up our Azure Key Vault and Databricks workspace, we can access the Key Vault data from our Databricks notebooks and jobs. This can be done by using the Databricks secrets utility, dbutils.secrets, to retrieve the secret values.

Here’s an example of how to retrieve a secret value in Python and then in Scala.

# Python: retrieve the secret value from the Azure Key Vault-backed scope
# (dbutils is available automatically in Databricks Python notebooks)
secret_value = dbutils.secrets.get(scope="<scope-name>", key="<secret-name>")

# Use the secret value in your code; note that Databricks redacts
# secret values printed to notebook output
print(secret_value)

// Scala: retrieve the secret value from the Azure Key Vault-backed scope
// (dbutils is also predefined in Databricks Scala notebooks)
val secretValue = dbutils.secrets.get(scope = "<scope-name>", key = "<secret-name>")

// Use the secret value in your code
println(s"The secret value is: $secretValue")

In the above code, replace <scope-name> with the name of your secret scope and <secret-name> with the name of the secret you want to retrieve.
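As a usage sketch (the storage-account name and secret key below are hypothetical), a retrieved secret can feed directly into Spark configuration instead of being hardcoded:

# Hypothetical secret holding an ADLS Gen2 storage account key
storage_key = dbutils.secrets.get(scope="<scope-name>", key="storage-account-key")

# Configure Spark to authenticate to the (hypothetical) storage account
spark.conf.set(
    "fs.azure.account.key.mystorageaccount.dfs.core.windows.net",
    storage_key,
)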

Bonus:

If you are using Terraform (an open-source infrastructure-as-code tool) to provision your infrastructure, the code below can be used to create a Key Vault and a new Databricks secret scope.

Step 1: Create an Azure Key Vault and secret

To create an Azure Key Vault resource and add a secret using Terraform, you can use the azurerm_key_vault and azurerm_key_vault_secret resources. Here is an example code snippet:

# Create Azure Key Vault resource
resource "azurerm_resource_group" "example_rg" {
  name     = "example-resource-group"
  location = "eastus2"
}

resource "azurerm_key_vault" "example_kv" {
  name                = "example-key-vault"
  location            = azurerm_resource_group.example_rg.location
  resource_group_name = azurerm_resource_group.example_rg.name

  sku_name  = "standard"
  tenant_id = "YOUR_TENANT_ID"

  access_policy {
    tenant_id = "YOUR_TENANT_ID"
    object_id = "YOUR_OBJECT_ID"

    secret_permissions = [
      "Get",
      "List",
      "Set",
      "Delete",
    ]
  }
}

# Create a secret
resource "azurerm_key_vault_secret" "example_secret" {
  name         = "example-secret"
  value        = "example-secret-value"
  key_vault_id = azurerm_key_vault.example_kv.id
}

Step 2: Set up Azure Databricks

To create an Azure Databricks workspace using Terraform, you can use the azurerm_databricks_workspace resource. Here is an example code snippet:

resource "azurerm_resource_group" "example_rg" {
name = "example-resource-group"
location = "eastus2"
}

resource "azurerm_databricks_workspace" "example_workspace" {
name = "example-workspace"
location = azurerm_resource_group.example_rg.location
resource_group_name = azurerm_resource_group.example_rg.name

sku_name = "premium"
sku_tier = "trial"

managed_resource_group_id = "/subscriptions/YOUR_SUBSCRIPTION_ID/resourceGroups/example-managed-rg"
custom_parameters = {
example_parameter = "example-value"
}
}

Step 3: Add secret scope

Secret scopes are not managed by the azurerm provider; instead, use the databricks_secret_scope resource from the databricks/databricks Terraform provider. Here is an example code snippet:

# The databricks provider must authenticate to the workspace with
# Azure AD credentials (a personal access token will not work for
# creating Key Vault-backed scopes)
provider "databricks" {
  host = "https://${azurerm_databricks_workspace.example_workspace.workspace_url}"
}

resource "databricks_secret_scope" "example_scope" {
  name = "example-scope"

  # Point the scope at the Key Vault created in Step 1
  keyvault_metadata {
    resource_id = azurerm_key_vault.example_kv.id
    dns_name    = azurerm_key_vault.example_kv.vault_uri
  }
}

In the above code, replace the scope name with your own value; the Key Vault resource ID and vault URI are referenced from the vault created in Step 1, and the workspace URL from the workspace created in Step 2.

This code creates a new secret scope called example-scope backed by the Key Vault from Step 1. The keyvault_metadata block wires the scope to the vault through its Azure resource ID and its DNS name (the vault URI).

Once you have defined this resource in your Terraform configuration, you can apply the configuration to create the secret scope in your Azure Databricks workspace.
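As a quick check (a sketch assuming the resource names used above), you can confirm the Terraform-created scope works from a notebook:

# Read the Terraform-provisioned secret from a Databricks notebook
# ("example-scope" and "example-secret" are the names used above)
value = dbutils.secrets.get(scope="example-scope", key="example-secret")
print(value)  # displayed as [REDACTED] in notebook output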

Conclusion

In this blog post, we explored how to read Azure Key Vault data from Databricks. By following the steps outlined above, you can securely retrieve and use secrets in your Databricks notebooks and jobs.
