
Azure Blob Storage

Overview

This destination writes data to Azure Blob Storage.

Each stream is written to its own blob in the container. Blob paths follow the pattern <stream_namespace>/<stream_name>/yyyy_mm_dd_<unix_epoch>_<part_number>.<file_extension>.
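To make the path pattern concrete, here is a minimal Python sketch that assembles a blob path from its parts. The function name example_blob_path is hypothetical, and two details are assumptions rather than documented behavior: that the epoch is expressed in milliseconds and that the part number is not zero-padded.

```python
from datetime import datetime, timezone


def example_blob_path(namespace, stream, written_at, part_number, extension):
    """Build a blob path that follows the documented naming pattern.

    Assumptions (not confirmed by the connector docs): the epoch is in
    milliseconds and the part number is written without zero padding.
    """
    date_part = written_at.strftime("%Y_%m_%d")
    epoch_ms = int(written_at.timestamp() * 1000)
    return f"{namespace}/{stream}/{date_part}_{epoch_ms}_{part_number}.{extension}"


# Example values are illustrative only.
written_at = datetime(2021, 5, 27, 17, 16, 45, tzinfo=timezone.utc)
print(example_blob_path("public", "users", written_at, 0, "csv"))
```

A helper like this can be handy for predicting which prefix to list when you look for a stream's output in the container.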

Network access

If you're using Airbyte Cloud and this destination uses IP-based access controls, add Airbyte's IP addresses to your allowlist.

Supported sync modes

| Sync mode | Supported? |
| --- | --- |
| Full Refresh - Overwrite | Yes |
| Full Refresh - Append | Yes |
| Full Refresh - Overwrite + Deduped | No |
| Incremental Sync - Append | Yes |
| Incremental Sync - Append + Deduped | No |

Configuration

| Parameter | Type | Notes |
| --- | --- | --- |
| Azure Blob Storage Endpoint Domain Name | string | Azure Blob Storage endpoint domain name. Leave the default value, blob.core.windows.net, to use the standard Azure public cloud endpoint. |
| Azure Blob Storage Container Name | string | Name of an existing Azure Blob Storage container. Create this container before you configure the destination. |
| Azure Blob Storage Account Name | string | Name of the Azure Storage account. |
| Azure Blob Storage Account Key | string | Azure Blob Storage account key. If this is set, the Shared Access Signature, Azure Tenant ID, Azure Client ID, and Azure Client Secret fields must not be set. |
| Shared Access Signature | string | Azure Blob Storage shared access signature (SAS). If this is set, the Azure Blob Storage Account Key, Azure Tenant ID, Azure Client ID, and Azure Client Secret fields must not be set. |
| Azure Tenant ID | string | Azure Active Directory (Entra ID) tenant ID. Required for Entra ID authentication. If this is set, Azure Client ID and Azure Client Secret must also be set. |
| Azure Client ID | string | Azure Active Directory (Entra ID) client ID. Required for Entra ID authentication. If this is set, Azure Tenant ID and Azure Client Secret must also be set. |
| Azure Client Secret | string | Azure Active Directory (Entra ID) client secret. Required for Entra ID authentication. If this is set, Azure Tenant ID and Azure Client ID must also be set. |
| Azure Blob Storage Target Blob Size (MB) | integer | How large each blob should be, in megabytes. Example: 500. After a blob exceeds this size, the connector starts writing to a new blob and increments the part number. Enter 0 to disable this behavior. |
| Format | object | Format-specific configuration. See Output Schema for details. |
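The target blob size setting can be pictured with a small simulation. The sketch below is not the connector's implementation. It only models the documented rule (once a blob has exceeded the target size, later records go to a new part, and 0 disables splitting). The function name assign_parts is hypothetical, and treating one megabyte as 1024 * 1024 bytes is an assumption.

```python
def assign_parts(record_sizes_bytes, target_blob_size_mb):
    """Return the part number each record would be written to.

    Sketch only: a new part starts once the current blob has already
    exceeded the target size. A target of 0 disables splitting.
    Assumes 1 MB = 1024 * 1024 bytes.
    """
    limit = target_blob_size_mb * 1024 * 1024
    part = 0
    written = 0
    parts = []
    for size in record_sizes_bytes:
        if limit > 0 and written > limit:
            part += 1      # roll over to the next part number
            written = 0
        parts.append(part)
        written += size
    return parts


# Five records of 600 KiB each, with a 1 MB target and with splitting disabled.
sizes = [600 * 1024] * 5
print(assign_parts(sizes, 1))
print(assign_parts(sizes, 0))
```

Because rollover happens only after the threshold is crossed, individual blobs in this model can end up slightly larger than the configured target.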

Output Schema

CSV

Like most other Airbyte destination connectors, the output contains your data, along with some metadata fields. If you select the "root level flattening" option, your data will be promoted to additional columns; if you select "no flattening", your data will be left as a JSON blob inside the _airbyte_data column.

For example, given the following JSON object from a source:

{
  "user_id": 123,
  "name": {
    "first": "John",
    "last": "Doe"
  }
}

With no flattening, the output CSV is:

| _airbyte_raw_id | _airbyte_extracted_at | _airbyte_generation_id | _airbyte_meta | _airbyte_data |
| --- | --- | --- | --- | --- |
| 26d73cde-7eb1-4e1e-b7db-a4c03b4cf206 | 1622135805000 | 11 | {"changes":[], "sync_id": 10111 } | { "user_id": 123, "name": { "first": "John", "last": "Doe" } } |

With root level flattening, the output CSV is:

| _airbyte_raw_id | _airbyte_extracted_at | _airbyte_generation_id | _airbyte_meta | user_id | name.first | name.last |
| --- | --- | --- | --- | --- | --- | --- |
| 26d73cde-7eb1-4e1e-b7db-a4c03b4cf206 | 1622135805000 | 11 | {"changes":[], "sync_id": 10111 } | 123 | John | Doe |
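The two CSV layouts above can be reproduced with a short Python sketch. It is an illustration of the documented output shape, not the connector's code. The helper names flatten and to_csv are hypothetical, and the exact quoting and JSON spacing the connector emits may differ from what the standard csv and json modules produce.

```python
import csv
import io
import json


def flatten(data, prefix=""):
    """Flatten nested objects into dotted column names such as name.first."""
    flat = {}
    for key, value in data.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{column}."))
        else:
            flat[column] = value
    return flat


def to_csv(meta, data, root_level_flattening):
    """Render one record as CSV text (header plus one row)."""
    if root_level_flattening:
        row = {**meta, **flatten(data)}
    else:
        # Without flattening, the record stays a JSON blob in _airbyte_data.
        row = {**meta, "_airbyte_data": json.dumps(data)}
    row["_airbyte_meta"] = json.dumps(row["_airbyte_meta"])
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(row))
    writer.writeheader()
    writer.writerow(row)
    return buffer.getvalue()


meta = {
    "_airbyte_raw_id": "26d73cde-7eb1-4e1e-b7db-a4c03b4cf206",
    "_airbyte_extracted_at": 1622135805000,
    "_airbyte_generation_id": 11,
    "_airbyte_meta": {"changes": [], "sync_id": 10111},
}
data = {"user_id": 123, "name": {"first": "John", "last": "Doe"}}
print(to_csv(meta, data, root_level_flattening=True))
print(to_csv(meta, data, root_level_flattening=False))
```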

JSON Lines (JSONL)

JSON Lines is a text format with one JSON object per line. As with the CSV format, this connector writes your data along with some metadata fields. You can enable "root level flattening" to promote your data to the root of the JSON object, or use "no flattening" to leave your data inside the _airbyte_data object.

For example, given the following two JSON objects from a source:

{
  "user_id": 123,
  "name": {
    "first": "John",
    "last": "Doe"
  }
}
{
  "user_id": 456,
  "name": {
    "first": "Jane",
    "last": "Roe"
  }
}

With no flattening, the output JSONL is:

{ "_airbyte_raw_id": "26d73cde-7eb1-4e1e-b7db-a4c03b4cf206", "_airbyte_extracted_at": "1622135805000", "_airbyte_generation_id": "11", "_airbyte_meta": { "changes": [], "sync_id": 10111 }, "_airbyte_data": { "user_id": 123, "name": { "first": "John", "last": "Doe" } } }
{ "_airbyte_raw_id": "0a61de1b-9cdd-4455-a739-93572c9a5f20", "_airbyte_extracted_at": "1631948170000", "_airbyte_generation_id": "12", "_airbyte_meta": { "changes": [], "sync_id": 10112 }, "_airbyte_data": { "user_id": 456, "name": { "first": "Jane", "last": "Roe" } } }

With root level flattening, the output JSONL is:

{ "_airbyte_raw_id": "26d73cde-7eb1-4e1e-b7db-a4c03b4cf206", "_airbyte_extracted_at": "1622135805000", "_airbyte_generation_id": "11", "_airbyte_meta": { "changes": [], "sync_id": 10111 }, "user_id": 123, "name": { "first": "John", "last": "Doe" } }
{ "_airbyte_raw_id": "0a61de1b-9cdd-4455-a739-93572c9a5f20", "_airbyte_extracted_at": "1631948170000", "_airbyte_generation_id": "12", "_airbyte_meta": { "changes": [], "sync_id": 10112 }, "user_id": 456, "name": { "first": "Jane", "last": "Roe" } }
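When you read the JSONL output downstream, the two flattening modes need slightly different handling. The following Python sketch recovers the source record from a line in either mode. The function name source_record is hypothetical, and the set of metadata fields is taken from the examples above, so it is an assumption that no other metadata fields appear.

```python
import json

# Metadata fields as they appear in the examples above.
METADATA_FIELDS = {
    "_airbyte_raw_id",
    "_airbyte_extracted_at",
    "_airbyte_generation_id",
    "_airbyte_meta",
}


def source_record(line):
    """Return the source record from one JSONL output line.

    Works for "no flattening" (data nested under _airbyte_data) and for
    "root level flattening" (data merged into the top-level object).
    """
    row = json.loads(line)
    if "_airbyte_data" in row:
        return row["_airbyte_data"]
    return {key: value for key, value in row.items() if key not in METADATA_FIELDS}


nested = '{"_airbyte_raw_id": "26d73cde-7eb1-4e1e-b7db-a4c03b4cf206", "_airbyte_extracted_at": "1622135805000", "_airbyte_generation_id": "11", "_airbyte_meta": {"changes": [], "sync_id": 10111}, "_airbyte_data": {"user_id": 123, "name": {"first": "John", "last": "Doe"}}}'
flattened = '{"_airbyte_raw_id": "26d73cde-7eb1-4e1e-b7db-a4c03b4cf206", "_airbyte_extracted_at": "1622135805000", "_airbyte_generation_id": "11", "_airbyte_meta": {"changes": [], "sync_id": 10111}, "user_id": 123, "name": {"first": "John", "last": "Doe"}}'
print(source_record(nested))
print(source_record(flattened))
```

Note that, in the JSONL examples, root level flattening keeps nested objects such as name intact rather than producing dotted keys.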

Getting started

Requirements

  1. Create an Azure Storage account.
  2. Create the Azure Blob Storage container where Airbyte should write files.
  3. Create credentials for one supported authentication method:
    • A shared access signature (SAS)
    • An Azure Entra ID service principal
    • An Azure Blob Storage account key
  4. Verify that you can access the container in the Azure portal by using Storage browser.

Setup guide

  1. For Azure Blob Storage Endpoint Domain Name, keep the default value unless you use a custom Azure Storage endpoint.
  2. For Azure Blob Storage Container Name, enter the name of the container you created.
  3. For Azure Blob Storage Account Name, enter the storage account name.
  4. For Authentication, choose exactly one method:
    • Shared Access Signature: Use a SAS scoped to the target container. The SAS must allow Airbyte to list blobs, read blob metadata, create and write blobs, and delete blobs.
    • Azure Entra ID (Service Principal): Enter the Azure tenant ID, client ID, and client secret for a service principal. Assign the service principal the Storage Blob Data Contributor role on the target container or storage account.
    • Azure Blob Storage Account Key: Enter an access key for the storage account.
  5. For Azure Blob Storage Target Blob Size (MB), use the default unless you need smaller or larger output blobs.
  6. For Format, choose the output file format and flattening behavior.
  7. Make sure the machine running Airbyte can reach your Azure Blob Storage endpoint. If you're using Airbyte Cloud with IP-based access controls, see Network access.
  8. Use Check connection in the Airbyte UI to verify that Airbyte can write, list, read metadata for, and delete a test blob in the container.
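If you authenticate with a SAS, it can help to inspect the token's permissions before you run Check connection. The sketch below reads the sp (signed permissions) parameter of a SAS token and reports which of the operations listed in step 4 appear to be missing. The mapping of those operations to the SAS permission letters l, r, c, w, and d is my reading of the requirement, the function name missing_sas_permissions is hypothetical, and the token values are placeholders.

```python
from urllib.parse import parse_qs

# SAS permission letters assumed to cover the operations the setup guide lists.
REQUIRED_SAS_PERMISSIONS = {
    "l": "list",
    "r": "read",
    "c": "create",
    "w": "write",
    "d": "delete",
}


def missing_sas_permissions(sas_token):
    """Return the names of required permissions absent from the token's sp field."""
    query = parse_qs(sas_token.lstrip("?"))
    granted = set(query.get("sp", [""])[0])
    return sorted(
        name for letter, name in REQUIRED_SAS_PERMISSIONS.items() if letter not in granted
    )


# Placeholder tokens: the signature and version values are not real.
print(missing_sas_permissions("sv=2022-11-02&sr=c&sp=rl&sig=EXAMPLE"))
print(missing_sas_permissions("?sv=2022-11-02&sr=c&sp=racwdl&sig=EXAMPLE"))
```

This only inspects the token text. It does not verify the signature, the expiry time, or the resource scope, so Check connection remains the authoritative test.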

Reference

Use this reference when you configure the connector with PyAirbyte, Terraform, or the Airbyte API. For the Airbyte UI, use the field names in Configuration.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| azure_blob_storage_endpoint_domain_name | string | No | Azure Blob Storage endpoint domain name. Defaults to blob.core.windows.net. |
| azure_blob_storage_account_name | string | Yes | Name of the Azure Storage account. |
| azure_blob_storage_container_name | string | Yes | Name of the existing Azure Blob Storage container. |
| shared_access_signature | string | No | SAS token for Azure Blob Storage. Set this only if you don't set azure_blob_storage_account_key, azure_tenant_id, azure_client_id, or azure_client_secret. |
| azure_blob_storage_account_key | string | No | Azure Blob Storage account key. Set this only if you don't set shared_access_signature, azure_tenant_id, azure_client_id, or azure_client_secret. |
| azure_tenant_id | string | No | Azure Entra ID tenant ID. For Entra ID authentication, set this with azure_client_id and azure_client_secret. |
| azure_client_id | string | No | Azure Entra ID client ID. For Entra ID authentication, set this with azure_tenant_id and azure_client_secret. |
| azure_client_secret | string | No | Azure Entra ID client secret. For Entra ID authentication, set this with azure_tenant_id and azure_client_id. |
| azure_blob_storage_spill_size | integer | No | Maximum target blob size in megabytes before Airbyte writes to a new blob. Defaults to 500. |
| format | object | Yes | Output format. Set format_type to CSV or JSONL. |
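The authentication fields are mutually exclusive, which is easy to get wrong when you build a configuration by hand. The following Python sketch checks a configuration dictionary against the rules in the table before you submit it. The function name auth_method and the label "entra_id" are hypothetical, all values in the example are placeholders, and the check mirrors only the rules stated above, not the connector's own validation.

```python
ENTRA_FIELDS = ("azure_tenant_id", "azure_client_id", "azure_client_secret")
REQUIRED_FIELDS = (
    "azure_blob_storage_account_name",
    "azure_blob_storage_container_name",
    "format",
)


def auth_method(config):
    """Return the single authentication method a config uses, or raise ValueError."""
    missing = [field for field in REQUIRED_FIELDS if not config.get(field)]
    if missing:
        raise ValueError(f"missing required fields: {missing}")

    methods = []
    if config.get("shared_access_signature"):
        methods.append("shared_access_signature")
    if config.get("azure_blob_storage_account_key"):
        methods.append("azure_blob_storage_account_key")
    entra_set = [field for field in ENTRA_FIELDS if config.get(field)]
    if entra_set:
        if len(entra_set) != len(ENTRA_FIELDS):
            raise ValueError("Entra ID authentication needs tenant ID, client ID, and client secret")
        methods.append("entra_id")

    if len(methods) != 1:
        raise ValueError(f"expected exactly one authentication method, found {methods}")
    return methods[0]


# Placeholder values for illustration only.
example_config = {
    "azure_blob_storage_account_name": "examplestorageaccount",
    "azure_blob_storage_container_name": "airbyte-output",
    "azure_tenant_id": "00000000-0000-0000-0000-000000000000",
    "azure_client_id": "11111111-1111-1111-1111-111111111111",
    "azure_client_secret": "example-secret",
    "format": {"format_type": "JSONL"},
}
print(auth_method(example_config))
```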

Namespace support

This destination supports namespaces. The namespace is used as part of the output path structure.

Config fields reference

| Property name | Type |
| --- | --- |
| azure_blob_storage_account_name | string |
| azure_blob_storage_container_name | string |
| format | object |
| azure_blob_storage_account_key | string |
| azure_blob_storage_endpoint_domain_name | string |
| azure_blob_storage_spill_size | integer |
| azure_client_id | string |
| azure_client_secret | string |
| azure_tenant_id | string |
| shared_access_signature | string |

Changelog

| Version | Date | Pull Request | Subject |
| --- | --- | --- | --- |
| 1.1.7 | 2026-05-20 | 78243 | Prevent overwrite syncs from deleting old files when a source stream fails mid-sync in speed mode. |
| 1.1.6 | 2026-01-26 | 72355 | Fix sync failures for sources with empty schemas by upgrading CDK to 0.2.1 |
| 1.1.5 | 2026-01-20 | 72301 | Upgrade CDK to 0.2.0 |
| 1.1.4 | 2025-11-05 | 69127 | Upgrade to Bulk CDK 0.1.61. |
| 1.1.3 | 2025-10-21 | 67153 | Implement new proto schema implementation |
| 1.1.2 | 2025-10-06 | 67078 | Remove memory limit for sync jobs to improve performance and resource utilization. |
| 1.1.1 | 2025-09-10 | 66139 | Fix inconsistent field name casing and improve tooltip clarity. Field names now use consistent title casing and tooltips reference exact field names. |
| 1.1.0 | 2025-09-05 | 65933 | Add support for Azure Entra ID (Service Principal) authentication. You can now authenticate using Azure AD tenant ID, client ID, and client secret. |
| 1.0.4 | 2025-08-07 | 64556 | Promoting release candidate 1.0.4-rc.1 to a main version. |
| 1.0.4-rc.1 | 2025-08-05 | 59710 | Release Azure blob destination on latest CDK |
| 1.0.3 | 2025-05-07 | 59710 | CDK backpressure bugfix |
| 1.0.2 | 2025-04-14 | 57563 | Fix signature spec example |
| 1.0.1 | 2025-04-09 | 57541 | Fix metadata to actually certify. |
| 1.0.0 | 2025-04-03 | 56391 | Bring into compliance with modern connector standards; certify connector. |
| 0.2.5 | 2025-03-21 | 55906 | Upgrade to airbyte/java-connector-base:2.0.1 to be M4 compatible. |
| 0.2.4 | 2025-01-10 | 51507 | Use a non root base image |
| 0.2.3 | 2024-12-18 | 49910 | Use a base image: airbyte/java-connector-base:1.0.0 |
| 0.2.2 | 2024-06-12 | #38061 | File Extensions added for the output files |
| 0.2.1 | 2023-09-13 | #30412 | Switch noisy logging to debug |
| 0.2.0 | 2023-01-18 | #21467 | Support spilling of objects exceeding configured size threshold |
| 0.1.6 | 2022-08-08 | #15318 | Support per-stream state |
| 0.1.5 | 2022-06-16 | #13852 | Updated stacktrace format for any trace message errors |
| 0.1.4 | 2022-05-17 | 12820 | Improved 'check' operation performance |
| 0.1.3 | 2022-02-14 | 10256 | Add -XX:+ExitOnOutOfMemoryError JVM option |
| 0.1.2 | 2022-01-20 | #9682 | Each data synchronization for each stream is written to a new blob to the folder with stream name. |
| 0.1.1 | 2021-12-29 | #9190 | Added BufferedOutputStream wrapper to blob output stream to improve performance and fix issues with 50,000 block limit. Also disabled autoflush on PrintWriter. |
| 0.1.0 | 2021-08-30 | #5332 | Initial release with JSONL and CSV output. |