Azure Document Intelligence – Fix: ContentSourceNotAccessible (Invalid data URL)

Problem

Training custom models (for example, delivery notes) in Azure Document Intelligence initially worked fine. Suddenly, both training and auto-labeling started failing with “Internal Server Error”. In the model details, the following appeared:

ContentSourceNotAccessible: Content is not accessible: Invalid data URL

Symptoms

  • Training fails immediately or shortly after with “Internal Server Error”.
  • Auto-labeling fails as well.
  • Model details show: “ContentSourceNotAccessible: … Invalid data URL”.

Possible causes

There are three common root causes for this error:

  1. Invalid data URL to Azure Storage (wrong container/path, casing, special characters)
  2. Permission issue between Document Intelligence and Azure Storage (auth)
  3. Network restrictions (Storage behind firewall/VNET/private endpoint that Document Intelligence can’t reach)

When creating projects in the Document Intelligence Studio UI, a connection to Azure Storage is set up automatically via SAS token. That’s convenient, but not recommended for production: SAS tokens expire and must be renewed reliably, and when that renewal fails, access suddenly breaks even though nothing changed functionally.
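
If an expired SAS is the suspect, you can check the expiry directly: the “se” query parameter of a SAS token is its signed expiry time (UTC). A minimal PowerShell sketch - the URL below is a placeholder, not your project’s actual data source:

# Extract the "se" (signed expiry) parameter from a SAS URL and compare it to the current UTC time
$sasUrl = "https://stgdi001.blob.core.windows.net/training-data?sp=rl&sv=2022-11-02&sr=c&se=2025-01-31T18:00:00Z&sig=..."

$seValue = ([uri]$sasUrl).Query.TrimStart('?').Split('&') |
  Where-Object { $_ -like 'se=*' } |
  ForEach-Object { [uri]::UnescapeDataString($_.Substring(3)) }

$expiry = [datetimeoffset]::Parse($seValue)
if ($expiry -lt [datetimeoffset]::UtcNow) { "SAS expired on $expiry" } else { "SAS still valid until $expiry" }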

Solution

Instead of SAS tokens, enable the system-assigned managed identity of the Document Intelligence account and assign it the “Storage Blob Data Reader” role on the target storage account.

High-level steps:

  1. Enable system-assigned managed identity on the Document Intelligence resource
  2. Assign the managed identity the “Storage Blob Data Reader” role on the storage account (or container scope if needed)
  3. If storage is restricted (firewall/VNET/private endpoint): allow access for the managed identity (for example, via “Allow trusted Microsoft services” or a proper network setup)
  4. Wait 10–30 minutes (RBAC/token propagation), then retry training/labeling

Optional: Azure CLI (PowerShell)

Note: Adjust resource names/IDs.

# 1) Enable system-assigned managed identity on the Document Intelligence account
# Resource type: Microsoft.CognitiveServices/accounts (kind: FormRecognizer / Document Intelligence)
$rg = "rg-di"
$diName = "di-prod-001"

az cognitiveservices account update `
  --resource-group $rg `
  --name $diName `
  --set identity.type=SystemAssigned

# 2) Assign the managed identity as Storage Blob Data Reader on the storage account
$stgRg = "rg-storage"
$stgName = "stgdi001"

# Get the principalId of the managed identity
$principalId = az cognitiveservices account show `
  --resource-group $rg `
  --name $diName `
  --query identity.principalId -o tsv

# Assign role at the account scope
az role assignment create `
  --assignee-object-id $principalId `
  --assignee-principal-type ServicePrincipal `
  --role "Storage Blob Data Reader" `
  --scope $(az storage account show -g $stgRg -n $stgName --query id -o tsv)

Tip: For auto-labeling that writes label files, you may need “Storage Blob Data Contributor”. Review your flows and grant the minimum required permissions.
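
As a sketch of that container-scoped variant (reusing the variables from the CLI block above; the container name “training-data” is just an example):

# Grant write access only on the container used for labels/outputs instead of the whole account
$container = "training-data"
$stgId = az storage account show -g $stgRg -n $stgName --query id -o tsv

az role assignment create `
  --assignee-object-id $principalId `
  --assignee-principal-type ServicePrincipal `
  --role "Storage Blob Data Contributor" `
  --scope "$stgId/blobServices/default/containers/$container"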

Check networking

  • Storage firewall: enable “Allow trusted Microsoft services to access this storage account” - or configure private access correctly (a CLI sketch follows this list).
  • Private endpoint: ensure blob endpoints are reachable and DNS resolves correctly.
  • VNET restrictions: without a suitable path (for example, private endpoint), Document Intelligence can’t reach the blobs.
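
For the firewall bullet above, the trusted-services exception can also be set via CLI - a minimal sketch, assuming the storage variables from the block further up:

# Keep the firewall on (deny by default) but allow trusted Microsoft services through
az storage account update `
  --resource-group $stgRg `
  --name $stgName `
  --default-action Deny `
  --bypass AzureServices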

Verification

Wait 10–30 minutes after the RBAC assignment. Then:

  • Reopen the project in Studio and verify the data source
  • Start auto-labeling again
  • Trigger a new training run
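
To confirm the role assignment is actually in place before retrying, a quick check (again assuming the variables from the CLI block above):

# Should list "Storage Blob Data Reader" (and "Storage Blob Data Contributor", if assigned)
az role assignment list `
  --assignee $principalId `
  --scope $(az storage account show -g $stgRg -n $stgName --query id -o tsv) `
  --query "[].roleDefinitionName" -o tsv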

If access works again, the “Internal Server Error” and “ContentSourceNotAccessible” messages will disappear.

Lessons learned

  • SAS tokens are convenient but fragile (expiry/renewal). For production, prefer managed identity + RBAC.
  • Separate permissions cleanly: read (training) vs. write (labels/outputs).
  • Plan networking early (firewall/private endpoints/DNS).

Common pitfalls

  • “Content is not accessible: Invalid data URL” due to typos in the container path (case sensitivity!)
  • SAS expired - the UI doesn’t always make it obvious
  • Managed Identity enabled but role assigned at the wrong scope (for example, only container instead of account - or vice versa)
  • Not waiting long enough after RBAC changes (propagation!)

Good luck fixing it!

