
Problem
Training custom models (for example, for delivery notes) in Azure Document Intelligence initially worked fine. Then, without any change on our side, both training and auto-labeling started failing with “Internal Server Error”. The model details showed the following:
ContentSourceNotAccessible: Content is not accessible: Invalid data URL
Symptoms
- Training fails immediately or shortly after with “Internal Server Error”.
- Auto-labeling fails as well.
- Model details show: “ContentSourceNotAccessible: … Invalid data URL”.
Possible causes
There are three common root causes for this error:
- Invalid data URL to Azure Storage (wrong container/path, casing, special characters)
- Permission issue between Document Intelligence and Azure Storage (auth)
- Network restrictions (Storage behind firewall/VNET/private endpoint that Document Intelligence can’t reach)
Background: SAS tokens from the UI (not recommended)
When you create a project in the Document Intelligence Studio UI, the connection to Azure Storage is set up automatically with a SAS token. That is convenient but not recommended: SAS tokens expire and must be renewed reliably. When a renewal fails, access suddenly breaks even though nothing changed functionally.
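To check whether an expired SAS token is the culprit, you can read the expiry from the `se` parameter of the SAS URL. The following is a minimal sketch, not an official tool: the helper name `Get-SasExpiry` and the sample URL are made up for illustration, and it assumes the SAS carries its own `se` value (a SAS based on a stored access policy may not).

```powershell
# Hypothetical helper: reads the expiry ("se" parameter) from a SAS URL.
function Get-SasExpiry {
    param([Parameter(Mandatory)][string]$SasUrl)

    # Split the query string into name=value pairs and look for "se"
    $query = ([Uri]$SasUrl).Query.TrimStart('?')
    foreach ($pair in $query -split '&') {
        $name, $value = $pair -split '=', 2
        if ($name -eq 'se') {
            # The value is URL-encoded (":" appears as "%3A")
            $decoded = [Uri]::UnescapeDataString($value)
            return [DateTimeOffset]::Parse(
                $decoded,
                [Globalization.CultureInfo]::InvariantCulture,
                [Globalization.DateTimeStyles]::AssumeUniversal)
        }
    }
    return $null   # no "se" parameter found
}

# Placeholder URL: replace with the SAS URL used by your project
$sasUrl = "https://stgdi001.blob.core.windows.net/training?sv=2022-11-02&se=2024-01-31T12%3A00%3A00Z&sp=rl&sig=REDACTED"
$expiry = Get-SasExpiry -SasUrl $sasUrl
if ($expiry -and $expiry -lt [DateTimeOffset]::UtcNow) {
    "SAS token expired on $($expiry.UtcDateTime.ToString('u'))"
}
```

If the expiry lies in the past, the “Invalid data URL” message is most likely an authentication problem rather than a wrong path.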
Fix (recommended): Managed identity + RBAC
Instead of SAS tokens: Enable the managed identity of the Document Intelligence account and assign “Storage Blob Data Reader” on the target storage.
High-level steps:
- Enable system-assigned managed identity on the Document Intelligence resource
- Assign the managed identity the “Storage Blob Data Reader” role on the storage account (or container scope if needed)
- If storage is restricted (firewall/VNET/private endpoint): allow access for the managed identity (for example, via “Allow trusted Microsoft services” or a proper network setup)
- Wait 10–30 minutes (RBAC/token propagation), then retry training/labeling
Optional: Azure CLI (PowerShell)
Note: Adjust resource names/IDs.
# 1) Enable system-assigned managed identity on the Document Intelligence account
# Resource type: Microsoft.CognitiveServices/accounts (kind: FormRecognizer / Document Intelligence)
$rg = "rg-di"
$diName = "di-prod-001"

az cognitiveservices account identity assign `
  --resource-group $rg `
  --name $diName

# 2) Assign the managed identity as Storage Blob Data Reader on the storage account
$stgRg = "rg-storage"
$stgName = "stgdi001"

# Get the principalId of the managed identity
$principalId = az cognitiveservices account show `
  --resource-group $rg `
  --name $diName `
  --query identity.principalId -o tsv

# Resolve the storage account resource ID to use as the role scope
$scope = az storage account show -g $stgRg -n $stgName --query id -o tsv

# Assign role at the account scope
az role assignment create `
  --assignee-object-id $principalId `
  --assignee-principal-type ServicePrincipal `
  --role "Storage Blob Data Reader" `
  --scope $scope
Tip: For auto-labeling that writes label files, you may need “Storage Blob Data Contributor”. Review your flows and grant the minimum required permissions.
Check networking
- Storage firewall: enable “Allow trusted Microsoft services to access this storage account” - or configure private access correctly.
- Private endpoint: ensure blob endpoints are reachable and DNS resolves correctly.
- VNET restrictions: without a suitable path (for example, private endpoint), Document Intelligence can’t reach the blobs.
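The storage firewall settings can also be inspected from the command line. This is a sketch under assumptions: the resource names are placeholders, and `Resolve-DnsName` is only available in PowerShell on Windows. Keep in mind that the DNS lookup reflects the network of the machine you run it on, not the path Document Intelligence itself takes.

```powershell
# Placeholder names: adjust to your environment
$stgRg = "rg-storage"
$stgName = "stgdi001"

# Show the firewall default action, the bypass setting and public network access.
# defaultAction "Deny" without a bypass for "AzureServices" blocks trusted services.
az storage account show `
  --resource-group $stgRg `
  --name $stgName `
  --query "{defaultAction:networkRuleSet.defaultAction, bypass:networkRuleSet.bypass, publicNetworkAccess:publicNetworkAccess}" `
  -o json

# Check which address the blob endpoint resolves to from this machine.
# With a private endpoint and working DNS, clients inside the VNET should get a private IP.
Resolve-DnsName "$stgName.blob.core.windows.net"
```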
Verification
Wait 10–30 minutes after the RBAC assignment. Then:
- Reopen the project in Studio and verify the data source
- Start auto-labeling again
- Trigger a new training run
If access works again, the “Internal Server Error” and “ContentSourceNotAccessible” messages will disappear.
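Before retrying, it can help to confirm that the role assignment actually exists for the managed identity. A minimal sketch, assuming the same placeholder resource names as in the script above:

```powershell
# Placeholder names: adjust to your environment
$rg = "rg-di"
$diName = "di-prod-001"
$stgRg = "rg-storage"
$stgName = "stgdi001"

# Look up the managed identity and the storage account resource ID
$principalId = az cognitiveservices account show -g $rg -n $diName --query identity.principalId -o tsv
$scope = az storage account show -g $stgRg -n $stgName --query id -o tsv

# List the roles the identity holds at the storage account scope,
# including roles inherited from the resource group or subscription
az role assignment list `
  --assignee $principalId `
  --scope $scope `
  --include-inherited `
  --query "[].{role:roleDefinitionName, scope:scope}" `
  -o table
```

An empty `$principalId` means the managed identity is not enabled. An empty table means no role is assigned at or above the account scope. Assignments made only at container scope do not show up in this query.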
Lessons learned
- SAS tokens are convenient but fragile (expiry/renewal). For production, prefer managed identity + RBAC.
- Separate permissions cleanly: read (training) vs. write (labels/outputs).
- Plan networking early (firewall/private endpoints/DNS).
Common pitfalls
- “Content is not accessible: Invalid data URL” due to typos in the container path (case sensitivity!)
- SAS expired - the UI doesn’t always make it obvious
- Managed Identity enabled but role assigned at the wrong scope (for example, only container instead of account - or vice versa)
- Not waiting long enough after RBAC changes (propagation!)
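The first pitfall can be caught with a quick sanity check of the container URL. The sketch below is illustrative only: `Test-ContainerUrl` is a made-up helper, it assumes the public Azure cloud endpoint suffix `blob.core.windows.net`, and it only checks the container naming rules (3 to 63 characters, lowercase letters, digits and single hyphens). It does not check whether the container or folder actually exists.

```powershell
# Hypothetical helper: checks that a URL looks like a valid blob container URL.
function Test-ContainerUrl {
    param([Parameter(Mandatory)][string]$Url)

    # Must be a well-formed absolute URL
    $uri = $null
    if (-not [Uri]::TryCreate($Url, [UriKind]::Absolute, [ref]$uri)) { return $false }

    # Must use HTTPS and point at a blob endpoint (assumption: public Azure cloud)
    if ($uri.Scheme -ne 'https') { return $false }
    if ($uri.Host -notlike '*.blob.core.windows.net') { return $false }

    # The first path segment is the container name.
    # -cmatch is case-sensitive, so uppercase letters are rejected.
    $container = $uri.AbsolutePath.Trim('/').Split('/')[0]
    return $container -cmatch '^[a-z0-9](?:[a-z0-9]|-(?=[a-z0-9])){2,62}$'
}

# Placeholder URLs for illustration
Test-ContainerUrl "https://stgdi001.blob.core.windows.net/training/delivery-notes"
Test-ContainerUrl "https://stgdi001.blob.core.windows.net/Training"
```

The second call fails the check because container names must be lowercase. Folder and blob names below the container are case-sensitive too, so compare them character by character with what is shown in the storage account.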
References
- Azure Document Intelligence (Form Recognizer) – Overview: https://learn.microsoft.com/azure/ai-services/document-intelligence/
- RBAC roles for Azure Storage: https://learn.microsoft.com/azure/storage/common/storage-auth-aad#azure-built-in-roles-for-azure-storage
- Managed identities – overview: https://learn.microsoft.com/entra/identity/managed-identities-azure-resources/overview
Good luck fixing it!
