Data Ingestion

Akurion turns project content into a searchable knowledge base. After content is added through direct upload or an enabled workspace ingestion path, Akurion prepares it for governed search, cited answers, workflows, graph context, REST APIs, and MCP tools.

Availability

Source connector availability is controlled in the Akurion app and may vary by workspace. Disabled connector types are not documented publicly until they are enabled for customer use.

Direct file upload is the default documented ingestion path for pilots, ad hoc analysis, and curated knowledge bases.

Sync Lifecycle

Each ingestion path follows this general lifecycle:

  1. Add content to a project.
  2. Store the file or source configuration.
  3. Prepare content for processing.
  4. Discover or receive files and records.
  5. Track readiness and source status.
  6. Enrich content with project metadata.
  7. Make approved content available for search, answers, workflows, and APIs.
  8. Add graph context when Graph RAG is enabled.

Source Status

File and source status tells you whether content is ready for retrieval.

Common states:

State                   Meaning
processing              The source or file is actively being processed.
processed or completed  Content is ready for search and answer generation.
failed                  Processing failed and may need admin action.
retrying                Akurion is retrying a job, often after transient failure or memory escalation.
no_credits              The subscription or project has insufficient credits for processing.
permission_error        The configured ingestion path cannot access the file or record.
format_not_supported    The file type could not be processed.
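
When a client reads these states, it can help to group them by what to do next. The sketch below is one possible grouping, not part of the Akurion API: the state names come from the list above, while `Readiness`, `classify_state`, and the choice of which states count as "needs action" are assumptions made for illustration.

```python
from enum import Enum


class Readiness(Enum):
    """Hypothetical client-side grouping of the documented states."""
    READY = "ready"
    IN_PROGRESS = "in_progress"
    NEEDS_ACTION = "needs_action"
    UNKNOWN = "unknown"


# State names are from the documentation; the grouping is this sketch's assumption.
_STATE_GROUPS = {
    "processing": Readiness.IN_PROGRESS,
    "retrying": Readiness.IN_PROGRESS,
    "processed": Readiness.READY,
    "completed": Readiness.READY,
    "failed": Readiness.NEEDS_ACTION,
    "no_credits": Readiness.NEEDS_ACTION,
    "permission_error": Readiness.NEEDS_ACTION,
    "format_not_supported": Readiness.NEEDS_ACTION,
}


def classify_state(state: str) -> Readiness:
    """Map a reported state string to a coarse readiness group."""
    return _STATE_GROUPS.get(state.strip().lower(), Readiness.UNKNOWN)


def is_ready_for_retrieval(state: str) -> bool:
    """True only for the states documented as ready for search and answers."""
    return classify_state(state) is Readiness.READY
```

Treating unrecognized states as `UNKNOWN` rather than as failures keeps the client from misreporting a state that is not listed here.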

Direct Upload

Use file upload for pilots, ad hoc analysis, and small curated knowledge bases.

For developer uploads, send the file name and MIME type to the upload URL endpoint:

curl -X POST "https://api.structhub.io/api/v1/project/files/upload-url" \
  -H "Content-Type: application/json" \
  -H "API-KEY: YOUR_API_KEY" \
  -H "X-Project-ID: YOUR_PROJECT_ID" \
  -d '{
    "name": "policy.pdf",
    "file_type": "application/pdf"
  }'
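
The same request can be built from Python using only the standard library. This is a minimal sketch that mirrors the curl call above: the endpoint, headers, and body fields are copied from it, while the function name `build_upload_url_request` is hypothetical. The response format is not described on this page, so the sketch stops at building the request and makes no assumption about what comes back.

```python
import json
import urllib.request

# Endpoint copied from the curl example.
UPLOAD_URL_ENDPOINT = "https://api.structhub.io/api/v1/project/files/upload-url"


def build_upload_url_request(
    api_key: str, project_id: str, name: str, file_type: str
) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({"name": name, "file_type": file_type}).encode("utf-8")
    return urllib.request.Request(
        UPLOAD_URL_ENDPOINT,
        data=body,
        headers={
            # urllib normalizes header capitalization; HTTP header names
            # are case-insensitive, so this does not change the request.
            "Content-Type": "application/json",
            "API-KEY": api_key,
            "X-Project-ID": project_id,
        },
        method="POST",
    )


request = build_upload_url_request(
    "YOUR_API_KEY", "YOUR_PROJECT_ID", "policy.pdf", "application/pdf"
)

# To send it, pass the request to urllib.request.urlopen(request) and
# inspect the response body; its structure is not documented here.
```

Keeping request construction separate from sending makes the call easy to inspect or log before any network traffic occurs.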

Metadata for Better Retrieval

Metadata makes retrieval more precise. Useful metadata keys include:

  • Department
  • Region
  • Customer
  • Product
  • Source system
  • Document type
  • Effective date
  • Owner
  • Confidentiality
  • Workflow stage

After metadata is generated, users can filter in chat and developers can pass metadata filters to REST or MCP tools.
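
To illustrate why metadata sharpens retrieval, the sketch below filters a small in-memory document list by exact-match metadata. It is a local illustration only: the filter syntax that Akurion's REST or MCP tools accept is not shown on this page, and the document records, key spellings, and function names here are all hypothetical.

```python
def matches(metadata: dict, filters: dict) -> bool:
    """True when every filter key is present in metadata with an equal value."""
    return all(metadata.get(key) == value for key, value in filters.items())


def filter_documents(documents: list, filters: dict) -> list:
    """Return only the documents whose metadata satisfies all filters."""
    return [doc for doc in documents if matches(doc["metadata"], filters)]


# Hypothetical sample records, using keys from the list above.
documents = [
    {"name": "leave-policy.pdf",
     "metadata": {"department": "HR", "region": "EU", "document_type": "policy"}},
    {"name": "pricing-2024.pdf",
     "metadata": {"department": "Sales", "region": "EU", "document_type": "price list"}},
    {"name": "onboarding.pdf",
     "metadata": {"department": "HR", "region": "US", "document_type": "guide"}},
]

eu_hr = filter_documents(documents, {"department": "HR", "region": "EU"})
print([doc["name"] for doc in eu_hr])  # prints ['leave-policy.pdf']
```

Each added filter key narrows the candidate set before any ranking happens, which is why agreeing on consistent keys and values before indexing pays off.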

Operational Tips

  • Start pilots with a small set of high-value documents rather than all company content.
  • Define metadata keys before indexing large document sets.
  • Use file status and source health, where available, to validate readiness.
  • Enable Graph RAG when entity relationships are important.
  • Use resync after changing source scope or metadata settings.
  • Use project instructions to steer answer behavior for each knowledge domain.