Data Ingestion
Akurion turns project content into a searchable knowledge base. After content is added through direct upload or an enabled workspace ingestion path, Akurion prepares it for governed search, cited answers, workflows, graph context, REST APIs, and MCP tools.
Availability
Source connector availability is controlled in the Akurion app and may vary by workspace. Disabled connector types are not documented publicly until they are enabled for customer use.
Direct file upload is the default documented ingestion path for pilots, ad hoc analysis, and curated knowledge bases.
Sync Lifecycle
Each ingestion path follows this general lifecycle:
- Add content to a project.
- Store the file or source configuration.
- Prepare content for processing.
- Discover or receive files and records.
- Track readiness and source status.
- Enrich content with project metadata.
- Make approved content available for search, answers, workflows, and APIs.
- Add graph context when Graph RAG is enabled.
Source Status
File and source status tells you whether content is ready for retrieval.
Common states:
| State | Meaning |
|---|---|
| processing | The source or file is actively being processed. |
| processed or completed | Content is ready for search and answer generation. |
| failed | Processing failed and may need admin action. |
| retrying | Akurion is retrying a job, often after a transient failure or memory escalation. |
| no_credits | The subscription or project has insufficient credits for processing. |
| permission_error | The configured ingestion path cannot access the file or record. |
| format_not_supported | The file type could not be processed. |
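As a rough illustration of how a client might act on these states, the sketch below groups them into ready, in-progress, and needs-attention buckets, then polls until the state settles. The state names come from the table above. The `fetch_status` callable, the choice to treat `retrying` as in progress, and the polling defaults are assumptions for illustration, not part of the Akurion API.

```python
import time
from typing import Callable

# State names are taken from the table above; the grouping is an assumption.
READY = {"processed", "completed"}
IN_PROGRESS = {"processing", "retrying"}
NEEDS_ATTENTION = {"failed", "no_credits", "permission_error", "format_not_supported"}


def classify(state: str) -> str:
    """Map a raw status string to a coarse bucket."""
    if state in READY:
        return "ready"
    if state in IN_PROGRESS:
        return "in_progress"
    if state in NEEDS_ATTENTION:
        return "needs_attention"
    return "unknown"


def wait_until_settled(
    fetch_status: Callable[[], str],
    interval_s: float = 5.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a hypothetical status getter until the state is no longer in progress.

    Returns the last observed state, or raises TimeoutError if the state
    is still in progress after max_polls attempts.
    """
    for _ in range(max_polls):
        state = fetch_status()
        if classify(state) != "in_progress":
            return state
        sleep(interval_s)
    raise TimeoutError(f"still in progress after {max_polls} polls")
```

Passing `fetch_status` in as a callable keeps the sketch independent of any particular endpoint: wire it to whatever call returns the file or source status in your workspace.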
Direct Upload
Use file upload for pilots, ad hoc analysis, and small curated knowledge bases.
For developer uploads, use:
curl -X POST "https://api.structhub.io/api/v1/project/files/upload-url" \
  -H "Content-Type: application/json" \
  -H "API-KEY: YOUR_API_KEY" \
  -H "X-Project-ID: YOUR_PROJECT_ID" \
  -d '{ "name": "policy.pdf", "file_type": "application/pdf" }'
Metadata for Better Retrieval
Metadata makes retrieval more precise. Useful metadata keys include:
- Department
- Region
- Customer
- Product
- Source system
- Document type
- Effective date
- Owner
- Confidentiality
- Workflow stage
After metadata is generated, users can filter by it in chat, and developers can pass metadata filters to REST or MCP tools.
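To make the filtering idea concrete, here is a minimal local sketch of exact-match metadata filtering. The key names (`department`, `region`, `document_type`) are lowercase variants of the keys listed above, and the filter shape and sample documents are hypothetical. The filter format that the REST or MCP tools actually accept is not specified here.

```python
from typing import Any

Document = dict[str, Any]


def matches(metadata: dict[str, Any], filters: dict[str, Any]) -> bool:
    """Return True when every filter key is present in metadata with an equal value."""
    return all(key in metadata and metadata[key] == value for key, value in filters.items())


def filter_documents(documents: list[Document], filters: dict[str, Any]) -> list[Document]:
    """Keep documents whose 'metadata' dict satisfies all filters."""
    return [doc for doc in documents if matches(doc.get("metadata", {}), filters)]


# Hypothetical sample data for illustration only.
documents = [
    {"name": "policy.pdf", "metadata": {"department": "HR", "region": "EU", "document_type": "policy"}},
    {"name": "pricing.pdf", "metadata": {"department": "Sales", "region": "EU", "document_type": "price list"}},
    {"name": "handbook.pdf", "metadata": {"department": "HR", "region": "US", "document_type": "handbook"}},
]

hr_eu = filter_documents(documents, {"department": "HR", "region": "EU"})
print([doc["name"] for doc in hr_eu])  # prints ['policy.pdf']
```

Each additional filter key narrows the candidate set further, which is why consistent metadata tends to make retrieval more precise.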
Operational Tips
- Start pilots with a small set of high-value documents rather than all company content.
- Define metadata keys before indexing large document sets.
- Use file status and source health, where available, to validate readiness.
- Enable Graph RAG when entity relationships are important.
- Use resync after changing source scope or metadata settings.
- Use project instructions to steer answer behavior for each knowledge domain.
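The tip about defining metadata keys before indexing can be backed by a small pre-upload check. The sketch below is one possible approach, run on your side before any content is sent. The required and optional key sets are hypothetical examples, and nothing here calls the Akurion API.

```python
# Hypothetical agreed-upon keys; choose the ones that fit your project.
REQUIRED_KEYS = {"department", "document_type", "owner"}
OPTIONAL_KEYS = {"region", "customer", "product", "effective_date", "confidentiality"}


def check_metadata(metadata: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the metadata passes."""
    present = set(metadata)
    problems = []
    # Required keys that are absent entirely.
    for key in sorted(REQUIRED_KEYS - present):
        problems.append(f"missing required key: {key}")
    # Keys outside the agreed vocabulary, which usually signal typos or drift.
    for key in sorted(present - REQUIRED_KEYS - OPTIONAL_KEYS):
        problems.append(f"unexpected key: {key}")
    # Required keys that are present but blank.
    for key in sorted(present & REQUIRED_KEYS):
        if not str(metadata[key]).strip():
            problems.append(f"empty value for required key: {key}")
    return problems
```

Running a check like this over a document set before indexing surfaces inconsistent keys early, when they are cheaper to fix than after a large ingest.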