IsabelProject


SNAP Data Ingestion

This page simulates how datasets are onboarded into SNAP from external providers and internal collection workflows.

Ingestion pipeline stages

Source registration
Schema mapping
Validation checks
Transformation and harmonization
Load into staging
Quality review
Publish to catalog
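The stage order above can be captured as an ordered enumeration. This is only a sketch: the names `Stage` and `next_stage` are hypothetical, and it assumes the stages run strictly in the listed order, which this page does not state explicitly.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Pipeline stages, numbered in the order listed on this page."""
    SOURCE_REGISTRATION = 1
    SCHEMA_MAPPING = 2
    VALIDATION_CHECKS = 3
    TRANSFORMATION_AND_HARMONIZATION = 4
    LOAD_INTO_STAGING = 5
    QUALITY_REVIEW = 6
    PUBLISH_TO_CATALOG = 7


def next_stage(stage):
    """Return the stage after `stage`, or None once the last stage is reached.

    Assumption: stages are strictly sequential with no skipping or looping.
    """
    if stage is Stage.PUBLISH_TO_CATALOG:
        return None
    return Stage(stage + 1)
```

Calling `next_stage(Stage.VALIDATION_CHECKS)` returns `Stage.TRANSFORMATION_AND_HARMONIZATION`.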

Source registration

At registration time, SNAP stores:

Source organization
Endpoint or delivery method
File format
Update frequency
Contact owner
Licensing terms
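One way to picture the registration record is as a small immutable data class with one field per item above. The class name, field names, and the `missing_fields` helper are illustrative assumptions, not SNAP's actual storage model.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SourceRegistration:
    """Hypothetical record holding the six items stored at registration time."""
    source_organization: str
    endpoint_or_delivery_method: str
    file_format: str
    update_frequency: str
    contact_owner: str
    licensing_terms: str


def missing_fields(record: SourceRegistration) -> list[str]:
    """Return the names of fields that are empty or whitespace-only."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]
```

A record with a blank `contact_owner` would be reported as `["contact_owner"]`, which a registration form could use to block submission.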

Accepted formats (debug set)

CSV (UTF-8)
JSON / NDJSON
Parquet
XLSX (with explicit tab selection)
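A format gate for this list might look like the sketch below. It assumes the format is inferred from the file extension and that the XLSX tab is passed as a `sheet` argument; both are illustrative choices, and `check_format` is a hypothetical name.

```python
from pathlib import Path

# Extension-to-format lookup for the accepted set listed above (assumed mapping).
ACCEPTED_EXTENSIONS = {
    ".csv": "CSV (UTF-8)",
    ".json": "JSON",
    ".ndjson": "NDJSON",
    ".parquet": "Parquet",
    ".xlsx": "XLSX",
}


def check_format(filename, sheet=None):
    """Return the accepted format label for `filename`, or raise ValueError."""
    extension = Path(filename).suffix.lower()
    if extension not in ACCEPTED_EXTENSIONS:
        raise ValueError(f"unsupported format: {extension or 'no extension'}")
    # XLSX is only accepted when the caller names the tab to read.
    if extension == ".xlsx" and sheet is None:
        raise ValueError("XLSX files need an explicit tab selection")
    return ACCEPTED_EXTENSIONS[extension]
```

Checking the extension alone does not confirm that a CSV file is really UTF-8; that would need a separate decode step.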

Validation checks

The following checks run before a dataset is published:

Required columns exist
Data types match schema
Date fields are parseable and normalized
Categorical values match allowed lists
Numeric fields pass bounds checks
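The checks above can be sketched as a single row-level validator that returns one message per failed rule. The schema layout, the column names, and `validate_row` itself are hypothetical and exist only to illustrate the five checks.

```python
from datetime import date

# Hypothetical schema: every listed column is required.
EXAMPLE_SCHEMA = {
    "region": {"type": "category", "allowed": {"DE", "FR"}},
    "ref_date": {"type": "date"},
    "employment_rate": {"type": "number", "min": 0, "max": 100},
}


def validate_row(row, schema):
    """Return a list of rule failures for one row (an empty list means it passes)."""
    failures = []
    for column, rule in schema.items():
        if column not in row:
            failures.append(f"{column}: required column missing")
            continue
        value = row[column]
        if rule["type"] == "date":
            try:
                date.fromisoformat(value)  # parseable as an ISO date?
            except (TypeError, ValueError):
                failures.append(f"{column}: not an ISO date")
        elif rule["type"] == "category":
            if value not in rule["allowed"]:
                failures.append(f"{column}: value not in allowed list")
        elif rule["type"] == "number":
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                failures.append(f"{column}: expected a number")
            elif not rule["min"] <= value <= rule["max"]:
                failures.append(f"{column}: outside bounds")
    return failures
```

Collecting every failure, rather than stopping at the first, makes it possible to count rejected rows and report per-rule results.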

Harmonization rules

To align heterogeneous inputs:

Region codes are mapped to standard NUTS codes.
Time references are normalized to ISO 8601 date conventions.
Units are converted to canonical forms where possible.
Missing values are tagged with explicit null reasons.
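As a rough illustration of the four rules, the sketch below harmonizes a single record. The lookup table, the unit name "thousand persons", the default source date format, and the `null_reason` labels are all assumptions made for this example.

```python
from datetime import datetime

# Illustrative lookup only; a real mapping would cover the full NUTS classification.
REGION_TO_NUTS = {"Bavaria": "DE2", "Île-de-France": "FR1"}


def harmonize(record, date_format="%d/%m/%Y"):
    """Return a harmonized copy of `record` (region, date, value, unit, null_reason)."""
    value = record.get("value")
    unit = record.get("unit")
    null_reason = None
    if value is None:
        # Tag missing values with an explicit reason instead of a bare null.
        null_reason = record.get("null_reason", "unknown")
    elif unit == "thousand persons":
        # Convert to the assumed canonical unit.
        value, unit = value * 1000, "persons"
    return {
        # Unmapped regions pass through unchanged in this sketch.
        "region": REGION_TO_NUTS.get(record["region"], record["region"]),
        "date": datetime.strptime(record["date"], date_format).date().isoformat(),
        "value": value,
        "unit": unit,
        "null_reason": null_reason,
    }
```

For example, a record dated `31/03/2024` for `Bavaria` with a value of 12.5 thousand persons comes out with region `DE2`, date `2024-03-31`, and a value of 12500.0 persons.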

Ingestion status model

Every ingestion run has a status:

queued
running
failed-validation
loaded-staging
published
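The status values can be modeled as an enumeration with a transition table. The five values come from the list above, but the allowed transitions are an assumption inferred from the pipeline order, and `RunStatus` and `advance` are hypothetical names.

```python
from enum import Enum


class RunStatus(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    FAILED_VALIDATION = "failed-validation"
    LOADED_STAGING = "loaded-staging"
    PUBLISHED = "published"


# Assumed transitions: failed-validation and published are treated as terminal.
ALLOWED_TRANSITIONS = {
    RunStatus.QUEUED: {RunStatus.RUNNING},
    RunStatus.RUNNING: {RunStatus.FAILED_VALIDATION, RunStatus.LOADED_STAGING},
    RunStatus.LOADED_STAGING: {RunStatus.PUBLISHED},
    RunStatus.FAILED_VALIDATION: set(),
    RunStatus.PUBLISHED: set(),
}


def advance(current, new):
    """Return `new` if the move is allowed, otherwise raise ValueError."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {new.value}")
    return new
```

Under these assumptions a run cannot jump from `queued` straight to `published`, which keeps the status history consistent with the stage order.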

Operational metrics

Useful debug metrics:

Total files processed
Success/failure ratio
Average processing duration
Rows rejected by validation
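These four metrics could be derived from a list of run records, as in the sketch below. The record keys (`files`, `ok`, `seconds`, `rows_rejected`) and the function name `summarize` are hypothetical.

```python
def summarize(runs):
    """Compute the four debug metrics from a list of run records."""
    succeeded = sum(1 for run in runs if run["ok"])
    failed = len(runs) - succeeded
    return {
        "total_files_processed": sum(run["files"] for run in runs),
        # The ratio is undefined when nothing failed, so report None in that case.
        "success_failure_ratio": succeeded / failed if failed else None,
        "average_duration_seconds": (
            sum(run["seconds"] for run in runs) / len(runs) if runs else None
        ),
        "rows_rejected": sum(run["rows_rejected"] for run in runs),
    }
```

Returning `None` for an undefined ratio or average avoids a division by zero when there are no failed runs or no runs at all.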

Simulated CLI snippets

echo "ingest source: eu-lfs"
echo "validate schema: employment-v2"
echo "publish dataset: regional-employment-rate"

Failure handling

When ingestion fails:

The dataset remains hidden from the public catalog.
A run log is retained with per-rule failures.
The data steward receives a notification entry.
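The three failure effects can be sketched with plain in-memory structures. The `state` layout and the name `handle_failed_run` are assumptions for illustration; a real system would presumably use persistent storage and a notification service.

```python
def handle_failed_run(run_id, dataset, rule_failures, state):
    """Record a failed run without exposing the dataset publicly.

    `state` is a hypothetical dict with three keys:
    "public_catalog" (set), "run_logs" (dict), "steward_notifications" (list).
    """
    # 1. The dataset is not added to the public catalog, so it stays hidden.
    # 2. Keep a run log listing every per-rule failure.
    state["run_logs"][run_id] = list(rule_failures)
    # 3. Add a notification entry for the data steward.
    state["steward_notifications"].append(
        f"run {run_id} failed for {dataset}: {len(rule_failures)} rule failure(s)"
    )
    return state
```

Copying `rule_failures` into the log means later changes to the caller's list do not alter the retained record.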

Debug checklist

The run appears in the ingestion history.
The validation summary is visible.
Published datasets include a refresh timestamp.
