SNAP Data Ingestion

This page illustrates, with simulated examples, how datasets are onboarded into SNAP from external providers and internal collection workflows.

Ingestion pipeline stages

  1. Source registration
  2. Schema mapping
  3. Validation checks
  4. Transformation and harmonization
  5. Load into staging
  6. Quality review
  7. Publish to catalog
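
The stage order above can be modelled as a simple linear sequence. This is a minimal sketch, assuming every stage runs once and strictly after the previous one; it does not model a run stopping early after failed validation. The `Stage` enum and `next_stage` helper are hypothetical names, not part of SNAP.

```python
from enum import IntEnum
from typing import Optional


class Stage(IntEnum):
    """The seven ingestion stages, numbered in the order listed above (hypothetical names)."""
    SOURCE_REGISTRATION = 1
    SCHEMA_MAPPING = 2
    VALIDATION_CHECKS = 3
    TRANSFORMATION_AND_HARMONIZATION = 4
    LOAD_INTO_STAGING = 5
    QUALITY_REVIEW = 6
    PUBLISH_TO_CATALOG = 7


def next_stage(current: Stage) -> Optional[Stage]:
    """Return the stage that follows `current`, or None once the dataset is published."""
    if current is Stage.PUBLISH_TO_CATALOG:
        return None
    return Stage(current + 1)


# Walk a dataset through every stage in order.
stage: Optional[Stage] = Stage.SOURCE_REGISTRATION
visited = []
while stage is not None:
    visited.append(stage.name)
    stage = next_stage(stage)
```

Using an integer enum keeps the ordering explicit, so "is this run past validation?" becomes a plain comparison such as `stage > Stage.VALIDATION_CHECKS`.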

Source registration

At registration time, SNAP stores:

  • Source organization
  • Endpoint or delivery method
  • File format
  • Update frequency
  • Contact owner
  • Licensing terms
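
A registration record could be stored as a small immutable structure with one field per attribute listed above. The sketch below assumes an in-memory dictionary as the registry and rejects duplicate source IDs and blank fields; the class, field names, and `register` helper are hypothetical, and the values in the usage lines are placeholders.

```python
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass(frozen=True)
class SourceRegistration:
    """Hypothetical record mirroring the attributes stored at registration time."""
    source_organization: str
    delivery_method: str  # endpoint URL or other delivery method
    file_format: str
    update_frequency: str
    contact_owner: str
    licensing_terms: str


def register(registry: Dict[str, SourceRegistration], source_id: str,
             reg: SourceRegistration) -> None:
    """Add a source to the registry, rejecting duplicates and blank fields."""
    if source_id in registry:
        raise ValueError(f"source already registered: {source_id}")
    blank = [name for name, value in asdict(reg).items() if not value.strip()]
    if blank:
        raise ValueError(f"blank registration fields: {blank}")
    registry[source_id] = reg


# Usage with placeholder values.
registry: Dict[str, SourceRegistration] = {}
register(registry, "eu-lfs", SourceRegistration(
    source_organization="Example Statistics Office",
    delivery_method="https://example.org/api/lfs",
    file_format="CSV (UTF-8)",
    update_frequency="quarterly",
    contact_owner="data-owner@example.org",
    licensing_terms="CC BY 4.0",
))
```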

Accepted formats (debug set)

  • CSV (UTF-8)
  • JSON / NDJSON
  • Parquet
  • XLSX (with explicit tab selection)
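
A format gate for this debug set might look like the following sketch. It assumes the format is inferred from the file extension (the mapping is hypothetical), treats the XLSX tab as a required `sheet` argument, and checks the CSV encoding requirement with a strict UTF-8 decode.

```python
from pathlib import Path
from typing import Optional

# Hypothetical extension-to-format mapping for the accepted debug set.
ACCEPTED_FORMATS = {
    ".csv": "csv",
    ".json": "json",
    ".ndjson": "ndjson",
    ".parquet": "parquet",
    ".xlsx": "xlsx",
}


def check_format(filename: str, sheet: Optional[str] = None) -> str:
    """Return the detected format, or raise ValueError if it is not accepted."""
    fmt = ACCEPTED_FORMATS.get(Path(filename).suffix.lower())
    if fmt is None:
        raise ValueError(f"unsupported format: {filename}")
    if fmt == "xlsx" and not sheet:
        # XLSX is only accepted with an explicit tab (sheet) selection.
        raise ValueError("XLSX files require an explicit sheet name")
    return fmt


def is_utf8(data: bytes) -> bool:
    """CSV input must be UTF-8; a strict decode fails on any invalid byte sequence."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```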

Validation checks

Validation runs before data is loaded into staging, and therefore before publication. The following checks are applied:

  • Required columns exist
  • Data types match schema
  • Date fields are parseable and normalized
  • Categorical values match allowed lists
  • Numeric fields pass bounds checks
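
The checks above can be expressed as per-row rules driven by a schema. This is a minimal sketch under stated assumptions: the schema layout, column names, and allowed region list are hypothetical; every schema column is treated as required; and, for simplicity, only ISO 8601 dates count as parseable here.

```python
from datetime import date
from typing import Dict, List

# Hypothetical schema for an employment dataset; every column listed is required.
SCHEMA: Dict[str, dict] = {
    "region": {"type": str, "allowed": {"DE1", "FR1", "ES3"}},
    "ref_date": {"type": "date"},
    "employment_rate": {"type": float, "min": 0.0, "max": 100.0},
}


def validate_row(row: dict, schema: Dict[str, dict] = SCHEMA) -> List[str]:
    """Return human-readable failures; an empty list means the row passed."""
    errors: List[str] = []
    for col, rule in schema.items():
        if col not in row:  # required columns exist
            errors.append(f"missing column: {col}")
            continue
        value = row[col]
        if rule["type"] == "date":  # date fields are parseable
            try:
                date.fromisoformat(value)
            except (TypeError, ValueError):
                errors.append(f"unparseable date in {col}: {value!r}")
            continue
        if not isinstance(value, rule["type"]):  # data types match schema
            errors.append(f"type mismatch in {col}: {value!r}")
            continue
        if "allowed" in rule and value not in rule["allowed"]:  # categorical values
            errors.append(f"value not allowed in {col}: {value!r}")
        if "min" in rule and not (rule["min"] <= value <= rule["max"]):  # bounds
            errors.append(f"out of bounds in {col}: {value!r}")
    return errors
```

Returning a list of messages rather than raising on the first problem lets a run report every rule a row breaks, which matches the per-rule failure logs described under failure handling.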

Harmonization rules

To align heterogeneous inputs, the following harmonization rules are applied:

  • Region codes are mapped to standard NUTS codes.
  • Time references are normalized to ISO 8601 date format.
  • Units are converted to canonical forms where possible.
  • Missing values are tagged with explicit null reasons.
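
The four rules can be applied record by record, as in the sketch below. The region lookup, unit table, accepted input date layouts, and null-reason codes (`missing`, `unmapped-region`, `unparseable-date`) are all hypothetical; a real deployment would load NUTS mappings and unit conversions from reference tables.

```python
from datetime import datetime
from typing import Optional

# Hypothetical lookup from provider region labels to NUTS 2 codes.
REGION_TO_NUTS = {"Berlin": "DE30", "Île-de-France": "FR10"}

# Hypothetical unit table: source unit -> (canonical unit, multiplication factor).
UNIT_FACTORS = {"thousand persons": ("persons", 1000), "persons": ("persons", 1)}

# Input date layouts assumed for illustration.
DATE_PATTERNS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")


def to_iso_date(raw: str) -> Optional[str]:
    """Return an ISO 8601 date string (YYYY-MM-DD), or None if no pattern matches."""
    for pattern in DATE_PATTERNS:
        try:
            return datetime.strptime(raw, pattern).date().isoformat()
        except ValueError:
            continue
    return None


def harmonize(record: dict) -> dict:
    """Apply the harmonization rules to one record, tagging every null with a reason."""
    out: dict = {"null_reasons": {}}
    region = record.get("region")
    out["region"] = REGION_TO_NUTS.get(region)
    if out["region"] is None:
        out["null_reasons"]["region"] = "missing" if region is None else "unmapped-region"
    raw_date = record.get("date")
    out["date"] = to_iso_date(raw_date) if raw_date else None
    if out["date"] is None:
        out["null_reasons"]["date"] = "missing" if not raw_date else "unparseable-date"
    value, unit = record.get("value"), record.get("unit")
    if value is None:
        out["value"], out["unit"] = None, None
        out["null_reasons"]["value"] = "missing"
    elif unit in UNIT_FACTORS:
        canonical, factor = UNIT_FACTORS[unit]
        out["value"], out["unit"] = value * factor, canonical
    else:
        out["value"], out["unit"] = value, unit  # no known conversion: keep as-is
    return out
```

Unknown units are passed through unchanged rather than rejected, reflecting the "where possible" qualifier on unit conversion.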

Ingestion status model

At any time, every ingestion run has exactly one of the following statuses:

  • queued
  • running
  • failed-validation
  • loaded-staging
  • published
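
The page lists the statuses but not which transitions between them are allowed. The sketch below assumes transitions that follow the pipeline order (a run either fails validation or is loaded into staging, and only a staged run can be published); the `TRANSITIONS` table and `IngestionRun` class are hypothetical.

```python
from typing import Dict, List, Set

# Assumed transitions inferred from the pipeline order; not a documented SNAP contract.
TRANSITIONS: Dict[str, Set[str]] = {
    "queued": {"running"},
    "running": {"failed-validation", "loaded-staging"},
    "loaded-staging": {"published"},
    "failed-validation": set(),  # terminal
    "published": set(),  # terminal
}


class IngestionRun:
    """Tracks the status of one run and records every status it has passed through."""

    def __init__(self, run_id: str) -> None:
        self.run_id = run_id
        self.status = "queued"
        self.history: List[str] = ["queued"]

    def advance(self, new_status: str) -> None:
        """Move to `new_status`, rejecting transitions not in the table."""
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(f"{self.run_id}: illegal transition {self.status} -> {new_status}")
        self.status = new_status
        self.history.append(new_status)

    @property
    def is_terminal(self) -> bool:
        return not TRANSITIONS[self.status]
```

In this sketch a failed run is terminal, so a retry would be modelled as a new run rather than re-queuing the old one.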

Operational metrics

Useful debug metrics:

  • Total files processed
  • Success/failure ratio
  • Average processing duration
  • Rows rejected by validation
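
These metrics can be aggregated from per-run records. The sketch assumes a hypothetical `RunRecord` shape with one field per metric input, and reports the success/failure ratio as a `successes:failures` string so that a set of runs with no failures does not trigger a division by zero.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RunRecord:
    """Hypothetical per-run record carrying the inputs for the metrics above."""
    files_processed: int
    succeeded: bool
    duration_s: float
    rows_rejected: int


def summarize(runs: List[RunRecord]) -> dict:
    """Aggregate debug metrics over a list of runs."""
    successes = sum(r.succeeded for r in runs)  # True counts as 1
    failures = len(runs) - successes
    return {
        "total_files_processed": sum(r.files_processed for r in runs),
        "success_failure_ratio": f"{successes}:{failures}",
        "avg_duration_s": sum(r.duration_s for r in runs) / len(runs) if runs else 0.0,
        "rows_rejected": sum(r.rows_rejected for r in runs),
    }
```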

Simulated CLI snippets

echo "ingest source: eu-lfs"
echo "validate schema: employment-v2"
echo "publish dataset: regional-employment-rate"

Failure handling

When ingestion fails:

  • The dataset remains hidden from the public catalog.
  • A run log is retained with per-rule failures.
  • The data steward receives a notification entry.
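
The three failure-handling steps can be sketched as a single function. It assumes in-memory stores for run logs and notifications and that `rule_failures` maps a rule name to its violation count; all names are hypothetical, and how notifications are actually delivered to the steward is not modelled.

```python
from typing import Dict, List


def handle_failure(run_id: str, dataset: str, rule_failures: Dict[str, int],
                   run_logs: Dict[str, dict], notifications: List[str]) -> None:
    """Record a failed run and queue a notification entry for the data steward."""
    # No publish step is called, so the dataset stays out of the public catalog.
    run_logs[run_id] = {
        "dataset": dataset,
        "status": "failed-validation",
        "rule_failures": dict(rule_failures),  # per-rule failures are retained
    }
    total = sum(rule_failures.values())
    notifications.append(  # notification entry for the data steward
        f"run {run_id} ({dataset}) failed validation: {total} rule violations"
    )


# Usage with placeholder rule names and counts.
logs: Dict[str, dict] = {}
inbox: List[str] = []
handle_failure("run-042", "regional-employment-rate",
               {"numeric-bounds": 12, "allowed-values": 3}, logs, inbox)
```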

Debug checklist

  • Run appears in ingestion history.
  • Validation summary is visible.
  • Published datasets include refresh timestamp.
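
The checklist items could also be evaluated programmatically. This sketch assumes hypothetical data shapes: a run history keyed by run ID, where each entry may carry a `validation_summary`, and a catalog keyed by dataset name, where each entry should carry an ISO 8601 `refreshed_at` timestamp.

```python
from datetime import datetime
from typing import Dict


def has_timestamp(entry: dict) -> bool:
    """True if the entry carries a parseable ISO 8601 refresh timestamp."""
    try:
        datetime.fromisoformat(entry["refreshed_at"])
    except (KeyError, TypeError, ValueError):
        return False
    return True


def debug_checklist(run_id: str, history: Dict[str, dict],
                    catalog: Dict[str, dict]) -> Dict[str, bool]:
    """Evaluate the three checklist items for one run."""
    run = history.get(run_id)
    return {
        "run_in_history": run is not None,
        "validation_summary_visible": run is not None and run.get("validation_summary") is not None,
        "published_have_refresh_timestamp": all(has_timestamp(e) for e in catalog.values()),
    }
```
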

Funded by the European Union

This project has received funding from the European Union’s Horizon research and innovation actions program under grant agreement No 101177687.

© 2026 IsabelProject. All rights reserved.

Version: Alpha v2