CKAN -- Comprehensive Knowledge Archive Network

CKAN is a widely used open-source data management system for powering open data portals. Originally developed by the Open Knowledge Foundation, it is used by data.gov, data.gov.uk, data.gov.au, open.canada.ca, and hundreds of other government and institutional data portals worldwide.

Architecture

Language: Python (Flask/Werkzeug)
Database: PostgreSQL (metadata + DataStore)
Search: Apache Solr (full-text index)
File Storage: Local filesystem, S3, or cloud storage
Task Queue: Redis + CKAN Workers (background jobs)
License: AGPL v3.0

Data Model

Dataset (Package)

The central unit: metadata describing a collection of data. Contains title, description, license, author, tags, and extras (custom key-value metadata fields).

Resource

A file or link within a dataset. Each dataset can have multiple resources (e.g., a CSV file, a JSON file, and an API endpoint for the same data). Resources can be uploaded files or external URLs.

Organization

The publishing entity (government department, agency). Controls who can create and edit datasets. Organizations form the primary access control mechanism.

Group

Thematic groupings that cut across organizations (e.g., "Environment", "Transportation"). Datasets can belong to multiple groups.
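
To make the relationships above concrete, here is a minimal sketch of a dataset as a Python dict, roughly in the shape CKAN's API uses for datasets. The field names follow CKAN's dataset schema; every value (names, URLs, the organization) is invented for illustration, and the two helper functions are hypothetical conveniences, not part of CKAN.

```python
# Illustrative dataset record. All values are made up for this sketch.
dataset = {
    "name": "air-quality-2023",                  # URL-safe identifier
    "title": "Air Quality Measurements 2023",
    "notes": "Hourly readings from monitoring stations.",  # description
    "license_id": "cc-by",
    "author": "Example Environment Agency",
    "owner_org": "example-environment-agency",   # publishing organization
    "groups": [{"name": "environment"}],         # thematic groups
    "tags": [{"name": "air"}, {"name": "pollution"}],
    # Extras are custom key-value metadata fields.
    "extras": [{"key": "update_frequency", "value": "hourly"}],
    # One dataset, several resources describing the same data.
    "resources": [
        {"name": "Readings (CSV)", "format": "CSV",
         "url": "https://example.org/air-quality-2023.csv"},
        {"name": "Readings (JSON)", "format": "JSON",
         "url": "https://example.org/air-quality-2023.json"},
    ],
}


def resource_formats(ds):
    """Return the sorted, de-duplicated formats of a dataset's resources."""
    return sorted({r["format"] for r in ds.get("resources", []) if r.get("format")})


def extras_as_dict(ds):
    """Flatten the list of {"key", "value"} extras into a plain dict."""
    return {e["key"]: e["value"] for e in ds.get("extras", [])}
```

For example, `resource_formats(dataset)` lists the formats a dataset offers, and `extras_as_dict(dataset)` gives direct access to the custom metadata fields.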

Key API Endpoints

Endpoint | Method | Description | Auth | Key Parameters
package_list | GET | Returns a list of all dataset (package) names in the catalog. | No | limit, offset
package_show | GET | Returns the full metadata for a single dataset, including all resources. | No (public datasets) | id (dataset name or ID)
package_search | GET | Full-text search across dataset metadata. Supports Solr query syntax, faceting, and filtering. | No | q (query), fq (filter), rows, start, sort, facet.field
resource_show | GET | Returns metadata for a single resource (file or API link within a dataset). | No (public) | id (resource ID)
datastore_search | GET/POST | SQL-like query against tabular data stored in the DataStore extension. Supports filters, full-text search, and field selection. | No (public) | resource_id, filters, q, fields, sort, limit, offset
datastore_search_sql | POST | Execute raw SQL (read-only) against DataStore tables. Allows JOINs and complex queries. | No (if enabled) | sql (SQL query string)
organization_list | GET | Returns all organizations in the catalog. | No | all_fields, sort, limit, offset
group_list | GET | Returns all thematic groups (topic categories) in the catalog. | No | all_fields, sort, limit, offset
tag_list | GET | Returns all tags used across datasets. | No | query, all_fields
package_create | POST | Creates a new dataset. Requires API key with appropriate permissions. | Yes (API key) | name, title, notes, owner_org, resources, tags, extras
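
As a rough sketch of how the endpoints above are called: each one is an action served under `/api/3/action/<name>` on the CKAN site, and responses are JSON objects with `success` and `result` keys. The example below uses only the Python standard library. The base URL is a placeholder, and the helper names (`action_url`, `call_action`, `create_request`) are hypothetical, not part of CKAN.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://ckan.example.org"  # placeholder: root URL of a CKAN site


def action_url(base_url, action, **params):
    """Build the URL for a CKAN action, with optional query parameters."""
    url = f"{base_url.rstrip('/')}/api/3/action/{action}"
    if params:
        url += "?" + urlencode(params)
    return url


def call_action(base_url, action, **params):
    """GET a read-only action and return its "result" payload.

    urlopen raises HTTPError for non-2xx responses; the explicit
    check on "success" covers the remaining case.
    """
    with urlopen(action_url(base_url, action, **params), timeout=30) as resp:
        body = json.load(resp)
    if not body.get("success"):
        raise RuntimeError(f"{action} failed: {body.get('error')}")
    return body["result"]


def create_request(base_url, api_key, dataset):
    """Prepare (but do not send) a package_create POST request.

    The key is passed in the Authorization header; `dataset` is a dict
    with fields such as name, title, notes, and owner_org.
    """
    return Request(
        action_url(base_url, "package_create"),
        data=json.dumps(dataset).encode("utf-8"),
        headers={"Authorization": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Usage (requires a reachable CKAN site, so it is left commented out):
# hits = call_action(BASE_URL, "package_search", q="water", rows=5)
# rows = call_action(BASE_URL, "datastore_search",
#                    resource_id="<resource-id>", limit=5)
```

Keeping URL construction separate from the network call makes the request shape easy to inspect or log before anything is sent.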

Popular Extensions

Extension | Description
ckanext-harvest | Harvests metadata from remote catalogs (DCAT, CSW, WAF, CKAN).
ckanext-spatial | Adds spatial/geographic search capabilities and map widgets.
ckanext-scheming | Customizable metadata schemas without writing Python code.
ckanext-dcat | Exposes CKAN datasets as DCAT RDF and enables DCAT harvesting.
ckanext-xloader | Loads CSV and tabular files into the DataStore automatically.
ckanext-pages | Adds CMS-like static pages to a CKAN instance.
ckanext-showcase | Allows users to create showcases linking to applications built with datasets.
ckanext-googleanalytics | Integrates Google Analytics tracking for dataset and resource views.

Deployment

CKAN can be deployed via Docker (official images available), package install on Ubuntu/Debian, or source install. The recommended production stack includes:

  • CKAN application server (Python WSGI, typically with uWSGI or Gunicorn)
  • PostgreSQL 12+ (metadata database and DataStore)
  • Apache Solr 8+ (search indexing)
  • Redis (job queue and caching)
  • Nginx (reverse proxy and static file serving)
  • Supervisor or systemd (process management)
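
Once such a stack is running, one simple smoke test is to call CKAN's `status_show` action through the reverse proxy and read back the reported version and enabled extensions. The sketch below assumes the site is reachable at a base URL you supply; the helper names `parse_status` and `fetch_status` are hypothetical. It only confirms that the web application answers; Solr, Redis, and the background workers would need their own checks.

```python
import json
from urllib.request import urlopen


def parse_status(body):
    """Extract the version and extension list from a status_show response body."""
    payload = json.loads(body)
    if not payload.get("success"):
        raise RuntimeError(f"status_show failed: {payload.get('error')}")
    result = payload["result"]
    return {
        "ckan_version": result.get("ckan_version"),
        "extensions": result.get("extensions", []),
    }


def fetch_status(base_url):
    """Call status_show on a CKAN site and return the parsed summary."""
    url = f"{base_url.rstrip('/')}/api/3/action/status_show"
    with urlopen(url, timeout=10) as resp:
        return parse_status(resp.read().decode("utf-8"))


# Usage (placeholder URL; requires a running CKAN site):
# print(fetch_status("https://ckan.example.org"))
```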