CKAN -- Comprehensive Knowledge Archive Network

CKAN is a widely used open-source data management system for powering open data portals. Originally developed by the Open Knowledge Foundation, it is used by data.gov, data.gov.uk, data.gov.au, open.canada.ca, and hundreds of other government and institutional data portals worldwide.

Architecture

Language: Python (Flask/Werkzeug)

Database: PostgreSQL (metadata + DataStore)

Search: Apache Solr (full-text index)

File Storage: Local filesystem, S3, or cloud storage

Task Queue: Redis + CKAN Workers (background jobs)

License: AGPL v3.0

Data Model

Dataset (Package)

The central unit: metadata describing a collection of data. Contains title, description, license, author, tags, and extras (custom key-value metadata fields).

Resource

A file or link within a dataset. Each dataset can have multiple resources (e.g., a CSV file, a JSON file, and an API endpoint for the same data). Resources can be uploaded files or external URLs.

Organization

The publishing entity (government department, agency). Controls who can create and edit datasets. Organizations form the primary access control mechanism.

Group

A thematic grouping that cuts across organizations (e.g., "Environment", "Transportation"). A dataset can belong to multiple groups.
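
To make the relationships above concrete, here is a minimal sketch of how a dataset might look as the JSON-style dictionary that CKAN's API exchanges. The field names (name, title, notes, license_id, owner_org, tags, groups, extras, resources) follow the CKAN dataset schema; every value, and the summarize helper, is hypothetical and only for illustration.

```python
# Hypothetical dataset: all values are made up for illustration.
dataset = {
    "name": "air-quality-2023",                 # unique, URL-safe identifier
    "title": "Air Quality Measurements 2023",
    "notes": "Hourly readings from monitoring stations.",  # the description
    "license_id": "cc-by",
    "author": "Environment Department",
    "owner_org": "environment-dept",            # the publishing organization
    "tags": [{"name": "air-quality"}, {"name": "monitoring"}],
    "groups": [{"name": "environment"}],        # thematic groups
    "extras": [{"key": "update_frequency", "value": "hourly"}],
    "resources": [
        {"name": "CSV export", "format": "CSV",
         "url": "https://example.org/air-quality.csv"},
        {"name": "JSON export", "format": "JSON",
         "url": "https://example.org/air-quality.json"},
    ],
}


def summarize(ds):
    """Return a small overview of a dataset dictionary (illustrative helper)."""
    return {
        "organization": ds.get("owner_org"),
        "groups": [g["name"] for g in ds.get("groups", [])],
        "formats": [r["format"] for r in ds.get("resources", [])],
        "extras": {e["key"]: e["value"] for e in ds.get("extras", [])},
    }
```

One dataset holds several resources, belongs to exactly one organization through owner_org, and may appear in any number of groups.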

Key API Endpoints

Endpoint	Method	Description	Auth	Key Parameters
package_list	`GET`	Returns a list of all dataset (package) names in the catalog.	No	limit, offset
package_show	`GET`	Returns the full metadata for a single dataset, including all resources.	No (public datasets)	id (dataset name or ID)
package_search	`GET`	Full-text search across dataset metadata. Supports Solr query syntax, faceting, and filtering.	No	q (query), fq (filter), rows, start, sort, facet.field
resource_show	`GET`	Returns metadata for a single resource (file or API link within a dataset).	No (public)	id (resource ID)
datastore_search	`GET/POST`	SQL-like query against tabular data stored in the DataStore extension. Supports filters, full-text search, and field selection.	No (public)	resource_id, filters, q, fields, sort, limit, offset
datastore_search_sql	`GET/POST`	Executes raw, read-only SQL against DataStore tables. Allows JOINs and complex queries.	No (if enabled)	sql (SQL query string)
organization_list	`GET`	Returns all organizations in the catalog.	No	all_fields, sort, limit, offset
group_list	`GET`	Returns all thematic groups (topic categories) in the catalog.	No	all_fields, sort, limit, offset
tag_list	`GET`	Returns all tags used across datasets.	No	query, all_fields
package_create	`POST`	Creates a new dataset. Requires an API token (an API key on older CKAN versions) with appropriate permissions.	Yes (API token or key)	name, title, notes, owner_org, resources, tags, extras
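
The read-only endpoints above are served under /api/3/action/ and return a JSON envelope with success, result, and (on failure) error keys. The following is a minimal sketch of calling them with only the Python standard library. The helper names (action_url, unwrap, call_action) are hypothetical, and the base URL is an assumption: substitute the root URL of any CKAN portal.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://demo.ckan.org"  # assumption: replace with your portal's root URL


def action_url(base_url, action, **params):
    """Build the GET URL for a CKAN action such as package_search."""
    url = f"{base_url.rstrip('/')}/api/3/action/{action}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def unwrap(body):
    """Return the 'result' from a CKAN response body, or raise if success is false."""
    payload = json.loads(body)
    if not payload.get("success"):
        raise RuntimeError(f"CKAN action failed: {payload.get('error')}")
    return payload["result"]


def call_action(base_url, action, **params):
    """Perform the HTTP request (requires network access) and unwrap the result."""
    with urllib.request.urlopen(action_url(base_url, action, **params), timeout=30) as resp:
        return unwrap(resp.read().decode("utf-8"))
```

For example, call_action(BASE_URL, "package_search", q="transport", rows=5) would return a dictionary whose count and results keys hold the number of matches and the matching datasets. For datastore_search over GET, the filters parameter is a JSON object, so it would be passed as a string, e.g. filters=json.dumps({"station": "A1"}), where the column name is again hypothetical.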

Popular Extensions

Extension	Description
ckanext-harvest	Harvests metadata from remote catalogs (DCAT, CSW, WAF, CKAN).
ckanext-spatial	Adds spatial/geographic search capabilities and map widgets.
ckanext-scheming	Customizable metadata schemas without writing Python code.
ckanext-dcat	Exposes CKAN datasets as DCAT RDF and enables DCAT harvesting.
ckanext-xloader	Loads CSV and tabular files into the DataStore automatically.
ckanext-pages	Adds CMS-like static pages to a CKAN instance.
ckanext-showcase	Allows users to create showcases linking to applications built with datasets.
ckanext-googleanalytics	Integrates Google Analytics tracking for dataset and resource views.

Deployment

CKAN can be deployed via Docker (official images available), package install on Ubuntu/Debian, or source install. The recommended production stack includes:

CKAN application server (Python WSGI, typically with uWSGI or Gunicorn)
PostgreSQL 12+ (metadata database and DataStore)
Apache Solr 8+ (search indexing)
Redis (job queue and caching)
Nginx (reverse proxy and static file serving)
Supervisor or systemd (process management)
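
Each backing service in the stack above is wired to CKAN through a setting in the [app:main] section of the CKAN configuration file (ckan.ini). The sketch below checks that those settings are present. The setting names are CKAN's own; the missing_settings helper and the sample values in the usage note are hypothetical, and the two DataStore settings only matter when the DataStore is in use.

```python
import configparser

# ckan.ini settings that point CKAN at the services in the production stack.
REQUIRED = {
    "sqlalchemy.url": "PostgreSQL metadata database",
    "ckan.datastore.write_url": "PostgreSQL DataStore (read-write user)",
    "ckan.datastore.read_url": "PostgreSQL DataStore (read-only user)",
    "solr_url": "Apache Solr",
    "ckan.redis.url": "Redis",
}


def missing_settings(ini_text):
    """Return the required setting names that are absent or empty in [app:main]."""
    # interpolation=None keeps '%' characters in values from being interpreted;
    # restricting delimiters to '=' avoids splitting on ':' inside URLs.
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.read_string(ini_text)
    section = parser["app:main"] if parser.has_section("app:main") else {}
    return [key for key in REQUIRED if not section.get(key)]
```

Given a file that sets only sqlalchemy.url and solr_url, missing_settings would report the two DataStore URLs and ckan.redis.url as missing, which can serve as a quick pre-deployment check.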