CKAN -- Comprehensive Knowledge Archive Network
CKAN is the leading open-source data management system for powering open data portals. Originally developed by the Open Knowledge Foundation, it is used by data.gov, data.gov.uk, data.gov.au, open.canada.ca, and hundreds of other government and institutional data portals worldwide.
Architecture
Data Model
Dataset (Package)
The central unit: metadata describing a collection of data. Contains title, description, license, author, tags, and extras (custom key-value metadata fields).
Resource
A file or link within a dataset. Each dataset can have multiple resources (e.g., a CSV file, a JSON file, and an API endpoint for the same data). Resources can be uploaded files or external URLs.
Organization
The publishing entity (government department, agency). Controls who can create and edit datasets. Organizations form the primary access control mechanism.
Group
A thematic grouping that cuts across organizations (e.g., "Environment", "Transportation"). A dataset can belong to multiple groups.
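To make the data model concrete, here is a minimal sketch of a dataset in the dictionary shape the Action API uses. Field names such as `notes`, `license_id`, `owner_org`, `tags`, `extras`, and `resources` follow CKAN's conventions. Every value, and the two helper functions, are made up for illustration.

```python
# Hypothetical dataset; all values are invented for illustration.
dataset = {
    "name": "air-quality-2023",                 # URL-safe identifier
    "title": "Air Quality Measurements 2023",
    "notes": "Hourly readings from monitoring stations.",  # the description
    "license_id": "cc-by",
    "author": "Example Environment Agency",
    "owner_org": "example-environment-agency",  # the publishing organization
    "tags": [{"name": "air"}, {"name": "pollution"}],
    "groups": [{"name": "environment"}, {"name": "health"}],
    # Extras are stored as a list of key/value pairs.
    "extras": [{"key": "update_frequency", "value": "hourly"}],
    # One dataset, several resources exposing the same data.
    "resources": [
        {"name": "Readings (CSV)", "format": "CSV",
         "url": "https://example.org/readings.csv"},
        {"name": "Readings (JSON)", "format": "JSON",
         "url": "https://example.org/readings.json"},
    ],
}


def extras_as_dict(ds):
    """Flatten the list of extras into a plain key -> value mapping."""
    return {item["key"]: item["value"] for item in ds.get("extras", [])}


def resource_formats(ds):
    """Return the distinct resource formats of a dataset, sorted."""
    return sorted({res["format"] for res in ds.get("resources", [])})


if __name__ == "__main__":
    print(extras_as_dict(dataset))
    print(resource_formats(dataset))
```

Because the organization is a single `owner_org` value while `groups` is a list, the structure itself shows that a dataset has one publisher but can appear under several themes.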
Key API Endpoints
| Endpoint | Method | Description | Auth | Key Parameters |
|---|---|---|---|---|
| package_list | GET | Returns a list of all dataset (package) names in the catalog. | No | limit, offset |
| package_show | GET | Returns the full metadata for a single dataset, including all resources. | No (public datasets) | id (dataset name or ID) |
| package_search | GET | Full-text search across dataset metadata. Supports Solr query syntax, faceting, and filtering. | No | q (query), fq (filter), rows, start, sort, facet.field |
| resource_show | GET | Returns metadata for a single resource (file or API link within a dataset). | No (public) | id (resource ID) |
| datastore_search | GET/POST | Queries tabular data stored in the DataStore extension without writing SQL. Supports filters, full-text search, and field selection. | No (public) | resource_id, filters, q, fields, sort, limit, offset |
| datastore_search_sql | GET/POST | Executes a read-only SQL query against DataStore tables. Allows JOINs and complex queries. | No (if enabled) | sql (SQL query string) |
| organization_list | GET | Returns all organizations in the catalog. | No | all_fields, sort, limit, offset |
| group_list | GET | Returns all thematic groups (topic categories) in the catalog. | No | all_fields, sort, limit, offset |
| tag_list | GET | Returns all tags used across datasets. | No | query, all_fields |
| package_create | POST | Creates a new dataset. Requires an API token (an API key on older CKAN versions) with appropriate permissions. | Yes (API token) | name, title, notes, owner_org, resources, tags, extras |
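The actions in the table are exposed under `/api/3/action/<action_name>`, and each response is a JSON envelope with `success` and `result` keys. The sketch below builds requests for a few of them using only the standard library. The base URL is a placeholder and the helper names (`action_url`, `datastore_search_url`, `package_create_request`, `unwrap`) are hypothetical, not part of any CKAN client library.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: placeholder portal address; replace with a real CKAN site.
BASE_URL = "https://catalog.example.org"


def action_url(action, **params):
    """Build the URL for an Action API call, skipping parameters set to None."""
    url = f"{BASE_URL}/api/3/action/{action}"
    query = {key: value for key, value in params.items() if value is not None}
    return f"{url}?{urlencode(query)}" if query else url


def datastore_search_url(resource_id, filters=None, limit=100):
    """Build a datastore_search URL; filters are sent as a JSON-encoded string."""
    encoded = json.dumps(filters) if filters else None
    return action_url("datastore_search", resource_id=resource_id,
                      filters=encoded, limit=limit)


def package_create_request(token, dataset):
    """Prepare (but do not send) an authenticated package_create POST."""
    return Request(
        action_url("package_create"),
        data=json.dumps(dataset).encode("utf-8"),
        headers={"Authorization": token, "Content-Type": "application/json"},
        method="POST",
    )


def unwrap(body):
    """Return the 'result' of a response body, or raise if the call failed."""
    payload = json.loads(body)
    if not payload.get("success"):
        raise RuntimeError(payload.get("error"))
    return payload["result"]


if __name__ == "__main__":
    print(action_url("package_search", q="water quality", rows=5))
    print(datastore_search_url("hypothetical-resource-id", {"city": "Oslo"}))
```

To send a prepared request, pass it to `urllib.request.urlopen` and hand the response body to `unwrap`. Read-only actions need no `Authorization` header on public data.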
Popular Extensions
| Extension | Description |
|---|---|
| ckanext-harvest | Harvests metadata from remote catalogs (DCAT, CSW, WAF, CKAN). |
| ckanext-spatial | Adds spatial/geographic search capabilities and map widgets. |
| ckanext-scheming | Customizable metadata schemas without writing Python code. |
| ckanext-dcat | Exposes CKAN datasets as DCAT RDF and enables DCAT harvesting. |
| ckanext-xloader | Loads CSV and tabular files into the DataStore automatically. |
| ckanext-pages | Adds CMS-like static pages to a CKAN instance. |
| ckanext-showcase | Allows users to create showcases linking to applications built with datasets. |
| ckanext-googleanalytics | Integrates Google Analytics tracking for dataset and resource views. |
Deployment
CKAN can be deployed via Docker (official images available), package install on Ubuntu/Debian, or source install. The recommended production stack includes:
- CKAN application server (Python WSGI, typically with uWSGI or Gunicorn)
- PostgreSQL 12+ (metadata database and DataStore)
- Apache Solr 8+ (search indexing)
- Redis (job queue and caching)
- Nginx (reverse proxy and static file serving)
- Supervisor or systemd (process management)
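Since the stack above spans several services, a quick reachability check can help when bringing up a new instance. The following is a minimal sketch, assuming the backing services listen on their conventional default ports (5432 for PostgreSQL, 8983 for Solr, 6379 for Redis). Your deployment may use different ports or hosts, and `check_stack` is a hypothetical helper, not a CKAN tool.

```python
import socket

# Assumption: conventional default ports; adjust to match your deployment.
DEFAULT_PORTS = {"postgresql": 5432, "solr": 8983, "redis": 6379}


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_stack(host="localhost", ports=None, probe=is_listening):
    """Probe each backing service and map its name to a reachability flag."""
    ports = DEFAULT_PORTS if ports is None else ports
    return {name: probe(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    for service, reachable in check_stack().items():
        print(f"{service}: {'reachable' if reachable else 'NOT reachable'}")
```

An open port only shows that something is listening. It does not confirm that the Solr core or the PostgreSQL databases are configured for CKAN.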