Differences between V1-V2 and V3#
Metax V3 Rest-API has number of changes from previous versions (v1-v2) and is incompatible in most API-endpoints.
This page introduces the main differences between the versions, but the most up-to-date information about the specific API version can always be found in the Swagger documentation.
Info
Following markers are used to clarify changes:
known change from V1-V2 (and what it is), not yet implemented
change unknown, because of unknown third party library conventions or limitations
Authentication and authorization#
Unlike Metax V1-V2, V3 does not use basic authentication headers. Instead, a bearer token is provided to users and integration customers. More details in End User Access.
Dataset#
Also named CatalogRecord in V1-V2. Main differences are: removal of the research_dataset nested object, and more descriptive field names.
For more information, see the new Dataset API user guide.
A good starting point for converting a dataset payload to the V3 format is the /v3/datasets/convert_from_legacy
helper endpoint. It accepts V1/V2 dataset JSON and converts it into V3 style dataset JSON. If errors are detected in the
resulting JSON, they are included in "errors"
object in the response. The endpoint does not do any permission
checks or validate that the dataset JSON has all the data needed for publishing.
Example dataset conversion using POST /v3/datasets/convert_from_legacy
Dataset endpoints#
There's a few changes made for the datasets endpoints. Bulk actions are removed. Also fetching datasets based on the list of identifiers are removed as well as separate endpoints to get the identifiers as a list.
Endpoints to manage datasets' editor permissions are not implemented to v3 yet but will be replaced with similar system as in Metax v2. Metax V2 editor permissions continue to work as before.
type | v2 endpoint | v3 endpoint | notes |
---|---|---|---|
PUT | /datasets[list] | Not used in V3 | Update of a list of datasets. |
PATCH | /datasets[list] | Not used in V3 | Partial update of a list of datasets. |
DELETE | /datasets | Not used in V3 | Delete of a list of datasets |
POST | /datasets/identifiers | Not used in V3 | A list of all dataset identifiers |
POST | /datasets/unique_preferred_identifiers | Not used in V3 | A list of unique dataset preferred ids |
POST | /datasets/list | Not used in V3 | Fetch a set of datasets using ids |
GET | /datasets/{pid}/files | see Files for more information | |
POST | /datasets/{pid}/files | see Files for more information | |
GET | /datasets/{pid}/files/{file_pid} | see Files for more information | |
PUT | /datasets/{pid}/files/user_metadata | see Files for more information | |
PATCH | /datasets/{pid}/files/user_metadata | see Files for more information | |
GET | /datasets/{CRID}/editor_permissions/users | Not implemented yet | list editor permissions |
POST | /datasets/{CRID}/editor_permissions/users | Not implemented yet | create editor permissions |
GET | /datasets/{CRID}/editor_permissions/users/{USER_ID} | Not implemented yet | return single permission |
PATCH | /datasets/{CRID}/editor_permissions/users/{USER_ID} | Not implemented yet | update permission |
DELETE | /datasets/{CRID}/editor_permissions/users/{USER_ID} | Not implemented yet | remove permission |
Field names#
V1-V2 field name | V3 field name | Notes |
---|---|---|
data_catalog [object] | data_catalog [str] | This is a URN-type identifier. |
dataset_version_set [list] | dataset_versions [list] | |
date_created [datetime] | created [datetime] | |
date_cumulation_started [datetime] | cumulation_started [datetime] | |
date_deprecated [datetime] | deprecated [datetime] | date_deprecated [datetime] and deprecated [bool] have been combined into one deprecated [datetime] field. |
date_last_cumulative_addition [datetime] | last_cumulative_addition [datetime] | |
date_modified [datetime] | modified [datetime] | |
date_removed [datetime] | removed [datetime] | date_removed [datetime] and removed [bool] have been combined into one removed [datetime] field. |
deprecated [bool] | deprecated [datetime] | deprecated [bool] and date_deprecated [datetime] have been combined into one deprecated [datetime] field. |
identifier [uuid] | id [uuid] | |
metadata_owner_org [str] | metadata_owner.organization [str] | |
metadata_provider_org [str] | not used in V3 | Metadata provider can now be found under metadata_owner.organization . |
metadata_provider_user [str] | metadata_owner.user [str] | |
N/A | fileset [object] | See Dataset Files for more information on fileset . |
preservation_dataset_origin_version [object] | preservation.dataset_origin_version [object] | not yet implemented |
preservation_dataset_version [object] | preservation.dataset_version [object] | not yet implemented |
preservation_description [str] | preservation.description [str] | |
preservation_identifier [str] | preservation.id [uuid] | |
preservation_reason_description [str] | preservation.reason_description [str] | |
preservation_state [int] | preservation.state [int] | |
preservation_state_modified [datetime] | preservation.state_modified [datetime] | not yet implemented |
previous_dataset_version [object] | not used in V3 | Version information can now be found under dataset_versions . |
removed [bool] | removed [datetime] | removed [bool] and date_removed [datetime] have been combined into one removed [datetime] field. |
research_dataset [object] | not used in V3 | All metadata under research_dataset has been moved directly under dataset. See fields below. |
research_dataset.access_rights [object] | access_rights [object] | |
research_dataset.available [date] | access_rights.available [date] | |
research_dataset.contributor [list] | actors [list] | See Actors-section for information on specifying actor roles in V3. |
research_dataset.creator [list] | actors [list] | See Actors-section for information on specifying actor roles in V3. |
research_dataset.curator [list] | actors [list] | See Actors-section for information on specifying actor roles in V3. |
research_dataset.description [dict] | description [dict] | |
research_dataset.field_of_science [list] | field_of_science [list] | |
research_dataset.is_output_of [list] | projects [list] | |
research_dataset.issued [date] | issued [datetime] | |
research_dataset.keyword [list] | keyword [list] | |
research_dataset.language [list] | language [list] | |
research_dataset.modified [datetime] | modified [datetime] | |
research_dataset.other_identifier [list] | other_identifiers [list] | |
research_dataset.preferred_identifier | persistent_identifier | |
research_dataset.provenance [list] | provenance [list] | |
research_dataset.publisher [object] | actors [list] | See Actors-section for information on specifying actor roles in V3. |
research_dataset.relation[list] | relation [list] | |
research_dataset.rights_holder [list] | actors [list] | See Actors-section for information on specifying actor roles in V3. |
research_dataset.spatial [list] | spatial [list] | |
research_dataset.temporal [list] | temporal [list] | |
research_dataset.theme [list] | theme [list] | |
research_dataset.title [dict] | title [dict] | |
research_dataset.total_files_byte_size [int] | fileset.total_files_size [int] | See Dataset Files for more information on fileset . |
user_created | not used in V3 | This information can now be found in metadata_owner.user . |
Complex fields#
Reference data fields#
In dataset, reference data fields such as location and organization have been unified structurally. Now all of these share the same base fields and payload format.
V1-V2 | V3 field name |
---|---|
identifier [url] | url [url] |
title or pref_label [dict] | pref_label [dict] |
Only the url
needs to be provided for objects from reference data. The other related values (e.g. pref_label
, scheme
) are filled in from reference data.
Reference data JSON differences between V2 and V3
You can add most reference data using only the url field:
Access Rights#
Dataset access rights is now top level object in dataset. Following table shows differences in object structure.
V1-V2 | V3 field name |
---|---|
access_type/identifier [url] | access_type/url [url] |
license/identifier [url] | license/url [url] |
license/license [url] | license/custom_url [url] |
access_url [url] | Not used in V3 |
Access rights JSON differences between V2 and V3
{
"access_rights": {
"id": "ed1a5847-2db4-4a45-9505-916bd8e62407",
"license": [
{
"custom_url": "https://creativecommons.org/licenses/by/4.0/",
"id": "706e9fd7-4299-4c4a-bc0c-a4438b2a33b5",
"url": "http://uri.suomi.fi/codelist/fairdata/license/code/CC-BY-4.0",
"in_scheme": "http://uri.suomi.fi/codelist/fairdata/license",
"pref_label": {
"en": "Creative Commons Attribution 4.0 International (CC BY 4.0)",
"fi": "Creative Commons Nimeä 4.0 Kansainvälinen (CC BY 4.0)"
}
}
],
"access_type": {
"id": "36e31df0-04cd-4178-ad56-81fc5f74ac31",
"url": "http://uri.suomi.fi/codelist/fairdata/access_type/code/open",
"in_scheme": "http://uri.suomi.fi/codelist/fairdata/access_type",
"pref_label": {
"fi": "Avoin"
}
}
}
}
{
"access_rights": {
"license": [
{
"title": {
"fi": "Creative Commons Nimeä 4.0 Kansainvälinen (CC BY 4.0)"
},
"license": "https://creativecommons.org/licenses/by/4.0/",
"identifier": "http://uri.suomi.fi/codelist/fairdata/license/code/CC-BY-4.0"
}
],
"access_type": {
"in_scheme": "http://uri.suomi.fi/codelist/fairdata/access_type",
"identifier": "http://uri.suomi.fi/codelist/fairdata/access_type/code/open",
"pref_label": {
"fi": "Avoin"
}
}
}
}
Access rights object is composed of two main sub-objects:
- license
- access_type
Info
License object needs either url or custom_url filled. Url must be part of suomi.fi license collection. Custom url is reserved for licenses not part of the collection.
You don't have to submit the entire object again if you want to edit it. This is valid PATCH request body:
Example
PATCH /v3/datasets/{id}
Actors#
All actors are listed in actors field. Dataset related actors with roles such as creator, publisher, curator, rights_holder and contributor have been moved under actor object in roles field. Roles field is a list of roles eg. ["creator", "publisher"].
Instead of having a typed actor object like @type: "Person"
or @type: "Organization
, actors have person
and organization
fields. The fields for person
are:
V1-V2 person actor field | V3 actor field | Notes |
---|---|---|
@type [str] | Not used in V3 | replaced by person field |
email [str] | person.email [str] | |
identifier [str] | person.external_identifier [str] | |
member_of [object] | organization [object] | |
name [str] | person.name [str] | |
contributor_type [object] | Not used in V3 | |
contributor_role [object] | Not used in V3 | |
telephone [str] | Not used in V3 | |
Not in V2 | roles [list] | List of roles of an actor. eg. creator, publisher |
Organizations have some fields specific only to reference data organizations: url
and in_scheme
.
Only url
is writable for reference data organizations, other values are determined automatically.
The fields for organization
are:
V1-V2 organization actor field | V3 actor field | Notes |
---|---|---|
@type [str] | Not used in V3 | same as V2 organization, if person field is null. |
name [str] | organization.pref_label [str] | |
is_part_of [object] | organization.parent [object] | |
identifier [str] | organization.external_identifier [str] | |
identifier [str] | organization.url [url] (only reference data) | |
Not in V2 | organization.in_scheme [url] (only reference data) | |
email [str] | organization.email [str] | |
contributor_type [object] | Not used in V3 | |
telephone [str] | Not used in V3 | |
Not in V2 | roles [list] | List of roles of an actor. eg. creator, publisher |
Actors JSON differences between V2 and V3
{
"actors": [
{
"roles": [
"creator"
],
"organization": {
"url": "http://uri.suomi.fi/codelist/fairdata/organization/code/09206320",
"in_scheme": "http://uri.suomi.fi/codelist/fairdata/organization",
"pref_label": {
"en": "CSC – IT Center for Science",
"fi": "CSC - Tieteen tietotekniikan keskus Oy",
"sv": "CSC – IT Center for Science",
"und": "CSC - Tieteen tietotekniikan keskus Oy"
}
},
"person": {
"name": "John Doe"
}
},
{
"roles": [
"publisher"
],
"organization": {
"url": "http://uri.suomi.fi/codelist/fairdata/organization/code/09206320",
"in_scheme": "http://uri.suomi.fi/codelist/fairdata/organization",
"pref_label": {
"en": "CSC – IT Center for Science",
"fi": "CSC - Tieteen tietotekniikan keskus Oy",
"sv": "CSC – IT Center for Science",
"und": "CSC - Tieteen tietotekniikan keskus Oy"
}
},
"person": {
"name": "Jane Doe"
}
}
]
}
{
"creator": [
{
"name": "John Doe",
"@type": "Person",
"member_of": {
"name": {
"en": "CSC – IT Center for Science",
"fi": "CSC - Tieteen tietotekniikan keskus Oy",
"sv": "CSC – IT Center for Science",
"und": "CSC - Tieteen tietotekniikan keskus Oy"
},
"@type": "Organization",
"identifier": "http://uri.suomi.fi/codelist/fairdata/organization/code/09206320"
}
}
],
"publisher": {
"name": "Jane Doe",
"@type": "Person",
"member_of": {
"name": {
"en": "CSC – IT Center for Science",
"fi": "CSC - Tieteen tietotekniikan keskus Oy",
"sv": "CSC – IT Center for Science",
"und": "CSC - Tieteen tietotekniikan keskus Oy"
},
"@type": "Organization",
"identifier": "http://uri.suomi.fi/codelist/fairdata/organization/code/09206320"
}
}
}
Spatial coverage#
In Metax V2 as_wkt
was filled in from reference data if it was empty.
In V3, reference data geometry is in reference.as_wkt
string and user-provided
geometry in custom_wkt
list.
V1-V2 field | V3 field | Notes |
---|---|---|
alt [str] | altitude_in_meters [float] | |
as_wkt [list] | custom_wkt [list] | user defined wkt |
place_uri [object] | reference [object] | |
place_uri.identifier [object] | reference.url [object] | |
as_wkt [list] | reference.as_wkt [str] | reference defined wkt |
Temporal coverage#
Temporal coverage objects now use dates instead of datetime values.
V1-V2 field | V3 field |
---|---|
start_date [datetime] | start_date [date] |
end_date [datetime] | end_date [date] |
temporal_coverage [str] |
Entity relations#
V1-V2 field | V3 field |
---|---|
entity.identifier [dict] | entity.entity_identifier [str] |
Provenance#
Biggest change in provenance field is that it is its own object in database. Provenance fields are also their own objects and as such have their own id fields when created, as does provenance itself.
Provenance JSON differences between V2 and V3
{
"provenance": [
{
"title": {
"fi": "otsikko"
},
"description": {
"fi": "kuvaus"
},
"spatial": {
"full_address": "Annankatu 5",
"geographic_name": "Random test location",
"altitude_in_meters": 5,
"reference": {
"url": "http://example.fi/spatial/1234"
},
"as_wkt": "POINT(27.74209 64.23567)",
"custom_wkt": ["POINT(27.73833 64.23114)"]
},
"lifecycle_event": {
"url": "http://example.fi/lifecycle_event/code/planned"
},
"event_outcome": {
"url": "http://example.fi/event_outcome/code/success"
},
"outcome_description": {
"en": "Descriptive outcome"
},
"temporal": {
"end_date": "2023-06-20T00:00:00.000Z",
"start_date": "2023-06-19T00:00:00.000Z"
},
"is_associated_with": [
{
"organization": {
"url": "http://example.fi/url-1234"
},
"person": {
"name": "Olemassaolematon Henkilö"
}
}
]
}
]
}
{
"provenance": [
{
"title": {
"fi": "otsikko"
},
"description": {
"fi": "kuvaus"
},
"event_outcome": {
"identifier": "http://example.fi/event_outcome/code/success",
"in_scheme": "http://example.fi/event_outcome",
"pref_label": {
"en": "Success",
"fi": "Onnistunut",
"sv": "Framgångsrik",
"und": "Onnistunut"
}
},
"lifecycle_event": {
"identifier": "http://example.fi/lifecycle_event/code/planned",
"in_scheme": "http://example.fi/lifecycle_event",
"pref_label": {
"en": "Planned",
"fi": "Suunniteltu",
"und": "Planned"
}
},
"outcome_description": {
"en": "Descriptive outcome"
},
"spatial": {
"full_address": "Annankatu 5",
"geographic_name": "Random test location",
"altitude_in_meters": "5",
"place_uri": {
"identifier": "http://example.fi/spatial/1234"
},
"as_wkt": ["POINT(27.74209 64.23567)", "POINT(27.73833 64.23114)"]
},
"temporal": {
"end_date": "2023-06-20T00:00:00.000Z",
"start_date": "2023-06-19T00:00:00.000Z"
},
"was_associated_with": [
{
"@type": "Person",
"member_of": {
"@type": "Organization",
"identifier": "http://example.fi/url-1234",
"name": {
"en": "example organization",
"fi": "esimerkkiorganisaatio",
"sv": "exampelorganisation",
"und": "example organization"
}
},
"name": "Olemassaolematon Henkilö"
}
]
}
]
}
Remote resources#
Remote resources have gained support for title and description in multiple languages. Some other fields have been removed or simplified:
V1-V2 field | V3 field | Notes |
---|---|---|
access_url [object] | access_url [url] | |
checksum [object] | checksum [algorithm:value] | e.g. "sha256:f00f" |
description [dict] | description [dict] | |
download_url [object] | download_url [url] | |
title [dict] | title [dict] | |
identifier [str] | Not used in V3 | |
modified [date] | Not used in V3 | |
byte_size [int] | Not used in V3 | |
license [list] | Not used in V3 | |
resource_type [object] | Not used in V3 | |
has_object_characteristics [object] | Not used in V3 |
Query parameters#
V3 query parameters follow the hierarchy structure of the object schema. Consider the following V3 dataset:
Example
POST /v3/datasets
{
"data_catalog": "urn:data-catalog-example",
"title": {
"fi": "V3 Esimerkkiaineisto",
"en": "A V3 Test Dataset"
},
"persistent_identifier": "test-identifier:1234567890",
"actors": [
{
"roles": ["creator", "publisher"],
"person": {
"name": "Teppo Testaaja"
},
"organization": {
"url": "http://uri.suomi.fi/codelist/fairdata/organization/code/09206320"
}
}
],
"description": {
"fi": "V3 esimerkkiaineiston kuvaus.",
"en": "Description of the V3 test dataset."
},
"state": "published"
}
In this example, you could find this dataset using the query parameter title=A V3 Test Dataset
. Or, if you would like to find this dataset with person name, you would use query parameter actors__person__name=Teppo+Testaaja
, as the field actors
is a list of objects that include a person
object with the field name
.
Changes in query parameters#
These are the changed query parameters used in datasets listing (/v3/datasets
). Complete list of query parameters can be found in the Swagger documentation.
V1-V2 parameter name | V3 parameter name | Notes |
---|---|---|
actor_filter | actors__role | |
actor_filter | actors__person__name | |
actor_filter | actors__organization__pref_label | |
api_version | not used in V3 | |
contract_org_identifier | not used in V3 | |
curator | not used in V3 | See actor__filter changes. |
data_catalog | data_catalog__id | |
data_catalog | data_catalog__title | |
editor_permissions_user | only_owned_or_shared | All related query parameters not implemented yet |
fields | Not implemented yet | |
include_legacy | not used in V3 | |
include_user_metadata | not used in V3 | File and directory specific metadata is included in files and directories endpoint response. See Dataset-specific file metadata. |
latest | latest_versions | |
metadata_owner_org | metadata_owner__organization | |
metadata_provider_user | metadata_owner__user | |
N/A | access_rights__access_type__pref_label | |
N/A | actors__roles_creator | |
N/A | deprecated | |
N/A | expand_catalog | Include full data catalog in dataset response instead of just an identifier. |
N/A | field_of_science__pref_label | |
N/A | file_type | |
N/A | format | Format of response, json or api (HTML). |
N/A | has_files | |
N/A | include_nulls | Include also null values in the response. |
N/A | include_removed | Include also removed datasets in response. |
N/A | infrastructure__pref_label | |
N/A | keyword | |
N/A | pagination | Use pagination true/false. |
N/A | preservation__contract | |
N/A | projects__title | |
N/A | search | Free text search from PID, title, theme, actors, description, keywords, relation id's, and other identifiers. |
N/A | storage_services | Storage service(s) used for dataset files, separated by a comma. |
N/A | strict | Enable/disable errors on unknown query parameters/request values. |
N/A | title | |
owner_id | not used in V3 | |
pas_filter | not used in V3 | |
preferred_identifier | persistent_identifier | |
preservation_state | preservation__state | |
projects | csc_projects | |
research_dataset_fields | not used in V3 | Research dataset object not present in V3. |
user_created | metadata_owner__user |
Endpoints#
Note
You can check the supported http methods from the Swagger documentation, this table lists only the resource path changes.
V1-V2 endpoint | V3 endpoint |
---|---|
/datasets/identifiers |
|
/datasets/list |
/datasets |
/datasets/metadata_versions |
:no-entry: |
/datasets/unique_preferred_identifiers |
:no-entry: |
/datasets/{CRID}/editor_permissions/users |
RPC endpoints#
There are no longer separate /rpc/
endpoints.
They have been moved under /v3/
together with the former /rest/
style endpoints.
Info
Endpoints with flush functionality (hard delete) will accept flush parameter only in non-production environments.
V1-V2 endpoint | V3 endpoint |
---|---|
/rpc/datasets/get_minimal_dataset_template |
|
/rpc/datasets/set_preservation_identifier |
Not used in V3 |
/rpc/datasets/refresh_directory_content |
Not used in V3 |
/rpc/datasets/fix_deprecated |
Not used in V3 |
/rpc/datasets/flush_user_data |
DELETE /v3/users/<id> |
/rpc/files/delete_project |
DELETE /v3/files?csc_project={project} |
/rpc/files/flush_project |
DELETE /v3/files?csc_project={project}&flush=true |
/rpc/statistics/* |
Examples#
Finding reference data#
To find an entry for a reference data field, query parameter pref_label
can be used when accessing the reference data endpoint of the field.
In this example, "finnish" is searched from the languages-endpoint.
The result contains 4 different entries and the most suitable one can be included in a dataset using the url
of the entry.
See reference data API changes for a full list of reference data fields and their endpoints.
Finding Finnish language
GET /v3/reference-data/languages?pref_label=finnish
{
"count": 4,
"results": [
{
"url": "http://lexvo.org/id/iso639-3/fkv",
"in_scheme": "http://lexvo.org/id/",
"pref_label": {
"fi": "Ruijan murteet",
"en": "Kven Finnish",
"sv": "Kvänska"
}
},
{
"url": "http://lexvo.org/id/iso639-3/rmf",
"in_scheme": "http://lexvo.org/id/",
"pref_label": {
"en": "Kalo Finnish Romani",
"ru": "Финский кало",
"pms": "Lenga romani kalo finlandèisa"
}
},
{
"url": "http://lexvo.org/id/iso639-3/fse",
"in_scheme": "http://lexvo.org/id/",
"pref_label": {
"fi": "Suomalainen viittomakieli",
"en": "Finnish Sign Language"
}
},
{
"url": "http://lexvo.org/id/iso639-3/fin",
"in_scheme": "http://lexvo.org/id/",
"pref_label": {
"fi": "suomi",
"en": "Finnish",
"sv": "finska"
}
}
]
}
Creating a dataset#
This example compares the payloads of V3 and V2 dataset creation.
Creating a dataset JSON payload
POST /v3/datasets
{
"data_catalog": "urn:data-catalog-example",
"title": {
"fi": "V3 Esimerkkiaineisto",
"en": "A V3 Test Dataset"
},
"persistent_identifier": "test-identifier:1234567890",
"actors": [
{
"roles": ["creator", "publisher"],
"person": {
"name": "Teppo Testaaja"
},
"organization": {
"url": "http://uri.suomi.fi/codelist/fairdata/organization/code/09206320"
}
}
],
"theme": [
{
"url": "http://www.yso.fi/onto/koko/p36817"
}
],
"language": [
{
"url": "http://lexvo.org/id/iso639-3/fin"
}
],
"description": {
"fi": "V3 esimerkkiaineiston kuvaus.",
"en": "Description of the V3 test dataset."
},
"access_rights": {
"license": [
{
"url": "http://uri.suomi.fi/codelist/fairdata/license/code/CC-BY-4.0"
}
],
"access_type": {
"url": "http://uri.suomi.fi/codelist/fairdata/access_type/code/open"
},
"description": {
"fi": "Saatavuustietojen kuvaus.",
"en": "Description of the access rights."
}
},
"field_of_science": [
{
"url": "http://www.yso.fi/onto/okm-tieteenala/ta112"
}
],
"state": "published"
}
POST /rest/v2/datasets
{
"data_catalog": {
"identifier": "urn:data-catalog-test"
},
"research_dataset": {
"title": {
"fi": "V2 Esimerkkiaineisto",
"en": "A V2 Test Dataset"
},
"preferred_identifier": "test-identifier:1234567890",
"creator": [
{
"name": "Teppo Testaaja",
"@type": "Person",
"member_of": {
"@type": "Organization",
"identifier": "http://uri.suomi.fi/codelist/fairdata/organization/code/09206320"
}
}
],
"publisher": {
"name": "Teppo Testaaja",
"@type": "Person",
"member_of": {
"@type": "Organization",
"identifier": "http://uri.suomi.fi/codelist/fairdata/organization/code/09206320"
}
},
"theme": [
{
"identifier": "http://www.yso.fi/onto/koko/p36817"
}
],
"language": [
{
"identifier": "http://lexvo.org/id/iso639-3/fin"
}
],
"description": {
"fi": "V2 esimerkkiaineiston kuvaus.",
"en": "Description of the V2 test dataset."
},
"access_rights": {
"license": [
{
"identifier": "http://uri.suomi.fi/codelist/fairdata/license/code/CC-BY-4.0"
}
],
"access_type": {
"identifier": "http://uri.suomi.fi/codelist/fairdata/access_type/code/open"
}
},
"field_of_science": [
{
"identifier": "http://www.yso.fi/onto/okm-tieteenala/ta112"
}
]
},
"state": "published",
"metadata_provider_org": "csc.fi",
"metadata_provider_user": "teppo"
}
Modifying a dataset#
A dataset can be modified using either a PATCH or a PUT request. With a PATCH request, only the changed fields need to be included in the request body. Unchanged fields can be omitted. Fields in the target dataset will be replaced with the values found in the request. To clear a field, set it to null.
Note that the whole field will be replaced, so lists, for example, need to contain all values (changed and unchanged) that are to be found in the final result.
This is valid PATCH request for the above V3 dataset:
Example
PATCH /v3/datasets/{id}
GET /v3/datasets/{id}
{
"access_rights": {
"description": {
"en": "Description of the access rights.",
"fi": "Saatavuustietojen kuvaus."
},
"license": [
{
"url": "http://uri.suomi.fi/codelist/fairdata/license/code/CC-BY-4.0",
"in_scheme": "http://uri.suomi.fi/codelist/fairdata/license",
"pref_label": {
"en": "Creative Commons Attribution 4.0 International (CC BY 4.0)",
"fi": "Creative Commons Nimeä 4.0 Kansainvälinen (CC BY 4.0)"
}
}
],
"access_type": {
"url": "http://uri.suomi.fi/codelist/fairdata/access_type/code/open",
"in_scheme": "http://uri.suomi.fi/codelist/fairdata/access_type",
"pref_label": {
"en": "Open",
"fi": "Avoin"
}
},
"restriction_grounds": []
},
"actors": [
{
"roles": ["creator", "publisher"],
"person": {
"name": "Teppo Testaaja"
},
"organization": {
"pref_label": {
"en": "CSC – IT Center for Science",
"fi": "CSC - Tieteen tietotekniikan keskus Oy",
"sv": "CSC – IT Center for Science",
"und": "CSC - Tieteen tietotekniikan keskus Oy"
},
"url": "http://uri.suomi.fi/codelist/fairdata/organization/code/09206320",
"in_scheme": "http://uri.suomi.fi/codelist/fairdata/organization"
}
}
],
"cumulative_state": 0,
"data_catalog": "urn:data-catalog-example",
"description": {
"fi": "Uusi kuvaus",
"en": "New description"
},
"field_of_science": [
{
"url": "http://www.yso.fi/onto/okm-tieteenala/ta112",
"in_scheme": "http://www.yso.fi/onto/okm-tieteenala/conceptscheme",
"pref_label": {
"en": "Statistics and probability",
"fi": "Tilastotiede",
"sv": "Statistik"
}
}
],
"infrastructure": [],
"issued": "2024-04-03",
"keyword": [],
"language": [
{
"url": "http://lexvo.org/id/iso639-3/fin",
"in_scheme": "http://lexvo.org/id/",
"pref_label": {
"fi": "suomi",
"en": "Finnish",
"sv": "finska"
}
},
{
"url": "http://lexvo.org/id/iso639-3/eng",
"in_scheme": "http://lexvo.org/id/",
"pref_label": {
"fi": "englanti",
"en": "English",
"sv": "engelska"
}
}
],
"metadata_owner": {
"user": "teppo",
"organization": "csc.fi"
},
"other_identifiers": [],
"persistent_identifier": "test-identifier:12345678990",
"projects": [],
"provenance": [],
"relation": [],
"remote_resources": [],
"spatial": [],
"state": "published",
"temporal": [],
"theme": [
{
"url": "http://www.yso.fi/onto/koko/p36817",
"in_scheme": "http://www.yso.fi/onto/koko/",
"pref_label": {
"en": "testing",
"fi": "testaus",
"se": "testen",
"sv": "testning"
}
}
],
"title": {
"fi": "Uusi otsikko",
"en": "New title"
},
"created": "2024-04-03T14:13:57Z",
"modified": "2024-04-03T14:13:57Z",
"dataset_versions": [
{
"title": {
"en": "New title",
"fi": "Uusi otsikko"
},
"persistent_identifier": "test-identifier:12345678990",
"state": "published",
"created": "2024-04-03T14:13:57Z",
"version": 1
}
],
"published_revision": 2,
"draft_revision": 0,
"version": 1
}
A dataset can also be modified using a PUT request, that replaces the whole target dataset. When using a PUT request, both changed and unchanged fields need to be present in the request body.
Minimal valid dataset template#
{
"data_catalog": "urn:nbn:fi:att:data-catalog-harvested-test",
"persistent_identifier": "12345",
"title": {
"fi": "Esimerkkiaineisto",
"en": "Example Dataset"
},
"description": {
"fi": "Esimerkkiaineiston kuvaus.",
"en": "Description of the example dataset."
},
"access_rights": {
"access_type": {
"url": "http://uri.suomi.fi/codelist/fairdata/access_type/code/open"
},
"license": [
{
"url": "http://uri.suomi.fi/codelist/fairdata/license/code/CC-BY-4.0"
}
]
},
"actors": [
{
"roles": ["creator", "publisher"],
"person": {
"name": "Teppo Testaaja"
}
}
],
"state": "published"
}
{
"data_catalog": "urn:nbn:fi:att:data-catalog-test",
"generate_pid_on_publish": "URN",
"title": {
"fi": "Esimerkkiaineisto",
"en": "Example Dataset"
},
"description": {
"fi": "Esimerkkiaineiston kuvaus.",
"en": "Description of the example dataset."
},
"access_rights": {
"access_type": {
"url": "http://uri.suomi.fi/codelist/fairdata/access_type/code/open"
},
"license": [
{
"url": "http://uri.suomi.fi/codelist/fairdata/license/code/CC-BY-4.0"
}
]
},
"actors": [
{
"roles": ["creator", "publisher"],
"person": {
"name": "Teppo Testaaja"
}
}
],
"state": "published"
}
Reference data API changes#
Reference data has been moved from ElasticSearch to the Metax database.
Searching for an entry with a specific label can be done with the pref_label
query parameter e.g. ?pref_label=somelabel
.
V1-V2 endpoint | V3 endpoint |
---|---|
/es/reference_data/access_type/_search | /v3/reference-data/access-types |
/es/reference_data/contributor_role/_search | /v3/reference-data/contributor-roles |
/es/reference_data/contributor_type/_search | /v3/reference-data/contributor-types |
/es/reference_data/event_outcome/_search | /v3/reference-data/event-outcomes |
/es/reference_data/field_of_science/_search | /v3/reference-data/fields-of-science |
/es/reference_data/file_format_version/_search | /v3/reference-data/file-format-versions |
/es/reference_data/file_type/_search | /v3/reference-data/file-types |
/es/reference_data/funder_type/_search | /v3/reference-data/funder-types |
/es/reference_data/identifier_type/_search | /v3/reference-data/identifier-types |
/es/reference_data/keyword/_search | /v3/reference-data/themes |
/es/reference_data/language/_search | /v3/reference-data/languages |
/es/reference_data/license/_search | /v3/reference-data/licenses |
/es/reference_data/lifecycle_event/_search | /v3/reference-data/lifecycle-events |
/es/reference_data/location/_search | /v3/reference-data/locations |
/es/reference_data/mime_type/_search | |
/es/reference_data/preservation_event/_search | /v3/reference-data/preservation-events |
/es/reference_data/relation_type/_search | /v3/reference-data/relation-types |
/es/reference_data/research_infra/_search | /v3/reference-data/research-infras |
/es/reference_data/resource_type/_search | /v3/reference-data/resource-types |
/es/reference_data/restriction_grounds/_search | /v3/reference-data/restriction-grounds |
/es/reference_data/use_category/_search | /v3/reference-data/use-categories |
/es/organization_data/organization/_search | /v3/organizations |
Reference data field changes#
In earlier Metax versions reference data fields were name differently in ElasticSearch and as part of a dataset. In V3 the fields have the same names as part of a dataset and in the reference data endpoints.
V1-V2 Dataset | V1-V2 ElasticSearch | V3 field name |
---|---|---|
identifier [url] | uri [url] | url [url] |
in_scheme [url] | scheme [url] | in_scheme [url] |
pref_label [dict] | label [dict] | pref_label [dict] |
id [str] | ||
type [str] | ||
code [str] | ||
internal_code | ||
parent_ids | broader [obj] | |
child_ids [list] | children [list] | |
has_children [bool] | ||
same_as [list] | same_as [list] | |
name* [dict] | label [dict] | pref_label [dict] |
is_part_of* [obj] | parent_id [str] | parent [obj] |
as_wkt** [list] | wkt [str] | as_wkt [str] |
file_format*** | input_file_format | file_format |
format_version*** | output_format_version | format_version |
* Organization data.
** Location data.
*** File format version data.
Files API changes#
Files#
File identifier in external storage service has been renamed from identifier
to storage_identifier
.
The storage_identifier
value is only unique per storage service and the same value
may exist in multiple services.
Field names#
Field | V1/V2 | V3 |
---|---|---|
File id | id [int] | id [uuid] |
File id in external service | identifier [str] | storage_identifier [str] |
External service | file_storage [str] e.g. urn:nbn:fi:att:file-storage-ida |
storage_service [str] e.g. ida |
Project identifier | project_identifier [str] | csc_project [str] |
Modification date in external service | file_modified [datetime] | modified [datetime] |
Freeze date in external service | file_frozen [datetime] | frozen [datetime] |
File removal date from Metax | file_removed | removed [datetime] |
Deletion date | file_deleted [datetime] | :no-entry: |
Upload date in external service | file_uploaded [datetime] | :no-entry: |
File extension | file_format [str] | :no-entry: |
File characteristics | file_characteristics [object] | characteristics |
File characteristics file format | file_characteristics.file_format [object] | characteristics.file_format_version.file_format |
File characteristics format version | file_characteristics.format_version [object] | characteristics.file_format_version.format_version |
File characteristics extension | file_characteristics_extension [object] | characteristics_extension |
Parent directory | parent_directory [obj] | :no-entry: |
Full file path | file_path [str] | pathname [str] |
File name | file_name [str] | filename, determined from pathname [str] |
Open access | open_access [bool] | :no-entry: |
PAS compatible | pas_compatible [bool] | is_pas_compatible [bool] |
File size in bytes | byte_size [int] | size [int] |
Checksum algorithm | checksum_algorithm | checksum [algorithm:value], e.g. "sha256:f00f" |
Checksum value | checksum_value | merged with checksum_algorithm |
Checksum check date | checksum_checked | :no-entry: |
Directories#
Directories no longer exist as persistent database objects. They are instead generated dynamically
based on filtered file results when browsing the /v3/directories
endpoint.
When browsing directories and the query parameter dataset=<id>
is set, the directory file_count
and size
values correspond to total count and size of directory files belonging to the dataset.
When exclude_dataset=true
is also set, the returned counts are for directory
files not belonging to the dataset.
See Directory object fields for available directory fields.
File storages#
In V1/V2, a file storage is an object reprenting an external service where files are stored.
In V3, file storages represent a collection of files in an external service. For example,
each IDA project has its own file storage object, identified by
{"storage_service": "ida", "csc_project": <project> }
.
File storages are created automatically when files are added and are not exposed directly through the API.
See Storage services for supported storage services.
File endpoints changes#
In v3, automatic identifier type detection (internal id
or external storage_identifier
) in endpoint
paths has been removed. The <id>
in a V3 file endpoint path always refers to the internal id
.
To operate on an existing file using storage_identifier
instead of id
, bulk file endpoints can be used.
Bulk file operations now have their own endpoints:
put-many
, post-many
, patch-many
, delete-many
.
The bulk endpoints support omitting the Metax file id
if
the storage service and file identifier in the storage are specified:
{"storage_identifier": <external id>, "storage_service": <service>}
.
The put-many
endpoint will clear any existing file fields that are not specified in the request.
Directories no longer have an identifier, so the /rest/directories/<id>
endpoints
have been removed. To get details for a directory,
/v3/directories?storage_service=<service>&csc_project=<project>&path=<path>
contains the directory details for <path>
in the parent_directory
object.
Many of the parameters for /v3/files
and /v3/directories
have been renamed or have other changes.
For a full list of supported parameters, see the Swagger documentation.
Examples#
Here are some of the common files API requests and how they map to Metax V3:
Action | V1/V2 | V3 |
---|---|---|
List files | GET /rest/v1/files |
GET /v3/files |
List removed files | GET /rest/v1/files?removed=true |
GET /v3/files?include_removed=true (includes non-removed files) |
Get file (using Metax id) | GET /rest/v1/files/<id> |
GET /v3/files/<id> |
Get removed file (using Metax id) | GET /rest/v1/files/<id>?removed=true (includes non-removed) |
GET /v3/files/<id>?include_removed=true (includes non-removed files) |
Get file (using external id) | GET /rest/v1/files/<id> |
GET /v3/files?file_storage=*&storage_identifier=<id>&pagination=false returns list |
Create file | POST /rest/v1/files |
POST /v3/files |
Create files (array) | POST /rest/v1/files |
POST /v3/files/post-many |
Update files (array) | PATCH /rest/v1/files |
POST /v3/files/put-many |
Update or create files (array) | n/a | POST /v3/files/patch-many |
Delete files | DELETE /rest/v1/files (array of ids) |
POST /v3/files/delete-many (array of file objects) |
Restore files (array) | POST /rest/v1/files/restore |
not implemented yet |
File datasets (using Metax id) | POST /rest/v1/files/datasets |
POST /v3/files/datasets |
File datasets (using external id) | POST /rest/v1/files/datasets |
POST /v3/files/datasets?storage_service=<service> |
List directory contents by path | GET /rest/v1/directories/files?csc_project=<project>&path=<path> |
GET /v3/directories?storage_service=<service>&csc_project=<project>&path=<path> |
Dataset files#
In Metax V3 datasets provide a summary of contained files in the fileset
object:
Example
Updating dataset files is performed by specifying directory_actions
or file_actions
in fileset
object when
updating dataset.
See Datasets API for details.
There are endpoints for browsing files either as a flat list or as a directory tree:
- To get list of dataset files, use
GET /v3/datasets/<id>/files
. - To browse dataset directory tree, use
GET /v3/datasets/<id>/directories
.
The dataset file and directory endpoints support the same parameters as corresponding
/v3/files
and /v3/directories
endpoints and use pagination by default.
This is a change from Metax V2 which does not support pagination in the dataset files endpoint.
Dataset-specific file metadata#
Dataset-specific directory and file metadata used to be
under directories
and files
objects in the dataset.
In Metax V3 the metadata is included in dataset_metadata
objects when browsing
fileset associated with a dataset:
- viewing
/v3/datasets/<id>/files
- viewing
/v3/datasets/<id>/directories
- viewing
/v3/files
withdataset=<id>
- viewing
/v3/directories
withdataset=<id>
Dataset-specific directory metadata is only visible when browsing directories.
-
- ↩
-
Is solved in authorization implementation ↩
-
Is solved in versioning implementation. Django-simple-versioning is used as implementation base. ↩
-
Is solved in the PublishingChannels implementation ↩
-
PAS will have its own data-catalog in V3 ↩
-
django-model-utils third-party library SoftDeletableModel provides is_removed field, it can be customized, but it is unclear how much to just use removed timestamp without the bool field. ↩