Files API#
The files API /v3/files supports creating file metadata objects that can then be associated with datasets. Write operations to the API are generally restricted to specific services. For example, freezing files in Fairdata IDA creates new Metax file metadata entries for the frozen files.
For end users, browsing files in Metax and associating them with a dataset requires extra permissions, such as membership in the IDA project that contains the file.
For more information about IDA and how to become a user, see https://www.fairdata.fi/en/ida/.
Concepts#
Files#
A Metax file object represents a file stored in a service. Files may be associated with multiple datasets, and files can have additional dataset-specific metadata.
Directories#
A directory is a collection of files and subdirectories. Directories are determined dynamically from file paths when using the directory browsing API. A directory path may be associated with dataset-specific metadata.
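To illustrate how a directory listing can be derived from file paths alone, here is a minimal sketch. The function name `list_directory` and the sample paths are hypothetical, and this is not Metax's implementation, only one way to model the idea.

```python
def list_directory(pathnames, directory):
    """Return (subdirectories, files) found directly under `directory`.

    `directory` must end with "/", e.g. "/data/".
    """
    subdirectories = set()
    files = []
    for pathname in pathnames:
        if not pathname.startswith(directory):
            continue
        rest = pathname[len(directory):]
        head, separator, _ = rest.partition("/")
        if separator:
            # The file lives deeper, so only its first path segment is listed.
            subdirectories.add(directory + head + "/")
        else:
            files.append(pathname)
    return sorted(subdirectories), sorted(files)


paths = [
    "/data/file1.txt",
    "/data/subdirectory/a.txt",
    "/data/subdirectory/deep/b.txt",
    "/readme.txt",
]
print(list_directory(paths, "/data/"))
```

No directory objects are stored in this model: a directory exists only because at least one file path passes through it.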
Storage services and file storages#
Each file is associated with a storage service, such as IDA,
identified by the storage_service field.
Storage services may have additional parameters that are used
for organizing files. For example, each file in IDA belongs
to a project, as specified by the csc_project field.
The combination of storage_service and its related parameters
defines a file storage in Metax. A dataset may only contain
files from a single file storage.
Below is a list of currently supported services.
| Service | storage_service value | csc_project required |
|---|---|---|
| Fairdata IDA | ida | yes |
| Fairdata PAS | pas | yes |
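A client could check the parameters of a file storage before sending a request. The sketch below encodes the table above. The helper name `file_storage_params` is hypothetical and the validation is purely client-side.

```python
# Storage services from the table above; both currently require csc_project.
REQUIRES_CSC_PROJECT = {"ida": True, "pas": True}


def file_storage_params(storage_service, csc_project=None):
    """Return the parameters that identify a file storage (hypothetical helper)."""
    if storage_service not in REQUIRES_CSC_PROJECT:
        raise ValueError(f"unsupported storage_service: {storage_service!r}")
    if REQUIRES_CSC_PROJECT[storage_service] and not csc_project:
        raise ValueError(f"csc_project is required for {storage_service!r}")
    params = {"storage_service": storage_service}
    if csc_project:
        params["csc_project"] = csc_project
    return params


print(file_storage_params("ida", "project"))
```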
Browsing files in Metax#
Authenticated users can access files with the /v3/files endpoint. There is also a separate read-only endpoint /v3/directories that allows browsing files of a file storage in the format of a directory hierarchy.
For example, to browse frozen IDA files:
- GET /v3/files?storage_service=ida&csc_project=<project>
  List of all files in IDA project with pagination.
- GET /v3/files?storage_service=ida&csc_project=<project>&pagination=false
  List all files in IDA project without pagination. Not recommended for large projects.
- GET /v3/files?file_storage=ida&storage_identifier=<id>&pagination=false
  Returns IDA file with specified storage_identifier in a list.
- GET /v3/directories?storage_service=ida&csc_project=<project>
  View root directory contents of an IDA project.
- GET /v3/directories?storage_service=ida&csc_project=<project>&path=/dir/subdir/
  View contents of /dir/subdir/ of an IDA project.
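The request paths above can be assembled programmatically. This is a minimal sketch using only the Python standard library. The helper name `browse_url` and the project name `project` are placeholders, and the host part of the URL is left out.

```python
from urllib.parse import urlencode


def browse_url(endpoint, **params):
    """Build a request path with a query string (hypothetical helper)."""
    # Render booleans as lowercase "true"/"false", as in pagination=false.
    normalized = {
        key: str(value).lower() if isinstance(value, bool) else value
        for key, value in params.items()
    }
    # Keep "/" unescaped so directory paths stay readable.
    query = urlencode(normalized, safe="/")
    return f"{endpoint}?{query}"


print(browse_url("/v3/files", storage_service="ida", csc_project="project",
                 pagination=False))
print(browse_url("/v3/directories", storage_service="ida",
                 csc_project="project", path="/dir/subdir/"))
```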
For examples on browsing dataset files or directories, see Datasets API.
Directory API response format#
The responses from the /v3/directories endpoint include
- parent_directory object containing current directory
- directories list of subdirectories
- files list of files
When pagination is enabled, the data is in the results object in the response.
This differs from the usual pagination where results is a list.
Pagination counts subdirectories and files together, with directories first.
Pagination is enabled by default.
The file_count and size values for a directory cover all files
in the directory, including files in its subdirectories.
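The recursive meaning of file_count and size can be modelled with a short sketch. The function name `directory_totals` and the sample file list are illustrative assumptions, not Metax code.

```python
def directory_totals(files, directory):
    """Count files and sum their sizes under `directory`, subdirectories included.

    `directory` must end with "/", e.g. "/data/".
    """
    inside = [f for f in files if f["pathname"].startswith(directory)]
    return {"file_count": len(inside), "size": sum(f["size"] for f in inside)}


sample_files = [
    {"pathname": "/data/file1.txt", "size": 1024},
    {"pathname": "/data/subdirectory/a.txt", "size": 0},
    {"pathname": "/data/subdirectory/b.txt", "size": 0},
    {"pathname": "/other/c.txt", "size": 7},
]
print(directory_totals(sample_files, "/data/"))
print(directory_totals(sample_files, "/data/subdirectory/"))
```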
Example directory response
This is an example of what the response for GET /v3/directories?storage_service=ida&csc_project=project&path=/data/ might look like.

{
  "count": 2,
  "next": null,
  "previous": null,
  "results": {
    "directory": {
      "storage_service": "ida",
      "csc_project": "project",
      "name": "data",
      "pathname": "/data/",
      "file_count": 5,
      "size": 1024,
      "created": "2022-11-12T12:34:00+02:00",
      "modified": "2022-11-13T14:34:00+02:00",
      "parent_url": "https://m3.fd-dev.csc.fi:8100/v3/directories?storage_service=ida&csc_project=project&path=/"
    },
    "directories": [
      {
        "storage_service": "ida",
        "csc_project": "project",
        "name": "subdirectory",
        "pathname": "/data/subdirectory/",
        "file_count": 4,
        "size": 0,
        "created": "2022-11-13T14:34:00+02:00",
        "modified": "2022-11-13T14:34:00+02:00",
        "url": "https://m3.fd-dev.csc.fi:8100/v3/directories?storage_service=ida&csc_project=project&path=/data/subdirectory/"
      }
    ],
    "files": [
      {
        "id": "e8524528-bfef-4731-8314-c5fe10ba3487",
        "storage_identifier": "file1-id",
        "pathname": "/data/file1.txt",
        "filename": "file1.txt",
        "size": 1024,
        "storage_service": "ida",
        "csc_project": "project",
        "checksum": "md5:bd0f1dff407071e5db8eb57dde4847a3",
        "frozen": "2022-11-12T13:20:00+02:00",
        "modified": "2022-11-12T12:34:00+02:00",
        "user": "fd_test_fairdata_user",
        "pas_process_running": false
      }
    ]
  }
}
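A client might walk such a response as sketched below. The key names follow the example response. Reading the same keys from the top level when pagination is disabled is an assumption, and `list_entries` is a hypothetical helper name.

```python
def list_entries(response):
    """Return (kind, pathname) pairs, directories first, from a directory response."""
    # With pagination the payload sits under "results"; falling back to the
    # top level for unpaginated responses is an assumption.
    data = response.get("results", response)
    entries = [("directory", d["pathname"]) for d in data.get("directories", [])]
    entries += [("file", f["pathname"]) for f in data.get("files", [])]
    return entries


sample_response = {
    "count": 2,
    "next": None,
    "previous": None,
    "results": {
        "directory": {"pathname": "/data/"},
        "directories": [{"pathname": "/data/subdirectory/"}],
        "files": [{"pathname": "/data/file1.txt"}],
    },
}
for kind, pathname in list_entries(sample_response):
    print(kind, pathname)
```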
Creating and updating files#
To create a single file, send a POST /v3/files request with a JSON file payload.
Example file payload
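As a hedged illustration, a payload could be assembled from the writable fields listed under File object fields. The values below are reused from the example directory response. Which fields are mandatory depends on the storage service, so this particular selection is an assumption.

```python
import json

# Read-only fields such as id and filename are left out on purpose.
file_payload = {
    "storage_service": "ida",
    "csc_project": "project",
    "storage_identifier": "file1-id",
    "pathname": "/data/file1.txt",
    "size": 1024,
    "checksum": "md5:bd0f1dff407071e5db8eb57dde4847a3",
    "frozen": "2022-11-12T13:20:00+02:00",
    "modified": "2022-11-12T12:34:00+02:00",
    "user": "fd_test_fairdata_user",
}

# JSON body for POST /v3/files.
body = json.dumps(file_payload, indent=2)
print(body)
```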
To update a file, call PATCH /v3/files/<id> where the payload includes the
field values you want to change. Use null to remove a field value.
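A PATCH body only needs the fields being changed. In Python, `None` serializes to JSON null, which is how a field value is removed. The field choices here are illustrative and `patch_body` is a hypothetical helper.

```python
import json


def patch_body(**changes):
    """Serialize only the changed fields; None becomes JSON null."""
    return json.dumps(changes)


# Change the size and clear the user field (illustrative field choices).
print(patch_body(size=2048, user=None))
```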
Bulk file creation and updating#
It may be more convenient to operate on multiple files in a single request. There are bulk endpoints that accept an array of file objects:
- /v3/files/put-many
  Create new files or replace existing files.
- /v3/files/post-many
  Create new files. Error if file already exists.
- /v3/files/patch-many
  Partially update existing files. Error if file does not exist.
- /v3/files/delete-many
  Deletes existing files. Error if file does not exist.
The bulk endpoints support omitting the Metax file id if
the storage service and file identifier in the storage are specified:
{"storage_identifier": <external id>, "storage_service": <service>}.
When replacing an existing file, put-many will attempt to clear
any existing file fields that are not specified in the request.
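The difference between replacing and partially updating can be modelled locally. This sketch is not Metax's implementation. It assumes that only the Metax id survives a replace, and the helper names are hypothetical.

```python
# Assumption: only the Metax id is carried over when a file is replaced.
PRESERVED_ON_REPLACE = ("id",)


def replace_file(existing, incoming):
    """Model put-many: fields missing from `incoming` are cleared."""
    kept = {key: existing[key] for key in PRESERVED_ON_REPLACE if key in existing}
    return {**kept, **incoming}


def patch_file(existing, incoming):
    """Model patch-many: fields missing from `incoming` are left untouched."""
    return {**existing, **incoming}


existing = {
    "id": "e8524528-bfef-4731-8314-c5fe10ba3487",
    "storage_service": "ida",
    "storage_identifier": "file1-id",
    "size": 1024,
    "user": "fd_test_fairdata_user",
}
# The Metax id is omitted; the file is addressed by service and identifier.
incoming = {"storage_service": "ida", "storage_identifier": "file1-id", "size": 2048}

print(replace_file(existing, incoming))
print(patch_file(existing, incoming))
```

In this model the replaced file loses its user value, while the patched file keeps it.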
The response of the bulk file operations is in the following shape:
{
  "success": [
    {
      "object": <created/updated file object>,
      "action": <action "insert", "update" or "delete">
    },
    ...
  ],
  "failed": [
    {
      "object": <failed input object>,
      "errors": <object describing errors>
    }
  ]
}
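A response of this shape could be summarized on the client side as follows. The function name `summarize_bulk` and the sample data are illustrative.

```python
from collections import Counter


def summarize_bulk(response):
    """Count successful actions and collect (object, errors) pairs for failures."""
    actions = Counter(item["action"] for item in response.get("success", []))
    failures = [(item["object"], item["errors"]) for item in response.get("failed", [])]
    return dict(actions), failures


sample = {
    "success": [
        {"object": {"storage_identifier": "file1-id"}, "action": "insert"},
        {"object": {"storage_identifier": "file2-id"}, "action": "update"},
        {"object": {"storage_identifier": "file3-id"}, "action": "insert"},
    ],
    "failed": [
        {"object": {"storage_identifier": "file4-id"}, "errors": {"size": "invalid"}},
    ],
}
actions, failures = summarize_bulk(sample)
print(actions)
print(failures)
```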
By default, any error in creating or updating a file causes the entire request to fail with a 400
status code, with the failed objects listed in the failed array. When the query parameter ignore_errors
is enabled, the response code will be
- 200 when everything was successful
- 207 when there were both successful and failed operations
- 400 when all operations failed.
Note
The 207 Multi-Status code is defined by the WebDAV extension to HTTP (RFC 4918)
rather than by the core HTTP specification. Metax does not implement WebDAV and
only uses the code to indicate that a request was partially successful.
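The status code rules for ignore_errors can be written as a small decision function. This is a sketch of the rules as described, with a hypothetical function name. It is useful, for example, when mocking the API in client tests.

```python
def status_with_ignore_errors(n_success, n_failed):
    """Expected status code of a bulk request when ignore_errors is enabled."""
    if n_failed == 0:
        return 200  # everything was successful
    if n_success > 0:
        return 207  # both successful and failed operations
    return 400  # all operations failed


print(status_with_ignore_errors(3, 0))
print(status_with_ignore_errors(2, 1))
print(status_with_ignore_errors(0, 3))
```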
Files API fields#
File object fields#
Bolded fields are required when creating a file.
| Field | key | value | read only |
|---|---|---|---|
| Metax identifier | id | uuid | x |
| Storage service | storage_service | str | |
| CSC project identifier | csc_project* | str | |
| File identifier in external service | storage_identifier* | str | |
| File path | pathname | str, e.g. /data/file.txt | |
| File name (determined from path) | filename | str, e.g. file.txt | x |
| Freeze date in external service | frozen | datetime | |
| When file was removed from service | removed | datetime (null if not removed) | x |
| Modification date in external service | modified | datetime | |
| File size in bytes | size | int | |
| Checksum | checksum | str, e.g. md5:ffa123f... | |
| Is PAS compatible | is_pas_compatible | bool or null | |
| Dataset-specific metadata | dataset_metadata** | object | x |
| User | user | str | |
| Is PAS process running | pas_process_running*** | bool | |
* Required depending on storage service.
** Only available when viewing files of a dataset.
*** Only a PAS user can update this field. Default: false.
Directory object fields#
| Field | key | value |
|---|---|---|
| Earliest file modification date | created | datetime |
| Most recent file modification date | modified | datetime |
| Storage service | storage_service | str |
| CSC project identifier | csc_project | str |
| Directory name | name | str, e.g. subdir |
| Directory path | pathname | str ending with /, e.g. /data/subdir/ |
| Dataset-specific metadata | dataset_metadata* | object |
| Total file count incl. subdirectories | file_count | int |
| Number of files in some public dataset | published_file_count** | int |
| Total byte size incl. subdirectories | size | int |
* Only available when viewing directories of a dataset.
** Only available when using count_published query parameter.