1 - INTRODUCTION
dsarch is the primary tool for archiving geoscience data onto CISL’s Geoscience Data Exchange (GDEX) Servers. It records metadata for every dataset — file names, data periods, file counts, and sizes — in the Geoscience Data Exchange Database (GDEXDB). Datasets can be placed under DOI/Version control by assigning a Digital Object Identifier (DOI) that links all web data files to a specific version for citation and long-term reference. Files can be organized into nested sub-products called groups. Any metadata stored in GDEXDB can also be retrieved or updated through dsarch.
A few terms used throughout this document:
GDEX | Geoscience Data Exchange, NCAR’s data publication platform
GDEXDB | the metadata database backing GDEX
Specialist | an authorized owner who manages a dataset
dNNNNNN | a dataset number; six digits prefixed with ‘d’ (e.g., d260001)
Group | a named sub-product within a dataset; groups can be nested
Saved file | an archived file kept for internal use or NCAR users
Web file | an archived file published on the GDEX Web Server
Help file | a documentation or software file (Document or Software)
Quasar | the Globus Quasar Server used for long-term tape backup
Before using dsarch, register the dataset in GDEXDB through Metadata Manager (https://gdex.ucar.edu/metaman/). Once registered, dsarch can perform the following major functions:
Set up a dataset and its groups/subgroups in GDEXDB to prepare it for data archiving
Add, update, and terminate DOI/Version controls for a given dataset
Archive data files by copying them from the working area to GDEX Servers
Call gatherxml to evaluate file content metadata for archived files
Cross-archive data files by duplicating them between Boreas and GDEX Data Disk Servers
Save file information for archived data into GDEXDB
Retrieve dataset, group, and file information stored in GDEXDB
Move data files from one dataset or group to another, or to a different location within the same dataset
Tar file lists from one or more datasets and back them up onto the Globus Quasar Server, with an optional disaster recovery copy
Restore damaged data files from the Quasar backup server
Remove files from GDEX Servers
Automatically resume interrupted dsarch commands using records in GDEXDB
Regenerate filelist and main webpages on demand after dataset changes
The dataset flag -UD (-UseDSARCH) must be set to at least ‘Y’ before dsarch can archive files or write to GDEXDB. This flag also controls file list publication: setting it to ‘P’ or ‘W’ publishes file lists for GDEX Server data files, making them visible in the dataset’s user interface on the GDEX Web Server. See the -SD (-SetDataset) action and the -UD (-UseDSARCH) option for details.
When a file is moved and its name or path changes, dsarch records the new name as the primary entry in GDEXDB and retains the original as a linked alias. Both names resolve to the same file for usage tracking. Always use dsarch — not system move commands — to move data files, so that GDEXDB stays accurate.
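The primary-name and alias relationship described above can be pictured with a small in-memory model. This is only an illustrative sketch: the class `FileCatalog` and its methods are hypothetical, GDEXDB's actual schema is not shown in this document, and the handling of repeated moves (every earlier name keeps resolving to the newest one) is an assumption.

```python
class FileCatalog:
    """Hypothetical stand-in for the file records kept in GDEXDB."""

    def __init__(self):
        self._primary = set()   # current (primary) file names
        self._aliases = {}      # former name -> current primary name

    def add(self, name):
        self._primary.add(name)

    def move(self, old, new):
        """Record a move: 'new' becomes primary, 'old' becomes an alias."""
        if old not in self._primary:
            raise KeyError(old)
        self._primary.remove(old)
        self._primary.add(new)
        # Assumption: aliases from earlier moves follow the file to its new name.
        for alias, target in self._aliases.items():
            if target == old:
                self._aliases[alias] = new
        self._aliases[old] = new
        # If the file moved back to a former name, that name is primary again.
        self._aliases.pop(new, None)

    def resolve(self, name):
        """Return the primary name for either a primary name or an alias."""
        if name in self._primary:
            return name
        return self._aliases[name]


catalog = FileCatalog()
catalog.add("d260001/a.nc")
catalog.move("d260001/a.nc", "d260001/2020/a.nc")
```

After the move, `catalog.resolve` returns `d260001/2020/a.nc` for both the old and the new name, which is the property that lets usage tracking treat them as one file. A system-level move would bypass this bookkeeping entirely.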
Use the companion utility ‘dsquasar’ to back up archived data files to the Globus Quasar Server. ‘dsquasar’ finds files not yet backed up, assembles input file lists (targeting 1-3 GB per list), and calls dsarch to package and upload them as tar files to the Quasar server.
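The way files might be grouped into input lists of bounded size can be sketched as a simple greedy pass. This is not dsquasar's actual algorithm, which is not described here; the function name `build_file_lists`, the 3 GB upper bound taken from the 1-3 GB target, and the use of binary gigabytes are all assumptions made for illustration.

```python
GB = 1024 ** 3  # assumption: binary gigabytes


def build_file_lists(files, max_bytes=3 * GB):
    """Group (name, size_in_bytes) pairs into lists of at most max_bytes each.

    A single file larger than max_bytes is placed in a list of its own.
    """
    lists = []
    current, total = [], 0
    for name, size in files:
        # Close the current list if adding this file would exceed the limit.
        if current and total + size > max_bytes:
            lists.append(current)
            current, total = [], 0
        current.append(name)
        total += size
    if current:
        lists.append(current)
    return lists


pending = [("a.nc", 2 * GB), ("b.nc", 2 * GB), ("c.nc", GB // 2), ("d.nc", 4 * GB)]
file_lists = build_file_lists(pending)
```

Here `file_lists` holds three lists: `a.nc` alone, `b.nc` with `c.nc`, and the oversized `d.nc` alone. Each list would then correspond to one tar file.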
dsarch includes safeguards to prevent accidental operations on the wrong dataset. Input files must be named starting with the dataset number in the format ‘dNNNNNN.*’, where ‘*’ matches one or more valid filename characters. dsarch also verifies that the specialist running it is an authorized owner of the dataset; if not, execution stops — unless Mode option -MD (-MyDataset) is supplied, which overrides the ownership check.
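The input file naming rule can be checked with a regular expression before a file is handed to dsarch. The sketch below is a stand-alone illustration, not dsarch's own check: the helper names are hypothetical, and the set of characters accepted after the dot is a conservative assumption, since the document does not define "valid filename characters".

```python
import re

# 'd' + six digits, a literal dot, then one or more characters.
# Assumption: letters, digits, '.', '_' and '-' are the valid characters.
INPUT_FILE_RE = re.compile(r"(d\d{6})\.[A-Za-z0-9._-]+")


def dataset_from_input_file(filename):
    """Return the dataset number prefix, or None if the name does not match."""
    match = INPUT_FILE_RE.fullmatch(filename)
    return match.group(1) if match else None


def input_file_matches(filename, dataset):
    """True if the input file name is prefixed with the given dataset number."""
    return dataset_from_input_file(filename) == dataset
```

For example, `input_file_matches("d260001.web.list", "d260001")` is true, while a name such as `d26001.list` (only five digits) or `d260001.` (nothing after the dot) is rejected.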
If an archive action (-AW, -AS, -AH, -AQ) fails due to a storage system outage, dsarch automatically retries it once the system recovers. Retry behavior is enabled only when option -BP (-d) is used to submit the action as a background batch process; the retry state is tracked in GDEXDB.
Once added, a DOI/Version control record cannot be removed, but it can be terminated when a dataset is retired and its data are no longer current. Even after termination, the file list can be reconstructed from the retained DOI records. Files under DOI/Version control are locked — they cannot be moved or deleted — though new files may still be added to operational datasets. Archived data files are also immutable unless new data are appended to them.
Only Web data files are placed under DOI/Version control. Saved data staged on disk for internal use or NCAR user access is not version-controlled and remains the responsibility of the dataset owner.
This document first covers general dsarch usage, then describes each Action option in detail, followed by Mode and Information (Info for short) options. Examples are provided throughout each section.