Dataset Metadata Model
A Dataset is a collection of data gathered by a project using a single sampling protocol (data collection method). Projects may have 1 or more datasets. In the context of PPSR Core, datasets represent the observations collected by the community of contributors.
The Dataset Metadata Model (DMM) describes a collection of observations. Dataset-level metadata provides context for a collection of observational records and expresses information that is common to all records within a dataset. It allows datasets to be discovered and accessed by a range of factors, which helps users, especially third-party users, make informed decisions about the suitability of a dataset for their particular requirements. It helps researchers understand a group of observations through:
- Title and description of the dataset
- Graphical elements associated with the dataset
- Method/survey protocol used
- Temporal range of the dataset
- Licence and ownership of the dataset
- Quality assurance methods applied to the dataset (pre, during and post recording)
- Data access methods
- Constraints and biases affecting the usage of the data
- Data management plan
Information Sharing Example
People need to know which datasets exist, where they are held, and which topics they cover. They need to be able to search efficiently for datasets about a topic, discover all relevant datasets, access them, and then judge whether they can use them for their decision making or research. For example, imagine a policy maker in Australia who wants to find all available data on invasive species in order to prioritize funding against the worst invasions. Imagine their delight on discovering that a single search (perhaps on a search engine such as the citizen science cloud) returns datasets generated both by active citizen science projects on invasive species hosted on the Atlas of Living Australia Biocollect platform and by historic citizen science projects in Australia in which Earthwatch volunteers mapped invasive species populations several years earlier.
Entity Relationship Diagram
[current approved version: 2020.0]
The Dataset ERD describes the relationships between class entities in the Dataset Metadata Model. Each dataset contains a set of Core Attributes, which represent the core terms associated with a dataset. The Extension Attributes are optional terms associated with a dataset.
Core Attributes
[current approved version: 2020.0]
Core attributes are the main fields associated with a dataset. The table below lists all Core Attributes, with the field name of each and a description of how it is used.
Many of the core terms are mandatory: every dataset instance is required to have an entry in each mandatory field.
Entity | Attribute or Entity Name | Description | Data or Entity Type | Obligation | Multiplicity | Synonym term in other standards |
---|---|---|---|---|---|---|
| | activity | An activity is analogous to a survey and comprises two components: a metadata schema and an observational data model (i.e. the data schema into which observational records are created). The data schema definition represents a specific data collection protocol. In the context of an activity/survey, these exist as a single pair of objects. An activity is always used in the context of an event, i.e. a non-persistent, time-based usage of an observational data schema. Observational data schemas are domain and protocol specific, and may be published in other repositories. | Class | Optional | 0:n | dcmitype:Event |
activity | activityId | A globally unique identifier for an activity. | text | Mandatory | 1:1 | |
activity | datasetMetadataSchema | The datasetMetadataSchema (DMM) describes the metadata pertaining to the specific observationalDataSchema selected and its associated data. There is a 1:1 relationship between the datasetMetadataSchema and the observationalDataSchema. The datasetMetadataSchema is consistent for all classes of observationalDataDomains. This is a class object. | Class | Mandatory | 1:1 | |
datasetMetadataSchema | dmmCoreTerms | The set of core terms which comprise the PPSR-Core Dataset Metadata Model (DMM). These are the minimum set of attributes required to adequately describe a dataset and enable exchange of dataset metadata between data catalogues. | Class | Optional | 1:1 | dcat:CatalogRecord |
dmmCoreTerms | dcterms:identifier | Persistent identifier of a dataset (associated with the project). Should equate to the datasetExternalId if data is stored in an external repository. | text | Mandatory | 1:1 | dwcterms:datasetID cosi:hasIdentifierValue |
dmmCoreTerms | dcterms:dateSubmitted | The date a dataset submission was published into a receiving system. Uses the ISO 8601:2004 (E) dateTime standard. | date time | Mandatory | 1:1 | prov:generatedAtTime CI_Citation date |
dmmCoreTerms | dcterms:modified | The most recent dateTime at which the resource was changed. Uses the ISO 8601:2004 (E) dateTime standard. | date time | Mandatory | 1:1 | prov:generatedAtTime |
dmmCoreTerms | datasetStatus | Indicator of the current status of a dataset (e.g. whether it is already published). | vocabulary | Mandatory | 1:1 | cosi:hasStatus |
dmmCoreTerms | dcterms:title | The name of the dataset for discovery and citation purposes. | text | Mandatory | 1:1 | dwcterms:datasetName CI_Citation.title cosi:hasTitle |
dmmCoreTerms | dcterms:abstract | Abstract or description of the dataset. | text | Mandatory | 1:1 | cosi:hasDescription |
dmmCoreTerms | dcterms:accessRights | Category of rights to use IP contained in the dataset or a type of use applied to the dataset. | vocabulary | Mandatory | 1:1 | dct:rights |
dmmCoreTerms | dcterms:bibliographicCitation | Format to be structured as follows: 'Author/Rightsholder. (Year). Title of data set (Version number) [Description of form]. Retrieved from https://<website url>'. The attribution text string to be cited by people who use the dataset. | text | Optional | 0:1 | |
dmmCoreTerms | dcterms:rightsHolder | The name of the organisation which is the legal custodian of the dataset. | text | Mandatory | 1:1 | prov:wasAttributedTo |
dmmCoreTerms | dcterms:license | License applied to the dataset. | vocabulary | Mandatory | 1:1 | cosi:hasLicenceInformation |
dmmCoreTerms | dcterms:language | The machine language in which the dataset and associated metadata are encoded. Uses Unicode Standard UTF-8 (ISO/IEC 10646:2014 plus Amendment 1). | text | Optional | 0:n | MD_DataIdentification.characterSet |
dmmCoreTerms | datasetStartDate | The date on which the dataset collection survey commences. This may reflect the earliest record in the dataset or the date from which a survey is open for data recording. This date may be on or after (>=) the projectStartDate. Uses the ISO 8601:2004 (E) dateTime standard. | date time | Mandatory | 1:1 | |
dmmCoreTerms | datasetEndDate | The date on which the dataset collection survey concluded. Uses the ISO 8601:2004 (E) dateTime standard. | date time | Optional | 0:1 | |
dmmCoreTerms | methodType | The type of methodology or sampling protocol used to collect the dataset. | vocabulary | Mandatory | 1:1 | |
dmmCoreTerms | dataAccessMethod | A list of available methods for people to access the dataset. | vocabulary | Mandatory | 1:n | |
datasetMetadataSchema | methodSpecification | Details of the methodology or sampling protocol used to collect the dataset. | Class | Mandatory | 1:1 | cosi:hasRelatedMaterial |
methodSpecification | samplingProtocolDomain | The name of the methodology or sampling protocol used to collect the dataset. | vocabulary | Optional | 0:1 | |
methodSpecification | samplingProtocolMethod | The sampling protocol method used for a given survey. | vocabulary | Optional | 0:1 | dcterms:samplingProtocol dwcterms:samplingProtocol cosi:hasProcedure |
methodSpecification | methodAbstract | Description of the methodology or sampling protocol used to collect the dataset. | text | Optional | 0:1 | |
methodSpecification | methodUrl | URL address of an officially published article which describes the methodology or sampling protocol used to collect the dataset. | url | Optional | 0:1 | |
methodSpecification | methodDocUrl | URL link to an uploaded document artefact which describes the methodology or sampling protocol used to collect the dataset. | url | Optional | 0:1 | |
datasetMetadataSchema | observationalDataModel | The observationalDataDomain contains an array of different domain schemas (eg. biodiversity, water, atmosphere, ecology, geology, geomorphology, astronomy, etc.). Each domain will contain an array of standard protocols which apply in that domain context. The domains listed are not a comprehensive list and are expected to be appended to over time as new domains are specified and appropriate samplingProtocol standards are defined for them. This class object serves only to structurally differentiate and describe the different domains and is not a structural element of the observationalDataModel (ODM). | Class | Mandatory | 1:1 | dcat:Dataset |
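
The obligations in the table above can be checked mechanically. The following is a minimal sketch only, not part of the standard: it assumes a dataset's dmmCoreTerms are held in a plain Python dictionary keyed by attribute name, and the function name `check_core_terms`, the example values and the `example.org` identifier are all hypothetical. The start-date key is written `datasetStartDate` here, by analogy with `datasetEndDate`.

```python
from datetime import datetime

# Mandatory dmmCoreTerms attributes, copied from the Core Attributes table.
MANDATORY_CORE_TERMS = (
    "dcterms:identifier",
    "dcterms:dateSubmitted",
    "dcterms:modified",
    "datasetStatus",
    "dcterms:title",
    "dcterms:abstract",
    "dcterms:accessRights",
    "dcterms:rightsHolder",
    "dcterms:license",
    "datasetStartDate",  # assumed spelling, matching datasetEndDate
    "methodType",
    "dataAccessMethod",
)

# Attributes typed "date time" in the table (ISO 8601).
DATE_TIME_TERMS = (
    "dcterms:dateSubmitted",
    "dcterms:modified",
    "datasetStartDate",
    "datasetEndDate",
)


def check_core_terms(record: dict) -> list[str]:
    """Return a list of problems found in a dmmCoreTerms record (hypothetical helper)."""
    problems = []
    for term in MANDATORY_CORE_TERMS:
        # Treat absent, empty-string and empty-list values as missing.
        if record.get(term) in (None, "", []):
            problems.append(f"missing mandatory term: {term}")
    for term in DATE_TIME_TERMS:
        value = record.get(term)
        if value:
            try:
                datetime.fromisoformat(value)
            except (TypeError, ValueError):
                problems.append(f"not an ISO 8601 dateTime: {term}")
    return problems


# Illustrative record; every value here is made up for the example.
example = {
    "dcterms:identifier": "https://example.org/dataset/123",
    "dcterms:dateSubmitted": "2020-03-01T09:30:00+00:00",
    "dcterms:modified": "2020-04-15T17:00:00+00:00",
    "datasetStatus": "Active - published - unverified",
    "dcterms:title": "Example invasive species survey",
    "dcterms:abstract": "Illustrative record only.",
    "dcterms:accessRights": "Open access",
    "dcterms:rightsHolder": "Example Organisation",
    "dcterms:license": "Creative Commons zero (CC 0)",
    "datasetStartDate": "2020-01-01T00:00:00+00:00",
    "methodType": "Systematic method-based survey",
    "dataAccessMethod": ["Application Programming Interface (API)"],
}
print(check_core_terms(example))
```

Optional attributes such as `datasetEndDate` are only checked for format when they are present, which mirrors their 0:1 multiplicity.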
Extension attributes
[current approved version: 2020.0]
Extension attributes are fields whose inclusion is not mandatory for systems that are compliant with PPSR Core. The table below lists all Extension Attributes, with the field name of each and a description of how it is used. Every system is encouraged to include these fields to ensure greater interoperability between systems.
Entity | Attribute or Entity Name | Description | Data or Entity Type | Obligation | Multiplicity | Synonym term in other standards |
---|---|---|---|---|---|---|
Project | projectId | Globally unique identifier (GUID) for the project. System generated. | Text | Optional | 0:1 | dcterms:identifier cosi:hasIdentifier |
datasetMetadataSchema | dmmExtensionTerms | The set of extension terms which comprise the PPSR-Core Dataset Metadata Model (DMM). These terms enhance the description of a dataset and improve the ability of users of the dataset to understand or interpret fitness for use. | Class | Optional | 0:1 | dcat:CatalogRecord |
dmmExtensionTerms | datasetUpdateFrequency | How often the dataset is updated. | vocabulary | Optional | 0:1 | dcterms:accrualPeriodicity |
dmmExtensionTerms | datasetExternalUrl | Web location where the dataset will be published. | text | Optional | 0:n | |
dmmExtensionTerms | dcat:downloadURL | A URL from which dataset observation records can be accessed and downloaded. | url | Optional | 0:1 | |
dmmExtensionTerms | datasetGeographicCoverage | Geographic/spatial scope of coverage of the collection sites of data records within the dataset. Uses OGC GeoAPI (09-083r3) standard. | geoObject | Optional | 0:n | |
dmmExtensionTerms | cosi:hasHypothesis | The experimental hypothesis underpinning the experimental design for which the dataset was collected. | text | Optional | 0:1 | |
dmmExtensionTerms | cosi:hasInstrument | Details of instrumentation used in the data recording. | text | Optional | 0:n | |
dmmExtensionTerms | dataQualityAssuranceMethod | Description of the types of data quality assurance methods that were applied in capturing, curating and managing the dataset. | vocabulary | Optional | 0:n | |
dmmExtensionTerms | dataQualityAssuranceDescription | Detailed description of the methods used to quality assure the dataset both during capture and post processing. This is important for data users to understand the processes applied to the data to verify or enhance its quality for use. | text | Optional | 0:1 | |
dmmExtensionTerms | usageGuide | Description of any constraints and biases in the dataset which are associated with how the data collection methodology was applied, e.g. concentration of data points along access networks, or targeted/non-random approaches causing bias towards certain factors at the expense of others. | text | Optional | 0:1 | |
dmmExtensionTerms | activityCount | Number of data recording events in the dataset | integer | Optional | 0:1 | |
dmmExtensionTerms | datasetAssociatedMedia | Image(s) and/or other media used to graphically enhance or represent the dataset. This is a class object. | Class | Optional | 0:n | |
datasetAssociatedMedia | datasetAssociatedMediaType | The category of media type representing the type of dataset media item chosen | vocabulary | Mandatory | 1:1 | foaf:img |
datasetAssociatedMedia | datasetAssociatedMediaFile | Media file upload representing the type of dataset media chosen | mediaFile | Mandatory | 1:1 | foaf:img |
datasetAssociatedMedia | datasetAssociatedMediaCredit | Attribution credit for the logo image or other media | text | Mandatory | 1:1 | dcterms:bibliographicCitation |
dmmExtensionTerms | dataAccuracyDeclarations | Generalised categories that best reflect the accuracy of records in the dataset. | Class | Optional | 0:4 | |
dataAccuracyDeclarations | spatialAccuracy | A generalised category that best reflects the least spatially accurate record in the dataset. | vocabulary | Optional | 0:1 | |
dataAccuracyDeclarations | temporalAccuracy | A generalised category that best reflects the least accurate record in the dataset in respect to date of the observation. | vocabulary | Optional | 0:1 | |
dataAccuracyDeclarations | speciesIdentificationAccuracy | A generalised category that best reflects the least accurate record in the dataset for species identification. Choose 'Not applicable' if species fields are not included in the dataset. | vocabulary | Optional | 0:1 | |
dataAccuracyDeclarations | nonTaxonomicAccuracy | A generalised category that best reflects the least accurate record in the dataset in respect to non-biodiversity attributes. | vocabulary | Optional | 0:1 | |
dmmExtensionTerms | dataManagementPlan | Details of a data management plan associated with the dataset. | Class | Optional | 0:1 | cosi:hasRelatedMaterial |
dataManagementPlan | isDataManagementPolicyDocumented | Indicator of whether a data management plan has been prepared for the dataset. | boolean | Mandatory | 1:1 | |
dataManagementPlan | dataManagementPolicyDescription | Description of data management policy | text | Optional | 0:1 | |
dataManagementPlan | dataManagementPolicyURL | Link to data management policy description | url | Optional | 0:1 | |
dataManagementPlan | dataManagementPolicyDocument | Document describing data management policy | url | Optional | 0:1 | |
dataManagementPlan | dataManagementPrinciplesConformance | Assessment of the conformance of the data management principles applied to the dataset with standard GEOlabels. | text | Optional | 0:1 | |
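
In the table above, some extension classes are optional as a whole but carry mandatory attributes once they are used (for example, datasetAssociatedMedia and dataManagementPlan). The sketch below shows one way such conditional obligations might be checked. It assumes the dmmExtensionTerms are held in a plain Python dictionary; the function name `check_extension_terms`, the dictionary layout and the example values are hypothetical and not defined by the standard.

```python
# Attributes that are Mandatory inside an Optional class,
# copied from the Extension Attributes table.
MANDATORY_WHEN_PRESENT = {
    "datasetAssociatedMedia": (
        "datasetAssociatedMediaType",
        "datasetAssociatedMediaFile",
        "datasetAssociatedMediaCredit",
    ),
    "dataManagementPlan": ("isDataManagementPolicyDocumented",),
}


def check_extension_terms(ext: dict) -> list[str]:
    """Return problems found in a dmmExtensionTerms record (hypothetical helper)."""
    problems = []
    for class_name, required in MANDATORY_WHEN_PRESENT.items():
        value = ext.get(class_name)
        if value is None:
            continue  # the class itself is optional
        # datasetAssociatedMedia is 0:n and dataManagementPlan is 0:1;
        # normalise both to a list of instances.
        instances = value if isinstance(value, list) else [value]
        for index, instance in enumerate(instances):
            for attr in required:
                # False is a valid boolean value, so only None and "" count as missing.
                if attr not in instance or instance[attr] in (None, ""):
                    problems.append(f"{class_name}[{index}] is missing {attr}")
    return problems


# Illustrative record: the media item lacks its mandatory credit.
ext = {
    "datasetUpdateFrequency": "Annual",
    "dataManagementPlan": {"isDataManagementPolicyDocumented": False},
    "datasetAssociatedMedia": [
        {
            "datasetAssociatedMediaType": "Image URL",
            "datasetAssociatedMediaFile": "https://example.org/logo.png",
        },
    ],
}
print(check_extension_terms(ext))
```

A record with no extension classes at all passes this check, since every class in dmmExtensionTerms is optional.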
Vocabulary
[current approved version: 2020.0]
The Vocabulary for Dataset defines enumerations for attributes above. These are controlled lists of defined terms. These terms may be used either as provided in full or as a reduced subset relevant to the purpose for which they are being used. They should not be modified or augmented with additional terms as this would prevent shareability and effective aggregation.
Provisional
The vocabulary is part of the published standard. Be warned that this vocabulary is subject to larger changes than the core terms and attributes. Reaching consensus with the larger scientific community is important to us. If you are interested in helping with this work, please see the contribute page.
Entity | Attribute Name | Vocabulary terms | Comments |
---|---|---|---|
dmmCoreTerms | dcterms:accessRights | Open access; Embargoed access; Restricted access; Pending public release; Metadata only access | Need to validate these terms and adjust as necessary. Must ensure mutual exclusivity and comprehensive coverage. |
dmmCoreTerms | dcterms:license | Creative Commons zero (CC 0); Creative Commons Attribution (4.0) international (CC-BY 4.0); Creative Commons Attribution Non-commercial (CC-BY-NC) | |
dmmCoreTerms | datasetStatus | Active - unpublished - unverified; Active - unpublished - partially verified; Active - unpublished - fully verified; Active - published - unverified; Active - published - partially verified; Active - published - fully verified; Complete - unpublished - unverified; Complete - unpublished - partially verified; Complete - unpublished - fully verified; Complete - published - unverified; Complete - published - partially verified; Complete - published - fully verified; Archived - unpublished - unverified; Archived - unpublished - partially verified; Archived - unpublished - fully verified; Archived - published - unverified; Archived - published - partially verified; Archived - published - fully verified | Need to validate these terms and adjust as necessary. Must ensure mutual exclusivity and comprehensive coverage. |
dmmCoreTerms | methodType | Opportunistic/ad-hoc observation; Systematic method-based survey | |
samplingProtocolDomain | samplingProtocolMethodEcology | Air quality - Fixed sensor; Air quality - Mobile sensor; Bat survey - Echolocation recorder; Bat survey - Harp trapping; Beach profile survey - Emery method; Beach profile survey - Optical method; Bird survey - Distance sample (along transect); Bird survey - Fixed-area; Bird survey - Fixed-time; Bird survey - Fixed-time & Fixed-area; Bird survey - Mist netting; Fauna survey - 2-Ha track plot method; Fauna survey - Active search; Fauna survey - Aerial distance sampler method; Fauna survey - Cage trapping; Fauna survey - Call playback; Fauna survey - Camera trapping; Fauna survey - Elliot trapping; Fauna survey - Funnel trapping; Fauna survey - Hair tubes; Fauna survey - Nest box monitoring; Fauna survey - Pitfall trapping; Fauna survey - Scat survey; Fauna survey - Spotlight search; Fauna survey - Strip transect aerial survey; Fauna survey - Turtle trapping; Fish survey - Electrofishing; Fish survey - Set net/trap; Fish survey - Sweep netting; Insect survey - Black light; Insect survey - Malaise trap; Insect survey - Baited trap; Insect survey - Glue trap; Insect survey - Sweep netting; Riparian condition assessment - Rapid Appraisal of Riparian Condition (RARC); Vegetation condition assessment; Vegetation survey - General transect & plot; Vegetation survey - Intensive inventory; Vegetation survey - Step point method; Water quality - Standardised physical/chemical attribute measurements; Water quality - Macroinvertebrate survey | Possible additional sampling protocols may include: samplingProtocolWater, samplingProtocolMarine, samplingProtocolLimnology, samplingProtocolClimate, samplingProtocolAtmosphere, samplingProtocolSoils, samplingProtocolGeology, samplingProtocolChemistry, samplingProtocolPhysics, etc. Methods should be unique within a vocabulary, but may occur in more than one vocabulary. Domain-based protocol vocabularies may already exist for other domains, but they have not been identified as part of this current activity. |
dmmCoreTerms | dataAccessMethod | Open access structured raw data download from this system; Open access opaque raw data file attached in this system; Limited structured raw data access in this system - via request (subject to embargo); Opaque raw data file attached in this system - via request; Open access structured raw data download from external source; Closed access structured raw data download from external source; Application Programming Interface (API); Raw data not available; Only derived/interpreted data products available | |
dmmExtensionTerms | datasetUpdateFrequency | Triennial; Biennial; Annual; Semi-annual; Three times a year; Quarterly; Bi-monthly; Monthly; Semi-monthly; Bi-weekly; Three times a month; Weekly; Semi-weekly; Three times a week; Daily; Continuous; Irregular; No further updates | Need to validate these terms and adjust as necessary. Must ensure mutual exclusivity and comprehensive coverage. |
dmmExtensionTerms | dataQualityAssuranceMethod | Data owner curated; Subject matter expert record verification; Crowd-sourced record verification; Record annotation; System supported data attribute configuration; No DQ methods used; Not applicable | Need to validate these terms and adjust as necessary. Must ensure mutual exclusivity and comprehensive coverage. |
dataAccuracyDeclarations | spatialAccuracy | High; Medium; Low | |
dataAccuracyDeclarations | temporalAccuracy | High; Medium; Low | |
dataAccuracyDeclarations | speciesIdentificationAccuracy | High; Medium; Low | |
dataAccuracyDeclarations | nonTaxonomicAccuracy | High; Medium; Low | |
datasetAssociatedMedia | datasetAssociatedMediaType | Image file; Image URL; Audio file; Audio URL; Video file; Video URL | |
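
The controlled lists above lend themselves to simple programmatic checks. The sketch below is illustrative only. It observes that the 18 datasetStatus terms are every combination of three parts, and it shows how a system might confirm that its local list is a reduced subset of a vocabulary rather than an augmented one. The axis names `LIFECYCLE`, `PUBLICATION` and `VERIFICATION`, and the helper `unknown_terms`, are labels introduced for this example and are not defined by the standard.

```python
from itertools import product

# The three parts of a datasetStatus term (axis names are assumptions).
LIFECYCLE = ("Active", "Complete", "Archived")
PUBLICATION = ("unpublished", "published")
VERIFICATION = ("unverified", "partially verified", "fully verified")

# All 18 datasetStatus terms, in the order they appear in the vocabulary table.
DATASET_STATUS = tuple(
    " - ".join(parts) for parts in product(LIFECYCLE, PUBLICATION, VERIFICATION)
)

# methodType terms, copied from the vocabulary table.
METHOD_TYPE = (
    "Opportunistic/ad-hoc observation",
    "Systematic method-based survey",
)


def unknown_terms(local_terms, controlled_terms):
    """Return the local terms that are not in the controlled vocabulary."""
    allowed = set(controlled_terms)
    return [term for term in local_terms if term not in allowed]


# A reduced subset is acceptable; an added term such as "Draft" is not.
local_status_list = ["Active - published - unverified", "Draft"]
print(len(DATASET_STATUS))
print(unknown_terms(local_status_list, DATASET_STATUS))
```

An empty result from `unknown_terms` indicates that the local list uses only terms from the published vocabulary, which is what keeps records shareable and aggregable across systems.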