comparison env/lib/python3.9/site-packages/cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/salad.md @ 0:4f3585e2f14b draft default tip

"planemo upload commit 60cee0fc7c0cda8592644e1aad72851dec82c959"
author shellac
date Mon, 22 Mar 2021 18:12:50 +0000
1 # Semantic Annotations for Linked Avro Data (SALAD)
2
3 Author:
4
5 * Peter Amstutz <peter.amstutz@curoverse.com>, Curoverse
6
7 Contributors:
8
9 * The developers of Apache Avro
10 * The developers of JSON-LD
11 * Nebojša Tijanić <nebojsa.tijanic@sbgenomics.com>, Seven Bridges Genomics
12
13 # Abstract
14
15 Salad is a schema language for describing structured linked data documents
16 written in JSON or YAML. A Salad schema provides rules for
17 preprocessing, structural validation, and link checking of the documents
18 it describes. Salad builds on JSON-LD and the Apache Avro
19 data serialization system, and extends Avro with features for rich data
20 modeling such as inheritance, template specialization, object identifiers,
21 and object references. Salad was developed to provide a bridge between the
22 record oriented data modeling supported by Apache Avro and the Semantic
23 Web.
24
25 # Status of This Document
26
27 This document is the product of the [Common Workflow Language working
28 group](https://groups.google.com/forum/#!forum/common-workflow-language). The
29 latest version of this document is available in the "schema_salad" repository at
30
31 https://github.com/common-workflow-language/schema_salad
32
33 The products of the CWL working group (including this document) are made available
34 under the terms of the Apache License, version 2.0.
35
36 <!--ToC-->
37
38 # Introduction
39
40 The JSON data model is an extremely popular way to represent structured
41 data. It is attractive because of its relative simplicity and is a
42 natural fit with the standard types of many programming languages.
43 However, this simplicity means that basic JSON lacks expressive features
44 useful for working with complex data structures and document formats, such
45 as schemas, object references, and namespaces.
46
47 JSON-LD is a W3C standard providing a way to describe how to interpret a
48 JSON document as Linked Data by means of a "context". JSON-LD provides a
49 powerful solution for representing object references and namespaces in JSON
50 based on standard web URIs, but is not itself a schema language. Without a
51 schema providing a well defined structure, it is difficult to process an
52 arbitrary JSON-LD document as idiomatic JSON because there are many ways to
53 express the same data that are logically equivalent but structurally
54 distinct.
55
56 Several schema languages exist for describing and validating JSON data,
57 such as the Apache Avro data serialization system, but none understands
58 linked data. As a result, to fully take advantage of JSON-LD to build the
59 next generation of linked data applications, one must maintain separate
60 JSON schema, JSON-LD context, RDF schema, and human documentation, despite
61 significant overlap of content and obvious need for these documents to stay
62 synchronized.
63
64 Schema Salad is designed to address this gap. It provides a schema
65 language and processing rules for describing structured JSON content
66 permitting URI resolution and strict document validation. The schema
67 language supports linked data through annotations that describe the linked
68 data interpretation of the content, enables generation of JSON-LD context
69 and RDF schema, and production of RDF triples by applying the JSON-LD
70 context. The schema language also provides for robust support of inline
71 documentation.
72
73 ## Introduction to v1.0
74
75 This is the second version of the Schema Salad specification. It is
76 developed concurrently with v1.0 of the Common Workflow Language for use in
77 specifying the Common Workflow Language; however, Schema Salad is intended to be
78 useful to a broader audience. Compared to the draft-1 Schema Salad
79 specification, the following changes have been made:
80
81 * Use of [mapSubject and mapPredicate](#Identifier_maps) to transform maps to lists of records.
82 * Resolution of the [Domain Specific Language for types](#Domain_Specific_Language_for_types).
83 * Consolidation of the formal [schema into section 5](#Schema).
84
85 ## References to Other Specifications
86
87 **JavaScript Object Notation (JSON)**: http://json.org
88
89 **JSON Linked Data (JSON-LD)**: http://json-ld.org
90
91 **YAML**: http://yaml.org
92
93 **Avro**: https://avro.apache.org/docs/current/spec.html
94
95 **Uniform Resource Identifier (URI) Generic Syntax**: https://tools.ietf.org/html/rfc3986
96
97 **Resource Description Framework (RDF)**: http://www.w3.org/RDF/
98
99 **UTF-8**: https://www.ietf.org/rfc/rfc2279.txt
100
101 ## Scope
102
103 This document describes the syntax, data model, algorithms, and schema
104 language for working with Salad documents. It is not intended to document
105 a specific implementation of Salad; however, it may serve as a reference for
106 the behavior of conforming implementations.
107
108 ## Terminology
109
110 The terminology used to describe Salad documents is defined in the Concepts
111 section of the specification. The terms defined in the following list are
112 used in building those definitions and in describing the actions of a
113 Salad implementation:
114
115 **may**: Conforming Salad documents and Salad implementations are permitted but
116 not required to be interpreted as described.
117
118 **must**: Conforming Salad documents and Salad implementations are required
119 to be interpreted as described; otherwise they are in error.
120
121 **error**: A violation of the rules of this specification; results are
122 undefined. Conforming implementations may detect and report an error and may
123 recover from it.
124
125 **fatal error**: A violation of the rules of this specification; results
126 are undefined. Conforming implementations must not continue to process the
127 document and may report an error.
128
129 **at user option**: Conforming software may or must (depending on the modal verb in
130 the sentence) behave as described; if it does, it must provide users a means to
131 enable or disable the behavior described.
132
133 # Document model
134
135 ## Data concepts
136
137 An **object** is a data structure equivalent to the "object" type in JSON,
138 consisting of an unordered set of name/value pairs (referred to here as
139 **fields**), where the name is a string and the value is a string, number,
140 boolean, array, or object.
141
142 A **document** is a file containing a serialized object, or an array of
143 objects.
144
145 A **document type** is a class of files that share a common structure and
146 semantics.
147
148 A **document schema** is a formal description of the grammar of a document type.
149
150 A **base URI** is a context-dependent URI used to resolve relative references.
151
152 An **identifier** is a URI that designates a single document or single
153 object within a document.
154
155 A **vocabulary** is the set of symbolic field names and enumerated symbols defined
156 by a document schema, where each term maps to an absolute URI.
157
158 ## Syntax
159
160 Conforming Salad documents are serialized and loaded using YAML syntax and
161 UTF-8 text encoding. Salad documents are written using the JSON-compatible
162 subset of YAML. Features of YAML such as headers and type tags that are
163 not found in the standard JSON data model must not be used in conforming
164 Salad documents. It is a fatal error if the document is not valid YAML.
165
166 A Salad document must consist only of either a single root object or an
167 array of objects.
168
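The root-shape rule above can be sketched in a few lines of Python. This is only an illustration, not part of any Salad implementation: the function name `check_root` is hypothetical, and the standard `json` module stands in for a YAML parser because conforming documents use the JSON-compatible subset of YAML.

```python
import json


def check_root(document):
    """Return "object" or "array" for a valid root, else raise ValueError.

    The function name and return labels are illustrative assumptions.
    """
    if isinstance(document, dict):
        return "object"
    if isinstance(document, list) and all(isinstance(item, dict) for item in document):
        return "array"
    raise ValueError("root must be a single object or an array of objects")


# A single root object and an array of objects are both accepted.
print(check_root(json.loads('{"name": "example"}')))
print(check_root(json.loads('[{"a": 1}, {"b": 2}]')))
```

A root that is a bare string or number, or an array containing non-objects, is rejected by this sketch.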
169 ## Document context
170
171 ### Implied context
172
173 The implied context consists of the vocabulary defined by the schema and
174 the base URI. By default, the base URI must be the URI that was used to
175 load the document. It may be overridden by an explicit context.
176
177 ### Explicit context
178
179 If a document consists of a root object, this object may contain the
180 fields `$base`, `$namespaces`, `$schemas`, and `$graph`:
181
182 * `$base`: Must be a string. Sets the base URI of the document, which is used to
183 resolve relative references.
184
185 * `$namespaces`: Must be an object with strings as values. The keys of
186 the object are namespace prefixes used in the document; the values of
187 the object are the prefix expansions.
188
189 * `$schemas`: Must be an array of strings. This field may list URI
190 references to documents in RDF-XML format which will be queried for RDF
191 schema data. The subjects and predicates described by the RDF schema
192 may provide additional semantic context for the document, and may be
193 used for validation of prefixed extension fields found in the document.
194
195 Other directives beginning with `$` must be ignored.
196
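As a minimal sketch of reading the explicit context, the Python below extracts `$base`, `$namespaces`, and `$schemas` from a root object, checks the types stated above, and leaves every other `$` directive untouched. The function name `read_explicit_context`, the `loaded_from` parameter, the returned dictionary layout, and the choice to raise `ValueError` on a type mismatch are all assumptions made for illustration; the example URIs are placeholders.

```python
def read_explicit_context(root, loaded_from):
    """Collect the explicit context fields of a root object.

    `loaded_from` is the URI used to load the document; it is the default
    base URI when `$base` is absent. All names here are hypothetical.
    """
    base = root.get("$base", loaded_from)
    if not isinstance(base, str):
        raise ValueError("$base must be a string")

    namespaces = root.get("$namespaces", {})
    if not isinstance(namespaces, dict) or not all(
        isinstance(value, str) for value in namespaces.values()
    ):
        raise ValueError("$namespaces must be an object with strings as values")

    schemas = root.get("$schemas", [])
    if not isinstance(schemas, list) or not all(isinstance(s, str) for s in schemas):
        raise ValueError("$schemas must be an array of strings")

    # Any other field starting with "$" is simply not read, i.e. ignored.
    return {"base": base, "namespaces": namespaces, "schemas": schemas}


root = {
    "$base": "http://example.com/base",
    "$namespaces": {"ex": "http://example.com/ns#"},
    "$unknown": "ignored",
}
print(read_explicit_context(root, "file:///tmp/doc.yml"))
```

When `$base` is missing, the sketch falls back to the URI the document was loaded from, matching the default described for the implied context.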
197 ## Document graph
198
199 If a document consists of a single root object, this object may contain the
200 field `$graph`. This field must be an array of objects. If present, this
201 field holds the primary content of the document. A document that consists
202 of an array of objects at the root is an implicit graph.
203
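The two ways a graph can appear (an explicit `$graph` field or an array at the root) can be sketched as follows. The function name `document_graph` is hypothetical, and returning `None` for a root object without `$graph` is an assumption, since the text above does not name the primary content in that case.

```python
def document_graph(root):
    """Return the list of graph objects, or None if the root has no graph.

    Hypothetical helper: an array root is treated as an implicit graph,
    and a root object contributes its `$graph` field if present.
    """
    if isinstance(root, list):
        graph = root  # implicit graph
    elif isinstance(root, dict) and "$graph" in root:
        graph = root["$graph"]  # explicit graph
    else:
        return None  # assumption: no graph declared
    if not isinstance(graph, list) or not all(isinstance(item, dict) for item in graph):
        raise ValueError("the graph must be an array of objects")
    return graph


explicit = {"$base": "http://example.com/base", "$graph": [{"id": "one"}, {"id": "two"}]}
implicit = [{"id": "one"}, {"id": "two"}]
print(document_graph(explicit) == document_graph(implicit))
```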
204 ## Document metadata
205
206 If a document consists of a single root object, metadata about the
207 document, such as authorship, may be declared in the root object.
208
209 ## Document schema
210
211 Document preprocessing, link validation and schema validation require a
212 document schema. A schema may consist of:
213
214 * At least one record definition object which defines valid fields that
215 make up a record type. Record field definitions include the valid types
216 that may be assigned to each field and annotations to indicate fields
217 that represent identifiers and links, described below in "Semantic
218 Annotations".
219
220 * Any number of enumerated type objects which define a finite set of symbols that are
221 valid values of the type.
222
223 * Any number of documentation objects which allow in-line documentation of the schema.
224
225 The schema for defining a Salad schema (the metaschema) is described in
226 detail in "Schema validation".
227
228 ### Record field annotations
229
230 In a document schema, record field definitions may include the field
231 `jsonldPredicate`, which may be either a string or an object. Implementations
232 must preprocess document fields according to the following
233 rules:
234
235 * If the value of `jsonldPredicate` is `@id`, the field is an identifier
236 field.
237
238 * If the value of `jsonldPredicate` is an object, and that
239 object contains the field `_type` with the value `@id`, the field is a
240 link field.
241
242 * If the value of `jsonldPredicate` is an object, and that
243 object contains the field `_type` with the value `@vocab`, the field is a
244 vocabulary field, which is a subtype of link field.
245
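The three rules above amount to a small classification function. The Python below is a sketch under stated assumptions: the function names `classify_field` and `is_link_field`, the string labels they use, and the `name` and `type` keys in the sample field definitions are illustrative only; `jsonldPredicate`, `_type`, `@id`, and `@vocab` come from the text.

```python
def classify_field(field_def):
    """Classify a record field definition by its `jsonldPredicate`.

    Returns one of the hypothetical labels "identifier", "link",
    "vocabulary", or "plain".
    """
    predicate = field_def.get("jsonldPredicate")
    if predicate == "@id":
        return "identifier"
    if isinstance(predicate, dict):
        if predicate.get("_type") == "@id":
            return "link"
        if predicate.get("_type") == "@vocab":
            return "vocabulary"
    return "plain"


def is_link_field(kind):
    """A vocabulary field is a subtype of link field."""
    return kind in ("link", "vocabulary")


fields = [
    {"name": "id", "type": "string", "jsonldPredicate": "@id"},
    {"name": "source", "type": "string", "jsonldPredicate": {"_type": "@id"}},
    {"name": "kind", "type": "string", "jsonldPredicate": {"_type": "@vocab"}},
    {"name": "label", "type": "string"},
]
for field in fields:
    kind = classify_field(field)
    print(field["name"], kind, is_link_field(kind))
```

Keeping the subtype relation in a separate helper means code that handles link fields also covers vocabulary fields without repeating the check.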
246 ## Document traversal
247
248 To perform document preprocessing, link validation and schema
249 validation, the document must be traversed starting from the fields or
250 array items of the root object or array and recursively visiting each child
251 item which contains an object or array.
252
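A recursive traversal of this kind might look like the sketch below. The names `traverse` and `visit` are hypothetical. The sketch walks depth-first and follows Python's dictionary order for child fields, which is an arbitrary choice; scalar values are skipped because only objects and arrays are descended into.

```python
def traverse(node, visit):
    """Call `visit` on every object reachable from `node`, depth-first.

    Hypothetical helper: arrays are walked but not passed to `visit`;
    strings, numbers, and booleans are not descended into.
    """
    if isinstance(node, dict):
        visit(node)
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return
    for child in children:
        if isinstance(child, (dict, list)):
            traverse(child, visit)


document = {"a": {"b": [{"c": 1}, 2]}, "d": "text"}
visited = []
traverse(document, visited.append)
print(len(visited))  # 3 objects: the root, the value of "a", and {"c": 1}
```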
253 # Document preprocessing
254
255 After processing the explicit context (if any), document preprocessing
256 begins. Starting from the document root, object field values or array
257 items which contain objects or arrays are recursively traversed
258 depth-first. For each visited object, field names, identifier fields, link
259 fields, vocabulary fields, and `$import` and `$include` directives must be
260 processed as described in this section. The order of traversal of child
261 nodes within a parent node is undefined.