comparison env/lib/python3.9/site-packages/cwltool/schemas/v1.2.0-dev4/salad/schema_salad/metaschema/salad.md @ 0:4f3585e2f14b draft default tip

"planemo upload commit 60cee0fc7c0cda8592644e1aad72851dec82c959"
author shellac
date Mon, 22 Mar 2021 18:12:50 +0000
-1:000000000000 0:4f3585e2f14b
1 # Semantic Annotations for Linked Avro Data (SALAD)
2
3 Author:
4
5 * Peter Amstutz <peter.amstutz@curii.com>, Arvados Project, Curii Corporation
6
7 Contributors:
8
9 * The developers of Apache Avro
10 * The developers of JSON-LD
11 * Nebojša Tijanić <nebojsa.tijanic@sbgenomics.com>, Seven Bridges Genomics
12
13 # Abstract
14
15 Salad is a schema language for describing structured linked data
16 in JSON or YAML documents. A Salad schema provides rules for
17 preprocessing, structural validation, and link checking of the documents
18 it describes. Salad builds on JSON-LD and the Apache Avro
19 data serialization system, and extends Avro with features for rich data
20 modeling such as inheritance, template specialization, object identifiers,
21 and object references. Salad was developed to provide a bridge between the
22 record oriented data modeling supported by Apache Avro and the Semantic
23 Web.
24
25 # Status of This Document
26
27 This document is the product of the [Common Workflow Language working
28 group](https://groups.google.com/forum/#!forum/common-workflow-language). The
29 latest version of this document is available in the "schema_salad" repository at
30
31 https://github.com/common-workflow-language/schema_salad
32
33 The products of the CWL working group (including this document) are made available
34 under the terms of the Apache License, version 2.0.
35
36 <!--ToC-->
37
38 # Introduction
39
40 The JSON data model is an extremely popular way to represent structured
41 data. It is attractive because of its relative simplicity and is a
42 natural fit with the standard types of many programming languages.
43 However, this simplicity means that basic JSON lacks expressive features
44 useful for working with complex data structures and document formats, such
45 as schemas, object references, and namespaces.
46
47 JSON-LD is a W3C standard providing a way to describe how to interpret a
48 JSON document as Linked Data by means of a "context". JSON-LD provides a
49 powerful solution for representing object references and namespaces in JSON
50 based on standard web URIs, but is not itself a schema language. Without a
51 schema providing a well defined structure, it is difficult to process an
52 arbitrary JSON-LD document as idiomatic JSON because there are many ways to
53 express the same data that are logically equivalent but structurally
54 distinct.
55
56 Several schema languages exist for describing and validating JSON data,
57 such as the Apache Avro data serialization system, but none of them understand
58 linked data. As a result, to fully take advantage of JSON-LD to build the
59 next generation of linked data applications, one must maintain separate
60 JSON schema, JSON-LD context, RDF schema, and human documentation, despite
61 significant overlap of content and obvious need for these documents to stay
62 synchronized.
63
64 Schema Salad is designed to address this gap. It provides a schema
65 language and processing rules for describing structured JSON content
66 permitting URI resolution and strict document validation. The schema
67 language supports linked data through annotations that describe the linked
68 data interpretation of the content, enables generation of JSON-LD context
69 and RDF schema, and production of RDF triples by applying the JSON-LD
70 context. The schema language also provides for robust support of inline
71 documentation.
72
73 ## Introduction to v1.1
74
75 This is the third version of the Schema Salad specification. It is
76 developed concurrently with v1.1 of the Common Workflow Language for use in
77 specifying the Common Workflow Language; however, Schema Salad is intended to be
78 useful to a broader audience. Compared to the v1.0 Schema Salad
79 specification, the following changes have been made:
80
81 * Add support for `default` values on record fields
82 * Add subscoped fields (fields which introduce a new inner scope for identifiers)
83 * Add the *inVocab* flag (default true) to indicate if a type is added to the vocabulary of well known terms or must be prefixed
84 * Add *secondaryFilesDSL* micro DSL (domain specific language) to convert text strings to a secondaryFiles record type used in CWL
85 * The `$mixin` feature has been removed from the specification, as it
86 is poorly documented, not included in conformance testing,
87 and not widely supported.
88
89 ## References to Other Specifications
90
91 **JavaScript Object Notation (JSON)**: http://json.org
92
93 **JSON Linked Data (JSON-LD)**: http://json-ld.org
94
95 **YAML**: https://yaml.org/spec/1.2/spec.html
96
97 **Avro**: https://avro.apache.org/docs/current/spec.html
98
99 **Uniform Resource Identifier (URI) Generic Syntax**: https://tools.ietf.org/html/rfc3986
100
101 **Resource Description Framework (RDF)**: http://www.w3.org/RDF/
102
103 **UTF-8**: https://www.ietf.org/rfc/rfc2279.txt
104
105 ## Scope
106
107 This document describes the syntax, data model, algorithms, and schema
108 language for working with Salad documents. It is not intended to document
109 a specific implementation of Salad, however it may serve as a reference for
110 the behavior of conforming implementations.
111
112 ## Terminology
113
114 The terminology used to describe Salad documents is defined in the Concepts
115 section of the specification. The terms defined in the following list are
116 used in building those definitions and in describing the actions of an
117 a Salad implementation:
118
119 **may**: Conforming Salad documents and Salad implementations are permitted but
120 not required to be interpreted as described.
121
122 **must**: Conforming Salad documents and Salad implementations are required
123 to be interpreted as described; otherwise they are in error.
124
125 **error**: A violation of the rules of this specification; results are
126 undefined. Conforming implementations may detect and report an error and may
127 recover from it.
128
129 **fatal error**: A violation of the rules of this specification; results
130 are undefined. Conforming implementations must not continue to process the
131 document and may report an error.
132
133 **at user option**: Conforming software may or must (depending on the modal verb in
134 the sentence) behave as described; if it does, it must provide users a means to
135 enable or disable the behavior described.
136
137 # Document model
138
139 ## Data concepts
140
141 An **object** is a data structure equivalent to the "object" type in JSON,
142 consisting of an unordered set of name/value pairs (referred to here as
143 **fields**) and where the name is a string and the value is a string, number,
144 boolean, array, or object.
145
146 A **document** is a file containing a serialized object, or an array of
147 objects.
148
149 A **document type** is a class of files that share a common structure and
150 semantics.
151
152 A **document schema** is a formal description of the grammar of a document type.
153
154 A **base URI** is a context-dependent URI used to resolve relative references.
155
156 An **identifier** is a URI that designates a single document or single
157 object within a document.
158
159 A **vocabulary** is the set of symbolic field names and enumerated symbols defined
160 by a document schema, where each term maps to an absolute URI.
161
162 ## Syntax
163
164 Conforming Salad v1.1 documents are serialized and loaded using a
165 subset of YAML 1.2 syntax and UTF-8 text encoding. Salad documents
166 are written using the [JSON-compatible subset of YAML described in
167 section 10.2](https://yaml.org/spec/1.2/spec.html#id2803231). The
168 following features of YAML must not be used in conforming Salad
169 documents:
170
171 * Use of explicit node tags with leading `!` or `!!`
172 * Use of anchors with leading `&` and aliases with leading `*`
173 * %YAML directives
174 * %TAG directives
175
176 It is a fatal error if the document is not valid YAML.
177
178 A Salad document must consist only of either a single root object or an
179 array of objects.
180
181 ## Document context
182
183 ### Implied context
184
185 The implied context consists of the vocabulary defined by the schema and
186 the base URI. By default, the base URI must be the URI that was used to
187 load the document. It may be overridden by an explicit context.
188
189 ### Explicit context
190
191 If a document consists of a root object, this object may contain the
192 fields `$base`, `$namespaces`, `$schemas`, and `$graph`:
193
194 * `$base`: Must be a string. Set the base URI for the document used to
195 resolve relative references.
196
197 * `$namespaces`: Must be an object with strings as values. The keys of
198 the object are namespace prefixes used in the document; the values of
199 the object are the prefix expansions.
200
201 * `$schemas`: Must be an array of strings. This field may list URI
202 references to documents in RDF-XML format which will be queried for RDF
203 schema data. The subjects and predicates described by the RDF schema
204 may provide additional semantic context for the document, and may be
205 used for validation of prefixed extension fields found in the document.
206
207 Other directives beginning with `$` must be ignored.
208
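The context rules above can be sketched in Python. This is a minimal illustration, not the reference implementation: the function name `explicit_context` and the `loaded_from` parameter are assumptions made for this example, and the document is assumed to be already parsed into plain Python dicts and lists.

```python
def explicit_context(root, loaded_from):
    """Return (base_uri, namespaces, schemas) for a parsed Salad document.

    `loaded_from` is the URI used to load the document (hypothetical
    parameter name); it is the default base URI of the implied context.
    """
    base_uri = loaded_from
    namespaces = {}
    schemas = []

    # Only a root object can carry an explicit context.
    if isinstance(root, dict):
        if "$base" in root:
            if not isinstance(root["$base"], str):
                raise ValueError("$base must be a string")
            base_uri = root["$base"]  # overrides the implied base URI

        namespaces = root.get("$namespaces", {})
        if not isinstance(namespaces, dict) or not all(
            isinstance(value, str) for value in namespaces.values()
        ):
            raise ValueError("$namespaces must be an object with string values")

        schemas = root.get("$schemas", [])
        if not isinstance(schemas, list) or not all(
            isinstance(item, str) for item in schemas
        ):
            raise ValueError("$schemas must be an array of strings")

    return base_uri, namespaces, schemas
```

A document whose root is an array has no explicit context, so the sketch falls back to the URI the document was loaded from.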
209 ## Document graph
210
211 If a document consists of a single root object, this object may contain the
212 field `$graph`. This field must be an array of objects. If present, this
213 field holds the primary content of the document. A document that consists
214 of an array of objects at the root is an implicit graph.
215
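As a rough sketch, the primary content of a parsed document could be located as follows. The helper name `graph_objects` is hypothetical, and treating a root object without `$graph` as a one-item graph is an assumption of this example rather than a rule stated above.

```python
def graph_objects(document):
    """Return the list of objects that make up the document's primary content."""
    if isinstance(document, list):
        # An array of objects at the root is an implicit graph.
        items = document
    elif isinstance(document, dict):
        if "$graph" not in document:
            # Assumption: without $graph, the root object is the content.
            return [document]
        items = document["$graph"]
    else:
        raise ValueError("root must be an object or an array of objects")

    if not isinstance(items, list) or not all(isinstance(i, dict) for i in items):
        raise ValueError("a graph must be an array of objects")
    return items
```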
216 ## Document metadata
217
218 If a document consists of a single root object, metadata about the
219 document, such as authorship, may be declared in the root object.
220
221 ## Document schema
222
223 Document preprocessing, link validation and schema validation require a
224 document schema. A schema may consist of:
225
226 * At least one record definition object which defines valid fields that
227 make up a record type. Record field definitions include the valid types
228 that may be assigned to each field and annotations to indicate fields
229 that represent identifiers and links, described below in "Semantic
230 Annotations".
231
232 * Any number of enumerated type objects which define a finite set of symbols that are
233 valid values of the type.
234
235 * Any number of documentation objects which allow in-line documentation of the schema.
236
237 The schema for defining a salad schema (the metaschema) is described in
238 detail in the [Schema](#Schema) section.
239
240 ## Record field annotations
241
242 In a document schema, record field definitions may include the field
243 `jsonldPredicate`, which may be either a string or object. Implementations
244 must preprocess fields according to the following
245 rules:
246
247 * If the value of `jsonldPredicate` is `@id`, the field is an identifier
248 field.
249
250 * If the value of `jsonldPredicate` is an object which contains
251 the field `_type` with the value `@id`, the field is a
252 link field subject to [link validation](#Link_validation).
253
254 * If the value of `jsonldPredicate` is an object which contains the
255 field `_type` with the value `@vocab`, the field value is subject to
256 [vocabulary resolution](#Vocabulary_resolution).
257
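The three rules above amount to a small classification step. The sketch below is illustrative only: the function name `classify_field` and the returned labels are assumptions of this example, not terms defined by the specification.

```python
def classify_field(jsonld_predicate):
    """Classify a record field from its `jsonldPredicate` value.

    Returns "identifier", "link", "vocabulary", or None when none of the
    three rules applies.
    """
    if jsonld_predicate == "@id":
        return "identifier"
    if isinstance(jsonld_predicate, dict):
        kind = jsonld_predicate.get("_type")
        if kind == "@id":
            return "link"        # subject to link validation
        if kind == "@vocab":
            return "vocabulary"  # subject to vocabulary resolution
    return None
```

Note that the string `@id` and an object with `_type: "@id"` mean different things: the former marks an identifier field, the latter a link field.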
258 ## Document traversal
259
260 To perform document preprocessing, link validation and schema
261 validation, the document must be traversed starting from the fields or
262 array items of the root object or array and recursively visiting each child
263 item which contains an object or array.
264
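A minimal sketch of such a traversal over an already parsed document is shown below. The generator name `traverse` is an assumption of this example; scalar values are skipped because only objects and arrays are descended into.

```python
def traverse(node):
    """Yield every object reachable from `node`, visiting depth-first."""
    if isinstance(node, dict):
        yield node
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return  # strings, numbers, booleans and null have no children

    for child in children:
        if isinstance(child, (dict, list)):
            yield from traverse(child)
```

A caller would apply preprocessing and validation to each yielded object, for example `for obj in traverse(document): ...`.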
265 ## Short names
266
267 The "short name" of a fully qualified identifier is the portion of
268 the identifier following the final slash `/` of either the fragment
269 identifier following `#` or the path portion, if there is no fragment.
270 Some examples:
271
272 * the short name of `http://example.com/foo` is `foo`
273 * the short name of `http://example.com/#bar` is `bar`
274 * the short name of `http://example.com/foo/bar` is `bar`
275 * the short name of `http://example.com/foo#bar` is `bar`
276 * the short name of `http://example.com/#foo/bar` is `bar`
277 * the short name of `http://example.com/foo#bar/baz` is `baz`
278
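The rule and the examples above can be reproduced with a short Python sketch. The function name `short_name` is illustrative and not part of the specification; the code relies only on `urllib.parse.urlsplit` from the standard library.

```python
from urllib.parse import urlsplit


def short_name(identifier):
    """Return the short name of a fully qualified identifier."""
    parts = urlsplit(identifier)
    # Use the fragment if there is one, otherwise fall back to the path.
    target = parts.fragment if parts.fragment else parts.path
    # Keep only what follows the final slash (or everything if no slash).
    return target.rsplit("/", 1)[-1]


for uri in (
    "http://example.com/foo",
    "http://example.com/#bar",
    "http://example.com/foo#bar/baz",
):
    print(uri, "->", short_name(uri))
```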
279 ## Inheritance and specialization
280
281 A record definition may inherit from one or more record definitions
282 with the `extends` field. This copies the fields defined in the
283 parent record(s) as the base for the new record. A record definition
284 may `specialize` type declarations of the fields inherited from the
285 base record. For each field inherited from the base record, any
286 instance of the type in `specializeFrom` is replaced with the type in
287 `specializeTo`. The type in `specializeTo` should extend from the
288 type in `specializeFrom`.
289
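The following sketch illustrates how `extends` and `specialize` could combine. It assumes a deliberately simplified layout that is not the metaschema's actual encoding: `records` maps a record name to a dict, `fields` maps field names to types, and a type is either a string or a list of strings (a union). The names `resolve_fields` and `substitute` are hypothetical.

```python
def substitute(field_type, old, new):
    """Replace every instance of type `old` with `new` inside `field_type`."""
    if isinstance(field_type, list):  # a union of types
        return [substitute(t, old, new) for t in field_type]
    return new if field_type == old else field_type


def resolve_fields(records, name):
    """Return the effective {field name: type} mapping for `records[name]`."""
    record = records[name]
    parents = record.get("extends", [])
    if isinstance(parents, str):
        parents = [parents]

    # Copy the fields of the parent record(s) as the base for the new record.
    fields = {}
    for parent in parents:
        fields.update(resolve_fields(records, parent))

    # Specialization applies to the inherited fields only.
    for rule in record.get("specialize", []):
        fields = {
            field: substitute(ftype, rule["specializeFrom"], rule["specializeTo"])
            for field, ftype in fields.items()
        }

    # Finally add the fields declared on the record itself.
    fields.update(record.get("fields", {}))
    return fields
```

For example, a record that extends a base with an `inputs` field of type `["null", "Parameter"]` and specializes `Parameter` to `ToolParameter` ends up with `inputs` typed as `["null", "ToolParameter"]`, while the base record is left unchanged.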
290 A record definition may be `abstract`. This means the record
291 definition is not used for validation on its own, but may be extended
292 by other definitions. If an abstract type appears in a field
293 definition, it is logically replaced with a union of all concrete
294 subtypes of the abstract type. In other words, the field value does
295 not validate as the abstract type, but must validate as some concrete
296 type that inherits from the abstract type.
297
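The replacement of an abstract type by its concrete subtypes can be sketched as below. The layout is a simplification assumed for this example: `records` maps record names to dicts with optional `abstract` and `extends` entries, and the helper name `concrete_subtypes` is hypothetical.

```python
def concrete_subtypes(records, abstract_name):
    """Return the sorted names of all concrete records inheriting from `abstract_name`."""

    def inherits(name):
        parents = records[name].get("extends", [])
        if isinstance(parents, str):
            parents = [parents]
        # Inheritance is transitive: check each parent and its ancestors.
        return any(p == abstract_name or inherits(p) for p in parents)

    return sorted(
        name
        for name, record in records.items()
        if not record.get("abstract", False) and inherits(name)
    )
```

A field declared with the abstract type would then be validated against the union of the returned types; intermediate abstract records are skipped but their concrete descendants are included.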
298 # Document preprocessing
299
300 After processing the explicit context (if any), document preprocessing
301 begins. Starting from the document root, object field values or array
302 items which contain objects or arrays are recursively traversed
303 depth-first. For each visited object, field names, identifier fields, link
304 fields, vocabulary fields, and `$import` and `$include` directives must be
305 processed as described in this section. The order of traversal of child
306 nodes within a parent node is undefined.