`env/lib/python3.9/site-packages/cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/salad.md` @ 0:4f3585e2f14b ("planemo upload commit 60cee0fc7c0cda8592644e1aad72851dec82c959"), author shellac, Mon, 22 Mar 2021 18:12:50 +0000.

# Semantic Annotations for Linked Avro Data (SALAD)

Author:

* Peter Amstutz <peter.amstutz@curoverse.com>, Curoverse

Contributors:

* The developers of Apache Avro
* The developers of JSON-LD
* Nebojša Tijanić <nebojsa.tijanic@sbgenomics.com>, Seven Bridges Genomics

# Abstract

Salad is a schema language for describing structured linked data documents
in JSON or YAML. A Salad schema provides rules for
preprocessing, structural validation, and link checking for documents
described by a Salad schema. Salad builds on JSON-LD and the Apache Avro
data serialization system, and extends Avro with features for rich data
modeling such as inheritance, template specialization, object identifiers,
and object references. Salad was developed to provide a bridge between the
record-oriented data modeling supported by Apache Avro and the Semantic
Web.

# Status of This Document

This document is the product of the [Common Workflow Language working
group](https://groups.google.com/forum/#!forum/common-workflow-language). The
latest version of this document is available in the "schema_salad" repository at

https://github.com/common-workflow-language/schema_salad

The products of the CWL working group (including this document) are made available
under the terms of the Apache License, version 2.0.

<!--ToC-->

# Introduction

The JSON data model is an extremely popular way to represent structured
data. It is attractive because of its relative simplicity and is a
natural fit with the standard types of many programming languages.
However, this simplicity means that basic JSON lacks expressive features
useful for working with complex data structures and document formats, such
as schemas, object references, and namespaces.

JSON-LD is a W3C standard providing a way to describe how to interpret a
JSON document as Linked Data by means of a "context". JSON-LD provides a
powerful solution for representing object references and namespaces in JSON
based on standard web URIs, but is not itself a schema language. Without a
schema providing a well-defined structure, it is difficult to process an
arbitrary JSON-LD document as idiomatic JSON because there are many ways to
express the same data that are logically equivalent but structurally
distinct.

Several schema languages exist for describing and validating JSON data,
such as the Apache Avro data serialization system; however, none understand
linked data. As a result, to take full advantage of JSON-LD to build the
next generation of linked data applications, one must maintain separate
JSON schema, JSON-LD context, RDF schema, and human documentation, despite
significant overlap of content and the obvious need for these documents to
stay synchronized.

Schema Salad is designed to address this gap. It provides a schema
language and processing rules for describing structured JSON content
permitting URI resolution and strict document validation. The schema
language supports linked data through annotations that describe the linked
data interpretation of the content, enables generation of JSON-LD context
and RDF schema, and enables production of RDF triples by applying the JSON-LD
context. The schema language also provides robust support for inline
documentation.

## Introduction to v1.0

This is the second version of the Schema Salad specification. It is
developed concurrently with v1.0 of the Common Workflow Language for use in
specifying the Common Workflow Language; however, Schema Salad is intended to be
useful to a broader audience. Compared to the draft-1 Schema Salad
specification, the following changes have been made:

* Use of [mapSubject and mapPredicate](#Identifier_maps) to transform maps to lists of records.
* Resolution of the [Domain Specific Language for types](#Domain_Specific_Language_for_types).
* Consolidation of the formal [schema into section 5](#Schema).

## References to Other Specifications

**JavaScript Object Notation (JSON)**: http://json.org

**JSON Linked Data (JSON-LD)**: http://json-ld.org

**YAML**: http://yaml.org

**Avro**: https://avro.apache.org/docs/current/spec.html

**Uniform Resource Identifier (URI) Generic Syntax**: https://tools.ietf.org/html/rfc3986

**Resource Description Framework (RDF)**: http://www.w3.org/RDF/

**UTF-8**: https://www.ietf.org/rfc/rfc2279.txt

## Scope

This document describes the syntax, data model, algorithms, and schema
language for working with Salad documents. It is not intended to document
a specific implementation of Salad; however, it may serve as a reference for
the behavior of conforming implementations.

## Terminology

The terminology used to describe Salad documents is defined in the Concepts
section of the specification. The terms defined in the following list are
used in building those definitions and in describing the actions of a
Salad implementation:

**may**: Conforming Salad documents and Salad implementations are permitted but
not required to be interpreted as described.

**must**: Conforming Salad documents and Salad implementations are required
to be interpreted as described; otherwise they are in error.

**error**: A violation of the rules of this specification; results are
undefined. Conforming implementations may detect and report an error and may
recover from it.

**fatal error**: A violation of the rules of this specification; results
are undefined. Conforming implementations must not continue to process the
document and may report an error.

**at user option**: Conforming software may or must (depending on the modal verb in
the sentence) behave as described; if it does, it must provide users a means to
enable or disable the behavior described.

# Document model

## Data concepts

An **object** is a data structure equivalent to the "object" type in JSON,
consisting of an unordered set of name/value pairs (referred to here as
**fields**) where the name is a string and the value is a string, number,
boolean, array, or object.

A **document** is a file containing a serialized object, or an array of
objects.

A **document type** is a class of files that share a common structure and
semantics.

A **document schema** is a formal description of the grammar of a document type.

A **base URI** is a context-dependent URI used to resolve relative references.

An **identifier** is a URI that designates a single document or single
object within a document.

A **vocabulary** is the set of symbolic field names and enumerated symbols defined
by a document schema, where each term maps to an absolute URI.

## Syntax

Conforming Salad documents are serialized and loaded using YAML syntax and
UTF-8 text encoding. Salad documents are written using the JSON-compatible
subset of YAML. Features of YAML such as headers and type tags that are
not found in the standard JSON data model must not be used in conforming
Salad documents. It is a fatal error if the document is not valid YAML.

A Salad document must consist only of either a single root object or an
array of objects.
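
The syntax rules above can be checked after parsing. The following is a minimal Python sketch. It assumes some YAML parser, not shown here, has already loaded the document into native Python values. The helper names `is_json_compatible` and `check_root` are hypothetical. The sketch rejects values that YAML can express but JSON cannot, such as non-string keys, sets, or timestamps. It then enforces the rule on the shape of the root.

```python
from typing import Any

# Scalar types of the JSON data model as they appear after loading
# (bool is a subclass of int, so booleans are covered too).
JSON_SCALARS = (str, int, float, type(None))


def is_json_compatible(value: Any) -> bool:
    """Return True if value stays within the JSON data model."""
    if isinstance(value, dict):
        # JSON object keys must be strings; YAML also allows e.g. integer keys.
        return all(isinstance(k, str) and is_json_compatible(v)
                   for k, v in value.items())
    if isinstance(value, list):
        return all(is_json_compatible(item) for item in value)
    # Anything else (dates, sets, bytes, ...) produced by YAML tags is rejected.
    return isinstance(value, JSON_SCALARS)


def check_root(doc: Any) -> None:
    """Raise ValueError unless doc is a single object or an array of objects."""
    if not is_json_compatible(doc):
        raise ValueError("document uses YAML features outside the JSON data model")
    if isinstance(doc, dict):
        return
    if isinstance(doc, list) and all(isinstance(item, dict) for item in doc):
        return
    raise ValueError("root must be a single object or an array of objects")
```

For example, `check_root([{"class": "Workflow"}])` returns without error. `check_root({1: "x"})` raises `ValueError`, because the key is not a string.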

## Document context

### Implied context

The implicit context consists of the vocabulary defined by the schema and
the base URI. By default, the base URI must be the URI that was used to
load the document. It may be overridden by an explicit context.
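
The following is a minimal Python sketch of how the base URI is chosen and used. The base URI defaults to the URI the document was loaded from. Relative references are then resolved against it using RFC 3986 reference resolution, as implemented by the standard library's `urllib.parse.urljoin`. The helper names are hypothetical. The sketch also assumes that an explicit `$base` override is already an absolute URI.

```python
from typing import Optional
from urllib.parse import urljoin


def effective_base(load_uri: str, explicit_base: Optional[str] = None) -> str:
    """Base URI: the URI used to load the document unless overridden."""
    # Assumption: an explicit $base is taken as-is (treated as absolute).
    return explicit_base if explicit_base is not None else load_uri


def resolve_reference(ref: str, base: str) -> str:
    """Resolve a possibly relative reference against the base URI."""
    return urljoin(base, ref)


base = effective_base("file:///data/workflow.yml")
print(resolve_reference("tools/sort.yml", base))  # file:///data/tools/sort.yml
print(resolve_reference("#main", base))           # file:///data/workflow.yml#main
```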

### Explicit context

If a document consists of a root object, this object may contain the
fields `$base`, `$namespaces`, `$schemas`, and `$graph`:

* `$base`: Must be a string. Sets the base URI for the document used to
  resolve relative references.

* `$namespaces`: Must be an object with strings as values. The keys of
  the object are namespace prefixes used in the document; the values of
  the object are the prefix expansions.

* `$schemas`: Must be an array of strings. This field may list URI
  references to documents in RDF-XML format that will be queried for RDF
  schema data. The subjects and predicates described by the RDF schema
  may provide additional semantic context for the document, and may be
  used for validation of prefixed extension fields found in the document.

Other directives beginning with `$` must be ignored.
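
The directive rules above can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, and `$graph` is deliberately left to separate handling. The prefix expansion simply concatenates the expansion with the part after the colon. That follows the JSON-LD compact-IRI convention and is an assumption of this sketch. The `edam` prefix in the example is illustrative.

```python
from typing import Any, Dict


def read_explicit_context(root: Any) -> Dict[str, Any]:
    """Collect $base, $namespaces and $schemas from a root object."""
    ctx: Dict[str, Any] = {"base": None, "namespaces": {}, "schemas": []}
    if not isinstance(root, dict):
        return ctx  # only a root object can carry an explicit context

    if "$base" in root:
        if not isinstance(root["$base"], str):
            raise ValueError("$base must be a string")
        ctx["base"] = root["$base"]

    if "$namespaces" in root:
        ns = root["$namespaces"]
        if not isinstance(ns, dict) or not all(isinstance(v, str) for v in ns.values()):
            raise ValueError("$namespaces must be an object with string values")
        ctx["namespaces"] = dict(ns)

    if "$schemas" in root:
        schemas = root["$schemas"]
        if not isinstance(schemas, list) or not all(isinstance(s, str) for s in schemas):
            raise ValueError("$schemas must be an array of strings")
        ctx["schemas"] = list(schemas)

    # $graph is handled separately; any other "$"-prefixed key is ignored.
    return ctx


def expand_prefix(name: str, namespaces: Dict[str, str]) -> str:
    """Expand 'prefix:rest' using $namespaces; leave other names unchanged."""
    prefix, sep, rest = name.partition(":")
    if sep and prefix in namespaces:
        return namespaces[prefix] + rest
    return name


ctx = read_explicit_context({"$namespaces": {"edam": "http://edamontology.org/"},
                             "$unknown": 1})
print(expand_prefix("edam:format_1930", ctx["namespaces"]))
# http://edamontology.org/format_1930
```

A full URI such as `http://example.com/x` passes through `expand_prefix` unchanged, because `http` is not a declared prefix.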

## Document graph

If a document consists of a single root object, this object may contain the
field `$graph`. This field must be an array of objects. If present, this
field holds the primary content of the document. A document that consists
of an array of objects at the root is an implicit graph.
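
The graph rules above can be sketched in Python. The function name `graph_objects` is hypothetical. The sketch treats a root array as an implicit graph and unwraps an explicit `$graph`. It also assumes, as its own convention, that a root object without `$graph` is itself the single content node.

```python
from typing import Any, Dict, List


def graph_objects(doc: Any) -> List[Dict[str, Any]]:
    """Return the objects making up the document's primary content."""
    if isinstance(doc, list):
        # An array of objects at the root is an implicit graph.
        return doc
    if isinstance(doc, dict) and "$graph" in doc:
        graph = doc["$graph"]
        if not isinstance(graph, list) or not all(isinstance(o, dict) for o in graph):
            raise ValueError("$graph must be an array of objects")
        return graph
    # Assumption: a root object without $graph is itself the only content node.
    return [doc]
```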

## Document metadata

If a document consists of a single root object, metadata about the
document, such as authorship, may be declared in the root object.

## Document schema

Document preprocessing, link validation, and schema validation require a
document schema. A schema may consist of:

* At least one record definition object which defines valid fields that
  make up a record type. Record field definitions include the valid types
  that may be assigned to each field and annotations to indicate fields
  that represent identifiers and links, described below in "Semantic
  Annotations".

* Any number of enumerated type objects, each of which defines a finite set
  of symbols that are valid values of the type.

* Any number of documentation objects which allow in-line documentation of the schema.

The schema for defining a Salad schema (the metaschema) is described in
detail in "Schema validation".

### Record field annotations

In a document schema, record field definitions may include the field
`jsonldPredicate`, which may be either a string or an object. Implementations
must preprocess fields according to the following rules:

* If the value of `jsonldPredicate` is `@id`, the field is an identifier
  field.

* If the value of `jsonldPredicate` is an object that contains the field
  `_type` with the value `@id`, the field is a link field.

* If the value of `jsonldPredicate` is an object that contains the field
  `_type` with the value `@vocab`, the field is a vocabulary field, which
  is a subtype of link field.
230 In a document schema, record field definitions may include the field | |
231 `jsonldPredicate`, which may be either a string or object. Implementations | |
232 must use the following document preprocessing of fields by the following | |
233 rules: | |
234 | |
235 * If the value of `jsonldPredicate` is `@id`, the field is an identifier | |
236 field. | |

# Document preprocessing

After processing the explicit context (if any), document preprocessing
begins. Starting from the document root, object field values or array
items that contain objects or arrays are recursively traversed
depth-first. For each visited object, field names, identifier fields, link
fields, vocabulary fields, and `$import` and `$include` directives must be
processed as described in this section. The order of traversal of child
nodes within a parent node is undefined.