comparison env/lib/python3.9/site-packages/cwltool/schemas/v1.0/concepts.md @ 0:4f3585e2f14b draft default tip

"planemo upload commit 60cee0fc7c0cda8592644e1aad72851dec82c959"
author shellac
date Mon, 22 Mar 2021 18:12:50 +0000

## References to other specifications

**Javascript Object Notation (JSON)**: http://json.org

**JSON Linked Data (JSON-LD)**: http://json-ld.org

**YAML**: http://yaml.org

**Avro**: https://avro.apache.org/docs/1.8.1/spec.html

**Uniform Resource Identifier (URI) Generic Syntax**: https://tools.ietf.org/html/rfc3986

**Internationalized Resource Identifiers (IRIs)**:
https://tools.ietf.org/html/rfc3987

**Portable Operating System Interface (POSIX.1-2008)**: http://pubs.opengroup.org/onlinepubs/9699919799/

**Resource Description Framework (RDF)**: http://www.w3.org/RDF/

## Scope

This document describes CWL syntax, execution, and object model. It
is not intended to document a specific CWL implementation; however, it may
serve as a reference for the behavior of conforming implementations.

## Terminology

The terminology used to describe CWL documents is defined in the
Concepts section of the specification. The terms defined in the
following list are used in building those definitions and in describing the
actions of a CWL implementation:

**may**: Conforming CWL documents and CWL implementations are permitted but
not required to behave as described.

**must**: Conforming CWL documents and CWL implementations are required to behave
as described; otherwise they are in error.

**error**: A violation of the rules of this specification; results are
undefined. Conforming implementations may detect and report an error and may
recover from it.

**fatal error**: A violation of the rules of this specification; results are
undefined. Conforming implementations must not continue to execute the current
process and may report an error.

**at user option**: Conforming software may or must (depending on the modal verb in
the sentence) behave as described; if it does, it must provide users a means to
enable or disable the behavior described.

**deprecated**: Conforming software may implement a behavior for backwards
compatibility. Portable CWL documents should not rely on deprecated behavior.
Behavior marked as deprecated may be removed entirely from future revisions of
the CWL specification.

# Data model

## Data concepts

An **object** is a data structure equivalent to the "object" type in JSON,
consisting of an unordered set of name/value pairs (referred to here as
**fields**) where the name is a string and the value is a string, number,
boolean, array, or object.

A **document** is a file containing a serialized object, or an array of objects.

A **process** is a basic unit of computation which accepts input data,
performs some computation, and produces output data. Examples include
CommandLineTools, Workflows, and ExpressionTools.

An **input object** is an object describing the inputs to an invocation of
a process.

An **output object** is an object describing the output resulting from an
invocation of a process.

An **input schema** describes the valid format (required fields, data types)
for an input object.

An **output schema** describes the valid format for an output object.

**Metadata** is information about workflows, tools, or input items.

## Syntax

CWL documents must consist of an object or array of objects represented using
JSON or YAML syntax. Upon loading, a CWL implementation must apply the
preprocessing steps described in the
[Semantic Annotations for Linked Avro Data (SALAD) Specification](SchemaSalad.html).
An implementation may formally validate the structure of a CWL document using
SALAD schemas located at
https://github.com/common-workflow-language/common-workflow-language/tree/master/v1.0

## Identifiers

If an object contains an `id` field, that is used to uniquely identify the
object in that document. The value of the `id` field must be unique over the
entire document. Identifiers may be resolved relative to the document
base and/or other identifiers, following the rules described in the
[Schema Salad specification](SchemaSalad.html#Identifier_resolution).

An implementation may choose to only honor references to object types for
which the `id` field is explicitly listed in this specification.

## Document preprocessing

An implementation must resolve [$import](SchemaSalad.html#Import) and
[$include](SchemaSalad.html#Include) directives as described in the
[Schema Salad specification](SchemaSalad.html).

Another transformation defined in Schema Salad is simplification of data type definitions.
A type written as `<T>?` should be transformed to `[<T>, "null"]`.
A type written as `<T>[]` should be transformed to `{"type": "array", "items": <T>}`.
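
As an illustration, the two shorthand rules can be applied with a small recursive function. This is a minimal Python sketch and not the Schema Salad implementation. The function name `expand_type_shorthand` is hypothetical, and the function only handles type names given as plain strings. Unions, records, and other type objects are returned unchanged.

```python
from typing import Any


def expand_type_shorthand(t: Any) -> Any:
    """Expand `<T>?` and `<T>[]` shorthands (hypothetical helper, strings only)."""
    if not isinstance(t, str):
        return t  # unions, records, etc. are left unchanged in this sketch
    if t.endswith("?"):
        # "<T>?" becomes a union of <T> and "null", in the order the text gives
        return [expand_type_shorthand(t[:-1]), "null"]
    if t.endswith("[]"):
        # "<T>[]" becomes an array schema whose items are <T>
        return {"type": "array", "items": expand_type_shorthand(t[:-2])}
    return t


print(expand_type_shorthand("File?"))     # ['File', 'null']
print(expand_type_shorthand("string[]"))  # {'type': 'array', 'items': 'string'}
print(expand_type_shorthand("int[]?"))    # [{'type': 'array', 'items': 'int'}, 'null']
```

Because the suffixes are peeled off from the right, `int[]?` reads as "optional array of int", while `int?[]` reads as "array of optional int".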

## Extensions and metadata

Input metadata (for example, a lab sample identifier) may be represented within
a tool or workflow using input parameters which are explicitly propagated to
output. Future versions of this specification may define additional facilities
for working with input/output metadata.

Implementation extensions not required for correct execution (for example,
fields related to GUI presentation) and metadata about the tool or workflow
itself (for example, authorship for use in citations) may be provided as
additional fields on any object. Such extension fields must use a namespace
prefix listed in the `$namespaces` section of the document as described in the
[Schema Salad specification](SchemaSalad.html#Explicit_context).

Implementation extensions which modify execution semantics must be [listed in
the `requirements` field](#Requirements_and_hints).

# Execution model

## Execution concepts

A **parameter** is a named symbolic input or output of a process, with an
associated datatype or schema. During execution, values are assigned to
parameters to make the input object or output object used for concrete
process invocation.

A **CommandLineTool** is a process characterized by the execution of a
standalone, non-interactive program which is invoked on some input,
produces output, and then terminates.

A **workflow** is a process characterized by multiple subprocess steps,
where step outputs are connected to the inputs of downstream steps to
form a directed acyclic graph, and independent steps may run concurrently.

A **runtime environment** is the actual hardware and software environment when
executing a command line tool. It includes, but is not limited to, the
hardware architecture, hardware resources, operating system, software runtime
(if applicable, such as the specific Python interpreter or the specific Java
virtual machine), libraries, modules, packages, utilities, and data files
required to run the tool.

A **workflow platform** is a specific hardware and software implementation
capable of interpreting CWL documents and executing the processes specified by
the document. The responsibilities of the workflow platform may include
scheduling process invocation, setting up the necessary runtime environment,
making input data available, invoking the tool process, and collecting output.

A workflow platform may choose to only implement the Command Line Tool
Description part of the CWL specification.

It is intended that the workflow platform has broad leeway outside of this
specification to optimize use of computing resources and enforce policies
not covered by this specification. Some areas that are currently out of
scope for the CWL specification but may be handled by a specific workflow
platform include:

* Data security and permissions.
* Scheduling tool invocations on remote cluster or cloud compute nodes.
* Using virtual machines or operating system containers to manage the runtime
(except as described in [DockerRequirement](CommandLineTool.html#DockerRequirement)).
* Using remote or distributed file systems to manage input and output files.
* Transforming file paths.
* Determining if a process has previously been executed, and if so skipping it
and reusing previous results.
* Pausing, resuming or checkpointing processes or workflows.

Conforming CWL processes must not assume anything about the runtime
environment or workflow platform unless explicitly declared through the use
of [process requirements](#Requirements_and_hints).

## Generic execution process

The generic execution sequence of a CWL process (including workflows and
command line tools) is as follows:

1. Load, process and validate a CWL document, yielding a process object.
2. Load the input object.
3. Validate the input object against the `inputs` schema for the process.
4. Validate that process requirements are met.
5. Perform any further setup required by the specific process type.
6. Execute the process.
7. Capture results of process execution into the output object.
8. Validate the output object against the `outputs` schema for the process.
9. Report the output object to the process caller.

## Requirements and hints

A **process requirement** modifies the semantics or runtime
environment of a process. If an implementation cannot satisfy all
requirements, or a requirement is listed which is not recognized by the
implementation, it is a fatal error and the implementation must not attempt
to run the process, unless overridden at user option.

A **hint** is similar to a requirement; however, it is not an error if an
implementation cannot satisfy all hints. The implementation may report a
warning if a hint cannot be satisfied.

Requirements are inherited. A requirement specified in a Workflow applies
to all workflow steps; a requirement specified on a workflow step will
apply to the process implementation of that step and any of its substeps.

If the same process requirement appears at different levels of the
workflow, the most specific instance of the requirement is used. That is,
an entry in `requirements` on a process implementation such as
CommandLineTool will take precedence over an entry in `requirements`
specified in a workflow step, and an entry in `requirements` on a workflow
step takes precedence over the workflow. Entries in `hints` are resolved
the same way.

Requirements override hints. If a process implementation provides a
process requirement in `hints` which is also provided in `requirements` by
an enclosing workflow or workflow step, the enclosing `requirements` takes
precedence.
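
The inheritance and precedence rules above can be sketched as a merge over the chain of enclosing levels, from the outermost workflow down to the process implementation. This is a simplified Python illustration under several assumptions:

* Each level is a dict with optional `requirements` and `hints` lists.
* Entries are identified by their `class` field.
* The function name `effective_requirements` is hypothetical.

The sketch does not check whether a requirement is recognized or can be satisfied.

```python
from typing import Dict, List, Tuple

Entry = Dict[str, object]
Level = Dict[str, List[Entry]]


def effective_requirements(levels: List[Level]) -> Tuple[Dict[str, Entry], Dict[str, Entry]]:
    """Return (requirements, hints) keyed by class (hypothetical helper).

    `levels` is ordered from outermost (workflow) to innermost (process
    implementation), so an entry at a later level replaces an earlier one.
    """
    requirements: Dict[str, Entry] = {}
    hints: Dict[str, Entry] = {}
    for level in levels:
        for entry in level.get("requirements", []):
            requirements[str(entry["class"])] = entry  # most specific wins
        for entry in level.get("hints", []):
            hints[str(entry["class"])] = entry  # hints resolve the same way
    # Requirements override hints: drop any hint whose class is required at any level.
    hints = {cls: h for cls, h in hints.items() if cls not in requirements}
    return requirements, hints


workflow = {"requirements": [{"class": "DockerRequirement", "dockerPull": "debian:10"}]}
step = {"hints": [{"class": "ResourceRequirement", "coresMin": 2}]}
tool = {
    "requirements": [{"class": "DockerRequirement", "dockerPull": "python:3.9"}],
    "hints": [{"class": "ResourceRequirement", "coresMin": 1}],
}
reqs, hints = effective_requirements([workflow, step, tool])
print(reqs["DockerRequirement"]["dockerPull"])  # python:3.9
print(hints["ResourceRequirement"]["coresMin"])  # 1
```

Replacing whole entries, rather than merging their fields, mirrors the wording "the most specific instance of the requirement is used".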

## Parameter references

Parameter references are denoted by the syntax `$(...)` and may be used in any
field permitting the pseudo-type `Expression`, as specified by this document.
Conforming implementations must support parameter references. Parameter
references use the following subset of
[Javascript/ECMAScript 5.1](http://www.ecma-international.org/ecma-262/5.1/)
syntax, but they are designed to not require a Javascript engine for evaluation.

In the following BNF grammar, character classes and grammar rules are denoted
in '{}', '-' denotes exclusion from a character class, '(())' denotes grouping,
'|' denotes alternates, trailing '*' denotes zero or more repeats, '+' denotes
one or more repeats, '/' escapes these special characters, and all other
characters are literal values.

<p>
<table class="table">
<tr><td>symbol:: </td><td>{Unicode alphanumeric}+</td></tr>
<tr><td>singleq:: </td><td>[' (( {character - '} | \' ))* ']</td></tr>
<tr><td>doubleq:: </td><td>[" (( {character - "} | \" ))* "]</td></tr>
<tr><td>index:: </td><td>[ {decimal digit}+ ]</td></tr>
<tr><td>segment:: </td><td>. {symbol} | {singleq} | {doubleq} | {index}</td></tr>
<tr><td>parameter reference::</td><td>$( {symbol} {segment}*)</td></tr>
</table>
</p>

Use the following algorithm to resolve a parameter reference:

1. Match the leading symbol as the key.
2. Look up the key in the parameter context (described below) to get the current value.
It is an error if the key is not found in the parameter context.
3. If there are no subsequent segments, terminate and return the current value.
4. Else, match the next segment.
5. Extract the symbol, string, or index from the segment as the key.
6. Look up the key in the current value and assign it as the new current value. If
the key is a symbol or string, the current value must be an object.
If the key is an index, the current value must be an array or string.
It is an error if the key does not match the required type, or the key is not found or out
of range.
7. Repeat steps 3-6.
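
The grammar and resolution algorithm above can be sketched without a Javascript engine, as a hand-written tokenizer followed by a walk over the parameter context. This Python sketch has the following properties:

* The names `parse_reference` and `resolve_reference` are hypothetical.
* Symbols follow the grammar literally: `str.isalnum` stands in for "Unicode alphanumeric", so names containing `_` or `-` are rejected.
* Errors are reported as ordinary Python exceptions.

```python
from typing import Any, List, Union

Key = Union[str, int]


def parse_reference(ref: str) -> List[Key]:
    """Split '$(symbol segment*)' into [symbol, key, ...] per the grammar."""
    if not (ref.startswith("$(") and ref.endswith(")")):
        raise ValueError("not a parameter reference: %r" % ref)
    body, pos = ref[2:-1], 0

    def read_symbol() -> str:
        nonlocal pos
        start = pos
        while pos < len(body) and body[pos].isalnum():  # {Unicode alphanumeric}+
            pos += 1
        if pos == start:
            raise ValueError("expected a symbol at offset %d" % start)
        return body[start:pos]

    keys: List[Key] = [read_symbol()]  # step 1: the leading symbol
    while pos < len(body):
        if body[pos] == ".":  # . {symbol}
            pos += 1
            keys.append(read_symbol())
        elif body.startswith("['", pos) or body.startswith('["', pos):  # singleq / doubleq
            quote, pos, chars = body[pos + 1], pos + 2, []
            while True:
                if pos >= len(body):
                    raise ValueError("unterminated quoted key")
                if body[pos] == "\\" and body[pos + 1:pos + 2] == quote:
                    chars.append(quote)  # \' or \" is an escaped quote
                    pos += 2
                elif body[pos] == quote:
                    pos += 1
                    break
                else:
                    chars.append(body[pos])
                    pos += 1
            if body[pos:pos + 1] != "]":
                raise ValueError("expected ']' at offset %d" % pos)
            pos += 1
            keys.append("".join(chars))
        elif body[pos] == "[":  # [ {decimal digit}+ ]
            end = body.find("]", pos)
            digits = body[pos + 1:end] if end != -1 else ""
            if not digits.isdecimal():
                raise ValueError("bad index at offset %d" % pos)
            keys.append(int(digits))
            pos = end + 1
        else:
            raise ValueError("unexpected %r at offset %d" % (body[pos], pos))
    return keys


def resolve_reference(ref: str, context: dict) -> Any:
    """Walk the parameter context following steps 1-7 above."""
    keys = parse_reference(ref)
    if keys[0] not in context:
        raise KeyError("%r is not in the parameter context" % keys[0])
    value = context[keys[0]]
    for key in keys[1:]:
        if isinstance(key, str):  # symbol or string key: value must be an object
            if not isinstance(value, dict) or key not in value:
                raise KeyError("cannot look up %r" % key)
        elif not isinstance(value, (list, str)) or key >= len(value):
            raise IndexError("cannot index with %d" % key)  # index: array or string
        value = value[key]
    return value


context = {
    "inputs": {"reads": {"basename": "a.fq"}, "names": ["x", "y"], "odd key": 3},
    "self": None,
    "runtime": {"cores": "4"},
}
print(resolve_reference("$(inputs.reads.basename)", context))  # a.fq
print(resolve_reference("$(inputs.names[1])", context))        # y
print(resolve_reference("$(inputs['odd key'])", context))      # 3
```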

The root namespace is the parameter context. The following parameters must
be provided:

* `inputs`: The input object to the current Process.
* `self`: A context-specific value. The contextual values for 'self' are
documented for specific fields elsewhere in this specification. If
a contextual value of 'self' is not documented for a field, it
must be 'null'.
* `runtime`: An object containing configuration details, specific to the
process type. An implementation may provide
opaque strings for any or all fields of `runtime`. These must be
filled in by the platform after processing the Tool but before actual
execution. Parameter references and expressions may only use the
literal string value of the field and must not perform computation on
the contents, except where noted otherwise.

If the value of a field has no leading or trailing non-whitespace
characters around a parameter reference, the effective value of the field
becomes the value of the referenced parameter, preserving the return type.

If the value of a field has non-whitespace leading or trailing characters
around a parameter reference, it is subject to string interpolation. The
effective value of the field is a string containing the leading characters,
followed by the string value of the parameter reference, followed by the
trailing characters. The string value of the parameter reference is its
textual JSON representation with the following rules:

* Leading and trailing quotes are stripped from strings
* Object entries are sorted by key

Multiple parameter references may appear in a single field. This case
must be treated as a string interpolation. After interpolating the first
parameter reference, interpolation must be recursively applied to the
trailing characters to yield the final string value.
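
The effective-value rules above can be sketched in Python as follows. The sketch makes several assumptions:

* A regular expression approximates the parameter-reference grammar.
* A lone reference keeps its type.
* Anything else is interpolated left to right, which gives the same result as recursively interpolating the trailing characters.
* The names `interpolate`, `lookup`, and `to_text` are hypothetical.
* `json.dumps` with its default separators is an assumed stand-in for "textual JSON representation", since the text does not fix whitespace.
* String values are inserted as their plain value, which is one reading of "quotes are stripped" when escapes are involved.

```python
import json
import re
from typing import Any

# Regular-expression approximation of the parameter-reference grammar (assumption).
SYMBOL = r"[^\W_]+"  # Unicode letters and digits only
SEGMENT = r"(?:\.%s|\['(?:\\'|[^'])*'\]|\[\"(?:\\\"|[^\"])*\"\]|\[\d+\])" % SYMBOL
REFERENCE = re.compile(r"\$\((%s)((?:%s)*)\)" % (SYMBOL, SEGMENT))
SEGMENT_RE = re.compile(SEGMENT)


def lookup(match: Any, context: dict) -> Any:
    """Resolve one matched reference; lookup errors surface as Python exceptions."""
    value = context[match.group(1)]
    for seg in SEGMENT_RE.findall(match.group(2)):
        if seg.startswith("."):
            value = value[seg[1:]]
        elif seg[1] in "'\"":
            quote = seg[1]
            value = value[seg[2:-2].replace("\\" + quote, quote)]
        else:
            value = value[int(seg[1:-1])]
    return value


def to_text(value: Any) -> str:
    if isinstance(value, str):
        return value  # assumption: quote stripping yields the plain string value
    return json.dumps(value, sort_keys=True)  # object entries sorted by key


def interpolate(field: str, context: dict) -> Any:
    first = REFERENCE.search(field)
    if first is None:
        return field
    if not field[:first.start()].strip() and not field[first.end():].strip():
        return lookup(first, context)  # lone reference: keep the value's type
    parts, pos = [], 0
    for m in REFERENCE.finditer(field):  # left to right, same as recursing on the tail
        parts.append(field[pos:m.start()])
        parts.append(to_text(lookup(m, context)))
        pos = m.end()
    parts.append(field[pos:])
    return "".join(parts)


context = {"inputs": {"sample": "s1", "n": 3, "opts": {"b": 2, "a": 1}, "flag": True}}
print(interpolate("$(inputs.n)", context))                       # 3 (an int, not "3")
print(interpolate("$(inputs.sample)_$(inputs.n).bam", context))  # s1_3.bam
print(interpolate("opts=$(inputs.opts)", context))               # opts={"a": 1, "b": 2}
```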

## Expressions

An expression is a fragment of [Javascript/ECMAScript
5.1](http://www.ecma-international.org/ecma-262/5.1/) code evaluated by the
workflow platform to affect the inputs, outputs, or
behavior of a process. In the generic execution sequence, expressions may
be evaluated during step 5 (process setup), step 6 (execute process),
and/or step 7 (capture output). Expressions are distinct from regular
processes in that they are intended to modify the behavior of the workflow
itself rather than perform the primary work of the workflow.

To declare the use of expressions, the document must include the process
requirement `InlineJavascriptRequirement`. Expressions may be used in any
field permitting the pseudo-type `Expression`, as specified by this
document.

Expressions are denoted by the syntax `$(...)` or `${...}`. A code
fragment wrapped in the `$(...)` syntax must be evaluated as an
[ECMAScript expression](http://www.ecma-international.org/ecma-262/5.1/#sec-11). A
code fragment wrapped in the `${...}` syntax must be evaluated as an
[ECMAScript function body](http://www.ecma-international.org/ecma-262/5.1/#sec-13)
for an anonymous, zero-argument function. Expressions must return a valid JSON
data type: one of null, string, number, boolean, array, object. Other return
values must result in a `permanentFailure`. Implementations must permit any
syntactically valid Javascript. When scanning for expressions, they must
account for nested parentheses or braces and for strings that may contain
parentheses or braces.
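
Scanning for expressions is a balanced-delimiter problem. The scanner keeps a stack of expected closing brackets and ignores brackets that appear inside string literals. This is a simplified Python sketch:

* The names `find_expressions` and `_closing_index` are hypothetical.
* Only single- and double-quoted strings with backslash escapes are recognized.
* Javascript comments and regular-expression literals are not handled, so a real scanner would need more cases.

```python
from typing import List, Tuple

CLOSERS = {"(": ")", "{": "}", "[": "]"}


def _closing_index(text: str, open_pos: int) -> int:
    """Return the index just past the bracket that closes text[open_pos]."""
    expected = [CLOSERS[text[open_pos]]]  # stack of closers still owed
    quote = None  # active string delimiter, if inside a string literal
    i = open_pos + 1
    while i < len(text):
        c = text[i]
        if quote is not None:
            if c == "\\":
                i += 1  # skip the escaped character
            elif c == quote:
                quote = None
        elif c in "'\"":
            quote = c
        elif c in CLOSERS:
            expected.append(CLOSERS[c])
        elif c in ")}]":
            if c != expected.pop():
                raise ValueError("mismatched %r at offset %d" % (c, i))
            if not expected:
                return i + 1
        i += 1
    raise ValueError("unterminated expression starting at offset %d" % (open_pos - 1))


def find_expressions(text: str) -> List[Tuple[int, int]]:
    """Return (start, end) spans of $(...) and ${...} blocks in text."""
    spans = []
    i = 0
    while i < len(text) - 1:
        if text[i] == "$" and text[i + 1] in "({":
            end = _closing_index(text, i + 1)
            spans.append((i, end))
            i = end
        else:
            i += 1
    return spans


field = 'out_$(inputs.tag + ")").txt ${ return {"k": "}"}; }'
for start, end in find_expressions(field):
    print(field[start:end])
# $(inputs.tag + ")")
# ${ return {"k": "}"}; }
```

The `)` inside `")"` and the `}` inside `"}"` do not end their expressions, because brackets are ignored while a string literal is open.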

The runtime must include any code defined in the ["expressionLib" field of
InlineJavascriptRequirement](#InlineJavascriptRequirement) prior to
executing the actual expression.

Before executing the expression, the runtime must initialize as global
variables the fields of the parameter context described above.

The effective value of the field after expression evaluation follows the
same rules as parameter references discussed above. Multiple expressions
may appear in a single field.

Expressions must be evaluated in an isolated context (a "sandbox") which
permits no side effects to leak outside the context. Expressions also must
be evaluated in [Javascript strict mode](http://www.ecma-international.org/ecma-262/5.1/#sec-4.2.2).

The order in which expressions are evaluated is undefined except where
otherwise noted in this document.

An implementation may choose to implement parameter references by
evaluating them as Javascript expressions. The results of evaluating
parameter references must be identical whether implemented by Javascript
evaluation or some other means.

Implementations may apply other limits, such as process isolation, timeouts,
and operating system containers/jails to minimize the security risks associated
with running untrusted code embedded in a CWL document.

Exceptions thrown from an expression must result in a `permanentFailure` of the
process.

## Executing CWL documents as scripts

By convention, a CWL document may begin with `#!/usr/bin/env cwl-runner`
and be marked as executable (the POSIX "+x" permission bits) to enable it
to be executed directly. A workflow platform may support this mode of
operation; if so, it must provide `cwl-runner` as an alias for the
platform's CWL implementation.

A CWL input object document may similarly begin with
`#!/usr/bin/env cwl-runner` and be marked as executable. In this case, the input object
must include the field `cwl:tool` supplying an IRI to the default CWL
document that should be executed using the fields of the input object as
input parameters.

## Discovering CWL documents on a local filesystem

To discover CWL documents, look in the following locations:

`/usr/share/commonwl/`

`/usr/local/share/commonwl/`

`$XDG_DATA_HOME/commonwl/` (usually `$HOME/.local/share/commonwl`)

`$XDG_DATA_HOME` is from the [XDG Base Directory
Specification](http://standards.freedesktop.org/basedir-spec/basedir-spec-0.6.html).
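
The lookup order above can be sketched in Python as follows:

* `cwl_search_dirs` and `find_cwl_documents` are hypothetical names.
* The `$HOME/.local/share` fallback comes from the XDG Base Directory Specification, which uses it when `XDG_DATA_HOME` is unset or empty.
* Matching files by a `.cwl` suffix, and searching subdirectories recursively, are assumptions of this sketch. The text above does not specify either.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def cwl_search_dirs(env: Optional[Mapping[str, str]] = None) -> List[Path]:
    """Return the three locations listed above, in the order given."""
    env = os.environ if env is None else env
    # XDG Base Directory Specification: when XDG_DATA_HOME is unset or empty,
    # it defaults to $HOME/.local/share.
    data_home = env.get("XDG_DATA_HOME") or os.path.join(env.get("HOME", ""), ".local", "share")
    return [
        Path("/usr/share/commonwl"),
        Path("/usr/local/share/commonwl"),
        Path(data_home) / "commonwl",
    ]


def find_cwl_documents(env: Optional[Mapping[str, str]] = None) -> List[Path]:
    """Hypothetical helper: collect *.cwl files under each existing location."""
    found: List[Path] = []
    for directory in cwl_search_dirs(env):
        if directory.is_dir():
            found.extend(sorted(p for p in directory.rglob("*.cwl") if p.is_file()))
    return found


print(cwl_search_dirs({"HOME": "/home/alice"})[-1])  # /home/alice/.local/share/commonwl
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the lookup easy to exercise with different environments.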