comparison env/lib/python3.9/site-packages/cwltool/schemas/v1.1/concepts.md @ 0:4f3585e2f14b draft default tip

"planemo upload commit 60cee0fc7c0cda8592644e1aad72851dec82c959"
author shellac
date Mon, 22 Mar 2021 18:12:50 +0000
1 ## References to other specifications
2
3 **Javascript Object Notation (JSON)**: http://json.org
4
5 **JSON Linked Data (JSON-LD)**: http://json-ld.org
6
7 **YAML**: http://yaml.org
8
9 **Avro**: https://avro.apache.org/docs/1.8.1/spec.html
10
11 **Uniform Resource Identifier (URI) Generic Syntax**: https://tools.ietf.org/html/rfc3986
12
13 **Internationalized Resource Identifiers (IRIs)**:
14 https://tools.ietf.org/html/rfc3987
15
16 **Portable Operating System Interface (POSIX.1-2008)**: http://pubs.opengroup.org/onlinepubs/9699919799/
17
18 **Resource Description Framework (RDF)**: http://www.w3.org/RDF/
19
20 ## Scope
21
22 This document describes CWL syntax, execution, and object model. It
23 is not intended to document a specific CWL implementation; however, it may
24 serve as a reference for the behavior of conforming implementations.
25
26 ## Terminology
27
28 The terminology used to describe CWL documents is defined in the
29 Concepts section of the specification. The terms defined in the
30 following list are used in building those definitions and in describing the
31 actions of a CWL implementation:
32
33 **may**: Conforming CWL documents and CWL implementations are permitted but
34 not required to behave as described.
35
36 **must**: Conforming CWL documents and CWL implementations are required to behave
37 as described; otherwise they are in error.
38
39 **error**: A violation of the rules of this specification; results are
40 undefined. Conforming implementations may detect and report an error and may
41 recover from it.
42
43 **fatal error**: A violation of the rules of this specification; results are
44 undefined. Conforming implementations must not continue to execute the current
45 process and may report an error.
46
47 **at user option**: Conforming software may or must (depending on the modal verb in
48 the sentence) behave as described; if it does, it must provide users a means to
49 enable or disable the behavior described.
50
51 **deprecated**: Conforming software may implement a behavior for backwards
52 compatibility. Portable CWL documents should not rely on deprecated behavior.
53 Behavior marked as deprecated may be removed entirely from future revisions of
54 the CWL specification.
55
56 # Data model
57
58 ## Data concepts
59
60 An **object** is a data structure equivalent to the "object" type in JSON,
61 consisting of an unordered set of name/value pairs (referred to here as
62 **fields**) and where the name is a string and the value is a string, number,
63 boolean, array, or object.
64
65 A **document** is a file containing a serialized object, or an array of objects.
66
67 A **process** is a basic unit of computation which accepts input data,
68 performs some computation, and produces output data. Examples include
69 CommandLineTools, Workflows, and ExpressionTools.
70
71 An **input object** is an object describing the inputs to an invocation of
72 a process.
73
74 An **output object** is an object describing the output resulting from an
75 invocation of a process.
76
77 An **input schema** describes the valid format (required fields, data types)
78 for an input object.
79
80 An **output schema** describes the valid format for an output object.
81
82 **Metadata** is information about workflows, tools, or input items.
83
84 ## Syntax
85
86 CWL documents must consist of an object or array of objects represented using
87 JSON or YAML syntax. Upon loading, a CWL implementation must apply the
88 preprocessing steps described in the
89 [Semantic Annotations for Linked Avro Data (SALAD) Specification](SchemaSalad.html).
90 An implementation may formally validate the structure of a CWL document using
91 SALAD schemas located at
92 https://github.com/common-workflow-language/common-workflow-language/tree/master/v1.1
93
94 ### map
95
96 Note: This section is non-normative.
97 > type: array<ComplexType> |
98 > map<`key_field`, ComplexType>
99
100 The above syntax in the CWL specifications means there are two or more ways to write the given value.
101
102 Option one is an array and is the most verbose option.
103
104 Option one generic example:
```
some_cwl_field:
  - key_field: a_complex_type1
    field2: foo
    field3: bar
  - key_field: a_complex_type2
    field2: foo2
    field3: bar2
  - key_field: a_complex_type3
```
115
116 Option one specific example using [Workflow](Workflow.html#Workflow).[inputs](Workflow.html#WorkflowInputParameter):
117 > array<InputParameter> |
118 > map<`id`, `type` | InputParameter>
119
120
```
inputs:
  - id: workflow_input01
    type: string
  - id: workflow_input02
    type: File
    format: http://edamontology.org/format_2572
```
129
130 Option two is enabled by the `map<…>` syntax. Instead of an array of entries we
131 use a mapping, where one field of the `ComplexType` (here named `key_field`)
132 becomes the key in the map, and its value is the rest of the `ComplexType`
133 without the key field. If all of the other fields of the `ComplexType` are
134 optional and unneeded, then we can indicate this with an empty mapping as the
135 value: `a_complex_type3: {}`
136
137 Option two generic example:
```
some_cwl_field:
  a_complex_type1:  # this was the "key_field" from above
    field2: foo
    field3: bar
  a_complex_type2:
    field2: foo2
    field3: bar2
  a_complex_type3: {}  # we accept the default values for "field2" and "field3"
```
148
149 Option two specific example using [Workflow](Workflow.html#Workflow).[inputs](Workflow.html#WorkflowInputParameter):
150 > array&lt;InputParameter&gt; |
151 > map&lt;`id`, `type` | InputParameter&gt;
152
153
```
inputs:
  workflow_input01:
    type: string
  workflow_input02:
    type: File
    format: http://edamontology.org/format_2572
```
162
163 Option two specific example using [SoftwareRequirement](#SoftwareRequirement).[packages](#SoftwarePackage):
164 > array&lt;SoftwarePackage&gt; |
165 > map&lt;`package`, `specs` | SoftwarePackage&gt;
166
167
```
hints:
  SoftwareRequirement:
    packages:
      sourmash:
        specs: [ https://doi.org/10.21105/joss.00027 ]
      screed:
        version: [ "1.0" ]
      python: {}
```
178
179 Sometimes we have a third and even more compact option denoted like this:
180 > type: array&lt;ComplexType&gt; |
181 > map&lt;`key_field`, `field2` | ComplexType&gt;
182
183 For this example, if we only need the `key_field` and `field2` when specifying
184 our `ComplexType`s (because the other fields are optional and we are fine with
185 their default values) then we can abbreviate.
186
187 Option three generic example:
```
some_cwl_field:
  a_complex_type1: foo   # we accept the default value for field3
  a_complex_type2: foo2  # we accept the default value for field3
  a_complex_type3: {}    # we accept the default values for "field2" and "field3"
```
194
195 Option three specific example using [Workflow](Workflow.html#Workflow).[inputs](Workflow.html#WorkflowInputParameter):
196 > array&lt;InputParameter&gt; |
197 > map&lt;`id`, `type` | InputParameter&gt;
198
199
```
inputs:
  workflow_input01: string
  workflow_input02: File  # we accept the default of no File format
```
205
206 Option three specific example using [SoftwareRequirement](#SoftwareRequirement).[packages](#SoftwarePackage):
207 > array&lt;SoftwarePackage&gt; |
208 > map&lt;`package`, `specs` | SoftwarePackage&gt;
209
210
```
hints:
  SoftwareRequirement:
    packages:
      sourmash: [ https://doi.org/10.21105/joss.00027 ]
      python: {}
```
218
219
220 What if we want to mix options 2 and 3 across entries? You can!
221
222 Mixed option 2 and 3 generic example:
```
some_cwl_field:
  my_complex_type1: foo  # we accept the default value for field3
  my_complex_type2:
    field2: foo2
    field3: bar2         # we did not accept the default value for field3
                         # so we had to use the slightly expanded syntax
  my_complex_type3: {}   # as before, we accept the default values for both
                         # "field2" and "field3"
```
233
234 Mixed option 2 and 3 specific example using [Workflow](Workflow.html#Workflow).[inputs](Workflow.html#WorkflowInputParameter):
235 > array&lt;InputParameter&gt; |
236 > map&lt;`id`, `type` | InputParameter&gt;
237
238
```
inputs:
  workflow_input01: string
  workflow_input02:     # we use the longer way
    type: File          # because we want to specify the "format" too
    format: http://edamontology.org/format_2572
  workflow_input03: {}  # back to the short form as this entry
                        # uses the default of no "type" just like the prior
                        # examples
```
249
250 Mixed option 2 and 3 specific example using [SoftwareRequirement](#SoftwareRequirement).[packages](#SoftwarePackage):
251 > array&lt;SoftwarePackage&gt; |
252 > map&lt;`package`, `specs` | SoftwarePackage&gt;
253
254
```
hints:
  SoftwareRequirement:
    packages:
      sourmash: [ https://doi.org/10.21105/joss.00027 ]
      screed:
        specs: [ https://github.com/dib-lab/screed ]
        version: [ "1.0" ]
      python: {}
```
265
266 Note: The `map<…>` (compact) versions are optional; the verbose option #1 is
267 always allowed, but for presentation reasons options 3 and 2 may be preferred
268 by human readers.
269
270 The normative explanation for these variations, aimed at implementors, is in the
271 [Schema Salad specification](SchemaSalad.html#Identifier_maps).
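
The relationship between the three options can be sketched in Python. This is a minimal, non-normative illustration, not the Schema Salad algorithm itself: the function name `expand_map` and its parameters `key_field` and `short_field` are hypothetical, and the sketch assumes the document has already been parsed into plain lists and dictionaries.

```python
from typing import Any, Dict, List, Optional, Union


def expand_map(
    value: Union[List[Dict[str, Any]], Dict[str, Any]],
    key_field: str,
    short_field: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Rewrite the map forms (options two and three) as the array form (option one)."""
    if isinstance(value, list):
        # Already option one; copy entries so the caller's data is not mutated.
        return [dict(entry) for entry in value]

    entries: List[Dict[str, Any]] = []
    for key, body in value.items():
        if isinstance(body, dict):
            # Option two: the mapping holds the remaining fields (possibly none).
            entry = {key_field: key, **body}
        elif short_field is not None:
            # Option three: a bare value stands for the single shorthand field.
            entry = {key_field: key, short_field: body}
        else:
            raise ValueError(f"{key!r}: shorthand value given but no shorthand field is defined")
        entries.append(entry)
    return entries
```

For example, `expand_map(inputs, "id", "type")` applied to the mixed `inputs` mapping above would yield one dictionary per input, each carrying an explicit `id` field.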
272
273 ## Identifiers
274
275 If an object contains an `id` field, that is used to uniquely identify the
276 object in that document. The value of the `id` field must be unique over the
277 entire document. Identifiers may be resolved relative to either the document
278 base and/or other identifiers following the rules described in the
279 [Schema Salad specification](SchemaSalad.html#Identifier_resolution).
280
281 An implementation may choose to only honor references to object types for
282 which the `id` field is explicitly listed in this specification.
283
284 ## Document preprocessing
285
286 An implementation must resolve [$import](SchemaSalad.html#Import) and
287 [$include](SchemaSalad.html#Import) directives as described in the
288 [Schema Salad specification](SchemaSalad.html).
289
290 Another transformation defined in Schema Salad is simplification of data type definitions.
291 Type `<T>` ending with `?` should be transformed to `[<T>, "null"]`.
292 Type `<T>` ending with `[]` should be transformed to `{"type": "array", "items": <T>}`.
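
As a rough illustration of the two type rewrites, the following Python sketch applies them recursively to a type string. The function name `expand_type` is hypothetical, and the handling of a combined suffix such as `string[]?` (strip `?` first, then `[]`) is an assumption of this sketch rather than something stated here.

```python
from typing import Any


def expand_type(t: Any) -> Any:
    """Expand the `?` and `[]` type suffixes; non-string types pass through unchanged."""
    if not isinstance(t, str):
        return t
    if t.endswith("?"):
        # Optional type: <T>? becomes [<T>, "null"].
        return [expand_type(t[:-1]), "null"]
    if t.endswith("[]"):
        # Array type: <T>[] becomes {"type": "array", "items": <T>}.
        return {"type": "array", "items": expand_type(t[:-2])}
    return t
```

With this sketch, `expand_type("File?")` gives `["File", "null"]` and `expand_type("string[]")` gives `{"type": "array", "items": "string"}`.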
293
294 ## Extensions and metadata
295
296 Input metadata (for example, a lab sample identifier) may be represented within
297 a tool or workflow using input parameters which are explicitly propagated to
298 output. Future versions of this specification may define additional facilities
299 for working with input/output metadata.
300
301 Implementation extensions not required for correct execution (for example,
302 fields related to GUI presentation) and metadata about the tool or workflow
303 itself (for example, authorship for use in citations) may be provided as
304 additional fields on any object. Such extension fields must use a namespace
305 prefix listed in the `$namespaces` section of the document as described in the
306 [Schema Salad specification](SchemaSalad.html#Explicit_context).
307
308 Implementation extensions which modify execution semantics must be [listed in
309 the `requirements` field](#Requirements_and_hints).
310
311 # Execution model
312
313 ## Execution concepts
314
315 A **parameter** is a named symbolic input or output of a process, with an
316 associated datatype or schema. During execution, values are assigned to
317 parameters to make the input object or output object used for a concrete
318 process invocation.
319
320 A **CommandLineTool** is a process characterized by the execution of a
321 standalone, non-interactive program which is invoked on some input,
322 produces output, and then terminates.
323
324 A **workflow** is a process characterized by multiple subprocess steps,
325 where step outputs are connected to the inputs of downstream steps to
326 form a directed acyclic graph, and independent steps may run concurrently.
327
328 A **runtime environment** is the actual hardware and software environment when
329 executing a command line tool. It includes, but is not limited to, the
330 hardware architecture, hardware resources, operating system, software runtime
331 (if applicable, such as the specific Python interpreter or the specific Java
332 virtual machine), libraries, modules, packages, utilities, and data files
333 required to run the tool.
334
335 A **workflow platform** is a specific hardware and software implementation
336 capable of interpreting CWL documents and executing the processes specified by
337 the document. The responsibilities of the workflow platform may include
338 scheduling process invocation, setting up the necessary runtime environment,
339 making input data available, invoking the tool process, and collecting output.
340
341 A workflow platform may choose to only implement the Command Line Tool
342 Description part of the CWL specification.
343
344 It is intended that the workflow platform has broad leeway outside of this
345 specification to optimize use of computing resources and enforce policies
346 not covered by this specification. Some areas that are currently out of
347 scope for CWL specification but may be handled by a specific workflow
348 platform include:
349
350 * Data security and permissions.
351 * Scheduling tool invocations on remote cluster or cloud compute nodes.
352 * Using virtual machines or operating system containers to manage the runtime
353 (except as described in [DockerRequirement](CommandLineTool.html#DockerRequirement)).
354 * Using remote or distributed file systems to manage input and output files.
355 * Transforming file paths.
356 * Determining if a process has previously been executed, and if so skipping it
357 and reusing previous results.
358 * Pausing, resuming or checkpointing processes or workflows.
359
360 Conforming CWL processes must not assume anything about the runtime
361 environment or workflow platform unless explicitly declared through the use
362 of [process requirements](#Requirements_and_hints).
363
364 ## Generic execution process
365
366 The generic execution sequence of a CWL process (including workflows and
367 command line tools) is as follows.
368
369 1. Load input object.
370 1. Load, process and validate a CWL document, yielding one or more process objects.
371 The [`$namespaces`](SchemaSalad.html#Explicit_context) present in the CWL document
372 are also used when validating and processing the input object.
373 1. If there are multiple process objects (due to [`$graph`](SchemaSalad.html#Document_graph))
374 and which process object to start with is not specified in the input object (via
375 a [`cwl:tool`](#Executing_CWL_documents_as_scripts) entry) or by any other means
376 (like a URL fragment) then choose the process with the `id` of "#main" or "main".
377 1. Validate the input object against the `inputs` schema for the process.
378 1. Validate that process requirements are met.
379 1. Perform any further setup required by the specific process type.
380 1. Execute the process.
381 1. Capture results of process execution into the output object.
382 1. Validate the output object against the `outputs` schema for the process.
383 1. Report the output object to the process caller.
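
Step 3 of the sequence above (choosing which process to start with) can be sketched as follows. This is only an illustration: `select_process` and its `requested` parameter are hypothetical names, and the sketch assumes process `id` values are still short strings such as `#main` rather than fully expanded IRIs.

```python
from typing import Any, Dict, List, Optional


def select_process(
    processes: List[Dict[str, Any]], requested: Optional[str] = None
) -> Dict[str, Any]:
    """Pick the process to run from the objects loaded out of a CWL document."""
    if len(processes) == 1 and requested is None:
        # A single process object needs no disambiguation.
        return processes[0]
    # Use the explicitly requested id if given, otherwise fall back to "main".
    target = (requested or "main").lstrip("#")
    for process in processes:
        if str(process.get("id", "")).lstrip("#") == target:
            return process
    raise LookupError(f"no process with id {target!r}")
```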
384
385 ## Requirements and hints
386
387 A **process requirement** modifies the semantics or runtime
388 environment of a process. If an implementation cannot satisfy all
389 requirements, or a requirement is listed which is not recognized by the
390 implementation, it is a fatal error and the implementation must not attempt
391 to run the process, unless overridden at user option.
392
393 A **hint** is similar to a requirement; however, it is not an error if an
394 implementation cannot satisfy all hints. The implementation may report a
395 warning if a hint cannot be satisfied.
396
397 Optionally, implementations may allow requirements to be specified in the input
398 object document as an array of requirements under the field name
399 `cwl:requirements`. If implementations allow this, then such requirements
400 should be combined with any requirements present in the corresponding Process
401 as if they were specified there.
402
403 Requirements specified in a parent Workflow are inherited by step processes
404 if they are valid for that step. If the substep is a CommandLineTool,
405 only the `InlineJavascriptRequirement`, `SchemaDefRequirement`, `DockerRequirement`,
406 `SoftwareRequirement`, `InitialWorkDirRequirement`, `EnvVarRequirement`,
407 `ShellCommandRequirement`, and `ResourceRequirement` are valid.
408
409 *As good practice, it is best to have process requirements be self-contained,
410 such that each process can run successfully by itself.*
411
412 If the same process requirement appears at different levels of the
413 workflow, the most specific instance of the requirement is used, that is,
414 an entry in `requirements` on a process implementation such as
415 CommandLineTool will take precedence over an entry in `requirements`
416 specified in a workflow step, and an entry in `requirements` on a workflow
417 step takes precedence over the workflow. Entries in `hints` are resolved
418 the same way.
419
420 Requirements override hints. If a process implementation provides a
421 process requirement in `hints` which is also provided in `requirements` by
422 an enclosing workflow or workflow step, the enclosing `requirements` takes
423 precedence.
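
The precedence rules above can be summarized in a short Python sketch. It is a simplified illustration under stated assumptions: `effective_requirements` is a hypothetical helper, each level is a plain dictionary with optional `requirements` and `hints` lists, levels are passed from outermost (workflow) to innermost (process), and the filtering of requirements that are not valid for a CommandLineTool is left out.

```python
from typing import Any, Dict, List, Tuple

Entry = Dict[str, Any]


def effective_requirements(levels: List[Dict[str, List[Entry]]]) -> Tuple[Dict[str, Entry], Dict[str, Entry]]:
    """Resolve requirements and hints across levels ordered outermost to innermost."""
    requirements: Dict[str, Entry] = {}
    hints: Dict[str, Entry] = {}
    for level in levels:
        # Later levels are more specific, so they overwrite earlier entries.
        for entry in level.get("requirements", []):
            requirements[entry["class"]] = entry
        for entry in level.get("hints", []):
            hints[entry["class"]] = entry
    # Requirements override hints, whichever level each one came from.
    remaining_hints = {cls: e for cls, e in hints.items() if cls not in requirements}
    return requirements, remaining_hints
```

In this sketch a `DockerRequirement` listed under `requirements` on the workflow wins over a `DockerRequirement` listed under `hints` on the tool, while a step-level hint replaces a workflow-level hint of the same class.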
424
425 ## Parameter references
426
427 Parameter references are denoted by the syntax `$(...)` and may be used in any
428 field permitting the pseudo-type `Expression`, as specified by this document.
429 Conforming implementations must support parameter references. Parameter
430 references use the following subset of
431 [Javascript/ECMAScript 5.1](http://www.ecma-international.org/ecma-262/5.1/)
432 syntax, but they are designed to not require a Javascript engine for evaluation.
433
434 In the following [BNF
435 grammar](https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_Form), character
436 classes and grammar rules are denoted in '{}', '-' denotes exclusion from a
437 character class, '(())' denotes grouping, '|' denotes alternates, trailing
438 '*' denotes zero or more repeats, '+' denotes one or more repeats, '/' escapes
439 these special characters, and all other characters are literal values.
440
441 <p>
442 <table class="table">
443 <tr><td>symbol:: </td><td>{Unicode alphanumeric}+</td></tr>
444 <tr><td>singleq:: </td><td>[' (( {character - '} | \' ))* ']</td></tr>
445 <tr><td>doubleq:: </td><td>[" (( {character - "} | \" ))* "]</td></tr>
446 <tr><td>index:: </td><td>[ {decimal digit}+ ]</td></tr>
447 <tr><td>segment:: </td><td>. {symbol} | {singleq} | {doubleq} | {index}</td></tr>
448 <tr><td>parameter reference::</td><td>$( {symbol} {segment}*)</td></tr>
449 </table>
450 </p>
451
452 Use the following algorithm to resolve a parameter reference:
453
454 1. Match the leading symbol as the key
455 2. Look up the key in the parameter context (described below) to get the current value.
456 It is an error if the key is not found in the parameter context.
457 3. If there are no subsequent segments, terminate and return current value
458 4. Else, match the next segment
459 5. Extract the symbol, string, or index from the segment as the key
460 6. Look up the key in current value and assign as new current value. If
461 the key is a symbol or string, the current value must be an object.
462 If the key is an index, the current value must be an array or string.
463 It is an error if the key does not match the required type, or the key is not found or out
464 of range.
465 7. Repeat steps 3-6
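
The resolution algorithm above can be sketched without a Javascript engine. The following Python is a minimal illustration, not a reference implementation: `resolve_reference` and `ParameterReferenceError` are hypothetical names, the parameter context is assumed to be a plain dictionary, and the regular expression `\w` accepts underscores in addition to the alphanumerics named in the grammar.

```python
import re
from typing import Any, Dict

# One alternative per segment form in the grammar: .symbol, ['..'], [".."], [digits].
_SEGMENT = re.compile(
    r"\.(?P<symbol>\w+)"
    r"|\['(?P<single>(?:[^'\\]|\\.)*)'\]"
    r'|\["(?P<double>(?:[^"\\]|\\.)*)"\]'
    r"|\[(?P<index>\d+)\]"
)
_REFERENCE = re.compile(r"\$\((\w+)(.*)\)", re.DOTALL)


class ParameterReferenceError(Exception):
    """Raised for any condition the algorithm calls an error."""


def resolve_reference(reference: str, context: Dict[str, Any]) -> Any:
    match = _REFERENCE.fullmatch(reference.strip())
    if match is None:
        raise ParameterReferenceError(f"not a parameter reference: {reference!r}")
    key, rest = match.group(1), match.group(2)
    if key not in context:  # steps 1 and 2
        raise ParameterReferenceError(f"{key!r} is not in the parameter context")
    current = context[key]

    pos = 0
    while pos < len(rest):  # steps 3 to 7
        segment = _SEGMENT.match(rest, pos)
        if segment is None:
            raise ParameterReferenceError(f"malformed segment: {rest[pos:]!r}")
        pos = segment.end()
        kind = segment.lastgroup
        text = segment.group(kind)
        if kind == "index":
            index = int(text)
            if not isinstance(current, (list, str)) or index >= len(current):
                raise ParameterReferenceError(f"index {index} is invalid here")
            current = current[index]
        else:
            # Quoted keys may contain backslash escapes; symbols are used as-is.
            name = text if kind == "symbol" else re.sub(r"\\(.)", r"\1", text)
            if not isinstance(current, dict) or name not in current:
                raise ParameterReferenceError(f"key {name!r} not found")
            current = current[name]
    return current
```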
466
467 The root namespace is the parameter context. The following parameters must
468 be provided:
469
470 * `inputs`: The input object to the current Process.
471 * `self`: A context-specific value. The contextual values for 'self' are
472 documented for specific fields elsewhere in this specification. If
473 a contextual value of 'self' is not documented for a field, it
474 must be 'null'.
475 * `runtime`: An object containing configuration details. Specific to the
476 process type. An implementation may provide
477 opaque strings for any or all fields of `runtime`. These must be
478 filled in by the platform after processing the Tool but before actual
479 execution. Parameter references and expressions may only use the
480 literal string value of the field and must not perform computation on
481 the contents, except where noted otherwise.
482
483 If the value of a field has no leading or trailing non-whitespace
484 characters around a parameter reference, the effective value of the field
485 becomes the value of the referenced parameter, preserving the return type.
486
487 If the value of a field has non-whitespace leading or trailing characters
488 around a parameter reference, it is subject to string interpolation. The
489 effective value of the field is a string containing the leading characters,
490 followed by the string value of the parameter reference, followed by the
491 trailing characters. The string value of the parameter reference is its
492 textual JSON representation with the following rules:
493
494 * Leading and trailing quotes are stripped from strings
495 * Objects entries are sorted by key
496
497 Multiple parameter references may appear in a single field. This case
498 must be treated as a string interpolation. After interpolating the first
499 parameter reference, interpolation must be recursively applied to the
500 trailing characters to yield the final string value.
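
The two cases (a lone reference that keeps its type, and string interpolation) can be sketched as follows. This is an illustrative reading of the rules above, not a definitive implementation: `interpolate`, `to_text`, and the `resolve` callback are hypothetical names, and the simple pattern used to find references assumes that no quoted key contains a `)` character. Substituting left to right gives the same result as interpolating the first reference and recursing on the trailing characters.

```python
import json
import re
from typing import Any, Callable

# Assumption of this sketch: references contain no ")" inside quoted keys.
_REF = re.compile(r"\$\([^)]*\)")


def to_text(value: Any) -> str:
    """Textual JSON form: object keys sorted, surrounding quotes stripped from strings."""
    text = json.dumps(value, sort_keys=True, ensure_ascii=False)
    if isinstance(value, str):
        text = text[1:-1]
    return text


def interpolate(field: str, resolve: Callable[[str], Any]) -> Any:
    """Return the effective value of a field containing parameter references."""
    whole = _REF.fullmatch(field.strip())
    if whole is not None:
        # Only whitespace surrounds the reference: keep the referenced value's type.
        return resolve(whole.group(0))
    # Otherwise every reference is replaced by its string value, left to right.
    return _REF.sub(lambda m: to_text(resolve(m.group(0))), field)
```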
501
502 ## Expressions
503
504 An expression is a fragment of [Javascript/ECMAScript
505 5.1](http://www.ecma-international.org/ecma-262/5.1/) code evaluated by the
506 workflow platform to affect the inputs, outputs, or
507 behavior of a process. In the generic execution sequence, expressions may
508 be evaluated during step 6 (process setup), step 7 (execute process),
509 and/or step 8 (capture output). Expressions are distinct from regular
510 processes in that they are intended to modify the behavior of the workflow
511 itself rather than perform the primary work of the workflow.
512
513 To declare the use of expressions, the document must include the process
514 requirement `InlineJavascriptRequirement`. Expressions may be used in any
515 field permitting the pseudo-type `Expression`, as specified by this
516 document.
517
518 Expressions are denoted by the syntax `$(...)` or `${...}`. A code
519 fragment wrapped in the `$(...)` syntax must be evaluated as an
520 [ECMAScript expression](http://www.ecma-international.org/ecma-262/5.1/#sec-11). A
521 code fragment wrapped in the `${...}` syntax must be evaluated as an
522 [ECMAScript function body](http://www.ecma-international.org/ecma-262/5.1/#sec-13)
523 for an anonymous, zero-argument function. Expressions must return a valid JSON
524 data type: one of null, string, number, boolean, array, object. Other return
525 values must result in a `permanentFailure`. Implementations must permit any
526 syntactically valid Javascript and account for nesting of parentheses or braces,
527 and for strings that may contain parentheses or braces, when scanning for
528 expressions.
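
The scanning requirement can be illustrated with a small Python sketch that finds the end of an expression while tracking nesting and quoted strings. The function name `find_expression_end` is hypothetical, and the sketch is deliberately incomplete: it does not handle Javascript comments or regular expression literals, which a full scanner would also need to consider.

```python
def find_expression_end(text: str, start: int) -> int:
    """Given text[start:start + 2] of '$(' or '${', return the index just past the matching close."""
    opener = text[start + 1]
    closer = {"(": ")", "{": "}"}[opener]
    depth = 0
    quote = None  # the quote character of the string literal we are inside, if any
    i = start + 1
    while i < len(text):
        ch = text[i]
        if quote is not None:
            if ch == "\\":
                i += 1  # skip the escaped character
            elif ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch == opener:
            depth += 1
        elif ch == closer:
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise ValueError("unterminated expression")
```

Slicing `text[start:find_expression_end(text, start)]` then yields the full expression, including any parentheses or braces that appear inside its string literals.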
529
530 The runtime must include any code defined in the ["expressionLib" field of
531 InlineJavascriptRequirement](#InlineJavascriptRequirement) prior to
532 executing the actual expression.
533
534 Before executing the expression, the runtime must initialize as global
535 variables the fields of the parameter context described above.
536
537 The effective value of the field after expression evaluation follows the
538 same rules as parameter references discussed above. Multiple expressions
539 may appear in a single field.
540
541 Expressions must be evaluated in an isolated context (a "sandbox") which
542 permits no side effects to leak outside the context. Expressions also must
543 be evaluated in [Javascript strict mode](http://www.ecma-international.org/ecma-262/5.1/#sec-4.2.2).
544
545 The order in which expressions are evaluated is undefined except where
546 otherwise noted in this document.
547
548 An implementation may choose to implement parameter references by
549 evaluating as a Javascript expression. The results of evaluating
550 parameter references must be identical whether implemented by Javascript
551 evaluation or some other means.
552
553 Implementations may apply other limits, such as process isolation, timeouts,
554 and operating system containers/jails to minimize the security risks associated
555 with running untrusted code embedded in a CWL document.
556
557 Exceptions thrown from an expression must result in a `permanentFailure` of the
558 process.
559
560 ## Executing CWL documents as scripts
561
562 By convention, a CWL document may begin with `#!/usr/bin/env cwl-runner`
563 and be marked as executable (the POSIX "+x" permission bits) to enable it
564 to be executed directly. A workflow platform may support this mode of
565 operation; if so, it must provide `cwl-runner` as an alias for the
566 platform's CWL implementation.
567
568 A CWL input object document may similarly begin with `#!/usr/bin/env
569 cwl-runner` and be marked as executable. In this case, the input object
570 must include the field `cwl:tool` supplying an IRI to the default CWL
571 document that should be executed using the fields of the input object as
572 input parameters.
573
574 The `cwl-runner` interface is required for conformance testing and is
575 documented in [cwl-runner.cwl](cwl-runner.cwl).
576
577 ## Discovering CWL documents on a local filesystem
578
579 To discover CWL documents, look in the following locations:
580
581 `/usr/share/commonwl/`
582
583 `/usr/local/share/commonwl/`
584
585 `$XDG_DATA_HOME/commonwl/` (usually `$HOME/.local/share/commonwl`)
586
587 `$XDG_DATA_HOME` is from the [XDG Base Directory
588 Specification](http://standards.freedesktop.org/basedir-spec/basedir-spec-0.6.html)
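
A minimal Python sketch of building this list of locations follows. The function name `cwl_search_dirs` and its optional `environ` parameter are hypothetical; the fallback to `$HOME/.local/share` when `XDG_DATA_HOME` is unset or empty follows the XDG convention noted above, and the order of the returned list simply mirrors the order in which the locations are listed here.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def cwl_search_dirs(environ: Optional[Mapping[str, str]] = None) -> List[Path]:
    """Return the directories in which CWL documents may be discovered."""
    env = os.environ if environ is None else environ
    home = env.get("HOME", "")
    # XDG_DATA_HOME usually defaults to $HOME/.local/share when unset or empty.
    xdg_data_home = env.get("XDG_DATA_HOME") or os.path.join(home, ".local", "share")
    return [
        Path("/usr/share/commonwl"),
        Path("/usr/local/share/commonwl"),
        Path(xdg_data_home) / "commonwl",
    ]
```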