comparison env/lib/python3.9/site-packages/cwltool/schemas/v1.2/concepts.md @ 0:4f3585e2f14b draft default tip
"planemo upload commit 60cee0fc7c0cda8592644e1aad72851dec82c959"
author | shellac |
---|---|
date | Mon, 22 Mar 2021 18:12:50 +0000 |
parents | |
children |
-1:000000000000 | 0:4f3585e2f14b |
---|---|
1 ## References to other specifications | |
2 | |
3 **Javascript Object Notation (JSON)**: http://json.org | |
4 | |
5 **JSON Linked Data (JSON-LD)**: http://json-ld.org | |
6 | |
7 **YAML**: http://yaml.org | |
8 | |
9 **Avro**: https://avro.apache.org/docs/1.8.1/spec.html | |
10 | |
11 **Uniform Resource Identifier (URI) Generic Syntax**: https://tools.ietf.org/html/rfc3986 | |
12 | |
13 **Internationalized Resource Identifiers (IRIs)**: | |
14 https://tools.ietf.org/html/rfc3987 | |
15 | |
16 **Portable Operating System Interface (POSIX.1-2008)**: http://pubs.opengroup.org/onlinepubs/9699919799/ | |
17 | |
18 **Resource Description Framework (RDF)**: http://www.w3.org/RDF/ | |
19 | |
20 **XDG Base Directory Specification**: https://specifications.freedesktop.org/basedir-spec/basedir-spec-0.6.html | |
21 | |
22 | |
23 ## Scope | |
24 | |
25 This document describes CWL syntax, execution, and object model. It | |
26 is not intended to document a specific CWL implementation; however, it may | |
27 serve as a reference for the behavior of conforming implementations. | |
28 | |
29 ## Terminology | |
30 | |
31 The terminology used to describe CWL documents is defined in the | |
32 Concepts section of the specification. The terms defined in the | |
33 following list are used in building those definitions and in describing the | |
34 actions of a CWL implementation: | |
35 | |
36 **may**: Conforming CWL documents and CWL implementations are permitted but | |
37 not required to behave as described. | |
38 | |
39 **must**: Conforming CWL documents and CWL implementations are required to behave | |
40 as described; otherwise they are in error. | |
41 | |
42 **error**: A violation of the rules of this specification; results are | |
43 undefined. Conforming implementations may detect and report an error and may | |
44 recover from it. | |
45 | |
46 **fatal error**: A violation of the rules of this specification; results are | |
47 undefined. Conforming implementations must not continue to execute the current | |
48 process and may report an error. | |
49 | |
50 **at user option**: Conforming software may or must (depending on the modal verb in | |
51 the sentence) behave as described; if it does, it must provide users a means to | |
52 enable or disable the behavior described. | |
53 | |
54 **deprecated**: Conforming software may implement a behavior for backwards | |
55 compatibility. Portable CWL documents should not rely on deprecated behavior. | |
56 Behavior marked as deprecated may be removed entirely from future revisions of | |
57 the CWL specification. | |
58 | |
59 # Data model | |
60 | |
61 ## Data concepts | |
62 | |
63 An **object** is a data structure equivalent to the "object" type in JSON, | |
64 consisting of an unordered set of name/value pairs (referred to here as | |
65 **fields**) and where the name is a string and the value is a string, number, | |
66 boolean, array, or object. | |
67 | |
68 A **document** is a file containing a serialized object, or an array of objects. | |
69 | |
70 A **process** is a basic unit of computation which accepts input data, | |
71 performs some computation, and produces output data. Examples include | |
72 CommandLineTools, Workflows, and ExpressionTools. | |
73 | |
74 An **input object** is an object describing the inputs to an | |
75 invocation of a process. The fields of the input object are referred | |
76 to as "input parameters". | |
77 | |
78 An **output object** is an object describing the output resulting from | |
79 an invocation of a process. The fields of the output object are | |
80 referred to as "output parameters". | |
81 | |
82 An **input schema** describes the valid format (required fields, data types) | |
83 for an input object. | |
84 | |
85 An **output schema** describes the valid format for an output object. | |
86 | |
87 **Metadata** is information about workflows, tools, or input items. | |
88 | |
89 ## Syntax | |
90 | |
91 CWL documents must consist of an object or array of objects represented using | |
92 JSON or YAML syntax. Upon loading, a CWL implementation must apply the | |
93 preprocessing steps described in the | |
94 [Semantic Annotations for Linked Avro Data (SALAD) Specification](SchemaSalad.html). | |
95 An implementation may formally validate the structure of a CWL document using | |
96 SALAD schemas located at https://github.com/common-workflow-language/cwl-v1.2/ | |
97 | |
98 CWL documents commonly reference other CWL documents. Each document | |
99 must declare the `cwlVersion` of that document. Implementations must | |
100 validate against the document's declared version. Implementations | |
101 should allow workflows to reference documents of both newer and older | |
102 CWL versions (up to the highest version of CWL supported by that | |
103 implementation). Where the runtime environment or runtime behavior has | |
104 changed between versions, for that portion of the execution an | |
105 implementation must provide runtime environment and behavior consistent | |
106 with the document's declared version. An implementation must not | |
107 expose a newer feature when executing a document that specifies an | |
108 older version that does not include that feature. | |
109 | |
110 ### map | |
111 | |
112 Note: This section is non-normative. | |
113 > type: array<ComplexType> | | |
114 > map<`key_field`, ComplexType> | |
115 | |
116 The above syntax in the CWL specifications means there are two or more ways to write the given value. | |
117 | |
118 Option one is an array and is the most verbose option. | |
119 | |
120 Option one generic example: | |
121 ``` | |
122 some_cwl_field: | |
123   - key_field: a_complex_type1 | |
124     field2: foo | |
125     field3: bar | |
126   - key_field: a_complex_type2 | |
127     field2: foo2 | |
128     field3: bar2 | |
129   - key_field: a_complex_type3 | |
130 ``` | |
131 | |
132 Option one specific example using [Workflow](Workflow.html#Workflow).[inputs](Workflow.html#WorkflowInputParameter): | |
133 > array<InputParameter> | | |
134 > map<`id`, `type` | InputParameter> | |
135 | |
136 | |
137 ``` | |
138 inputs: | |
139   - id: workflow_input01 | |
140     type: string | |
141   - id: workflow_input02 | |
142     type: File | |
143     format: http://edamontology.org/format_2572 | |
144 ``` | |
145 | |
146 Option two is enabled by the `map<…>` syntax. Instead of an array of entries we | |
147 use a mapping, where one field of the `ComplexType` (here named `key_field`) | |
148 becomes the key in the map, and its value is the rest of the `ComplexType` | |
149 without the key field. If all of the other fields of the `ComplexType` are | |
150 optional and unneeded, then we can indicate this with an empty mapping as the | |
151 value: `a_complex_type3: {}` | |
152 | |
153 Option two generic example: | |
154 ``` | |
155 some_cwl_field: | |
156   a_complex_type1:  # this was the "key_field" from above | |
157     field2: foo | |
158     field3: bar | |
159   a_complex_type2: | |
160     field2: foo2 | |
161     field3: bar2 | |
162   a_complex_type3: {}  # we accept the default values for "field2" and "field3" | |
163 ``` | |
164 | |
165 Option two specific example using [Workflow](Workflow.html#Workflow).[inputs](Workflow.html#WorkflowInputParameter): | |
166 > array<InputParameter> | | |
167 > map<`id`, `type` | InputParameter> | |
168 | |
169 | |
170 ``` | |
171 inputs: | |
172   workflow_input01: | |
173     type: string | |
174   workflow_input02: | |
175     type: File | |
176     format: http://edamontology.org/format_2572 | |
177 ``` | |
178 | |
179 Option two specific example using [SoftwareRequirement](#SoftwareRequirement).[packages](#SoftwarePackage): | |
180 > array<SoftwarePackage> | | |
181 > map<`package`, `specs` | SoftwarePackage> | |
182 | |
183 | |
184 ``` | |
185 hints: | |
186   SoftwareRequirement: | |
187     packages: | |
188       sourmash: | |
189         specs: [ https://doi.org/10.21105/joss.00027 ] | |
190       screed: | |
191         version: [ "1.0" ] | |
192       python: {} | |
193 ``` | |
194 | |
195 Sometimes we have a third and even more compact option denoted like this: | |
196 > type: array<ComplexType> | | |
197 > map<`key_field`, `field2` | ComplexType> | |
198 | |
199 For this example, if we only need the `key_field` and `field2` when specifying | |
200 our `ComplexType`s (because the other fields are optional and we are fine with | |
201 their default values) then we can abbreviate. | |
202 | |
203 Option three generic example: | |
204 ``` | |
205 some_cwl_field: | |
206   a_complex_type1: foo   # we accept the default value for field3 | |
207   a_complex_type2: foo2  # we accept the default value for field3 | |
208   a_complex_type3: {}    # we accept the default values for "field2" and "field3" | |
209 ``` | |
210 | |
211 Option three specific example using [Workflow](Workflow.html#Workflow).[inputs](Workflow.html#WorkflowInputParameter): | |
212 > array<InputParameter> | | |
213 > map<`id`, `type` | InputParameter> | |
214 | |
215 | |
216 ``` | |
217 inputs: | |
218   workflow_input01: string | |
219   workflow_input02: File  # we accept the default of no File format | |
220 ``` | |
221 | |
222 Option three specific example using [SoftwareRequirement](#SoftwareRequirement).[packages](#SoftwarePackage): | |
223 > array<SoftwarePackage> | | |
224 > map<`package`, `specs` | SoftwarePackage> | |
225 | |
226 | |
227 ``` | |
228 hints: | |
229   SoftwareRequirement: | |
230     packages: | |
231       sourmash: [ https://doi.org/10.21105/joss.00027 ] | |
232       python: {} | |
233 ``` | |
234 | |
235 | |
236 What if we want to mix options 2 and 3 for some entries? You can! | |
237 | |
238 Mixed option 2 and 3 generic example: | |
239 ``` | |
240 some_cwl_field: | |
241   my_complex_type1: foo   # we accept the default value for field3 | |
242   my_complex_type2: | |
243     field2: foo2 | |
244     field3: bar2          # we did not accept the default value for field3 | |
245                           # so we had to use the slightly expanded syntax | |
246   my_complex_type3: {}    # as before, we accept the default values for both | |
247                           # "field2" and "field3" | |
248 ``` | |
249 | |
250 Mixed option 2 and 3 specific example using [Workflow](Workflow.html#Workflow).[inputs](Workflow.html#WorkflowInputParameter): | |
251 > array<InputParameter> | | |
252 > map<`id`, `type` | InputParameter> | |
253 | |
254 | |
255 ``` | |
256 inputs: | |
257   workflow_input01: string | |
258   workflow_input02:     # we use the longer way | |
259     type: File          # because we want to specify the "format" too | |
260     format: http://edamontology.org/format_2572 | |
261 ``` | |
262 | |
263 Mixed option 2 and 3 specific example using [SoftwareRequirement](#SoftwareRequirement).[packages](#SoftwarePackage): | |
264 > array<SoftwarePackage> | | |
265 > map<`package`, `specs` | SoftwarePackage> | |
266 | |
267 | |
268 ``` | |
269 hints: | |
270   SoftwareRequirement: | |
271     packages: | |
272       sourmash: [ https://doi.org/10.21105/joss.00027 ] | |
273       screed: | |
274         specs: [ https://github.com/dib-lab/screed ] | |
275         version: [ "1.0" ] | |
276       python: {} | |
277 ``` | |
278 | |
279 Note: The `map<…>` (compact) versions are optional for users; the verbose option 1 is | |
280 always allowed, but for presentation reasons options 2 and 3 may be preferred | |
281 by human readers. Consumers of CWL must support all three options. | |
282 | |
283 The normative explanation for these variations, aimed at implementors, is in the | |
284 [Schema Salad specification](SchemaSalad.html#Identifier_maps). | |
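
For implementors, the relationship between the three options can be pictured as a normalization step that rewrites the compact forms into the verbose array. The following Python sketch is illustrative only: the function name `expand_id_map` and its parameters are hypothetical, it treats any mapping value as option 2 and any other value as option 3, and the normative rules remain those in the Schema Salad specification.

```python
def expand_id_map(value, key_field, shorthand_field=None):
    """Rewrite the compact map forms (options 2 and 3) as the verbose list (option 1).

    Hypothetical helper: `key_field` plays the role of `key_field` in the text,
    `shorthand_field` the role of `field2` in the option-three notation.
    """
    if isinstance(value, list):
        return value  # already in the verbose form
    entries = []
    for key, body in value.items():
        if isinstance(body, dict):
            # Option 2: the map value holds the remaining fields.
            entries.append({key_field: key, **body})
        elif shorthand_field is not None:
            # Option 3: the map value is the single abbreviated field.
            entries.append({key_field: key, shorthand_field: body})
        else:
            raise ValueError(f"no shorthand field is defined for entry {key!r}")
    return entries


inputs = {
    "workflow_input01": "string",
    "workflow_input02": {
        "type": "File",
        "format": "http://edamontology.org/format_2572",
    },
}
print(expand_id_map(inputs, "id", "type"))
```

Applied to the mixed option 2 and 3 `inputs` example, this yields the same list of entries as the option one example.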
285 | |
286 ## Identifiers | |
287 | |
288 If an object contains an `id` field, that is used to uniquely identify the | |
289 object in that document. The value of the `id` field must be unique over the | |
290 entire document. Identifiers may be resolved relative to either the document | |
291 base and/or other identifiers following the rules described in the | |
292 [Schema Salad specification](SchemaSalad.html#Identifier_resolution). | |
293 | |
294 An implementation may choose to only honor references to object types for | |
295 which the `id` field is explicitly listed in this specification. | |
296 | |
297 ## Document preprocessing | |
298 | |
299 An implementation must resolve [$import](SchemaSalad.html#Import) and | |
300 [$include](SchemaSalad.html#Import) directives as described in the | |
301 [Schema Salad specification](SchemaSalad.html). | |
302 | |
303 Another transformation defined in Schema Salad is simplification of data type definitions. | |
304 Type `<T>` ending with `?` should be transformed to `[<T>, "null"]`. | |
305 Type `<T>` ending with `[]` should be transformed to `{"type": "array", "items": <T>}` | |
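
As an illustration of the two type rewrites above, here is a minimal Python sketch. The function name `expand_type` is hypothetical, and applying the rules recursively to combined suffixes such as `string[]?` is an assumption of this sketch rather than something stated in this section.

```python
def expand_type(type_name):
    """Expand the `?` and `[]` type suffixes into their long forms."""
    if not isinstance(type_name, str):
        return type_name  # already a structured type definition
    if type_name.endswith("?"):
        # Optional type: <T>? becomes [<T>, "null"]
        return [expand_type(type_name[:-1]), "null"]
    if type_name.endswith("[]"):
        # Array type: <T>[] becomes {"type": "array", "items": <T>}
        return {"type": "array", "items": expand_type(type_name[:-2])}
    return type_name


print(expand_type("File?"))     # ['File', 'null']
print(expand_type("string[]"))  # {'type': 'array', 'items': 'string'}
```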
306 | |
307 ## Extensions and metadata | |
308 | |
309 Input metadata (for example, a sample identifier) may be represented within | |
310 a tool or workflow using input parameters which are explicitly propagated to | |
311 output. Future versions of this specification may define additional facilities | |
312 for working with input/output metadata. | |
313 | |
314 Implementation extensions not required for correct execution (for example, | |
315 fields related to GUI presentation) and metadata about the tool or workflow | |
316 itself (for example, authorship for use in citations) may be provided as | |
317 additional fields on any object. Such extension fields must use a namespace | |
318 prefix listed in the `$namespaces` section of the document as described in the | |
319 [Schema Salad specification](SchemaSalad.html#Explicit_context). | |
320 | |
321 It is recommended that concepts from schema.org are used whenever possible. | |
322 For the `$schema` field we recommend their RDF encoding: http://schema.org/version/latest/schema.rdf | |
323 | |
324 Implementation extensions which modify execution semantics must be [listed in | |
325 the `requirements` field](#Requirements_and_hints). | |
326 | |
327 ## Packed documents | |
328 | |
329 A "packed" CWL document is one that contains multiple process objects. | |
330 This makes it possible to store and transmit a Workflow together with | |
331 the processes of each of its steps in a single file. | |
332 | |
333 There are two methods to create packed documents: embedding and $graph. | |
334 Both can appear in the same document. | |
335 | |
336 "Embedding" is where the entire process object is copied into the | |
337 `run` field of a workflow step. If the step process is a subworkflow, | |
338 it can be processed recursively to embed the processes of the | |
339 subworkflow steps, and so on. Embedded process objects may optionally | |
340 include `id` fields. | |
341 | |
342 A "$graph" document does not have a process object at the root. | |
343 Instead there is a [`$graph`](SchemaSalad.html#Document_graph) field | |
344 which consists of a list of process objects. Each process object must | |
345 have an `id` field. Workflow `run` fields cross-reference other | |
346 processes in the document `$graph` using the `id` of the process | |
347 object. | |
348 | |
349 All process objects in a packed document must validate and execute as | |
350 the `cwlVersion` appearing at the top level. A `cwlVersion` field | |
351 appearing anywhere other than the top level must be ignored. | |
352 | |
353 When executing a packed document, the reference to the document may | |
354 include a fragment identifier. If present, the fragment identifier | |
355 specifies the `id` of the process to execute. | |
356 | |
357 If the reference to the packed document does not include a fragment | |
358 identifier, the runner must choose the top-level process object as the | |
359 entry point. If there is no top-level process object (as in the case | |
360 of `$graph`) then the runner must choose the process object with an id | |
361 of `#main`. If there is no `#main` object, the runner must return an | |
362 error. | |
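
The entry point rules above can be sketched as follows. This is a simplified illustration, not a reference implementation: `select_entry_point` is a hypothetical name, the document is assumed to be already loaded into Python dictionaries, and identifiers are compared as short `#name` strings rather than fully resolved IRIs.

```python
def select_entry_point(document, fragment=None):
    """Pick the process object to execute from a (possibly packed) document."""
    graph = document.get("$graph")
    if fragment is None and graph is None:
        return document  # the top-level process object is the entry point
    # With no fragment on a $graph document, fall back to the `#main` process.
    wanted = (fragment or "#main").lstrip("#")
    candidates = graph if graph is not None else [document]
    for process in candidates:
        if str(process.get("id", "")).lstrip("#") == wanted:
            return process
    raise LookupError(f"no process with id '#{wanted}' in the document")


packed = {
    "cwlVersion": "v1.2",
    "$graph": [
        {"id": "#tool", "class": "CommandLineTool"},
        {"id": "#main", "class": "Workflow"},
    ],
}
print(select_entry_point(packed)["class"])           # Workflow
print(select_entry_point(packed, "#tool")["class"])  # CommandLineTool
```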
363 | |
364 # Execution model | |
365 | |
366 ## Execution concepts | |
367 | |
368 A **parameter** is a named symbolic input or output of a process, with | |
369 an associated datatype or schema. During execution, values are | |
370 assigned to parameters to make the input object or output object used | |
371 for concrete process invocation. | |
372 | |
373 A **CommandLineTool** is a process characterized by the execution of a | |
374 standalone, non-interactive program which is invoked on some input, | |
375 produces output, and then terminates. | |
376 | |
377 A **workflow** is a process characterized by multiple subprocess steps, | |
378 where step outputs are connected to the inputs of downstream steps to | |
379 form a directed acyclic graph, and independent steps may run concurrently. | |
380 | |
381 A **runtime environment** is the actual hardware and software environment when | |
382 executing a command line tool. It includes, but is not limited to, the | |
383 hardware architecture, hardware resources, operating system, software runtime | |
384 (if applicable, such as the specific Python interpreter or the specific Java | |
385 virtual machine), libraries, modules, packages, utilities, and data files | |
386 required to run the tool. | |
387 | |
388 A **workflow platform** is a specific hardware and software implementation | |
389 capable of interpreting CWL documents and executing the processes specified by | |
390 the document. The responsibilities of the workflow platform may include | |
391 scheduling process invocation, setting up the necessary runtime environment, | |
392 making input data available, invoking the tool process, and collecting output. | |
393 | |
394 A **data link** is a connection from a "Source" parameter to a "Sink" | |
395 parameter. A data link expresses that when a value becomes available | |
396 for the source parameter, that value should be copied to the "sink" | |
397 parameter. Reflecting the direction of data flow, a data link is | |
398 described as "outgoing" from the source and "inbound" to the sink. | |
399 | |
400 A workflow platform may choose to only implement the Command Line Tool | |
401 Description part of the CWL specification. | |
402 | |
403 It is intended that the workflow platform has broad leeway outside of this | |
404 specification to optimize use of computing resources and enforce policies | |
405 not covered by this specification. Some areas that are currently out of | |
406 scope for CWL specification but may be handled by a specific workflow | |
407 platform include: | |
408 | |
409 * Data security and permissions | |
410 * Scheduling tool invocations on remote cluster or cloud compute nodes. | |
411 * Using virtual machines or operating system containers to manage the runtime | |
412 (except as described in [DockerRequirement](CommandLineTool.html#DockerRequirement)). | |
413 * Using remote or distributed file systems to manage input and output files. | |
414 * Transforming file paths. | |
415 * Pausing, resuming or checkpointing processes or workflows. | |
416 | |
417 Conforming CWL processes must not assume anything about the runtime | |
418 environment or workflow platform unless explicitly declared through the use | |
419 of [process requirements](#Requirements_and_hints). | |
420 | |
421 ## Generic execution process | |
422 | |
423 The generic execution sequence of a CWL process (including workflows | |
424 and command line tools) is as follows. Processes are | |
425 modeled as functions that consume an input object and produce an | |
426 output object. | |
427 | |
428 1. Load input object. | |
429 1. Load, process and validate a CWL document, yielding one or more process objects. | |
430 The [`$namespaces`](SchemaSalad.html#Explicit_context) present in the CWL document | |
431 are also used when validating and processing the input object. | |
432 1. If there are multiple process objects (due to [`$graph`](SchemaSalad.html#Document_graph)) | |
433 and which process object to start with is not specified in the input object (via | |
434 a [`cwl:tool`](#Executing_CWL_documents_as_scripts) entry) or by any other means | |
435 (like a URL fragment) then choose the process with the `id` of "#main" or "main". | |
436 1. Validate the input object against the `inputs` schema for the process. | |
437 1. Validate process requirements are met. | |
438 1. Perform any further setup required by the specific process type. | |
439 1. Execute the process. | |
440 1. Capture results of process execution into the output object. | |
441 1. Validate the output object against the `outputs` schema for the process. | |
442 1. Report the output object to the process caller. | |
443 | |
444 ## Requirements and hints | |
445 | |
446 A **process requirement** modifies the semantics or runtime | |
447 environment of a process. If an implementation cannot satisfy all | |
448 requirements, or a requirement is listed which is not recognized by the | |
449 implementation, it is a fatal error and the implementation must not attempt | |
450 to run the process, unless overridden at user option. | |
451 | |
452 A **hint** is similar to a requirement; however, it is not an error if an | |
453 implementation cannot satisfy all hints. The implementation may report a | |
454 warning if a hint cannot be satisfied. | |
455 | |
456 Optionally, implementations may allow requirements to be specified in the input | |
457 object document as an array of requirements under the field name | |
458 `cwl:requirements`. If implementations allow this, then such requirements | |
459 should be combined with any requirements present in the corresponding Process | |
460 as if they were specified there. | |
461 | |
462 Requirements specified in a parent Workflow are inherited by step processes | |
463 if they are valid for that step. If the substep is a CommandLineTool, | |
464 only the `InlineJavascriptRequirement`, `SchemaDefRequirement`, `DockerRequirement`, | |
465 `SoftwareRequirement`, `InitialWorkDirRequirement`, `EnvVarRequirement`, | |
466 `ShellCommandRequirement`, and `ResourceRequirement` are valid. | |
467 | |
468 *As good practice, it is best to have process requirements be self-contained, | |
469 such that each process can run successfully by itself.* | |
470 | |
471 If the same process requirement appears at different levels of the | |
472 workflow, the most specific instance of the requirement is used, that is, | |
473 an entry in `requirements` on a process implementation such as | |
474 CommandLineTool will take precedence over an entry in `requirements` | |
475 specified in a workflow step, and an entry in `requirements` on a workflow | |
476 step takes precedence over the workflow. Entries in `hints` are resolved | |
477 the same way. | |
478 | |
479 Requirements override hints. If a process implementation provides a | |
480 process requirement in `hints` which is also provided in `requirements` by | |
481 an enclosing workflow or workflow step, the enclosing `requirements` takes | |
482 precedence. | |
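
The precedence rules above can be pictured with a short Python sketch. It is a simplification under stated assumptions: `resolve_requirements` is a hypothetical helper, each level is a plain dictionary with optional `requirements` and `hints` lists, and the filter on which requirements are valid for a given step is left out.

```python
def resolve_requirements(levels):
    """Combine requirements and hints from the outermost to the most specific level.

    `levels` is ordered outermost first, for example [workflow, step, process].
    """
    requirements, hints = {}, {}
    for level in levels:
        # A later (more specific) level replaces an earlier entry of the same class.
        for req in level.get("requirements", []):
            requirements[req["class"]] = req
        for hint in level.get("hints", []):
            hints[hint["class"]] = hint
    # Requirements override hints of the same class, whichever level set them.
    effective_hints = {c: h for c, h in hints.items() if c not in requirements}
    return requirements, effective_hints


workflow = {"requirements": [{"class": "ResourceRequirement", "coresMin": 4}]}
step = {}
tool = {"hints": [{"class": "ResourceRequirement", "coresMin": 1}]}
reqs, remaining_hints = resolve_requirements([workflow, step, tool])
print(reqs["ResourceRequirement"]["coresMin"])  # 4
print(remaining_hints)                          # {}
```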
483 | |
484 ## Parameter references | |
485 | |
486 Parameter references are denoted by the syntax `$(...)` and may be used in any | |
487 field permitting the pseudo-type `Expression`, as specified by this document. | |
488 Conforming implementations must support parameter references. Parameter | |
489 references use the following subset of | |
490 [Javascript/ECMAScript 5.1](http://www.ecma-international.org/ecma-262/5.1/) | |
491 syntax, but they are designed to not require a Javascript engine for evaluation. | |
492 | |
493 In the following [BNF grammar](https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_Form), | |
494 character classes and grammar rules are denoted in '{}', '-' denotes | |
495 exclusion from a character class, '(())' denotes grouping, '|' denotes | |
496 alternates, trailing '*' denotes zero or more repeats, '+' denotes one | |
497 or more repeats, and all other characters are literal values. | |
498 | |
499 <p> | |
500 <table class="table"> | |
501 <tr><td>symbol:: </td><td>{Unicode alphanumeric}+</td></tr> | |
502 <tr><td>singleq:: </td><td>[' (( {character - { | \ ' \} } ))* ']</td></tr> | |
503 <tr><td>doubleq:: </td><td>[" (( {character - { | \ " \} } ))* "]</td></tr> | |
504 <tr><td>index:: </td><td>[ {decimal digit}+ ]</td></tr> | |
505 <tr><td>segment:: </td><td>. {symbol} | {singleq} | {doubleq} | {index}</td></tr> | |
506 <tr><td>parameter reference::</td><td>$( {symbol} {segment}*)</td></tr> | |
507 </table> | |
508 </p> | |
509 | |
510 Use the following algorithm to resolve a parameter reference: | |
511 | |
512 1. Match the leading symbol as the key | |
513 2. Look up the key in the parameter context (described below) to get the current value. | |
514 It is an error if the key is not found in the parameter context. | |
515 3. If there are no subsequent segments, terminate and return current value | |
516 4. Else, match the next segment | |
517 5. Extract the symbol, string, or index from the segment as the key | |
518 6. Look up the key in current value and assign as new current value. If | |
519 the key is a symbol or string, the current value must be an object. | |
520 If the key is an index, the current value must be an array or string. | |
521 It is an error if the key does not match the required type, or the key is not found or out | |
522 of range. | |
523 7. Repeat steps 3-6 | |
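
The resolution algorithm can be sketched in Python without a Javascript engine. This is an illustrative approximation, not a conforming implementation: the names `resolve_reference` and `ParameterReferenceError` are hypothetical, `\w` is used as a stand-in for "Unicode alphanumeric" (it also accepts underscores), and quoted segments simply exclude quotes, backslashes, and `|`.

```python
import re

# One segment: .symbol | ['text'] | ["text"] | [digits]
SEGMENT = re.compile(r"""\.(\w+)|\['([^'\\|]*)'\]|\["([^"\\|]*)"\]|\[(\d+)\]""")


class ParameterReferenceError(Exception):
    """Raised when a reference is malformed or cannot be resolved."""


def resolve_reference(reference, context):
    match = re.fullmatch(r"\$\((\w+)(.*)\)", reference, re.S)
    if match is None:
        raise ParameterReferenceError(f"not a parameter reference: {reference!r}")
    key, rest = match.groups()
    # Steps 1-2: look up the leading symbol in the parameter context.
    if key not in context:
        raise ParameterReferenceError(f"{key!r} is not in the parameter context")
    current = context[key]
    pos = 0
    # Steps 3-7: consume segments until none remain.
    while pos < len(rest):
        segment = SEGMENT.match(rest, pos)
        if segment is None:
            raise ParameterReferenceError(f"bad segment at {rest[pos:]!r}")
        symbol, single, double, index = segment.groups()
        if index is not None:
            if not isinstance(current, (list, str)) or int(index) >= len(current):
                raise ParameterReferenceError(f"cannot index with [{index}]")
            current = current[int(index)]
        else:
            name = symbol if symbol is not None else (
                single if single is not None else double)
            if not isinstance(current, dict) or name not in current:
                raise ParameterReferenceError(f"field {name!r} not found")
            current = current[name]
        pos = segment.end()
    return current


context = {
    "inputs": {"reads": [{"basename": "a.fq"}], "my file": {"size": 3}},
    "self": None,
    "runtime": {"cores": 2},
}
print(resolve_reference("$(inputs.reads[0].basename)", context))  # a.fq
print(resolve_reference("$(inputs['my file'].size)", context))    # 3
```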
524 | |
525 The root namespace is the parameter context. The following parameters must | |
526 be provided: | |
527 | |
528 * `inputs`: The input object to the current Process. | |
529 * `self`: A context-specific value. The contextual values for 'self' are | |
530 documented for specific fields elsewhere in this specification. If | |
531 a contextual value of 'self' is not documented for a field, it | |
532 must be 'null'. | |
533 * `runtime`: An object containing configuration details. Specific to the | |
534 process type. An implementation may provide | |
535 opaque strings for any or all fields of `runtime`. These must be | |
536 filled in by the platform after processing the Tool but before actual | |
537 execution. Parameter references and expressions may only use the | |
538 literal string value of the field and must not perform computation on | |
539 the contents, except where noted otherwise. | |
540 | |
541 If the value of a field has no leading or trailing non-whitespace | |
542 characters around a parameter reference, the effective value of the field | |
543 becomes the value of the referenced parameter, preserving the return type. | |
544 | |
545 ### String interpolation | |
546 | |
547 If the value of a field has non-whitespace leading or trailing characters | |
548 around a parameter reference, it is subject to string interpolation. The | |
549 effective value of the field is a string containing the leading characters, | |
550 followed by the string value of the parameter reference, followed by the | |
551 trailing characters. The string value of the parameter reference is its | |
552 textual JSON representation with the following rules: | |
553 | |
554 * Strings are replaced by the literal text of the string, with any escaped | |
555 characters replaced by the literal characters they represent and | |
556 no leading or trailing quotes. | |
557 * Object entries are sorted by key | |
558 | |
559 Multiple parameter references may appear in a single field. This case | |
560 must be treated as a string interpolation. After interpolating the first | |
561 parameter reference, interpolation must be recursively applied to the | |
562 trailing characters to yield the final string value. | |
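
To make the interpolation rules concrete, here is a small Python sketch. It rests on several assumptions: `interpolate` and `to_interpolated_text` are hypothetical names, `evaluate` is a caller-supplied function that returns the value of one `$(...)` reference, references are assumed to contain no nested parentheses, escaping is ignored, and the compact JSON separators are a choice of this sketch since the text above does not fix the whitespace.

```python
import json
import re

REFERENCE = re.compile(r"\$\([^()]*\)")


def to_interpolated_text(value):
    """String value of a reference: literal text for strings, sorted JSON otherwise."""
    if isinstance(value, str):
        return value
    return json.dumps(value, sort_keys=True, separators=(",", ":"))


def interpolate(field, evaluate):
    references = REFERENCE.findall(field)
    if len(references) == 1 and field.strip() == references[0]:
        # Only whitespace surrounds the reference: keep the value's type.
        return evaluate(references[0])
    # Otherwise every reference is replaced by its string value.
    return REFERENCE.sub(lambda m: to_interpolated_text(evaluate(m.group(0))), field)


values = {
    "$(inputs.count)": 3,
    "$(inputs.name)": "sample",
    "$(inputs.options)": {"b": 1, "a": None},
}
print(interpolate("$(inputs.count)", values.__getitem__))
print(interpolate("$(inputs.name)_$(inputs.count).txt", values.__getitem__))
print(interpolate("opts=$(inputs.options)", values.__getitem__))
```

The first call returns the number `3` unchanged, while the other two return strings because text surrounds the references.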
563 | |
564 When text embedded in a CWL file represents code for another | |
565 programming language, the use of `$(...)` (and `${...}` in the case of | |
566 expressions) may conflict with the syntax of that language. For | |
567 example, when writing shell scripts, `$(...)` is used to execute a | |
568 command in a subshell and replace a portion of the command line with | |
569 the standard output of that command. | |
570 | |
571 The following escaping rules apply. The scanner makes a single pass | |
572 from start to end with 3-character lookahead. After performing a | |
573 replacement, scanning resumes at the next character following the | |
574 replaced substring. | |
575 | |
576 1. The substrings `\$(` and `\${` are replaced by `$(` and `${` | |
577 respectively. No parameter reference or expression is evaluated | |
578 or interpolated. | |
579 2. A double backslash `\\` is replaced by a single backslash `\`. | |
580 3. A substring starting with a backslash that does not match one of | |
581 the previous rules is left unchanged. | |
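
The three escaping rules can be sketched as a single left-to-right pass. This Python example is a partial illustration: `unescape` is a hypothetical name, and the sketch covers only the backslash handling, so an unescaped `$(...)` or `${...}` is copied through rather than evaluated.

```python
def unescape(text):
    """Apply the escaping rules in one pass, resuming after each replacement."""
    out = []
    i = 0
    while i < len(text):
        if text.startswith("\\$(", i) or text.startswith("\\${", i):
            # Rule 1: drop the backslash, keep `$(` or `${` as literal text.
            out.append(text[i + 1:i + 3])
            i += 3
        elif text.startswith("\\\\", i):
            # Rule 2: a double backslash becomes a single backslash.
            out.append("\\")
            i += 2
        else:
            # Rule 3 (and ordinary characters): copy unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


print(unescape(r"echo \$(date)"))  # echo $(date)
print(unescape(r"C:\\temp"))       # C:\temp
```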
582 | |
583 ## Expressions (Optional) | |
584 | |
585 An expression is a fragment of [Javascript/ECMAScript | |
586 5.1](http://www.ecma-international.org/ecma-262/5.1/) code evaluated by the | |
587 workflow platform to affect the inputs, outputs, or | |
588 behavior of a process. In the generic execution sequence, expressions may | |
589 be evaluated during step 6 (process setup), step 7 (execute process), | |
590 and/or step 8 (capture output). Expressions are distinct from regular | |
591 processes in that they are intended to modify the behavior of the workflow | |
592 itself rather than perform the primary work of the workflow. | |
593 | |
594 Expressions in CWL are an optional feature and are not required to be | |
595 implemented by all consumers of CWL documents. They should be used sparingly, | |
596 when there is no other way to achieve the desired outcome. Excessive use of | |
597 expressions may be a signal that other refactoring of the tools or workflows | |
598 would benefit the author, runtime, and users of the CWL document in question. | |
599 | |
600 To declare the use of expressions, the document must include the process | |
601 requirement `InlineJavascriptRequirement`. Expressions may be used in any | |
602 field permitting the pseudo-type `Expression`, as specified by this | |
603 document. | |
604 | |
605 Expressions are denoted by the syntax `$(...)` or `${...}`. A code | |
606 fragment wrapped in the `$(...)` syntax must be evaluated as an | |
607 [ECMAScript expression](http://www.ecma-international.org/ecma-262/5.1/#sec-11). A | |
608 code fragment wrapped in the `${...}` syntax must be evaluated as a | |
609 [ECMAScript function body](http://www.ecma-international.org/ecma-262/5.1/#sec-13) | |
610 for an anonymous, zero-argument function. Expressions must return a valid JSON | |
611 data type: one of null, string, number, boolean, array, object. Other return | |
612 values must result in a `permanentFailure`. Implementations must permit any | |
613 syntactically valid Javascript and account for nesting of parentheses or braces | |
614 and for strings that may contain parentheses or braces when scanning for | |
615 expressions. | |
616 | |
617 The runtime must include any code defined in the ["expressionLib" field of | |
618 InlineJavascriptRequirement](#InlineJavascriptRequirement) prior to | |
619 executing the actual expression. | |
620 | |
621 Before executing the expression, the runtime must initialize as global | |
622 variables the fields of the parameter context described above. | |
623 | |
624 The effective value of the field after expression evaluation follows the | |
625 same rules as parameter references discussed above. Multiple expressions | |
626 may appear in a single field. | |
627 | |
628 Expressions must be evaluated in an isolated context (a "sandbox") which | |
629 permits no side effects to leak outside the context. Expressions also must | |
630 be evaluated in [Javascript strict mode](http://www.ecma-international.org/ecma-262/5.1/#sec-4.2.2). | |
631 | |
632 The order in which expressions are evaluated is undefined except where | |
633 otherwise noted in this document. | |
634 | |
635 An implementation may choose to implement parameter references by | |
636 evaluating them as Javascript expressions. The results of evaluating | |
637 parameter references must be identical whether implemented by Javascript | |
638 evaluation or some other means. | |
639 | |
640 Implementations may apply other limits, such as process isolation, timeouts, | |
641 and operating system containers/jails to minimize the security risks associated | |
642 with running untrusted code embedded in a CWL document. | |
643 | |
644 Javascript exceptions thrown from a CWL expression must result in a | |
645 `permanentFailure` of the CWL process. | |
646 | |
647 ## Executing CWL documents as scripts | |
648 | |
649 By convention, a CWL document may begin with `#!/usr/bin/env cwl-runner` | |
650 and be marked as executable (the POSIX "+x" permission bits) to enable it | |
651 to be executed directly. A workflow platform may support this mode of | |
652 operation; if so, it must provide `cwl-runner` as an alias for the | |
653 platform's CWL implementation. | |
654 | |
655 A CWL input object document may similarly begin with `#!/usr/bin/env | |
656 cwl-runner` and be marked as executable. In this case, the input object | |
657 must include the field `cwl:tool` supplying an IRI to the default CWL | |
658 document that should be executed using the fields of the input object as | |
659 input parameters. | |
660 | |
661 The `cwl-runner` interface is required for conformance testing and is | |
662 documented in [cwl-runner.cwl](cwl-runner.cwl). | |
663 | |
664 ## Discovering CWL documents on a local filesystem | |
665 | |
666 To discover CWL documents look in the following locations: | |
667 | |
668 For each value in the `XDG_DATA_DIRS` environment variable (which is a `:` colon | |
669 separated list), check the `./commonwl` subdirectory. If `XDG_DATA_DIRS` is | |
670 unset or empty, then check using the default value for `XDG_DATA_DIRS`: | |
671 `/usr/local/share/:/usr/share/` (That is to say, check `/usr/share/commonwl/` | |
672 and `/usr/local/share/commonwl/`) | |
673 | |
674 Then check `$XDG_DATA_HOME/commonwl/`. | |
675 | |
676 If the `XDG_DATA_HOME` environment variable is unset, its default value is | |
677 `$HOME/.local/share` (That is to say, check `$HOME/.local/share/commonwl`) | |
678 | |
679 `$XDG_DATA_HOME` and `$XDG_DATA_DIRS` are from the [XDG Base Directory | |
680 Specification](http://standards.freedesktop.org/basedir-spec/basedir-spec-0.6.html) |
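
The search order described above can be sketched in Python. This is a minimal illustration under stated assumptions: `commonwl_search_dirs` is a hypothetical name, `HOME` is assumed to be set whenever `XDG_DATA_HOME` is not, and an empty `XDG_DATA_HOME` is treated the same as an unset one.

```python
import os
from pathlib import PurePosixPath

DEFAULT_DATA_DIRS = "/usr/local/share/:/usr/share/"


def commonwl_search_dirs(environ=None):
    """Return the `commonwl` directories to check, in the order described above."""
    env = os.environ if environ is None else environ
    # XDG_DATA_DIRS is a colon-separated list; fall back to the default if unset or empty.
    data_dirs = env.get("XDG_DATA_DIRS") or DEFAULT_DATA_DIRS
    dirs = [PurePosixPath(d) / "commonwl" for d in data_dirs.split(":") if d]
    # Then XDG_DATA_HOME, defaulting to $HOME/.local/share.
    data_home = env.get("XDG_DATA_HOME")
    if not data_home:
        data_home = str(PurePosixPath(env["HOME"]) / ".local" / "share")
    dirs.append(PurePosixPath(data_home) / "commonwl")
    return dirs


for directory in commonwl_search_dirs({"HOME": "/home/user"}):
    print(directory)
```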