comparison env/lib/python3.7/site-packages/cwltool/schemas/v1.0/concepts.md @ 0:26e78fe6e8c4 draft
"planemo upload commit c699937486c35866861690329de38ec1a5d9f783"
| author | shellac |
|---|---|
| date | Sat, 02 May 2020 07:14:21 -0400 |

## References to other specifications

**Javascript Object Notation (JSON)**: http://json.org

**JSON Linked Data (JSON-LD)**: http://json-ld.org

**YAML**: http://yaml.org

**Avro**: https://avro.apache.org/docs/1.8.1/spec.html

**Uniform Resource Identifier (URI) Generic Syntax**: https://tools.ietf.org/html/rfc3986

**Internationalized Resource Identifiers (IRIs)**:
https://tools.ietf.org/html/rfc3987

**Portable Operating System Interface (POSIX.1-2008)**: http://pubs.opengroup.org/onlinepubs/9699919799/

**Resource Description Framework (RDF)**: http://www.w3.org/RDF/

## Scope

This document describes CWL syntax, execution, and object model. It
is not intended to document any specific CWL implementation; however, it may
serve as a reference for the behavior of conforming implementations.

## Terminology

The terminology used to describe CWL documents is defined in the
Concepts section of the specification. The terms defined in the
following list are used in building those definitions and in describing the
actions of a CWL implementation:

**may**: Conforming CWL documents and CWL implementations are permitted but
not required to behave as described.

**must**: Conforming CWL documents and CWL implementations are required to behave
as described; otherwise they are in error.

**error**: A violation of the rules of this specification; results are
undefined. Conforming implementations may detect and report an error and may
recover from it.

**fatal error**: A violation of the rules of this specification; results are
undefined. Conforming implementations must not continue to execute the current
process and may report an error.

**at user option**: Conforming software may or must (depending on the modal verb in
the sentence) behave as described; if it does, it must provide users a means to
enable or disable the behavior described.

**deprecated**: Conforming software may implement a behavior for backwards
compatibility. Portable CWL documents should not rely on deprecated behavior.
Behavior marked as deprecated may be removed entirely from future revisions of
the CWL specification.

# Data model

## Data concepts

An **object** is a data structure equivalent to the "object" type in JSON,
consisting of an unordered set of name/value pairs (referred to here as
**fields**), where the name is a string and the value is a string, number,
boolean, array, or object.

A **document** is a file containing a serialized object, or an array of objects.

A **process** is a basic unit of computation which accepts input data,
performs some computation, and produces output data. Examples include
CommandLineTools, Workflows, and ExpressionTools.

An **input object** is an object describing the inputs to an invocation of
a process.

An **output object** is an object describing the output resulting from an
invocation of a process.

An **input schema** describes the valid format (required fields, data types)
for an input object.

An **output schema** describes the valid format for an output object.

**Metadata** is information about workflows, tools, or input items.

## Syntax

CWL documents must consist of an object or array of objects represented using
JSON or YAML syntax. Upon loading, a CWL implementation must apply the
preprocessing steps described in the
[Semantic Annotations for Linked Avro Data (SALAD) Specification](SchemaSalad.html).
An implementation may formally validate the structure of a CWL document using
SALAD schemas located at
https://github.com/common-workflow-language/common-workflow-language/tree/master/v1.0

## Identifiers

If an object contains an `id` field, that field is used to uniquely identify the
object in that document. The value of the `id` field must be unique over the
entire document. Identifiers may be resolved relative to the document
base and/or other identifiers, following the rules described in the
[Schema Salad specification](SchemaSalad.html#Identifier_resolution).

An implementation may choose to only honor references to object types for
which the `id` field is explicitly listed in this specification.

## Document preprocessing

An implementation must resolve [$import](SchemaSalad.html#Import) and
[$include](SchemaSalad.html#Include) directives as described in the
[Schema Salad specification](SchemaSalad.html).

Another transformation defined in Schema Salad is the simplification of data type definitions.
A type written as `<T>?` should be transformed to `[<T>, "null"]`.
A type written as `<T>[]` should be transformed to `{"type": "array", "items": <T>}`.
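
The two shorthand rules can be sketched in Python as follows. This is an illustrative helper (`expand_type` is a hypothetical name, not part of any CWL library), and its handling of stacked suffixes such as `File[]?`, where the outermost suffix is expanded first, is an assumption that goes beyond the two rules stated above.

```python
def expand_type(t):
    """Expand Schema Salad type shorthands in a single type expression.

    "T?"  becomes ["T", "null"]
    "T[]" becomes {"type": "array", "items": "T"}
    Assumption: stacked suffixes are expanded outermost first, so
    "File[]?" becomes [{"type": "array", "items": "File"}, "null"].
    """
    if isinstance(t, str):
        if t.endswith("?"):
            return [expand_type(t[:-1]), "null"]
        if t.endswith("[]"):
            return {"type": "array", "items": expand_type(t[:-2])}
    # Non-string types (records, enums, unions) are returned unchanged here.
    return t


print(expand_type("int?"))       # ['int', 'null']
print(expand_type("File[]"))     # {'type': 'array', 'items': 'File'}
print(expand_type("string[]?"))  # [{'type': 'array', 'items': 'string'}, 'null']
```

Recursing on the stripped type keeps the function small and lets each suffix be handled by exactly one of the two rules.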

## Extensions and metadata

Input metadata (for example, a lab sample identifier) may be represented within
a tool or workflow using input parameters which are explicitly propagated to
output. Future versions of this specification may define additional facilities
for working with input/output metadata.

Implementation extensions not required for correct execution (for example,
fields related to GUI presentation) and metadata about the tool or workflow
itself (for example, authorship for use in citations) may be provided as
additional fields on any object. Such extension fields must use a namespace
prefix listed in the `$namespaces` section of the document as described in the
[Schema Salad specification](SchemaSalad.html#Explicit_context).
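
As a rough illustration of the namespace rule, the sketch below flags fields on a raw (not yet Salad-preprocessed) document object whose prefix is not declared in `$namespaces`. The function name and its `known_fields` argument are hypothetical: a real validator would derive the set of spec-defined fields from the SALAD schema rather than take it as a parameter. Treating keys that contain `://` as already-absolute IRIs is also an assumption of this sketch.

```python
def undeclared_extension_fields(obj, known_fields):
    """Return extension field names whose prefix is not listed in $namespaces.

    obj: a CWL object as loaded from YAML/JSON, before Salad preprocessing.
    known_fields: hypothetical set of field names the spec defines for this type.
    """
    namespaces = obj.get("$namespaces", {})
    undeclared = []
    for key in obj:
        # Spec-defined fields and Salad directives ($namespaces, $schemas, ...)
        # are not extension fields.
        if key in known_fields or key.startswith("$"):
            continue
        # Assumption: keys that are already absolute IRIs need no prefix.
        if "://" in key:
            continue
        prefix, sep, _ = key.partition(":")
        # An extension field must be written "prefix:name" with a declared prefix.
        if not sep or prefix not in namespaces:
            undeclared.append(key)
    return undeclared


doc = {
    "class": "CommandLineTool",
    "$namespaces": {"edam": "http://edamontology.org/"},
    "baseCommand": "echo",
    "edam:topic": "example",  # declared prefix: accepted
    "foo:color": "red",       # undeclared prefix: flagged
    "gui_color": "red",       # no prefix at all: flagged
}
print(undeclared_extension_fields(doc, {"class", "baseCommand"}))
# ['foo:color', 'gui_color']
```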

Implementation extensions which modify execution semantics must be [listed in
the `requirements` field](#Requirements_and_hints).

# Execution model

## Execution concepts

A **parameter** is a named symbolic input or output of a process, with an
associated datatype or schema. During execution, values are assigned to
parameters to make the input object or output object used for concrete
process invocation.

A **CommandLineTool** is a process characterized by the execution of a
standalone, non-interactive program which is invoked on some input,
produces output, and then terminates.

A **workflow** is a process characterized by multiple subprocess steps,
where step outputs are connected to the inputs of downstream steps to
form a directed acyclic graph, and independent steps may run concurrently.

A **runtime environment** is the actual hardware and software environment when
executing a command line tool. It includes, but is not limited to, the
hardware architecture, hardware resources, operating system, software runtime
(if applicable, such as the specific Python interpreter or the specific Java
virtual machine), libraries, modules, packages, utilities, and data files
required to run the tool.

A **workflow platform** is a specific hardware and software implementation
capable of interpreting CWL documents and executing the processes specified by
the document. The responsibilities of the workflow platform may include
scheduling process invocation, setting up the necessary runtime environment,
making input data available, invoking the tool process, and collecting output.

A workflow platform may choose to only implement the Command Line Tool
Description part of the CWL specification.

It is intended that the workflow platform has broad leeway outside of this
specification to optimize use of computing resources and enforce policies
not covered by this specification. Some areas that are currently out of
scope for the CWL specification but may be handled by a specific workflow
platform include:

* Data security and permissions.
* Scheduling tool invocations on remote cluster or cloud compute nodes.
* Using virtual machines or operating system containers to manage the runtime
(except as described in [DockerRequirement](CommandLineTool.html#DockerRequirement)).
* Using remote or distributed file systems to manage input and output files.
* Transforming file paths.
* Determining if a process has previously been executed, and if so, skipping it
and reusing previous results.
* Pausing, resuming, or checkpointing processes or workflows.

Conforming CWL processes must not assume anything about the runtime
environment or workflow platform unless explicitly declared through the use
of [process requirements](#Requirements_and_hints).

## Generic execution process

The generic execution sequence of a CWL process (including workflows and
command line tools) is as follows:

1. Load, process, and validate a CWL document, yielding a process object.
2. Load the input object.
3. Validate the input object against the `inputs` schema for the process.
4. Validate that process requirements are met.
5. Perform any further setup required by the specific process type.
6. Execute the process.
7. Capture results of process execution into the output object.
8. Validate the output object against the `outputs` schema for the process.
9. Report the output object to the process caller.

## Requirements and hints

A **process requirement** modifies the semantics or runtime
environment of a process. If an implementation cannot satisfy all
requirements, or a requirement is listed which is not recognized by the
implementation, it is a fatal error and the implementation must not attempt
to run the process, unless overridden at user option.

A **hint** is similar to a requirement; however, it is not an error if an
implementation cannot satisfy all hints. The implementation may report a
warning if a hint cannot be satisfied.

Requirements are inherited. A requirement specified in a Workflow applies
to all workflow steps; a requirement specified on a workflow step will
apply to the process implementation of that step and any of its substeps.

If the same process requirement appears at different levels of the
workflow, the most specific instance of the requirement is used, that is,
an entry in `requirements` on a process implementation such as
CommandLineTool will take precedence over an entry in `requirements`
specified in a workflow step, and an entry in `requirements` on a workflow
step takes precedence over the workflow. Entries in `hints` are resolved
the same way.

Requirements override hints. If a process implementation provides a
process requirement in `hints` which is also provided in `requirements` by
an enclosing workflow or workflow step, the enclosing `requirements` takes
precedence.
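
The inheritance and precedence rules above can be sketched as a small resolver. This is an illustrative model only. It assumes each level (workflow, step, process implementation) is a dict whose `requirements` and `hints` are lists of objects keyed by `class`; the map form keyed by class is not handled. `effective_requirements` is a hypothetical name, and the sketch does not check whether a requirement is recognized or satisfiable, so that fatal-error check is left to the caller.

```python
def effective_requirements(levels):
    """Resolve requirements and hints for one process.

    levels: list of dicts ordered from outermost (workflow) to innermost
    (the process implementation, e.g. a CommandLineTool).
    Returns (requirements, hints), each a dict keyed by requirement class.
    """
    requirements, hints = {}, {}
    for level in levels:
        # Inner levels are more specific, so they overwrite outer entries.
        for req in level.get("requirements", []):
            requirements[req["class"]] = req
        for hint in level.get("hints", []):
            hints[hint["class"]] = hint
    # Requirements override hints of the same class, whatever the level.
    for cls in requirements:
        hints.pop(cls, None)
    return requirements, hints


workflow = {"requirements": [{"class": "ResourceRequirement", "coresMin": 4}]}
step = {"requirements": [{"class": "ResourceRequirement", "coresMin": 2}]}
tool = {"hints": [{"class": "ResourceRequirement", "coresMin": 1},
                  {"class": "DockerRequirement", "dockerPull": "debian:stable"}]}
reqs, hints = effective_requirements([workflow, step, tool])
print(reqs["ResourceRequirement"]["coresMin"])  # 2: the step beats the workflow
print(sorted(hints))  # ['DockerRequirement']: the tool's ResourceRequirement hint is overridden
```

A single outer-to-inner pass with overwriting dictionaries gives "most specific wins" directly; the final loop then applies "requirements override hints".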

## Parameter references

Parameter references are denoted by the syntax `$(...)` and may be used in any
field permitting the pseudo-type `Expression`, as specified by this document.
Conforming implementations must support parameter references. Parameter
references use the following subset of
[Javascript/ECMAScript 5.1](http://www.ecma-international.org/ecma-262/5.1/)
syntax, but they are designed to not require a Javascript engine for evaluation.

In the following BNF grammar, character classes and grammar rules are denoted
in '{}', '-' denotes exclusion from a character class, '(())' denotes grouping,
'|' denotes alternates, trailing '*' denotes zero or more repeats, '+' denotes
one or more repeats, '/' escapes these special characters, and all other
characters are literal values.

<p>
<table class="table">
<tr><td>symbol:: </td><td>{Unicode alphanumeric}+</td></tr>
<tr><td>singleq:: </td><td>[' (( {character - '} | \' ))* ']</td></tr>
<tr><td>doubleq:: </td><td>[" (( {character - "} | \" ))* "]</td></tr>
<tr><td>index:: </td><td>[ {decimal digit}+ ]</td></tr>
<tr><td>segment:: </td><td>. {symbol} | {singleq} | {doubleq} | {index}</td></tr>
<tr><td>parameter reference::</td><td>$( {symbol} {segment}*)</td></tr>
</table>
</p>

Use the following algorithm to resolve a parameter reference:

1. Match the leading symbol as the key.
2. Look up the key in the parameter context (described below) to get the current value.
It is an error if the key is not found in the parameter context.
3. If there are no subsequent segments, terminate and return the current value.
4. Else, match the next segment.
5. Extract the symbol, string, or index from the segment as the key.
6. Look up the key in the current value and assign the result as the new current value. If
the key is a symbol or string, the current value must be an object.
If the key is an index, the current value must be an array or string.
It is an error if the key does not match the required type, or if the key is not found or is out
of range.
7. Repeat steps 3-6.

The root namespace is the parameter context. The following parameters must
be provided:

* `inputs`: The input object to the current Process.
* `self`: A context-specific value. The contextual values for `self` are
documented for specific fields elsewhere in this specification. If
a contextual value of `self` is not documented for a field, it
must be `null`.
* `runtime`: An object containing configuration details, specific to the
process type. An implementation may provide
opaque strings for any or all fields of `runtime`. These must be
filled in by the platform after processing the Tool but before actual
execution. Parameter references and expressions may only use the
literal string value of the field and must not perform computation on
the contents, except where noted otherwise.
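
The grammar and resolution algorithm above can be sketched in Python without a Javascript engine, with a plain dict standing in for the parameter context. This is an illustrative sketch, not a reference implementation. `resolve_param_ref` is a hypothetical name, Python's `\w` approximates `{Unicode alphanumeric}` (it also accepts `_`), and every error case is reported as a `LookupError` or `ValueError`.

```python
import re

# Approximation: Python's \w matches Unicode alphanumerics plus "_".
SEGMENT = re.compile(
    r"\.(\w+)"                    # . {symbol}
    r"|\['((?:\\'|[^'])*)'\]"     # {singleq}: ['...'] with \' escapes
    r'|\["((?:\\"|[^"])*)"\]'     # {doubleq}: ["..."] with \" escapes
    r"|\[(\d+)\]"                 # {index}:   [123]
)
REFERENCE = re.compile(r"\$\((\w+)(.*)\)", re.DOTALL)


def resolve_param_ref(ref, context):
    """Resolve a reference such as $(inputs.reads['basename']) against context."""
    match = REFERENCE.fullmatch(ref)
    if not match:
        raise ValueError(f"not a parameter reference: {ref!r}")
    key, rest = match.groups()              # step 1: the leading symbol is the key
    if key not in context:                  # step 2: look it up in the context
        raise LookupError(f"{key!r} is not in the parameter context")
    value = context[key]
    pos = 0
    while pos < len(rest):                  # step 3: stop when no segments remain
        seg = SEGMENT.match(rest, pos)      # step 4: match the next segment
        if not seg:
            raise ValueError(f"invalid segment at {rest[pos:]!r}")
        pos = seg.end()
        symbol, single, double, index = seg.groups()
        if index is not None:               # steps 5-6: index into an array or string
            i = int(index)
            if not isinstance(value, (list, str)) or i >= len(value):
                raise LookupError(f"index {i} is not valid here")
            value = value[i]
        else:                               # steps 5-6: symbol/string key into an object
            if symbol is not None:
                name = symbol
            elif single is not None:
                name = single.replace("\\'", "'")
            else:
                name = double.replace('\\"', '"')
            if not isinstance(value, dict) or name not in value:
                raise LookupError(f"key {name!r} not found")
            value = value[name]
    return value                            # step 7 is the loop itself


context = {
    "inputs": {"reads": {"basename": "a.fastq", "size": 1024}, "tags": ["x", "y"]},
    "self": None,
    "runtime": {"outdir": "/tmp/out", "cores": 4},
}
print(resolve_param_ref("$(inputs.reads.basename)", context))  # a.fastq
print(resolve_param_ref("$(inputs['tags'][1])", context))      # y
print(resolve_param_ref("$(runtime.cores)", context))          # 4
```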

If the value of a field has no leading or trailing non-whitespace
characters around a parameter reference, the effective value of the field
becomes the value of the referenced parameter, preserving the return type.

If the value of a field has non-whitespace leading or trailing characters
around a parameter reference, it is subject to string interpolation. The
effective value of the field is a string containing the leading characters,
followed by the string value of the parameter reference, followed by the
trailing characters. The string value of the parameter reference is its
textual JSON representation with the following rules:

* Leading and trailing quotes are stripped from strings.
* Object entries are sorted by key.

Multiple parameter references may appear in a single field. This case
must be treated as a string interpolation. After interpolating the first
parameter reference, interpolation must be recursively applied to the
trailing characters to yield the final string value.
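
The whole-field and interpolation rules can be illustrated with the Python sketch below, under several assumptions. The `resolve` callback stands in for a resolver implementing the algorithm above. The regular expression approximates the BNF grammar, with `\w` in place of `{Unicode alphanumeric}`. `json.dumps(..., sort_keys=True)` produces the textual JSON representation, and its exact spacing is a choice of this sketch rather than something the text prescribes. References are interpolated left to right, which the sketch assumes matches recursively interpolating the trailing characters.

```python
import json
import re

# One parameter reference per the BNF grammar (\w approximates
# {Unicode alphanumeric}).
PARAM_REF = re.compile(
    r"\$\(\w+"
    r"(?:\.\w+"                   # . {symbol}
    r"|\['(?:\\'|[^'])*'\]"       # {singleq}
    r'|\["(?:\\"|[^"])*"\]'       # {doubleq}
    r"|\[\d+\])*"                 # {index}
    r"\)"
)


def string_value(value):
    """Textual JSON form used during interpolation."""
    if isinstance(value, str):
        return value  # leading and trailing quotes stripped
    return json.dumps(value, sort_keys=True)  # object entries sorted by key


def evaluate_field(text, resolve):
    """Compute the effective value of a field containing parameter references."""
    refs = list(PARAM_REF.finditer(text))
    if not refs:
        return text
    first = refs[0]
    # Only whitespace around a single reference: keep the referenced value's type.
    if len(refs) == 1 and not text[:first.start()].strip() and not text[first.end():].strip():
        return resolve(first.group(0))
    # Otherwise interpolate every reference into a string, left to right.
    parts, pos = [], 0
    for ref in refs:
        parts.append(text[pos:ref.start()])
        parts.append(string_value(resolve(ref.group(0))))
        pos = ref.end()
    parts.append(text[pos:])
    return "".join(parts)


# Hypothetical stand-in resolver: a lookup table of already-resolved values.
values = {"$(inputs.n)": 3, "$(inputs.name)": "sample",
          "$(inputs.meta)": {"b": 1, "a": [True, None]}}
print(evaluate_field(" $(inputs.n) ", values.__getitem__))                   # 3 (an int)
print(evaluate_field("$(inputs.name).$(inputs.n).txt", values.__getitem__))  # sample.3.txt
print(evaluate_field("meta=$(inputs.meta)", values.__getitem__))             # meta={"a": [true, null], "b": 1}
```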

## Expressions

An expression is a fragment of [Javascript/ECMAScript
5.1](http://www.ecma-international.org/ecma-262/5.1/) code evaluated by the
workflow platform to affect the inputs, outputs, or
behavior of a process. In the generic execution sequence, expressions may
be evaluated during step 5 (process setup), step 6 (execute process),
and/or step 7 (capture output). Expressions are distinct from regular
processes in that they are intended to modify the behavior of the workflow
itself rather than perform the primary work of the workflow.

To declare the use of expressions, the document must include the process
requirement `InlineJavascriptRequirement`. Expressions may be used in any
field permitting the pseudo-type `Expression`, as specified by this
document.

Expressions are denoted by the syntax `$(...)` or `${...}`. A code
fragment wrapped in the `$(...)` syntax must be evaluated as an
[ECMAScript expression](http://www.ecma-international.org/ecma-262/5.1/#sec-11). A
code fragment wrapped in the `${...}` syntax must be evaluated as an
[ECMAScript function body](http://www.ecma-international.org/ecma-262/5.1/#sec-13)
for an anonymous, zero-argument function. Expressions must return a valid JSON
data type: one of null, string, number, boolean, array, object. Other return
values must result in a `permanentFailure`. Implementations must permit any
syntactically valid Javascript and, when scanning for expressions, must account
for nested parentheses or braces and for strings that may contain parentheses
or braces.
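
The scanning requirement can be sketched as a small bracket-matching scanner. This is a simplified illustration. It tracks `()`, `{}`, `[]` and single- or double-quoted strings with backslash escapes, but not comments or regular-expression literals, and the names `scan_expressions` and `to_javascript` are hypothetical. `to_javascript` shows one possible way to turn each fragment into source text for an external ECMAScript 5.1 engine. It wraps both forms in an anonymous zero-argument function so that the strict mode this section requires can be enabled with a `"use strict"` directive.

```python
PAIRS = {"(": ")", "{": "}", "[": "]"}


def scan_expressions(text):
    """Yield (start, end, kind, code) for each $(...) or ${...} in text.

    kind is "(" for $(...) and "{" for ${...}; code excludes the delimiters.
    Simplification: comments and regex literals are not recognized.
    """
    i = 0
    while i < len(text) - 1:
        if text[i] != "$" or text[i + 1] not in "({":
            i += 1
            continue
        kind = text[i + 1]
        stack = [PAIRS[kind]]  # closing characters still expected
        quote = None           # active quote character while inside a string
        j = i + 2
        while j < len(text) and stack:
            c = text[j]
            if quote:
                if c == "\\":
                    j += 1     # skip the escaped character
                elif c == quote:
                    quote = None
            elif c in "'\"":
                quote = c
            elif c in PAIRS:
                stack.append(PAIRS[c])
            elif c in ")}]":
                if c != stack[-1]:
                    raise ValueError(f"mismatched {c!r} at offset {j}")
                stack.pop()
            j += 1
        if stack:
            raise ValueError(f"unterminated expression at offset {i}")
        yield i, j, kind, text[i + 2:j - 1]
        i = j


def to_javascript(kind, code):
    """Wrap a fragment as a strict-mode, zero-argument function call."""
    # The newlines stop a trailing // comment in the fragment from
    # swallowing the closing syntax.
    body = "return (" + code + "\n);" if kind == "(" else code + "\n"
    return '(function(){"use strict";' + body + "})()"


text = 'prefix $(inputs.s + ")") and ${ return {"n": (1 + 2)}; }'
for start, end, kind, code in scan_expressions(text):
    print(kind, repr(code))
# ( 'inputs.s + ")"'
# { ' return {"n": (1 + 2)}; '
```

Using a stack of expected closers, rather than a single depth counter, also catches mismatched brackets such as `$(a]`.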

The runtime must include any code defined in the ["expressionLib" field of
InlineJavascriptRequirement](#InlineJavascriptRequirement) prior to
executing the actual expression.

Before executing the expression, the runtime must initialize as global
variables the fields of the parameter context described above.

The effective value of the field after expression evaluation follows the
same rules as parameter references discussed above. Multiple expressions
may appear in a single field.

Expressions must be evaluated in an isolated context (a "sandbox") which
permits no side effects to leak outside the context. Expressions also must
be evaluated in [Javascript strict mode](http://www.ecma-international.org/ecma-262/5.1/#sec-4.2.2).

The order in which expressions are evaluated is undefined except where
otherwise noted in this document.

An implementation may choose to implement parameter references by
evaluating them as Javascript expressions. The results of evaluating
parameter references must be identical whether implemented by Javascript
evaluation or some other means.

Implementations may apply other limits, such as process isolation, timeouts,
and operating system containers/jails to minimize the security risks associated
with running untrusted code embedded in a CWL document.

Exceptions thrown from an expression must result in a `permanentFailure` of the
process.

## Executing CWL documents as scripts

By convention, a CWL document may begin with `#!/usr/bin/env cwl-runner`
and be marked as executable (the POSIX "+x" permission bits) to enable it
to be executed directly. A workflow platform may support this mode of
operation; if so, it must provide `cwl-runner` as an alias for the
platform's CWL implementation.

A CWL input object document may similarly begin with `#!/usr/bin/env cwl-runner`
and be marked as executable. In this case, the input object
must include the field `cwl:tool` supplying an IRI to the default CWL
document that should be executed using the fields of the input object as
input parameters.

## Discovering CWL documents on a local filesystem

To discover CWL documents, look in the following locations:

`/usr/share/commonwl/`

`/usr/local/share/commonwl/`

`$XDG_DATA_HOME/commonwl/` (usually `$HOME/.local/share/commonwl`)

`$XDG_DATA_HOME` is from the [XDG Base Directory
Specification](http://standards.freedesktop.org/basedir-spec/basedir-spec-0.6.html).
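
A minimal Python sketch of building the search list follows. It applies the XDG Base Directory fallback of `$HOME/.local/share` when `XDG_DATA_HOME` is unset or empty. As an additional assumption not stated above, it recognizes CWL documents by a `.cwl` file extension. Both function names are hypothetical.

```python
import os
from pathlib import Path


def cwl_search_paths(env=None):
    """Return the local directories in which to look for CWL documents."""
    env = os.environ if env is None else env
    # XDG Base Directory rule: fall back to $HOME/.local/share when
    # XDG_DATA_HOME is unset or empty.
    data_home = env.get("XDG_DATA_HOME") or os.path.join(
        env.get("HOME") or os.path.expanduser("~"), ".local", "share"
    )
    return [
        Path("/usr/share/commonwl"),
        Path("/usr/local/share/commonwl"),
        Path(data_home) / "commonwl",
    ]


def find_cwl_documents(env=None, pattern="*.cwl"):
    """List matching files under every search directory that exists.

    Assumption: documents are identified by the .cwl extension.
    """
    found = []
    for directory in cwl_search_paths(env):
        if directory.is_dir():
            found.extend(sorted(directory.rglob(pattern)))
    return found


print(cwl_search_paths({"HOME": "/home/alice"})[-1])
# /home/alice/.local/share/commonwl (on a POSIX system)
```

Passing the environment as an optional mapping keeps the lookup testable without modifying `os.environ`.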
