Apply the ChEMBL chemical curation pipeline to a set of chemical structures in SDF
format. The pipeline is described in detail in the citation provided (Bento et al.,
2020).
The pipeline consists of three components:
- a Standardizer, which formats compounds according to defined rules and conventions, based mostly on FDA/IUPAC guidelines.
- a GetParent component that removes any salts and solvents from the compound to create its parent.
- a Checker, which tests the validity of chemical structures and flags any serious errors. Each error is given a penalty score from 0 (least serious) to 10 (most serious); the highest score found for a molecule is stored in the SDF field <MaxPenaltyScore>, and a list of all errors encountered is recorded under <IssueMessages>.
One or more of these components can be applied in a single Galaxy job.
Input
One or more molecules in MOL/SDF format.
Output
A MOL/SDF file containing the processed molecules.
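
When the Checker has been run, the <MaxPenaltyScore> and <IssueMessages> fields can be read back from the output file with a few lines of Python. The following is a minimal sketch using only the standard library. It assumes the fields are written as ordinary SDF data items (a "> <FieldName>" header line, the value, then a blank line); the function names, the threshold and the example records are hypothetical and are not part of the tool.

```python
def read_sdf_fields(sdf_text):
    """Return one dict of data fields per record in an SDF string."""
    records = []
    # Records in an SDF file are terminated by a line containing "$$$$".
    for block in sdf_text.split("$$$$"):
        if not block.strip():
            continue
        fields = {}
        lines = block.splitlines()
        i = 0
        while i < len(lines):
            line = lines[i].strip()
            # A data header looks like ">  <FieldName>".
            if line.startswith(">") and "<" in line and line.endswith(">"):
                name = line[line.index("<") + 1:line.rindex(">")]
                value_lines = []
                i += 1
                # The value runs until the next blank line.
                while i < len(lines) and lines[i].strip():
                    value_lines.append(lines[i].strip())
                    i += 1
                fields[name] = "\n".join(value_lines)
            i += 1
        records.append(fields)
    return records


def flagged(records, threshold):
    """Keep records whose MaxPenaltyScore is at least `threshold`."""
    kept = []
    for fields in records:
        # Treat a missing or empty score as 0 (an assumption of this sketch).
        score = int(float(fields.get("MaxPenaltyScore") or "0"))
        if score >= threshold:
            kept.append(fields)
    return kept


# Two made-up records with empty structures, for illustration only.
example = """mol_1
  hypothetical

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
>  <MaxPenaltyScore>
0

>  <IssueMessages>

$$$$
mol_2
  hypothetical

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
>  <MaxPenaltyScore>
6

>  <IssueMessages>
hypothetical issue message

$$$$
"""

records = read_sdf_fields(example)
serious = flagged(records, threshold=5)
print(len(records), "records,", len(serious), "at or above the threshold")
```

A cheminformatics toolkit would normally be used to read the structures themselves; this sketch only looks at the data fields, which is enough to filter molecules by penalty score.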