Tutorial

Note

Even though the data for this tutorial are already in the docs/examples folder, you have to set at least cns_executable and host_executable to the path of your CNS executable (not supplied in this package) in the configuration file. Otherwise, this example will not work.

Structure calculation with EC restraints

We show here an example of de novo structure prediction from GREMLIN contacts combined with secondary structure prediction. It uses the ariaec Command Line Interface (CLI), the tool for converting and analyzing contact map information. The files related to this example can be found in the docs folder.

Configuration file

All the parameters of the ariaec commands are stored in a configuration file in INI format. Whenever you need to override the default parameters, you can supply another configuration file that contains only the updated values. A configuration file does not need to list every parameter to be valid.
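The override mechanism can be pictured with Python's standard configparser module: values read from a later source replace those read earlier, and keys missing from the later source keep their earlier value. This is a minimal sketch of that idea, not the ariaec implementation. The section name main, the option example_parameter and the default values are hypothetical; only the cns_executable and host_executable keys come from this tutorial.

```python
import configparser

# Hypothetical defaults: the real ariaec section names and values may differ.
DEFAULTS = """
[main]
cns_executable = cns
host_executable = cns
example_parameter = 42
"""

# A user file only needs the parameters that change.
USER_OVERRIDES = """
[main]
cns_executable = /opt/cns/bin/cns
host_executable = /opt/cns/bin/cns
"""


def load_settings(*ini_texts):
    """Read INI texts in order; later texts override earlier ones key by key."""
    parser = configparser.ConfigParser()
    for text in ini_texts:
        parser.read_string(text)
    return parser


settings = load_settings(DEFAULTS, USER_OVERRIDES)
print(settings["main"]["cns_executable"])     # overridden by the user file
print(settings["main"]["example_parameter"])  # kept from the defaults
```

With files on disk, parser.read(["defaults.ini", "config.ini"]) gives the same layering, since files are read in list order.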

A more detailed description of the parameters is given in the Configuration section.

Restraints & project conversion

The main command of this interface is ariaec setup, which converts EC data into restraints and creates an ARIA project XML file. The remaining steps are then the same as for any ARIA project.

(venv) [user@host tmp] > ariaec setup bpt1/data/BPT1_BOVIN.fa bpt1/data/BPT1_BOVIN_contacts.gremlin.out -t gremlin -s bpt1/data/BPT1_BOVIN.indextableplus -o /tmp -c bpt1/data/config.ini
================================================================================

                         ARIA Evolutive Contact toolbox

================================================================================

INFO     Initialize settings
INFO     Updating settings according to config file
INFO     Making output directories
INFO     Reading fasta file /tmp/bpt1/data/bpt1_bovin.fa
INFO     Amino acid sequence:   FCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCG
INFO     Checking if file /tmp/bpt1/data/BPT1_BOVIN.indextableplus correspond to indextableplus format
INFO     Format type correct (indextableplus)
INFO     Reading secondary structure file /tmp/bpt1/data/BPT1_BOVIN.indextableplus [indextableplus]
INFO     Loading ss dist file
INFO     Reading distance file /.conda/envs/aria/lib/python2.7/site-packages/aria/conbox/data/ss_dist.txt
INFO     Align secondary structure sequence with protein sequence
INFO     Reading /tmp/bpt1/data/BPT1_BOVIN_contacts.gremlin.out file
INFO     Filtering gremlin contact map
INFO     ...Position filter
INFO     Removed 21 contacts.
INFO     ...Conservation filter
INFO     Conservation filter only works with indextableplus files !
INFO     ...Secondary structure clash filter
INFO     Removed 1 contacts.
INFO     ...Disulfure bridge unicity filter
INFO     Removed 11 contacts.
INFO     Setting contact number with treshold 1.0
INFO     Update gremlin maplot
INFO     Update gremlin scoremap
INFO     Select top 53 contacts according to scoremap
writing to the file: /tmp/etc/BPT1_BOVIN.seq
INFO     Load molecule file and convert it into xml format
INFO     [SequenceList]: reading sequence /tmp/etc/BPT1_BOVIN.seq

                         reading /tmp/etc/BPT1_BOVIN.seq
INFO     Writing tbl files ...
INFO        Dihedral restraints for secondary structures (/tmp/tbl/BPT1_BOVIN_dihed.tbl)
INFO        Secondary structure restraints (/tmp/tbl/BPT1_BOVIN_ssdist.tbl)
INFO        Helix bond restraints (/tmp/tbl/BPT1_BOVIN_hbond.tbl)
INFO     Writing gremlin ARIA XML distance restraints
INFO     Using contact scores as selection criteria
INFO     Selecting 53 contacts
INFO     0%|          | 0/53 [00:00<?, ?it/s]
INFO     100%|##########| 53/53 [00:00<00:00, 2549.17it/s]
INFO     Write 53 xml distance restraints in /tmp/xml/BPT1_BOVIN_gremlin.xml
INFO     Loading aria template file /.conda/envs/aria/lib/python2.7/site-packages/aria/conbox/templates/aria_project_v2.3.7.xml
INFO     Writing ARIA project file (/tmp/ariaproject.xml)
INFO     Generate contact file (/tmp/etc/BPT1_BOVIN_gremlin_filtered.contact.txt)
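The log above suggests that the number of contacts kept is the sequence length multiplied by the threshold (53 residues × 1.0 = 53 contacts). The sketch below illustrates that selection step together with a sequence-separation filter, which is one plausible reading of the "Position filter" step. The function name, the (i, j, score) tuple layout and the default minimum separation of 6 residues are assumptions, not ariaec's documented behavior; the conservation, secondary structure and disulfide filters are not modeled.

```python
def filter_and_select(contacts, seq_length, threshold=1.0, min_separation=6):
    """Keep contacts with |i - j| >= min_separation, then the top N by score.

    contacts: iterable of (i, j, score) tuples with 1-based residue indices.
    N is int(seq_length * threshold), e.g. 53 for BPT1_BOVIN with threshold 1.0.
    min_separation is an assumed value, not taken from ariaec.
    """
    # Assumed position filter: drop pairs too close in sequence and pairs
    # pointing outside the sequence.
    kept = [
        (i, j, s)
        for i, j, s in contacts
        if abs(i - j) >= min_separation
        and 1 <= i <= seq_length
        and 1 <= j <= seq_length
    ]
    # Rank by decreasing score and keep the top N.
    n_top = int(seq_length * threshold)
    kept.sort(key=lambda c: c[2], reverse=True)
    return kept[:n_top]


# Hypothetical scores, for illustration only.
example = [(1, 2, 0.9), (3, 25, 0.8), (4, 39, 0.7), (10, 33, 0.6), (5, 60, 0.95)]
print(filter_and_select(example, seq_length=53, threshold=0.05))
```

Here (1, 2) is too close in sequence, (5, 60) lies outside a 53-residue sequence, and int(53 × 0.05) = 2 contacts are kept: (3, 25) and (4, 39).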

Build infrastructure

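Before running the structure calculation pipeline, ARIA needs to build the project infrastructure (protocols, directory tree and local copies of the data files). This is done with the -s flag of the aria2 command: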

(venv) [user@host tmp] > aria2 -s bpt1/out/ariaproject.xml
Loading project "ariaproject.xml"...
Setting-up bpt1/out/ariaproject.xml (run: 1)
INFO     [Project]: Protocols copied.
INFO     [Project]: Directory tree created.
INFO     [Project]: Copying data files into local data-directory...

Done.

For running ARIA: aria2 bpt1/out/ariaproject.xml

Running ARIA

We then launch the structure calculation with the aria2 command, this time without the -s flag:

(venv) [user@host tmp] > aria2 --no-test bpt1/out/ariaproject.xml
.. .......................................................................... ..
..          ARIA -- Ambiguous Restraints for Iterative Assignment             ..
..                                                                            ..
..                A software for automated NOE assignment                     ..
..                                                                            ..
..                               Version 2.3                                  ..
..                                                                            ..
..                                                                            ..
.. Copyright (C) Benjamin Bardiaux, Michael Habeck, Therese Malliavin,        ..
..              Wolfgang Rieping, and Michael Nilges                          ..
..                                                                            ..
.. All rights reserved.                                                       ..
..                                                                            ..
.. NO WARRANTY. This software package is provided 'as is' without warranty of ..
.. any kind, expressed or implied, including, but not limited to the implied  ..
.. warranties of merchantability and fitness for a particular purpose or      ..
.. a warranty of non-infringement.                                            ..
..                                                                            ..
.. Distribution of substantively modified versions of this module is          ..
.. prohibited without the explicit permission of the copyright holders.       ..
..                                                                            ..
.. .......................................................................... ..
Loading project "ariaproject.xml"...
INFO     [Project]: Temporary directory has been set to /tmp/BPT1_BOVIN/gremlin/
                    aria_temp.tmpAVTAwy1562832135
INFO     [Project]: Host list check has been disabled.
INFO     [Project]: -------------------- Reading data --------------------
INFO     [Project]: Cache is enabled.
INFO     [Project]: Cache file does not exist. Creating new file.
INFO     [Project]: Reading molecule definition /tmp/bpt1/out/
                    run1/data/sequence/BPT1_BOVIN.xml.
INFO     [Project]: Data files read.
INFO     [Project]: Data files cached.
INFO     [Project]: ------------------- Filtering input data -------------------
INFO     [NOESYSpectrumFilter.TP]: Spectrum filter report written to file "/tmp
                                   /bpt1/out/run1/data/spectra/peak_list.filtered"
INFO     [Project]: ---------------- Preparing structure engine ----------------
INFO     [CNS]: Sequence PDB-file written.
INFO     [CNS]: PSF-file has been created.
INFO     [CNS]: Template PDB-file has been created.
INFO     [Project]: Starting ARIA main protocol on ...
INFO     [Project]: -------------------- Assigning spectra --------------------
INFO     [Protocol]: ---------------------- Iteration 0 -----------------------
INFO     [Protocol]: Calibrating spectrum "gremlin"...
INFO     [Protocol]: Waiting for completion of structure calculation...
...
INFO    [Project]: ARIA run completed at ...

Warning

In most cases, the --no-test flag is needed to disable the initial dry run of the commands (specified in the host list) used to launch the structure calculations. These dry runs are not compatible with many host configurations.
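The setup, build and run steps of this example can be chained in a small driver script. The Python 3 sketch below only assembles the command lines used in this tutorial, with a single output directory for every step, and executes them with subprocess.run when dry_run is False. The helper names build_pipeline and run_pipeline are hypothetical; no ariaec or aria2 option beyond those shown above is used.

```python
import subprocess


def build_pipeline(fasta, contacts, ss_file, config, outdir, contact_type="gremlin"):
    """Return the ariaec/aria2 command lines of this tutorial, in order."""
    project = outdir.rstrip("/") + "/ariaproject.xml"
    return [
        # 1. Convert EC data and write the ARIA project file.
        ["ariaec", "setup", fasta, contacts, "-t", contact_type,
         "-s", ss_file, "-o", outdir, "-c", config],
        # 2. Build the project infrastructure.
        ["aria2", "-s", project],
        # 3. Run ARIA without the dry run of the host-list commands.
        ["aria2", "--no-test", project],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command and, unless dry_run is True, execute it."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)


cmds = build_pipeline(
    "bpt1/data/BPT1_BOVIN.fa",
    "bpt1/data/BPT1_BOVIN_contacts.gremlin.out",
    "bpt1/data/BPT1_BOVIN.indextableplus",
    "bpt1/data/config.ini",
    "bpt1/out",
)
run_pipeline(cmds)  # prints the commands without running them
```

Keeping dry_run=True as the default makes it easy to check the generated command lines before launching a long calculation.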

Structure calculation with EC & NMR restraints

Alternatively, you can start from an existing ARIA project containing NMR restraints and run ariaec setup with the -p flag to add EC restraints to it.

(venv) [user@host tmp] > ariaec setup malecoli/data/MALE_ECOLI.fa malecoli/data/MALE_ECOLI_contacts.evfold.out -t evcoupling -o malecoli/out -c malecoli/data/config.ini -p malecoli/data/ariaproject_nmr.xml
================================================================================

                         ARIA Evolutive Contact toolbox

================================================================================

INFO     Initialize settings
INFO     Updating settings according to config file
INFO     Making output directories
INFO     Reading fasta file /c7/home/fallain/tmp/malecoli/data/male_ecoli.fa
INFO     Amino acid sequence:   KIEEGKLVIWINGDKGYNGLAEVGKKFEKDTGIKVTVEHPDKLEEKFPQVAATGDGPDIIFWAHDRFGGYAQSGLLAEITPDKAFQDKLYPFTWDAVRYNGKLIAYPIAVEALSLIYNKDLLPNPPKTWEEIPALDKELKAKGKSALMFNLQEPYFTWPLIAADGGYAFKYENGKYDIKDVGVDNAGAKAGLTFLVDLIKNKHMNADTDYSIAEAAFNKGETAMTINGPWAWSNIDTSKVNYGVTVLPTFKGQPSKPFVGVLSAGINAASPNKELAKEFLENYLLTDEGLEAVNKDKPLGAVALKSYEEELAKDPRIAATMENAQKGEIMPNIPQMSAFWYAVRTAVINAASGRQTVDEALKDAQTRITK
INFO     The file format evcoupling is not supported by the conkit plugin. Switching to homemade parsers.
INFO     Checking if file /c7/home/fallain/tmp/malecoli/data/MALE_ECOLI_contacts.evfold.out correspond to our definition of evcoupling format
INFO     Reading /c7/home/fallain/tmp/malecoli/data/MALE_ECOLI_contacts.evfold.out file
INFO     Loading contact file
INFO     Alignment of sequence in contact file (evcoupling) with reference (MALE_ECOLI.fa)
****TGARILALSALTTMMFSASALAKIEEGKLVIWINGDKGYNGLAEVGKKFEKDTGIKVTVEHPDKLEEKFPQVAATGDGPDIIFWAHDRFGGYAQSGLLAEITPDKAFQDKLYPFTWDAVRYNGKLIAYPIAVEALSLIYNKDLLPNPPKTWEEIPALDKELKAKGKSALMFNLQEPYFTWPLIAADGGYAFKYENGKY***DVGVDNAGAKAGLTFLVDLIKNKHMNADTDYSIAEAAFNKGETAMTINGPWAWSNIDTSKVNYGVTVLPTFKGQPSKPFVGVLSAGINAASPNKELAKEFLE*YLLTDEGLEAVNKDKPLGAVALKSYEEELAKDPRIAATMENAQKGEIMPNIPQMSAFWYAVRTAVINAASGRQTVDEALKDAQTRI--
                          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||...||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||.||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
--------------------------KIEEGKLVIWINGDKGYNGLAEVGKKFEKDTGIKVTVEHPDKLEEKFPQVAATGDGPDIIFWAHDRFGGYAQSGLLAEITPDKAFQDKLYPFTWDAVRYNGKLIAYPIAVEALSLIYNKDLLPNPPKTWEEIPALDKELKAKGKSALMFNLQEPYFTWPLIAADGGYAFKYENGKYDIKDVGVDNAGAKAGLTFLVDLIKNKHMNADTDYSIAEAAFNKGETAMTINGPWAWSNIDTSKVNYGVTVLPTFKGQPSKPFVGVLSAGINAASPNKELAKEFLENYLLTDEGLEAVNKDKPLGAVALKSYEEELAKDPRIAATMENAQKGEIMPNIPQMSAFWYAVRTAVINAASGRQTVDEALKDAQTRITK
  Score=364

WARNING  Found a shift of 26 residues in positions given within contact list
INFO     Update index in contact list and remove unassigned contacts
INFO     Remove contacts outside sequence bonds
INFO     Filtering evcoupling contact map
INFO     ...Position filter
INFO     ...Conservation filter
WARNING  No conservation information. Can't use related filter
INFO     ...Secondary structure clash filter
WARNING  No secondary structure information. Can't use secondary structure filter
INFO     ...Disulfure bridge unicity filter
INFO     Setting contact number with treshold 1.0
INFO     Update evcoupling maplot
INFO     Update evcoupling scoremap
INFO     Select top 370 contacts according to scoremap
writing to the file: /c7/home/fallain/tmp/malecoli/out/etc/MALE_ECOLI.seq
INFO     Load molecule file and convert it into xml format
INFO     [SequenceList]: reading sequence /c7/home/fallain/tmp/malecoli/out/etc/
                         MALE_ECOLI.seq
                         reading /c7/home/fallain/tmp/malecoli/out/etc/
                         MALE_ECOLI.seq
INFO     Writing tbl files ...
INFO     Writing evcoupling ARIA XML distance restraints
INFO     Using contact scores as selection criteria
INFO     Selecting 370 contacts
INFO     0%|          | 0/370 [00:00<?, ?it/s]
INFO     100%|##########| 370/370 [00:00<00:00, 1245.07it/s]
INFO     Write 370 xml distance restraints in /c7/home/fallain/tmp/malecoli/out/xml/MALE_ECOLI_evcoupling.xml
INFO     Loading aria template file /c7/home/fallain/Projects/ariaec/src/ariaec/src/aria/conbox/templates/aria_project_v2.3.4.xml
INFO     Directory /baycells/scratch/fallain/tmp/MALE_ECOLI/evcoupling doesn't exist.
INFO     Create new directory /baycells/scratch/fallain/tmp/MALE_ECOLI/evcoupling
INFO     Writing ARIA project file (/c7/home/fallain/tmp/malecoli/out/ariaproject.xml)
INFO     Reading ARIA project file
INFO     Update spectrum data in project
INFO     Update sequence data in project
INFO     Update dihedrals data in project
INFO     Update rdcs data in project
INFO     Update rdcs annealing parameters
INFO     Writing new ARIA xml file (/c7/home/fallain/tmp/malecoli/out/ariaproject.xml)
INFO     Generate contact file (/c7/home/fallain/tmp/malecoli/out/etc/MALE_ECOLI_evcoupling_filtered.contact.txt)
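In this run, ariaec aligned the sequence of the contact file with the reference FASTA sequence and found a shift of 26 residues, corresponding to the 26 extra N-terminal residues of the contact-file sequence. Once such an offset is known, re-indexing amounts to subtracting it and dropping contacts that fall outside the reference sequence. The sketch below assumes a single constant offset and a hypothetical (i, j, score) layout; ariaec itself derives the mapping from the sequence alignment shown in the log.

```python
def reindex_contacts(contacts, shift, seq_length):
    """Map contact-file positions onto the reference sequence numbering.

    contacts: (i, j, score) tuples in contact-file numbering (1-based).
    shift: number of extra N-terminal residues in the contact-file sequence.
    Contacts involving a residue outside 1..seq_length are dropped.
    """
    remapped = []
    for i, j, score in contacts:
        new_i, new_j = i - shift, j - shift
        if 1 <= new_i <= seq_length and 1 <= new_j <= seq_length:
            remapped.append((new_i, new_j, score))
    return remapped


# Toy values: a 100-residue reference and a 26-residue shift.
print(reindex_contacts([(10, 60, 0.5), (27, 100, 0.9), (40, 400, 0.3)],
                       shift=26, seq_length=100))
```

Only (27, 100) survives, as (1, 74): residue 10 falls in the extra N-terminal segment and residue 400 lies beyond the end of the toy reference.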

Warning

The ARIA project file is required during the setup step if you want to combine evolutionary and experimental restraints. Currently, only its data section is kept; you still need to provide a configuration file for the other project parameters.

Analysis

Configuration file

To compare different outputs according to the parameters in their configuration files, you can use the iniconv command. It converts configuration files into a comma-separated values (CSV) file, which is more practical to use in an analysis pipeline.
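A rough idea of what such a conversion involves, written with the standard configparser and csv modules: each configuration file becomes one row and each section/option pair one column. The section:option column naming, the leading file column and the threshold option are assumptions for illustration; the layout of the configs.csv file written by iniconv may differ.

```python
import configparser
import csv
import io


def configs_to_csv(named_ini_texts, out):
    """Write one CSV row per configuration, one column per section:option."""
    rows, columns = [], ["file"]
    for name, text in named_ini_texts:
        parser = configparser.ConfigParser()
        parser.read_string(text)
        row = {"file": name}
        for section in parser.sections():
            for option, value in parser.items(section):
                key = "{}:{}".format(section, option)
                row[key] = value
                if key not in columns:
                    columns.append(key)
        rows.append(row)
    # Options missing from a file are written as empty cells.
    writer = csv.DictWriter(out, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
configs_to_csv([("run1.ini", "[main]\nthreshold = 1.0\n"),
                ("run2.ini", "[main]\nthreshold = 1.5\n")], buffer)
print(buffer.getvalue())
```

One row per run makes it straightforward to join these parameters with per-run quality or precision scores in a spreadsheet or dataframe.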

(venv) [user@host tmp] > ariaec iniconv bpt1/data/config.ini -o bpt1/out
================================================================================

                         ARIA Evolutive Contact toolbox

================================================================================

INFO     Initialize settings
INFO     Reading configuration file(s)
INFO     Generate output csv file (configs.csv)

Contact map analysis

Besides the ConKit Command Line Interface (CLI) tools, ariaec provides the maplot command, which computes statistics comparing contact maps with a reference; the reference can be a structure in PDB format. The first map given to this command (here the PDB file, listed after the sequence and secondary structure files) is always used as the reference for the contact map, ROC and precision-recall plots.

(venv) [user@host tmp] > ariaec maplot bpt1/data/BPT1_BOVIN.fa bpt1/data/BPT1_BOVIN.indextableplus  bpt1/data/BPT1_BOVIN.native.aligned.pdb bpt1/data/BPT1_BOVIN_contacts.gremlin.out -o bpt1/out -t pdb gremlin
================================================================================

                         ARIA Evolutive Contact toolbox

================================================================================

INFO     Initialize settings
INFO     Making output directories
INFO     Reading fasta file /tmp/bpt1/data/bpt1_bovin.fa
INFO     Amino acid sequence:   FCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCG
INFO     Checking if file /tmp/bpt1/data/BPT1_BOVIN.indextableplus correspond to indextableplus format
INFO     Format type correct (indextableplus)
INFO     Reading secondary structure file /tmp/bpt1/data/BPT1_BOVIN.indextableplus [indextableplus]
INFO     Loading ss dist file
INFO     Reading distance file /tmp/venv/lib/python2.7/site-packages/aria/conbox/data/ss_dist.txt
INFO     Align secondary structure sequence with protein sequence
INFO     Reading /tmp/bpt1/data/BPT1_BOVIN.native.aligned.pdb file
INFO     Updating distance map with pdb file
INFO     Generate contact map using contact definition defaultdict(None, {'default_cutoff': 8.0})
INFO     Using default cutoff
INFO     Reading /tmp/bpt1/data/BPT1_BOVIN_contacts.gremlin.out file
INFO     Pdb map set as reference
INFO     Generate contact map plot (/tmp/bpt1/out/BPT1_BOVIN.maplot.pdf)
INFO     Generate map report file (/tmp/bpt1/out/mapreport)
INFO     Generate roc file (/tmp/bpt1/out/graphics/maplot.roc.csv)
INFO     Generate roc plot (/tmp/bpt1/out/graphics/maplot.roc.pdf)
INFO     Generate precall file (/tmp/bpt1/out/graphics/maplot.roc.csv)
INFO     Generate precall plot (/tmp/bpt1/out/graphics/maplot.precall.pdf)
INFO     Generate contact file (/tmp/bpt1/out/BPT1_BOVIN_contacts_gremlin.contact.txt)
INFO     Generate stat file (/tmp/bpt1/out/cmp.contactcmp.csv)
INFO
INFO     Contact list: [(1, 39), (1, 42), (1, 51), (3, 22), (3, 25), (3, 45), (3, 47), (3, 51), (4, 23), (4, 38), (4, 39), (4, 42), (4, 49), (5, 23), (6, 19), (7, 37), (8, 12), (8, 31), (8, 33), (9, 19), (9, 34), (9, 37), (9, 41), (10, 33), (10, 36), (11, 33), (11, 35), (12, 8), (12, 20), (12, 31), (12, 33), (13, 33), (13, 34), (14, 31), (14, 33), (14, 36), (15, 31), (16, 29), (16, 31), (17, 41), (17, 43), (18, 28), (18, 29), (18, 45), (19, 6), (19, 9), (19, 28), (19, 29), (20, 12), (20, 25), (20, 27), (21, 28), (22, 3), (22, 25), (22, 28), (23, 4), (23, 5), (23, 26), (23, 29), (23, 43), (24, 28), (25, 3), (25, 20), (25, 22), (25, 28), (25, 46), (25, 53), (26, 23), (27, 20), (27, 48), (27, 52), (28, 18), (28, 19), (28, 21), (28, 22), (28, 24), (28, 25), (29, 16), (29, 18), (29, 19), (29, 23), (30, 33), (30, 37), (30, 38), (31, 8), (31, 12), (31, 14), (31, 15), (31, 16), (32, 37), (32, 38), (32, 40), (32, 41), (33, 8), (33, 10), (33, 11), (33, 12), (33, 13), (33, 14), (33, 30), (34, 9), (34, 13), (34, 37), (35, 11), (36, 10), (36, 14), (36, 39), (36, 50), (37, 7), (37, 9), (37, 30), (37, 32), (37, 34), (37, 41), (38, 4), (38, 30), (38, 32), (38, 41), (39, 1), (39, 4), (39, 36), (40, 32), (41, 9), (41, 17), (41, 32), (41, 37), (41, 38), (42, 1), (42, 4), (43, 17), (43, 23), (43, 47), (44, 47), (45, 3), (45, 18), (45, 49), (46, 25), (46, 49), (46, 50), (47, 3), (47, 43), (47, 44), (47, 51), (48, 27), (48, 52), (49, 4), (49, 45), (49, 46), (49, 53), (50, 36), (50, 46), (50, 53), (51, 1), (51, 3), (51, 47), (52, 27), (52, 48), (53, 25), (53, 49), (53, 50)]
INFO     Generate contact map plot (/tmp/bpt1/out/ref.maplot.pdf)
INFO     Generate contact file (/tmp/bpt1/out/BPT1_BOVIN_native_aligned.contact.txt)
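The statistics reported by maplot rest on a binary reference map: according to the log, residue pairs closer than the default 8.0 Å cutoff in the PDB structure count as contacts. The sketch below shows the idea in plain Python (3.8+ for math.dist). It assumes one representative coordinate per residue and a minimum sequence separation of 3; which atoms and which separation ariaec uses are not stated here. Note that the contact list printed in the log contains each pair twice, as (i, j) and (j, i), whereas the sketch normalizes pairs to i < j.

```python
import math


def reference_contacts(coords, cutoff=8.0, min_separation=3):
    """Residue pairs (i, j), i < j, whose coordinates are closer than cutoff (Å).

    coords: {residue_number: (x, y, z)}, one representative atom per residue.
    min_separation is an assumed value.
    """
    residues = sorted(coords)
    pairs = set()
    for a, i in enumerate(residues):
        for j in residues[a + 1:]:
            if j - i >= min_separation and math.dist(coords[i], coords[j]) < cutoff:
                pairs.add((i, j))
    return pairs


def precision_recall(predicted, reference):
    """Precision and recall of predicted pairs against a set of (i, j), i < j."""
    pred = {tuple(sorted(p)) for p in predicted}
    true_positives = len(pred & reference)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Toy coordinates, for illustration only.
coords = {1: (0.0, 0.0, 0.0), 4: (6.0, 0.0, 0.0),
          10: (20.0, 0.0, 0.0), 12: (26.0, 0.0, 0.0)}
ref = reference_contacts(coords)
print(ref)                                      # {(1, 4)}
print(precision_recall([(4, 1), (10, 12)], ref))  # (0.5, 1.0)
```

Evaluating precision_recall on the top-k predictions for increasing k, ranked by score, yields the points of a precision-recall curve.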

Structure quality report

The generated structures can be analyzed with procheck, whatif, clashlist or prosa using the ariaec pdbqual command. It only needs a configuration file giving the path to the executable of each quality tool.

(venv) [user@host tmp] > ariaec pdbqual run1/structures/it2/fitted.pdb -o bpt1/out/ -c bpt1/data/config.ini
================================================================================

                         ARIA Evolutive Contact toolbox

================================================================================
INFO     Initialize settings
INFO     Updating settings according to config file
INFO     Starting quality runs with ['/tmp/run1/structures/it2/fitted.pdb'] file(s)
INFO     Copying file(s) to output directory /tmp/bpt1/out
...
QualityChecks.py finished.
INFO     /tmp/bpt1/out/quality_checks generated
INFO     Removing infile(s) in output directory /tmp/bpt1/out

Note

If the quality tools have been configured in the pipeline, there is no need to run this command after the ARIA calculation.

Violation analysis

Additionally, the ariaec analysis command generates supplementary analysis files for a given ARIA iteration, such as a violation analysis for each cluster of the structure ensemble.

(venv) [user@host tmp] > ariaec analysis bpt1/out/ariaproject.xml bpt1/out/run1/structures/it8 gremlin -r bpt1/data/BPT1_BOVIN.native.aligned.pdb -c bpt1/data/config.ini -o bpt1/out/run1/structures/it8
================================================================================

                         ARIA Evolutive Contact toolbox

================================================================================

INFO     Initialize settings
INFO     Updating settings according to config file
INFO     Ensemble analysis will be done on restraints and ensemble from it7 with violation criteria of it8
INFO     Reading distance restraints file(s)
INFO     Reading native structure
INFO     [StructureEnsemble]: Reading PDB files ...
INFO     [StructureEnsemble]: PDB files read.
INFO     Reading structure ensemble(s)
INFO     Clusters found in this iteration, compute analysis foreach generated cluster ensemble
INFO     [StructureEnsemble]: Reading PDB files ...
INFO     [StructureEnsemble]: PDB files read.
INFO     [StructureEnsemble]: Reading PDB files ...
INFO     [StructureEnsemble]: PDB files read.
INFO     Violation analysis
INFO     Writing violation analysis of clust 0 /tmp/bpt1/out/run1/structures/it8/violations.csv file
INFO     Writing violation analysis of clust 1 /tmp/bpt1/out/run1/structures/it8/violations.csv file
INFO     [StructureEnsemble]: Reading PDB files ...
INFO     [StructureEnsemble]: PDB files read.
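The violations.csv files summarize how often distance restraints are violated across the structure ensemble. The sketch below shows the usual calculation for upper-bound restraints: a restraint counts as violated in a structure when its distance exceeds the upper bound by more than a tolerance, and its violation ratio is the fraction of structures in which this happens. The 0.5 Å tolerance, the 0.5 ratio cutoff, the dictionary layout and the function name are assumptions; the actual criteria are those of the ARIA iteration (here it8), and for ambiguous restraints ARIA works with an effective distance over all contributions, which the sketch assumes has already been computed.

```python
def violation_ratios(distances, upper_bounds, tolerance=0.5):
    """Fraction of structures violating each upper-bound distance restraint.

    distances: {restraint_id: [distance in structure 1, structure 2, ...]} in Å
    upper_bounds: {restraint_id: upper bound} in Å
    A structure violates a restraint when distance > upper bound + tolerance.
    """
    ratios = {}
    for rid, values in distances.items():
        limit = upper_bounds[rid] + tolerance
        violated = sum(1 for d in values if d > limit)
        ratios[rid] = violated / len(values)
    return ratios


# Toy ensemble of four structures, for illustration only.
distances = {"r1": [4.2, 5.1, 6.3, 4.9], "r2": [7.0, 9.5, 9.1, 8.8]}
upper_bounds = {"r1": 5.0, "r2": 8.0}
ratios = violation_ratios(distances, upper_bounds)
# Flag restraints violated in more than half of the structures.
often_violated = [rid for rid, ratio in ratios.items() if ratio > 0.5]
print(ratios)          # {'r1': 0.25, 'r2': 0.75}
print(often_violated)  # ['r2']
```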