Tutorial¶
Note
The data for this tutorial are already in the docs/examples folder, but you still have to set at least cns_executable and host_executable in the configuration file to the path of your CNS_ executable (not supplied in this package). Otherwise, this example will not work.
Structure calculation with EC restraints¶
We show here an example of de novo structure prediction from GREMLIN_ contacts combined with secondary structure prediction. The ariaec Command Line Interface (CLI) is the tool for converting and analyzing contact map information. The files related to this example can be found in the docs folder.
Configuration file¶
All the parameters for ariaec commands are stored in a configuration file in INI format. Whenever you need to override the default parameters, pass another configuration file containing the updated values. A configuration file does not need to list every parameter to be valid; parameters that are left out keep their defaults.
A more detailed description of the parameters is in the Configuration section.
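The override behaviour described above can be illustrated with Python's standard configparser module. This is only a sketch of the general idea, not the ariaec implementation: the section name main, the n_cpus option and all paths are hypothetical, and only cns_executable and host_executable are names taken from this tutorial.

```python
import configparser

# Hypothetical defaults; the real ariaec defaults and section names may differ.
DEFAULTS = """
[main]
cns_executable = /usr/bin/cns
host_executable = /usr/bin/cns
n_cpus = 1
"""

# A partial user file: only the values that change are listed.
USER_CONFIG = """
[main]
cns_executable = /opt/cns/bin/cns
host_executable = /opt/cns/bin/cns
"""


def load_settings(*sources):
    """Read INI sources in order; later sources override earlier ones."""
    parser = configparser.ConfigParser()
    for text in sources:
        parser.read_string(text)
    return parser


settings = load_settings(DEFAULTS, USER_CONFIG)
print(settings["main"]["cns_executable"])  # value from the user file
print(settings["main"]["n_cpus"])          # untouched default
```

The user file lists two options only, and the third keeps its default value.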
Restraints & project conversion¶
The main command of this interface is ariaec setup, which converts EC data and creates an ARIA project XML file. After that, we follow the same steps as for any other ARIA project.
(venv) [user@host tmp] > ariaec setup bpt1/data/BPT1_BOVIN.fa bpt1/data/BPT1_BOVIN_contacts.gremlin.out -t gremlin -s bpt1/data/BPT1_BOVIN.indextableplus -o /tmp -c bpt1/data/config.ini
================================================================================
ARIA Evolutive Contact toolbox
================================================================================
INFO Initialize settings
INFO Updating settings according to config file
INFO Making output directories
INFO Reading fasta file /tmp/bpt1/data/bpt1_bovin.fa
INFO Amino acid sequence: FCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCG
INFO Checking if file /tmp/bpt1/data/BPT1_BOVIN.indextableplus correspond to indextableplus format
INFO Format type correct (indextableplus)
INFO Reading secondary structure file /tmp/bpt1/data/BPT1_BOVIN.indextableplus [indextableplus]
INFO Loading ss dist file
INFO Reading distance file /.conda/envs/aria/lib/python2.7/site-packages/aria/conbox/data/ss_dist.txt
INFO Align secondary structure sequence with protein sequence
INFO Reading /tmp/bpt1/data/BPT1_BOVIN_contacts.gremlin.out file
INFO Filtering gremlin contact map
INFO ...Position filter
INFO Removed 21 contacts.
INFO ...Conservation filter
INFO Conservation filter only works with indextableplus files !
INFO ...Secondary structure clash filter
INFO Removed 1 contacts.
INFO ...Disulfure bridge unicity filter
INFO Removed 11 contacts.
INFO Setting contact number with treshold 1.0
INFO Update gremlin maplot
INFO Update gremlin scoremap
INFO Select top 53 contacts according to scoremap
writing to the file: /tmp/etc/BPT1_BOVIN.seq
INFO Load molecule file and convert it into xml format
INFO [SequenceList]: reading sequence /tmp/etc/BPT1_BOVIN.seq
reading /tmp/etc/BPT1_BOVIN.seq
INFO Writing tbl files ...
INFO Dihedral restraints for secondary structures (/tmp/tbl/BPT1_BOVIN_dihed.tbl)
INFO Secondary structure restraints (/tmp/tbl/BPT1_BOVIN_ssdist.tbl)
INFO Helix bond restraints (/tmp/tbl/BPT1_BOVIN_hbond.tbl)
INFO Writing gremlin ARIA XML distance restraints
INFO Using contact scores as selection criteria
INFO Selecting 53 contacts
INFO 0%| | 0/53 [00:00<?, ?it/s]
INFO 100%|##########| 53/53 [00:00<00:00, 2549.17it/s]
INFO Write 53 xml distance restraints in /tmp/xml/BPT1_BOVIN_gremlin.xml
INFO Loading aria template file /.conda/envs/aria/lib/python2.7/site-packages/aria/conbox/templates/aria_project_v2.3.7.xml
INFO Writing ARIA project file (/tmp/ariaproject.xml)
INFO Generate contact file (/tmp/etc/BPT1_BOVIN_gremlin_filtered.contact.txt)
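In the log above, the contact number is set with a threshold of 1.0 and 53 contacts are selected for a 53-residue sequence. One plausible reading, which is an assumption and not confirmed by this tutorial, is that the number of kept contacts equals the threshold multiplied by the sequence length. A minimal sketch of such a selection, with a hypothetical function name and toy scores:

```python
def select_top_contacts(scores, seq_length, n_factor=1.0):
    """Keep the n_factor * seq_length highest-scoring contacts.

    `scores` maps residue pairs (i, j) to a contact score.
    The selection rule itself is an assumption made for illustration.
    """
    n_keep = int(n_factor * seq_length)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [pair for pair, _ in ranked[:n_keep]]


# Toy score map, not taken from the GREMLIN file of this example.
toy_scores = {(1, 5): 0.9, (2, 8): 0.4, (3, 9): 0.7, (4, 10): 0.1}
top = select_top_contacts(toy_scores, seq_length=2)
print(top)
```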
Build infrastructure¶
Before running the structure calculation pipeline, ARIA needs to build the whole infrastructure of the project.
(venv) [user@host tmp] > aria2 -s bpt1/out/ariaproject.xml
Loading project "ariaproject.xml"...
Setting-up bpt1/out/ariaproject.xml (run: 1)
INFO [Project]: Protocols copied.
INFO [Project]: Directory tree created.
INFO [Project]: Copying data files into local data-directory...
Done.
For running ARIA: aria2 bpt1/out/ariaproject.xml
Running ARIA¶
We use the aria2 command without the -s flag to launch the structure calculations:
(venv) [user@host tmp] > aria2 --no-test bpt1/out/ariaproject.xml
.. .......................................................................... ..
.. ARIA -- Ambiguous Restraints for Iterative Assignment ..
.. ..
.. A software for automated NOE assignment ..
.. ..
.. Version 2.3 ..
.. ..
.. ..
.. Copyright (C) Benjamin Bardiaux, Michael Habeck, Therese Malliavin, ..
.. Wolfgang Rieping, and Michael Nilges ..
.. ..
.. All rights reserved. ..
.. ..
.. NO WARRANTY. This software package is provided 'as is' without warranty of ..
.. any kind, expressed or implied, including, but not limited to the implied ..
.. warranties of merchantability and fitness for a particular purpose or ..
.. a warranty of non-infringement. ..
.. ..
.. Distribution of substantively modified versions of this module is ..
.. prohibited without the explicit permission of the copyright holders. ..
.. ..
.. .......................................................................... ..
Loading project "ariaproject.xml"...
INFO [Project]: Temporary directory has been set to /tmp/BPT1_BOVIN/gremlin/
aria_temp.tmpAVTAwy1562832135
INFO [Project]: Host list check has been disabled.
INFO [Project]: -------------------- Reading data --------------------
INFO [Project]: Cache is enabled.
INFO [Project]: Cache file does not exist. Creating new file.
INFO [Project]: Reading molecule definition /tmp/bpt1/out/
run1/data/sequence/BPT1_BOVIN.xml.
INFO [Project]: Data files read.
INFO [Project]: Data files cached.
INFO [Project]: ------------------- Filtering input data -------------------
INFO [NOESYSpectrumFilter.TP]: Spectrum filter report written to file "/tmp
/bpt1/out/run1/data/spectra/peak_list.filtered"
INFO [Project]: ---------------- Preparing structure engine ----------------
INFO [CNS]: Sequence PDB-file written.
INFO [CNS]: PSF-file has been created.
INFO [CNS]: Template PDB-file has been created.
INFO [Project]: Starting ARIA main protocol on ...
INFO [Project]: -------------------- Assigning spectra --------------------
INFO [Protocol]: ---------------------- Iteration 0 -----------------------
INFO [Protocol]: Calibrating spectrum "gremlin"...
INFO [Protocol]: Waiting for completion of structure calculation...
...
INFO [Project]: ARIA run completed at ...
Warning
Most of the time, the --no-test flag is needed to disable the initial dry run of the commands (specified in the host list) used to launch a structure calculation. These dry runs are not compatible with several hosts.
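When the two aria2 steps are driven from a script, the command lines shown above can be assembled as follows. This is a small sketch that uses only the -s and --no-test flags appearing in this tutorial; the helper name aria_commands is hypothetical, and the actual call is left commented out because it requires a working ARIA and CNS installation.

```python
import shlex


def aria_commands(project_xml, no_test=True):
    """Return the setup and run command lines for an ARIA project file."""
    setup = ["aria2", "-s", project_xml]
    run = ["aria2"] + (["--no-test"] if no_test else []) + [project_xml]
    return setup, run


setup_cmd, run_cmd = aria_commands("bpt1/out/ariaproject.xml")
for cmd in (setup_cmd, run_cmd):
    print(shlex.join(cmd))
    # import subprocess; subprocess.run(cmd, check=True)  # where aria2 is installed
```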
Structure calculation with EC & NMR restraints¶
Another possibility is to use an already existing ARIA project with NMR restraints and use ariaec setup with the -p flag to update it with EC restraints.
(venv) [user@host tmp] > ariaec setup malecoli/data/MALE_ECOLI.fa malecoli/data/MALE_ECOLI_contacts.evfold.out -t evcoupling -o malecoli/out -c malecoli/data/config.ini -p malecoli/data/ariaproject_nmr.xml
================================================================================
ARIA Evolutive Contact toolbox
================================================================================
INFO Initialize settings
INFO Updating settings according to config file
INFO Making output directories
INFO Reading fasta file /c7/home/fallain/tmp/malecoli/data/male_ecoli.fa
INFO Amino acid sequence: KIEEGKLVIWINGDKGYNGLAEVGKKFEKDTGIKVTVEHPDKLEEKFPQVAATGDGPDIIFWAHDRFGGYAQSGLLAEITPDKAFQDKLYPFTWDAVRYNGKLIAYPIAVEALSLIYNKDLLPNPPKTWEEIPALDKELKAKGKSALMFNLQEPYFTWPLIAADGGYAFKYENGKYDIKDVGVDNAGAKAGLTFLVDLIKNKHMNADTDYSIAEAAFNKGETAMTINGPWAWSNIDTSKVNYGVTVLPTFKGQPSKPFVGVLSAGINAASPNKELAKEFLENYLLTDEGLEAVNKDKPLGAVALKSYEEELAKDPRIAATMENAQKGEIMPNIPQMSAFWYAVRTAVINAASGRQTVDEALKDAQTRITK
INFO The file format evcoupling is not supported by the conkit plugin. Switching to homemade parsers.
INFO Checking if file /c7/home/fallain/tmp/malecoli/data/MALE_ECOLI_contacts.evfold.out correspond to our definition of evcoupling format
INFO Reading /c7/home/fallain/tmp/malecoli/data/MALE_ECOLI_contacts.evfold.out file
INFO Loading contact file
INFO Alignment of sequence in contact file (evcoupling) with reference (MALE_ECOLI.fa)
****TGARILALSALTTMMFSASALAKIEEGKLVIWINGDKGYNGLAEVGKKFEKDTGIKVTVEHPDKLEEKFPQVAATGDGPDIIFWAHDRFGGYAQSGLLAEITPDKAFQDKLYPFTWDAVRYNGKLIAYPIAVEALSLIYNKDLLPNPPKTWEEIPALDKELKAKGKSALMFNLQEPYFTWPLIAADGGYAFKYENGKY***DVGVDNAGAKAGLTFLVDLIKNKHMNADTDYSIAEAAFNKGETAMTINGPWAWSNIDTSKVNYGVTVLPTFKGQPSKPFVGVLSAGINAASPNKELAKEFLE*YLLTDEGLEAVNKDKPLGAVALKSYEEELAKDPRIAATMENAQKGEIMPNIPQMSAFWYAVRTAVINAASGRQTVDEALKDAQTRI--
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||...||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||.||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
--------------------------KIEEGKLVIWINGDKGYNGLAEVGKKFEKDTGIKVTVEHPDKLEEKFPQVAATGDGPDIIFWAHDRFGGYAQSGLLAEITPDKAFQDKLYPFTWDAVRYNGKLIAYPIAVEALSLIYNKDLLPNPPKTWEEIPALDKELKAKGKSALMFNLQEPYFTWPLIAADGGYAFKYENGKYDIKDVGVDNAGAKAGLTFLVDLIKNKHMNADTDYSIAEAAFNKGETAMTINGPWAWSNIDTSKVNYGVTVLPTFKGQPSKPFVGVLSAGINAASPNKELAKEFLENYLLTDEGLEAVNKDKPLGAVALKSYEEELAKDPRIAATMENAQKGEIMPNIPQMSAFWYAVRTAVINAASGRQTVDEALKDAQTRITK
Score=364
WARNING Found a shift of 26 residues in positions given within contact list
INFO Update index in contact list and remove unassigned contacts
INFO Remove contacts outside sequence bonds
INFO Filtering evcoupling contact map
INFO ...Position filter
INFO ...Conservation filter
WARNING No conservation information. Can't use related filter
INFO ...Secondary structure clash filter
WARNING No secondary structure information. Can't use secondary structure filter
INFO ...Disulfure bridge unicity filter
INFO Setting contact number with treshold 1.0
INFO Update evcoupling maplot
INFO Update evcoupling scoremap
INFO Select top 370 contacts according to scoremap
writing to the file: /c7/home/fallain/tmp/malecoli/out/etc/MALE_ECOLI.seq
INFO Load molecule file and convert it into xml format
INFO [SequenceList]: reading sequence /c7/home/fallain/tmp/malecoli/out/etc/
MALE_ECOLI.seq
reading /c7/home/fallain/tmp/malecoli/out/etc/
MALE_ECOLI.seq
INFO Writing tbl files ...
INFO Writing evcoupling ARIA XML distance restraints
INFO Using contact scores as selection criteria
INFO Selecting 370 contacts
INFO 0%| | 0/370 [00:00<?, ?it/s]
INFO 100%|##########| 370/370 [00:00<00:00, 1245.07it/s]
INFO Write 370 xml distance restraints in /c7/home/fallain/tmp/malecoli/out/xml/MALE_ECOLI_evcoupling.xml
INFO Loading aria template file /c7/home/fallain/Projects/ariaec/src/ariaec/src/aria/conbox/templates/aria_project_v2.3.4.xml
INFO Directory /baycells/scratch/fallain/tmp/MALE_ECOLI/evcoupling doesn't exist.
INFO Create new directory /baycells/scratch/fallain/tmp/MALE_ECOLI/evcoupling
INFO Writing ARIA project file (/c7/home/fallain/tmp/malecoli/out/ariaproject.xml)
INFO Reading ARIA project file
INFO Update spectrum data in project
INFO Update sequence data in project
INFO Update dihedrals data in project
INFO Update rdcs data in project
INFO Update rdcs annealing parameters
INFO Writing new ARIA xml file (/c7/home/fallain/tmp/malecoli/out/ariaproject.xml)
INFO Generate contact file (/c7/home/fallain/tmp/malecoli/out/etc/MALE_ECOLI_evcoupling_filtered.contact.txt)
Warning
The ARIA project file is mandatory during the setup step if we want to mix evolutionary and experimental restraints. Currently, only its data section is kept. You still need to provide a configuration file for the other project parameters.
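The setup log above reports a shift of 26 residues between the positions in the contact file and the reference sequence, then updates the indexes and removes contacts that fall outside the sequence. The alignment shows 26 extra leading positions in the contact-file sequence, so a re-indexing step of this kind could look like the sketch below. The function name and the toy contacts are hypothetical, and the real ariaec code may proceed differently.

```python
def shift_contacts(contacts, shift, seq_length):
    """Renumber contacts by `shift` and drop pairs outside 1..seq_length.

    `contacts` is a list of (i, j, score) tuples numbered as in the
    contact file; the result is numbered as in the reference sequence.
    """
    kept = []
    for i, j, score in contacts:
        new_i, new_j = i - shift, j - shift
        if 1 <= new_i <= seq_length and 1 <= new_j <= seq_length:
            kept.append((new_i, new_j, score))
    return kept


# Toy contacts and a toy sequence length, chosen only for illustration.
toy_contacts = [(27, 40, 0.8), (10, 50, 0.5), (30, 400, 0.3)]
shifted = shift_contacts(toy_contacts, shift=26, seq_length=100)
print(shifted)
```

The second toy contact maps to a negative position and the third one ends past the sequence, so both are discarded.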
Analysis¶
Configuration file¶
One way to compare different outputs according to the parameters in their configuration files is the iniconv command. It converts a configuration file into a comma-separated values (CSV) file, which can be more practical for an analysis pipeline.
(venv) [user@host tmp] > ariaec iniconv bpt1/data/config.ini -o bpt1/out
================================================================================
ARIA Evolutive Contact toolbox
================================================================================
INFO Initialize settings
INFO Reading configuration file(s)
INFO Generate output csv file (configs.csv)
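The kind of conversion iniconv performs can be sketched with the standard configparser and csv modules. The layout of the real configs.csv is not described in this tutorial, so the three-column format below, as well as the section and option names in the toy input, are assumptions made for illustration.

```python
import configparser
import csv
import io


def ini_to_csv(ini_text):
    """Flatten an INI document into CSV rows of (section, option, value)."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["section", "option", "value"])
    for section in parser.sections():
        for option, value in parser.items(section):
            writer.writerow([section, option, value])
    return buffer.getvalue()


# Hypothetical configuration content.
toy_ini = "[setup]\ncontact_threshold = 1.0\n[maplot]\ncutoff = 8.0\n"
csv_text = ini_to_csv(toy_ini)
print(csv_text)
```

A flat table like this one can be loaded directly by most analysis tools, which makes it easy to join run parameters with result statistics.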
Contact map analysis¶
Alongside the ConKit_ Command Line Interface (CLI) tools, the maplot command has been implemented to report statistics comparing contact maps against a reference, which can be a structure in PDB_ format. The first map given to this command (here the PDB file) is always used as the reference map/pdb for the contact map, ROC or precision-recall plots.
(venv) [user@host tmp] > ariaec maplot bpt1/data/BPT1_BOVIN.fa bpt1/data/BPT1_BOVIN.indextableplus bpt1/data/BPT1_BOVIN.native.aligned.pdb bpt1/data/BPT1_BOVIN_contacts.gremlin.out -o bpt1/out -t pdb gremlin
================================================================================
ARIA Evolutive Contact toolbox
================================================================================
INFO Initialize settings
INFO Making output directories
INFO Reading fasta file /tmp/bpt1/data/bpt1_bovin.fa
INFO Amino acid sequence: FCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCG
INFO Checking if file /tmp/bpt1/data/BPT1_BOVIN.indextableplus correspond to indextableplus format
INFO Format type correct (indextableplus)
INFO Reading secondary structure file /tmp/bpt1/data/BPT1_BOVIN.indextableplus [indextableplus]
INFO Loading ss dist file
INFO Reading distance file /tmp/venv/lib/python2.7/site-packages/aria/conbox/data/ss_dist.txt
INFO Align secondary structure sequence with protein sequence
INFO Reading /tmp/bpt1/data/BPT1_BOVIN.native.aligned.pdb file
INFO Updating distance map with pdb file
INFO Generate contact map using contact definition defaultdict(None, {'default_cutoff': 8.0})
INFO Using default cutoff
INFO Reading /tmp/bpt1/data/BPT1_BOVIN_contacts.gremlin.out file
INFO Pdb map set as reference
INFO Generate contact map plot (/tmp/bpt1/out/BPT1_BOVIN.maplot.pdf)
INFO Generate map report file (/tmp/bpt1/out/mapreport)
INFO Generate roc file (/tmp/bpt1/out/graphics/maplot.roc.csv)
INFO Generate roc plot (/tmp/bpt1/out/graphics/maplot.roc.pdf)
INFO Generate precall file (/tmp/bpt1/out/graphics/maplot.roc.csv)
INFO Generate precall plot (/tmp/bpt1/out/graphics/maplot.precall.pdf)
INFO Generate contact file (/tmp/bpt1/out/BPT1_BOVIN_contacts_gremlin.contact.txt)
INFO Generate stat file (/tmp/bpt1/out/cmp.contactcmp.csv)
INFO
INFO Contact list: [(1, 39), (1, 42), (1, 51), (3, 22), (3, 25), (3, 45), (3, 47), (3, 51), (4, 23), (4, 38), (4, 39), (4, 42), (4, 49), (5, 23), (6, 19), (7, 37), (8, 12), (8, 31), (8, 33), (9, 19), (9, 34), (9, 37), (9, 41), (10, 33), (10, 36), (11, 33), (11, 35), (12, 8), (12, 20), (12, 31), (12, 33), (13, 33), (13, 34), (14, 31), (14, 33), (14, 36), (15, 31), (16, 29), (16, 31), (17, 41), (17, 43), (18, 28), (18, 29), (18, 45), (19, 6), (19, 9), (19, 28), (19, 29), (20, 12), (20, 25), (20, 27), (21, 28), (22, 3), (22, 25), (22, 28), (23, 4), (23, 5), (23, 26), (23, 29), (23, 43), (24, 28), (25, 3), (25, 20), (25, 22), (25, 28), (25, 46), (25, 53), (26, 23), (27, 20), (27, 48), (27, 52), (28, 18), (28, 19), (28, 21), (28, 22), (28, 24), (28, 25), (29, 16), (29, 18), (29, 19), (29, 23), (30, 33), (30, 37), (30, 38), (31, 8), (31, 12), (31, 14), (31, 15), (31, 16), (32, 37), (32, 38), (32, 40), (32, 41), (33, 8), (33, 10), (33, 11), (33, 12), (33, 13), (33, 14), (33, 30), (34, 9), (34, 13), (34, 37), (35, 11), (36, 10), (36, 14), (36, 39), (36, 50), (37, 7), (37, 9), (37, 30), (37, 32), (37, 34), (37, 41), (38, 4), (38, 30), (38, 32), (38, 41), (39, 1), (39, 4), (39, 36), (40, 32), (41, 9), (41, 17), (41, 32), (41, 37), (41, 38), (42, 1), (42, 4), (43, 17), (43, 23), (43, 47), (44, 47), (45, 3), (45, 18), (45, 49), (46, 25), (46, 49), (46, 50), (47, 3), (47, 43), (47, 44), (47, 51), (48, 27), (48, 52), (49, 4), (49, 45), (49, 46), (49, 53), (50, 36), (50, 46), (50, 53), (51, 1), (51, 3), (51, 47), (52, 27), (52, 48), (53, 25), (53, 49), (53, 50)]
INFO Generate contact map plot (/tmp/bpt1/out/ref.maplot.pdf)
INFO Generate contact file (/tmp/bpt1/out/BPT1_BOVIN_native_aligned.contact.txt)
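The comparison behind these reports can be sketched as follows: a reference contact map is derived from inter-residue distances with the 8.0 cutoff shown in the log, and the predicted contacts are scored against it with precision and recall. The function names, the strict comparison, the minimum sequence separation and the toy distance matrix are all assumptions; this tutorial does not state which atoms or which separation maplot uses.

```python
def contacts_from_distances(dist, cutoff=8.0, min_sep=1):
    """Return residue pairs (1-based, i < j) closer than `cutoff`."""
    n = len(dist)
    return {
        (i + 1, j + 1)
        for i in range(n)
        for j in range(i + min_sep, n)
        if dist[i][j] < cutoff
    }


def precision_recall(predicted, reference):
    """Compare two contact lists, ignoring the order within each pair."""
    predicted = {tuple(sorted(p)) for p in predicted}
    reference = {tuple(sorted(p)) for p in reference}
    true_pos = len(predicted & reference)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    return precision, recall


# Toy symmetric distance matrix for four residues.
toy_dist = [
    [0.0, 3.8, 6.0, 12.0],
    [3.8, 0.0, 3.8, 7.5],
    [6.0, 3.8, 0.0, 3.8],
    [12.0, 7.5, 3.8, 0.0],
]
reference = contacts_from_distances(toy_dist, cutoff=8.0, min_sep=2)
scores = precision_recall([(3, 1), (1, 4)], reference)
print(sorted(reference), scores)
```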
Structure quality report¶
The generated structures can be analyzed with procheck, whatif, clashlist or prosa using the ariaec pdbqual command. It only needs a configuration file giving the path to the executable of each quality tool.
(venv) [user@host tmp] > ariaec pdbqual run1/structures/it2/fitted.pdb -o bpt1/out/ -c bpt1/data/config.ini
================================================================================
ARIA Evolutive Contact toolbox
================================================================================
INFO Initialize settings
INFO Updating settings according to config file
INFO Starting quality runs with ['/tmp/run1/structures/it2/fitted.pdb'] file(s)
INFO Copying file(s) to output directory /tmp/bpt1/out
...
QualityChecks.py finished.
INFO /tmp/bpt1/out/quality_checks generated
INFO Removing infile(s) in output directory /tmp/bpt1/out
Note
If the quality tools have been configured in the pipeline, there is no need to run this command after the ARIA calculation.
Violation analysis¶
Additionally, we can use the ariaec analysis command to generate supplementary analysis files for a given ARIA iteration.
(venv) [user@host tmp] > ariaec analysis bpt1/out/ariaproject.xml bpt1/out/run1/structures/it8 gremlin -r bpt1/data/BPT1_BOVIN.native.aligned.pdb -c bpt1/data/config.ini -o bpt1/out/run1/structures/it8
================================================================================
ARIA Evolutive Contact toolbox
================================================================================
INFO Initialize settings
INFO Updating settings according to config file
INFO Ensemble analysis will be done on restraints and ensemble from it7 with violation criteria of it8
INFO Reading distance restraints file(s)
INFO Reading native structure
INFO [StructureEnsemble]: Reading PDB files ...
INFO [StructureEnsemble]: PDB files read.
INFO Reading structure ensemble(s)
INFO Clusters found in this iteration, compute analysis foreach generated cluster ensemble
INFO [StructureEnsemble]: Reading PDB files ...
INFO [StructureEnsemble]: PDB files read.
INFO [StructureEnsemble]: Reading PDB files ...
INFO [StructureEnsemble]: PDB files read.
INFO Violation analysis
INFO Writing violation analysis of clust 0 /tmp/bpt1/out/run1/structures/it8/violations.csv file
INFO Writing violation analysis of clust 1 /tmp/bpt1/out/run1/structures/it8/violations.csv file
INFO [StructureEnsemble]: Reading PDB files ...
INFO [StructureEnsemble]: PDB files read.
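The idea of a violation analysis over a structure ensemble can be sketched as follows: for each distance restraint, count the fraction of structures in which the measured distance exceeds the upper bound by more than a tolerance. The function name, the tolerance, the 50 % cut-off and the toy distances are all assumptions; the criteria applied by ariaec analysis and the columns of violations.csv are not described in this tutorial.

```python
def violation_fractions(upper_bounds, ensemble_distances, tolerance=0.5):
    """Fraction of structures violating each restraint.

    A structure violates a restraint when its distance is larger than
    the upper bound plus `tolerance` (same units as the distances).
    """
    fractions = {}
    for pair, upper in upper_bounds.items():
        distances = ensemble_distances[pair]
        n_viol = sum(1 for d in distances if d > upper + tolerance)
        fractions[pair] = n_viol / len(distances)
    return fractions


# Toy restraints and per-structure distances for a four-model ensemble.
toy_bounds = {(3, 22): 8.0, (9, 34): 8.0}
toy_distances = {
    (3, 22): [7.1, 8.2, 9.0, 7.9],
    (9, 34): [9.5, 10.1, 8.9, 12.0],
}
fractions = violation_fractions(toy_bounds, toy_distances)
violated = [pair for pair, frac in fractions.items() if frac > 0.5]
print(fractions, violated)
```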