Introduction to Nucleic Acids simulations
A-DNA to B-DNA transition
(but focus on the analysis part!)
[ Skip right to the analysis section ]
Introduction
Note: User commands will be colored red and program output in black.
In this tutoral we will run a DNA molecule with 12 base pairs starting in the A-DNA configuration and study the conversion to the B-DNA form. After the simulation we will use the tool CPPTRAJ to analyze the characteristics that switch during the conversion.
The starting structure is type A-DNA generated using the Nucleic Acids Builder tool available as part of the AmberTools collection of programs.
Generating the initial topology and coordinate files
Create a new directory in your scratch folder with the name 'tutorial6' and change into that directory:
$ mkdir tutorial6 $ cd tutorial6
Copy the initial PDB structure (A-dna.pdb) to your directory available here (or wget http://www.amber.utah.edu/PRACE-BioExcel-Seasonal-School-2019/DNA-tutorial/include/A-dna.pdb). The structure has the sequence d(CGCGAATTCGCG), which is known as the Drew-Dickerson dodecamer, or DDD, and is one of the most famous and studied short oligomers available. Open the file using VMD so you can see the structure that we are going to use for our tutorial.
$ vmd A-dna.pdb
Next we need to generate the topology and the coordinates file for the system. We do that using the LEaP
module of AmberTools. There are a large number of force fields and force fields modifications for nucleic acids. For this example we are going to use the parameters included in the version 14 of Amber which is a combination of files and modifications.
Note: If on beskow, interative tleap will not work. Do the initial set-up on tegner...
Open tleap (we do not need to use the visual interface, so tleap is the terminal version):$ tleap
Source the force field, which has the name ff14SB:
$ source leaprc.ff14SB
As you can see, when the force field loads, there is a line which states what modifications are being loaded. For nucleic acids, it loads:
PARM99 + frcmod.ff99SB + frcmod.parmbsc0 + OL3 for RNA
Which means that we are loading the parm99 file plus the ff99SB modifications (which do not do anything for nucleic acids) plus the parmbsc0 (modifications for the alpha/gamma dihedrals) plus the OL3 modification (for the chi angle, only for RNA).
In tleap, if we type the comand 'list', we can now see all the resiudes that we have loaded:
Since the molecule we are using has a negative charge, we load the ions force field so we can use them to net neutralize our system:
> loadamberparams frcmod.ionsjc_tip3p
We now load our DNA structure with the command 'loadpdb' and asign it to the variable 'dna':
> dna = loadpdb A-dna.pdb Loading PDB file: ./A-dna.pdb total atoms in file: 758
Check our structure for possible errors:
> check dna
Checking 'dna'.... WARNING: The unperturbed charge of the unit: -22.000000 is not zero. Checking parameters for unit 'dna'. Checking for bond parameters. Checking for angle parameters. check: Warnings: 1 Unit is OK.
The structure has a -22 charge which is expected since we have 22 phosphate groups. At this step we can proceed and add counter ions to neutralize the charge.
> addions dna Na+ 0
This command will add ions to the variabla 'dna', using Na+ type ions until it reaches a charge of '0'. The ions are added close to the negative sites of the molecule. Next, we solvate the system.
> solvateoct dna TIP3PBOX 9.0
The command solaveoct will create a truncated octahedron periodic cell centered on the variable 'dna' which contains our systems and add explicit water molecules using the TIP3P water model. tleap will add at least 9.0 angstroms of solvation between the solute (our DNA molecule) and the borders of the box. Around 5336 water molecules should be added.
That is it. Next we generate the topology file and the coordinates file of our system:
> saveamberparm dna dna.prmtop dna.rst7
This will generate a topology and coordinates file using the force field and force field modifications we loaded previously. After this, we quit tleap
> quit
In our directory, we have the topology file 'dna.prmtop' that contains all the atoms in our system and their parameters, and a coordinates file 'dna.rst7' that has all the xyz coordinates for each atom. We can check our topology and coordinates file with VMD.
Launch VMD and load the topology file 'dna.prmtop'. It will recognize the file as an AMBER7 parm file. Load the file and in the VMD main windows it should display the number of atoms. Next, load the coordinates file 'dna.rst7' and it should be loaded like an AMBER7 restart file. Hit load and you should be able to see the solvated system.
Close VMD and now we are ready to run minimization and equilibration of our DNA molecule. We will use the following protocols:
mini.in | heat.in | eq.in |
Minimzing the system with 25 kcal/mol restraints on DNA, 500 steps of steepest descent and 500 of conjugated gradient &cntrl imin=1, ntx=1, irest=0, ntpr=50, ntf=1, ntb=1, cut=9.0, nsnb=10, ntr=1, maxcyc=1000, ncyc=500, ntmin=1, restraintmask=':1-24', restraint_wt=25.0, &end &ewald ew_type = 0, skinnb = 1.0, &end |
Heating the system with 25 kcal/mol restraints on DNA, V=const &cntrl imin=0, ntx=1, ntpr=500, ntwr=500, ntwx=500, ntwe=500, nscm=5000, ntf=2, ntc=2, ntb=1, ntp=0, nstlim=50000, t=0.0, dt=0.002, cut=9.0, tempi=100.0, ntt=1, ntr=1,nmropt=1, restraintmask=':1-24', restraint_wt=25.0, &end &wt type='TEMP0',istep1=0, istep2=5000, value1=100.0,value2=300.0, &end &wt type='TEMP0',istep1=5001, istep2=50000, value1=300.0,value2=300.0, &end &wt type='END', &end |
Equilibrating the system with 0.5 kcal/mol restraints on DNA, during 500ps, at constant T= 300K & P=1ATM and coupling = 0.2 &cntrl imin=0, ntx=5, ntpr=500, ntwr=500, ntwx=500, ntwe=500, nscm=5000, ntf=2, ntc=2, ntb=2, ntp=1, tautp=0.2, taup=0.2, nstlim=25000, t=0.0, dt=0.002, cut=9.0, ntt=1, ntr=1, irest=1, restraintmask=':1-24', restraint_wt=0.5, &end &ewald ew_type = 0, skinnb = 1.0, &end |
First we run mini.in, then heat.in and last eq.in. Remember that these are only examples to setup simulations. For actual research simulations a more thorough minimization and equilibration protocol should be used.
Note: pmemd is not installed, need to use pmemd.MPI or sander (single core) or sander.MPI.
To do this, salloc and aprun -n N pmemd.MPI (where N is the number of cores or submit as regular batch job). NOTE also that minimization is not well parallelized in AMBER.
To run mini.in:
$ pmemd -i mini.in -o mini.out -p dna.prmtop -c dna.rst7 -r mini.rst -ref dna.rst7
To run heat.in:
$ pmemd -i heat.in -o heat.out -p dna.prmtop -c mini.rst -r heat.rst -x heat.mdcrd -ref mini.rst
The last step, eq.in:
$ pmemd -i eq.in -o eq.out -p dna.prmtop -c heat.rst -r eq.rst -x eq.mdcrd -ref heat.rst
This equilibration steps will generate a reasonable starting system that we can use for production MD simulation using the input file:
md.in |
production |
To run the production simulation we use:
$ pmemd -i md.in -o md.out -p dna.prmtop -c eq.rst -r md.rst7 -x md.nc -inf mdinfo
This will run pmemd for 2 nanoseconds, which is just good enough to see the DNA dynamics and conformational changes. The simulation will generate a restart file (md.rst7) and a trajectory file in the binary NetCDF format (md.nc)
Analysis of the trajectory using CPPTRAJ
The program CPPTRAJ developed by Dr. Dan Roe and other collaborators is our main engine to analyze AMBER molecular dynamics simulations. It has a huge array of commands and analysis actions and is also focused on speed. This is very important since we now analyze trajectories no shorter than microsecond time scale, which represents tens of millions of frames.
The paper describing CPPTRAJ is D. Roe and T.E. Cheatham, III. "PTRAJ and CPPTRAJ: Software for processing and analysis of molecular dynamics trajectory data." J. Chem. Theory Comp. 9, 3084-3095(2013). ...the long awaited PTRAJ paper! It is available at: JCTC link
CPPTRAJ can be run either using an input script or in interactive mode. Using an input script is useful when you want to automate a lot of analysis or when submitting the analysis for a compute cluster, for example. Also, having the script allows you to remember what you did (provenance). The interactive mode is great for doing short, simple analysis and also to look at the extensive integrated help which is often the most up-to-date source of help information for the program.
Our first CPPTRAJ analysis will generate a stripped trajectory and reorient it to the center so we can see the simulation in VMD. Also, it is faster to work with stripped trajectories (that is, without water), siwnce it saves a considerable amount of time and hard disk space.
The input script is:
process1.cpptraj | |
parm dna.prmtop trajin md.nc autoimage strip :WAT trajout md-nowater.nc netcdf |
Load the corresponding topology file |
Needed files:
     
dna.prmtop
     
md.nc
Run CPPTRAJ with the command:
$ cpptraj -i process1.cpptraj
This will run CPPTRAJ, read the input file and execute the analysis:
This generates a new trajectory file with the name 'md-nowater.nc' which is autoimaged and has no waters. Before we can see the movie in VMD (or chimera), we need a new topology file. This is because our topology file still has waters. To generate a stripped topology file we can use either CPPTRAJ or the python program 'parmed.py'
We will use parmed.py to demonstrate quickly how it works.
Load the topology in parmed.py
$ parmed.py dna.prmtop
Run the command to strip waters:
> strip :WAT
Save the new topology file:
> parmout dna-nowater.prmtop
Type 'go' to run the process:
> go
This reads in the topology file, deletes all the residues with the name WAT and saves a new topology file with the name 'dna-nowater.prmtop'
Now we can open our trajectory in VMD:
$ vmd -parm7 dna-nowater.prmtop -netcdf md-nowater.nc
This command will open VMD, load the topology file and load the frames included in the NetCDF file. You must be able to see a big conformational change as the DNA switches from the A- to the B- form configuration.
Now we can do further analysis using CPPTRAJ. We are going to group a bunch of analysis in the same input script. This is faster since the trajectory file will be read only once and with the data loaded, several analysis can be made.
We will perform:
RMSD calculation - this will measure the deviation of the structure over time
Pucker distributions - a measure of the sugar puckers, so we can check the conformational change
Structural parameters - a collection of parameters specific of nucleic acids, specially the helical axis
Needed files:
     
dna.prmtop
     
md.nc
The CPPTRAJ input for the analysis is:
process2.cpptraj | |
parm dna-nowater.prmtop |
- Load the corresponding topology file - Load the trajectory file - Measure the RMSD values and save it to a dataset called 'all'. Use the first frame as a reference. Measure the residues 1 through 24, but only the atoms that are not H. Save the file as an xmgrace type of file with name rmsd-all.agr. l Use the mass of the atoms for measuring. - Measure the RMSD values and save it to a dataset called 'notail'. Use the first frame as a reference. Measure the residues 3 through 10 and 15 to 22, but only the atoms that are not H. Save the file as an xmgrace type of file with name rmsd-all.agr. Using the same name as the first measure, will combine the two measurements into a single xmgrace filel Use the mass of the atoms for measuring. - The pucker commands measure the pseudo-pucker rotation around the atoms C1' through O4' of residue 5. Data is saved to an xmgrace file called pucker.agr and it will be modified to a 360 degrees range (instead of -180 to +180 degrees). The same command follows but residues 6 through 8. - The nastruct command calculates basic nucleic acids structure parameters. The resrange states that only the base pair 6,7 and 18,19 will be read and will generate a file with the suffix data_6-7. This file will include the shear, stretch, stagger, buckle, propeller, opening HB, major and minor groove measurements for that particular base pair. |
If we open the file 'rmsd-all.agr':
$ xmgr rmsd-all.agr
We have two lines, which corresponds to the two datasets we calculated with CPPTRAJ. The blackline is using all the residues and the red line is using residues 3 to 10 and 15 to 22. We can see that a variation of more than 4 angstroms is present at the starting of the simulation for the black line (using all atoms) and a little lower for the red line. After around the frame 400, the values fluctuate between 4.5 and 5.5 angstoms when using all atoms and around 3.5 and 4.5 for the central base pairs. This suggest a big conformational shift early on the simulation which we already saw using VMD.
The pucker distribution (file pucker.agr) for residues 5 through 8 looks like this:
Each line corresponds to the pucker pseudo-rotation angle for residues 5 to 8. At the start of the simulation, we can roughly see more population of values going from 0 to around 100 degrees (which corresponds to a C3'-endo configuration, present in the A-DNA form). After 600 frames, the values cluster between 100 and 200 degrees which corresponds to a C2'-endo configuration present for B-DNA. For details about all these configurations and measurements, we recommend the Nucleic Acids bible which is the Wolfram Saenger book 'Principles of Nucleic Acid Structure'.
The pucker distribution is easier to differentiate if we calculate the histogram for each residue. In xmgrace, we go to -> Data -> Transformations -> Histograms...
We select all the datasets present (4 because we calculated the pucker distributions for 4 residues). We will calculate a histogram starting at 0 degrees and finish at 360 with 99 bins and normalize it.
Hit apply. The histograms will be calculated for each dataset.
With the Histograms windows still open, we right click on the highlighted datasets and click hide. This will hide the original values. We have to do that so we can actually see the new generated histogram data.
Close the histograms windows and hit the recenter icon to rescale the plot
Go to Plot -> Set Appearance
And select the datasets with the plus symbol next to it (+). These are our active (not hidden) datasets. You can simply click the first one and drag the mouse to multi select the 4 datasets.
Select Line properties to Straight, Width to 2 and hit Apply
Now in the Set Appearance window, go to Edit and click on Set different colors.
Now we can see more easily that the population of values for the pucker distribution is mainly in the 110 to 180 degrees which corresponds to the C2'-endo configuration. The curves does not appear smooth since we only have ~800 frames of simulation. Extending the simulation will show a nicer distribution.
The nastruct comand will generate 3 files. One with the information for base pairs (BP.data_6-7). The first column is the time frame and each column stands for:
#Frame Base1 Base2 Shear Stretch Stagger Buckle Propeller Opening HB Major Minor
We include here what each column stands for, because since we included the 'noheader' keyword in the nastruct command, no header will be written in the output file. Another file generated by nastruct is one with the information of the step parameters between the central base pairs 6,19 and 7,18 (BPstep.data_6-7):
#Frame BP1 BP2 Shift Slide Rise Tilt Roll Twist
and the last one with helical parameters involved in the same base pairs (Helix.data_6-7):
#Frame BP1 BP2 X-disp Y-disp Rise Incl. Tip Twist
Of course one could expand to include more base pairs, but it will require more post processing of the output files. In this case we will see how the helicoidal twist between the central base pairs of our DNA (base pairs 6-19 and 7-18) chain evolve over time.
The file we will use is the BPstep.data_6-7 which has on the last column the twist value. We first have to do a bit of unix-kung-fu to properly format the files. First lets delete the extra blank lines from the CPPTRAJ output file using sed:
$ sed -e '/^$/d' BPstep.data_6-7 > BPstep.data_6-7.new
Then we extract colum 1 (the frame count) and column 9 (the twist value) using awk:
$ awk '{print $1 "\t" $9}' BPstep.data_6-7.new > twist_6-7.dat
Now we can plot the twist value with xmgrace:
$ xmgr twist_6-7.dat
The plot shows an increase on the average twist value around the frame 180 which roughly matches the time on which the RMSD deviation converges. Of course this is only one base pair, it would require further analysis on the rest of the DNA molecule to actually study the entire conversion process between the A-B conformation.
Files used in this tutorial are available here:
mini.out - mini.rst
heat.out - heat.rst
eq.out - eq.rst
md.out - md.rst7 - md.nc