TrajStat Documentation
Overview
TrajStat is a Python script designed for analyzing molecular dynamics (MD) trajectories using the MDAnalysis library. It provides a comprehensive suite of analysis methods, including RMSD, RMSF, radius of gyration, PCA, hydrogen bonds, salt bridges, and more.
It is intended for use within molecular dynamics workflows, where it extracts structural metrics from trajectory data and produces plots of structural dynamics.
Usage
The script can be executed from the command line with the following arguments:
python trajstat.py --systems <path_to_systems> --output_dir <path_to_output_directory>
Replace <path_to_systems> with the directory containing the trajectory files and <path_to_output_directory> with the directory where the results will be stored.
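For readers who want to see how the two flags could be parsed, here is a minimal sketch using argparse. Only the flag names --systems and --output_dir come from the usage line above. The function name build_parser, the help strings, and the choice to make both flags required are assumptions and may differ from what trajstat.py actually does.

```python
import argparse

def build_parser():
    """Build a parser for the two flags shown in the usage line (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="trajstat.py")
    # Assumption: both flags are mandatory
    parser.add_argument("--systems", required=True,
                        help="directory containing the trajectory files")
    parser.add_argument("--output_dir", required=True,
                        help="directory where the results will be stored")
    return parser

# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere
args = build_parser().parse_args(
    ["--systems", "/path/to/systems", "--output_dir", "/path/to/output"]
)
print(args.systems, args.output_dir)
```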
Features
- RMSD Calculation: Calculates the root mean square deviation (RMSD) for proteins and nucleic acids.
- RMSF Calculation: Computes the root mean square fluctuation (RMSF) for protein chains.
- Radius of Gyration: Determines the compactness of the system.
- PCA: Performs principal component analysis (PCA) on backbone atoms.
- Salt Bridges: Identifies ionic interactions between acidic and basic residues.
- Hydrogen Bonds: Analyzes hydrogen bonds between proteins, nucleic acids, and other molecules.
- Visualization: Generates plots for RMSD, RMSF, PCA, salt bridges, and hydrogen bonds.
Functions
rmsd_calc
Calculates RMSD for proteins.
rmsd_calc(top_file, traj_file)
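To illustrate the quantity being computed, the following NumPy sketch evaluates the RMSD between one frame and a reference structure. It is not the body of rmsd_calc: the helper name rmsd is hypothetical, the coordinates are synthetic, and no superposition (alignment) step is performed, although a real analysis would typically align each frame to the reference first.

```python
import numpy as np

def rmsd(coords, ref):
    """RMSD between two (n_atoms, 3) coordinate arrays, without superposition."""
    coords = np.asarray(coords, dtype=float)
    ref = np.asarray(ref, dtype=float)
    diff = coords - ref
    # Mean over atoms of the squared displacement, then square root
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

# Synthetic example: four atoms, each displaced by the vector (3, 4, 0)
ref = np.zeros((4, 3))
shifted = ref + np.array([3.0, 4.0, 0.0])
value = rmsd(shifted, ref)
print(value)  # 5.0, since every atom moved by exactly 5 units
```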
rmsf_calc
Calculates RMSF for protein chains.
rmsf_calc(top_file, traj_file, start_fr)
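As a rough illustration of what RMSF measures, the sketch below computes per-atom fluctuations around the mean position from a coordinate array. The helper name rmsf is hypothetical, and the reading of start_fr as the index of the first frame included in the average is an assumption. The trajectory is assumed to be aligned already.

```python
import numpy as np

def rmsf(traj, start_fr=0):
    """Per-atom RMSF from a (n_frames, n_atoms, 3) array, using frames start_fr onward."""
    frames = np.asarray(traj, dtype=float)[start_fr:]
    mean_pos = frames.mean(axis=0)                    # (n_atoms, 3) average structure
    sq_dev = ((frames - mean_pos) ** 2).sum(axis=2)   # (n_frames, n_atoms)
    return np.sqrt(sq_dev.mean(axis=0))               # (n_atoms,)

# Synthetic trajectory: atom 0 is static, atom 1 oscillates along x
traj = np.zeros((3, 2, 3))
traj[:, 1, 0] = [-1.0, 0.0, 1.0]

full = rmsf(traj)                 # all three frames
late = rmsf(traj, start_fr=1)     # skip the first frame
print(full, late)
```

Skipping early frames in this way is a common means of excluding the equilibration phase from the average.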
rgyr_calc
Calculates the radius of gyration for proteins.
rgyr_calc(top_file, traj_file, start_fr)
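The radius of gyration of a single frame can be sketched as the mass-weighted root mean square distance of the atoms from the centre of mass. The helper name radius_of_gyration and the use of mass weighting are assumptions made for illustration. The source does not state how rgyr_calc weights atoms.

```python
import numpy as np

def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration for one frame of (n_atoms, 3) coordinates."""
    coords = np.asarray(coords, dtype=float)
    masses = np.asarray(masses, dtype=float)
    com = np.average(coords, axis=0, weights=masses)   # centre of mass
    sq_dist = ((coords - com) ** 2).sum(axis=1)        # squared distance of each atom from it
    return float(np.sqrt(np.average(sq_dist, weights=masses)))

# Two equal masses placed one unit either side of the origin
rg = radius_of_gyration([[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]], [1.0, 1.0])
print(rg)  # 1.0
```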
pca_calc
Performs PCA on backbone atoms.
pca_calc(top_file, traj_file, start_fr)
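A minimal sketch of the underlying technique: flatten each frame into one row, subtract the mean structure, and take the singular value decomposition. This is a generic NumPy formulation, not the code of pca_calc. The helper name pca is hypothetical, and backbone selection and alignment are assumed to have happened before the array is built.

```python
import numpy as np

def pca(traj, n_components=2):
    """PCA of a (n_frames, n_atoms, 3) array via SVD of mean-centred coordinates."""
    frames = np.asarray(traj, dtype=float)
    X = frames.reshape(len(frames), -1)          # (n_frames, 3 * n_atoms)
    X = X - X.mean(axis=0)                       # remove the average structure
    _, S, Vt = np.linalg.svd(X, full_matrices=False)
    variance = S ** 2 / (len(X) - 1)             # variance explained by each component
    components = Vt[:n_components]               # principal axes, one per row
    projections = X @ components.T               # frame coordinates in PC space
    return projections, variance[:n_components], components

# Synthetic trajectory: only the x coordinate of atom 0 moves
traj = np.zeros((5, 2, 3))
traj[:, 0, 0] = np.arange(-2, 3)

projections, variance, components = pca(traj)
print(variance)
```

Because all motion lies along a single coordinate, the first component carries all of the variance (2.5 here) and the second carries none. The sign of a principal axis is arbitrary.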
saltbridges
Identifies salt bridges between acidic and basic residues.
saltbridges(top_file, traj_file)
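One common way to identify salt bridges is a distance cutoff between charged side-chain atoms. The sketch below applies that idea to a single frame. It assumes a 4.0 Å cutoff and takes pre-extracted coordinates of acidic oxygen and basic nitrogen atoms. The helper name, the cutoff, and the atom labels are illustrative and are not taken from the saltbridges function.

```python
import math

def salt_bridge_pairs(acidic_atoms, basic_atoms, cutoff=4.0):
    """Return (acidic_label, basic_label, distance) for atom pairs within cutoff (in Å)."""
    pairs = []
    for a_label, a_xyz in acidic_atoms:
        for b_label, b_xyz in basic_atoms:
            d = math.dist(a_xyz, b_xyz)
            if d <= cutoff:
                pairs.append((a_label, b_label, d))
    return pairs

# Hypothetical atoms from one frame: (label, (x, y, z))
acidic = [("ASP12:OD1", (0.0, 0.0, 0.0))]
basic = [("LYS45:NZ", (3.0, 0.0, 0.0)), ("ARG80:NH1", (10.0, 0.0, 0.0))]

pairs = salt_bridge_pairs(acidic, basic)
print(pairs)  # only the ASP12-LYS45 pair falls within 4.0 Å
```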
hbond_calc
Analyzes hydrogen bonds between proteins and other molecules.
hbond_calc(top_file, traj_file, start_fr)
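Hydrogen bonds are usually detected with a geometric criterion: a maximum donor-acceptor distance and a minimum donor-hydrogen-acceptor angle. The sketch below shows such a test for one candidate triplet. The thresholds (3.0 Å and 150 degrees) and the helper names are assumptions for illustration, and the criteria used by hbond_calc may differ.

```python
import math

def angle_deg(a, b, c):
    """Angle at vertex b formed by the points a-b-c, in degrees."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    cos = max(-1.0, min(1.0, dot / norm))   # clamp against rounding error
    return math.degrees(math.acos(cos))

def is_hbond(donor, hydrogen, acceptor, d_cut=3.0, angle_cut=150.0):
    """True if the donor-acceptor distance and the D-H-A angle both satisfy the cutoffs."""
    return (math.dist(donor, acceptor) <= d_cut
            and angle_deg(donor, hydrogen, acceptor) >= angle_cut)

donor, hydrogen = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
linear = is_hbond(donor, hydrogen, (2.9, 0.0, 0.0))   # collinear, 2.9 Å apart
bent = is_hbond(donor, hydrogen, (1.0, 2.0, 0.0))     # close enough, but angle is 90 degrees
print(linear, bent)
```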
Dependencies
The script requires the following Python libraries:
- MDAnalysis
- Pandas
- Matplotlib
- Seaborn
- Plotly
- Numpy
Output
The script generates the following output files:
- CSV files containing analysis results (e.g., RMSD, RMSF, PCA).
- Plots in TIFF or PNG format for visualization.
- Text files with statistical summaries.
Example
To analyze trajectories in the systems folder and save results in the output folder:
python trajstat.py --systems /path/to/systems --output_dir /path/to/output
Prerequisites
- Ensure Python 3.x is installed and added to your system PATH.
- Install required libraries using pip install MDAnalysis pandas matplotlib seaborn plotly numpy.
- Ensure trajectory files are in a compatible format (e.g., DCD, PDB).
Notes
- Ensure trajectory files and topology files are correctly paired.
- Use consistent naming conventions for input files to avoid errors.
- Output plots are saved in high-resolution formats for publication purposes.
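The pairing and naming advice above can be checked before any analysis runs. The sketch below assumes one possible convention: each topology and its trajectory share a file stem (for example sysA.pdb and sysA.dcd). The convention, the extensions, and the helper name pair_inputs are all hypothetical, since the source does not describe how TrajStat matches files.

```python
from pathlib import Path
import tempfile

def pair_inputs(systems_dir, top_ext=".pdb", traj_ext=".dcd"):
    """Pair topology and trajectory files that share a file stem."""
    systems_dir = Path(systems_dir)
    tops = {p.stem: p for p in systems_dir.glob(f"*{top_ext}")}
    trajs = {p.stem: p for p in systems_dir.glob(f"*{traj_ext}")}
    # Stems present on both sides are valid pairs; the rest are reported
    paired = {s: (tops[s], trajs[s]) for s in sorted(tops.keys() & trajs.keys())}
    unpaired = sorted(tops.keys() ^ trajs.keys())
    return paired, unpaired

# Demonstration on a temporary directory with one complete and one incomplete system
with tempfile.TemporaryDirectory() as tmp:
    for name in ("sysA.pdb", "sysA.dcd", "sysB.pdb"):
        (Path(tmp) / name).touch()
    paired, unpaired = pair_inputs(tmp)

paired_names = sorted(paired)
print(paired_names, unpaired)  # ['sysA'] ['sysB']
```

Reporting unpaired stems up front makes a missing trajectory visible before a long analysis run starts.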
Error Handling
To make failures easier to diagnose, wrap analysis calls in exception handling and log the error before exiting. For example:
import logging
import sys

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

try:
    # Example function call
    rmsd_calc("topology.pdb", "trajectory.dcd")
except Exception as e:
    # Log the failure and exit with a non-zero status
    logging.error(f"Error occurred: {e}")
    sys.exit(1)
Author
This script was developed for molecular dynamics trajectory analysis as part of the MSc Bioinformatics project by Keaghan Brown.