mutintro.py - Mutation Introduction Script Documentation
Project Information
Title: The development of an automated computational workflow to prioritize potential resistance variants identified in HIV Integrase Subtype C
Author: Keaghan Brown (3687524) - MSc Bioinformatics Candidate
Supervisor: Ruben Cloete - Lecturer, SANBI
Institution: South African National Bioinformatics Institute, University of the Western Cape
Funding: Poliomyelitis Research Foundation and UWC Ada & Bertie Levenstein Bursary Programme
Overview
The mutintro.py script is a critical component of the automated computational workflow for introducing mutations into protein structures. It leverages PyMOL’s mutagenesis wizard to introduce mutations individually or simultaneously and optionally applies energy minimization using FoldX.
This script is designed to streamline the mutation introduction process, ensuring consistency and accuracy in preparing mutant structures for downstream analysis.
Usage
python mutintro.py --pdb_file path/to/file.pdb --output_dir path/to/output --mutations path/to/mutations.csv --mode [single|multiple]
Arguments
- --pdb_file: Path to the original PDB file that will undergo mutations.
- --output_dir: Directory where modified PDB files will be saved.
- --mutations: CSV file with mutation data. Each column represents a system, and each row contains mutations like Q148R.
- --mode: Mutation mode. Options:
  - single: Each mutation is introduced individually.
  - multiple: All mutations for a system are introduced together.
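The four arguments could be declared with argparse roughly as follows. This is a minimal sketch, not the script's actual parser: the build_parser name, the help strings and the required=True settings are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the four documented arguments."""
    parser = argparse.ArgumentParser(
        description="Introduce mutations into a PDB structure."
    )
    parser.add_argument("--pdb_file", required=True,
                        help="Path to the original PDB file.")
    parser.add_argument("--output_dir", required=True,
                        help="Directory for the modified PDB files.")
    parser.add_argument("--mutations", required=True,
                        help="CSV file with one column of mutations per system.")
    # Restricting choices makes argparse reject any mode other than these two.
    parser.add_argument("--mode", required=True, choices=["single", "multiple"],
                        help="Introduce mutations individually or together.")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args([
    "--pdb_file", "HIV_Integrase.pdb",
    "--output_dir", "./output",
    "--mutations", "variant_list.csv",
    "--mode", "multiple",
])
print(args.mode)  # multiple
```

Using choices for --mode means a typo such as "multi" fails immediately with a usage message rather than partway through a run.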
Mutation Format
Mutations are written in the format OriginalResiduePositionNewResidue, e.g., Q148R, which means:
- Original residue: Q (Glutamine)
- Position: 148
- New residue: R (Arginine)
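A label in this format can be split into its three parts with a regular expression. The sketch below is illustrative only; the parse_mutation name and the choice to accept only the 20 standard one-letter amino acid codes are assumptions, not taken from mutintro.py.

```python
import re

# One-letter codes of the 20 standard amino acids (assumption: no other codes occur).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MUTATION_RE = re.compile(rf"^([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])$")


def parse_mutation(label):
    """Split a label such as 'Q148R' into (original, position, new)."""
    match = MUTATION_RE.match(label.strip())
    if match is None:
        raise ValueError(f"Not a valid mutation label: {label!r}")
    original, position, new = match.groups()
    return original, int(position), new


print(parse_mutation("Q148R"))  # ('Q', 148, 'R')
```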
Main Functionalities
Class: MutationIntro
mutant_processing(mutant_list)
Reads a CSV mutation table and returns dictionaries for both individual and grouped mutations.
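One way such a table could be turned into the two dictionaries is sketched below. It is not the author's implementation: it uses the standard-library csv module instead of pandas to stay self-contained, and the read_mutation_table name, the assumption that the header row holds the system names, and the shape of the returned dictionaries are all hypothetical.

```python
import csv
import io


def read_mutation_table(handle):
    """Return (individual, grouped) dictionaries from a CSV file handle.

    Assumed layout: the header row names each system and the cells below
    hold mutation labels; empty cells are skipped.
    """
    reader = csv.DictReader(handle)
    grouped = {name: [] for name in (reader.fieldnames or [])}
    for row in reader:
        for system, cell in row.items():
            if system is None:
                continue  # cells beyond the header width are ignored
            cell = (cell or "").strip()
            if cell:
                grouped[system].append(cell)
    # Each unique mutation becomes its own single-mutation entry.
    individual = {}
    for mutations in grouped.values():
        for mutation in mutations:
            individual.setdefault(mutation, [mutation])
    return individual, grouped


example = io.StringIO("System1,System2\nQ148R,G140S\nN155H,\n")
individual, grouped = read_mutation_table(example)
print(grouped)
print(sorted(individual))
```

With this layout, the single mode would iterate over individual and the multiple mode over grouped.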
individual_introduction(pdb_file, output_dir, mutant_data)
Introduces each mutation into a separate PDB file using PyMOL’s mutagenesis wizard and saves them to the output directory.
simultaneous_introduction(pdb_file, output_dir, mutant_data)
Introduces all mutations for a given system into a single structure and saves the final mutated PDB file.
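The wizard-driven step might look roughly like the sketch below, which follows the commonly documented way of scripting PyMOL's mutagenesis wizard. It is an assumption-laden illustration rather than the code in mutintro.py: the function names, the object name "target" and the default chain "A" are hypothetical, and PyMOL is imported inside the function so the helpers can be used without it.

```python
# One-letter to three-letter residue names, as expected by the wizard's set_mode().
ONE_TO_THREE = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}


def residue_selection(obj, chain, position):
    """Build a PyMOL selection macro of the form /object//chain/resi/."""
    return f"/{obj}//{chain}/{position}/"


def introduce_mutations(pdb_file, out_file, mutations, chain="A"):
    """Apply (original, position, new) tuples to one structure and save it.

    Pass a single tuple for individual introduction, or every tuple of a
    system for simultaneous introduction. The chain ID is an assumption.
    """
    from pymol import cmd  # requires a PyMOL installation

    cmd.reinitialize()
    cmd.load(pdb_file, "target")
    cmd.wizard("mutagenesis")
    cmd.refresh_wizard()
    wizard = cmd.get_wizard()
    for _original, position, new in mutations:
        wizard.set_mode(ONE_TO_THREE[new])  # target residue type
        wizard.do_select(residue_selection("target", chain, position))
        wizard.apply()  # commit the currently displayed rotamer
    cmd.set_wizard()  # close the wizard
    cmd.save(out_file, "target")


print(residue_selection("target", "A", 148))  # /target//A/148/
```

A call such as introduce_mutations("HIV_Integrase.pdb", "output/Q148R.pdb", [("Q", 148, "R")]) would then produce one mutant structure; the output file name here is only an example.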
foldx_emin(foldx_exe, output_dir)
(Optional) Runs FoldX’s "Optimize" command on each mutated structure to minimize energy and resolve potential steric clashes.
FoldX Integration
The script searches for a folder named foldx in the current working directory and automatically locates the FoldX binary within it.
Example command executed:
./foldx --command=Optimize --pdb=Q148R_auto.pdb --output-file=Q148R_auto.pdb
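The lookup and the command above could be wrapped as in the following sketch. The flags are copied from the example command; everything else is an assumption, including the function names, the rule that the binary is an executable file whose name starts with "foldx", and running FoldX with the output directory as its working directory.

```python
import os
import subprocess


def find_foldx(search_root="."):
    """Return the path of a FoldX binary inside <search_root>/foldx, or None."""
    folder = os.path.join(search_root, "foldx")
    if not os.path.isdir(folder):
        return None
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        # Assumed naming rule: an executable file whose name starts with "foldx".
        if (name.lower().startswith("foldx")
                and os.path.isfile(path)
                and os.access(path, os.X_OK)):
            return os.path.abspath(path)
    return None


def build_optimize_command(foldx_exe, pdb_name):
    """Assemble the Optimize command shown in the documentation."""
    return [
        foldx_exe,
        "--command=Optimize",
        f"--pdb={pdb_name}",
        f"--output-file={pdb_name}",
    ]


def run_optimize(foldx_exe, output_dir, pdb_name):
    """Run FoldX in output_dir; raises CalledProcessError on a non-zero exit."""
    command = build_optimize_command(foldx_exe, pdb_name)
    return subprocess.run(command, cwd=output_dir, check=True)


print(" ".join(build_optimize_command("./foldx", "Q148R_auto.pdb")))
```

Passing the command as a list avoids shell quoting problems when file names contain spaces.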
Dependencies
- pymol
- biopython (PDBParser, PPBuilder)
- pandas
- argparse, os, sys, warnings (Standard Library)
Example
python mutintro.py --pdb_file HIV_Integrase.pdb --output_dir ./output --mutations variant_list.csv --mode multiple
Prerequisites
- Ensure Python 3.x is installed and added to your system PATH.
- Install required libraries using pip install pymol biopython pandas.
- Ensure PyMOL is installed and can be invoked in script mode.
- Download and place the FoldX executable in a folder named foldx.
Notes
- The mutation wizard automatically selects the rotamer with the fewest steric clashes.
- Make sure your PDB file matches the expected format (chain and residue numbers must match mutation data).
- All output files are saved in the specified output directory.
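The chain and residue-number requirement can be checked before any mutation is introduced. The sketch below reads the fixed-column ATOM records of a PDB file directly, rather than using Biopython, to stay self-contained; the function names are hypothetical and the chain ID must be supplied by the caller.

```python
ONE_TO_THREE = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}


def residue_at(pdb_lines, chain, position):
    """Return the three-letter residue name at chain/position, or None."""
    for line in pdb_lines:
        if not line.startswith("ATOM") or len(line) < 26:
            continue
        # PDB columns: 18-20 residue name, 22 chain ID, 23-26 residue number.
        if line[21] != chain:
            continue
        try:
            number = int(line[22:26])
        except ValueError:
            continue
        if number == position:
            return line[17:20].strip()
    return None


def mutation_matches_structure(pdb_lines, chain, original, position):
    """True if the structure has the expected original residue at position."""
    expected = ONE_TO_THREE.get(original)
    return expected is not None and residue_at(pdb_lines, chain, position) == expected


atom = "ATOM      1  CA  GLN A 148      11.000  22.000  33.000  1.00  0.00           C"
print(mutation_matches_structure([atom], "A", "Q", 148))  # True
```

Running such a check on every mutation first gives a clear error for a mismatched numbering scheme instead of a silently wrong mutant.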
Error Handling
To improve error handling, consider adding logging and exception handling around calls to the script. For example, when running it from another Python program:
import logging
import subprocess
import sys

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
try:
    # Run mutintro.py as a subprocess; check=True raises on a non-zero exit code
    subprocess.run(["python", "mutintro.py", "--pdb_file", "HIV_Integrase.pdb", "--output_dir", "./output", "--mutations", "variant_list.csv", "--mode", "multiple"], check=True)
except subprocess.CalledProcessError as e:
    logging.error(f"Error occurred: {e}")
    sys.exit(1)