mutintro.py - Mutation Introduction Script Documentation

Project Information

Title: The development of an automated computational workflow to prioritize potential resistance variants identified in HIV Integrase Subtype C
Author: Keaghan Brown (3687524) - MSc Bioinformatics Candidate
Supervisor: Ruben Cloete - Lecturer, SANBI
Institution: South African National Bioinformatics Institute, University of the Western Cape
Funding: Poliomyelitis Research Foundation and UWC Ada & Bertie Levenstein Bursary Programme

Overview

The mutintro.py script is a critical component of the automated computational workflow for introducing mutations into protein structures. It leverages PyMOL’s mutagenesis wizard to introduce mutations individually or simultaneously and optionally applies energy minimization using FoldX.

This script is designed to streamline the mutation introduction process, ensuring consistency and accuracy in preparing mutant structures for downstream analysis.

Usage

python mutintro.py --pdb_file path/to/file.pdb --output_dir path/to/output --mutations path/to/mutations.csv --mode [single|multiple]

Arguments

  • --pdb_file: Path to the original PDB file that will undergo mutations.
  • --output_dir: Directory where modified PDB files will be saved.
  • --mutations: CSV file with mutation data. Each column represents a system, and each cell in that column holds one mutation, e.g. Q148R.
  • --mode: Mutation mode. Options:
    • single: Each mutation is introduced individually.
    • multiple: All mutations for a system are introduced together.
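The arguments above could be declared with argparse roughly as follows. This is a minimal sketch rather than the script's actual parser: the function name build_parser, the help strings, and the choice to mark every flag as required are assumptions. Only the four flag names and the two mode values come from this document.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the four documented flags."""
    parser = argparse.ArgumentParser(
        prog="mutintro.py",
        description="Introduce mutations into a PDB structure.",
    )
    parser.add_argument("--pdb_file", required=True,
                        help="Path to the original PDB file.")
    parser.add_argument("--output_dir", required=True,
                        help="Directory for the modified PDB files.")
    parser.add_argument("--mutations", required=True,
                        help="CSV file with one column per system.")
    # argparse rejects any value other than the two documented modes.
    parser.add_argument("--mode", required=True,
                        choices=["single", "multiple"],
                        help="Introduce mutations one by one or together.")
    return parser


# Parsing an explicit list instead of sys.argv keeps the example self-contained.
args = build_parser().parse_args([
    "--pdb_file", "HIV_Integrase.pdb",
    "--output_dir", "./output",
    "--mutations", "variant_list.csv",
    "--mode", "multiple",
])
print(args.mode)  # multiple
```

Restricting --mode with choices means a typo such as "mulitple" fails immediately with a usage message instead of partway through a run.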

Mutation Format

Mutations are written in the format OriginalResiduePositionNewResidue, e.g., Q148R, which means:

  • Original residue: Q (Glutamine)
  • Position: 148
  • New residue: R (Arginine)
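A label in this format can be split into its three parts with a regular expression. The sketch below is illustrative only: the name parse_mutation and the restriction to the 20 standard one-letter amino acid codes are assumptions, not taken from mutintro.py.

```python
import re

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Original residue, integer position, new residue (e.g. Q148R).
MUTATION_RE = re.compile(rf"^([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])$")


def parse_mutation(label):
    """Split a label such as 'Q148R' into ('Q', 148, 'R')."""
    match = MUTATION_RE.match(label.strip().upper())
    if match is None:
        raise ValueError(f"Not a valid mutation label: {label!r}")
    original, position, new = match.groups()
    return original, int(position), new


print(parse_mutation("Q148R"))  # ('Q', 148, 'R')
```

Validating labels before any structure is loaded catches malformed CSV cells early.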

Main Functionalities

Class: MutationIntro

mutant_processing(mutant_list)

Reads a CSV mutation table and returns dictionaries for both individual and grouped mutations.
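One possible shape for this step is sketched below. It is not the script's implementation: it uses the standard library csv module instead of pandas to stay dependency-free, and it assumes that the first row holds the system names, that empty cells should be skipped, and that the "individual" dictionary maps each mutation to the systems containing it. The function name and the system names SystemA and SystemB are hypothetical.

```python
import csv
import io


def read_mutation_table(handle):
    """Return (individual, grouped) dictionaries from a column-per-system CSV."""
    reader = csv.reader(handle)
    header = [name.strip() for name in next(reader)]

    # grouped: system name -> list of mutations in that column.
    grouped = {name: [] for name in header}
    for row in reader:
        for name, cell in zip(header, row):
            cell = cell.strip()
            if cell:  # columns may have different lengths
                grouped[name].append(cell)

    # individual: mutation -> systems in which it appears (assumed layout).
    individual = {}
    for system, mutations in grouped.items():
        for mutation in mutations:
            individual.setdefault(mutation, []).append(system)
    return individual, grouped


table = io.StringIO("SystemA,SystemB\nQ148R,G140S\nN155H,\n")
individual, grouped = read_mutation_table(table)
print(grouped)  # {'SystemA': ['Q148R', 'N155H'], 'SystemB': ['G140S']}
```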

individual_introduction(pdb_file, output_dir, mutant_data)

Introduces each mutation into a separate PDB file using PyMOL’s mutagenesis wizard and saves them to the output directory.
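For orientation, a single mutation can be driven through PyMOL's mutagenesis wizard along the following lines. This is a hedged sketch, not the code in mutintro.py: the function names, the object name "structure", and the default chain "A" are assumptions, and it relies on PyMOL being importable in the running interpreter. The wizard expects three-letter residue names, hence the lookup table.

```python
# One-letter to three-letter residue names for the wizard's set_mode call.
THREE_LETTER = {
    "A": "ALA", "C": "CYS", "D": "ASP", "E": "GLU", "F": "PHE",
    "G": "GLY", "H": "HIS", "I": "ILE", "K": "LYS", "L": "LEU",
    "M": "MET", "N": "ASN", "P": "PRO", "Q": "GLN", "R": "ARG",
    "S": "SER", "T": "THR", "V": "VAL", "W": "TRP", "Y": "TYR",
}


def residue_selection(obj, chain, position):
    """Build a PyMOL selection path such as /structure//A/148/."""
    return f"/{obj}//{chain}/{position}/"


def introduce_mutation(pdb_file, out_file, position, new_residue, chain="A"):
    """Mutate one residue and save the result (chain default is an assumption)."""
    # Imported here so the helpers above work without PyMOL installed.
    from pymol import cmd

    cmd.reinitialize()                      # start from an empty session
    cmd.load(pdb_file, "structure")
    cmd.wizard("mutagenesis")               # open the mutagenesis wizard
    cmd.refresh_wizard()
    wizard = cmd.get_wizard()
    wizard.set_mode(THREE_LETTER[new_residue])
    wizard.do_select(residue_selection("structure", chain, position))
    wizard.apply()                          # commit the currently shown rotamer
    cmd.set_wizard()                        # close the wizard
    cmd.save(out_file, "structure")
```

A call such as introduce_mutation("HIV_Integrase.pdb", "Q148R.pdb", 148, "R") would then write one mutant file. The output file name here is illustrative.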

simultaneous_introduction(pdb_file, output_dir, mutant_data)

Introduces all mutations for a given system into a single structure and saves the final mutated PDB file.

foldx_emin(foldx_exe, output_dir)

(Optional) Runs FoldX’s "Optimize" command on each mutated structure to minimize energy and resolve potential steric clashes.

FoldX Integration

The script searches for a folder named foldx in the current working directory and automatically locates the FoldX binary within it.

Example command executed:

./foldx --command=Optimize --pdb=Q148R_auto.pdb --output-file=Q148R_auto.pdb
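The lookup and the command above might be wired together as in the sketch below. The function names are hypothetical, the rule "first file whose name starts with foldx" is an assumption about how the binary is located, and running FoldX with the output directory as the working directory is likewise assumed. Only the three command-line flags are taken from the example command.

```python
import os
import subprocess


def find_foldx(search_root="."):
    """Return the absolute path of a FoldX binary in <search_root>/foldx, or None."""
    folder = os.path.join(search_root, "foldx")
    if not os.path.isdir(folder):
        return None
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        # Assumed naming rule: the binary's file name starts with "foldx".
        if name.lower().startswith("foldx") and os.path.isfile(path):
            return os.path.abspath(path)
    return None


def build_optimize_command(foldx_exe, pdb_name):
    """Assemble the Optimize command shown in this document as an argument list."""
    return [
        foldx_exe,
        "--command=Optimize",
        f"--pdb={pdb_name}",
        f"--output-file={pdb_name}",
    ]


def run_optimize(foldx_exe, output_dir, pdb_name):
    """Run FoldX inside output_dir; raises CalledProcessError on failure."""
    subprocess.run(build_optimize_command(foldx_exe, pdb_name),
                   cwd=output_dir, check=True)


print(build_optimize_command("./foldx", "Q148R_auto.pdb"))
```

Passing the command as a list avoids shell quoting problems when paths contain spaces.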

Dependencies

  • pymol
  • biopython (PDBParser, PPBuilder)
  • pandas
  • argparse, os, sys, warnings (Standard Library)

Example

python mutintro.py --pdb_file HIV_Integrase.pdb --output_dir ./output --mutations variant_list.csv --mode multiple

Prerequisites

  • Ensure Python 3.x is installed and added to your system PATH.
  • Install required libraries using pip install pymol biopython pandas.
  • Ensure PyMOL is installed and can be invoked in script mode.
  • Download and place the FoldX executable in a folder named foldx.

Notes

  • The mutation wizard automatically selects the rotamer with the fewest steric clashes.
  • Make sure your PDB file matches the expected format (chain and residue numbers must match mutation data).
  • All output files are saved in the specified output directory.
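The requirement that the PDB file agree with the mutation data can be checked before any mutation is attempted. The sketch below is one way to do it and is not part of mutintro.py: it reads the fixed-width ATOM columns directly instead of using Biopython, the function names are hypothetical, and the sample record is made up for illustration.

```python
THREE_TO_ONE = {
    "ALA": "A", "CYS": "C", "ASP": "D", "GLU": "E", "PHE": "F",
    "GLY": "G", "HIS": "H", "ILE": "I", "LYS": "K", "LEU": "L",
    "MET": "M", "ASN": "N", "PRO": "P", "GLN": "Q", "ARG": "R",
    "SER": "S", "THR": "T", "VAL": "V", "TRP": "W", "TYR": "Y",
}


def residue_at(pdb_lines, chain, position):
    """Return the one-letter residue at chain/position, or None if absent."""
    for line in pdb_lines:
        if not line.startswith("ATOM") or len(line) < 26:
            continue
        # PDB fixed columns: resName 18-20, chainID 22, resSeq 23-26 (1-based).
        if line[21] == chain and int(line[22:26]) == position:
            return THREE_TO_ONE.get(line[17:20].strip())
    return None


def matches_structure(pdb_lines, chain, original, position):
    """True if the structure has the expected original residue at that site."""
    return residue_at(pdb_lines, chain, position) == original


# Made-up ATOM record (coordinates omitted) built to the fixed column widths.
sample = ["ATOM  {:>5d} {:<4s} {:>3s} {:1s}{:>4d}".format(1, " N", "GLN", "A", 148)]
print(matches_structure(sample, "A", "Q", 148))  # True
print(matches_structure(sample, "A", "G", 148))  # False
```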

Error Handling

To make failures easier to diagnose when mutintro.py is called from another script, wrap the call in logging and exception handling. For example:

import logging
import subprocess
import sys

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

try:
    # Example subprocess call; check=True raises CalledProcessError on a non-zero exit code
    subprocess.run(
        ["python", "mutintro.py",
         "--pdb_file", "HIV_Integrase.pdb",
         "--output_dir", "./output",
         "--mutations", "variant_list.csv",
         "--mode", "multiple"],
        check=True,
    )
except subprocess.CalledProcessError as e:
    logging.error(f"Error occurred: {e}")
    sys.exit(1)