foldxana.py - FoldX Stability Analysis Documentation

Project Information

Title: The development of an automated computational workflow to prioritize potential resistance variants identified in HIV Integrase Subtype C
Author: Keaghan Brown (3687524) - MSc Bioinformatics Candidate
Supervisor: Ruben Cloete - Lecturer, SANBI
Institution: South African National Bioinformatics Institute, University of the Western Cape
Funding: Poliomyelitis Research Foundation and UWC Ada & Bertie Levenstein Bursary Programme

Overview

The foldxana.py script is a key component of the automated computational workflow for analyzing structural stability changes in HIV Integrase Subtype C. It utilizes the FoldX software to calculate the stability of wild-type and mutant protein structures.

The script writes an HTML report listing the stability change (ΔΔG) of each variant relative to the wild type, which indicates the potential impact of each mutation on protein stability.

Usage

python foldxana.py --pdb_file path/to/original.pdb --output_dir path/to/mutants/

Arguments

  • --pdb_file: Path to the wild-type PDB structure.
  • --output_dir: Directory containing the mutated PDB files; each file name must end in _auto.pdb.
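
The two arguments above could be declared with argparse roughly as follows. This is a minimal sketch, not the script's actual code: the function name build_parser and the help strings are assumptions, and only the flag names come from this document.

```python
import argparse

def build_parser():
    """Build a parser for the two documented flags (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="FoldX stability analysis of wild-type and variant structures"
    )
    parser.add_argument("--pdb_file", required=True,
                        help="Path to the wild-type PDB structure")
    parser.add_argument("--output_dir", required=True,
                        help="Directory containing the *_auto.pdb variant files")
    return parser

# Parse an explicit argument list instead of sys.argv, for illustration
args = build_parser().parse_args(
    ["--pdb_file", "HIV_WT.pdb", "--output_dir", "./variants/"]
)
print(args.pdb_file, args.output_dir)
```

Marking both flags as required makes argparse exit with a usage message when either one is missing, rather than failing later inside the FoldX call.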

Outputs

  • stability_index.html: Table showing:
    • Wild-type structure energy
    • Each variant structure energy
    • Stability difference (ΔΔG)

Main Functionalities

Class: FoldXAna

foldx_stability(output_dir, pdb_file)

Automatically locates the FoldX executable and runs stability calculations:

  • First for the wild-type PDB file
  • Then for each mutant PDB file in the output directory
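
The order of operations described above can be sketched as below. This is an illustration under stated assumptions, not the implementation of foldx_stability: the helper names find_foldx and stability_commands are hypothetical, the executable is assumed to have a file name starting with "foldx", and the --command, --pdb and --pdb-dir options are assumed to match the installed FoldX version. The sketch only builds the command lists and does not run FoldX.

```python
from pathlib import Path

def find_foldx(search_dir="."):
    """Return the first file whose name starts with 'foldx', skipping .py scripts, or None."""
    for path in sorted(Path(search_dir).iterdir()):
        name = path.name.lower()
        # foldxana.py itself starts with 'foldx', so Python files are excluded
        if path.is_file() and name.startswith("foldx") and path.suffix != ".py":
            return path
    return None

def stability_commands(foldx_exe, pdb_file, output_dir):
    """Build the wild-type Stability command first, then one per *_auto.pdb variant."""
    structures = [Path(pdb_file)] + sorted(Path(output_dir).glob("*_auto.pdb"))
    return [
        [str(foldx_exe), "--command=Stability",
         f"--pdb={pdb.name}", f"--pdb-dir={pdb.parent}"]
        for pdb in structures
    ]
```

Each returned list could then be passed to subprocess.run(command, check=True), so that a failed FoldX run raises an exception instead of passing silently.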

stability_changes(output_dir, pdb_file)

Parses the resulting FoldX .fxout files and calculates:

  • Stability of wild-type system
  • Stability of each variant system
  • ΔΔG = WT Stability - Variant Stability

The results are exported to stability_index.html with HTML table formatting.
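
The calculation can be illustrated with a short sketch. It assumes each relevant .fxout line is tab-separated with the structure name first and the energy second; the function names, file names and energy values below are made up for illustration and are not real FoldX results.

```python
def parse_energy(fxout_line):
    """Return the second tab-separated field of a FoldX output line as a float."""
    fields = fxout_line.rstrip("\n").split("\t")
    return float(fields[1])

def stability_difference(wt_energy, variant_energy):
    """ΔΔG as defined in this document: wild-type stability minus variant stability."""
    return wt_energy - variant_energy

# Made-up example lines, not real FoldX output
wt_line = "WildType.pdb\t-10.0\t-200.5"
variant_line = "VariantA_auto.pdb\t-7.5\t-198.1"

ddg = stability_difference(parse_energy(wt_line), parse_energy(variant_line))
print(ddg)  # -2.5
```

With this definition, and given that lower FoldX energies indicate a more stable structure, a negative ΔΔG means the variant is less stable than the wild type. Many FoldX analyses report the opposite sign (variant minus wild type), so the convention should be checked before comparing values across tools.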

Workflow Summary

  1. The FoldX executable is located automatically in the current working directory.
  2. Stability is calculated for WT and each variant using the Stability command.
  3. ΔΔG is computed and compiled into an HTML report.

Dependencies

  • FoldX (executable must be in current directory)
  • Python standard libraries
  • pandas, numpy

Example

python foldxana.py --pdb_file HIV_WT.pdb --output_dir ./variants/

Output: stability_index.html

Prerequisites

  • Ensure Python 3.x is installed and added to your system PATH.
  • Install required libraries using pip install pandas numpy.
  • Download and place the FoldX executable in the script's working directory.

Notes

  • Ensure FoldX is downloaded and available in the script's working directory.
  • All variant structures must be named using the format: VariantName_auto.pdb.
  • Only the second value on each FoldX output line (the energy) is used; any remaining values are ignored.
  • HTML output is styled with basic CSS for readability.
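
A report of the kind described in these notes could be assembled as in the sketch below. It is an assumption-laden illustration rather than the script's own code: the function name render_report, the column titles and the CSS rules are hypothetical, and it uses only the standard library to stay self-contained (the script itself lists pandas as a dependency, where DataFrame.to_html would be an alternative).

```python
from html import escape

def render_report(wt_energy, variant_energies):
    """Return an HTML page with one row per variant and ΔΔG = WT minus variant."""
    rows = []
    for name, energy in variant_energies.items():
        ddg = wt_energy - energy
        rows.append(
            f"<tr><td>{escape(name)}</td><td>{wt_energy:.2f}</td>"
            f"<td>{energy:.2f}</td><td>{ddg:.2f}</td></tr>"
        )
    # Basic CSS for readability (illustrative rules only)
    style = "table{border-collapse:collapse} th,td{border:1px solid #999;padding:4px 8px}"
    header = ("<tr><th>Variant</th><th>WT energy</th>"
              "<th>Variant energy</th><th>&Delta;&Delta;G</th></tr>")
    body = "".join(rows)
    return (f"<html><head><style>{style}</style></head>"
            f"<body><table>{header}{body}</table></body></html>")

# Made-up energies, for illustration only
report = render_report(-10.0, {"VariantA": -7.5})
```

The returned string could then be written to stability_index.html with Path("stability_index.html").write_text(report, encoding="utf-8").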

Error Handling

When foldxana.py is called from a wrapper script, logging and exception handling make failures easier to diagnose. For example:

import logging
import subprocess
import sys

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

try:
    # Run foldxana.py as a child process; check=True raises CalledProcessError on a non-zero exit code
    subprocess.run(["python", "foldxana.py", "--pdb_file", "HIV_WT.pdb", "--output_dir", "./variants"], check=True)
except subprocess.CalledProcessError as e:
    logging.error(f"Error occurred: {e}")
    sys.exit(1)