Calculating The Distance Between Atomic Coordinates

October 08, 2023 Post a Comment

I have a text file as shown below ATOM 920 CA GLN A 203 39.292 -13.354 17.416 1.00 55.76 C ATOM 929 CA HIS A 204 38.546 -15.963 14.792 1.00 29.53

Solution 1:

Do NOT split on whitespace

The other answers given here make a flawed assumption - that coordinates will be space-delimited. Per the PDB specification of ATOM, this is not necessarilly the case: PDB record values are specified by column indices, and may flow into one another. For instance, your first ATOM record reads:

ATOM    920  CA  GLN A20339.292 -13.35417.4161.0055.76           C

But this is perfectly valid as well:

ATOM    920  CA  GLN A20339.292-13.354017.4161.0055.76           C

The better approach

Because of the column-specified indices, and the number of other problems that can occur in a PDB file, you should not write your own parser. The PDB format is messy, and there's a lot of special cases and badly formatted files to handle. Instead, use a parser that's already written for you.

I like Biopython's PDB.PDBParser. It will parse the structure for you into Python objects, complete with handy features. If you prefer Perl, check out BioPerl.

PDB.Residue objects allow keyed access to Atoms by name, and PDB.Atom objects overload the - operator to return distance between two Atoms. We can use this to write clean, concise code:

Code

from Bio import PDB
parser = PDB.PDBParser()

# Parse the structure into a PDB.Structure object
pdb_code = "1exm"
pdb_path = "pdb1exm.ent"
struct = parser.get_structure(pdb_code, pdb_path)

# Grab the first two residues from the structure
residues = struct.get_residues()
res_one = residues.next()
res_two = residues.next()

try:
    alpha_dist = res_one['CA'] - res_two['CA']
except KeyError:
    print"Alpha carbon missing, computing distance impossible!"

Solution 2:

If your data is separated by whitespace, a simple split can do the job. Buffering the lines to compare them to each other sequentially.

use strict;
use warnings;

my @line;
while (<>) {
    push @line, $_;            # add line to buffernextif @line < 2;         # skip unless buffer is fullprint proc(@line), "\n";   # process and print shift @line;               # remove used line 
}

subproc{
    my @a = split' ', shift;   # line 1my @b = split' ', shift;   # line 2my $x = ($a[6]-$b[6]);      # calculate the diffsmy $y = ($a[7]-$b[7]);
    my $z = ($a[8]-$b[8]);
    my $dist = sprintf"%.1f",                # format the numbersqrt($x**2+$y**2+$z**2);   # do the calculationreturn"$a[3]-$b[3]\t$dist"; # return the string for printing
}

Output (with the sample data):

GLN-HIS 3.8
HIS-ASN 3.8
ASN-GLU 3.9
GLU-VAL 3.8

If your data is tab separated, you can split on /\t/ instead of ' '.

Solution 3:

Assuming your data are in "atoms.txt", this reads it in line by line and splits the entries into a list:

import csv

withopen("atoms.txt") as f:
  reader = csv.reader(f)
  for line, in reader:
      entries = line.split()
      print entries

Now for each list extract the columns you need, and calculate the distances etc (Bear in mind that the lists in python are zero-based).

Solution 4:

You should ideally use the MDAnalysis package for a pythonic way of "slicing" atoms and segments and calculating distance measures among them. In fact, MDAnalysis supports several MD simulation and chemical structure formats.

For a little more verbose example, see also the following entry on Biostars.org.

Solution 5:

If you are interested in just one pair, bash works just fine. This is a script I use, I have it set to relaunch at the end (turn this off if you wish). It will prompt you for which atom. PDB files can have different column set up, so for the awk line make sure that the columns match up. Do a test case by hand before using with a new pdb file. This is trivial, but change in the script my pdb file to yours.

#!/usr/bin/env bashecho" "echo"Use one letter abbreviations. Case doesn't matter."echo"Example: A 17 CA or n 162 cg"echo" - - - - - - - - - - - - - - - - - -"#------------First Selection------------read -e -p "What first atom? " sel1

# echo $sel1
sel1caps=${sel1^^}# echo "sel1caps="$sel1caps

arr1=($sel1caps)
# echo "arr1[0]= "${arr1[0]}# echo "arr1[1]= "${arr1[1]}# echo "arr1[2]= "${arr1[2]}#To convert one to three lettersif [ ${arr1[0]} = A ] ; then
    SF1=ALA
elif [ ${arr1[0]} = H ] ; then
    SF1=HIS
elif [ ${arr1[0]} = R ] ; then
    SF1=ARG
elif [ ${arr1[0]} = K ] ; then
    SF1=LYS
elif [ ${arr1[0]} = I ] ; then
    SF1=ILE
elif [ ${arr1[0]} = F ] ; then
    SF1=PHE 
elif [ ${arr1[0]} = L ] ; then
    SF1=LEU
elif [ ${arr1[0]} = W ] ; then
    SF1=TRP
elif [ ${arr1[0]} = M ] ; then
    SF1=MET
elif [ ${arr1[0]} = P ] ; then
    SF1=PRO 
elif [ ${arr1[0]} = C ] ; then
    SF1=CYS 
elif [ ${arr1[0]} = N ] ; then
    SF1=ASN
elif [ ${arr1[0]} = V ] ; then
    SF1=VAL
elif [ ${arr1[0]} = G ] ; then
    SF1=GLY 
elif [ ${arr1[0]} = S ] ; then
    SF1=SER 
elif [ ${arr1[0]} = Q ] ; then
    SF1=GLN 
elif [ ${arr1[0]} = Y ] ; then
    SF1=TYR 
elif [ ${arr1[0]} = D ] ; then
    SF1=ASP
elif [ ${arr1[0]} = E ] ; then
    SF1=GLU 
elif [ ${arr1[0]} = T ] ; then
    SF1=THR 
elseecho"use one letter codes"echo"exiting"exitfi# echo "SF1 ="$SF1#If there is nothing printing for line 1, check the expression for your pdb file. The spaces may need adjustment at the end.
line1=$(grep -E "${arr1[2]} *?${SF1}(.*?) ${arr1[1]}     " 1A23.pdb)
# echo $line1

ar_l1=($line1)
# echo "ar_l1[1]="${ar_l1[1]}echo" - - - - - - - - - - - - - - - - - -"#------------Second Selection------------read -e -p "What second atom? " sel2

# echo $sel2

sel2caps=${sel2^^}# echo "sel2caps="$sel2caps

arr2=($sel2caps)
# echo "arr2[0]= "${arr2[0]}# echo "arr2[1]= "${arr2[1]}# echo "arr2[2]= "${arr2[2]}#To convert one to three lettersif [ ${arr2[0]} = A ] ; then
    SF2=ALA
elif [ ${arr2[0]} = H ] ; then
    SF2=HIS
elif [ ${arr2[0]} = R ] ; then
    SF2=ARG
elif [ ${arr2[0]} = K ] ; then
    SF2=LYS
elif [ ${arr2[0]} = I ] ; then
    SF2=ILE
elif [ ${arr2[0]} = F ] ; then
    SF2=PHE 
elif [ ${arr2[0]} = L ] ; then
    SF2=LEU
elif [ ${arr2[0]} = W ] ; then
    SF2=TRP
elif [ ${arr2[0]} = M ] ; then
    SF2=MET
elif [ ${arr2[0]} = P ] ; then
    SF2=PRO 
elif [ ${arr2[0]} = C ] ; then
    SF2=CYS 
elif [ ${arr2[0]} = N ] ; then
    SF2=ASN
elif [ ${arr2[0]} = V ] ; then
    SF2=VAL
elif [ ${arr2[0]} = G ] ; then
    SF2=GLY 
elif [ ${arr2[0]} = S ] ; then
    SF2=SER 
elif [ ${arr2[0]} = Q ] ; then
    SF2=GLN 
elif [ ${arr2[0]} = Y ] ; then
    SF2=TYR 
elif [ ${arr2[0]} = D ] ; then
    SF2=ASP
elif [ ${arr2[0]} = E ] ; then
    SF2=GLU 
elif [ ${arr2[0]} = T ] ; then
    SF2=THR 
elseecho"use one letter codes"echo"exiting"exitfi# echo "SF2 ="$SF2


line2=$(grep -E "${arr2[2]} *?${SF2}(.*?) ${arr2[1]}     " 1A23.pdb)
# echo $line2

ar_l2=($line2)
# echo "ar_l2[1]="${ar_l2[1]}# echo "ar_l2[1]="${ar_l2[1]}

atom1=${ar_l1[1]}
atom2=${ar_l2[1]}echo"==========================="echo${arr1[0]}${arr1[1]}${arr1[2]}" to "${arr2[0]}${arr2[1]}${arr2[2]}":"# 6, 7, 8 are column numbers in the pdb file. # If there are multiple molecules it should be 7, 8, 9.
awk '$2=='$atom1'{x1=$7;y1=$8;z1=$9}                                 # get the ATOM 1
     $2=='$atom2'{x2=$7;y2=$8;z2=$9}                               # get the ATOM 2
     END{print sqrt((x1-x2)*(x1-x2) + (y1-y2)*(y1-y2) + (z1-z2)*(z1-z2))}' 1A23.pdb    # calculate the distance.echo"Angstroms"echo"==========================="echo" "echo"-_-_-_-_Running Script Again_-_-_-_-"
./distance_soln.sh

Getting Started with Python