Calculating The Distance Between Atomic Coordinates
Solution 1:
Do NOT split on whitespace
The other answers given here make a flawed assumption - that coordinates will be space-delimited. Per the PDB specification of ATOM
, this is not necessarilly the case: PDB record values are specified by column indices, and may flow into one another. For instance, your first ATOM
record reads:
ATOM 920 CA GLN A20339.292 -13.35417.4161.0055.76 C
But this is perfectly valid as well:
ATOM 920 CA GLN A20339.292-13.354017.4161.0055.76 C
The better approach
Because of the column-specified indices, and the number of other problems that can occur in a PDB file, you should not write your own parser. The PDB format is messy, and there's a lot of special cases and badly formatted files to handle. Instead, use a parser that's already written for you.
I like Biopython's PDB.PDBParser
. It will parse the structure for you into Python objects, complete with handy features. If you prefer Perl, check out BioPerl.
PDB.Residue
objects allow keyed access to Atoms by name, and PDB.Atom
objects overload the -
operator to return distance between two Atoms. We can use this to write clean, concise code:
Code
from Bio import PDB
parser = PDB.PDBParser()
# Parse the structure into a PDB.Structure object
pdb_code = "1exm"
pdb_path = "pdb1exm.ent"
struct = parser.get_structure(pdb_code, pdb_path)
# Grab the first two residues from the structure
residues = struct.get_residues()
res_one = residues.next()
res_two = residues.next()
try:
alpha_dist = res_one['CA'] - res_two['CA']
except KeyError:
print"Alpha carbon missing, computing distance impossible!"
Solution 2:
If your data is separated by whitespace, a simple split
can do the job. Buffering the lines to compare them to each other sequentially.
use strict;
use warnings;
my @line;
while (<>) {
push @line, $_; # add line to buffernextif @line < 2; # skip unless buffer is fullprint proc(@line), "\n"; # process and print shift @line; # remove used line
}
subproc{
my @a = split' ', shift; # line 1my @b = split' ', shift; # line 2my $x = ($a[6]-$b[6]); # calculate the diffsmy $y = ($a[7]-$b[7]);
my $z = ($a[8]-$b[8]);
my $dist = sprintf"%.1f", # format the numbersqrt($x**2+$y**2+$z**2); # do the calculationreturn"$a[3]-$b[3]\t$dist"; # return the string for printing
}
Output (with the sample data):
GLN-HIS 3.8
HIS-ASN 3.8
ASN-GLU 3.9
GLU-VAL 3.8
If your data is tab separated, you can split on /\t/
instead of ' '
.
Solution 3:
Assuming your data are in "atoms.txt", this reads it in line by line and splits the entries into a list:
import csv
withopen("atoms.txt") as f:
reader = csv.reader(f)
for line, in reader:
entries = line.split()
print entries
Now for each list extract the columns you need, and calculate the distances etc (Bear in mind that the lists in python are zero-based).
Solution 4:
You should ideally use the MDAnalysis package for a pythonic way of "slicing" atoms and segments and calculating distance measures among them. In fact, MDAnalysis supports several MD simulation and chemical structure formats.
For a little more verbose example, see also the following entry on Biostars.org.
Solution 5:
If you are interested in just one pair, bash works just fine. This is a script I use, I have it set to relaunch at the end (turn this off if you wish). It will prompt you for which atom. PDB files can have different column set up, so for the awk line make sure that the columns match up. Do a test case by hand before using with a new pdb file. This is trivial, but change in the script my pdb file to yours.
#!/usr/bin/env bashecho" "echo"Use one letter abbreviations. Case doesn't matter."echo"Example: A 17 CA or n 162 cg"echo" - - - - - - - - - - - - - - - - - -"#------------First Selection------------read -e -p "What first atom? " sel1
# echo $sel1
sel1caps=${sel1^^}# echo "sel1caps="$sel1caps
arr1=($sel1caps)
# echo "arr1[0]= "${arr1[0]}# echo "arr1[1]= "${arr1[1]}# echo "arr1[2]= "${arr1[2]}#To convert one to three lettersif [ ${arr1[0]} = A ] ; then
SF1=ALA
elif [ ${arr1[0]} = H ] ; then
SF1=HIS
elif [ ${arr1[0]} = R ] ; then
SF1=ARG
elif [ ${arr1[0]} = K ] ; then
SF1=LYS
elif [ ${arr1[0]} = I ] ; then
SF1=ILE
elif [ ${arr1[0]} = F ] ; then
SF1=PHE
elif [ ${arr1[0]} = L ] ; then
SF1=LEU
elif [ ${arr1[0]} = W ] ; then
SF1=TRP
elif [ ${arr1[0]} = M ] ; then
SF1=MET
elif [ ${arr1[0]} = P ] ; then
SF1=PRO
elif [ ${arr1[0]} = C ] ; then
SF1=CYS
elif [ ${arr1[0]} = N ] ; then
SF1=ASN
elif [ ${arr1[0]} = V ] ; then
SF1=VAL
elif [ ${arr1[0]} = G ] ; then
SF1=GLY
elif [ ${arr1[0]} = S ] ; then
SF1=SER
elif [ ${arr1[0]} = Q ] ; then
SF1=GLN
elif [ ${arr1[0]} = Y ] ; then
SF1=TYR
elif [ ${arr1[0]} = D ] ; then
SF1=ASP
elif [ ${arr1[0]} = E ] ; then
SF1=GLU
elif [ ${arr1[0]} = T ] ; then
SF1=THR
elseecho"use one letter codes"echo"exiting"exitfi# echo "SF1 ="$SF1#If there is nothing printing for line 1, check the expression for your pdb file. The spaces may need adjustment at the end.
line1=$(grep -E "${arr1[2]} *?${SF1}(.*?) ${arr1[1]} " 1A23.pdb)
# echo $line1
ar_l1=($line1)
# echo "ar_l1[1]="${ar_l1[1]}echo" - - - - - - - - - - - - - - - - - -"#------------Second Selection------------read -e -p "What second atom? " sel2
# echo $sel2
sel2caps=${sel2^^}# echo "sel2caps="$sel2caps
arr2=($sel2caps)
# echo "arr2[0]= "${arr2[0]}# echo "arr2[1]= "${arr2[1]}# echo "arr2[2]= "${arr2[2]}#To convert one to three lettersif [ ${arr2[0]} = A ] ; then
SF2=ALA
elif [ ${arr2[0]} = H ] ; then
SF2=HIS
elif [ ${arr2[0]} = R ] ; then
SF2=ARG
elif [ ${arr2[0]} = K ] ; then
SF2=LYS
elif [ ${arr2[0]} = I ] ; then
SF2=ILE
elif [ ${arr2[0]} = F ] ; then
SF2=PHE
elif [ ${arr2[0]} = L ] ; then
SF2=LEU
elif [ ${arr2[0]} = W ] ; then
SF2=TRP
elif [ ${arr2[0]} = M ] ; then
SF2=MET
elif [ ${arr2[0]} = P ] ; then
SF2=PRO
elif [ ${arr2[0]} = C ] ; then
SF2=CYS
elif [ ${arr2[0]} = N ] ; then
SF2=ASN
elif [ ${arr2[0]} = V ] ; then
SF2=VAL
elif [ ${arr2[0]} = G ] ; then
SF2=GLY
elif [ ${arr2[0]} = S ] ; then
SF2=SER
elif [ ${arr2[0]} = Q ] ; then
SF2=GLN
elif [ ${arr2[0]} = Y ] ; then
SF2=TYR
elif [ ${arr2[0]} = D ] ; then
SF2=ASP
elif [ ${arr2[0]} = E ] ; then
SF2=GLU
elif [ ${arr2[0]} = T ] ; then
SF2=THR
elseecho"use one letter codes"echo"exiting"exitfi# echo "SF2 ="$SF2
line2=$(grep -E "${arr2[2]} *?${SF2}(.*?) ${arr2[1]} " 1A23.pdb)
# echo $line2
ar_l2=($line2)
# echo "ar_l2[1]="${ar_l2[1]}# echo "ar_l2[1]="${ar_l2[1]}
atom1=${ar_l1[1]}
atom2=${ar_l2[1]}echo"==========================="echo${arr1[0]}${arr1[1]}${arr1[2]}" to "${arr2[0]}${arr2[1]}${arr2[2]}":"# 6, 7, 8 are column numbers in the pdb file. # If there are multiple molecules it should be 7, 8, 9.
awk '$2=='$atom1'{x1=$7;y1=$8;z1=$9} # get the ATOM 1
$2=='$atom2'{x2=$7;y2=$8;z2=$9} # get the ATOM 2
END{print sqrt((x1-x2)*(x1-x2) + (y1-y2)*(y1-y2) + (z1-z2)*(z1-z2))}' 1A23.pdb # calculate the distance.echo"Angstroms"echo"==========================="echo" "echo"-_-_-_-_Running Script Again_-_-_-_-"
./distance_soln.sh
Post a Comment for "Calculating The Distance Between Atomic Coordinates"