Clustering multiple fits in a cryo-EM map

class Cluster.Cluster
A class for clustering an ensemble of Structure Instances.

RMSD_ensemble(rank_fit_ensemble, ensemble_list, CA=True)
Calculates the pairwise RMSD matrix for all Structure Instances in the ensemble.
 Arguments:
 rank_fit_ensemble
 Ensemble of Structure Instances ranked using Cluster.rank_fit_ensemble.
 ensemble_list
 Input list of Structure Instances.
 CA
 Set to True if only the C-alpha RMSD is needed.
 Return:
 A numpy array
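
As a rough illustration of what a pairwise RMSD matrix contains, here is a minimal pure-Python sketch. It is not TEMPy's implementation. The function names rmsd and pairwise_rmsd_matrix are hypothetical, models are plain lists of (x, y, z) positions (for example C-alpha atoms) rather than Structure Instances, no superposition is applied, and the result is a nested list instead of a numpy array.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) positions.

    Assumes atoms are listed in the same order in both models and that
    the models are compared in place (no superposition).
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("both models need the same number of atoms")
    total = sum(
        (a - b) ** 2
        for p, q in zip(coords_a, coords_b)
        for a, b in zip(p, q)
    )
    return math.sqrt(total / len(coords_a))

def pairwise_rmsd_matrix(models):
    """Symmetric matrix with entry [i][j] = RMSD between model i and model j."""
    n = len(models)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = rmsd(models[i], models[j])
    return matrix

# Three made-up two-atom models: the second is the first shifted 3 units
# along x, the third is the first shifted 4 units along y.
models = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
    [(0.0, 4.0, 0.0), (1.0, 4.0, 0.0)],
]
matrix = pairwise_rmsd_matrix(models)
```

The diagonal is zero and the matrix is symmetric, so only the upper triangle needs to be computed.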

cluster_fit_ensemble_top_fit(ensemble_list, score, rms_cutoff, res_target_map, sigma_coeff, number_top_mod=0, write=False, targetMap=False)
RMSD clustering of multiple fits, starting from the best-scoring model according to a chosen score. The fits are clustered on C-alpha RMSD.
 Arguments:
 ensemble_list
 Input list of Structure Instances.
 targetMap
 Target Map Instance.
 score
Scoring function to use. See the ScoringFunctions class for a list of the available scoring functions. E.g. set score='CCC' to use the cross-correlation coefficient.
Score options are:
i 'CCC' - Cross-correlation coefficient;
ii 'LAP' - Laplacian-filtered cross-correlation coefficient: useful for maps with resolutions worse than 10-15 Å;
iii 'MI' - Mutual information score: a good and robust score but relatively slow to calculate;
iv 'ENV' - Envelope score: the fastest score to calculate due to binarisation of the map;
v-vii 'NV', 'NV_Sobel', 'NV_Laplace' - Normal vector score: a vector-based surface superimposition score with or without Sobel/Laplace filter;
viii 'CD' - Chamfer distance: a score used in computer vision algorithms as a fast similarity metric.
 rms_cutoff
 float, the C-alpha RMSD cutoff used to cluster the solutions, for example 3.5 (for 3.5 Å).
 res_target_map
 the resolution, in Angstroms, of the target Map.
 sigma_coeff
The sigma coefficient: it is multiplied by the resolution R to give the width (sigma) of the Gaussian. The default value is 0.356.
Other values used:
0.187R, corresponding to the Gaussian width of the Fourier transform falling to half the maximum at 1/resolution, as used in Situs (Wriggers et al., 1999);
0.225R, which makes the Fourier transform of the distribution fall to 1/e of its maximum value at wavenumber 1/resolution, the default in Chimera (Pettersen et al., 2004);
0.356R, corresponding to the Gaussian width at 1/e maximum height equalling the resolution, an option in Chimera (Pettersen et al., 2004);
0.425R, the full width at half maximum being equal to the resolution, as used by Flex-EM (Topf et al., 2008);
0.5R, the distance between the two inflection points being the same length as the resolution, an option in Chimera (Pettersen et al., 2004);
1R, where the sigma value is simply equal to the resolution, as used by NMFF (Tama et al., 2004).
 number_top_mod
 Number of Fits to cluster. Default is all.
 write
 If True, writes a file listing the Structure Instances for the different fits, scored and clustered. Note that the lrms column is the C-alpha RMSD of each fit from the first fit in its class.
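
The description above (clustering that starts from the best-scoring model, with an lrms value measured from the first fit in each class) can be read as a greedy, seed-based procedure. The sketch below shows one plausible version of that idea in pure Python. It is an assumption about the general technique, not TEMPy's code. The function name cluster_from_top_fit is hypothetical, scores are assumed to be "higher is better" (as for CCC), and the pairwise RMSD matrix is supplied as a nested list.

```python
def cluster_from_top_fit(scores, rmsd_matrix, rms_cutoff):
    """Greedy clustering seeded by the best-scoring unassigned fit.

    scores      : one score per fit; higher is assumed to be better.
    rmsd_matrix : square matrix of pairwise RMSDs, same fit order as scores.
    rms_cutoff  : fits within this RMSD of the seed join the seed's cluster.

    Returns a list of clusters. Each cluster is a list of (fit_index, lrms)
    pairs, where lrms is the RMSD to the cluster's first (seed) fit.
    """
    # Rank fit indices from best to worst score.
    unassigned = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    clusters = []
    while unassigned:
        seed = unassigned[0]  # best-scoring fit not yet in a cluster
        members = [
            (i, rmsd_matrix[seed][i])
            for i in unassigned
            if i == seed or rmsd_matrix[seed][i] <= rms_cutoff
        ]
        clusters.append(members)
        taken = {i for i, _ in members}
        unassigned = [i for i in unassigned if i not in taken]
    return clusters

# Made-up scores and RMSDs for four fits.
scores = [0.70, 0.90, 0.85, 0.40]
rmsd_matrix = [
    [0.0, 2.0, 8.0, 9.0],
    [2.0, 0.0, 7.5, 9.5],
    [8.0, 7.5, 0.0, 1.0],
    [9.0, 9.5, 1.0, 0.0],
]
clusters = cluster_from_top_fit(scores, rmsd_matrix, rms_cutoff=3.5)
```

With these numbers, fit 1 (the best score) seeds the first cluster and pulls in fit 0, then fit 2 seeds the second cluster and pulls in fit 3. Because each cluster is built around its seed only, members are guaranteed to be within the cutoff of the seed but not necessarily of each other.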

rank_fit_ensemble(ensemble_list, score, res_target_map, sigma_coeff, number_top_mod=0, write=False, targetMap=False, cont_targetMap=None)
RMSD clustering of multiple fits according to a chosen score. The fits are clustered on C-alpha RMSD, starting from the best-scoring model.
 Arguments:
 ensemble_list
 Input list of Structure Instances.
 targetMap
 Target Map Instance.
 score
Scoring function to use. See the ScoringFunctions class for a list of the available scoring functions. E.g. set score='CCC' to use the cross-correlation coefficient.
Score options are:
i 'CCC' - Cross-correlation coefficient;
ii 'LAP' - Laplacian-filtered cross-correlation coefficient: useful for maps with resolutions worse than 10-15 Å;
iii 'MI' - Mutual information score: a good and robust score but relatively slow to calculate;
iv 'ENV' - Envelope score: the fastest score to calculate due to binarisation of the map;
v-vii 'NV', 'NV_Sobel', 'NV_Laplace' - Normal vector score: a vector-based surface superimposition score with or without Sobel/Laplace filter;
viii 'CD' - Chamfer distance: a score used in computer vision algorithms as a fast similarity metric.
 rms_cutoff
 float, the C-alpha RMSD cutoff used to cluster the solutions, for example 3.5 (for 3.5 Å).
 res_target_map
 the resolution, in Angstroms, of the target Map.
 sigma_coeff
The sigma coefficient: it is multiplied by the resolution R to give the width (sigma) of the Gaussian. The default value is 0.356.
Other values used:
0.187R, corresponding to the Gaussian width of the Fourier transform falling to half the maximum at 1/resolution, as used in Situs (Wriggers et al., 1999);
0.225R, which makes the Fourier transform of the distribution fall to 1/e of its maximum value at wavenumber 1/resolution, the default in Chimera (Pettersen et al., 2004);
0.356R, corresponding to the Gaussian width at 1/e maximum height equalling the resolution, an option in Chimera (Pettersen et al., 2004);
0.425R, the full width at half maximum being equal to the resolution, as used by Flex-EM (Topf et al., 2008);
0.5R, the distance between the two inflection points being the same length as the resolution, an option in Chimera (Pettersen et al., 2004);
1R, where the sigma value is simply equal to the resolution, as used by NMFF (Tama et al., 2004).
 number_top_mod
 Number of Fits to cluster. Default is all.
 write
 If True, writes a file listing the Structure Instances for the different fits, scored and clustered. Note that the lrms column is the C-alpha RMSD of each fit from the first fit in its class.
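
To make the sigma_coeff conventions concrete, the following sketch converts each listed coefficient into a Gaussian sigma for a given map resolution. It is only arithmetic on the values quoted above. The dictionary labels and the helper names gaussian_sigma and fwhm are made up for this example and are not part of TEMPy. The full width at half maximum check uses the standard Gaussian relation FWHM = 2 * sqrt(2 * ln 2) * sigma.

```python
import math

# Coefficients as listed in the sigma_coeff description; labels are illustrative.
SIGMA_COEFFS = {
    "Situs": 0.187,
    "Chimera default": 0.225,
    "Chimera 1/e width": 0.356,
    "Flex-EM": 0.425,
    "Chimera inflection points": 0.5,
    "NMFF": 1.0,
}

def gaussian_sigma(resolution, sigma_coeff=0.356):
    """Gaussian sigma (same units as resolution) for a given coefficient."""
    return sigma_coeff * resolution

def fwhm(sigma):
    """Full width at half maximum of a Gaussian with standard deviation sigma."""
    return 2.0 * math.sqrt(2.0 * math.log(2.0)) * sigma

resolution = 10.0  # example target-map resolution in Angstroms
sigmas = {name: gaussian_sigma(resolution, c) for name, c in SIGMA_COEFFS.items()}

for name, sigma in sigmas.items():
    print(f"{name:26s} sigma = {sigma:5.2f} A   FWHM = {fwhm(sigma):5.2f} A")
```

For the 0.425 coefficient the computed FWHM comes out very close to the resolution itself, matching the description given for that convention.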
