Auto3DEM User Guide

Introduction

Auto3DEM is an automated system for 3D structure determination from images of vitrified particles. The software was specifically developed for icosahedral viruses, but should be able to handle particles of any symmetry as long as the particles are roughly spherical and their images can readily be divided into circular annuli. Even this restriction can be relaxed if auto3dem is run in refine mode from the beginning: if initial orientations for the images are not available, they can be determined using the global search option (see po2r arguments).

The software can be run in either serial or parallel mode, but parallel mode is preferred for best performance. The underlying programs called by auto3dem have all been parallelized using MPI, so you will need an MPI implementation (e.g. MPICH or OpenMPI) installed on your system in order to run in parallel mode.

If a starting model is not available, one can be constructed (only for icosahedral particles) with the aid of the script setup_rmc that is distributed with the software (see Generating a starting model). setup_rmc creates the input files needed to construct multiple random models, run each for a specified number of iterations, and compare the results to identify what is likely the best starting map.

Running auto3dem

Data files and directory structure

Auto3DEM does not require the data files to be in any particular location, but we find it most convenient to place the particle parameter files and boxed image files in directories named dat and pif, respectively. An example directory structure is given below.

Virus/
      dat/
          file1.dat_000
          file2.dat_000
      pif/
          file1_box.pif
          file2_box.pif

To maximize flexibility and avoid having to edit the particle parameter files when calculations are run in different locations, we suggest using relative path names to specify the locations of the corresponding boxed images. For example, the first line of the parameter file file1.dat_000 would be

../pif/file1_box.pif

Launching a new run

Auto3DEM is launched from the Unix/Linux command line or from within a batch script using the following syntax

auto3dem -ncpu ncpu -input infile [-nodefile nfile]

where

ncpu = number of CPUs (can also use -np)
infile = auto3dem input file
nfile = name of file containing list of nodes (optional)

The node file does not generally need to be specified. This option is provided for those users who wish to run on a specific set of nodes. It is also used internally for jobs run on batch systems that use the PBS scheduler.

If launching auto3dem from the command line, you will most likely wish to run the program in the background by appending an ampersand (&) to the end of the line. When running on the same machine where the job was submitted, the CPU and memory usage can be monitored using the Linux top command.
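
For scripted launches, the command line described above can be assembled programmatically. The following is a minimal sketch only: build_auto3dem_command and launch_in_background are hypothetical helpers, not part of Auto3DEM, and they rely solely on the -ncpu, -input, and -nodefile options documented above.

```python
import subprocess


def build_auto3dem_command(ncpu, infile, nodefile=None):
    """Return the argument list for: auto3dem -ncpu ncpu -input infile [-nodefile nfile]."""
    if ncpu < 1:
        raise ValueError("ncpu must be at least 1")
    cmd = ["auto3dem", "-ncpu", str(ncpu), "-input", infile]
    if nodefile is not None:
        # The node file is optional and normally not needed.
        cmd += ["-nodefile", nodefile]
    return cmd


def launch_in_background(cmd, logfile):
    """Start the job without waiting for it, similar to appending '&' in the shell."""
    with open(logfile, "w") as log:
        return subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
```

For example, build_auto3dem_command(8, "Virus_master") yields the list that corresponds to "auto3dem -ncpu 8 -input Virus_master"; the log file name passed to launch_in_background is the caller's choice.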

Other than the number of CPUs, all parameters that control the behaviour of auto3dem are set in the input parameter file. The majority of the rest of this document focuses on constructing this file.

Summary file

auto3dem writes out a summary file that keeps track of important reconstruction parameters as a function of iteration number. Example output is shown below

------ AUTO3DEM version v4.05.1 (parallel) ------

 
itr:          iteration number
              0 = starting map constructed from images
mode:         s(n) = search mode (PPFT)  w/ n=binfactor
              r(l) = refine mode (PO2R)  local search
              r(g) = refine mode (PO2R)  global search
              r(m) = refine mode (PO2R)  only magnification
              r(c) = refine mode (PCTFR) only CTF
estres:       estimated resolution of model
              * = FSC never < 0.5
delta:        spacing between solutions tested (degrees/microns)
map undamp:   resolution to which map is computed
map damp:     resolution at which map density damped to zero
time:         wallclock time for iteration
cpu:          number of CPUs (MPI processes)
nptles:       number ptles used to construct map
ntot:         total number ptles in parameter files
nmg:          number of micrographs
defocus       defocus range (microns)
 
All resolutions expressed in Angstroms, times in seconds

                 delta   map    map
itr mode estres  angle undamp  damp   time cpu nptles  ntot  nmg  defocus
--- ---- ------ ------ ------ ------ ----- --- ------ ------ --- ---------
  1 s(2) 21.98    2.07  15.79  13.64    27   2    606    608   5 1.58-4.22
  2 s(2) 18.00    1.70  14.03  12.30    25   2    607    608   5 1.58-4.22
  3 s(2) 18.08    1.51  13.65  12.01    27   2    607    608   5 1.58-4.22
  4 r(l) 14.79    0.74  10.17   9.23    87   2    607    608   5 1.58-4.22
  5 r(l) 12.05    0.55  10.14   9.20    47   2    607    608   5 1.58-4.22

When a new run is launched, a new summary file will be generated with a name based on the current directory (e.g. Virus_summary). If a summary file already exists, it will be copied to a new location (e.g. Virus_summary_backup). For continuation runs (see option auto restart), new results will be appended to the existing summary file.
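
Because the summary file is a whitespace-delimited table, it is easy to post-process, for example to plot estimated resolution against iteration. The sketch below is one possible reader, not a tool shipped with Auto3DEM. It assumes rows are laid out exactly as in the example output above, and that the '*' marker (FSC never < 0.5), when present, is attached directly to the estres value; both points are assumptions.

```python
import re

COLUMNS = ("itr", "mode", "estres", "delta", "map_undamp", "map_damp",
           "time", "cpu", "nptles", "ntot", "nmg", "defocus")

# A data row starts with an iteration number followed by a mode such as s(2) or r(l).
ROW_START = re.compile(r"^\s*\d+\s+[sr]\([^)]+\)\s")


def parse_summary(text):
    """Return one dict per iteration row found in the summary text."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not ROW_START.match(line) or len(fields) != len(COLUMNS):
            continue  # legend, header, separator, or unrecognized line
        raw = dict(zip(COLUMNS, fields))
        low, high = raw["defocus"].split("-")
        rows.append({
            "itr": int(raw["itr"]),
            "mode": raw["mode"],
            # Assumption: '*' is attached to the value when FSC never drops below 0.5.
            "estres": float(raw["estres"].strip("*")),
            "fsc_never_below_half": "*" in raw["estres"],
            "delta": float(raw["delta"]),
            "map_undamp": float(raw["map_undamp"]),
            "map_damp": float(raw["map_damp"]),
            "time": float(raw["time"]),
            "cpu": int(raw["cpu"]),
            "nptles": int(raw["nptles"]),
            "ntot": int(raw["ntot"]),
            "nmg": int(raw["nmg"]),
            "defocus": (float(low), float(high)),
        })
    return rows
```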

Restarting auto3dem

auto3dem writes two restart files for each iteration. The first is written after the new particle origins and orientations are calculated and the second after the new map is generated. The names of the restart files are derived from the name of the directory where auto3dem was launched and are labelled with both the iteration number and either the letter a or b, to indicate the first or second restart file for the iteration. For example:

Virus_restart_1a
Virus_restart_1b

The restart files will contain all of the keyword parameters. Default values are supplied as necessary and other parameters are updated to reflect the progress of the reconstruction (e.g. improvements in resolution, naming of particle parameter files to reflect iteration number).

If auto3dem runs to completion, the restart files are moved to a directory that is automatically created and named using the name of the directory where auto3dem was launched (e.g. Virus_RESTARTS). A continue file is also generated in the run directory (e.g. Virus_continue). The only difference between the continue file and the restart files is that the former will have the number of iterations reset to whatever was used in the initial auto3dem parameter file whereas the latter will be configured to complete the original run.

To restart or continue a run, just launch auto3dem using the appropriate input file. For example:

auto3dem -ncpu ncpu -input Virus_continue
auto3dem -ncpu ncpu -input Virus_restart_5b

As of version 4.05.1, the name of the continue file generated at the end of each cycle of iterations changes according to the next iteration number to run. For example, if you run auto3dem for the first 10 iterations with the input file Virus_master, the continue file will be named Virus_continue_11.
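
The naming scheme for restart and continue files can be summarized in a few lines of code, which may help when scripting restarts. This is an illustrative sketch: the function names are hypothetical, and the base name (e.g. Virus) is taken as given rather than derived from the launch directory.

```python
import re

_RESTART = re.compile(r"^(?P<base>.+)_restart_(?P<itr>\d+)(?P<stage>[ab])$")


def restart_name(base, iteration, stage):
    """Name of a restart file; stage 'a' = after orientations, 'b' = after map."""
    if stage not in ("a", "b"):
        raise ValueError("stage must be 'a' or 'b'")
    return f"{base}_restart_{iteration}{stage}"


def continue_name(base, next_iteration=None):
    """Name of the continue file; versions from 4.05.1 append the next iteration."""
    if next_iteration is None:
        return f"{base}_continue"
    return f"{base}_continue_{next_iteration}"


def latest_restart(names):
    """Return the restart file from the latest iteration and stage, or None."""
    best = None
    for name in names:
        match = _RESTART.match(name)
        if match:
            key = (int(match.group("itr")), match.group("stage"))
            if best is None or key > best[0]:
                best = (key, name)
    return best[1] if best else None
```

Comparing iteration numbers as integers matters here: a plain alphabetical sort would place Virus_restart_10a before Virus_restart_2b.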

Auto3dem input file

Auto3dem uses a single keyword-based file for specifying the auto3dem input. We describe below the general rules for the input file and the allowed format for the two types of records: auto3dem control parameter records and data records.

General formatting rules

The following rules apply to the entire input file.

  • The ordering of lines is irrelevant. However, if a keyword is specified more than once, the value associated with the later occurrence will override any previous one.
  • Extra whitespace is ignored. Leading/trailing whitespace and blank lines are ignored. Contiguous blocks of whitespace are treated the same as a single space. Embedded whitespace is not allowed in character string input. For example, data file 1.pif would not be a valid string.
  • The hash/number (#) sign indicates the start of a comment. Entire lines can be commented out, or comments can be added to the end of a line. Hash signs are not allowed in character strings.
  • Extra fields are ignored. Input parameters are specified using three fields, while data files only use two fields. Any data appearing after the end of the last required field is ignored. The one exception to this rule is that the email recipient field may consist of multiple addresses separated by whitespace and/or commas.
  • Fields are case insensitive, except for character string input. Identifiers and keywords are internally converted to lowercase. Character strings specifying directories, file names, and binaries must be typed using the correct case.
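
The rules above can be made concrete with a small reader. The following is a sketch of how a file obeying these rules could be parsed, not the parser used by auto3dem itself. It does not handle the email recipient exception, because the keyword for that record is not given in this section.

```python
def parse_input(text):
    """Parse auto3dem-style input text.

    Returns (params, data_files), where params maps (identifier, key) to the
    value string and data_files lists file names in order of appearance.
    """
    params = {}
    data_files = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0]   # '#' starts a comment
        fields = line.split()         # collapses runs of whitespace
        if not fields:
            continue                  # blank or comment-only line
        identifier = fields[0].lower()  # identifiers are case insensitive
        if identifier == "data":
            if len(fields) < 2:
                raise ValueError(f"data record needs a file name: {raw!r}")
            data_files.append(fields[1])   # file names keep their case
        else:
            if len(fields) < 3:
                raise ValueError(f"parameter record needs three fields: {raw!r}")
            key = fields[1].lower()        # keywords are case insensitive
            # A later occurrence overrides an earlier one; extra fields are ignored.
            params[(identifier, key)] = fields[2]
    return params, data_files
```

Values are returned as strings with their case preserved, since whether a value is a number, a flag, or a case-sensitive character string depends on the keyword.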

Auto3dem control parameter records

These lines control the overall behavior of both auto3dem and the underlying image reconstruction codes (P3DR, PCTFR, PCUT, PO2R, PPFT, and PSF) that are called by auto3dem. With the exception of the email recipient line, which may contain an arbitrary number of recipients, all records have a three-field format

identifier key value

The first field (identifier) is used to distinguish whether the record contains an auto3dem control parameter or input for one of the image processing programs. The second and third fields form key-value pairs corresponding to the name of the input parameter and its values. The following case-insensitive values are allowed as identifiers:

auto  - auto3dem control parameter         (controls workflow)
p3dr  - P3DR input parameter               (map reconstruction)
pctfr - PCTFR input parameter              (CTF refinement)
pcut  - PCUT input parameter               (map masking)
po2r  - PO2R input parameter               (orientation refinement)
por   - PO2R input parameter               (provided for back compatibility)
ppft  - PPFT input parameter               (orientation search)
psf   - PSF input parameter                (resolution estimation)

The value for each key can be a string, a numerical value, or a flag (0 or 1, like no/false or yes/true).
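
As an illustration of how the identifier field routes a record, the table above can be written as a lookup. This is a sketch with hypothetical names (PROGRAM_FOR_IDENTIFIER, program_for); it only encodes the identifiers listed above, including por as an alias for PO2R.

```python
PROGRAM_FOR_IDENTIFIER = {
    "auto": "auto3dem",
    "p3dr": "P3DR",
    "pctfr": "PCTFR",
    "pcut": "PCUT",
    "po2r": "PO2R",
    "por": "PO2R",    # accepted for back compatibility
    "ppft": "PPFT",
    "psf": "PSF",
}


def program_for(identifier):
    """Return the program a record applies to; identifiers are case insensitive."""
    try:
        return PROGRAM_FOR_IDENTIFIER[identifier.lower()]
    except KeyError:
        raise ValueError(f"unknown identifier: {identifier!r}") from None
```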

Data records

All data files are specified using a two-field format.

data filename

The first field in the record must be the keyword data. The data lines can appear anywhere, but as a matter of convenience they are normally located at the end of the file. The file names are case sensitive. It is not necessary to provide full paths to the data files since the directory containing these files is specified using an auto3dem control parameter.

Minimal required input

Default values can be used for the majority of the auto3dem input parameters, but some values must still be supplied. The listing below shows an example of a minimal input file.

auto  mode        search      # Search mode (using ppft)
auto  niter       10          # 10 iterations
auto  start_map   start.pif   # Starting map
p3dr  res_min     8.5         # Resolution to which map is computed    
data  file1.dat_000           # At least one data file required

When using the gold standard approach, two additional input parameters are needed, pointing to the two independent maps

auto    start_map_even  start_even.pif
auto    start_map_odd   start_odd.pif
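
When many runs are set up in the same way, a minimal input file like the one above can be generated by a script. The sketch below is illustrative only: minimal_input is a hypothetical helper, it writes exactly the records shown in the listings above, and it should not be read as a definition of which keywords auto3dem requires.

```python
def minimal_input(start_map, res_min, data_files, niter=10, mode="search",
                  gold_standard_maps=None):
    """Return the text of a minimal input file as in the listings above."""
    if not data_files:
        raise ValueError("at least one data file is required")
    records = [
        ("auto", "mode", mode),
        ("auto", "niter", str(niter)),
        ("auto", "start_map", start_map),
        ("p3dr", "res_min", str(res_min)),
    ]
    if gold_standard_maps is not None:
        even, odd = gold_standard_maps
        records.append(("auto", "start_map_even", even))
        records.append(("auto", "start_map_odd", odd))
    # Character strings may not contain embedded whitespace or hash signs.
    for value in [r[2] for r in records] + list(data_files):
        if any(ch.isspace() for ch in value) or "#" in value:
            raise ValueError(f"invalid character string: {value!r}")
    lines = [f"{ident:<6}{key:<16}{value}" for ident, key, value in records]
    lines += [f"data  {name}" for name in data_files]
    return "\n".join(lines) + "\n"
```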

Full input

When launching a new run, it is typically easiest to start with a minimal input file. Restart and continuation files will contain all keywords and can easily be edited.

Better Performance

Auto3DEM can make a reasonable estimate for the inner and outer radii of the capsid. If these radii are known (e.g. from inspecting the central section of the starting map), higher resolution can sometimes be achieved by specifying the following parameters

auto freeze_annulus 1 # Keep annulus_low/high, in_rad/out_rad fixed
ppft annulus_low    n # inner radius of capsid (including protrusions)
ppft annulus_high   n # outer radius of capsid (including protrusions)
pcut in_rad         n # inner radius of capsid (excluding protrusions)
pcut out_rad        n # outer radius of capsid (excluding protrusions)

If the images have a small pixel size, often you can speed up the computation by using the following combination of parameters, which starts the search mode calculations using binned image data

auto bin_reduce   1 # Automatically reduce bin_factor if resolution does not improve
ppft bin_factor   2 # Start with 2x2 binning of images
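
The effect of binning on the attainable resolution follows from simple arithmetic: binning by a factor b multiplies the pixel size by b, and the Nyquist limit is twice the pixel size. The short sketch below works through this; the function names and the 1.5 Angstrom pixel size used in the example are hypothetical and not taken from Auto3DEM.

```python
def binned_pixel_size(pixel_size, bin_factor):
    """Pixel size (Angstroms) after bin_factor x bin_factor binning."""
    return pixel_size * bin_factor


def nyquist_limit(pixel_size, bin_factor=1):
    """Finest resolution (Angstroms) representable at the binned pixel size."""
    return 2.0 * binned_pixel_size(pixel_size, bin_factor)


def pixel_count_ratio(bin_factor):
    """Fraction of the original pixels remaining after 2D binning."""
    return 1.0 / (bin_factor * bin_factor)
```

With a hypothetical pixel size of 1.5 Angstroms and bin_factor 2, the binned pixel size is 3.0 Angstroms, the Nyquist limit is 6.0 Angstroms, and each image holds a quarter of the original pixels. Binning is therefore harmless only while the working resolution stays coarser than the binned Nyquist limit.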

Full listing of auto3dem keywords

auto

For the sake of clarity, the Auto3DEM input parameters are divided into two sets. The first set contains the general parameters, while the second contains the parameters related to particle selection criteria.

general parameters

adapt_angle: controls whether or not the orientation angle step for search/refinement in PPFT/PO2R is adaptively determined from the particle size and the current level of resolution of the reconstruction. Allowed values = (yes,no).

bin_reduce: controls whether or not PPFT bin_factor should be reduced when resolution of reconstruction fails to improve.

boxrad: radius (pixels) of image box.

delete_maps: non-zero value specifies that the maps generated at intermediate stages of the reconstruction should be deleted. Only the map generated at the last of the specified number of iterations (auto niter) will be saved.

estimate_res: non-zero value indicates that resolution estimation is performed.

flatten_map: flag specifying whether or not background density should be removed from the reference map.

flatten_falloff: size (pixels) of soft edge to use when removing background density (-1=automatically determined, based on map resolution; 0=hard edge; n=soft edge n pixels wide).

freeze_annulus: non-zero value freezes inner and outer radii of the annulus defining the ordered region of the map. This affects the parameters annulus_low and annulus_high in PPFT and in_rad and out_rad in PCUT.

freeze_res: non-zero value freezes the resolutions used in PPFT and PO2R (resolution used in P3DR will still be based on the results from PSF).

fsc_hithresh: cutoff value for FSC used in estimating map resolution.

fsc_lothresh: cutoff value for FSC used to set resolution limits in P3DR, PSF, and PO2R.

gauss_adj: parameter used to set width of Gaussian falloff in P3DR (reciprocal angstroms).

generate_map: controls whether the reconstruction is generated in combination with the alignment of the particles. If the map is not calculated, the parameter niter is forced to one; in addition, the parameter iter_start is not updated in the new continue file, and generate_map is set there so that the map will be calculated (mode = ‘only’). If only the map is calculated, the filename of the reconstruction can be changed from the default by using the parameter map_suffix. Allowed values = (yes,no,only).

have_map: non-zero value indicates that starting map is available.

hollow_auto: flag specifying whether or not an optimum value for inner and outer radii should be automatically determined (it requires hollow_map set to 1 and, if set, it will overwrite the values in hollow_in_rad and hollow_out_rad).

hollow_cut_step: number of steps used by masking algorithm when generating a hollow map.

hollow_cut_weight: weight used by masking algorithm when generating a hollow map.

hollow_in_rad: inner radius of hollowed map.

hollow_map: flag specifying whether or not map should be hollowed.

hollow_out_rad: outer radius of hollowed map.

iter_start: starting iteration, i.e. number assigned to first iteration.

map_suffix: string suffix to append to the name of the reconstruction. Used only if generate_map is set to ‘only’. Default is ‘none’, a reserved string indicating that the name is assigned according to the standard rules. Warning: ‘none’ implies that for multiple runs of only-reconstructions each one will overwrite the previous result.

mode: AUTO3DEM mode of operation. Allowed values = (search, refine).

new_ptles: flag specifying whether or not new particles should be oriented relative to existing map without updating the map.

niter: maximum number of iterations of AUTO3DEM main loop.

noctf: if true, disables CTF correction. Overrides CTF mode and sets to zero for programs P3DR, PO2R, PPFT, and PCTFR. Used primarily with image sets which have already been CTF corrected.

noise_suppressio: apply Rosenthal and Henderson JMB 333 721-745 (2003) noise suppression algorithm.

outfile: base name used to construct names of log, summary, restart, and continuation files.

partrad: radius (pixels) of particle. Used for some adaptive estimates. This value can be initially set by the user; if not set, its default is the value of the boxrad parameter. After each iteration its value is determined from the radial profile of the latest reconstruction.

per_ptle_ctf: apply CTF correction on a per-particle basis in P3DR, PO2R, PCTFR and PPFT. Setting to one overrides per_ptle_ctf parameter set for individual programs.

quit_early: set to non-zero value to have AUTO3DEM quit if the FSC curve never drops below fsc_hithresh (usual value is fsc_hithresh=0.5). This option is normally only used for random model calculations where it is set automatically by setup_rmc.pl if the resolution-based selection criterion is selected (option –trad).

refine_ctf: refine CTF parameters when running in refine mode. When set, CTF refinement is performed as the first iteration; the flag is then automatically reset to 0 so that the next iteration will be a conventional refinement.

res_adj: additive parameter that determines the higher resolution to which map will be calculated beyond upper resolution limit used in PO2R (reciprocal angstroms).

restart: set to 1 to continue calculation.

rmc: set to 1 to perform a random model computation.

rundir: directory containing input data (maps, images, particle parameters).

start_map: name of starting map used by AUTO3DEM.

switch_mode: if true, allows auto3dem to automatically switch from search to refine mode.

symm_code: symmetry code.

term_refine: allow automatic termination of run when in refine mode (functionality not currently active, added as placeholder).

term_search: allow automatic termination of run when in search mode (functionality not currently active, added as placeholder).

particle selection parameters

box_center_offset: maximum allowable distance between the center of the particle and the center of the box; applied separately to each coordinate. Particles with centers too far from center of box are excluded from the model.

cmp_cc_fraction: fraction of images to accept on the basis of the CMP correlation coefficient. Makes sense only when parsing particle parameter files generated in search mode.

cmp_cc_nstd: number of standard deviations to add to the average CMP correlation coefficient when setting cutoff. Negative values are less restrictive, positive values are more restrictive.
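
The two kinds of correlation-coefficient criteria (fraction-based and standard-deviation-based) can be illustrated as follows. This is a sketch under stated assumptions, not the selection code used by auto3dem: it assumes that higher coefficients are better, that the population standard deviation is used, that the cutoff itself is accepted, and that the fraction is rounded to the nearest whole particle.

```python
import statistics


def nstd_cutoff(ccs, nstd):
    """Cutoff = average + nstd standard deviations (population std assumed)."""
    return statistics.mean(ccs) + nstd * statistics.pstdev(ccs)


def select_by_nstd(ccs, nstd):
    """Indices of particles whose coefficient is at or above the cutoff."""
    cutoff = nstd_cutoff(ccs, nstd)
    return [i for i, cc in enumerate(ccs) if cc >= cutoff]


def select_by_fraction(ccs, fraction):
    """Indices of the best-scoring fraction of particles (rounding assumed)."""
    n_keep = int(round(fraction * len(ccs)))
    order = sorted(range(len(ccs)), key=lambda i: ccs[i], reverse=True)
    return sorted(order[:n_keep])
```

Under these assumptions a negative nstd lowers the cutoff below the average and keeps more particles, while a positive nstd raises it and keeps fewer, which matches the description above.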

global_select: if set to true (non-zero) value, then selection criteria are applied globally across particle parameter files. Otherwise, selection criteria are applied on a per file basis.

nselect_offset: number of selection criteria to evaluate in each ‘direction’ from the central selection criterion. The total number of selection criteria to be evaluated is (2*nselect_offset + 1).

omega1, omega1_tol: select images with omega within omega1_tol of omega1. Must be used together.

omega2, omega2_tol: select images with omega within omega2_tol of omega2. Must be used together.

pft_cc_fraction: fraction of images to accept on the basis of the PFT correlation coefficient. Makes sense only when parsing particle parameter files generated in search mode.

pft_cc_nstd: number of standard deviations to add to the average PFT correlation coefficient when setting cutoff. Negative values are less restrictive, positive values are more restrictive.

phi_reject_lower / phi_reject_upper: range of azimuthal angles (phi_reject_lower < phi < phi_reject_upper) for which images will be excluded from map construction.

prj_cc_fraction: fraction of images to accept on the basis of the PRJ correlation coefficient. Makes sense only when parsing particle parameter files generated in search mode.

prj_cc_nstd: number of standard deviations to add to the average PRJ correlation coefficient when setting cutoff. Negative values are less restrictive, positive values are more restrictive.

score_fraction: fraction of images to accept on the basis of the score generated by program PO2R.

score_nstd: number of standard deviations to add to the average score when setting cutoff. Negative values are less restrictive, positive values are more restrictive.

select_delta: the size of the ‘step’ to be used when evaluating multiple selection criteria. For standard deviation-based criteria, adds a fixed number of standard deviations; for fraction-based criteria, adds a fixed fraction.
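
Taken together, nselect_offset and select_delta define a small grid of selection criteria around a central value. The sketch below enumerates that grid; selection_criteria is a hypothetical helper, and the symmetric spacing around the central value is an interpretation of the descriptions above.

```python
def selection_criteria(central, select_delta, nselect_offset):
    """Return the 2*nselect_offset + 1 criteria centred on the central value."""
    if nselect_offset < 0:
        raise ValueError("nselect_offset must be non-negative")
    return [central + k * select_delta
            for k in range(-nselect_offset, nselect_offset + 1)]
```

For example, a central value of 0.0 standard deviations with select_delta 0.5 and nselect_offset 2 gives five criteria: -1.0, -0.5, 0.0, 0.5, and 1.0.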

theta_reject_lower / theta_reject_upper: range of inclination angles (theta_reject_lower < theta < theta_reject_upper) for which images will be excluded from map construction.

p3dr

apo_border: width of border region for map apodization (pixels).

bin: name of P3DR binary.

ctf_ff1: 1st CTF filter factor.

ctf_ff2: 2nd CTF filter factor.

ctfmode: CTF mode (ctf_mode also accepted).

fsc_file_name: name of FSC file to be used when applying noise suppression algorithm. (File format: line 1 = number of FSC records; subsequent lines = spatial frequency (Å⁻¹) and FSC value).
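
The FSC file layout described above (a record count on the first line, then one frequency and FSC value per line) can be written and read with a few lines of code. This is a sketch with hypothetical function names; it assumes the two values on each line are separated by whitespace and written as plain decimals, which the description above does not specify.

```python
def format_fsc_file(records):
    """Return file text for a list of (spatial_frequency, fsc) pairs."""
    lines = [str(len(records))]
    lines += [f"{freq:.6f} {fsc:.6f}" for freq, fsc in records]
    return "\n".join(lines) + "\n"


def parse_fsc_file(text):
    """Read the record count from line 1, then that many (frequency, fsc) pairs."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    count = int(lines[0].split()[0])
    records = [tuple(float(x) for x in ln.split()[:2])
               for ln in lines[1:1 + count]]
    if len(records) != count:
        raise ValueError("fewer FSC records than announced on line 1")
    return records
```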

filter: filter mode.

magfactor: magnification factor.

map_dim: map dimension.

max_cpu: maximum number of CPUs to be used by P3DR.

per_ptle_ctf: apply CTF correction on a per-particle basis.

res_max: resolution at end of Gaussian falloff.

res_min: resolution to which map is computed with amplitudes unaltered.

symm_code: symmetry code.

tempfac: temperature factor.

zero_fill: zero fill for background pixels, i.e. padding factor in real space.

pctfr

anastigm: enable/disable enforcing of anastigmatic behavior for the CTF.

bin: name of PCTFR binary.

ctfmode: CTF mode (ctf_mode also accepted).

dangle: CTF astigmatism angle step size, in degrees.

dfocus: CTF defocus step size, in microns.

funcmode: function mode.

funcweight: function weight.

max_cpu: maximum number of CPUs to be used by PCTFR.

nangle: number of steps in CTF astigmatism angles taken in each direction.

ndefocus: number of steps in CTF defocus values taken in each direction.

res_max: maximum resolution used in image/projection comparison.

res_min: minimum resolution used in image/projection comparison.

tempfac: temperature factor.

zero_fill: zero fill for background pixels, i.e. padding factor in real space.

pcut

bin: name of PCUT binary.

cut_step: number of steps used by masking algorithm.

cut_weight: weight used in masking algorithm.

in_rad: inner radius for masking.

max_cpu: maximum number of CPUs to be used by PCUT.

out_rad: outer radius for masking.

po2r

bin: name of PO2R binary.

ctfmode: CTF mode (ctf_mode also accepted).

dangle: angular step size for local mode (delta angle, degrees).

dcenter: spatial step size (delta xy, pixels).

funcmode: function mode.

funcweight: function weight.

gangle: angular step size for global mode.

handtest: enable/disable handedness tests for images.

magref_calibrate: 1 to keep magnification values as obtained by refinement (new map can be at different scale), 0 to adjust them to the same previous average magnification.

magref_reset: 1 to ‘forget’ magnification assigned to data, 0 to refine around the current estimated value.

magref_step: size of magnification refinement step (microns).

max_cpu: maximum number of CPUs to be used by PO2R.

mode: search mode. Allowed values = (local,mag,global,ticos_equiv):

local: local refinement (nangle steps of dangle degrees along each direction)
mag: magnification refinement (nmagref steps of magref_step microns along each direction)
global: global search in one asymmetric unit (on a grid with step of gangle degrees)
ticos_equiv: search restricted to the 60 symmetry-related orientations.

nangle: number of angular steps taken in each direction.

ncenter: number of spatial steps taken in each direction.

nmagf: number of magnification factor steps along each direction.

per_ptle_ctf: apply CTF correction on a per-particle basis.

quick_search: implement quick approximate search of orientation space.

res_max: maximum resolution used in image/projection comparison.

res_min: minimum resolution used in image/projection comparison.

symm_code: symmetry code.

tempfac: temperature factor.

zero_fill: zero fill for background pixels, i.e. padding factor in real space.

ppft

The PFTsearch/PPFT input parameters pftrads_filename, pftres1_filename, and pftres2_filename are not read from the input parameter file since they are set by AUTO3DEM. They are assigned the values ppft_iter_n.rads, ppft_iter_n.res1, and ppft_iter_n.res2, respectively, where n is the iteration number.

annulus_high: outer radius of annulus for image/projection comparison.

annulus_low: inner radius of annulus for image/projection comparison.

bin: name of PPFT binary.

bin_factor: binning factor.

ctf_mode: CTF mode (ctfmode also accepted).

delta_theta: step size for inclination angle theta (degrees).

filter_factor_1: 1st filter factor.

input_mode: input mode.

jcut: minimum order Bessel function (Jn), default strongly recommended!

mag_cen: midpoint for magnification scale search.

mag_norm: switch used to normalize the MAG scale factors so that the average MAG is 1.0.

mag_num: extent of magnification search window.

mag_step: grid size of magnification scale search.

max_cpu: maximum number of CPUs to be used by PPFT.

per_ptle_ctf: apply CTF correction on a per-particle basis.

pft_filename: name of PFT file.

pftrad_hi: outer PFT radius.

pftrad_lo: inner PFT radius.

pftrad_step: PFT radius step size.

prj_filename: file name for prj output.

quick_omega: perform fast approximate search for omega.

resolution_high: upper resolution limit.

resolution_low: lower resolution limit.

sigcut: threshold for variance mask when filtering PFT data, default strongly recommended!

symm_code: alternative for specifying symmetry code (symmetry also accepted).

temperature_factor: temperature factor.

verbose: verbose factor (controls level of output).

psf

The PSF input parameter pixel_size is not read from the input parameter file since it is obtained by parsing the first line of the particle parameter files.

bin: name of PSF binary.

max_cpu: maximum number of CPUs to be used by PSF.

res_max: maximum resolution used in FSC calculation.

res_min: minimum resolution used in FSC calculation.

res_step: resolution step size.

Auto3dem keyword default values

The following tables list the auto3dem input parameters along with their default values. A missing default value means that no default is used. Required input is shown in bold red.