Processing Tab
Overview
Once a data collection Run has completed with 5 or more images, the data are automatically processed using ICEflow - an SSRL automatic pipeline that runs several 3rd party data processing pipelines that provide quick feedback on the quality of each data set.
Data processing run information and selected results are written to the Sample database.
Once the run information appears in the database, it is displayed in the new Processing Tab (selected results will be displayed in a future release).
If the Processing Tab is not displaying processing information:
If a group does not want auto-processing results written to the Sample database, they can opt out by contacting their support staff. If a group opts out, no results will be displayed in the Processing Tab; however, auto-processing will still be carried out and the results can be found in the auto-processing directories.
Processing Tab Layout and Navigation
Each row of the Processing Tab displays information for one data processing run. Currently the table only shows general processing parameters:
- Start Time
- Cassette Port
- Run Number
- Image Filename
- Image Directory
- Processing Directory
- Pipeline and Version
The next release will include selected processing results.
Clicking on a row will expand and highlight the row for a better view. The table can be quickly traversed using arrow keys.
The first column in the table is used for sorting. The default sort is on Start Time. Click on the drop-down in the column header to select another column for sorting. Currently, Port, Start Time, Run, and Pipeline can be selected.
The first column can also be used to sort in the reverse direction by clicking on the arrow in the top right-hand corner of the column header.
The table columns can be resized by hovering near the side of the column. When the cursor turns into a cross, click and drag the side of the column to the desired size.
The “Help” button opens a popup window with documentation for the Processing Tab.
Where Are My Data Located?
ICEflow carries out the automated data processing on the SSRL beamlines using 3rd party data processing software. Each pipeline processes data using different programs, algorithms and parameters and the results are saved in separate directories.
The data processing directories can be accessed by looking inside the same directory as the collected diffraction images (double click on the image directory name in the table to open the directory). However, these entries are symbolic links to a second directory where the data processing files are actually located. We add these links to make it easier to locate the results while also preserving the directory tree produced by the processing programs. For example, inside the image directory you should see a directory like this:
autoprocessing_{3rd_party_software}_{unique_id}
where ‘3rd_party_software’ is the name of the software employed by ICEflow to carry out the automated processing (currently autoPROC). For example, autoPROC processing results would have this symbolic link:
autoprocessing_autoproc_531066_20b60dx
Each symlink points to an actual processing directory, which is stored on the /data disk and named using a descriptive convention:
/data/{username}/autoprocessing/{3rd_party_software}/{filename_prefix}_{run_number}_{date}_{time}
Thus, the above symlink would point to:
/data/{username}/autoprocessing/autoproc/test_set1_2_06182024_110055
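To see where each link in an image directory leads, the links can be resolved from the command line. The following is a minimal sketch, assuming a bash shell and the standard readlink utility; the function name list_processing_links is hypothetical and not part of ICEflow.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ICEflow): list every autoprocessing_*
# symlink in an image directory together with the target it points to.
list_processing_links() {
    local image_dir="$1"
    local link
    for link in "$image_dir"/autoprocessing_*; do
        # Skip anything that is not a symlink (including an unmatched glob).
        [ -L "$link" ] || continue
        printf '%s -> %s\n' "$(basename "$link")" "$(readlink "$link")"
    done
}
```

Calling list_processing_links with the path of an image directory prints one line per link, in the form "link name -> processing directory".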
This processing directory contains a README file that provides the version of ICEflow that was used to automatically process the data, as well as the explicit path to the image files and the filename template for the image files. It also contains the names of the 3rd party software used and their version numbers as well as the command strings that were used to run the software for each resolution cutoff. Publication references for all the programs used in the pipeline are also provided in this file.
Also in the processing directory are three log files, named out-{cutoff}.log, where "cutoff" is the resolution cutoff criterion applied in each of the three autoPROC runs carried out in parallel. The results of each run are written to a subfolder named after its cutoff criterion. Thus, using the above path example, results from an autoPROC run that used I/sig(I) >= 1.5 as a resolution cutoff criterion would be found under:
/data/{username}/autoprocessing/autoproc/test_set1_2_06182024_110055/isigi/
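The naming convention can also be written out as a small script, which may help when locating results for many runs. This is only a sketch under stated assumptions: the function names processing_dir and cutoff_paths are hypothetical, the username jdoe in the usage note is a placeholder, and the cutoff names (cc12, isigi, nocutoff) are the ones listed under Resolution Cutoffs.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of ICEflow) that spell out the naming
# convention described above.

# Compose the processing directory path from its parts.
processing_dir() {
    local username="$1" software="$2" prefix="$3" run="$4" date="$5" time="$6"
    printf '/data/%s/autoprocessing/%s/%s_%s_%s_%s\n' \
        "$username" "$software" "$prefix" "$run" "$date" "$time"
}

# Print, for each cutoff, its name, log file, and results subfolder
# (tab-separated), given the top processing directory.
cutoff_paths() {
    local top="$1"
    local cutoff
    for cutoff in cc12 isigi nocutoff; do
        printf '%s\t%s/out-%s.log\t%s/%s/\n' \
            "$cutoff" "$top" "$cutoff" "$top" "$cutoff"
    done
}
```

For example, cutoff_paths "$(processing_dir jdoe autoproc test_set1 2 06182024 110055)" prints the expected log file and subfolder for each of the three cutoffs. The functions only build path strings; they do not check that the paths exist.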
Pipelines Running before the Inception of ICEflow (before 6/5/2024)
For the SSRL pipeline (and the xia2 test pipeline), look in the image file directory for the symbolic link to the directory with the processed data (these links will all begin with 'autoprocessing'). The processing directory contains the README file that describes the pipeline used for data processing. If you can't find what you're looking for, contact your user support staff member.
ICEflow (autoPROC) Pipeline
In this version of ICEflow, autoPROC v1.0.5 is deployed in a default configuration. Future versions of ICEflow may incorporate changes to the pipeline, 3rd party software, versions, configurations, input parameters, etc., which will be listed and documented in the ICEflow Versions and Release Notes section.
Resolution Cutoffs
- cc12 – Data are processed using a resolution cutoff corresponding to a value of CC1/2 that is dynamically calculated by autoPROC.
- isigi - Data are processed using a resolution cutoff corresponding to a value of I/σ(I) that is calculated dynamically by autoPROC.
- nocutoff - Data are processed without using a resolution cutoff, i.e. to the corners of a rectangular detector.
Programs and Output
- autoPROC - the data processing pipeline. The general log files are written into the top directory.
- XDS - performs indexing, refinement, and integration. The input file XDS.INP, generated automatically by autoPROC, supplies the default parameters to the program and is based upon information stored in the header of the diffraction images (detector type and distance, oscillation start and range, number of images in the data set, etc.). The important output files from XDS can be found in the cutoff subfolders, which contain:
  - IDXREF.LP - the results of the automated indexing, which finds the unit cell parameters and gives an idea of the crystal symmetry.
  - INTEGRATE.LP - the full log of the processing.
  - CORRECT.LP - gives an indication of the data quality and resolution.
  - XDS_ASCII.HKL - contains integrated intensities.
- POINTLESS - is run often during the workflow to analyze the data for twinning and symmetry and to identify the correct space group.
- AIMLESS - takes the output from POINTLESS, calculates scale factors between all the images in the data set, applies the scales, and merges all the reflection data together to give an output file containing one copy of each reflection (the unique data set). While the key output from AIMLESS is included in the general autoPROC output file, the full log can be found in the {cutoff} directory.
- CTruncate - reads the output from AIMLESS, attempts to put the data onto an absolute scale, and generates structure factor amplitudes (F) from the reflection intensities (I). Its output can be found in the {cutoff} directory.
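A quick way to confirm that a run produced the XDS files listed above is to search a cutoff subfolder for them. The sketch below assumes a bash shell and the standard find utility; the function name check_xds_outputs is hypothetical. Because the exact nesting of the files inside the cutoff subfolder is not specified here, the search is recursive.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ICEflow): report which of the XDS output
# files named above are present anywhere under a cutoff subfolder.
# Returns 0 if all four were found, 1 otherwise.
check_xds_outputs() {
    local cutoff_dir="$1"
    local name path missing=0
    for name in IDXREF.LP INTEGRATE.LP CORRECT.LP XDS_ASCII.HKL; do
        # Take the first match only; the search is recursive because the
        # exact nesting inside the cutoff subfolder is an assumption.
        path=$(find "$cutoff_dir" -type f -name "$name" 2>/dev/null | head -n 1)
        if [ -n "$path" ]; then
            printf 'found    %s\n' "$path"
        else
            printf 'missing  %s\n' "$name"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_xds_outputs followed by the path of an isigi subfolder prints one line per file and returns a non-zero status if any file is absent.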
References
- AUTOPROC - Vonrhein, C., Flensburg, C., Keller, P., Sharff, A., Smart, O., Paciorek, W., Womack, T. & Bricogne, G. Data processing and analysis with the autoPROC toolbox. Acta Crystallographica D67, 293-302 (2011)
- AUTOPROC - Vonrhein, C., Flensburg, C., Keller, P., Fogh, R., Sharff, A., Tickle, I.J. and Bricogne, G., Advanced exploitation of unmerged reflection data during processing and refinement with autoPROC and BUSTER. Acta Crystallographica D80(3) (2024).
- XDS - Kabsch, W. XDS. Acta Crystallographica D66, 125-132 (2010)
- POINTLESS - Evans, P.R. Scaling and assessment of data quality, Acta Crystallographica D62, 72-82 (2006)
- AIMLESS - Evans, P.R. and Murshudov, G.N. How good are my data and what is the resolution? Acta Crystallographica D69, 1204–1214 (2013)
How to Modify the Processing Script and Reprocess Datasets
- Move to the top processing folder:
> cd /data/{username}/autoprocessing/autoproc/{unique_id}/
- Copy the processing script for whichever cutoff you prefer:
> cp run-{cutoff}.sh my_new_run.sh
- Open my_new_run.sh in the geany text editor and modify the autoPROC launch string as needed:
> geany my_new_run.sh
CRITICAL - make sure to change the folder name after the -d argument or the script will not run! E.g.:
process [pre-existing arguments] -d new_proc_folder [rest of arguments]
- Save the new version and run my_new_run.sh from the same folder:
> ./my_new_run.sh
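The folder name after -d can also be changed without opening an editor. The following is a minimal sketch, assuming a bash shell, a sed that supports -E and -i with a backup suffix, and a launch string in which -d is followed by a single folder name. The function name set_output_folder is hypothetical, and the result should be inspected before running the script.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ICEflow): replace the folder name that
# follows the -d argument in a copied run script.
set_output_folder() {
    local script="$1" new_folder="$2"
    # Accept only simple folder names (letters, digits, dot, underscore,
    # hyphen) so the name cannot break the sed expression below.
    case "$new_folder" in
        ''|*[!A-Za-z0-9._-]*)
            echo "unsupported folder name: $new_folder" >&2
            return 1
            ;;
    esac
    # On each line, rewrite the first "-d <word>" to "-d <new_folder>".
    # The original file is kept as <script>.bak.
    sed -i.bak -E \
        "s/(^|[[:space:]])-d[[:space:]]+[^[:space:]]+/\1-d ${new_folder}/" \
        "$script"
}
```

For example, set_output_folder my_new_run.sh new_proc_folder edits the copied script in place and leaves the previous version in my_new_run.sh.bak. Folder names containing slashes or spaces are rejected by this sketch; edit the script by hand in those cases.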
ICEflow Versions and Release Notes
- ICEflow-1.2.1 (patch) (autoPROC) - released on 06/21/2024
- Fixed issue where autoPROC cannot index images collected with a vertical offset on a Pilatus detector.
- Changed symlinks in data folder to point to the top processing folder.
- ICEflow-1.2.0 (autoPROC) - released on 06/18/2024
- Changed to a more descriptive convention for output folder:
/data/{username}/autoprocessing/{3rd_party_software}/{filename_prefix}_{run_number}_{date}_{time}/{cutoff}
- The README file now contains the explicit path to the source data and the image file template, for easier reference.
- Documentation below edited to reflect these changes.
- ICEflow-1.1.0 (autoPROC) - released on 06/12/2024
- Fixed an issue where both cutoff versions output data with a CC1/2-based resolution cutoff; the I/sig(I)-based cutoff is now enabled.
- Added a no-cutoff option as documented below.
- The summary.html files are now copied to the top folder for easier access.
- ICEflow-1.0.0 (autoPROC) - released on 06/05/2024
- Initial release, with only autoPROC pipeline enabled.
More information on the software supported by the SSRL-SMB Macromolecular Crystallography division is available on our software webpage.