Processing Tab

Overview

Once a data collection Run has completed with 10 or more images, the data are automatically processed using ICEflow, an SSRL automated pipeline that runs several 3rd party data processing applications to provide quick feedback on the quality of each data set.

Data processing run information and selected results are written to the Sample database.

Once the run information appears in the database, this information (with selected results, see below) is displayed in the Processing Tab.

If the Processing Tab is not displaying processing information:

  • Make sure a spreadsheet has been assigned to the cassette in the Screening Tab (which is required for writing to the Sample database). A default spreadsheet can be created and assigned to the cassette.
  • If you have assigned the correct spreadsheet to your cassette in the Screening Tab and you still cannot see processing information, let your beamline support staff know.

If a group does not want auto-processing results written to the Sample database, they can opt out by checking the appropriate checkbox on the SSRL SMB Unix account request form (applies upon submittal) or by contacting their support staff. If a group opts out, no results will be recorded in our database or displayed in the Processing Tab; however, auto-processing will still be carried out and the results can be found in the auto-processing directories.

Processing Tab Overview



Processing Tab Layout and Navigation

Each Processing Tab row displays information and selected results for a single data processing run; the table can be quickly traversed using the scroll bars and arrow keys. A drop-down menu lets you select from four Display Options: Minimal, Less, More, and All. (The "All" option adds additional columns for staff troubleshooting.)

  • Minimal - a bare-bones configuration showing only the most important information:
    1. Start Time
    2. Status
    3. Port
    4. Run
    5. SG
    6. Unit Cell
    7. Resolution
    8. Summary file
    9. Error
  • Less - a descriptive configuration showing some additional important information:
    1. Start Time
    2. Status
    3. Port
    4. Run
    5. SG
    6. Unit Cell
    7. Resolution
    8. Mosaicity
    9. Summary file
    10. R-factors
    11. Summary Stats
    12. Anomalous
    13. Error
  • More - a more comprehensive configuration showing many details:
    1. Start Time
    2. Status
    3. Port
    4. Run
    5. Crystal ID
    6. Protein
    7. Filename
    8. Image Directory
    9. Processing Directory
    10. SG
    11. Unit Cell
    12. Resolution
    13. Mosaicity
    14. Summary File
    15. R-factors
    16. Summary Stats
    17. Anomalous
    18. Error
    19. 3rd Party Pipeline
    20. 3rd Party Pipeline Version
  • All - all information (for staff and troubleshooting):
    1. Start Time
    2. Status
    3. Port
    4. Run
    5. Crystal ID
    6. Protein
    7. Filename
    8. Image Directory
    9. Processing Directory
    10. SG
    11. Unit Cell
    12. Resolution
    13. Mosaicity
    14. Summary File
    15. R-factors
    16. Summary Stats
    17. Anomalous
    18. Error
    19. 3rd Party Pipeline
    20. 3rd Party Pipeline Version
    21. Sample ID
    22. SIL ID
    23. Job ID
    24. Hostname

NOTE: the "Less" configuration is shown in the above screenshot.

The "Processing Method" drop-down menu can be used to select the resolution cutoff whose results you would like to view.

The first column in the table is used for sorting. The default sorting is on Start Time (the time when the job was submitted to the queue). Click on the drop-down in the column header to select another column for sorting.

Processing spreadsheet sorting by column.

The first column can also be used to sort in the reverse direction by clicking on the arrow in the top right-hand corner of the column header.

The widths of the table columns can be changed by hovering the mouse pointer near the side of the column. When the pointer turns into a cross, click and drag the side of the column to the desired width.

The “Help” button opens a webpage with documentation for the Processing Tab.

The "Status" column indicates the current status of the auto-processing job:

  • Pending - the job has been added to the queue but not started yet
  • Submitted - the job has been started
  • Running - the job is currently in progress
  • Error - (highlighted in red) the job has exited with an error. The associated error message can be found in the column labeled "Error". The processing error messages are extracted from the general log file (see below).
  • Completed - the job has finished without errors

The "Summary" column shows the path to a summary HTML file (e.g. 00_summary.html for autoPROC).

  • Double-clicking on the cell with the filename will open the file as a webpage in a web browser.
  • Periodically refreshing the page will show the updated statistics as the processing job runs.
  • The file will also display any error messages and warnings that come up.


How to Interpret Error Messages

  • If an error occurs, it will be displayed in the Error column on the Processing Tab. There are two general types of errors:
    • ICEflow errors (these should be labeled "ICEFLOW ERROR")
    • Processing software errors (these should be labeled "AUTOPROC ERROR" for autoPROC)
  • If you see ICEflow errors, please contact your beamline staff, who will forward this information to ICEflow developers.
  • autoPROC errors most often reflect issues with processing the data; some of the most common types are:
    • indexing errors by XDS
    • integration errors by XDS
    • scaling errors in apScale or XSCALE
  • ICEflow extracts autoPROC error messages from the "top" log file (typically named out-{cutoff}.log); these errors most often point to the log files for specific processes (e.g. indexing) and provide relative paths to them. Inspect these logs if you need more detailed information about what went wrong.
  • The autoPROC manual lists a few common errors that can be encountered when running autoPROC as well as a few general suggestions for how to handle them. ICEflow is designed to avoid the more basic errors (for example, all SSRL beamline-specific settings have been implemented already), but if any of these errors crop up anyway, please let the beamline staff know, so they can pass this information on to ICEflow developers.
  • If you need help with troubleshooting a data processing job and would like to contact the ICEflow development team directly, please email Art Lyubimov.
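
As a quick first check before opening the per-process logs, you can scan the top-level logs for error lines. This is a minimal sketch, assuming the out-{cutoff}.log naming convention and the cc12/isigi/nocutoff cutoff names described in this document; it is run from inside the processing directory:

```shell
# Minimal sketch: scan the three top-level autoPROC logs for error lines.
# Assumes the out-{cutoff}.log naming convention described above.
scan_logs_for_errors() {
    for cutoff in cc12 isigi nocutoff; do
        log="out-${cutoff}.log"
        if [ -f "$log" ]; then
            echo "== ${log} =="
            # -i: case-insensitive, -n: show line numbers
            grep -in "error" "$log" || echo "(no error lines found)"
        fi
    done
}

# usage (from inside the processing directory):
# scan_logs_for_errors
```

This only flags lines containing the word "error"; for the full context, inspect the per-process logs that the error messages point to.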

Where Are My Data Located?

ICEflow carries out the automated data processing on the SSRL beamlines using 3rd party data processing software. Each pipeline processes data using different programs, algorithms and parameters, and the results are saved in separate directories. Currently, autoPROC is the only automated processing program run by ICEflow, though more will be added in the future.

The data processing directories can be accessed by looking inside the same directory as the collected diffraction images (double-click on the image directory name in the table to open the directory). However, these are symbolic links to a second directory where the data processing files are actually located. We add these links to make it easier to locate the results while also preserving the directory tree produced by the processing programs. For example, inside the image directory you should see a directory like this:


        autoprocessing_{3rd_party_software}_{unique_id}

where ‘3rd_party_software’ is the name of the software employed by ICEflow to carry out the automated processing (currently autoPROC). For example, autoPROC processing results would have this symbolic link:

        autoprocessing_autoproc_531066_20b60dx

Each symlink points to an actual processing directory, which is stored on the /data disk and named using a descriptive convention:

        /data/{username}/autoprocessing/{3rd_party_software}/{filename_prefix}_{run_number}_{date}_{time}_{queue_rank}

Thus, the above symlink would point to:

        /data/{username}/autoprocessing/autoproc/test_set1_2_06182024_110055_r1
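
To confirm where a given symlink actually points, readlink can be used from inside the image directory. A sketch, using the example link name from above:

```shell
# Sketch: resolve an autoprocessing symlink to the actual processing
# directory on the /data disk (link name is the example from above).
readlink -f autoprocessing_autoproc_531066_20b60dx
```
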

This processing directory contains a README file that provides the version of ICEflow that was used to automatically process the data, as well as the explicit path to the image files and the filename template for the image files. It also contains the names and the publication references for all the programs used in the pipeline.

Also in the processing directory are three log files, titled out-{cutoff}.log, where "cutoff" represents the resolution cutoff criterion applied in each of the three autoPROC runs carried out in parallel. The results for each of these runs are written to subfolders named after the cutoff criteria. Thus, using the above path example, results from an autoPROC run that used CC1/2 >= 0.300 as a resolution cutoff criterion would be found under:


        /data/{username}/autoprocessing/autoproc/test_set1_2_06182024_110055_r1/cc12/
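
To gather the headline outputs of all three runs at once, you can loop over the cutoff subfolders. This is a sketch under the layout described above; the MTZ file names are those listed in the Programs and Output section, and may differ in other pipeline versions:

```shell
# Sketch: print the final MTZ files produced for each resolution cutoff.
# Assumes the cc12/isigi/nocutoff subfolder layout described above.
list_mtz_results() {
    proc_dir="$1"
    for cutoff in cc12 isigi nocutoff; do
        for mtz in truncate-unique.mtz staraniso-alldata-unique.mtz; do
            f="${proc_dir}/${cutoff}/${mtz}"
            if [ -f "$f" ]; then
                echo "$f"
            fi
        done
    done
}

# usage: list_mtz_results /data/{username}/autoprocessing/autoproc/{run_folder}
```
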


Pipelines Running before the Inception of ICEflow (before 6/5/2024)

For the SSRL pipeline (and the xia2 test pipeline), look in the image file directory for the symbolic link to the directory with the processed data (these links will all begin with 'autoprocessing'). The processing directory contains the README file that describes the pipeline used for data processing. If you can't find what you're looking for, contact your user support staff member.


ICEflow (autoPROC) Pipeline

In this version of ICEflow, autoPROC v1.0.5 is deployed in a default configuration. Future versions of ICEflow may incorporate changes to the pipeline, 3rd party software, versions, configurations, input parameters, etc., which will be listed and documented in the release notes below.

Resolution Cutoffs

  • cc12 – Data are processed using a resolution cutoff corresponding to a value of CC12 that is dynamically calculated by autoPROC.
  • isigi - Data are processed using a resolution cutoff corresponding to a value of I/σ(I) that is calculated dynamically by autoPROC.
  • nocutoff - Data are processed without using a resolution cutoff, i.e. to the corners of a rectangular detector.

Programs and Output

  • autoPROC - the data processing pipeline. The general log files are written into the top directory:
    • out-{cutoff}.log - log file(s) for the entire automated processing run.
    • {cutoff}/summary.html - result summary in webpage format, can be viewed by running Firefox:
      
              > firefox {cutoff}/summary.html
                      
      NOTE: {cutoff}_summary.html files can also be found in the top folder.
    • {cutoff}/truncate-unique.mtz – final MTZ file containing integrated intensities and structure factors.
    • {cutoff}/truncate-unique.table1 - Table1-formatted merging statistics corresponding to the above MTZ file
    • {cutoff}/staraniso-alldata-unique.mtz - final MTZ file processed using ellipsoidal truncation to account for anisotropy.
    • {cutoff}/staraniso-alldata-unique.table1 - Table1-formatted merging statistics corresponding to the above MTZ file
  • XDS – performs indexing, refinement, and integration. The input file XDS.INP - generated automatically by autoPROC - supplies the default parameters to the program and is based upon information stored in the header of the diffraction images (detector type and distance, oscillation start and range, number of images in the data set, etc.). The important output files from XDS can be found in the cutoff subfolders, which contain:
    • IDXREF.LP - the results of the automated indexing to find the unit cell parameters and an idea of what the crystal symmetry is.
    • INTEGRATE.LP - the full log of the processing.
    • CORRECT.LP - gives an indication of the data quality and resolution.
    • XDS_ASCII.HKL - contains integrated intensities.
  • POINTLESS – run repeatedly during the workflow to analyze the data for twinning and symmetry and to identify the correct space group.
  • AIMLESS - takes the output from POINTLESS, calculates scale factors between all the images in the data set, applies the scales, and merges all the reflection data together to give an output file containing one copy of each reflection (the unique data set). The key output from AIMLESS is included in the general autoPROC output file; the full log can be found in the {cutoff} directory.
  • CTruncate - reads the output from AIMLESS, attempts to put the data onto an absolute scale, and generates structure factor amplitudes (F) from the reflection intensities (I). Its output can be found in the {cutoff} directory.

References

  • AUTOPROC - Vonrhein, C., Flensburg, C., Keller, P., Sharff, A., Smart, O., Paciorek, W., Womack, T. & Bricogne, G. Data processing and analysis with the autoPROC toolbox. Acta Crystallographica D67, 293-302 (2011)
  • AUTOPROC - Vonrhein, C., Flensburg, C., Keller, P., Fogh, R., Sharff, A., Tickle, I.J. and Bricogne, G., Advanced exploitation of unmerged reflection data during processing and refinement with autoPROC and BUSTER. Acta Crystallographica D80(3) (2024).
  • XDS - Kabsch, W. XDS. Acta Crystallographica D66, 125-132 (2010)
  • POINTLESS - Evans, P.R. Scaling and assessment of data quality, Acta Crystallographica D62, 72-82 (2006)
  • AIMLESS - Evans, P.R. and Murshudov, G.N. How good are my data and what is the resolution? Acta Crystallographica D69, 1204–1214 (2013)

How to Modify the Processing Script and Reprocess Datasets

  1. Move to the top processing folder via the symbolic link included with your image files:
    
            > cd /data/{username}/{path_to_images}/autoprocessing_autoproc_{unique_id}/
            
  2. Copy the processing script for whichever cutoff you prefer:
    
            > cp run-{cutoff}.sh my_new_run.sh
            
  3. Open my_new_run.sh in the geany text editor and modify the autoPROC launch string as needed:
    
            > geany my_new_run.sh
            

    CRITICAL - make sure to change the folder name after the -d argument or the script will not run! E.g.:

    
            process [pre-existing arguments] -d new_proc_folder [rest of arguments]
            
            
  4. Save the new version and run my_new_run.sh:
    
            > ./my_new_run.sh
            

ICEflow Versions and Release Notes

  • ICEflow-1.5.0 - released on 03/19/2025
    • ICEflow jobs are now submitted to a queue using the Slurm workload manager.
    • Processing job submission parameters tweaked for optimal resource usage.
    • Major changes to the GUI:
      • A new status - "Pending" - appears for jobs waiting to run.
      • New columns added to the Layout, with two settings available to users.
      • A clickable link to a summary webpage file added.
    • autoPROC log parsing takes into account new formatting.
  • ICEflow-1.4.0 - released on 01/14/2025
    • Added data collection strategy calculation (iMosflm) to ICEflow packages
    • Strategy can be found on the Collect Tab
  • ICEflow-1.3.0 (autoPROC) - released on 09/12/2024
    • Added error extraction from logs and reporting in the Processing Tab.
    • Reorganized result reporting to database and UI.
  • ICEflow-1.2.1 (patch) (autoPROC) - released on 06/21/2024
    • Fixed issue where autoPROC cannot index images collected with a vertical offset on a Pilatus detector.
    • Changed symlinks in data folder to point to the top processing folder.
  • ICEflow-1.2.0 (autoPROC) - released on 06/18/2024
    • Changed to a more descriptive convention for output folder: /data/{username}/{3rd_party_software}/{filename_prefix}_{run_number}_{date}_{time}/{cutoff}
    • The README file now contains explicit path to source data and image file template, for easier reference.
    • Documentation below edited to reflect these changes.
  • ICEflow-1.1.0 (autoPROC) - released on 06/12/2024
    • Fixed an issue where both cutoff versions output data with a CC1/2-based resolution cutoff; the I/sig(I)-based cutoff is now enabled.
    • Added a no-cutoff option as documented below.
    • The summary.html files are now copied to the top folder for easier access.
  • ICEflow-1.0.0 (autoPROC) - released on 06/05/2024
    • Initial release, with only autoPROC pipeline enabled.

More information on the software supported by the SSRL-SMB Macromolecular Crystallography division is available on our software webpage.