CCWiffer User Manual
by Xiaoshu Wang, Jonas S. Almeida
Introduction
What is CCWiffer
CCWiffer is short for "Charleston Core Wiff Converter". It is software designed to help proteomics users convert Mass Spectrometry (MS) data stored in the proprietary Wiff format into a community-standard XML-based format.
Supported Formats
CCWiffer currently supports both mzXML and mzData. The supported version of mzXML is 2.1, which has a namespace of "http://sashimi.sourceforge.net/schema_revision/mzXML_2.1". The supported version of mzData is 1.5.
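As a minimal sketch of how the namespace above can be used, the following Python snippet checks whether an XML file declares the mzXML 2.1 namespace on its root element. The function names `root_namespace` and `is_supported_mzxml` are illustrative only and are not part of CCWiffer; the sample document is hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Namespace of mzXML 2.1, as quoted in this manual.
MZXML_2_1_NS = "http://sashimi.sourceforge.net/schema_revision/mzXML_2.1"


def root_namespace(source):
    """Return the namespace URI of the root element, or "" if it has none."""
    # The first "start" event always belongs to the root element,
    # so the rest of a large file never has to be read.
    for _event, element in ET.iterparse(source, events=("start",)):
        tag = element.tag  # ElementTree writes qualified tags as "{uri}local"
        return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""
    return ""


def is_supported_mzxml(source):
    """True if the root element is in the mzXML 2.1 namespace."""
    return root_namespace(source) == MZXML_2_1_NS


# Hypothetical, heavily shortened document used only for illustration.
sample = io.BytesIO(
    b'<mzXML xmlns="http://sashimi.sourceforge.net/schema_revision/mzXML_2.1">'
    b"<msRun/></mzXML>"
)
print(is_supported_mzxml(sample))  # prints "True"
```

`source` may be a file name or an open binary file object.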
Features
- Multiple file selection.
- Scan selection filters.
- Support for both mzXML and mzData.
- Support for common XML character encodings.
Requirements
Please note: because CCWiffer uses the software library of Analyst QS, it can only be used on a machine where Analyst QS is installed.
User References
Main Window
The main window is where you select wiff files for conversion, choose the schema (i.e., mzXML or mzData), select the character encoding, and set the output file names. See Figure 1.
Figure 1: The Main Window
Figure Legends:
- Add a wiff file to the conversion list. A wiff file can also be dragged onto the list from another window.
- Remove a wiff file from the conversion list.
- Memory Option Window
- List of selected wiff files to be converted.
- Metadata Window
- Dropdown list of the XML schemas to which the wiff files can be converted.
- Select the character encoding used to write the XML file.
- Select whether indentation is used when writing the XML content.
- Select the folder where the XML files will be saved.
- Select the naming pattern of the XML files. Please note that if two output file names are the same, the file converted later will overwrite the one converted earlier.
- Start the conversion process.
Metadata Window
Clicking "Edit Metadata" opens a form for editing the metadata about the MS experiment that the specified wiff file represents. The metadata is divided into six sections: Selection, General, Instrument, Data Processing, Spotting and Scan.
Selection Tab
The selection tab has three functional areas. See Figure 2.
Figure 2: Selection Tab
1. File Information Window
The top portion of the Selection Tab displays general file information about the MS experiment. According to the Analyst QS documentation, each wiff file can contain experiments run on multiple samples. Each sample can contain multiple periods, each of which runs over multiple cycles, with a number of experiments (scans) in each cycle. All of this information is displayed in the window.
2. Hierarchical Order of MS/MS Scans
This check box indicates whether the XML file should order the scan elements according to their precursor-product hierarchy.
According to the mzXML specification, if both the precursor scan and the product scan are encoded, the product scan should be written as a child element of the precursor scan. But this requirement is redundant, because the precursor scan can be identified from the optional "precursorScanNum" attribute of the product scan's <precursorMz> element. In addition, I think that, because of the size of mzXML files, most programs will more likely than not use sequential access such as SAX to read the file. Hence, in practice, it is not very useful to rely on document structure to express the relationship. This is the reason I leave it as an optional feature.
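To illustrate the point about sequential access, here is a minimal Python sketch that recovers the precursor-product relationship with a SAX handler, using only the "precursorScanNum" attribute rather than element nesting. It assumes the attribute is carried by the `precursorMz` element inside each `scan` element, as in the mzXML schema; the class name, the helper function and the sample document are hypothetical.

```python
import xml.sax


class PrecursorLinkHandler(xml.sax.ContentHandler):
    """Collect {product scan number: precursor scan number} in one pass."""

    def __init__(self):
        super().__init__()
        self.open_scans = []  # stack of <scan> numbers currently open
        self.parent_of = {}

    def startElement(self, name, attrs):
        if name == "scan":
            self.open_scans.append(int(attrs["num"]))
        elif name == "precursorMz" and self.open_scans:
            # The attribute is optional, so only record it when present.
            if "precursorScanNum" in attrs:
                product = self.open_scans[-1]
                self.parent_of[product] = int(attrs["precursorScanNum"])

    def endElement(self, name):
        if name == "scan":
            self.open_scans.pop()


def precursor_links(xml_bytes):
    """Parse an mzXML document held in memory and return the link table."""
    handler = PrecursorLinkHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.parent_of


# Hypothetical flat (non-hierarchical) document: scan 2 is a sibling of scan 1.
SAMPLE = b"""<mzXML xmlns="http://sashimi.sourceforge.net/schema_revision/mzXML_2.1">
<msRun>
<scan num="1" msLevel="1"/>
<scan num="2" msLevel="2"><precursorMz precursorScanNum="1">500.25</precursorMz></scan>
</msRun>
</mzXML>"""

print(precursor_links(SAMPLE))  # prints "{2: 1}"
```

The handler gives the same result whether or not the product scans are nested inside their precursor scans, which is why the hierarchical order can be treated as optional.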
3. Selection Tab
The selection section allows the user to customize which scans go into the converted XML file.
Each scan filter is composed of multiple selection sentences. Each selection sentence is composed of an operator followed by a filter. The operator can be either "+" (meaning "include") or "-" (meaning "exclude"); a missing operator is the same as "+". A filter is enclosed within a pair of parentheses (). If only one filter is used, the parentheses are optional.
Each filter is composed of four lists in the following format:
(sample # list; period # list ; cycle # list; experiment # list)
An empty list implies "all". Therefore, ( ; ; ; ) selects all scans in the wiff file.
Within each list, items are separated by ",". Each list item can be either a number or a range. The two ends of a range are separated by "-".
For instance, to select all scans of cycle 3 and cycles 50 to 60 from the first period of the second sample, the following filter can be used:
(2 ; 1 ; 3, 50-60; )
To select all but the 7th cycle of the first period of the first sample, the following filters can be used:
(1 ; 1 ; ; ) - (1 ; 1; 7 ; )
Similar principles can be applied to combine filters. Please note, however, that the order of the filters makes a difference. If the order of the filters in the above example is reversed, it will end up selecting all cycles of the first period of the first sample.
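The filter rules can be sketched in Python as follows. This is not CCWiffer's own parser, only an illustration of the syntax described in this section. It assumes that the sentences are applied from left to right starting from an empty selection (which reproduces the order dependence noted above) and that a filter written without parentheses is a single "include". All function names and the sample scans are hypothetical.

```python
import re

# One selection sentence: optional "+" or "-" followed by a parenthesised filter.
SENTENCE = re.compile(r"([+-]?)\s*\(([^()]*)\)")


def parse_list(text):
    """Return None for an empty list ("all"), else a list of (low, high) pairs."""
    text = text.strip()
    if not text:
        return None
    pairs = []
    for item in text.split(","):
        low, _, high = item.partition("-")
        pairs.append((int(low), int(high or low)))  # a single number is low == high
    return pairs


def parse_filter(body):
    """Split "sample; period; cycle; experiment" into four parsed lists."""
    parts = body.split(";")
    if len(parts) != 4:
        raise ValueError("a filter needs exactly four ';'-separated lists")
    return [parse_list(part) for part in parts]


def parse_scan_filter(text):
    """Return the (operator, filter) sentences in the order they were written."""
    if "(" not in text:
        # Assumption: a bare filter without parentheses is a single include.
        return [("+", parse_filter(text))]
    return [(op or "+", parse_filter(body)) for op, body in SENTENCE.findall(text)]


def matches(filter_lists, scan):
    """True if every index of the scan falls within the corresponding list."""
    return all(
        pairs is None or any(low <= value <= high for low, high in pairs)
        for pairs, value in zip(filter_lists, scan)
    )


def select_scans(text, scans):
    """Apply the sentences left to right, starting from an empty selection."""
    selected = set()
    for op, filter_lists in parse_scan_filter(text):
        hits = {scan for scan in scans if matches(filter_lists, scan)}
        selected = selected | hits if op == "+" else selected - hits
    return selected


# Hypothetical scans as (sample, period, cycle, experiment) tuples.
scans = {(1, 1, cycle, 1) for cycle in range(1, 11)} | {(2, 1, 3, 1)}
kept = select_scans("(1 ; 1 ; ; ) - (1 ; 1; 7 ; )", scans)
reversed_kept = select_scans("- (1 ; 1; 7 ; ) + (1 ; 1 ; ; )", scans)
print(len(kept), len(reversed_kept))  # prints "9 10"
```

With the original order, the 7th cycle is removed after everything else has been included, leaving 9 of the 10 cycles. With the reversed order, the exclusion acts on an empty selection and the later inclusion brings back all 10.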
General Tab
The General Tab allows the user to enter general administrative information, along with information regarding the parent files. See Figure 3.
Figure 3: General Tab.
The "Calculate Sha1" button can calculate the SHA-1 digest of a locally stored file for you. Note, however, that the SHA-1 digest is not needed for mzData.
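For readers who want to verify a digest outside CCWiffer, the following Python sketch computes the SHA-1 digest of a local file with the standard `hashlib` module. It is an independent illustration, not the code behind the button; the function name `file_sha1` and the file name in the usage note are hypothetical.

```python
import hashlib


def file_sha1(path, chunk_size=1 << 20):
    """Return the SHA-1 digest of a file as a 40-character hex string."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read in chunks so that large wiff files never have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `file_sha1("sample.wiff")` returns the digest of a file named sample.wiff in the current directory.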
Instrument Tab
The Instrument Tab allows the user to enter information regarding the instrument and the manufacturer's software used to generate the data. See Figure 4.
Figure 4: Instrument Tab.
Data Processing Tab
The Data Processing Tab allows the user to enter information regarding the data processing that has been applied to the data stored in the wiff file.
More than one data processing element can be added. There is always a default one to indicate that the current mzXML or mzData file was converted by CCWiffer. See Figure 5.
Figure 5: Data Processing Tab
If information beyond the provided options (whether the data has been centroided, deisotoped, charge deconvoluted and spot integrated) needs to be supplied, it can be entered as Name-Value-Type constructs by clicking the "Processing Operation" button. See Figure 6.
Figure 6: Name Value Type entries
There is always at least one data processing tab, indicating that the current XML file was converted by CCWiffer. On this default tab, three data processing options are provided: Centroid Height, Merge Distance and Mass Tolerance. The values of these three options are used internally to locate the precursor mass and intensity.
If the user wants to keep the converted XML smaller by filtering out spikes or by using the centroid algorithm, the user can check the "Use built-in algorithm" box. Checking this box allows the user to set the intensity cutoff value used to filter noise, or to set the "Centroided" dropdown list to "true" and then click the "Processing Operation" button to specify the values of "Centroid Height" and "Merge Distance" for the built-in centroid algorithm.
Spotting Tab
The Spotting Tab allows the user to enter the spotting information related to a MALDI experiment. See Figure 7.
Figure 7: Spotting Tab
Scan Tab
The Scan Tab allows the user to select which scan attributes to write into the XML file. See Figure 8.
Figure 8: Scan Tab
Memory Option Form
The Memory Option form allows the user to select the extension of the temporary files. See Figure 9.
Figure 9: Memory Options
Two temporary files are created during the conversion process. They are located in the same directory as the target XML file. Their names are generated by appending an extension to the name of the XML file, so please choose an appropriate file extension to avoid potential naming conflicts.
Increasing the maximum buffer size can modestly improve the speed of the program. In general, the larger the capacity, the faster the program runs. Using more memory, however, will affect the overall responsiveness of the computer. The maximum temporary storage is set to 200 MB. This is due not to a limitation of the computer hardware but to a limitation of some VB6 libraries.
Conversion Process
After the user clicks the "Convert" button, CCWiffer converts all files in the selected list into XML files according to the specified schema. In the top right portion of the program, a report shows the conversion status of each file. The bottom portion shows the progress of the file currently being converted. The user can cancel the conversion at any time. See Figure 10.
Figure 10: Converting Process
