CCWiffer User Manual

by Xiaoshu Wang, Jonas S. Almeida

Introduction

What is CCWiffer

CCWiffer is short for "Charleston Core Wiff Converter". It is software designed to help proteomics users convert Mass Spectrometry (MS) data stored in the proprietary Wiff format into community-standard, XML-based formats.

Supported Formats

CCWiffer currently supports both mzXML and mzData. The supported version of mzXML is 2.1, whose namespace is "http://sashimi.sourceforge.net/schema_revision/mzXML_2.1". The supported version of mzData is 1.5.

Note
The mzData schema (retrieved from "http://psidev.sourceforge.net/ms/xml/mzdata/mzdata.xsd" on March 21, 2005) does not define a target namespace. Without an XML namespace, it is extremely difficult to define an open data standard, because there is no way to refer to the standard or its version. We therefore created the namespace URI "http://www.charlestoncore.org/docs/ccwiffer/mzData_1.5" for this purpose.

Features

  • Multiple file selection.
  • Scan selection filters.
  • Support for both mzXML and mzData.
  • Support for common XML character encodings.

Requirements

Because CCWiffer uses the Analyst QS software library, it can only be used on a machine where Analyst QS is installed!

User Reference

Main Window

The main window is where you select the wiff files to convert, choose the schema (i.e., mzXML or mzData), set the character encoding, and specify the output file names. See Figure 1.

Figure 1: The Main Window

Figure Legends:

  1. Add a wiff file to the conversion list. A wiff file can also be dragged onto the list from another window.
  2. Remove a wiff file from the conversion list.
  3. Memory Option Window
  4. List of selected wiff files to be converted.
  5. Metadata Window
  6. Dropdown list of the XML schemas to which the wiff files will be converted.
  7. Select the character encoding used to write the XML file.
  8. Select whether "indent" is used when writing the XML content.
  9. Select the folder where the XML files will be saved.
  10. Select the naming pattern for the XML files. Please note that if two output file names are the same, the file converted later will overwrite the one converted earlier.
  11. Start conversion process.
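
As an illustration of the overwrite rule in item 10, the following Python sketch groups the input files by the output name they would produce and reports any name shared by more than one input. The naming pattern `same_stem` (the wiff file's base name plus ".mzXML") and both function names are hypothetical; the real naming patterns are chosen in the main window. Names are compared case-insensitively on the assumption that the output folder is on a Windows file system.

```python
from collections import defaultdict
from pathlib import PureWindowsPath
from typing import Callable, Dict, List


def find_name_collisions(wiff_files: List[str],
                         make_output_name: Callable[[str], str]) -> Dict[str, List[str]]:
    """Group input files by the output XML name they would produce.

    Only names shared by two or more inputs are returned; for each of these,
    the file converted last would overwrite the others.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for wiff in wiff_files:
        # Windows file names are case-insensitive, so compare lower-cased names.
        groups[make_output_name(wiff).lower()].append(wiff)
    return {name: files for name, files in groups.items() if len(files) > 1}


def same_stem(wiff: str) -> str:
    """Hypothetical naming pattern: keep the wiff base name, swap the extension."""
    return PureWindowsPath(wiff).stem + ".mzXML"
```

For example, `C:\run1\sample.wiff` and `D:\run2\sample.wiff` both map to `sample.mzXML` under this pattern, so converting both into one output folder would leave only the second result.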

Metadata Window

Clicking "Edit Metadata" opens a form for editing the metadata about the MS experiment represented by the selected wiff file. The metadata is divided into six tabs: Selection, General, Instrument, Data Processing, Spotting, and Scan.

Selection Tab

The Selection Tab has three functional areas. See Figure 2.

Figure 2: Selection Tab

1. File Information Window

The top portion of the Selection Tab displays general information about the MS experiments stored in the file. According to the Analyst QS documentation, each wiff file can contain experiments run on multiple samples. Each sample can contain multiple periods, each of which is run over multiple cycles, with a number of experiments (scans) in each cycle. All of this information is displayed in this window.
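
The sample, period, cycle, and experiment hierarchy can be pictured with a small data model. This Python sketch assumes, hypothetically, that every cycle within a period runs the same number of experiments, and it numbers every level from 1, matching the indices used by the scan filters on this tab. The class and function names are illustrative only, not part of CCWiffer or the Analyst QS library.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Period:
    cycles: int        # number of cycles run in this period
    experiments: int   # number of experiments (scans) in each cycle (assumed constant)


@dataclass
class Sample:
    periods: List[Period] = field(default_factory=list)


def iter_scans(samples: List[Sample]) -> Iterator[Tuple[int, int, int, int]]:
    """Yield a 1-based (sample, period, cycle, experiment) address for every scan."""
    for s, sample in enumerate(samples, start=1):
        for p, period in enumerate(sample.periods, start=1):
            for c in range(1, period.cycles + 1):
                for e in range(1, period.experiments + 1):
                    yield (s, p, c, e)
```

For two samples, the first with one period of 3 cycles × 2 experiments and the second with periods of 2 × 1 and 1 × 4, the generator yields 6 + 2 + 4 = 12 scan addresses.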

2. Hierarchical Order of MS/MS Scans

This check box indicates whether the XML file should nest scan elements according to their precursor-product hierarchy.

Note
Two things should be noted. First, this option applies only to mzXML.
Second, according to the mzXML specification, if both the precursor scan and the product scan are encoded, the product scan should be written as a child element of the precursor scan. This requirement is redundant, however, because the precursor scan can be identified from the optional "precursorScanNum" attribute of the product scan's <precursor> element. In addition, because mzXML files are large, I expect that most programs will read them with sequential access such as SAX. In practice, therefore, there is little benefit in using document structure to imply relationships. This is why I left it as an optional feature.
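
To show why the nesting carries no extra information, the following Python sketch rebuilds the precursor-product tree from a flat, sequential list of scans using only each product scan's precursor scan number. Scans are plain dictionaries with hypothetical keys `num` and `precursor_num`. The sketch also assumes that a precursor always appears before its products, as in acquisition order, and treats any scan whose precursor is missing as top-level.

```python
from typing import Dict, List, Optional


def nest_scans(scans: List[dict]) -> List[dict]:
    """Turn a flat, ordered scan list into a precursor-product tree.

    Each scan has a "num" key and an optional "precursor_num" key holding the
    number of its precursor scan (the information precursorScanNum carries).
    The input dictionaries are copied, not modified.
    """
    by_num: Dict[int, dict] = {}
    roots: List[dict] = []
    for scan in scans:
        node = dict(scan, children=[])
        by_num[node["num"]] = node
        parent: Optional[dict] = by_num.get(scan.get("precursor_num"))
        if parent is None:
            roots.append(node)                 # survey scan, or precursor not found
        else:
            parent["children"].append(node)    # product scan goes under its precursor
    return roots
```

Given scans 1 to 4 where scans 2 and 3 name scan 1 as their precursor, the result has top-level scans 1 and 4, with 2 and 3 as children of 1.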

3. Scan Selection

The selection section allows the user to customize which scans go into the converted XML file.

Each scan filter is composed of multiple selection sentences. Each selection sentence consists of an operator followed by a filter. The operator can be either "+" (include) or "-" (exclude); a missing operator is the same as "+". A filter is enclosed within a pair of parentheses (). If only one filter is used, the parentheses are optional.

Each filter is composed of four lists in the following format:

(sample # list; period # list ; cycle # list; experiment # list)

An empty list means "all". Therefore, ( ; ; ; ) selects all scans in the Wiff file.

Within each list, items are separated by ",". Each item can be either a single number or a range, written as two numbers joined by "-".

For instance, to select all scans of cycle 3 and cycles 50 to 60 from the first period of the second sample, use the following filter:

(2 ; 1 ; 3, 50-60; )
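
The list syntax can be sketched in Python as follows. `parse_list` and `parse_filter` are hypothetical names, not CCWiffer functions; the sketch returns `None` for an empty list to stand for "all", and otherwise a set of 1-based indices.

```python
from typing import List, Optional, Set


def parse_list(text: str) -> Optional[Set[int]]:
    """Parse one list of a filter, e.g. " 3, 50-60".

    Returns None for an empty list (meaning "all"), otherwise the set of
    1-based indices named by the comma-separated numbers and ranges.
    """
    text = text.strip()
    if not text:
        return None
    indices: Set[int] = set()
    for item in text.split(","):
        item = item.strip()
        if "-" in item:                          # a range such as "50-60"
            low, high = (int(part) for part in item.split("-", 1))
            indices.update(range(low, high + 1))
        else:                                    # a single number
            indices.add(int(item))
    return indices


def parse_filter(text: str) -> List[Optional[Set[int]]]:
    """Split "(sample; period; cycle; experiment)" into four parsed lists."""
    parts = text.strip().strip("()").split(";")
    if len(parts) != 4:
        raise ValueError(f"expected four ';'-separated lists, got {len(parts)}")
    return [parse_list(part) for part in parts]
```

For the example filter, `parse_filter("(2 ; 1 ; 3, 50-60; )")` returns `{2}` for the sample, `{1}` for the period, the twelve cycle numbers 3 and 50 through 60, and `None` for the experiment list.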

To select all but the seventh cycle of the first period of the first sample, use the following filters:

(1 ; 1 ; ; ) - (1 ; 1; 7 ; )

Filters can be combined following the same principles. Note, however, that the order of the filters matters, because the selection sentences are applied in order from left to right: if the order of the example above is reversed, the result is all cycles of the first period of the first sample.
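
The following self-contained Python sketch evaluates a whole scan filter against a list of scan addresses, applying the selection sentences in order. It assumes the selection starts out empty, so a leading "-" sentence removes nothing; under that assumption it reproduces both results described above. All names are hypothetical, and the sketch is not CCWiffer's own parser.

```python
import re
from typing import Iterable, List, Optional, Set, Tuple

Scan = Tuple[int, int, int, int]          # (sample, period, cycle, experiment), 1-based
Lists = List[Optional[Set[int]]]          # one entry per level; None means "all"

# An optional +/- operator followed by a parenthesized filter.
_SENTENCE = re.compile(r"([+-]?)\s*\(([^()]*)\)")


def _parse_list(text: str) -> Optional[Set[int]]:
    """Parse "3, 50-60" style lists; an empty list means "all"."""
    if not text.strip():
        return None
    indices: Set[int] = set()
    for item in text.split(","):
        low, _, high = item.partition("-")
        indices.update(range(int(low), int(high or low) + 1))
    return indices


def parse_scan_filter(text: str) -> List[Tuple[str, Lists]]:
    """Split a scan filter into (operator, four parsed lists) sentences."""
    text = text.strip()
    if "(" not in text:                   # a lone filter may omit parentheses
        has_op = text.startswith(("+", "-"))
        raw = [(text[0] if has_op else "+", text[1:] if has_op else text)]
    else:
        raw = [(m.group(1) or "+", m.group(2)) for m in _SENTENCE.finditer(text)]
    sentences = []
    for op, body in raw:
        parts = body.split(";")
        if len(parts) != 4:
            raise ValueError(f"expected four ';'-separated lists in {body!r}")
        sentences.append((op, [_parse_list(p) for p in parts]))
    return sentences


def _matches(scan: Scan, lists: Lists) -> bool:
    return all(allowed is None or index in allowed
               for index, allowed in zip(scan, lists))


def apply_scan_filter(text: str, scans: Iterable[Scan]) -> List[Scan]:
    """Apply sentences left to right, starting from an empty selection (assumption)."""
    all_scans = list(scans)
    selected: Set[Scan] = set()
    for op, lists in parse_scan_filter(text):
        hits = {s for s in all_scans if _matches(s, lists)}
        selected = selected | hits if op == "+" else selected - hits
    return [s for s in all_scans if s in selected]   # keep acquisition order
```

With ten cycles in sample 1, period 1, the filter `(1 ; 1 ; ; ) - (1 ; 1; 7 ; )` keeps cycles 1 to 6 and 8 to 10, while the reversed filter `- (1 ; 1; 7 ; ) (1 ; 1 ; ; )` keeps all ten.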

Note
All list indices start from 1. Note that in Analyst's VB code, the sample index starts from 1, but the period, cycle, and experiment indices start from 0.
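
A minimal Python sketch of the index shift described in this note, converting the 1-based indices used in scan filters to the Analyst convention. The function name is hypothetical.

```python
from typing import Tuple


def to_analyst_indices(sample: int, period: int, cycle: int,
                       experiment: int) -> Tuple[int, int, int, int]:
    """Convert 1-based filter indices to Analyst's VB convention:
    the sample index stays 1-based; period, cycle and experiment become 0-based."""
    for name, value in (("sample", sample), ("period", period),
                        ("cycle", cycle), ("experiment", experiment)):
        if value < 1:
            raise ValueError(f"{name} index must be at least 1, got {value}")
    return sample, period - 1, cycle - 1, experiment - 1
```

For example, `to_analyst_indices(2, 1, 50, 3)` returns `(2, 0, 49, 2)`.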

General Tab

The General Tab allows the user to enter general administrative information along with information about the parent files. See Figure 3.

Figure 3: General Tab.

Note
mzXML does not define elements for general sample and contact information. If entered, this information is therefore written as a special kind of data processing element. This is an unfortunate consequence of strict schema validation in XML: to remain compliant with the schema, we have to shoehorn the data into a placeholder.

The "Calculate Sha1" button computes the SHA-1 digest of a locally stored file for you. Note, however, that a SHA-1 digest is not needed for mzData.
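
For reference, the same kind of digest can be computed outside CCWiffer with Python's standard `hashlib` module. This sketch streams the file in chunks so that large wiff files need not fit in memory. `file_sha1` is a hypothetical helper, and the lower-case hexadecimal output is an assumption about the form in which the digest is entered on the General Tab.

```python
import hashlib


def file_sha1(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal SHA-1 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```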

Instrument Tab

The Instrument Tab allows the user to enter information about the instrument and the manufacturer's software used to generate the data. See Figure 4.

Figure 4: Instrument Tab.

Data Processing Tab

The Data Processing Tab allows the user to enter information about the data processing that has been applied to the data stored in the wiff file.

More than one data processing element can be added. There is always a default one indicating that the current mzXML or mzData file was converted by CCWiffer. See Figure 5.

Figure 5: Data Processing Tab

If information beyond the provided options needs to be recorded, for example whether the data has been centroided, deisotoped, charge deconvoluted, or spot integrated, it can be entered as Name-Value-Type entries by clicking the "Processing Operation" button. See Figure 6.

Figure 6: Name Value Type entries

There is always at least one data processing tab indicating that the current XML file was converted by CCWiffer. On this default tab, three data processing options are provided: Centroid Height, Merge Distance, and Mass Tolerance. The values of these three options are used internally to locate the precursor mass and intensity.

To keep the converted XML file smaller by filtering out noise spikes or by centroiding the data, check the "Use built-in algorithm" box. Checking this box lets you set an intensity cutoff value to filter out noise. Alternatively, set the "Centroided" dropdown list to "true" and then click the "Processing Operation" button to specify the "Centroid Height" and "Merge Distance" values for the built-in centroid algorithm.
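
This manual does not describe CCWiffer's built-in algorithm, nor exactly how "Centroid Height" is used, so the following Python sketch is only a generic stand-in for the two ideas above: an intensity cutoff that drops noise, and a simple merge step that combines points whose consecutive m/z values lie within a hypothetical `merge_distance` into one intensity-weighted peak. It is not CCWiffer's implementation, and all names are illustrative.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def apply_cutoff(peaks: List[Peak], cutoff: float) -> List[Peak]:
    """Drop points whose intensity is below the noise cutoff."""
    return [(mz, inten) for mz, inten in peaks if inten >= cutoff]


def _centroid(group: List[Peak]) -> Peak:
    """Intensity-weighted mean m/z and summed intensity of a group of points."""
    total = sum(inten for _, inten in group)
    if total == 0:  # avoid dividing by zero; fall back to the plain mean m/z
        return (sum(mz for mz, _ in group) / len(group), 0.0)
    return (sum(mz * inten for mz, inten in group) / total, total)


def merge_close_peaks(peaks: List[Peak], merge_distance: float) -> List[Peak]:
    """Merge runs of points whose consecutive m/z values differ by at most
    merge_distance into a single centroided peak."""
    merged: List[Peak] = []
    group: List[Peak] = []
    for mz, inten in sorted(peaks):
        if group and mz - group[-1][0] > merge_distance:
            merged.append(_centroid(group))
            group = []
        group.append((mz, inten))
    if group:
        merged.append(_centroid(group))
    return merged
```

For example, with a cutoff of 2.0 and a merge distance of 0.05, the points (100.0, 5.0), (100.02, 15.0), (100.5, 1.0), and (101.0, 20.0) reduce to two peaks: one near m/z 100.015 with intensity 20.0, and one at m/z 101.0 with intensity 20.0.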

Spotting Tab

The Spotting Tab allows the user to enter the spotting information related to a MALDI experiment. See Figure 7.

Figure 7: Spotting Tab

Scan Tab

The Scan Tab allows the user to select which scan attributes are written to the XML file. See Figure 8.

Figure 8: Scan Tab

Note
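The Pair Order attribute does not apply to mzData, which stores m/z and intensity values in separate arrays rather than as interleaved pairs. In addition, for mzData the "network" byte order is the same as big-endian.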
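
The following Python sketch illustrates the difference using the standard `struct` and `base64` modules, assuming 32-bit precision. The mzXML-style encoder interleaves m/z and intensity values in network byte order, which is big-endian; the mzData-style encoder packs a single array in whichever byte order is requested, so no pair order is involved. Both function names are hypothetical.

```python
import base64
import struct
from typing import List, Tuple


def encode_mzxml_peaks(peaks: List[Tuple[float, float]]) -> str:
    """mzXML style: interleave (m/z, intensity) pairs as 32-bit floats in
    network byte order ("!" in struct, i.e. big-endian) and base64-encode."""
    flat = [value for pair in peaks for value in pair]
    return base64.b64encode(struct.pack(f"!{len(flat)}f", *flat)).decode("ascii")


def encode_mzdata_array(values: List[float], endian: str = "little") -> str:
    """mzData style: encode one array (m/z or intensity alone) as 32-bit
    floats in the requested byte order; no pair order is involved."""
    prefix = {"little": "<", "big": ">"}[endian]
    return base64.b64encode(struct.pack(f"{prefix}{len(values)}f", *values)).decode("ascii")
```

Because "!" and ">" produce identical bytes in `struct`, `encode_mzxml_peaks([(1.0, 2.0)])` gives the same string as `encode_mzdata_array([1.0, 2.0], endian="big")`.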

Memory Option Form

The Memory Option Form allows the user to select the extensions of the temporary files. See Figure 9.

Figure 9: Memory Options

Two temporary files are created during the conversion process. They are located in the same directory as the target XML file, and their names are generated by appending an extension to the name of the XML file. Please choose an appropriate file extension to avoid potential naming conflicts.

Increasing the maximum buffer size can modestly improve the speed of the program: in general, the larger the capacity, the faster the program runs. Using more memory, however, will reduce the overall responsiveness of the computer. The maximum temporary storage is set to 200 MB. This cap is due not to the limitations of computer hardware but to the limitations of some VB6 libraries.

Note
These options are not essential to the conversion process. We nevertheless decided to give users these options because any important file whose name matches a temporary file name would be erased during the conversion process.
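
The naming rule for the temporary files, and the conflict this note warns about, can be sketched in Python with `pathlib`. The extensions passed in are hypothetical examples of what a user might enter in the Memory Option Form, and both function names are illustrative only.

```python
from pathlib import Path
from typing import Iterable, List


def temp_file_paths(xml_path: str, extensions: Iterable[str]) -> List[Path]:
    """Temporary files sit next to the target XML file; each name is the
    XML file's full name with an extension appended."""
    target = Path(xml_path)
    return [target.with_name(target.name + ext) for ext in extensions]


def existing_conflicts(xml_path: str, extensions: Iterable[str]) -> List[Path]:
    """Return temporary-file paths that already exist and would be erased."""
    return [path for path in temp_file_paths(xml_path, extensions) if path.exists()]
```

For example, `temp_file_paths("out/sample.mzXML", (".tmp1", ".tmp2"))` gives `out/sample.mzXML.tmp1` and `out/sample.mzXML.tmp2`; calling `existing_conflicts` before converting would reveal any existing file that the conversion would erase.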

Conversion Process

After the user clicks the "Convert" button, CCWiffer converts all files in the selected list into XML files according to the specified schema. The top right portion of the window shows a report of the conversion status of each file, and the bottom portion shows the progress of the file currently being converted. The user can cancel the conversion at any time. See Figure 10.

Figure 10: Converting Process