1. A computer implemented apparatus for searching for a search character string in a source character string database, the search and source character strings each including a prefix P, a base B and a suffix S segment, comprising:
a means for stripping from an inputted search character string, a predefined group of non-alphanumeric characters to generate a converted search character string;
a means for generating a first group of search tokens, comprising a prescribed number of \u2018search\u2019 base tokens from the converted search character string, wherein the \u2018search\u2019 base tokens serve as search tokens, and wherein
the number of \u2018search\u2019 base tokens in the first group of tokens is at least equal to the number of characters in a prefix segment of the converted search character string,
each \u2018search\u2019 base token has a fixed base token length, wherein the fixed base token length is determined by a prescribed number of contiguous characters in the converted search character string and
the first \u2018search\u2019 base token begins at a first character position of the converted search character string and each subsequent \u2018search\u2019 base token begins at an adjacent sequential character position of the converted search character string;
a means for generating a source token database from a converted source character string generated from a source character string, and storing in a memory location of a computer, said generating means further generating a predetermined number of \u2018source\u2019 base tokens, wherein the \u2018source\u2019 base tokens serve as source tokens, and wherein
the number of \u2018source\u2019 base tokens is at least equal to the number of characters in a prefix segment of the converted source character string,
the number of the \u2018source\u2019 base tokens is substantially the same as the number of the \u2018search\u2019 base tokens,
each \u2018source\u2019 base token has said fixed base token length, wherein the fixed base token length is determined by the prescribed number of contiguous characters in the converted source character string, and
the first \u2018source\u2019 base token begins at a first character position of the converted source character string and each subsequent \u2018source\u2019 base token begins at an adjacent sequential character position of the converted source character string;
a means for searching the source token database using each of the \u2018search\u2019 base tokens and identifying one or more \u2018source\u2019 base tokens that match each of the \u2018search\u2019 base tokens, wherein for each match found, the searching means also identifies a character position in the converted search character string at which the match occurs; and
a means for outputting to a user, a list of one or more \u2018source\u2019 base tokens that match the \u2018search\u2019 base tokens and the corresponding character positions at which the match occurs.
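Illustrative note: the elements of claim 1 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the claimed implementation. The token length of 4, the token count of 3, the upper-casing step, and the names `convert`, `base_tokens`, `build_source_db` and `search` are all hypothetical choices made for the example. Because base token i starts at character position i, a search string that lacks a short prefix can still share tokens with a source string that has one.

```python
import re
from collections import defaultdict

TOKEN_LEN = 4    # assumed fixed base token length
NUM_TOKENS = 3   # assumed prescribed number of base tokens


def convert(s):
    """Strip non-alphanumeric characters; upper-casing is an added assumption."""
    return re.sub(r"[^0-9A-Za-z]", "", s).upper()


def base_tokens(converted, n=NUM_TOKENS, length=TOKEN_LEN):
    """Return (start_position, token) pairs; token i starts at character i (0-based)."""
    return [(i, converted[i:i + length])
            for i in range(n) if i + length <= len(converted)]


def build_source_db(source_strings):
    """Map each 'source' base token to the source strings that contain it."""
    db = defaultdict(list)
    for s in source_strings:
        for pos, tok in base_tokens(convert(s)):
            db[tok].append((s, pos))
    return db


def search(db, query):
    """Return (search_position, token, source_string) for every token match."""
    hits = []
    for pos, tok in base_tokens(convert(query)):
        for source, _src_pos in db.get(tok, []):
            hits.append((pos, tok, source))
    return hits


db = build_source_db(["AB-1234-X", "ZZ-1234-X"])
hits = search(db, "B1234")
# hits == [(0, 'B123', 'AB-1234-X'), (1, '1234', 'AB-1234-X'), (1, '1234', 'ZZ-1234-X')]
```

In this example the query omits the leading "A" of the first source string, yet two of its tokens still match that string, and each match is reported with its position in the converted search string.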
2. The apparatus of claim 1 operable on the converted search character string when at least one ‘source’ base token that matches a ‘search’ base token is identified, the apparatus further comprising:
a second means for generating one or more predetermined number of second groups of search tokens, wherein the number of second groups is less than or equal to the ratio of total number of characters in the converted search character string to the number of \u2018search\u2019 base tokens, wherein
each second group includes a prescribed number of ‘search’ non-base tokens generated from the converted search character string,
each ‘search’ non-base token has a fixed non-base token length, wherein the fixed non-base token length is determined by a prescribed number of contiguous characters which is less than the base token length, and
each \u2018search\u2019 non-base token begins at a different adjacent sequential character position that does not include the base token length sequential character positions of the converted search character string used by the \u2018search\u2019 base tokens;
a means for generating in the source token database, one or more second groups of source tokens, such that the number of second groups is less than or equal to the ratio of total number of characters in the converted search character string to the number of \u2018search\u2019 base tokens, wherein
each second group includes a prescribed number of \u2018source\u2019 non-base tokens generated from the converted source character string,
each \u2018source\u2019 non-base token has said fixed non-base token length wherein the fixed non-base token length is determined by the prescribed number of contiguous characters in the converted source character string,
each \u2018source\u2019 non-base token begins at a different adjacent sequential character position that does not include the base token length sequential character positions of the converted source character string used by the \u2018source\u2019 base tokens; and
wherein the searching means also searches the source token database using each of the ‘search’ non-base tokens and identifies one or more ‘source’ non-base tokens that match each of the ‘search’ non-base tokens and combines the number of one or more ‘source’ non-base token matches with the number of one or more ‘source’ base token matches, to determine the one or more source character strings that produce the highest number of character position matches with the search character string; and
wherein the outputting means outputs to a user, a list of one or more source character strings that produced the highest number of character position matches with the search character string.
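Illustrative note: one possible reading of claim 2 is sketched below in Python. It is a hedged sketch, not the claimed apparatus. The lengths and counts, the helper names `tokens`, `all_tokens` and `best_matches`, and the choice to start non-base tokens immediately after the characters covered by the base tokens are assumptions. The score is simplified to a count of matching tokens, and the inputs are assumed to be already converted strings.

```python
from collections import Counter

BASE_LEN, N_BASE = 4, 3         # assumed base token length and count
NONBASE_LEN, N_NONBASE = 2, 3   # assumed; NONBASE_LEN is less than BASE_LEN


def tokens(s, start, count, length):
    """Fixed-length tokens beginning at adjacent sequential positions (0-based)."""
    return [(i, s[i:i + length])
            for i in range(start, start + count) if i + length <= len(s)]


def all_tokens(converted):
    """Base tokens followed by one second group of shorter non-base tokens."""
    base = tokens(converted, 0, N_BASE, BASE_LEN)
    covered = N_BASE + BASE_LEN - 1   # characters already used by base tokens
    non_base = tokens(converted, covered, N_NONBASE, NONBASE_LEN)
    return base + non_base


def best_matches(sources, query):
    """Return the source strings with the highest combined token match count."""
    query_tokens = {tok for _, tok in all_tokens(query)}
    scores = Counter()
    for src in sources:
        for _, tok in all_tokens(src):
            if tok in query_tokens:
                scores[src] += 1
    if not scores:
        return []
    top = max(scores.values())
    return sorted(s for s, n in scores.items() if n == top)


result = best_matches(["AB1234XYZ9", "AB1234QRS7"], "AB1234XYW")
# result == ['AB1234XYZ9']
```

Both source strings match all three base tokens of the query, so the base tokens alone cannot separate them. The single matching non-base token "XY" breaks the tie in favor of the first string.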
3. The apparatus of claim 1 wherein the searching means uses an algorithm selected from a set of algorithms which
(a) count a match for each search token starting position without double counting;
(b) count every character position that is part of a search token that produces a match; and
(c) sum the starting character positions of each search token that produced a match.
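Illustrative note: the three scoring options of claim 3 can be compared on a small example. The Python sketch below assumes each match is recorded as a hypothetical `(start, length)` pair using 1-based starting positions, and that a start position reported more than once is counted once in options (a) and (c). The function names are invented for the example.

```python
def count_start_positions(matches):
    """(a) One count per distinct search token starting position."""
    return len({start for start, _ in matches})


def count_covered_positions(matches):
    """(b) Count every character position inside any matching search token."""
    covered = set()
    for start, length in matches:
        covered.update(range(start, start + length))
    return len(covered)


def sum_start_positions(matches):
    """(c) Sum the distinct starting positions of the matching search tokens."""
    return sum({start for start, _ in matches})


# Matching search tokens as (1-based start, length); position 2 is reported twice.
matches = [(1, 4), (2, 4), (2, 4), (7, 2)]
score_a = count_start_positions(matches)    # 3
score_b = count_covered_positions(matches)  # 7
score_c = sum_start_positions(matches)      # 10
```

Option (b) rewards longer tokens because overlapping tokens contribute each covered character only once, while option (c) weights matches that begin later in the search string more heavily.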
4. A computer implemented method for searching for a search character string in a source character string database, the search and source character strings each including a prefix P, a base B and a suffix S segment, the method comprising the steps of:
generating from the source character string, a converted source character string through the effect of removing a predetermined group of non-alphanumeric characters;
storing in a source token database in a memory location of a computer, a prescribed number of ‘source’ base tokens generated from the converted source character string wherein the ‘source’ base tokens serve as source tokens, and wherein
the number of \u2018source\u2019 base tokens is at least equal to the number of characters in a prefix segment of the converted source character string,
each \u2018source\u2019 base token has a fixed base token length wherein the fixed base token length is determined by a prescribed number of contiguous characters in the converted source character string, and
the first \u2018source\u2019 base token begins at a first character position of the converted source character string and each subsequent \u2018source\u2019 base token begins at an adjacent sequential character position of the converted source character string;
stripping from an inputted search character string, a predefined group of non-alphanumeric characters to generate a converted search character string;
generating at least a first group of tokens comprising a prescribed number of ‘search’ base tokens from the converted search character string, wherein the ‘search’ base tokens serve as search tokens, and wherein
the number of \u2018search\u2019 base tokens in the first group of tokens is at least equal to the number of characters in a prefix segment of the converted search character string,
the number of the \u2018search\u2019 base tokens being substantially equal to the number of the \u2018source\u2019 base tokens,
each ‘search’ base token has the fixed base token length wherein the fixed base token length is determined by a prescribed number of contiguous characters in the converted search character string, and
the first ‘search’ base token begins at a first character position of the converted search character string and each subsequent ‘search’ base token begins at an adjacent sequential character position of the converted search character string;
searching for each of the ‘search’ base tokens in the source token database and identifying one or more ‘source’ base tokens that match each of the ‘search’ base tokens wherein for each match found, the searching step also identifies a character position in the converted search character string at which the match occurs; and
outputting to a user, a list of one or more ‘source’ base tokens that match the ‘search’ base tokens and the corresponding character positions at which the match occurs.
5. The method of claim 4 operable on the converted search character string when at least one \u2018source\u2019 base token that matches a \u2018search\u2019 base token is identified, the method further comprising the steps of:
generating one or more predetermined number of second groups of search tokens, wherein the number of second groups is less than or equal to the ratio of total number of characters in the converted search character string to the number of ‘search’ base tokens, each second group of search tokens further including:
a prescribed number of \u2018search\u2019 non-base tokens generated from the converted search character string, wherein
each \u2018search\u2019 non-base token has a fixed non-base token length wherein the non-base token length is determined by a prescribed number of contiguous characters within the converted search character string,
the non-base token length is equal to or less than the base token length, and
each \u2018search\u2019 non-base token begins at a different adjacent sequential character position that does not include the base token length sequential character positions of the converted search character string used by the \u2018search\u2019 base tokens;
storing in the source token database, one or more predetermined number of second groups of ‘source’ tokens, wherein the number of second groups is less than or equal to the ratio of total number of characters in the converted search character string to the number of ‘search’ base tokens, each second group of source tokens further including:
a prescribed number of ‘source’ non-base tokens generated from the converted source character string, wherein
each ‘source’ non-base token has said fixed non-base token length wherein the non-base token length is determined by a prescribed number of contiguous characters within the converted source character string, and
each \u2018source\u2019 non-base token begins at a different adjacent sequential character position that does not include the base token length sequential character positions of the converted source character string used by the \u2018source\u2019 base tokens;
the searching step includes searching the source token database using each of the ‘search’ non-base tokens and identifying one or more ‘source’ non-base tokens that match each of the ‘search’ non-base tokens; and
combining the number of one or more ‘source’ non-base token matches with the number of one or more ‘source’ base token matches, and
determining the one or more source character strings that produce the highest number of character position matches with the search character string;
such that the outputting step outputs to a user, a list of one or more source character strings that produced the highest number of character position matches with the search character string.
6. The method of claim 4 wherein the searching step uses an algorithm selected from a set of algorithms for:
(a) counting a match for each search token starting position without double counting;
(b) counting every character position that is part of a search token that produces a match; and
(c) summing the starting character positions of each search token that produced a match.
The claims below are in addition to those above.
All references to claim(s) which appear below refer to the numbering after this sentence.
1.-62. (canceled)
63. A method for identifying a protein in a sample comprising a plurality of proteins, the method comprising:
providing peptides derived from fragmentation of proteins in a sample comprising a plurality of proteins, wherein at least one peptide derived from the protein to be identified comprises at least one affinity ligand;
contacting the peptides with a capture moiety to select peptides comprising the affinity ligand;
fractionating the selected peptides to yield a plurality of peptide fractions;
subjecting the peptides in at least one peptide fraction to mass spectrometric analysis to detect at least one peptide derived from the protein to be identified; and
identifying the protein from which the detected peptide was derived.
64. The method of claim 63 wherein the detected peptide is a signature peptide of the protein to be identified, the method further comprising determining the mass of the signature peptide and using the mass of the signature peptide to identify the protein from which the detected peptide was derived.
65. The method of claim 63 further comprising determining the amino acid sequence of the detected peptide and using the amino acid sequence of the detected peptide to identify the protein from which the detected peptide was derived.
66. The method of claim 63 further comprising, prior to contacting the peptides with the capture moiety, covalently attaching at least one affinity ligand to at least one peptide derived from the fragmentation of the proteins.
67. The method of claim 63 further comprising, prior to fragmenting the proteins, covalently attaching at least one affinity ligand to at least one protein in the sample.
68. The method of claim 63 further comprising reducing and alkylating the proteins with an alkylating agent prior to fragmenting the proteins.
69. The method of claim 68 wherein the at least one affinity ligand is covalently attached to the alkylating agent.
70. The method of claim 63 wherein the at least one affinity ligand is covalently attached to an amino acid of the peptide selected from the group consisting of cysteine, tyrosine, tryptophan, histidine and methionine.
71. The method of claim 63 wherein the affinity ligand comprises a moiety selected from the group consisting of a peptide antigen, a polyhistidine, a biotin, a dinitrophenol, an oligonucleotide and a peptide nucleic acid.
72. The method of claim 63 wherein at least one peptide comprises an endogenous affinity ligand.
73. The method of claim 72 wherein the endogenous affinity ligand comprises a phosphate group or a carbohydrate.
74. The method of claim 73 wherein the endogenous affinity ligand comprises a phosphate group, and wherein contacting the peptides with a capture moiety comprises contacting the peptides at acidic pH with a cationic support surface.
75. The method of claim 72 wherein the endogenous affinity ligand comprises a cysteine or a histidine.
76. The method of claim 72 wherein the endogenous affinity ligand comprises an antigenic amino acid sequence.
77. The method of claim 63 further comprising attaching a plurality of affinity ligands, each to at least one protein or peptide, and contacting the peptides with a plurality of capture moieties to select peptides comprising at least one affinity ligand.
78. The method of claim 63 further comprising fragmenting the proteins in the sample to yield the peptides.
79. The method of claim 78 wherein the proteins are fragmented using an enzyme selected from the group consisting of trypsin, chymotrypsin, gluc-C, endo lys-C, pepsin, papain, proteinase K, carboxypeptidase, calpain and subtilisin.
80. The method of claim 63 wherein fractionating the selected peptides comprises subjecting the selected peptides to at least one separation technique selected from the group consisting of reversed phase chromatography, ion exchange chromatography, hydrophobic interaction chromatography, size exclusion chromatography, capillary gel electrophoresis, capillary zone electrophoresis, capillary electrochromatography, capillary isoelectric focusing, immobilized metal affinity chromatography and affinity electrophoresis.
81. The method of claim 63 wherein the sample comprises at least about 100 proteins.
82. The method of claim 63 wherein using the mass of the signature peptide to identify the protein from which the signature peptide was derived comprises comparing the mass of the signature peptide with the masses of reference peptides derived from putative proteolytic cleavage of a plurality of reference proteins in a database, wherein at least one reference peptide comprises at least one affinity ligand.
83. The method of claim 82 wherein peptides derived from fragmentation of the plurality of reference proteins are, prior to comparing the mass of the signature peptide with the masses of the reference peptides, computationally selected to exclude reference peptides that do not contain an amino acid upon which the affinity selection is based.
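Illustrative note: the database comparison of claims 82 and 83 can be sketched computationally. The Python sketch below is a simplified illustration under stated assumptions, not the claimed method. It assumes trypsin as the protease with the common rule of cleavage after K or R except before P, cysteine as the residue on which affinity selection is based, standard monoisotopic residue masses, and a hypothetical `tag_mass` parameter for the mass added by the affinity ligand. The function names, protein names and sequences are invented for the example.

```python
import re

# Monoisotopic residue masses in daltons (standard values).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_peptides(protein):
    """Putative cleavage after K or R, but not before P (simplified trypsin rule)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mass(peptide, tag_mass=0.0, tagged_residue="C"):
    """Neutral monoisotopic mass, plus tag_mass for each tagged residue."""
    mass = sum(MONO[aa] for aa in peptide) + WATER
    return mass + tag_mass * peptide.count(tagged_residue)


def reference_table(proteins, tagged_residue="C", tag_mass=0.0):
    """Keep only reference peptides that contain the selectable residue."""
    table = []
    for name, seq in proteins.items():
        for pep in tryptic_peptides(seq):
            if tagged_residue in pep:
                table.append((peptide_mass(pep, tag_mass, tagged_residue), pep, name))
    return table


def identify(observed_mass, table, tol=0.01):
    """Return (peptide, protein) pairs whose reference mass is within tol daltons."""
    return [(pep, name) for mass, pep, name in table
            if abs(mass - observed_mass) <= tol]


proteins = {"protA": "ACDKGGR", "protB": "MKAAR"}   # hypothetical sequences
table = reference_table(proteins)
candidates = identify(peptide_mass("ACDK"), table)
# candidates == [('ACDK', 'protA')]
```

Excluding peptides that lack the selectable residue shrinks the reference table before any mass comparison is made, which reduces the number of coincidental mass matches that must be ruled out.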
84.-99. (canceled)