The COMPARE database & COMPASS tool in overall risk assessment: The overall risk assessment examines many aspects of a new or novel protein, including its structure, function, and history of exposure, in order to understand its allergy potential. Two primary questions, whether a protein is an allergen and whether it is cross-reactive with one, form the part of the assessment that is based on an algorithmic calculation at the sequence level (FASTA). Because this calculation does not consider exposure scenarios, including the degree to which a person may be exposed to the protein (dose level), the sequence-level comparison addresses the hazard part of a risk assessment.

Bioinformatics for risk assessment of novel foods and feeds: For genetically engineered (GE) crops expressing novel protein(s), allergenic potential is assessed based on a bioinformatic calculation of protein sequence similarity to known allergens. Because IgE binds to a portion of an allergen (the epitope) and triggers an allergic response, the potential for IgE to bind a specific protein can be inferred from its amino acid sequence. Currently, all regulatory agencies require bioinformatic approaches as part of characterizing a new GE crop. This requirement applies regardless of whether the risks are improbable or less likely than with traditional breeding.

FASTA: FASTA (pronounced "fast A" where the upper case "A" is long) refers to both a file format and a search/comparison algorithm. When used as a sequence file format, a FASTA file consists of a header line that begins with ">" followed by one or more lines of sequence:
>CAA32835.1 beta-lactoglobulin [Bos taurus]
MKCLLLALALTCGAQALIVTQTMKGLDIQKVAGTWYSLAMAASDISLLDAQSAPLRVYVEELKPTP
EGDLEILLQKWENDECAQKKIIAEKTKIPAVFKIDALNENKVLVLDTDYKKYLLVCMENSAEPEQSL
VCQCLVRTPEVDDEALEKFDKALKALPMHIRLSFNPTQLEEQCHI
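
As an illustration of the file format, the following is a minimal Python sketch of a reader for FASTA-formatted text. The function name parse_fasta is hypothetical and is not part of COMPARE, COMPASS, or the FASTA package; the sketch assumes well-formed input and simply joins the sequence lines that follow each ">" header.

```python
from typing import Iterator, List, Optional, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text.

    Hypothetical helper for illustration only.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are concatenated without line breaks.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


example = """>CAA32835.1 beta-lactoglobulin [Bos taurus]
MKCLLLALALTCGAQALIVTQTMKGLDIQKVAGTWYSLAMAASDISLLDAQSAPLRVYVEELKPTP
EGDLEILLQKWENDECAQKKIIAEKTKIPAVFKIDALNENKVLVLDTDYKKYLLVCMENSAEPEQSL
VCQCLVRTPEVDDEALEKFDKALKALPMHIRLSFNPTQLEEQCHI
"""

records = list(parse_fasta(example))
header, sequence = records[0]
accession = header.split()[0]  # first token of the header line
print(accession, len(sequence))
```

The first whitespace-delimited token of the header is treated here as the accession, which matches the example above but is only a convention, not a rule of the format.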

FASTA also refers to an algorithm that takes a test sequence, either nucleotide or amino acid, and searches a corresponding sequence database to identify matching sequences. These sequence matches are then displayed as a sequence alignment. In addition to creating alignments that display exact amino acid matches, FASTA can identify and display matches between chemically similar amino acids and create alignments that contain gaps. The use of the FASTA algorithm has been described in Pearson (2000). Further details can be found here: https://fasta.bioch.virginia.edu/fasta_www2/fasta_list2.shtml.
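
To make the idea of identity within a gapped alignment concrete, the following Python sketch counts exact matches between two already-aligned sequences, with "-" marking a gap. It is not the FASTA algorithm and does not score chemically similar amino acids; the function name percent_identity, the toy sequences, and the choice to divide by the full alignment length (gap columns included) are assumptions made for illustration, and search tools may define the denominator differently.

```python
def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Percent of alignment columns in which both sequences carry the same residue.

    Assumes the two strings are already aligned (equal length, '-' for gaps).
    Hypothetical helper for illustration only.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        return 0.0
    matches = sum(
        1
        for q, s in zip(aligned_query, aligned_subject)
        if q == s and q != "-"  # a gap never counts as a match
    )
    return 100.0 * matches / len(aligned_query)


# Toy alignment (not real data): 7 identical columns out of 10.
identity = percent_identity("MKCLL-ALAL", "MKALLLALGL")
print(identity)
```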

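E-value and Criteria for significance: The E-value (expectation value) of an alignment estimates the number of alignments with an equal or better score that would be expected by chance in a search of the same database. A small E-value (e.g., 10e−4) indicates a potential biologically relevant similarity in the context of potential allergenic cross-reactivity; large E-values (e.g., >1.0) represent random alignments that do not possess biologically relevant similarity.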

Sliding window search process: In a sliding window search, the test sequence is split into shorter overlapping sequences, and each shorter sequence is used to search the database. Guidance documents suggest that searches for allergens be performed by splitting the test sequence into overlapping 80 amino acid windows and identifying alignments that display greater than 35% identity. This methodology ignores the inherently local nature of alignment algorithms such as FASTA, discounts the value of aligned similar amino acids and of the E-value, and sets a maximum domain or motif size of 80 amino acids.
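
The window-splitting step can be sketched in Python as follows. This is an illustration only: the function name sliding_windows is hypothetical, the step of one residue between windows is an assumption (the text above says only that the windows overlap), and the database search itself is not shown.

```python
from typing import Iterator, Tuple


def sliding_windows(
    sequence: str, window: int = 80, step: int = 1
) -> Iterator[Tuple[int, str]]:
    """Yield (start_index, subsequence) for each overlapping window.

    A sequence no longer than the window is returned whole as one window.
    The step of 1 residue is an assumption made for this sketch.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if len(sequence) <= window:
        yield 0, sequence
        return
    for start in range(0, len(sequence) - window + 1, step):
        yield start, sequence[start:start + window]


# Toy 100-residue sequence (not real data): 100 - 80 + 1 = 21 windows.
toy_sequence = "ACDEFGHIKLMNPQRSTVWY" * 5
windows = list(sliding_windows(toy_sequence))
print(len(windows), windows[0][0], windows[-1][0])
```

Each window would then be submitted as its own query, which is why a 100-residue protein produces 21 separate searches under these assumptions.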

Interpretation of bioinformatics results: As noted in the "intended use" section, interpreting identity provides a basis for understanding whether a protein sequence is itself a known allergen or shares a very high level of identity with one.

The identity assessment also supports a parallel assessment, based on the knowledge that two different proteins can share enough sequence to allow an allergic reaction to both. This is termed "cross-reactivity". The theory is based on clinical allergy that develops towards a protein (a second allergen) that did not cause the initial sensitization (the first allergen). Interpreting the identity found for a new or unknown protein is an attempt to determine whether that protein is similar enough to an allergen to support cross-reactivity. The Codex guidance (Codex Alimentarius 2009) provides a baseline for interpretation and is the basis of the commonly shared metric of 35% identity over at least 80 amino acids. This metric is used to assess the potential cross-reactivity hazard.
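
As a sketch of how the 35% identity over 80 amino acids metric might be applied to search results, the Python example below flags alignments that meet it. The AlignmentHit record, its field names, the function name, and the example hits are all hypothetical; the strict "greater than 35%" comparison follows the sliding window description above and is an assumption about how the boundary case is handled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AlignmentHit:
    """Hypothetical summary of one alignment between a query and an allergen."""
    allergen_id: str
    percent_identity: float  # identity within the aligned region, 0-100
    overlap_length: int      # number of aligned amino acid positions


def meets_identity_criterion(
    hit: AlignmentHit, min_identity: float = 35.0, min_overlap: int = 80
) -> bool:
    """True when identity exceeds 35% over an overlap of at least 80 residues."""
    return hit.percent_identity > min_identity and hit.overlap_length >= min_overlap


# Invented hits for illustration; these are not real search results.
hits: List[AlignmentHit] = [
    AlignmentHit("allergen_A", percent_identity=52.5, overlap_length=120),
    AlignmentHit("allergen_B", percent_identity=60.0, overlap_length=40),
    AlignmentHit("allergen_C", percent_identity=30.0, overlap_length=200),
]

flagged = [h.allergen_id for h in hits if meets_identity_criterion(h)]
print(flagged)
```

A flagged hit indicates a potential hazard that calls for further interpretation; it does not by itself establish cross-reactivity.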

References and suggested readings:

Guidelines

FAO/WHO. (2001) Evaluation of Allergenicity of Genetically Modified Foods. Report of a Joint
FAO/WHO Expert Consultation on Allergenicity of Foods Derived From Biotechnology. Food and Agriculture Organization of the United Nations, January 22−25, 2001, Rome, Italy.


Codex Alimentarius Commission. (2003) Appendix III (Guideline for the conduct of food safety
assessment of foods derived from recombinant-DNA plants) and Appendix IV (Annex on the
assessment of possible allergenicity). In (eds.), Alinorm 03/34: Joint FAO/WHO Food Standard Programme, Codex Alimentarius Commission, Twenty-Fifth Session, Rome, 30 June−5 July, 2003.
Codex Alimentarius Commission, Rome, Italy, pp. 47–60.


Codex Alimentarius Commission. (2009) Codex Alimentarius: Foods Derived from Modern
Biotechnology. 2nd ed. Joint FAO/WHO Food Standards Programme. Rome: World Health
Organization. pp. 1–85. ISBN 978-92-5-105914-2. http://www.fao.org/3/a1554e/a1554e00.htm (last
accessed: 22 April 2019).


Citations (to learn more about the use of bioinformatics for in-silico prediction of allergenic cross-reactivity and about the FASTA algorithm):

Cressman, R., & Ladics, G. S. (2009). Further evaluation of the utility of "sliding window" FASTA in
predicting cross-reactivity with allergenic proteins. Regulatory Toxicology and Pharmacology, 54, S20–S25.

Ladics, G. S., Bannon, G. A., Silvanovich, A., & Cressman, R. F. (2007). Comparison of conventional
FASTA identity searches with the 80 amino acid sliding window FASTA search for the elucidation of
potential identities to known allergens. Molecular Nutrition and Food Research, 51, 985−998.

Mirsky, H. P., Cressman, R. F., & Ladics, G. S. (2013). Comparative assessment of multiple criteria for
the in-silico prediction of allergenic cross-reactivity. Regulatory Toxicology and Pharmacology, 67, 232–239.

Pearson, W. R. (2000). Flexible sequence similarity searching with the FASTA3 program package.
Methods in Molecular Biology, 132, 185−219.

Scheurer, S., Son, D. Y., Boehm, M., et al. (1999). Cross-reactivity and epitope analysis of Pru a 1, the
major cherry allergen. Molecular Immunology, 36, 155−167.

Silvanovich, A., Bannon, G., & McClain, S. (2009). The use of E-scores to determine the quality of
protein alignments. Regulatory Toxicology and Pharmacology, 54, S26−S31.