Calculation of pepitope using MATLAB or Microsoft Excel


The pepitope calculator is a tool that can be used to compute a specific measure of antigenic distance between two strains of influenza and to estimate vaccine effectiveness. It is freely distributed under the GNU General Public License.

Purpose

pepitope is a variable that measures the antigenic distance between two influenza strains. It correlates better with known vaccine efficacy data than the measures currently used by the CDC and the WHO: ferret assays (pferret) and whole-sequence comparisons of the hemagglutinin proteins of the two strains (psequence). Both of these methods show only modest correlation with known efficacy data. We, the Deem group, conceived of pepitope (described below) and showed that it correlates better with the efficacy data. The pEpitopeCalculator.mlappinstall and PepitopeCalculator.xls programs are designed to facilitate use of pepitope (say, for the design of the annual flu vaccine). The pepitope vaccine efficacy calculator is provided on this website.

Description

The conceptual transition from psequence to pepitope is simple. The idea is that some regions of the hemagglutinin protein are more important than others. These regions are the epitopes, which are the binding sites of human antibodies. It is therefore assumed that point mutations in the amino acids of these epitopes have a great effect on the ability of the antibodies (made in response to the vaccine strain) to attach to the surface proteins of the circulating strain. It is also assumed that point mutations elsewhere in the protein have little to no effect on the binding of the antibodies. An additional concept is that of epitope dominance: some epitopes are more important than others, and which epitope is most important can change from year to year. For H3N2/human influenza, it is assumed that whichever epitope has the greatest percentage of mutations is dominant, because the dominant epitope is under the most pressure from the immune system. For H5N1/avian, the user must know which epitope is dominant.

With those concepts in mind, we define pepitope to be the fractional change between the dominant epitopes of the vaccine and the circulating strains, or as an equation:

pepitope = (number of mutations in the dominant epitope)/(number of amino acids in the dominant epitope)
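
As a minimal sketch of this ratio (written in Python, which is not the language of either distributed program; the function name p_epitope and the numbers in the example are illustrative assumptions, not values from the calculator):

```python
def p_epitope(n_mutations: int, n_residues: int) -> float:
    """Fraction of residues in the dominant epitope that differ between
    the vaccine strain and the circulating strain."""
    if n_residues <= 0:
        raise ValueError("the epitope must contain at least one residue")
    if not 0 <= n_mutations <= n_residues:
        raise ValueError("mutation count must lie between 0 and n_residues")
    return n_mutations / n_residues


# Hypothetical example: 3 substitutions in a dominant epitope of 20 residues.
print(p_epitope(3, 20))  # 0.15
```

A value of 0 means the dominant epitopes are identical, and a value of 1 means every residue in the dominant epitope differs.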

Excel

The Excel file that can be downloaded below is set up to calculate pepitope from discrepancies between the hemagglutinin proteins of two Influenza Type A strains of either the H3N2 or the H5N1 subtype. Both hemagglutinin types have five epitopes (A, B, C, D, E), with A and B usually being dominant in H3N2.

The algorithm used by this program to compute the value of pepitope is as follows. First, the user inputs the two amino acid sequences of the hemagglutinin protein for comparison using one-letter IUPAC abbreviations. The user also inputs the type of flu that is being compared: H3N2/human or H5N1/avian. The program is designed to take input in the form of actual, pasted sequences or in the form of a text file. These raw inputs are saved on the Worksheet "Sequences". Some characters are then deleted from the strains, such as spaces and dashes.
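
The clean-up step might look like the following sketch. It is not the VBA code shipped in the workbook: the text above names only spaces and dashes, so removing all whitespace and upper-casing the letters are assumptions, as is the function name clean_sequence.

```python
def clean_sequence(raw: str) -> str:
    """Drop whitespace and dashes from a pasted sequence and upper-case it.

    Assumption: line breaks and tabs are treated like spaces.
    """
    return "".join(ch for ch in raw.upper() if not ch.isspace() and ch != "-")


# A pasted fragment with a space, a dash and a line break.
print(clean_sequence("qklp GNDN-STAT\nLCLG"))  # QKLPGNDNSTATLCLG
```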

Then, the strains are aligned using a reference strain: A/California/7/2004 (ISDN110647) for H3N2/human and A/Duck/Singapore/3/97 (ISDN49024) for H5N1/avian influenza. If the program encounters major problems aligning the sequences (i.e., if the percentage of matching amino acids between an input strain and the reference is less than 50%), then an error message will be displayed requesting that the user try again. The most likely reason for this problem is that the hemagglutinin type of the input strain does not match that of the reference. If the alignment of one of the user's strains leaves extra characters before or after the reference strain, these characters are discarded. If the alignment has the user's strain starting after or ending before the reference strain, then these offsets are noted, as they could affect pepitope: the program assumes that any missing portions of the sequences (including "?" characters) are perfect matches. In that case a message is displayed, which differs depending on how much of the sequence is missing from the beginning or end. If the missing portions are large enough to possibly affect pepitope, the message states that the sequences are "incomplete". If they are not large enough to affect pepitope, the message states that they are "slightly incomplete". These aligned and truncated strains, along with the reference strain, are displayed on the Worksheet "Aligned Sequences".
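
The alignment method itself is not described here, so the sketch below swaps in a deliberately simple ungapped alignment: the input is slid along the reference and the offset with the most matching residues is kept. The function name align_to_reference, the toy reference string, and the choice to measure identity over the overlapping region are all assumptions. Only the 50% threshold, the discarding of extra characters, and the noting of missing ends come from the description above.

```python
def align_to_reference(query: str, reference: str, min_identity: float = 0.5):
    """Ungapped alignment of query onto reference (illustrative only).

    Returns (aligned, missing_start, missing_end). `aligned` has the same
    length as `reference`; '?' marks positions the query does not cover.
    """
    if not query or not reference:
        raise ValueError("both sequences must be non-empty")

    best_matches, best_offset = -1, 0
    # offset = index in reference where query[0] would land (may be negative)
    for offset in range(-(len(query) - 1), len(reference)):
        start = max(offset, 0)
        end = min(offset + len(query), len(reference))
        matches = sum(reference[i] == query[i - offset] for i in range(start, end))
        if matches > best_matches:
            best_matches, best_offset = matches, offset

    start = max(best_offset, 0)
    end = min(best_offset + len(query), len(reference))
    # Assumption: identity is measured over the overlapping region only.
    if best_matches / (end - start) < min_identity:
        raise ValueError("alignment failed: check the hemagglutinin type")

    # Characters hanging over either end of the reference are discarded.
    covered = query[start - best_offset:end - best_offset]
    aligned = "?" * start + covered + "?" * (len(reference) - end)
    return aligned, start, len(reference) - end


TOY_REFERENCE = "ACDEFGHIKLMNPQRS"  # made-up string, not a real hemagglutinin
print(align_to_reference("FGHIKLMNPQRSTV", TOY_REFERENCE))
# ('????FGHIKLMNPQRS', 4, 0)
```

In the example the input starts four residues after the reference begins, so four positions are reported as missing at the start, and the two trailing characters that extend past the reference are dropped.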

After the alignment procedure, the two strains are compared and the positions of any discrepancies are recorded. Then, those positions are cross-referenced with the positions of the residues in each epitope, which are contained in the hidden Worksheet "Residues". (The epitope residues differ between H3 and H5.) On the Worksheet "Discrepancies", the number of mutations in each epitope is output along with the positions of each of those mutations. Then, a p value is calculated for each epitope by dividing the number of mutations in that epitope by the number of amino acids in that epitope. For H3N2/human, by the logic of [1], pepitope is the largest p value that is attained, and the corresponding epitope is deemed dominant. For H5N1/avian, pepitope is the p value for the dominant epitope. You must know which epitope is dominant in the vaccine and challenge strains that you are comparing for H5N1/avian. Since H5N1/avian evolves primarily in birds, in the absence of a vaccine, there is no theory yet to calculate which epitope is dominant. All of this information is output in the Worksheet "Results".
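
The comparison and cross-referencing steps can be sketched as follows. The epitope residue positions used here are toy values chosen for the example, not the H3 or H5 residue lists stored in the workbook, and the function names are hypothetical. Positions are 1-indexed into the aligned strings, and "?" is treated as a match, as described above.

```python
def epitope_p_values(vaccine: str, circulating: str, epitope_sites: dict) -> dict:
    """Return {epitope: (p_value, mutated_positions)} for two aligned strains."""
    results = {}
    for name, sites in epitope_sites.items():
        mutated = [
            pos for pos in sites
            if "?" not in (vaccine[pos - 1], circulating[pos - 1])
            and vaccine[pos - 1] != circulating[pos - 1]
        ]
        results[name] = (len(mutated) / len(sites), mutated)
    return results


def dominant_epitope(results: dict):
    """H3N2/human rule: the epitope with the largest p value is dominant."""
    name = max(results, key=lambda key: results[key][0])
    return name, results[name][0]


# Toy data only: two made-up epitopes on a 12-residue fragment.
TOY_EPITOPES = {"A": [1, 2, 3, 4], "B": [8, 9, 10, 11, 12]}
results = epitope_p_values("ACDEFGHIKLMN", "ACDQFGHIRLMW", TOY_EPITOPES)
print(results)                    # {'A': (0.25, [4]), 'B': (0.4, [9, 12])}
print(dominant_epitope(results))  # ('B', 0.4)
```

For H5N1/avian the maximum would not be taken. The user would instead read off the p value of the epitope known to be dominant, for example results["A"][0].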

MATLAB


The MATLAB file that can be downloaded below will install an app in MATLAB to calculate pepitope for discrepancies in the hemagglutinin proteins between the vaccine and dominant circulating strains of Influenza Type A H3N2 viruses and to estimate vaccine efficacy.

Version 1.0: Vaccine efficacy is predicted based on the model in [2].
Version 2.0: Vaccine efficacy is predicted based on the updated model in [3]. This version contains an option to use data in test-negative design studies from the Centers for Disease Control and Prevention (CDC) United States Vaccine Effectiveness Network or data in studies over the past decade from national influenza surveillance networks across the northern hemisphere (NH).
Version 2.1: Also includes the option to calculate vaccine effectiveness.

Additional functionality for Influenza Type A H1N1 and Type B will be added to a subsequent version of this app.

On the main "pEpitope Calculator" tab of the app, the user inputs two amino acid sequences of the hemagglutinin protein for comparison using one-letter IUPAC abbreviations. The program is designed to take input in the form of pasted sequences containing valid amino acid characters.

The strains are then aligned using a reference strain: A/Hong Kong/4801/2014 (EPI614406) for A/H3N2 human. If the program encounters major problems aligning the sequences, then an error message will be displayed requesting that the user try again. The most likely reason for this problem is that the hemagglutinin type of the input strain does not match that of the reference. If the alignment of one of the user's strains leaves extra characters before or after the reference strain, these characters are discarded. If the alignment has the user's strain starting after or ending before the reference strain, then these offsets are noted, as they could affect pepitope. If the missing portions contain epitope regions, an error message is displayed stating that the sequences are "incomplete".
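
How the app decides that a sequence is "incomplete" is not documented beyond the sentence above. One plausible reading, sketched here with hypothetical names and toy positions, is to check whether any epitope residue position falls inside the stretch of the reference that the input does not cover.

```python
def missing_epitope_sites(missing_start: int, missing_end: int,
                          reference_length: int, epitope_sites: dict) -> dict:
    """Return the 1-indexed epitope positions that fall in uncovered regions.

    missing_start / missing_end count the reference residues not covered by
    the input at the beginning and at the end of the alignment.
    """
    missing = set(range(1, missing_start + 1))
    missing |= set(range(reference_length - missing_end + 1, reference_length + 1))
    return {
        name: sorted(missing & set(sites))
        for name, sites in epitope_sites.items()
        if missing & set(sites)
    }


# Toy positions only, not the real H3 epitope residues.
TOY_EPITOPES = {"A": [3, 5, 7], "B": [10, 11]}
lost = missing_epitope_sites(4, 0, 16, TOY_EPITOPES)
print(lost)                                  # {'A': [3]}
print("incomplete" if lost else "complete")  # incomplete
```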

After the alignment procedure, the two strains are compared and the positions of any discrepancies are recorded. Then, those positions are cross-referenced with the positions of the residues in each epitope. A p value is then calculated for each epitope by dividing the number of mutations in that epitope by the number of amino acids in that epitope. The pepitope is the largest p value that is attained, and the corresponding epitope is deemed dominant.

The GNU General Public License is found on the "GNL" tab.

Download

The older program is written as a VBA script in Microsoft Excel (download PepitopeCalculator.xls). The newer program is written in MATLAB (download pEpitopeCalculator.mlappinstall). For usage instructions, see below.

We ask that you cite our papers [1,2] in any publications that result from the use of the PepitopeCalculator.xls or pEpitopeCalculator.mlappinstall programs.

Usage

You can download a text file of each sequence of interest from GISAID or the NIH Influenza Virus Resource.

Excel

You must enable macros for this program to run! Once you've enabled macros, a welcome screen will appear. From this screen, you will be able to choose how you want to input your data (copy/paste or text file), access a help file and get the contact information of the creators. After the program has run, or you have exited from it, you can restart it by clicking the "Start Pepitope Calculator" button on the toolbar. (Don't worry, this toolbar removes itself as the Excel file shuts down.) Note that the program will not allow you to save, so any manipulations of the code or of important data in hidden Worksheets will not be saved. Therefore, all pepitope data should be saved in a completely different Workbook/Application, and, if you wish to make changes to the program itself, you must choose the 'Save As...' command.

Let's go back and discuss input. If you would like to paste the sequences, click on that button and copy them from another location. In order to paste them, you must use the shortcut key (Ctrl + V). (Userforms in Excel do not allow you to access the Edit menu or right-click.) Next, make sure that you have selected the correct strain of influenza (H3N2 or H5N1) corresponding to the correct type of hemagglutinin (H3 or H5). Then, just click the Find Pepitope button and see the results.

If you would like to use a text file, do the following:

MATLAB

Instructions for installing the pEpitope Calculator MATLAB app (requires MATLAB R2012b or later):
1. Download the pEpitopeCalculator.mlappinstall file to your computer.
2. In MATLAB, either run the command appinfo = matlab.apputil.install('pEpitopeCalculator.mlappinstall') or, under the "Apps" tab, select "Install App" and choose the pEpitopeCalculator.mlappinstall file for automatic installation.

After installing the app, you can paste strain names and sequences containing valid amino acid characters for Influenza A H3N2 viruses and click "Calculate." The value of pepitope and the dominant epitope are output, along with the positions and amino acids of mutations in each epitope. The predicted vaccine efficacy and standard error are also output. The "Reset" button will clear the form for subsequent calculations. To copy/paste sequences into the program, do the following:

Feedback

PepitopeCalculator.xls and pEpitopeCalculator.mlappinstall are unsupported software.

References

1) V. Gupta, D. J. Earl, and M. W. Deem, "Quantifying Influenza Vaccine Efficacy and Antigenic Distance," Vaccine 24 (2006) 3881-3888.
2) M. E. Bonomo and M. W. Deem, "Predicting Influenza H3N2 Vaccine Efficacy from Evolution of the Dominant Epitope," Clinical Infectious Diseases (2018) doi:10.1093/cid/ciy323.
3) M. E. Bonomo, R. Y. Kim, and M. W. Deem, "Influenza pEpitope Quantifies a Novel Antigenic Distance to Predict Vaccine," (2018).