SEARCHING AND IN SILICO CHARACTERIZATION OF STREPTOMYCES PHAGE ENDOLYSINS AND THEIR CATALYTIC DOMAINS

Endolysins, peptidoglycan hydrolases encoded by bacteriophages, degrade the bacterial cell wall and are a promising alternative to antibiotics. However, the selection and bioengineering of endolysins require prior bioinformatic characterization. This study focuses on endolysins encoded by viruses infecting Streptomyces spp. and describes the diversity and associations of Streptomyces phages. A set of 143 endolysins was predicted in 175 Streptomyces phage genomes from the NCBI database and the Actinobacteriophage Database, and their phylogeny and functional domains, especially the catalytic domains, were characterized. The predicted endolysins have amidase (Ami_2 and CHAP domains), muramidase (Glyco_hydro_25) and peptidase (CHAP, NlpC/P60) activity. This bioinformatic characterization serves as a basis for further research on developing endolysins with new properties in the form of enzybiotics.


INTRODUCTION
Phages, or bacteriophages, are specialized viruses that infect host species belonging to the domain Bacteria. The importance of phages in the treatment of bacterial infections was recognized in Eastern Europe several decades ago (Voelker, 2019). However, due to the increasing problem of multidrug-resistant bacteria, Western medicine has also begun to pay attention to this approach in recent years. Traditional phage therapy is associated with some risks, such as adaptation of viable viruses to target other, closely related strains, transduction of virulence genes, lysogenic conversion of commensal bacteria and bacterial resistance to phages (Valero-Rello, 2019). For this reason, the products of phage genes, lytic enzymes such as endolysins, represent a more useful tool against multiresistant bacteria than whole phage particles. Endolysins are also attractive for other fields, for example in biotechnology for primary industry (Hoopes et al., 2009; Schmelcher et al., 2015) and in the food industry for preventing contamination by foodborne bacterial pathogens.
Endolysins are bacteriophage-encoded enzymes that cleave the peptidoglycan (PG) of the host bacterial cell wall at the end of the phage lytic cycle (Young, 1992). This degradation results in osmotic shock, cell rupture and often bacterial death (Fischetti, 2008). The lytic mechanism is also effective after exogenous application of purified endolysins (Loeffler et al., 2001; Cheng et al., 2007; Schmelcher et al., 2012). The specificity of endolysins (from genus to strain level) makes them a potential alternative or additive to current antibiotics as enzybiotics. Depending on the reaction and the type of PG bond cleaved, these enzymes can be categorized into several classes: amidases, glucosaminidases, transglycosylases, lysozymes and endopeptidases (Oliveira et al., 2012). Amidases hydrolyze the amide bond connecting MurNAc to the peptide stems; glucosaminidases, transglycosylases and lysozymes cleave the glycosidic bonds in the glycan chain; and endopeptidases act on the peptide bonds forming the peptide bridge or stem of PG.
Generally, endolysins of phages infecting Gram-positive bacteria are modular proteins composed of an N-terminally located enzymatically active domain (EAD) and a C-terminally located cell wall binding domain (CBD). The domains are connected by short flexible linkers (Fenton et al., 2010; Schmelcher et al., 2012). The EAD is responsible for cleaving a specific bond in PG, and usually only one EAD is present in the endolysin structure. The CBD is responsible for recognizing specific epitopes on the bacterial cell wall, and one or more CBDs can be present in one endolysin (Oliveira et al., 2012). Many endolysins have been predicted, but fewer have been biochemically confirmed and only a few structures are known, especially for endolysins of phages infecting Gram-positive bacteria.
Our research is devoted to Streptomyces phage endolysins because Streptomyces spp. are of interest for their high antibiotic production, since 70 % of clinically used antibiotics come from this genus, and because, at the proteomic level, this genus is similar to Mycobacterium spp., such as Mycobacterium tuberculosis (Smith et al., 2013). This work focuses on searching for new endolysins encoded by Streptomyces phages. The in silico analysis performed in this study aims at phage genome mining, identification of genes with endolysin function and, especially, characterization of the catalytic domains of these peptidoglycan-degrading enzymes. This information provides a basis for the future use of these enzymes in developing new types of chimeric endolysins and enzybiotics.

Phage genomes
Whole-genome sequences of phages infecting Streptomyces spp. were obtained from the Actinobacteriophage Database (https://phagesdb.org/) and the NCBI (https://www.ncbi.nlm.nih.gov/) Genome and Nucleotide databases. Information on host species, phage life cycle, taxonomy and the continent where the host Streptomyces strain was isolated was also collected from these databases. Data not available there were sought in other sources, publications and databases.

Identification of endolysins and their functional domains
Open reading frames were predicted with the annotation tools PHASTER (http://phaster.ca/; Arndt et al., 2016; Zhou et al., 2011) and BASys (https://www.basys.ca/; Van Domselaar et al., 2005). If neither tool predicted an endolysin gene in a phage genome, Protein BLAST with the blastp algorithm (protein-protein BLAST; https://blast.ncbi.nlm.nih.gov/Blast.cgi) was used to predict the function of gene products and to search for the endolysin. Functional domains were identified from the putative conserved domains shown in the blastp graphic summary, which is linked to CD-search (https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi; Marchler-Bauer et al., 2017). CD-search also predicted the amino acid residues of the catalytic site, which were marked in CLC Sequence Viewer.

Multiple sequence alignment
To identify conserved amino acid residues, whole endolysin sequences were aligned with ClustalW (Jeanmougin et al., 1998) integrated in CLC Sequence Viewer 8.0. Conserved amino acid residues were identified in aligned sequences of the catalytic domains extracted from the whole endolysin sequences. Multiple sequence alignments were visualized with WebLogo (weblogo.berkeley.edu/; Crooks et al., 2004) by creating a sequence logo for each functional domain type identified in the studied set of endolysin sequences. Conserved amino acid residues predicted in the catalytic site were marked in each WebLogo.
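As an illustration of what identifying conserved residues in an alignment involves, the following minimal Python sketch scans alignment columns and reports those in which a single residue reaches a chosen frequency. It is not the procedure implemented in CLC Sequence Viewer or WebLogo; the function name conserved_columns, the min_fraction parameter and the toy sequences are assumptions made for this example only.

```python
from collections import Counter


def conserved_columns(aligned_seqs, min_fraction=1.0, gap="-"):
    """Return (1-based position, residue) for conserved alignment columns.

    A column counts as conserved when it contains no gap and its most
    frequent residue reaches min_fraction of the sequences.
    Hypothetical helper, not part of any tool cited in the text.
    """
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")

    conserved = []
    for i in range(length):
        column = [seq[i] for seq in aligned_seqs]
        if gap in column:
            continue  # skip columns containing gaps
        residue, count = Counter(column).most_common(1)[0]
        if count / len(column) >= min_fraction:
            conserved.append((i + 1, residue))
    return conserved


# Toy alignment (invented sequences, for illustration only).
toy_alignment = ["HAC-Y", "HGCTY", "HACSY"]
strict = conserved_columns(toy_alignment)
relaxed = conserved_columns(toy_alignment, min_fraction=0.6)
```

With the toy alignment, the strict call keeps only fully conserved, gap-free columns, while lowering min_fraction also admits the column in which two of three sequences agree.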

Phylogenetic trees
Phylogenetic trees were calculated from the alignment of whole endolysin sequences and from the alignments of catalytic domains with the neighbour-joining algorithm (Saitou and Nei, 1987), applying the Jones-Taylor-Thornton model of amino acid substitution (Jones et al., 1992), both integrated in CLC Sequence Viewer 8.0. Interactive Tree Of Life (iTOL) (Letunic and Bork, 2019) was used to visualize the phylogenetic trees.
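For readers unfamiliar with neighbour joining, the sketch below shows a single joining step in Python: it computes the Q criterion for every pair of taxa, selects the pair with the lowest value and derives the two branch lengths to the new internal node. It is a teaching sketch, not the CLC Sequence Viewer implementation; the function name nj_first_join and the toy distance matrix are assumptions, and in the study the distances would come from the Jones-Taylor-Thornton model rather than from invented numbers.

```python
def nj_first_join(names, dist):
    """Perform the first joining step of neighbour joining.

    names: list of taxon labels; dist: symmetric matrix (list of lists).
    Returns (taxon_i, taxon_j, branch_i, branch_j) for the joined pair.
    Hypothetical helper written for illustration.
    """
    n = len(names)
    if n < 3:
        raise ValueError("neighbour joining needs at least three taxa")
    totals = [sum(row) for row in dist]

    best = None
    for i in range(n):
        for j in range(i + 1, n):
            # Q criterion: the pair with the lowest value is joined.
            q = (n - 2) * dist[i][j] - totals[i] - totals[j]
            if best is None or q < best[0]:
                best = (q, i, j)

    _, i, j = best
    # Branch lengths from each joined taxon to the new internal node.
    branch_i = 0.5 * dist[i][j] + (totals[i] - totals[j]) / (2 * (n - 2))
    branch_j = dist[i][j] - branch_i
    return names[i], names[j], branch_i, branch_j


# Toy distance matrix (invented values, for illustration only).
toy_names = ["a", "b", "c", "d", "e"]
toy_dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
first_join = nj_first_join(toy_names, toy_dist)
```

A full tree is obtained by replacing the joined pair with the new node, recomputing the distances to it and repeating the step until only two nodes remain.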

Protein tertiary structure models
Three-dimensional models were created by Phyre2 ab initio modelling (Kelley et al., 2015). The models were visualized in PyMOL(TM) 2.0.6, which was also used for structure alignments and for visualizing conserved amino acid residues in the active site. Protein charge at pH 7.4 was computed with the Protein isoelectric point calculator (http://isoelectric.org/calculate.php).
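The sign of the net charge at pH 7.4 can be estimated from the sequence alone with the Henderson-Hasselbalch equation, as in the minimal Python sketch below. This is not the code of the calculator cited above; the pKa values are one commonly used set chosen here as an assumption, and the absolute result depends on which pKa set is used. The function name net_charge is likewise hypothetical.

```python
# Assumed pKa values (one commonly used set); other scales give
# somewhat different absolute charges.
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERM = 8.6
PKA_C_TERM = 3.6


def net_charge(sequence, ph=7.4):
    """Estimate the net charge of a protein sequence at a given pH.

    Each ionizable group contributes its fractional charge according to
    the Henderson-Hasselbalch equation. Hypothetical helper.
    """
    seq = sequence.upper()
    # Terminal amino group (positive) and carboxyl group (negative).
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))
    # Basic side chains carry a fractional positive charge.
    for residue, pka in PKA_POSITIVE.items():
        charge += seq.count(residue) / (1.0 + 10 ** (ph - pka))
    # Acidic side chains carry a fractional negative charge.
    for residue, pka in PKA_NEGATIVE.items():
        charge -= seq.count(residue) / (1.0 + 10 ** (pka - ph))
    return charge


# Toy peptides (invented, for illustration only).
basic_peptide_charge = net_charge("KKKKRR")
acidic_peptide_charge = net_charge("DDEE")
```

For the comparisons made in this study, only the sign of the result (positive or negative net charge) is relevant, which is less sensitive to the choice of pKa set than the exact value.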

Endolysins and their domains
Endolysins, proteins involved in late host lysis, were predicted in 143 of the 175 studied genomes and classified into five groups according to activity: amidases, muramidases, peptidases, cell wall binding and others (associated with glycoside hydrolases but with unclear function). Most endolysins were predicted to have a modular domain structure (Figure 2, Table 2), with the EAD (Ami_2, CHAP, NlpC/P60, Glyco_hydro_25) at the N-terminus and the cell wall binding domain (CWBD; LysM, PG_binding_1, CW_7) at the C-terminus. The Fn3 domain was included because proteins carrying it are encoded near the holin gene, where the endolysin gene is usually located; its function is still unclear (Valk et al., 2017), and the domain is associated with glycoside hydrolases. In general, endolysins of Streptomyces actinophages are about 300 amino acids long (from 226 aa in phiCAM to 471 aa in NooNoot and Paradiddles). Most endolysins contain one catalytic and one cell wall binding domain (50), and several contain one additional cell wall binding domain (11). Where only one domain was predicted, the sequence usually left room for the missing domain (catalytic or cell wall binding; Figure 2). Domain repeats involve the Fn3 and CW_7 domains, which occur as two and three repeats at the C-terminus, respectively.
In the phylogenetic tree of endolysins from Siphoviridae viruses (Figure 3A), endolysins with a CHAP or NlpC/P60 domain group together. For endolysins from phages of the Podoviridae family, no EAD was predicted, only the PG_binding_1 CWBD. Endolysins from this family all have a negative net charge.
In this case, host or life-cycle specificity was not recorded, but almost all endolysins in this group come from phages of the Siphoviridae family and have a positive net charge. The alignment and CD-search domain prediction of all predicted Amidase_2 and PGRP domains revealed two main groups. The first group contains the Zn2+-binding residues His6, His123 and Cys131 and the catalytic residues Tyr47 and Thr19. For the second group, His11, His149 and Asp159 are predicted to be zinc-binding residues and Cys42 and Lys157 catalytic residues. The remaining predicted amidase catalytic domains (endolysins from Raleigh, Darolandstone, Ibantik and Immanueal3) contain different amino acid residues, or their sequences were predicted without strictly defined ends.
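The assignment of an aligned amidase domain to one of the two groups can be expressed as a lookup of the residues found at the reported zinc-binding positions. The Python sketch below assumes that sequences are given in the same alignment (weblogo) coordinates as the positions quoted above; the names ZINC_SIGNATURES, assign_amidase_group and toy_sequence are hypothetical, and the toy sequences are invented filler, not real endolysin domains.

```python
# Zinc-binding signatures in 1-based alignment (weblogo) coordinates,
# taken from the two groups described in the text.
ZINC_SIGNATURES = {
    "group 1": {6: "H", 123: "H", 131: "C"},
    "group 2": {11: "H", 149: "H", 159: "D"},
}


def assign_amidase_group(aligned_seq, signatures=ZINC_SIGNATURES):
    """Return the name of the first fully matched signature, or None.

    aligned_seq must use the same coordinates as the signature
    positions. Hypothetical helper written for illustration.
    """
    for name, signature in signatures.items():
        if all(
            pos <= len(aligned_seq) and aligned_seq[pos - 1] == residue
            for pos, residue in signature.items()
        ):
            return name
    return None


def toy_sequence(signature, length=160, filler="A"):
    """Build an invented filler sequence carrying the given signature."""
    residues = [filler] * length
    for pos, residue in signature.items():
        residues[pos - 1] = residue
    return "".join(residues)


group_1_example = assign_amidase_group(toy_sequence(ZINC_SIGNATURES["group 1"]))
group_2_example = assign_amidase_group(toy_sequence(ZINC_SIGNATURES["group 2"]))
unassigned_example = assign_amidase_group("A" * 160)
```

Sequences that match neither signature return None, which corresponds to the exceptions noted above, where other residues were predicted or the domain boundaries were unclear.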

Figure 4
Weblogo of catalytic domains with predicted amidase activity (Amidase_2 and PGRP). Zinc-binding residues are marked by red stars, catalytic residues by black dots. A: Domains with predicted catalytic residues Tyr47 and Thr129 and zinc-binding residues His6, His123 and Cys131. B: Domains with predicted zinc-binding residues His11, His149 and Asp159 and catalytic residues Cys42 and Lys157.

In the Amidase_2 domain model (Figure 4) of the endolysin from phage Aaronocolus, a representative of the first group, the Zn2+-binding residues (two histidines and a cysteine) occupy the same positions as in the template (PDB 4ZXM, Branchiostoma belcheri tsingtauense peptidoglycan recognition protein 3). The catalytic residues (tyrosine and tryptophan) are also in the same positions but with a different rotation. The model therefore supports the prediction for the first group. In the Amidase_2 model of the endolysin from phage SV1, a representative of the second group, the Zn2+-binding residues (two histidines and an aspartic acid) match the template (4BOL.A, AmpDh2 from Pseudomonas aeruginosa complexed with pentapeptide), as does the catalytic lysine residue, but the second catalytic residue is not cysteine as in the weblogo. In the model, the second catalytic residue is instead glutamic acid, which occupies the same position as Glu106 in the template.
Endolysins containing a CHAP domain come only from phages of the Siphoviridae family (Figure 3B), except for the endolysins from StrepC and Sros11, for which the data are unknown. All of them have a positive net charge, and almost all contain PG_binding_1 as the cell wall binding domain. For this domain, CD-search did not predict conserved catalytic amino acid residues, but alignment with the CHAPk domain from LysK shows conserved Cys61 and His130 (Figure 6). The structure alignment of the modelled Microdon endolysin did not show noticeable similarity with its template because of very low sequence identity (generally lower than 30%).

Peptidases
The organization of the genomes has the typical mosaic structure described previously for Streptomyces viruses (Smith et al., 2013), but in larger genomes the mosaic structure is different. These genomes contain many insertions and proteins atypical for this type of virus; the predicted endolysins are not in the typical position after the tail proteins, and the genomes encode numerous enzymes containing endolysin-related domains. It was therefore harder to predict which enzyme is the typical late endolysin. A similar mosaic structure and similar anomalies were found in Arthrobacter viruses (Klyczek et al., 2017). A domain structure was predicted for most endolysins (Figure 1, Table 2). Predicted endolysins with an FN3 CBD are also localized in this area, so they may contain a modified Amidase_2 or PGRP domain.
Amidase_2 domains have Zn2+-dependent N-acetylmuramoyl-L-alanine amidase activity. When the domains were compared, two groups became clearly visible after marking the amino acid residues predicted by CD-search to lie in the active site. The first contained the Zn2+-binding residues His6, His123 and Cys131 (positions in the weblogo) and the catalytic residues Tyr47 and Thr19. In the second group, His11, His149 and Asp159 were predicted as Zn2+-binding residues and Cys42 and Lys157 as catalytic residues. Exceptions to these two groups were the amidase domains of endolysins from the Raleigh, Darolandstone, Ibantik and Immanueal3 phages, for which other catalytic residues were predicted in the active site or the domain boundaries were not clearly predicted. The zinc-binding residues of the first group, two histidines and a cysteine, coincide with structurally confirmed amino acid residues that coordinate zinc in a tetrahedral arrangement (Gu et al., 2014). The second group, in which cysteine is replaced by aspartic acid, represents a configuration that is possible (Zoll et al., 2010) but has not yet been identified (Love et al., 2020). The second group would therefore require experimental verification.
The catalytic residues only partially coincide with the glutamic acid and lysine described previously, where the water molecule acting as a nucleophile is believed to be activated by glutamic acid while lysine stabilizes the reaction intermediate; threonine may, however, take the place of lysine at this position (Low et al., 2005). The alternation of lysine and threonine thus matches that seen in the endolysins studied here. Such a replacement has also been reported for the bacteriophage T7 endolysin (Cheng et al., 1994), which contains the catalytic residues Tyr46 and Lys128; this could confirm the residues predicted for the first group. In the models, the representative of the first group has its zinc-binding residues (two His and Cys) and catalytic residues (Tyr and Trp) in the same positions as the template. In the representative of the second group, the zinc-binding residues (two His and Asp) likewise match the template, but the second catalytic residue is not Cys. In the model, the second catalytic residue is instead Glu, which occupies the same position as Glu106 in the template. In the weblogo, this glutamic acid is also fully conserved, so experiments are needed to verify which prediction is correct. Previously, the only fully characterized endolysin was that of actinophage Mu1/6, which has amidase activity (Farkašovská et al., 2016).
According to CD-search, the CHAP domain belongs to the NlpC/P60 superfamily (Marchler-Bauer et al., 2017). In the phylogenetic tree, these endolysins are located together, and the cluster also contains endolysins with a predicted CW_7 CBD. In the part of the endolysin alignment corresponding to the section of the phylogenetic tree from Peebs to phiBT1, some similar patterns can be seen at the N-terminus.
The CHAP domain (cysteine, histidine-dependent amidohydrolase/peptidase) contains about 120 amino acids and has the unusual potential to acquire two different enzymatic activities, amidohydrolase or peptidase (Bateman and Rawlings, 2003). As a peptidase, it cleaves the bond between D-alanine and the first glycine of the pentaglycine bridge; as an amidase, it cleaves the bond between the N-acetylmuramic acid residue and L-alanine at the N-terminus of the peptide (Bateman and Rawlings, 2003; Ridgen et al., 2003). Proteins containing a CHAP domain can thus have only peptidase activity (e.g., LysK), only amidase activity (e.g., Sk1) or both enzymatic activities (e.g., LytN) (Vermassen et al., 2019). The CHAP superfamily is involved in cell wall hydrolysis (Xu et al., 2009). According to CD-search, this superfamily contains two families, the CHAP family (PF05257) and the NlpC/P60 family (PF00877). In InterPro, they also belong to the papain-like cysteine peptidase superfamily (IPR038765). Both domains contain strictly conserved cysteine and histidine residues and are about 110-140 amino acid residues long. NlpC/P60 and CHAP domains are widespread in bacteria, and members of this superfamily have also been detected in bacteriophages, viruses, archaea and eukaryotes (Anantharaman and Aravind, 2003). The NlpC/P60 domain forms multifunctional proteins with other components, such as LysM, SH3 and choline-binding domains (Smith et al., 2000). The Pfam database lists almost 300 different architectures for this domain.
Glyco_hydro_25 denotes glycoside hydrolase family 25. Glycoside hydrolases (EC 3.2.1) are a widespread group of enzymes that hydrolyze the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety (https://pfam.xfam.org/family/PF01183). In the CAZy database, this family comprises enzymes with lysozyme activity only (https://www.cazypedia.org/index.php/Glycoside_Hydrolase_Family_25). In this study, Glyco_hydro_25 was predicted for only one endolysin, encoded by phage Shyg, which according to modelling adopts a conformation similar to that of the lysozyme from Streptomyces coelicolor.
The results of this study show the diversity of Streptomyces phages and the relations of their endolysins to host specificity and origin. The in silico characterization of domains and of primary and tertiary structure extends knowledge about these endolysins, and this information can be used to develop new types of endolysins and enzybiotics by domain shuffling, deletion and target-site mutations. Adding whole domains or parts of them, e.g. amphipathic regions from endolysins encoded by different phages, can yield new chimeric endolysins with novel properties and a specificity that is either extended or species-, strain- or serovar-specific.