trade of animals and animal products, causing significant
economic loss to the industry as a whole [4]. This
discouraging scenario urgently demands short-term but
effective tools for controlling the spread of mycobacterial
infections as quickly as possible.
In this review, a comparative genomics analysis of the
protease-coding genes predicted in the sequenced mycobacterial
genomes was performed. The aim was to generate
a useful dataset for those working on proteases. We focused
on proteases because of their critical roles in bacterial
pathogenesis [5] and the absence of a detailed comparison of
protease-coding genes across mycobacterial genomes.
Moreover, proteases are potentially good targets for
the development of novel anti-mycobacterial drugs, an
urgent need in the mycobacterial field.
So far, the entire genome sequences of M. leprae [6],
M. tuberculosis [7,8], M. bovis [9], and M. avium paratuberculosis
str. K10 [10] have been deciphered, and another
11 are at various stages of completion [11]. In a comparison of
general genome features, the most striking finding
was the extensive deletion and inactivation of genes
observed in the leprosy bacillus chromosome, a process
termed reductive evolution [12]. Since diverging from the
last common mycobacterial ancestor, the leprosy bacillus
may have lost over 2000 genes, defining the minimal gene
set for a pathogenic mycobacterium [13]. Thus, a precise
analysis of the M. leprae protease-coding genes could
provide clues about the group of essential genes necessary
for intracellular survival and disease outcome.
A comparative analysis of the protease-coding genes was
performed using the Basic Local Alignment Search Tool
(BLAST) [14]. The M. leprae, M. tuberculosis H37Rv, and
M. bovis AF2122 CDS were predicted and annotated by
the Wellcome Trust Sanger Institute [15]. The sequence of the
M. avium paratuberculosis str. K10 chromosome was
completed by the University of Minnesota [16]. A search of
the mycobacterial databases of the Pasteur
Institute (Paris, France) [17] was carried out using the keywords
protease, peptidase, and hydrolase. The search results were
compiled into databases, which were used as query templates to
identify the homologues present in the M. avium paratuberculosis
str. K10 genome using the BLASTP program in conjunction with the
BLOSUM-62 weight matrix. The results were validated with
the protein domain/motif predictors Pfam,
InterPro, and Prosite. Gene products were considered
homologs when their identity was equal to or greater than 60%.
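The 60% identity criterion described above can be illustrated with a minimal Python sketch. It assumes BLASTP results exported in a tab-separated layout whose first three columns are query ID, subject ID, and percent identity (as in the standard BLAST tabular output). The function names, the choice to keep only the best hit per query, and the example rows are illustrative assumptions; they are not the procedure or data used in this review.

```python
from typing import NamedTuple

IDENTITY_CUTOFF = 60.0  # percent identity threshold stated in the text


class Hit(NamedTuple):
    query: str
    subject: str
    identity: float


def parse_tabular(lines):
    """Parse tab-separated BLAST rows: query, subject, % identity come first."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        hits.append(Hit(fields[0], fields[1], float(fields[2])))
    return hits


def best_homologs(hits, cutoff=IDENTITY_CUTOFF):
    """For each query, keep the highest-identity hit at or above the cutoff.

    Keeping a single best hit per query is an assumption made for this sketch.
    """
    best = {}
    for hit in hits:
        if hit.identity < cutoff:
            continue
        current = best.get(hit.query)
        if current is None or hit.identity > current.identity:
            best[hit.query] = hit
    return best


# Made-up rows for illustration only (placeholder IDs, not real results).
example_rows = [
    "queryA\tsubject1\t82.5",
    "queryA\tsubject2\t61.0",
    "queryB\tsubject3\t45.2",
    "queryC\tsubject4\t60.0",
]
homologs = best_homologs(parse_tabular(example_rows))
for query, hit in sorted(homologs.items()):
    print(query, hit.subject, hit.identity)
```

In this toy input, queryB is discarded because its only hit falls below the cutoff, while queryC is retained because the criterion includes hits at exactly 60%.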
We closely examined the genome of M. leprae and
reviewed the genes annotated as proteases.
Besides the 32 protease genes originally grouped into the
functional sub-class II.B.3 (proteins, peptides and
glycopeptides), 12 additional protease genes were found. These
genes were ML0222 or ftsH, ML0691 or dacB1, ML1199
or lspA, ML1339, ML1582, ML1612 or lepB, ML1632,
ML1633, ML2278 or htpX, ML2295, ML2490 or clpB
and ML2704. Additionally, gene names were updated
according to the nomenclature used for
M. tuberculosis, as follows: ML0041 as mycP1, ML0864 as
pepB, ML0997 as hflX, ML1486 as pepN, ML1538 as mycP5,
and ML2528 as mycP3. The names of the three genes
encoding predicted HtrA-like proteases were also revised
on the basis of further bioinformatics analysis using
Merops, the peptidase database [18]. ML0176, ML1078
and ML2659, whose homologs in M. tuberculosis were
originally designated pepD, htrA and pepA, respectively,
are now more appropriately designated htrA2, htrA3
and htrA4. Indeed, recent experimental data on the
recombinant ML0176 product indicate optimal enzymatic
activity at 45–55 °C, reinforcing the idea that this
enzyme is an HtrA-like protease (M. L. Ribeiro-Guimarães,
unpublished results).
The complete set of protease genes found in each
of the mycobacterial genomes analyzed here is listed in
Table 1, where the genes are grouped according to their
function. The differences in protease genes
found across the four mycobacterial species are summarized
in Fig. 1. When compared to other gene families,
a relatively high level of preservation of protease-coding
genes was observed in the genome of M. leprae [13]. A total
of 39 genes were identified in M. leprae, 38 of them
constituting a core of conserved protease genes found
across all four species with homology ranging from 63% to
97%. A very similar set of proteases, composed of 43 genes,
was found in M. avium paratuberculosis, reinforcing
previous analyses indicating a closer phylogenetic proximity
between M. leprae and M. avium paratuberculosis than
to the M. tuberculosis complex [21]. All the genes
present in M. leprae and M. avium paratuberculosis were
shared with M. tuberculosis and M. bovis. One gene present
in M. leprae, ML2613, a probable metalloprotease, was
not found in M. avium paratuberculosis.
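The presence/absence comparison described above amounts to set operations on per-genome gene lists. The following Python sketch shows one way to compute a conserved core and the genes missing from a given genome. The function names and the placeholder gene and genome labels are hypothetical and do not reproduce the contents of Table 1 or Fig. 1.

```python
def core_genes(presence):
    """Return genes present in every genome (the conserved core)."""
    gene_sets = list(presence.values())
    return set.intersection(*gene_sets) if gene_sets else set()


def missing_from(presence, genome):
    """Return genes found in at least one other genome but absent from `genome`."""
    others = set().union(
        *(genes for name, genes in presence.items() if name != genome)
    )
    return others - presence[genome]


# Placeholder labels for illustration only (not real annotation data).
toy_presence = {
    "genome_1": {"g1", "g2", "g3"},
    "genome_2": {"g1", "g2"},
    "genome_3": {"g1", "g2", "g3", "g4"},
}

core = core_genes(toy_presence)
absent_in_2 = missing_from(toy_presence, "genome_2")
print(sorted(core))         # genes shared by all three toy genomes
print(sorted(absent_in_2))  # genes the second toy genome lacks
```

Mapping orthologous genes to a shared label before building the sets is assumed here; in practice that mapping would come from the homology search.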
Forty-nine genes were found in M. bovis, all shared
with M. tuberculosis (Fig. 1). Rv1977, a gene coding
for a putative iminopeptidase that probably acts as a
Zn-dependent enzyme with chaperone function, was found
only in M. tuberculosis. Another feature found only in the
M. tuberculosis H37Rv strain concerns the ptrB gene,
which codes for oligopeptidase B; this gene was first characterized
in Escherichia coli and is probably involved in host cell
invasion [22]. A single ptrB gene coding for a protein of
around 700 amino acids was found in M. leprae, M. avium
paratuberculosis, M. tuberculosis CDC 1555 strain and
M. bovis. In contrast, this gene is split into two open
reading frames in the genome of the M. tuberculosis H37Rv
strain, Rv0781 and Rv0782, designated ptrBa and
ptrBb, respectively. In fact, this was the only difference
found between the M. tuberculosis H37Rv and M. tuberculosis
CDC 1555 strains in the context of protease-coding genes.
The protease genes missing in at least one of the
mycobacterial genomes analyzed probably play redundant
roles or are responsible for differences in the pathogenesis
between the species. One example is the family of mycosin
(myc) genes. Five myc genes are found in M. tuberculosis
with identity ranging from 40% to 47%, suggesting that
they probably arose through gene duplication. Individual
Please cite this article as: Ribeiro-Guimarães ML, Pessolani MCV. Comparative genomics of mycobacterial proteases. Microb Pathog (2007),
doi:10.1016/j.micpath.2007.05.010