| Small
RNAs of less than 40 nucleotides (nt) in length, such
as microRNAs (miRNAs), small interfering RNAs (siRNAs),
scanRNAs (scnRNAs) and piwi-interacting RNAs (piRNAs),
constitute a large family of tiny regulatory molecules.
Among them, miRNAs, piRNAs and siRNAs have also been discovered
in mammals, where they have diverse and important functions.
Although our knowledge of mammalian
small RNAs has advanced rapidly, the primary transcripts
of most mammalian small RNAs remain to be determined.
Uncovering primary transcripts of small RNAs is very
important to our understanding of the biogenesis of
small RNAs. It facilitates (a) identifying the regulatory
regions such as transcription factor binding sites (TFBS)
and hence discovering upstream regulators in the network,
(b) detecting other sequence and structural motifs required
for small RNA processing and (c) providing essential
information for small RNA knockout (Saini et al., 2007).
The direct and reliable way to determine primary transcripts
is through wet-lab experiments such as rapid amplification
of cDNA ends (RACE). However, such experiments are expensive
and not suitable for large-scale analysis.
Genomic analysis is an efficient substitute. By combining
transcription information available from public databases
such as GenBank, genomic analysis can provide preliminary
evidence for primary transcripts. Several studies (Saini et al.,
2007; Jin et al., 2006; Rodriguez et al., 2004) have attempted
to delineate the genomic boundaries of miRNAs by large-scale
genomic analysis.
BatchGenAna was developed specifically
for large-scale genomic analysis of small RNAs and
has the following seven distinct features:
1. Provide batch mapping and annotation
for as many as 1,000 nucleotide sequences or 10,000
genomic loci of small RNAs at a time.
2. Utilize two alignment algorithms, miBLAST (Kim et al., 2005)
for sequences shorter than 40 nt and BLAT (Kent, 2002) for
longer sequences, to improve both computational efficiency
and accuracy. In our tests, miBLAST was ~30 times faster
than BLAST and ~10 times faster than WU-BLAST for sequences
shorter than 40 nt, with the same sensitivity.
3. Provide genomic features including RefSeq genes,
mRNAs, ESTs and repeat elements.
4. Produce two types of results. One is tabular results,
i.e. files in tab-delimited format, in which the genomic
loci and names of the genomic features overlapping the submitted
queries are listed separately. The other is graphical
views, in which users’ queries are denoted as green
boxes (on the plus strand) or red boxes (on the minus strand),
and ESTs, mRNAs, RefSeq genes and repeat elements are displayed
in different tracks with different colors. Users may
select either or both result types.
5. Find clusters of submitted queries automatically
according to their genomic loci, and display the whole
cluster in one genome view.
6. Provide options for fetching flanking sequences of
submitted queries and the overlapping transcripts. The
flanking sequences of the overlapping transcripts can
be extracted according to their GenBank accession numbers
in a second run.
7. Notify users via email when their jobs are completed.
The results will be kept on the server for 120 hours.
|
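As an illustration of features 2 and 5, the following minimal Python sketch shows how queries might be routed to an aligner by length and how mapped loci might be grouped into clusters. It is not BatchGenAna's implementation: the names `Locus`, `choose_aligner` and `cluster_loci` are hypothetical, the `max_gap` distance threshold is an assumed parameter (the text does not state how clusters are delimited), and sending a query of exactly 40 nt to the longer-sequence aligner is likewise an assumption.

```python
from collections import defaultdict
from typing import NamedTuple

# Length threshold taken from the text: queries shorter than 40 nt go to miBLAST.
SHORT_QUERY_MAX_NT = 40


class Locus(NamedTuple):
    """Hypothetical record for one mapped query (coordinates on one chromosome)."""
    name: str
    chrom: str
    start: int
    end: int


def choose_aligner(sequence: str) -> str:
    """Return the name of the aligner the text associates with this query length.

    Assumption: a query of exactly 40 nt is treated as a longer sequence.
    """
    return "miBLAST" if len(sequence) < SHORT_QUERY_MAX_NT else "blat"


def cluster_loci(loci, max_gap=10_000):
    """Group loci on the same chromosome into clusters.

    A locus joins the current cluster when the gap between its start and the
    furthest end seen so far in that cluster is at most max_gap (an assumed
    threshold; strand is ignored in this sketch).
    """
    by_chrom = defaultdict(list)
    for locus in loci:
        by_chrom[locus.chrom].append(locus)

    clusters = []
    for chrom in sorted(by_chrom):
        current = []
        cluster_end = 0
        for locus in sorted(by_chrom[chrom], key=lambda l: (l.start, l.end)):
            # Close the running cluster when the next locus is too far away.
            if current and locus.start - cluster_end > max_gap:
                clusters.append(current)
                current = []
            if not current:
                cluster_end = locus.end
            else:
                cluster_end = max(cluster_end, locus.end)
            current.append(locus)
        if current:
            clusters.append(current)
    return clusters
```

Each returned cluster is a list of loci that could be drawn together in a single genome view; a real service would also need to decide whether queries on opposite strands belong to the same cluster.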