Database of Standard Transcript Sequence (dbSTSeq) of human and other model species

dbSTSeq, short for database of Standard Transcript Sequence, aims to build a collection of revised transcript sequences based on the officially released genomic DNA sequences of human and other model species. The transcript sequences are collected from cDNA, mRNA and EST sequences in the public domain. Polymorphisms, sequencing errors and vector contamination in the transcript sequences are "masked" after the transcripts are mapped to a unique set of genomic DNA sequences. A program named EIparser was developed specifically for this purpose. EIparser is also used to determine the gene structure of each record in detail, which advances its gene annotation. On this basis, dbSTSeq is intended to provide a high-quality "standard" reference database for human and other model species, suitable for large-scale analysis of gene transcription and for the detection of regulatory elements.
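As a rough illustration of the masking idea, the sketch below compares a transcript with the genomic sequence it maps to and revises every mismatching base to the genomic base, written in lowercase so the revised positions stay visible. This is not EIparser's algorithm, which is not described here. The function name `mask_mismatches`, the gap-free equal-length alignment and the lowercase convention are all assumptions made for illustration only.

```python
def mask_mismatches(transcript: str, genomic: str):
    """Revise a transcript against its aligned genomic sequence.

    Hypothetical helper: assumes the two sequences are already aligned,
    have the same length and contain no gaps.
    Returns the revised sequence and the 0-based positions that were changed.
    """
    if len(transcript) != len(genomic):
        raise ValueError("sequences must be aligned to the same length")

    revised = []
    positions = []
    for i, (t, g) in enumerate(zip(transcript.upper(), genomic.upper())):
        if t == g:
            revised.append(t)
        else:
            # Mismatch: take the genomic base, lowercased to flag the revision.
            revised.append(g.lower())
            positions.append(i)
    return "".join(revised), positions


revised, positions = mask_mismatches("ACGTTA", "ACGATA")
print(revised, positions)
```

Real transcript-to-genome alignments contain introns and gaps, so a practical version would work on alignment blocks rather than on two equal-length strings.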
1. Source of data for analysis
(1) Human genome database (release date: 20050420)
URL: ftp://ftp.ncbi.nih.gov/blast/db/FASTA/human_genomic.gz
(2) Human RefSeq database (release date: 20050418, containing 29176 records)
URL: ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/mRNA_Prot/
2. dbSTSeq for human genes based on RefSeq
(1) Of the 29176 records in the RefSeq database, 29156 could be mapped to the genome sequence. The remaining 20 records could not be assigned a chromosomal location.
(2) Three programs, EIparser, Blat and Sim4, were used to evaluate the quality of each RefSeq record against its corresponding genomic DNA region. The results for each program and their cross-comparison are summarized in the following tables.
Program     Perfect match   Revisable match   Unrevisable match
            (Type I)        (Type II)         (Type III)
EIparser    15663           11758             1735
Blat        15500           13270             386
Sim4        14292           13795             1069
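The per-program counts can be checked against the 29156 mapped records and turned into proportions. The following is a minimal sketch using only the numbers reported in the table. The names `counts` and `type_fractions` are illustrative and not part of dbSTSeq or EIparser.

```python
MAPPED_RECORDS = 29156  # RefSeq records that could be mapped to the genome

# Counts per program and match type, copied from the table.
counts = {
    "EIparser": {"I": 15663, "II": 11758, "III": 1735},
    "Blat": {"I": 15500, "II": 13270, "III": 386},
    "Sim4": {"I": 14292, "II": 13795, "III": 1069},
}


def type_fractions(by_type, total=MAPPED_RECORDS):
    """Return the share of records in each match type for one program."""
    if sum(by_type.values()) != total:
        raise ValueError("type counts do not add up to the mapped records")
    return {match_type: n / total for match_type, n in by_type.items()}


for program, by_type in counts.items():
    shares = type_fractions(by_type)
    summary = ", ".join(f"Type {t}: {share:.1%}" for t, share in shares.items())
    print(f"{program}: {summary}")
```

Each program classifies every mapped record into exactly one type, so the three counts in a row add up to 29156.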


Cross comparison based on three programs

                                        Perfect match (Type I)        Revisable match (Type II)     Unrevisable match (Type III)
                                        EIparser  Blat      Sim4      EIparser  Blat      Sim4      EIparser  Blat      Sim4
Perfect match (Type I)        EIparser  15663     15434     14250     -         224       1174      -         5         239
                              Blat      15434     15500     14153     45        -         1114      21        -         233
                              Sim4      14250     14153     14292     33        136       -         9         3         -
Revisable match (Type II)     EIparser  -         45        33        11758     11669     11329     -         44        396
                              Blat      224       -         136       11669     13270     12550     1377      -         584
                              Sim4      1174      1114      -         11329     12550     13795     1292      131       -
Unrevisable match (Type III)  EIparser  -         21        9         -         1377      1292      1735      337       434
                              Blat      5         -         3         44        -         131       337       386       252
                              Sim4      239       233       -         396       584       -         434       252       1069
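The cross-comparison can be read as three pairwise 3 x 3 tables, one per pair of programs. The sketch below re-enters those counts from the table, checks that each pairwise table adds up to the 29156 mapped records, and reports how often two programs assign the same type. The names `pairs`, `marginals` and `agreement` are illustrative assumptions, not part of any of the three tools.

```python
MAPPED_RECORDS = 29156

# pairs[(A, B)][i][j] = records that A places in type i and B in type j,
# with types ordered I, II, III. Values are copied from the table.
pairs = {
    ("EIparser", "Blat"): [[15434, 224, 5], [45, 11669, 44], [21, 1377, 337]],
    ("EIparser", "Sim4"): [[14250, 1174, 239], [33, 11329, 396], [9, 1292, 434]],
    ("Blat", "Sim4"): [[14153, 1114, 233], [136, 12550, 584], [3, 131, 252]],
}


def marginals(matrix):
    """Return (row sums, column sums): the per-type totals of each program."""
    rows = [sum(row) for row in matrix]
    cols = [sum(col) for col in zip(*matrix)]
    return rows, cols


def agreement(matrix):
    """Number of records that both programs place in the same type."""
    return sum(matrix[i][i] for i in range(len(matrix)))


for (first, second), matrix in pairs.items():
    rows, _ = marginals(matrix)
    if sum(rows) != MAPPED_RECORDS:
        raise ValueError(f"{first} vs {second} does not cover all mapped records")
    same = agreement(matrix)
    print(f"{first} vs {second}: {same} records agree ({same / MAPPED_RECORDS:.1%})")
```

The row and column sums of each pairwise table reproduce the per-program totals, which is a quick way to confirm the table was transcribed consistently.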


Reference

LI Yu-Jian, LI Zhi-Feng, ZHANG Cheng-Gang. EIparser: An Efficient Tool for Parsing Exon/Intron Structure of Spliced Genes. (submitted)

Contact:
For questions about the EIparser program, please contact Dr. Yujian Li and Dr. Chenggang Zhang.
For questions about dbSTSeq, please contact Ms. Zhifeng Li and Dr. Chenggang Zhang.

BMSCC, BioMedicine Super-Computing Center
Copyright 2005 Xiaolei Wang and Dongsheng Zhao All Rights Reserved