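Bioinformatics Practical 1

Practical 1: Sequence Retrieval

Objectives:

1. Become familiar with the resources of the three major bioinformatics centers;

2. Learn to find target sequences with the Entrez system;

3. Learn to find target sequences with the SRS system.

Contents:

(1) Browsing the three major bioinformatics centers

NCBI, EBI, DDBJ

(2) Using Entrez

Limits and Advanced Search

(3) Using SRS

Assignment: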

1. Introduce the following NCBI databases in your own words: MMDB, CDD, dbGaP, PMC, OMIM, UniGene, PubChem, RefSeq.

1) MMDB: The Structure database (Molecular Modeling Database, MMDB) holds three-dimensional macromolecular structures, which provide a wealth of information on the biological function and evolutionary history of macromolecules.

2) CDD: The Conserved Domain Database (CDD) is a protein annotation resource that consists of a collection of well-annotated multiple sequence alignment models for ancient domains and full-length proteins.

3) dbGaP: The database of Genotypes and Phenotypes (dbGaP) was developed to archive and distribute the results of studies that have investigated the interaction of genotype and phenotype.

4) PMC: PubMed Central (PMC) is a free archive of biomedical and life sciences journal literature, maintained in keeping with NLM’s legislative mandate to collect and preserve the biomedical literature.

5) OMIM: Online Mendelian Inheritance in Man (OMIM) is a comprehensive, authoritative, and timely compendium of human genes and genetic phenotypes.

6) UniGene: UniGene is a database that computationally identifies transcripts from the same locus, analyzes expression by tissue, age, and health status, and reports related proteins (ProtEST) and clone resources.

7) PubChem: PubChem is a database of chemical molecules that provides information on the biological activities of small molecules. It holds substance information, compound structures, and bioactivity data in three primary databases: PCSubstance, PCCompound, and PCBioAssay, respectively.

8) RefSeq: The Reference Sequence (RefSeq) collection aims to provide a comprehensive, integrated, non-redundant, well-annotated set of sequences, including genomic DNA, transcripts, and proteins.

2. Make a list of the molecular biology related books on the NCBI Bookshelf, specifying the book title, authors, and publisher. How about bioinformatics related books?

Molecular biology books:

1) Alberts B, Johnson A, Lewis J, et al. Molecular Biology of the Cell. 4th edition. New York: Garland Science; 2002.

2) Lodish H, Berk A, Zipursky SL, et al. Molecular Cell Biology. 4th edition. New York: W. H. Freeman; 2000.

Bioinformatics related books:

1) Madame Curie Bioscience Database [Internet]. Austin (TX): Landes Bioscience; 2000.

2) National Research Council (US) Committee on Frontiers at the Interface of Computing and Biology; Wooley JC, Lin HS, editors. Catalyzing Inquiry at the Interface of Computing and Biology. Washington (DC): National Academies Press (US); 2005.

3. Introduce the following EBI databases in your own words: ChEBI, ENA, UniProt, ArrayExpress, Ensembl, PDBe.

1) ChEBI: Chemical Entities of Biological Interest (ChEBI) is a freely available dictionary of molecular entities focused on ‘small’ chemical compounds.

2) ENA: The European Nucleotide Archive (ENA) is a database that captures and presents information relating to experimental workflows based around nucleotide sequencing.

3) UniProt: UniProt is a database that provides the scientific community with a comprehensive, high-quality, and freely accessible resource of protein sequence and functional information.

4) ArrayExpress: The ArrayExpress Archive is a database of functional genomics experiments, including gene expression experiments, where you can query and download data that follow the MIAME and MINSEQE standards.

5) Ensembl: The Ensembl project produces genome databases for vertebrates and other eukaryotic species, and makes this information freely available online.

6) PDBe: PDBe is the European resource for the collection, organisation, and dissemination of data on biological macromolecular structures.

4. Do a search for the 16S ribosomal RNA gene from Aeromonas hydrophila strain AE7.

a. Give the search details that you used to find this sequence.

Open Entrez → type the keywords ”16S ribosomal RNA gene Aeromonas hydrophila strain AE7” in the search box → the only hit is in the Nucleotide database → click it and examine the record.

b. What is the accession number?

The accession number is DQ855289.

c. How many base pairs are in this sequence?

The sequence is 992 bp long.

d. When was the entry last modified?

The entry was last modified on 21-AUG-2006.

e. Is there another organism that produces the same protein? If so, name the organism and show your evidence.

Yes. Searching the Protein database for ”16S ribosomal RNA” returns 8132 results, for example the 16S ribosomal RNA dimethyladenosine transferase (16S ribosomal RNA dimethylase) of Lactobacillus salivarius CECT 5713.
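The keyword search in 4a can also be expressed as a request to the NCBI E-utilities ESearch service. The following is a minimal sketch, not part of the original worksheet: the helper name `esearch_url` and the choice of `retmax` and JSON output are illustrative assumptions. It only builds the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Return an ESearch URL for one Entrez database and query term.

    Hypothetical helper for illustration; it performs no network access.
    """
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"


# Keywords from answer 4a, searched in the Nucleotide database ("nuccore").
term = "16S ribosomal RNA gene Aeromonas hydrophila strain AE7"
url = esearch_url("nuccore", term)
print(url)
```

Opening the printed URL in a browser, or passing it to `urllib.request.urlopen`, should return the matching record identifiers, from which the accession number, length, and modification date can then be looked up.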

5. Search for the nucleotide sequence with accession number NM_013161.

a. What organism is this sequence from?

The sequence comes from Rattus norvegicus (Norway rat)

b. What is the accession number of the protein linked to this sequence?

The accession number is NP_037293

c. What is the function of this protein?

Pancreatic lipase hydrolyzes dietary fat molecules in the digestive system, converting triglyceride substrates found in ingested oils to monoglycerides and free fatty acids.

d. Find a reference by Hjorth, et al, related to this protein. What is the PubMed ID for this article?

The PubMed ID is 8490016.

e. In your own words, briefly describe what the researchers reported in the article.

The researchers report that a structural domain (the lid) found in pancreatic lipases is absent in the guinea pig (phospho)lipase (GPL). The amino acid sequence of GPL is highly homologous to that of other known pancreatic lipases, except for a deletion in the so-called lid domain that regulates access to the active centers of other lipases. The authors propose that this deletion is directly responsible for the anomalous behavior of this enzyme, so GPL challenges the classical distinction between lipases, esterases, and phospholipases.
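A record with a known accession number, such as the two in question 5, can be retrieved directly through the E-utilities EFetch service instead of the web interface. The sketch below is one possible way to build such requests; the helper name `efetch_url` is hypothetical, and the code only constructs URLs without downloading anything.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities EFetch endpoint.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(db: str, accession: str, rettype: str) -> str:
    """Return an EFetch URL for one record as plain text.

    rettype "gb" requests the GenBank flat file, "fasta" the FASTA sequence.
    Hypothetical helper for illustration; it performs no network access.
    """
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return f"{EFETCH}?{urlencode(params)}"


# Nucleotide record from question 5 in GenBank format.
nucleotide_gb = efetch_url("nuccore", "NM_013161", "gb")
# Linked protein record from answer 5b in FASTA format.
protein_fasta = efetch_url("protein", "NP_037293", "fasta")

print(nucleotide_gb)
print(protein_fasta)
```

The same helper could be reused to save records in both FASTA and GenBank format, as question 6 asks, by calling it once per accession and format.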

6. Search for orthologous nucleotide and protein sequences in at least 5 organisms, and save all the sequences in FASTA and GenBank format for the next practical.

a. Give the titles of those sequences (the first line of the FASTA format).

b. What organisms are those orthologous sequences from?

c. What is the sequence length of each sequence?

d. When was each entry last modified?

Sequence 1, gene:
Title: >gi|360039204|ref|NM_001008215.2| Homo sapiens cytochrome C oxidase assembly factor 5 (COA5), mRNA
Organism: Homo sapiens
Length: 1767 bp
Modified: PRI 09-JUN-2012

Sequence 1, protein:
Title: >gi|56118949|ref|NP_001008216.1| cytochrome c oxidase assembly factor 5 [Homo sapiens]
Organism: Homo sapiens
Length: 74 aa
Modified: PRI 09-JUN-2012

Sequence 2, gene:
Title: >gi|303324588|ref|NM_001195024.1| Bos taurus cytochrome c oxidase assembly factor 5 (COA5), mRNA
Organism: Bos taurus
Length: 834 bp
Modified: MAM 29-APR-2012

Sequence 2, protein:
Title: >gi|303324589|ref|NP_001181953.1| cytochrome c oxidase assembly factor 5 [Bos taurus]
Organism: Bos taurus
Length: 74 aa
Modified: MAM 29-APR-2012

Sequence 3, gene:
Title: >gi|390474111|ref|XM_002757406.2| PREDICTED: Callithrix jacchus cytochrome C oxidase assembly factor 5 (COA5), mRNA
Organism: Callithrix jacchus
Length: 786 bp
Modified: PRI 08-JUN-2012

Sequence 3, protein:
Title: >gi|296223030|ref|XP_002757452.1| PREDICTED: cytochrome c oxidase assembly factor 5 [Callithrix jacchus]
Organism: Callithrix jacchus
Length: 74 aa
Modified: PRI 08-JUN-2012

Sequence 4, gene:
Title: >gi|284520949|ref|NM_001171783.1| Xenopus laevis cytochrome C oxidase assembly factor 5 (coa5), mRNA
Organism: Xenopus laevis
Length: 730 bp
Modified: VRT 26-MAY-2012

Sequence 4, protein:
Title: >gi|284520950|ref|NP_001165254.1| cytochrome c oxidase assembly factor 5 [Xenopus laevis]
Organism: Xenopus laevis
Length: 75 aa
Modified: VRT 26-MAY-2012

Sequence 5, gene:
Title: >gi|238859532|ref|NM_001161497.1| Danio rerio cytochrome C oxidase assembly factor 5 (coa5), mRNA
Organism: Danio rerio
Length: 504 bp
Modified: VRT 26-MAY-2012

Sequence 5, protein:
Title: >gi|238859533|ref|NP_001154969.1| cytochrome c oxidase assembly factor 5 [Danio rerio]
Organism: Danio rerio
Length: 75 aa
Modified: VRT 26-MAY-2012
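The FASTA titles listed for question 6 all follow the pattern `>gi|<GI number>|ref|<accession>|<description>`. As a small illustration of how such titles could be taken apart when the saved files are processed in the next practical, the sketch below splits two of the titles from the answer into their fields. The names `FastaTitle` and `parse_title` are hypothetical, and the parser assumes this exact `gi|...|ref|...|` layout.

```python
from typing import NamedTuple


class FastaTitle(NamedTuple):
    gi: str           # GI number
    accession: str    # RefSeq accession with version
    description: str  # free-text remainder of the title


def parse_title(line: str) -> FastaTitle:
    """Split a '>gi|<gi>|ref|<accession>|<description>' title into fields."""
    fields = line.lstrip(">").split("|", 4)
    if len(fields) != 5 or fields[0] != "gi" or fields[2] != "ref":
        raise ValueError(f"unexpected FASTA title: {line!r}")
    return FastaTitle(fields[1], fields[3], fields[4].strip())


# Two titles copied from the answer to question 6.
titles = [
    ">gi|360039204|ref|NM_001008215.2| Homo sapiens cytochrome C oxidase "
    "assembly factor 5 (COA5), mRNA",
    ">gi|56118949|ref|NP_001008216.1| cytochrome c oxidase assembly "
    "factor 5 [Homo sapiens]",
]

for title in titles:
    record = parse_title(title)
    # RefSeq prefixes NP_ and XP_ mark proteins; NM_ and XM_ mark mRNAs.
    kind = "protein" if record.accession[:3] in ("NP_", "XP_") else "mRNA"
    print(record.accession, kind, record.description, sep="\t")
```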
