Gene Function Annotation

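Gene function annotation builds on the gene structure annotation from the previous step: using those results, the translated protein sequences are extracted from the genome and aligned against mainstream databases to assign functions. Several databases are commonly used; the ones covered below are the RefSeq plant proteins, Swiss-Prot, and the databases bundled with InterProScan.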

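Note: for all downstream analysis, make sure your protein sequences contain no characters other than amino acid letters. For example, some software translates the final stop codon as "." or "*".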
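The check described above can be sketched in shell. This is a minimal sketch, assuming a standard FASTA layout (header lines start with ">") and a POSIX `awk`; the function names `list_nonletter_records` and `strip_stop_symbols` are hypothetical helpers, not part of any tool mentioned here.

```shell
# Hypothetical helper: print the header of every record whose sequence
# contains something other than a letter (e.g. "*" or ".").
list_nonletter_records() {
    awk '/^>/ { id = $0; next } /[^A-Za-z]/ { print id }' "$@" | sort -u
}

# Hypothetical helper: delete "*" and "." from sequence lines only;
# header lines pass through unchanged.
strip_stop_symbols() {
    awk '/^>/ { print; next } { gsub(/[*.]/, ""); print }' "$@"
}
```

For example, `strip_stop_symbols protein.fa > protein.clean.fa` writes a cleaned copy. Note that this removes every "*" or ".", including internal ones, so it is worth running `list_nonletter_records` first to see whether any records have stop symbols in the middle of the sequence.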

blastp

    # download (the source URL must be appended to the wget command)
    wget -4 -nd -np -r 1 -A '*.faa.gz'
    mkdir -p ~/db/refseq
    zcat *.gz > ~/db/refseq/plant.protein.faa
    # build index (written next to the FASTA file as ~/db/refseq/plant.*)
    ~/opt/biosoft/ncbi-blast-2.7.1+/bin/makeblastdb -in ~/db/refseq/plant.protein.faa -dbtype prot -parse_seqids -title refseq_plant -out ~/db/refseq/plant
    # search (-db must match the -out name used when building the index)
    ~/opt/biosoft/ncbi-blast-2.7.1+/bin/blastp -query protein.fa -out refseq_plant_blastp.xml -db ~/db/refseq/plant -evalue 1e-5 -outfmt 5 -num_threads 50 &

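Swiss-Prot collects the most reliable protein sequences currently available. It holds about 550,000 records in total, so the data set is relatively small.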

    # download (the source URL must be appended to the wget command)
    wget -4 -q
    gzip -d uniprot_sprot.fasta.gz
    # build index (run inside ~/db/swiss_prot so the index matches the -db path below)
    ~/opt/biosoft/ncbi-blast-2.7.1+/bin/makeblastdb -in uniprot_sprot.fasta -dbtype prot -title swiss_prot -parse_seqids
    # search
    ~/opt/biosoft/ncbi-blast-2.7.1+/bin/blastp -query protein.fa -out swiss_prot.xml -db ~/db/swiss_prot/uniprot_sprot.fasta -evalue 1e-5 -outfmt 5 -num_threads 50 &

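As for tidying up the results, plenty of people have already written scripts for this. For example, I searched for "blast xml csv" and found one right away, so I won't go into detail here.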
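To give a rough idea of what such a conversion involves, here is a minimal sketch in shell. It is not one of the scripts referred to above; `blast_xml_to_csv` is a hypothetical helper. It assumes the usual `-outfmt 5` layout in which each XML element sits on its own line, and it keeps only the first word of the query description so that commas in descriptions cannot break the CSV. A real XML parser is the safer choice for anything beyond a quick look.

```shell
# Hypothetical helper: flatten BLAST XML (-outfmt 5) into one CSV row per HSP.
# Relies on one element per line, which is an assumption about the file layout.
blast_xml_to_csv() {
    awk '
        # Return the text between an opening and a closing tag on one line.
        function val(line) {
            sub(/^[^>]*>/, "", line)
            sub(/<.*$/, "", line)
            return line
        }
        BEGIN { print "query,hit_id,evalue,bit_score" }
        /<Iteration_query-def>/ { query = val($0); sub(/ .*/, "", query) }
        /<Hit_id>/              { hit = val($0) }
        /<Hsp_bit-score>/       { bits = val($0) }
        /<Hsp_evalue>/          { evalue = val($0) }
        /<\/Hsp>/               { printf "%s,%s,%s,%s\n", query, hit, evalue, bits }
    ' "$@"
}
```

Usage would look like `blast_xml_to_csv swiss_prot.xml > swiss_prot.csv`.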

interproscan

The next tool is InterProScan. Its 9 GB download size alone gives a sense of how powerful it is: a single run produces several kinds of annotation at once.

The command is as follows:
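A minimal sketch of a typical invocation, assuming InterProScan 5 is installed and `interproscan.sh` is on `PATH`. The wrapper name `run_interproscan`, the output basename, and the choice of flags are illustrative assumptions, not necessarily the exact command used for this analysis.

```shell
# Hypothetical wrapper around interproscan.sh.
# Arguments: <protein FASTA> <output basename> [CPU count, default 8]
# -f tsv      tab-separated output
# -goterms    add GO terms (implies -iprlookup)
# -iprlookup  add InterPro entry annotation
# -pa         add pathway annotation
run_interproscan() {
    interproscan.sh \
        -i "$1" \
        -b "$2" \
        -f tsv \
        -cpu "${3:-8}" \
        -goterms -iprlookup -pa
}
```

For example, `run_interproscan protein.fa interpro_out 16` would annotate `protein.fa` with 16 CPUs and write `interpro_out.tsv`.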

Run time
