题目描述
脱氧核糖核酸(DNA\texttt{DNA}DNA)由核苷酸碱基对组成,形成双螺旋结构。通过一系列复杂的生化过程,生物体DNA\texttt{DNA}DNA中的核苷酸序列被翻译成其生命所需的蛋白质。本题的目标是编写一个计算机程序,接受一个DNA\texttt{DNA}DNA链并报告由其产生的蛋白质(如果有的话)。
背景知识
- 核苷酸碱基:腺嘌呤(A\texttt{A}A)、胞嘧啶(C\texttt{C}C)、鸟嘌呤(G\texttt{G}G)、胸腺嘧啶(T\texttt{T}T)
- 互补配对:A↔T\texttt{A} \leftrightarrow \texttt{T}A↔T,C↔G\texttt{C} \leftrightarrow \texttt{G}C↔G
- DNA\texttt{DNA}DNA链通常只列出主链,互补链可通过替换互补碱基得到
- 转录:从DNA\texttt{DNA}DNA主链产生信使RNA\texttt{RNA}RNA(mRNA\texttt{mRNA}mRNA),mRNA\texttt{mRNA}mRNA与互补链相同,但胸腺嘧啶(T\texttt{T}T)被尿嘧啶(U\texttt{U}U)取代
- 翻译:mRNA\texttt{mRNA}mRNA中的碱基序列被分为三个一组(密码子),AUG\texttt{AUG}AUG标记蛋白质序列的开始,UAA\texttt{UAA}UAA、UAG\texttt{UAG}UAG、UGA\texttt{UGA}UGA标记结束
输入与输出
输入包含多行DNA\texttt{DNA}DNA序列,每行一个,以*结束。每个给定的DNA\texttt{DNA}DNA链可能是主链或互补链,可能是正向或反向顺序。起始和终止密码子不一定出现在链的末端。
对于每个输入,输出翻译得到的蛋白质(氨基酸缩写用连字符连接)。如果无法翻译出有效蛋白质,输出*** No translatable DNA found ***。
样例输入
ATACTGTAATTCACTCC CACCTGTACACAGAGGAATACTG TTAATACGACATAATTAT GCCTTGATATGGAGAACTCATTAGATA AAGTGTATGTGAATTATATAAAACGGGCATGA ATGATGATGGCCTGA *样例输出
Ser-Ile-Lys Cys-Leu-His Ser-Tyr *** No translatable DNA found *** Leu-Asn-Tyr-Ile-Lys-Arg-Ala Met-Met-Ala题目分析
问题的本质
这是一个生物信息学中的DNA\texttt{DNA}DNA翻译问题。需要:
- 考虑DNA\texttt{DNA}DNA链的四种可能形式:主链正向、主链反向、互补链正向、互补链反向
- 将DNA\texttt{DNA}DNA转录为mRNA\texttt{mRNA}mRNA(
T→U) - 在mRNA\texttt{mRNA}mRNA中寻找起始密码子
AUG - 从起始密码子后开始读取密码子,直到遇到终止密码子
- 将密码子翻译为氨基酸缩写
四种链形式
对于给定的DNA\texttt{DNA}DNA链S,需要考虑:
S(主链正向)reverse(S)(主链反向)complement(S)(互补链正向)reverse(complement(S))(互补链反向)
翻译过程
- 将DNA\texttt{DNA}DNA中的
T替换为U得到mRNA\texttt{mRNA}mRNA - 在mRNA\texttt{mRNA}mRNA中查找
AUG - 从
AUG后的位置开始,每次读取 3 个碱基 - 如果遇到终止密码子(
UAA、UAG、UGA),停止 - 否则,将密码子翻译为氨基酸缩写
密码子表
使用map<string, string>存储密码子到氨基酸缩写的映射。
参考代码
// DNS Translation// UVa ID: 385// Verdict: Accepted// Submission Date: 2016-07-06// UVa Run Time: 0.040s//// 版权所有(C)2016,邱秋。metaphysis # yeah dot net#include<bits/stdc++.h>usingnamespacestd;// DNA 碱基互补映射map<char,char>DNA={{'A','T'},{'T','A'},{'C','G'},{'G','C'}};// 密码子到氨基酸缩写的映射map<string,string>code={{"UUU","Phe"},{"UCU","Ser"},{"UAU","Tyr"},{"UGU","Cys"},{"UUC","Phe"},{"UCC","Ser"},{"UAC","Tyr"},{"UGC","Cys"},{"UUA","Leu"},{"UCA","Ser"},{"UUG","Leu"},{"UCG","Ser"},{"UGG","Trp"},{"CUU","Leu"},{"CCU","Pro"},{"CAU","His"},{"CGU","Arg"},{"CUC","Leu"},{"CCC","Pro"},{"CAC","His"},{"CGC","Arg"},{"CUA","Leu"},{"CCA","Pro"},{"CAA","Gln"},{"CGA","Arg"},{"CUG","Leu"},{"CCG","Pro"},{"CAG","Gln"},{"CGG","Arg"},{"AUU","Ile"},{"ACU","Thr"},{"AAU","Asn"},{"AGU","Ser"},{"AUC","Ile"},{"ACC","Thr"},{"AAC","Asn"},{"AGC","Ser"},{"AUA","Ile"},{"ACA","Thr"},{"AAA","Lys"},{"AGA","Arg"},{"AUG","Met"},{"ACG","Thr"},{"AAG","Lys"},{"AGG","Arg"},{"GUU","Val"},{"GCU","Ala"},{"GAU","Asp"},{"GGU","Gly"},{"GUC","Val"},{"GCC","Ala"},{"GAC","Asp"},{"GGC","Gly"},{"GUA","Val"},{"GCA","Ala"},{"GAA","Glu"},{"GGA","Gly"},{"GUG","Val"},{"GCG","Ala"},{"GAG","Glu"},{"GGG","Gly"},};intmain(intargc,char*argv[]){ios::sync_with_stdio(false);string line;while(getline(cin,line),line!="*"){// 生成四种可能的 mRNA 序列vector<string>mRNA;mRNA.push_back(line);// 主链正向string rev=line;reverse(rev.begin(),rev.end());mRNA.push_back(rev);// 主链反向string comp;for(charc:line)comp+=DNA[c];mRNA.push_back(comp);// 互补链正向reverse(comp.begin(),comp.end());mRNA.push_back(comp);// 互补链反向// 将 DNA 转换为 mRNA(T → U)for(string&s:mRNA)for(char&c:s)if(c=='T')c='U';boolproteinProduced=false;// 尝试每种 mRNA 序列for(inti=0;i<mRNA.size();i++){string message=mRNA[i];size_t start=message.find("AUG");if(start==string::npos)continue;start+=3;// 跳过起始密码子vector<string>proteins;boolended=false;// 读取后续密码子直到遇到终止密码子while(start+2<message.length()){string block=message.substr(start,3);if(block=="UAA"||block=="UAG"||block=="UGA"){ended=true;break;}if(code.find(block)!=code.end())proteins.push_back(code[block]);start+=3;}if(ended&&!proteins.empty()){proteinProduced=true;for(size_t i=0;i<proteins.size();i++){if(i>0)cout<<"-";cout<<proteins[i];}cout<<endl;break;}}if(!proteinProduced)cout<<"*** No translatable DNA found ***"<<endl;}return0;}