Some numbers:
Estimated number of genes in the human genome: 50,000 - 100,000
Protein-coding genes: ~20,000
Mutation rate in DNA replication: 10^-8 to 10^-10 per bp
Genes can overlap with each other.
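To get a feel for the mutation rate quoted above, here is a rough back-of-the-envelope sketch. The genome size of about 3.2 billion bp (haploid) is an assumption added here, not a figure from these notes, and the function name is only illustrative.

```python
# Expected replication errors per genome copy = rate per bp * genome size.
# Assumption (not from the notes): haploid human genome is ~3.2e9 bp.
GENOME_SIZE_BP = 3.2e9


def expected_mutations(rate_per_bp, genome_size_bp=GENOME_SIZE_BP):
    """Return the expected number of new mutations in one round of replication."""
    return rate_per_bp * genome_size_bp


# Scan the range of rates given in the notes.
for rate in (1e-8, 1e-9, 1e-10):
    print(f"rate {rate:.0e} per bp -> about {expected_mutations(rate):.2f} mutations per replication")
```

Under this assumption, the quoted range corresponds to roughly 0.3 to 32 new mutations each time the genome is copied.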
Gene structure
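Promoter: the DNA sequence that RNA polymerase specifically recognizes and binds. It belongs to neither the introns nor the exons; it is a noncoding sequence.
UTR (untranslated region): the noncoding segments at the two ends of a messenger RNA (mRNA) molecule. In the DNA sequence, the UTRs are part of the exons.
The 5’-UTR extends from the methylated guanine nucleotide cap at the start of the mRNA to the AUG start codon. The 3’-UTR extends from the stop codon at the end of the coding region to the poly-A tail.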
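The layout described above (5’-UTR, then coding region, then 3’-UTR) can be sketched on a toy sequence. This is a simplified illustration, not a real annotation method: it assumes the cap and poly-A tail are already removed, treats the first AUG as the start codon, and counts the stop codon as part of the coding region. The function name and the toy sequence are made up for the example.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def split_mrna(mrna):
    """Split an mRNA (cap and poly-A tail removed) into (5' UTR, CDS, 3' UTR).

    Simplifying assumptions: the first AUG is the start codon, and the
    stop codon is kept as the last codon of the CDS.
    """
    start = mrna.find("AUG")
    if start == -1:
        raise ValueError("no AUG start codon found")
    # Walk codon by codon, in frame with the start codon, until a stop codon.
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            end = i + 3
            return mrna[:start], mrna[start:end], mrna[end:]
    raise ValueError("no in-frame stop codon found")


# Toy mRNA: 6 nt of 5' UTR, a 4-codon coding region, 10 nt of 3' UTR.
toy = "GGCACC" + "AUGGCUUGGUAA" + "CUUGCAUAAA"
five_utr, cds, three_utr = split_mrna(toy)
print(five_utr, cds, three_utr)
```

Both UTRs are present in the mature mRNA, which is why they count as exon sequence even though they are not translated.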
Alternative splicing
Relationship between transcripts and isoforms:
They are more or less the same thing. “Isoform” usually refers to the different versions of a protein that originate from the same locus, while “transcript variant” or “alternative transcript” refers to the different versions of a transcript. The transcript-level terms apply not only to mRNAs but also to lncRNAs.
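The distinction can be pictured with a small toy model. All locus, transcript, and exon names below are hypothetical, and treating each distinct exon combination of a coding transcript as one protein isoform is a simplification (in practice, transcripts that differ only in their UTRs encode the same protein).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    name: str      # hypothetical transcript identifier
    exons: tuple   # exons kept after splicing
    coding: bool   # True for an mRNA, False for a lncRNA


# Hypothetical protein-coding locus: two mRNAs produced by alternative splicing.
gene_a = [
    Transcript("geneA-201", ("E1", "E2", "E3"), coding=True),
    Transcript("geneA-202", ("E1", "E3"), coding=True),  # exon E2 skipped
]

# Hypothetical lncRNA locus: it has transcript variants but no protein product.
lnc_b = [
    Transcript("lncB-201", ("E1", "E2"), coding=False),
    Transcript("lncB-202", ("E1",), coding=False),
]


def protein_isoforms(transcripts):
    """Distinct exon combinations among coding transcripts (toy stand-in for protein isoforms)."""
    return {t.exons for t in transcripts if t.coding}


print(len(gene_a), "transcript variants,", len(protein_isoforms(gene_a)), "protein isoforms")
print(len(lnc_b), "transcript variants,", len(protein_isoforms(lnc_b)), "protein isoforms")
```

In this sketch both loci have two transcript variants, but only the coding locus has protein isoforms.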