1 College of Biotechnology, Tianjin University of Science and Technology, Tianjin 300457, PR China
2 Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
3 Biodesign Center, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, PR China
4 National Center of Technology Innovation for Synthetic Biology, Tianjin 300308, PR China
5 These authors contributed equally to this work
Received 23 May 2025 | Accepted 28 Jul 2025 | Published 28 Aug 2025
Codon optimization enhances heterologous gene expression by modulating synonymous codon usage and is a critical task in genetic engineering and synthetic biology. Achieving optimal expression requires balancing multiple interdependent factors, such as host codon bias, GC content and mRNA secondary structure, which makes optimization a challenging multiobjective problem. Here, we introduce DeepCodon, a novel deep learning tool focused on preserving functionally important rare codon clusters, which are often overlooked by previous methods. With Escherichia coli as the expression host, we first trained a protein-to-CDS (coding sequence) translation model on 1.5 million natural Enterobacteriaceae sequences and then fine-tuned it on highly expressed genes. To protect these clusters, we integrated a conditional probability strategy that preserves conserved rare codons. Compared with conventional approaches, DeepCodon generates sequences that better match host preferences, achieves superior in silico metrics and maintains critical rare codons. In experimental validation with seven low-yield P450s and thirteen AI-designed G3PDHs expressed in E. coli, DeepCodon outperformed traditional methods in nine cases. These results demonstrate DeepCodon's potential as a practical solution for codon optimization.
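To make the quantities named in the abstract concrete, the following is a minimal Python sketch of two of them: GC content and the location of rare codons relative to a reference set of coding sequences. It is an illustration only and does not reproduce DeepCodon or its conditional probability strategy. The function names, the use of overall (rather than per-amino-acid) codon frequency, and the `threshold` value are all assumptions made for this example.

```python
from collections import Counter


def split_codons(cds):
    """Split a coding sequence into codons; length must be a multiple of 3."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def gc_content(cds):
    """Fraction of G and C bases in the sequence."""
    cds = cds.upper()
    return (cds.count("G") + cds.count("C")) / len(cds)


def codon_frequencies(reference_cds_list):
    """Overall frequency of each codon across a set of reference sequences."""
    counts = Counter()
    for cds in reference_cds_list:
        counts.update(split_codons(cds))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def rare_codon_positions(cds, freqs, threshold=0.1):
    """Codon indices whose reference frequency is below `threshold`.

    `threshold` is a hypothetical cutoff chosen for illustration; codons
    absent from the reference set are treated as having frequency 0.
    """
    return [
        i for i, codon in enumerate(split_codons(cds))
        if freqs.get(codon, 0.0) < threshold
    ]
```

In practice, rarity is usually judged among synonymous codons for the same amino acid and against a genome-scale host usage table; the overall-frequency version above is a simplification that keeps the example short.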