A transformer model called GEMORNA generates mRNA sequences from scratch, boosting protein output up to 41-fold over an optimized benchmark in cells. The approach also extended to circular RNA and CAR-T cells.

The mRNA behind the COVID vaccines was not written from scratch. It was tuned. Researchers took a coding sequence, swapped in synonymous codons the ribosome reads more smoothly, tinkered with the flanking untranslated regions, and hoped the cell would make more protein for longer. It worked well enough to reach billions of arms. But the tuning was mostly rule-of-thumb, and the design space is astronomically large. A coding sequence of 1,000 nucleotides, roughly 333 codons, has more synonymous versions than there are atoms in the observable universe.
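The size of that design space is easy to check with a little arithmetic. The sketch below is not from the paper; it simply counts, under the standard genetic code, how many distinct coding sequences spell the same protein. The function names and the 333-lysine test protein are illustrative assumptions, and the figure of about 10^80 atoms in the observable universe is the usual rough estimate.

```python
import math

# Standard genetic code (NCBI translation table 1), one letter per codon,
# with codons enumerated in T, C, A, G order at each position. '*' is a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Number of codons that encode each amino acid (1 for M and W, up to 6 for L, S, R).
DEGENERACY = {}
for aa in AMINO_ACIDS:
    DEGENERACY[aa] = DEGENERACY.get(aa, 0) + 1


def count_synonymous(protein: str) -> int:
    """Exact number of distinct coding sequences that encode `protein`."""
    total = 1
    for aa in protein:
        total *= DEGENERACY[aa]
    return total


def log10_synonymous(protein: str) -> float:
    """Same count as a power of ten, which is easier to read for long proteins."""
    return sum(math.log10(DEGENERACY[aa]) for aa in protein)


# Hypothetical, deliberately conservative protein: 333 lysines, each of which
# has only two codons. Real proteins mix in residues with four or six codons.
conservative = "K" * 333
print(f"about 10^{log10_synonymous(conservative):.0f} synonymous sequences")
```

Even this stingy case, two choices at each of 333 positions, lands near 10^100, well past the atom count. A realistic protein has more options per position and a correspondingly larger space.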
A team led by scientists at Raina Biosciences, with collaborators including MIT synthetic biologist Timothy Lu, decided to hand that search to a machine. Their model, GEMORNA, borrows the transformer architecture that powers large language models and points it at RNA instead of English. It writes coding sequences and untranslated regions the way a language model writes sentences, one token at a time, having learned the statistical grammar of messages that translate well. The work appeared in Science on 6 November.
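To make "one token at a time" concrete, here is a toy sketch of autoregressive sampling. It is not GEMORNA's code or architecture: a hand-written first-order table of made-up probabilities stands in for a trained transformer, which would condition on the whole sequence so far rather than only the previous nucleotide. The names `TRANSITIONS` and `generate_utr` are hypothetical.

```python
import random

# Hypothetical next-nucleotide probabilities given the previous nucleotide.
# "^" marks the start of the sequence. All numbers are invented for illustration.
TRANSITIONS = {
    "^": {"A": 0.25, "C": 0.25, "G": 0.25, "U": 0.25},
    "A": {"A": 0.40, "C": 0.20, "G": 0.20, "U": 0.20},
    "C": {"A": 0.30, "C": 0.30, "G": 0.10, "U": 0.30},
    "G": {"A": 0.25, "C": 0.35, "G": 0.20, "U": 0.20},
    "U": {"A": 0.20, "C": 0.20, "G": 0.20, "U": 0.40},
}


def generate_utr(length: int, seed: int = 0) -> str:
    """Sample a sequence left to right, each token conditioned on the last one."""
    rng = random.Random(seed)  # seeded so the same call gives the same sequence
    tokens = []
    prev = "^"
    for _ in range(length):
        weights = TRANSITIONS[prev]
        # Draw the next token from the distribution conditioned on the context.
        nxt = rng.choices(list(weights), weights=list(weights.values()))[0]
        tokens.append(nxt)
        prev = nxt
    return "".join(tokens)


print(generate_utr(50))
```

The loop is the whole idea: predict a distribution over the next token, sample from it, append, repeat. What a trained model adds is a far better distribution, learned from sequences that are known to translate well.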
The premise is that a genetic message has structure a model can learn. The coding region and the two untranslated regions that bracket it each follow patterns tied to how efficiently a ribosome grabs the message and how long the cell keeps it around before chewing it up. GEMORNA was trained separately on those parts, then made to generate new versions optimized for expression and stability rather than copied from nature.
The numbers are the headline. When the researchers designed full-length mRNAs and measured firefly luciferase, a standard glow-in-the-dark reporter, output rose as much as 41-fold over an already-optimized industry benchmark in cultured cells. For a therapeutic payload, human erythropoietin, the hormone that tells the body to make red blood cells, GEMORNA designs pushed expression up to 15-fold higher. The model also drew up an mRNA vaccine against COVID that raised antibody levels in mice.
What makes this more than a codon calculator is its reach beyond ordinary linear mRNA. The team applied it to circular RNA, a looped form with no free ends that resists the enzymes that degrade normal messages, and got a large jump in circular EPO output. They then used GEMORNA-designed circular RNA to arm CAR-T cells, the engineered immune cells used against cancer, and reported stronger killing of tumor cells in the dish.
Most attention on mRNA drugs lands on the coding sequence, because that is what gets translated into protein. But the untranslated regions are where a lot of the control lives. The stretch before the start codon governs how readily the ribosome loads. The stretch after the stop codon carries signals that decide the message's lifespan. These regions are short, unglamorous, and hard to optimize by hand because their effects come from folded shapes and binding sites rather than a simple readout. A generative model that has seen enough examples can propose sequences a human would not think to try, which is a large part of where GEMORNA's gains seem to come from.
The practical appeal is speed. Designing a better message today often means building dozens of candidates and testing them one by one at the bench. A model that ranks or writes strong candidates up front narrows that list before anyone pipettes anything. For a field racing to extend mRNA past vaccines into protein-replacement drugs and cell therapies, shaving iterations off the design loop is worth a lot.
The caveats sit where they usually do with flashy fold-changes. The 41-fold gain in luciferase came from cultured cells and a reporter chosen because it is easy to measure, not from a person. The EPO and vaccine results in mice are more meaningful but still early, and mice are forgiving compared with the human immune system. Higher expression is not automatically better either; too much of a therapeutic protein can be as much a problem as too little, and durability, delivery, and safety all get decided outside the reach of a sequence model. The paper shows the design engine works across several formats. It does not show that any single design is ready for the clinic.
Two authors are affiliated with the company commercializing the model, which is worth keeping in mind when reading the strongest numbers. Still, the direction is hard to argue with. mRNA is a programmable medicine, and until now the programming has leaned heavily on intuition and trial and error. Turning the design of the message itself into a learned, generative problem is the kind of shift that tends to compound. The first sequences a model writes are rarely its best.