So You've Sequenced Everything. Now What?
Systematically mapping gene function in non-model bacteria
TL;DR (bioRxiv)
Sequencing has scaled far faster than our ability to experimentally establish gene function, which is needed to understand and engineer biology
We built JERBOA, a toolkit of 3,888 transposon variants, to systematically identify genome-wide genetic tools that work in new organisms
JERBOA enabled transposon mutagenesis in 43 bacteria across twelve phyla. We demonstrated its utility in Comamonas testosteroni, a PET plastic and lignin waste stream degrader, producing the first genome-wide genetic map of its aromatic metabolism and validating key transport systems, catabolic operons, and gene regulation
The Gap Between Reading Genomes and Understanding Them
Over the last decade, biologists have learned how to read genomes at extraordinary scale. Microbial sequencing is now routine, fast, and cheap. Entire genomes can be generated in days, annotated electronically, and deposited into public repositories.
Public databases now contain hundreds of thousands of microbial genome sequences. Yet, we still don’t know what most of those genes actually do.
Yes, we can assign function to many genes computationally. Sequence homology, conserved domains, and protein structure yield predictions that are useful as informed guesses. But when you are trying to understand a mechanism or engineer a strain, you need to know what a gene does with more certainty.
The best way to figure out what a gene does is to disrupt it and see if something breaks or gets better.
For a small number of model organisms, this kind of experimental validation is routine. For more than 99.9% of the microbial world, we simply don’t have the tools to do this.
Why Genome-Wide Genetics Breaks Down
For more than thirty years, transposon mutagenesis has been the backbone of genome-wide functional genetics. Transposons are mobile genetic elements that randomly insert themselves across the genome. Since a bacterial genome is packed with genes, transposons usually insert into genes, inactivating them and creating what we call a knockout. You can use a functional transposon to generate large collections of mutant strains that collectively represent a genome-wide knockout library. This library can then be screened under specific conditions: classically, to isolate mutants that grow well under a given condition, thereby implicating a gene’s function; or, in the modern era of DNA sequencing, to comprehensively track the abundance of all mutants across the genome, where changes in abundance reveal which genes help or hurt the organism under that condition.
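As a rough illustration of the sequencing-based readout, the sketch below sums read counts per gene and computes a log2 abundance ratio between a test condition and a control. The counts, gene names, insertion labels, and the pseudocount are invented for illustration; real analyses normalize and aggregate more carefully.

```python
import math
from collections import defaultdict

def gene_log2_ratios(counts_control, counts_test, insertion_to_gene, pseudocount=1.0):
    """Return log2(test / control) relative abundance for each gene.

    Reads from all insertions in a gene are summed, a pseudocount is added
    to avoid dividing by zero, and each sample is scaled to its own total.
    """
    ctrl = defaultdict(float)
    test = defaultdict(float)
    for insertion, gene in insertion_to_gene.items():
        ctrl[gene] += counts_control.get(insertion, 0)
        test[gene] += counts_test.get(insertion, 0)

    n_genes = len(ctrl)
    ctrl_total = sum(ctrl.values()) + pseudocount * n_genes
    test_total = sum(test.values()) + pseudocount * n_genes

    ratios = {}
    for gene in ctrl:
        ctrl_freq = (ctrl[gene] + pseudocount) / ctrl_total
        test_freq = (test[gene] + pseudocount) / test_total
        ratios[gene] = math.log2(test_freq / ctrl_freq)
    return ratios

# Hypothetical toy data: four insertions in three genes.
insertion_to_gene = {"ins1": "geneA", "ins2": "geneA", "ins3": "geneB", "ins4": "geneC"}
counts_control = {"ins1": 100, "ins2": 100, "ins3": 200, "ins4": 200}
counts_test = {"ins1": 5, "ins2": 5, "ins3": 200, "ins4": 390}

ratios = gene_log2_ratios(counts_control, counts_test, insertion_to_gene)
for gene, value in sorted(ratios.items()):
    print(f"{gene}: {value:+.2f}")
```

In this toy data, mutants in geneA drop out under the test condition (negative ratio, so the gene helps), geneB is unchanged, and mutants in geneC expand (positive ratio, so the gene hurts).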
In model organisms like E. coli, this approach works reliably. Outside a small set of well-studied microbes, however, existing transposon systems usually fail.
Transposon activity is not governed by a single variable. Productive mutagenesis requires a narrow balance across multiple host-dependent parameters. The transposase must be expressed strongly enough to drive insertion but not so strongly that it becomes toxic. The selectable marker must function in the host’s physiology. The regulatory sequences controlling both must be recognized by the host’s transcriptional machinery. Promoters that work well in one bacterium can be silent or disruptive in another.
We do not have a general model that predicts how these parameters interact across diverse species. Establishing genome-wide genetics in a new organism remains an empirical exercise. Researchers test a handful of constructs, iterate slowly, and often abandon the effort altogether. When this process fails, genomes remain electronically annotated but experimentally unverified, limiting our ability to move from sequence to mechanism.
Treating Genetics as a Search Problem
Rather than trying to design the right transposon in advance, we reframed the challenge: finding a functional transposon system is a search problem across a large combinatorial space. Build many variants, screen them in parallel, and let the data reveal what works.
We built JERBOA (Jumping Element libRaries with Broad hOst rAnge), a toolkit comprising 3,888 transposon variants. These variants span two transposases, six antibiotic resistance markers, and eighteen promoters drawn from mobile genetic elements across diverse bacterial lineages. The promoters were selected to span a range of expression strengths rather than a single presumed optimum, and to reflect regulatory sequences that have already functioned across multiple hosts. All of these variants were tagged with a unique DNA barcode.
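The library size can be sanity-checked with a little arithmetic. Two transposases, six markers, and eighteen promoters give 2 × 6 × 18 = 216 combinations if each construct carries a single promoter choice, and 3,888 is exactly 216 × 18. One layout consistent with that number is two promoter positions (for example, one driving the transposase and one driving the marker), each drawn independently from the eighteen. That layout is an assumption made for this illustration, not something stated above, and the part names are placeholders.

```python
from itertools import product

# Placeholder part names; the real parts are not listed in this post.
transposases = [f"tnp{i}" for i in range(1, 3)]    # 2 transposases
markers = [f"marker{i}" for i in range(1, 7)]      # 6 resistance markers
promoters = [f"P{i}" for i in range(1, 19)]        # 18 promoters

# ASSUMPTION: two independent promoter positions per construct,
# one for the transposase and one for the marker.
variants = list(product(transposases, promoters, markers, promoters))

print(len(transposases) * len(markers) * len(promoters))  # 216
print(len(variants))                                      # 3888
```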
All barcoded variants were delivered together in a single pooled experiment. Sequencing then revealed which specific combinations successfully generated genomic insertions in each organism.
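In spirit, decoding the pooled experiment is a counting exercise: each sequenced insertion carries a barcode, the barcode maps back to a variant, and variants can be ranked by how many distinct insertion sites they produced in a given organism. The sketch below assumes a hypothetical barcode-to-variant lookup table and a list of (barcode, genomic position) observations. It is not the analysis code released with the preprint.

```python
from collections import defaultdict

def rank_variants(observations, barcode_to_variant, min_sites=1):
    """Rank variants by the number of distinct insertion positions observed."""
    sites = defaultdict(set)
    for barcode, position in observations:
        variant = barcode_to_variant.get(barcode)
        if variant is None:
            continue  # unrecognized barcode, e.g. a sequencing error
        sites[variant].add(position)  # a set, so duplicate reads count once
    counted = [(v, len(p)) for v, p in sites.items() if len(p) >= min_sites]
    # Most insertion sites first; ties broken by variant name for stable output.
    return sorted(counted, key=lambda item: (-item[1], item[0]))

# Hypothetical lookup table and reads for one organism.
barcode_to_variant = {
    "ACGT": ("tnpA", "P3", "markerK"),
    "TTGA": ("tnpB", "P7", "markerG"),
    "GGCC": ("tnpA", "P12", "markerK"),
}
observations = [
    ("ACGT", 1042), ("ACGT", 88310), ("ACGT", 1042),  # duplicate site read twice
    ("TTGA", 5500),
    ("NNNN", 12),                                     # unknown barcode, ignored
]

ranked = rank_variants(observations, barcode_to_variant)
for variant, n_sites in ranked:
    print(variant, n_sites)
```

Counting distinct positions rather than raw reads is a deliberate choice in this sketch: it keeps a single heavily amplified insertion from making a variant look more productive than it is.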
Instead of guessing what should work, we let the data tell us what actually does.
What Worked Across Microbial Diversity
We applied this screen to 92 non-model bacteria spanning twelve phyla. Functional transposon insertions were observed in 43 of them, including seven species with no previously reported genetic tools of any kind.
Intriguing patterns emerged. Closely related organisms often favored similar promoter and transposase combinations. In several cases, variants that performed well in one member of a genus or family also performed well in others, suggesting that genetic access may scale by lineage more often than previously assumed. This effect was not universal, but it was reproducible enough to be practically useful.
The result was not a universal transposon. It was a repeatable way to discover functional genome-wide genetics in new organisms without months of bespoke optimization. By making these measurements across a wide range of microbes, we are starting to build a model for how gene expression and transposon mutagenesis behave across diverse species.
When Experimental Causality Becomes Possible
We showcased this toolkit on Comamonas testosteroni KF-1, a bacterium involved in PET plastic and lignin degradation.
Using the top-performing transposon identified with JERBOA, we generated the first genome-wide mutant library in this organism. The library contained 73,195 unique insertion sites and covered 92% of annotated genes.
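A coverage figure like this is the fraction of annotated genes with at least one insertion inside their coordinates. Here is a minimal sketch of that calculation, using invented gene coordinates and insertion positions and ignoring refinements such as discounting insertions near the ends of genes.

```python
from bisect import bisect_left

def gene_coverage(genes, insertion_positions):
    """Return (fraction of genes hit, set of hit gene names).

    genes maps a gene name to inclusive (start, end) coordinates.
    """
    if not genes:
        return 0.0, set()
    positions = sorted(set(insertion_positions))
    hit = set()
    for name, (start, end) in genes.items():
        i = bisect_left(positions, start)  # first insertion at or after the gene start
        if i < len(positions) and positions[i] <= end:
            hit.add(name)
    return len(hit) / len(genes), hit

# Hypothetical annotation and insertion sites.
genes = {
    "gene1": (100, 900),
    "gene2": (1000, 1500),
    "gene3": (1600, 2400),
    "gene4": (2500, 3000),
}
insertions = [150, 450, 1200, 2450, 2999]

fraction, hit = gene_coverage(genes, insertions)
print(f"{fraction:.0%} of genes have at least one insertion")  # 75%
```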
We then grew this pooled mutant library on breakdown products of either PET plastic (terephthalate) or lignin (4-hydroxybenzoate), and compared it with growth on a control carbon source. By tracking these 73,195 transposon insertions across the genome, we were able to:
Experimentally validate the tctA/tctB transport system and tph operon for terephthalate utilization
Distinguish the 4,5-meta cleavage pathway for protocatechuate metabolism from alternative annotated pathways
Identify additional genes whose roles were misassigned or missed entirely by electronic annotation alone
In several cases, disruptions to regulatory genes improved growth on plastic-derived substrates, revealing biology that would not have been apparent from computational tools (sequence- or structure-based prediction) alone.
What Comes After Sequencing
The point is not that we made one bacterium tractable. The point is that genome-scale genetics in non-model bacteria no longer requires years of bespoke optimization. When finding a working transposon becomes a systematic search, understanding what a gene does can finally start scaling with sequencing.
At Cultivarium, we want to move from artisanal methods to systematic frameworks, putting us on the path to reducing the time and cost of studying and engineering biology. Our goal is to harness the genetic potential of the biosphere by turning non-model organisms into tractable laboratory systems.
The preprint is available on bioRxiv. Plasmids are available via Addgene (and soon as a kit). Code for transposon sequencing analysis is available on GitHub.
If you are working with a microbe that has no tools but too much potential, we can help you! Reach out via partnerships@cultivarium.org



