DNA Storage Search Engine GeneFind Launches: Millisecond-Level Retrieval Across Trillions of Base Pairs

GeneFind, a search engine jointly developed by Microsoft Research and Illumina, targets retrieval speed, the biggest bottleneck in DNA data storage, and achieves millisecond-level random access to DNA data for the first time.

Microsoft Research released GeneFind on September 5, a DNA storage search engine jointly developed with gene sequencing company Illumina. The system solves the biggest practical bottleneck facing DNA data storage technology: how to quickly find and read target data segments among trillions of base pairs.

DNA data storage offers extraordinary density (1 gram of DNA can store 215 petabytes), but retrieval speed has been the primary obstacle to commercialization. Traditional methods require full sequencing of the entire DNA pool to locate target data, taking hours to days.

GeneFind introduces a technique called "molecular indexing." When data is written to DNA, the system appends a unique index sequence to each data block. During retrieval, GeneFind uses the CRISPR-Cas system as a "molecular search engine": a Cas protein carrying guide RNA complementary to the target index sequence locates the matching segments in the DNA pool and cuts them out, and a nanopore sequencer then reads them.
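The indexing idea can be illustrated in software. The sketch below is a toy model, not GeneFind's actual encoding: the function names, the base-4 numbering scheme, and the 20-base index length are all assumptions made for illustration. It prepends an index sequence to each block, derives the complementary sequence a guide would need, and selects matching strands from a simulated pool.

```python
# Toy model of molecular indexing. Every name and parameter here is
# hypothetical; the article does not describe GeneFind's real encoding.
BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def index_sequence(block_id: int, length: int = 20) -> str:
    """Encode a block number as a fixed-length base-4 string over A, C, G, T."""
    if not 0 <= block_id < 4 ** length:
        raise ValueError("block_id does not fit in the index length")
    digits = []
    for _ in range(length):
        block_id, remainder = divmod(block_id, 4)
        digits.append(BASES[remainder])
    return "".join(reversed(digits))


def reverse_complement(seq: str) -> str:
    """Return the strand that would base-pair with seq."""
    return seq.translate(COMPLEMENT)[::-1]


def guide_for(block_id: int, length: int = 20) -> str:
    """Sequence complementary to a block's index (DNA alphabet for simplicity;
    a real guide RNA would use U in place of T)."""
    return reverse_complement(index_sequence(block_id, length))


def write_block(block_id: int, payload: str, length: int = 20) -> str:
    """Prepend the index sequence to an already-encoded payload."""
    return index_sequence(block_id, length) + payload


def retrieve(pool: list[str], block_id: int, length: int = 20) -> list[str]:
    """Software stand-in for the cut-and-read step: keep only strands whose
    prefix matches the target index, and strip that index off."""
    target = index_sequence(block_id, length)
    return [strand[length:] for strand in pool if strand.startswith(target)]


pool = [write_block(i, payload) for i, payload in enumerate(["ACGT", "GGCC", "TTAA"])]
print(retrieve(pool, 1))
```

A practical index design would also have to avoid long single-base runs and keep indexes far apart from one another to limit off-target matches; this sketch ignores both concerns.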

In benchmark tests, GeneFind achieved an average retrieval latency of 3.2 milliseconds in a DNA storage pool containing 10 trillion base pairs, approximately 10 million times faster than traditional full-sequencing methods. Illumina's CTO said: "GeneFind transforms DNA storage from a cold archive technology into a hot storage technology."
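The "10 million times" figure can be checked against the reported latency and the "hours to days" baseline quoted earlier. The short calculation below is only an arithmetic sketch: the variable and function names are made up for illustration, and the one-hour and one-day baselines are assumed endpoints for that range, not figures from the benchmark.

```python
# Arithmetic check of the reported speedup. Names are illustrative only.
LATENCY_S = 3.2e-3          # reported average retrieval latency, in seconds
CLAIMED_SPEEDUP = 10_000_000


def implied_baseline_hours(latency_s: float, speedup: float) -> float:
    """Baseline retrieval time, in hours, implied by a given speedup factor."""
    return latency_s * speedup / 3600


def speedup_over(baseline_s: float, latency_s: float) -> float:
    """Speedup factor relative to a baseline retrieval time in seconds."""
    return baseline_s / latency_s


print(f"implied baseline: {implied_baseline_hours(LATENCY_S, CLAIMED_SPEEDUP):.1f} hours")
print(f"vs. 1 hour: {speedup_over(3600, LATENCY_S):,.0f}x")
print(f"vs. 1 day:  {speedup_over(86400, LATENCY_S):,.0f}x")
```

Under these assumptions the claimed factor corresponds to a baseline of roughly nine hours, which sits inside the "hours to days" range: a one-hour baseline gives a factor of about 1.1 million and a one-day baseline about 27 million.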

Disclaimer

Content is AI-generated. Do not use it as a basis for real decisions. Do not cite it as factual reporting.