DARPA wants fast data encoding and processing of big data using molecules

DARPA has announced its Molecular Informatics program, which seeks a new paradigm for data storage, retrieval, and processing. Instead of relying on the binary digital logic of computers based on the von Neumann architecture, Molecular Informatics aims to investigate and exploit the wide range of structural characteristics and properties of molecules to encode and manipulate data.

“Chemistry offers a rich set of properties that we may be able to harness for rapid, scalable information storage and processing,” said Anne Fischer, program manager in DARPA’s Defense Sciences Office. “Millions of molecules exist, and each molecule has a unique three-dimensional atomic structure as well as variables such as shape, size, or even color. This richness provides a vast design space for exploring novel and multi-value ways to encode and process data beyond the 0s and 1s of current logic-based, digital architectures.”

Molecular storage concepts, such as those based on DNA sequences, have advanced in recent years and show promise for archiving digital data in a format that takes up extremely small physical space, Fischer said. But DNA storage doesn’t allow for rapid retrieval and processing of selected portions of the DNA-encoded data without having to first decode the molecule-based data back into an electronic digital format to use with existing information systems. The primary technical challenge posed by the Molecular Informatics program is the integration of dense storage concepts with processing of molecule-encoded information via completely new, non-binary information structures. The intent of the program is to explore such opportunities in the much broader design and encoding space of millions of molecules, which offers far more opportunity than do the four building-block molecules (As, Ts, Cs, and Gs) of DNA.
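
To give a rough sense of why a larger molecular alphabet matters, the sketch below computes the ideal information content per symbol as the base-2 logarithm of the alphabet size. It is an illustrative idealization, not part of the DARPA program description: it assumes every symbol is equally likely, perfectly distinguishable, and read without error. The function names and the one-million alphabet size (taken loosely from the "millions of molecules" mentioned above) are hypothetical.

```python
import math

def bits_per_symbol(alphabet_size: int) -> float:
    """Ideal bits carried by one symbol drawn from an alphabet of this size."""
    if alphabet_size < 2:
        raise ValueError("alphabet must contain at least two symbols")
    return math.log2(alphabet_size)

def symbols_needed(num_bytes: int, alphabet_size: int) -> int:
    """Minimum number of symbols needed to hold num_bytes under the ideal model."""
    return math.ceil(num_bytes * 8 / bits_per_symbol(alphabet_size))

# DNA: four building blocks (A, T, C, G) give 2 bits per position.
dna_bits = bits_per_symbol(4)            # 2.0

# Hypothetical alphabet of one million distinguishable molecules.
big_bits = bits_per_symbol(1_000_000)    # about 19.93

# Symbols required for 1000 bytes under each alphabet.
dna_symbols = symbols_needed(1000, 4)            # 4000
big_symbols = symbols_needed(1000, 1_000_000)    # 402
```

Under these assumptions, each symbol from the larger alphabet carries roughly ten times the information of a single DNA base. Real schemes would give up part of that gain to error correction and imperfect readout.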

The Molecular Informatics program seeks to develop new concepts and approaches for data storage and processing using molecules instead of the 1s and 0s used in current digital information systems.

To achieve its goals, the program will require a diverse, collaborative community of researchers from fields including chemistry, computer and information science, mathematics, and chemical and electrical engineering. These integrated teams will need to answer foundational questions such as: How can data be encoded in molecules? What types of data operations can molecules execute? What does “computation” mean in a molecular context? By addressing mathematical and computational problems that challenge our current capabilities, the Molecular Informatics program aims to discover and define opportunities for the use of molecules in information storage and processing.

“Fundamentally, we want to discover what it means to do ‘computing’ with a molecule in a way that takes all the bounds off of what we know, and lets us do something completely different,” Fischer said. “That’s why we absolutely need the diverse knowledge of many different fields working together to jump into this new molecular space to see what we can discover.”

The Molecular Informatics program seeks to explore this design space by developing and testing completely new approaches to store and process information with molecules. Such an undertaking requires a diverse, collaborative community of researchers from fields including chemistry, computer and information science, mathematics, and chemical and electrical engineering. These groups will come together to answer questions such as:
(1) How and what can we encode in molecules?
(2) What types of operations can molecules execute?
(3) What are the representational abstractions, mathematical or computational primitives that can describe these operations?
(4) What does ‘computation’ mean in a molecular context?
(5) What functions can be decided via molecular means and what equivalence might they have to traditional computing methods? and
(6) Can we design approaches to compute directly on and with molecular data?

By addressing a series of mathematical and computational problems with molecule-based information encoding and processing, Molecular Informatics will discover and define future opportunities for molecules in information storage and processing.

Anticipated outcomes of the program include:
(1) New approaches to represent information and execute computational operations in molecular form;
(2) Scalable strategies to extract and process information from large molecular data stores; and
(3) Molecular computing concepts that provide capabilities beyond our conventional computational architectures.

Molecular Informatics approaches must ultimately enable information processing directly on molecular data so that the advantages molecules offer (such as ultrahigh information storage densities and inherently parallel processing) can be realized. Approaches that more fully exploit the rich diversity of molecular structures and properties (e.g., complex molecular mixtures, nonnatural polymers, etc.) and offer capabilities beyond binary, digital encoding and serial, logic-based computation are of most interest. Ideas based on molecular logic gates or biomolecular computing strategies, and those that are inherently not scalable, are not within the scope of the Molecular Informatics program.

Molecular Informatics performers will validate their information encoding and processing strategies during the first program phase and develop a method to integrate their capabilities and demonstrate processing directly on molecular data in the second program phase (option period).

Proposals must provide strong technical justification for the molecular approach and clearly describe the relevant computational and/or mathematical problems that will be addressed in the effort. Proposals should also provide evidence for the compatibility of the encoding and processing concepts and detail the option period integration strategy. Design modifications during the program to address weaknesses and improve versatility are encouraged.

Proposed approaches must ultimately be scalable to encode and process large datasets; however, demonstrating scale (e.g., developing a fully automated system that operates on a given timescale for TBs worth of data) is not within the scope of the program. Rather, performers will compare their results to conventional storage and computing technologies at several points throughout the program and provide projections at the end of the program for the potential opportunities.

Encode, read and write: Performers will validate their molecular encoding concepts by demonstrating storage densities ≥10^18 bytes/mm^3 with at least 1 GB of data. Strategies must enable random access (i.e., the ability to selectively access and read a file or set of files). Proposals should clearly describe the technical approach including molecular properties and designs that will be exploited, as well as justify how the approach is inherently scalable (e.g., in terms of factors such as encoding complexity, read/write speeds, information density, etc.).
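
As a back-of-the-envelope check on what that density target implies, the sketch below works out the physical volume that 1 GB would occupy at exactly 10^18 bytes per cubic millimeter. It assumes a decimal gigabyte (10^9 bytes) and ignores any packaging, solvent, or redundancy overhead. The constant and function names are hypothetical and do not come from the program description.

```python
import math

TARGET_DENSITY_BYTES_PER_MM3 = 10**18   # density threshold stated in the text
ONE_GB_BYTES = 10**9                    # assumption: decimal gigabyte

def volume_mm3(num_bytes: int, density: int = TARGET_DENSITY_BYTES_PER_MM3) -> float:
    """Volume in cubic millimeters occupied by num_bytes at the given density."""
    return num_bytes / density

def cube_edge_um(vol_mm3: float) -> float:
    """Edge length, in micrometers, of a cube with the given volume in mm^3."""
    edge_mm = vol_mm3 ** (1.0 / 3.0)
    return edge_mm * 1000.0             # 1 mm = 1000 micrometers

vol = volume_mm3(ONE_GB_BYTES)          # 1e-09 mm^3
edge = cube_edge_um(vol)                # approximately 1 micrometer
```

In other words, at the stated density the 1 GB demonstration payload would fit in about one cubic micrometer, a cube roughly one micrometer on a side.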

Approaches that simplify synthesis requirements (e.g., use of molecular mixtures vs. sequence defined polymers) and automate processes with existing technologies (e.g., microfluidics) are highly encouraged. While long-term stability is not an explicit metric, proposals should clearly describe projected stability and storage constraints (e.g., requirements for water-free storage, specified temperature ranges, etc.). Proposals should also indicate whether the approach is write-once/read-once (i.e., reading destroys the molecular data) or write-once/read-many. Truly nondestructive techniques (not those that simply rely on many copies to preserve data integrity) are encouraged.

Process: Performers will validate their molecular processing approach against at least two distinct mathematical and/or computational problem classes. Proposals must clearly describe the problems that will be addressed and justify the selection based on projected molecular capabilities and advantages that might be realized with respect to current computational architectures. Versatile approaches that could potentially address many computational problem classes are highly encouraged. A detailed description of current computational approaches for each question, including key metrics such as accuracy, processing time, energy consumption, etc. for a notional n-member data set must be provided. Molecular approaches projected to offer advantages such as faster processing speeds because of factors such as parallel operation or a completely new computational capability are of most interest. Proposals that do not clearly describe the benefits of the molecular approach, particularly in terms of the benefits it might offer to information processing, will be viewed as non-conforming.
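
The kind of projection asked for here, processing time for a notional n-member data set, can be illustrated with a toy comparison between a fast serial processor and a slow but massively parallel one. This is only a sketch under simple assumptions: a fixed time per operation, perfect parallelism, and no encoding or readout cost. The function names, parameters, and example numbers are all hypothetical and are not drawn from the program description.

```python
def serial_time(n: int, seconds_per_op: float) -> float:
    """Time to handle n items one at a time."""
    return n * seconds_per_op

def parallel_time(n: int, width: int, seconds_per_step: float) -> float:
    """Time to handle n items when `width` items are processed per step."""
    steps = -(-n // width)              # integer ceiling of n / width
    return steps * seconds_per_step

# Hypothetical example: 10**12 items.
n = 10**12
t_serial = serial_time(n, 1e-9)                 # 1 ns per operation, one at a time
t_parallel = parallel_time(n, 10**12, 1.0)      # 1 s per step, all items at once

# Under these made-up numbers the parallel route wins by about 1000x
# even though each individual step is a billion times slower.
speedup = t_serial / t_parallel
```

The same two functions can be evaluated across a range of n to find where a parallel approach would start to pay off, which is one way to frame the comparison against conventional architectures.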

It should be noted that notions of information processing and “computation” in a molecular context may be radically different from the traditional Turing/von Neumann methods we have come to know so well. While transforming traditional computational problems into molecular form is certainly of interest and within scope, DARPA would be keenly interested in knowing whether there are representations, abstractions, etc. of molecules and molecular dynamics that offer alternative notions of information processing other than Turing/von Neumann.

Processing approaches that mimic current computational architectures (e.g., molecular electronics, molecular logic gates, etc.) are explicitly out of scope, as are approaches that propose to miniaturize chip-based components or provide evolutionary enhancements to past molecular computing efforts. DNA-based approaches are within the scope of this program, but proposers must clearly justify scalability of the proposed methods.

So-called biological computing that requires RNA, enzymes and/or proteins is out of scope.

Molecular informatics: Performers that successfully demonstrate information encoding and processing concepts will integrate their approaches during the program option period to demonstrate processing directly on molecular data. The ability to store and process molecular information without having to convert it to digital form for processing on conventional computers is critical to realizing practical molecular approaches for information processing. Performers will demonstrate processing on one of the problem classes from the first program phase to validate the integration and must fully characterize their system to establish design modifications and system improvements. Importantly, in this phase performers will develop projections for the capabilities and limitations of their approaches from a molecular encoding and processing perspective to better define the ultimate potential for molecules in information processing.