Unrestricted Error-Type Codebook Generation for Error Correction Code in DNA Storage Inspired by NLP

Yi Lu, Yun Ma, Chenghao Li, Xin Zhang, Guangxiang Si
Computer Science, Information Theory, Information Theory (cs.IT)
2024-01-29 00:00:00
Recently, DNA storage has emerged as a promising alternative for data storage, offering notable benefits in storage capacity, maintenance cost, and parallel replication. Mathematically, the DNA storage process can be modeled as an insertion, deletion, and substitution (IDS) channel. Because of the mathematical complexity of the Levenshtein distance, constructing a code that corrects IDS errors remains a challenging task. In this paper, we propose a bottom-up generation approach that grows the required codebook by computing an Edit Computational Graph (ECG); it differs from algebraic constructions by incorporating a Derivative-Free Optimization (DFO) method. In particular, the approach does not depend on the type of errors. Compared with prior work on 1-substitution-1-deletion and 2-deletion codes, the redundancy is reduced by about 30 bits and 60 bits, respectively. To the best of our knowledge, our method is the first IDS-correcting code designed using classical Natural Language Processing (NLP) techniques, marking a turning point in the field of error correction code research. The codebook generated by our method may also enable significant breakthroughs in the complexity of encoding and decoding algorithms.
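To make the role of the Levenshtein distance in the IDS channel concrete, the following is a minimal sketch, not the ECG/DFO procedure from the paper. It computes the Levenshtein distance with the standard dynamic program and checks the minimum pairwise distance of a small codebook. The function names and the toy codebook are assumptions chosen for illustration. Because the Levenshtein distance is a metric, a codebook with minimum distance d lets nearest-codeword decoding correct up to floor((d - 1) / 2) edits of any type.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def min_pairwise_distance(codebook: list[str]) -> int:
    """Smallest Levenshtein distance between any two distinct codewords."""
    return min(levenshtein(x, y) for x, y in combinations(codebook, 2))


def correctable_edits(codebook: list[str]) -> int:
    """Number of IDS edits that nearest-codeword decoding is guaranteed to correct."""
    return (min_pairwise_distance(codebook) - 1) // 2


if __name__ == "__main__":
    # Hypothetical toy codebook over the DNA alphabet {A, C, G, T}.
    toy_codebook = ["AAAA", "CCCC", "GGGG", "TTTT"]
    print(min_pairwise_distance(toy_codebook))
    print(correctable_edits(toy_codebook))
```

The pairwise check costs quadratic time in the codebook size on top of the quadratic dynamic program per pair, which hints at why growing large IDS-correcting codebooks is computationally demanding.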