Chinese scientists set a new record for gene sequencing accuracy

Release date: 2018-01-02

Since AlphaGo became the world's No. 1 Go player, the potential of artificial intelligence has become widely known. For repetitive work in particular, you can write an "algorithm," let the computer "run" it, and the results may well beat what people achieve by hand.

Scientists are not only imaginative but also remarkably good at putting ideas into practice. This time, it was biologists who borrowed a way of thinking from information science and invented a new method of gene sequencing. Their paper, "Highly accurate fluorogenic DNA sequencing with information theory-based error correction," was published online in Nature Biotechnology. The researchers are from a team led by Professor Huang Yanyi of Peking University.

"This design is very clever," said Lu Zuhong, a professor at Southeast University. "It may be only a 'small trick' by the standards of information science, but it is a breakthrough in thinking for biological research, and it works."

Sequencing accuracy is "king"

Much as the sports world strives to be "faster, higher, farther," the "gold standard" in gene sequencing is "faster, longer, more accurate, and cheaper."

The famous Human Genome Project relied on first-generation sequencing technology and took more than ten years to read one complete human genome. With today's second-generation sequencing technology, the same job can be done in about half a day.

"Second-generation sequencing technology is also known as high-throughput sequencing," Lu Zuhong said. It can carry out hundreds of millions of reactions on a single biochip, "and each reaction reads one base at a time."

The reaction units on the biochip are tiny: each one, only a few square micrometers in area, holds 1,000 single-stranded copies of the DNA molecule being tested. DNA polymerase (the enzyme that adds single bases) builds a complementary strand one base at a time according to the base-pairing rules, and each addition releases fluorescence. The four bases (A, T, C, G) give off different fluorescence, so by detecting which signal appears, the instrument can tell which base was added and thereby read the DNA.
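To make the read-out step concrete, here is a minimal Python sketch of calling bases from four-color signals. It assumes, purely for illustration, that each cycle yields one intensity per dye channel and that the base is simply the brightest channel; real instruments use far more elaborate signal processing. The intensity values and function names are hypothetical.

```python
# Complementary base for each base under the pairing rules (A-T, C-G).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def call_bases(cycle_signals: list[dict[str, float]]) -> str:
    """Call one base per cycle: the dye channel with the strongest signal wins."""
    return "".join(max(signal, key=signal.get) for signal in cycle_signals)


def template_strand(synthesized: str) -> str:
    """The template pairs antiparallel with the new strand: return its reverse complement."""
    return "".join(COMPLEMENT[base] for base in reversed(synthesized))


if __name__ == "__main__":
    # Hypothetical dye intensities for three synthesis cycles (arbitrary units).
    signals = [
        {"A": 0.92, "C": 0.08, "G": 0.05, "T": 0.11},
        {"A": 0.10, "C": 0.15, "G": 0.81, "T": 0.09},
        {"A": 0.12, "C": 0.07, "G": 0.10, "T": 0.77},
    ]
    read = call_bases(signals)
    print(read, template_strand(read))  # AGT ACT
```

Picking the brightest channel is the simplest possible rule; it breaks down exactly when signals from different bases start to mix, which is the problem described next.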

However, it is hard to keep the synthesis of the 1,000 molecules in each unit in step. "When one molecule has been extended to base 99, another may already be at base 101, so the fluorescence captured from the unit becomes a mixture of different wavelengths and reliability drops significantly," Lu Zuhong said. For this reason, the read length of second-generation sequencers is currently limited to 200 base pairs (bp). Paired-end sequencing, which reads a fragment from both ends, can reach 400 bp, but going further is difficult: the longer the read, the lower the accuracy of the measured sequence.
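The loss of synchrony Lu Zuhong describes can be illustrated with a toy model. This sketch assumes, hypothetically, that in every cycle each molecule independently fails to extend with a fixed probability `lag_rate` (the value used is made up) and, for brevity, ignores molecules that run ahead. Under that assumption the expected fraction of molecules still in step after n cycles is (1 - lag_rate)^n, and a small simulation of 1,000 molecules per unit should land close to it.

```python
import random


def in_phase_fraction(cycles: int, lag_rate: float) -> float:
    """Expected fraction of molecules that never fell behind after `cycles` cycles."""
    return (1.0 - lag_rate) ** cycles


def simulate_unit(cycles: int, lag_rate: float, molecules: int = 1000, seed: int = 0) -> list[int]:
    """Return how many bases each molecule in one reaction unit has added."""
    rng = random.Random(seed)
    positions = []
    for _ in range(molecules):
        pos = 0
        for _ in range(cycles):
            # With probability lag_rate the molecule misses this cycle and falls behind.
            if rng.random() >= lag_rate:
                pos += 1
        positions.append(pos)
    return positions


if __name__ == "__main__":
    lag = 0.003  # hypothetical per-cycle lag rate
    for n in (50, 100, 200):
        pos = simulate_unit(n, lag)
        sim = sum(p == n for p in pos) / len(pos)
        print(f"cycle {n}: expected in step {in_phase_fraction(n, lag):.2f}, "
              f"simulated {sim:.2f}, positions {min(pos)}..{max(pos)}")
```

Because the in-step fraction shrinks geometrically with every cycle, the mixed signal grows as the read gets longer, which matches the article's point that longer reads are less accurate.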

In human genome sequencing, two figures stand in stark contrast: 3 billion and 200. The first is the number of base pairs in the human genome; the second is the single read length of the second-generation sequencers with the highest accuracy (99%). Sequencing the target DNA in 200-base pieces therefore inevitably introduces a large number of errors.
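The arithmetic behind "inevitably" is easy to check. This sketch treats the 99% figure as a per-base accuracy and assumes errors occur independently at each base; both are simplifications for illustration, and the function names are hypothetical.

```python
def expected_errors(read_length: int, per_base_accuracy: float) -> float:
    """Average number of miscalled bases in one read."""
    return read_length * (1.0 - per_base_accuracy)


def error_free_probability(read_length: int, per_base_accuracy: float) -> float:
    """Chance that every base in a read is correct, assuming independent errors."""
    return per_base_accuracy ** read_length


GENOME_SIZE = 3_000_000_000  # base pairs in the human genome (figure from the article)

if __name__ == "__main__":
    read_len, acc = 200, 0.99
    print(f"expected errors per read: {expected_errors(read_len, acc):.1f}")         # 2.0
    print(f"P(read is error-free):    {error_free_probability(read_len, acc):.3f}")  # 0.134
    reads_1x = GENOME_SIZE // read_len
    print(f"reads for 1x coverage:    {reads_1x:,}")                                  # 15,000,000
    print(f"expected errors at 1x:    {reads_1x * expected_errors(read_len, acc):,.0f}")  # 30,000,000
```

Under these assumptions only about one read in seven is completely error-free, and a single pass over the genome would contain tens of millions of wrong bases.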

Sequencing technology keeps advancing toward this "gold standard," and the ECC (Error Correction Coding) sequencing method published by the Chinese team corrects and complements existing methods.

"Software derivation" makes up for hardware shortcomings

Biological research has traditionally been "what you see is what you get." This time, the team brought in a method from information theory: collecting redundant information and computing an accurate conclusion from it. Lu Zuhong regards ECC sequencing as a refinement of the second-generation sequencing described above, since its basic principle is the same. What is commendable is that it breaks with the old mindset and calculates the base information instead of simply reading it off.

Take the question "Which house do A, B, and D each live in?" The previous method was to open each door and look. ECC sequencing instead obtains a set of logical clues through measurement: the red house is to the right of the blue house and to the left of the white house; the owner of one house is from Hong Kong, and that house is not on the far left; the person who loves pizza lives next door to the person who drinks mineral water; and so on. The answer is then worked out by calculation.

"Before, there was one test; now there is a group of tests. Each takes the same amount of sampling, but the sampling method differs, so a single look yields more information," Lu Zuhong said. The redundant pieces of information can be checked against one another, shifting more of the work of getting it "accurate" onto "software derivation" and compensating for unavoidable hardware shortcomings such as imperfect enzymes and signal capture.
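The idea of a "group of tests" that cross-check one another can be illustrated with a toy model. The sketch below assumes, as a simplification this article does not spell out, that each of three runs reports only which of two base groups a position belongs to, with the four bases split differently in each run. Any two runs are enough to identify the base; the third is redundant and exposes inconsistencies. The partitions, function names, and per-position readout are hypothetical; an actual ECC sequencing decoder would work on whole signal traces and be considerably more sophisticated.

```python
# Hypothetical scheme: each run splits the four bases into two groups in a
# different way and reports only which group (0 or 1) each position falls in.
PARTITIONS = (
    {"A": 0, "C": 0, "G": 1, "T": 1},  # run 1: {A, C} vs {G, T}
    {"A": 0, "G": 0, "C": 1, "T": 1},  # run 2: {A, G} vs {C, T}
    {"A": 0, "T": 0, "C": 1, "G": 1},  # run 3: {A, T} vs {C, G}
)


def measure(seq: str) -> list[list[int]]:
    """Simulate the three low-resolution readouts of the same sequence."""
    return [[part[base] for base in seq] for part in PARTITIONS]


def decode(readouts: list[list[int]]) -> str:
    """Combine the runs position by position; '?' marks a position where they disagree."""
    calls = []
    for bits in zip(*readouts):
        candidates = set("ACGT")
        for part, bit in zip(PARTITIONS, bits):
            # Keep only the bases consistent with this run's answer.
            candidates &= {base for base, group in part.items() if group == bit}
        # Any two runs already pin down one base, so an empty intersection
        # means at least one readout at this position is wrong.
        calls.append(candidates.pop() if candidates else "?")
    return "".join(calls)


if __name__ == "__main__":
    reads = measure("GATTACA")
    print(decode(reads))  # GATTACA
    reads[0][2] ^= 1      # corrupt run 1 at the third position
    print(decode(reads))  # GA?TACA
```

Like the house puzzle, no single run answers the question, but the combination does, and the redundancy turns a silent error into a detectable one. In this toy version the error is flagged rather than fixed.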

ECC encoding and decoding strategies are already widely used in fields such as communications and data storage, where they have been shown to detect and correct errors that arise during transmission or storage. The research team introduced ECC into sequencing technology for the first time and also synthesized its own low-error-rate fluorogenic substrates. Combining the two on a prototype built in their laboratory, they obtained error-free single-end reads more than 200 bases long.
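For readers unfamiliar with ECC, the Hamming(7,4) code is a textbook example of the kind used in communications and storage. It is shown here only to illustrate how added redundancy lets a receiver both detect and correct an error; it is not the code used in the sequencing study, and the function names are illustrative.

```python
def hamming74_encode(data: list[int]) -> list[int]:
    """Encode 4 data bits as a 7-bit codeword (positions 1-7: p1 p2 d1 p3 d2 d3 d4)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword: list[int]) -> tuple[list[int], int]:
    """Correct up to one flipped bit; return the data bits and the error position (0 = none)."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3  # the syndrome is the 1-based index of the bad bit
    if position:
        c[position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], position


if __name__ == "__main__":
    sent = hamming74_encode([1, 0, 1, 1])
    received = sent.copy()
    received[4] ^= 1  # simulate noise flipping bit 5
    print(hamming74_decode(received))  # ([1, 0, 1, 1], 5)
```

Three extra parity bits per four data bits are enough to locate and repair any single flipped bit, which is the same trade of extra measurements for reliability that ECC sequencing applies to DNA reads.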

"The combination of BT (biotechnology) and IT has become an industry consensus," said Jiang Hui. In early December, Google released a tool called DeepVariant, which it says uses artificial intelligence on the short fragments produced by second-generation sequencing to identify mutations in DNA sequences more accurately.

Applications still have a long way to go

"A gene sequencer is very complex, spanning electromechanics, biochemical reactions, software computing and other fields," said Jiang Hui, vice president of Intelligent Science, the only company in China that independently produces gene sequencers. Speaking from first-hand experience, she said the barrier to entry for manufacturing sequencers is very high.

The gene sequencing industry has upstream and downstream links. "Besides producing high-precision sequencers, a company must also supply effective reagent kits and complete solutions," said Jiang Hui, and it must be able to work with downstream application developers. "A sequencer is like a mobile phone. To be widely used, it must be able to run, and stay compatible with, different 'apps,' that is, application scenarios such as prenatal screening and tumor detection."

After nearly five years of continuous investment in research and development, Huada Gene (BGI) is the only domestic manufacturer able to produce clinical-grade sequencers, and its instruments are now moving from R&D to the market. "After China built its own sequencer, it has been squeezed by large international companies, for example through blockades on the supply of reagents and enzymes," Lu Zuhong said. Even when a new technology is good, there is great resistance to getting the market to "replace the old with the new," especially since foreign companies' leading position is hard to shake.

Despite this enormous resistance, China's sequencer industry, though stumbling, is still rising. Besides the prototype announced by Huang Yanyi's team, Bohai Gene of the Southern University of Science and Technology has released a third-generation gene sequencer billed as the world's most accurate.

Source: Technology Daily
