Lectures on bioinformatics: data analysis, neural networks, and their application in biology and medicine

Almost a year ago, in the summer of 2017, the Institute of Bioinformatics held its traditional summer school at MIPT. The main theme of that year's school was data mining. Why? The amount of data generated in biology and medicine is growing at an incredible rate. At this volume it is physically impossible to find previously unknown patterns by hand (and classical algorithms struggle too), so researchers have to rely on statistics and supplement natural intelligence with artificial intelligence.

This is exactly what the summer school participants worked on. This post collects 22 lecture videos, with slides and descriptions, for everyone interested in data analysis in bioinformatics. Lectures that can be watched without additional preparation are marked with an asterisk "*" (half of them).

1*. Introduction to Bioinformatics (Alexander Predeus, Institute of Bioinformatics)

Video | Slides

The lecture covered the main areas where bioinformaticians work in science and industry, the specifics of bioinformatics, and the reasons for its current popularity.

2*. Introduction to machine learning (Grigory Sapunov, Intento)

Video | Slides

The constantly growing volume of data calls for increasingly sophisticated methods of processing, searching, and extracting information. One way to tackle such problems is artificial intelligence. This lecture gives a brief introduction to the basics of machine learning. Grigory covered the general terminology of the field and the types of problems machine learning solves. The lecture also introduces the main stages of a machine learning workflow, the types of models, and model quality metrics.

3*. Introduction to Deep Learning (Grigory Sapunov, Intento)

Video | Slides

Deep learning is currently gaining popularity because it makes it possible to learn a solution from data instead of hand-coding a specific algorithm for each problem. The growing computing power of processors has also driven the development of these methods. The lecture covers the basics of neural networks: their types (fully connected networks, autoencoders, convolutional and recurrent networks) and the problems they solve. Grigory also described the current state of the field and its trends.

4*. Introduction to oncogenomics and analysis of omics data in oncology (Mikhail Pyatnitsky, V. N. Orekhovich Research Institute of Biomedical Chemistry)

Video | Slides

Sequencing of the human genome, studies of human genetic variation, sequencing of the human metagenome, transcriptome analysis of human tissues: all of these biological methods, applied at "big data" scale, have given scientists a wealth of valuable information about what distinguishes humans from other animals. This lecture is devoted to omics and their practical use. Mikhail also discussed how these data are used in oncology.

5. Multiomics in Biology: Technology Integration (Konstantin Okonechnikov, German Cancer Research Center)

Video | Slides

The rapid development of experimental technologies in molecular biology, such as sequencing, has made it possible to study a wide range of functional processes in cells, organs, or even the whole organism together. The lecture describes how to correctly integrate massive experimental datasets from genomics, transcriptomics, and epigenomics to establish links between the components of the underlying biological processes. Illustrative examples of multi-omics are drawn from the highly active field of cancer research, with a focus on pediatric oncology.

6. Quantitative Genetics: History and Prospects (Yuri Aulchenko, Laboratory of Theoretical and Applied Functional Genomics, FEN, NSU, Group of Methods of Genetic Analysis, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences)

Video | Slides

Quantitative genetics is an exact science built on a small number of key observations and basic models that allow it to describe natural (micro)evolutionary phenomena quantitatively and to predict the outcomes of genetic experiments. It relies on a powerful mathematical apparatus, and many modern statistical methods were originally developed to solve problems in quantitative genetics. The breakthrough in molecular biology technologies over the past decade has made it possible to characterize hundreds of thousands of living organisms by millions of genomic and other omics parameters, and the number of experiments conducted and the amount of data accumulated are enormous. A pressing task for modern quantitative genetics is to develop models that describe the inheritance of multilevel, high-dimensional phenotypes. In his lecture, Yuri gave a brief overview of the history of quantitative genetics and the problems this science faces.

7*. Sequencing technologies (Cyril Grigoriev, Caribbean Genome Center, University of Puerto Rico)

Video | Slides

The evolution of sequencing is inextricably linked to the evolution of technological capabilities. The lecture traces the history of sequencing technologies from Sanger sequencing to the present day. Cyril also discussed the advantages and disadvantages of each currently available method, the nature of the data each one produces, and their applications in various fields.

8. Transcriptomics: practical methods and algorithms (Alexander Predeus, Institute of Bioinformatics)

Video | Slides

Transcriptomics has firmly established itself among the most popular tasks in NGS bioinformatics. Differential gene expression analysis, clustering of expression data, and interpretation of the results in terms of metabolic and signaling pathways provide a wealth of information about virtually any system. The lecture discusses best-practice pipelines, the main pitfalls in experimental design and data processing, and practical cases where transcriptomic approaches were applied successfully.

9. Analysis of NGS data in medical genetics: calling, annotation and interpretation of genetic variants (Yuri Barbitov, St. Petersburg State University, Alexander Predeus, Institute of Bioinformatics)

Video | Slides

Next-generation sequencing (NGS) has long since moved beyond academic research and is now successfully used in many other areas, including health care. The lecture covers key aspects of analyzing NGS data in medical genetics. Yuri walked through the entire path from raw reads to a diagnosis, highlighting the difficulties in calling, annotating, and interpreting genetic variants. He also discussed common mistakes made at each stage of data processing. The lecture concludes with a brief review of promising research directions that could improve the accuracy of diagnosis based on high-throughput sequencing.

10. Practical application of ChIP-Seq and related methods (Alexander Predeus, Institute of Bioinformatics)

Video | Slides

ChIP-Seq and "genomic footprinting" methods (ATAC-Seq, FAIRE-Seq, DNase-Seq) are widely used to uncover the mechanisms that regulate biological processes, transcriptional regulation in particular. The space of potential factors to study is highly multidimensional, but a selective approach can yield rich information about regulation in a system from just a few experiments. Using conflicting modern theories as examples, Alexander showed the main difficulties in interpreting regulatory information and how to reconcile the results obtained.

11*. What can be done with iScan data (Tatiana Tatarinova, University of La Verne)

Video | Slides

Illumina manufactures a wide range of instruments for various needs. Genotyping on microarrays (chips) makes it possible to quickly detect single nucleotide polymorphisms (SNPs) across a large number of samples. The lecture reviews these chips, read on the iScan system, and their application in clinical diagnostics.

12. Deep learning in computational biology (Dmitry Fishman, University of Tartu)

Video | Slides

Deep learning is used not only to improve machine translation and speech recognition; it also helps solve many problems in computational biology. The lecture covers the application of deep learning methods to specific biological examples. Dmitry talked about what deep learning has brought to biology and medicine, and whether we can say that machines are revolutionizing these fields.

13*. Using machine learning methods to search for potentially pathogenic mutations in the human genome (Anna Ershova, MIPT, Research Institute of Physico-Chemical Biology, Lomonosov Moscow State University, N. F. Gamaleya State Research Center for Epidemiology and Microbiology)

Video | Slides

The search for pathogenic mutations became relevant once the human genome was sequenced. However, solving this problem manually is simply impossible at that scale. The lecture explains how machine learning can help with this task.

14*. Immunoinformatics (Vadim Nazarov, HSE, IBC RAS)

Video | Slides

Machine learning has been used actively for quite some time in many areas of life, but it has only recently found its way into immunology. In this lecture, Vadim presented several examples of machine learning and deep learning in immunology, including predicting MHC-peptide binding and analyzing T-cell receptor repertoires.

15*. Studying host adaptation and the development of resistance in HIV and hepatitis C virus with structural bioinformatics methods (Olga Kalinina, Max Planck Institute for Informatics)

Video | Slides

Human immunodeficiency virus (HIV) and hepatitis C virus cause serious diseases that are difficult to treat. Like many other retroviruses and RNA viruses, they evolve rapidly and can therefore adapt to specific antiviral drugs as well as to the host's adaptive immune response. In this lecture, Olga showed how combining sequence analysis of viral proteins with analysis of their spatial structure makes it possible to predict how resistance mechanisms develop and how viruses interact with the host immune system.

16. Prediction of the effect of mutations (Vasily Ramensky, MIPT)

Video | Slides

Modern sequencing methods provide a huge amount of information about genome polymorphism, that is, the differences between individual genomes. These differences (variants) arise from mutations during DNA replication, and some of them become fixed in the population. The prevalence, location, and functional effect of genomic variants vary greatly, from lethality to no effect at all on the individual's phenotype. The lecture covers modern approaches to predicting the functional effect of variants used in personalized medicine and in medical and population genetics.

17. Multiscale modeling and design of biological molecules (Nikolai Dokholyan, University of North Carolina at Chapel Hill)

Video

The life of biological molecules spans time and length scales from the atomic to the cellular. Consequently, new approaches to molecular modeling must be inherently multiscale. In his lecture, Nikolai described several methodologies developed in his laboratory: an algorithm for fast discrete molecular dynamics simulation, and tools for protein design and structure refinement. He also presented applications of these methodologies that shed light on the molecular etiology of cystic fibrosis and point to new pharmaceutical strategies against the disease, model three-dimensional RNA structure, and open new approaches to controlling proteins in living cells and organisms.

18. Homology modeling of proteins (Pavel Yakovlev, BIOCAD)

Video

Modern structural biology has a number of computational methods that can characterize biological molecules with high confidence: their similarities and differences, modes of interaction, and functions. These calculations always take the spatial structure of the protein as input; however, obtaining that structure can be difficult despite half a century of progress in crystallography. The lecture is devoted to solving this problem with homology modeling of protein structures, that is, building three-dimensional structures from similar fragments. As an example, it considers antibody variable domains, proteins with a unique structural diversity of variable loops.

19. How to stop meditating and start modeling (Arthur Zalevsky, Lomonosov Moscow State University)

Video | Slides

The large volumes of data produced by NGS can be used not only to draw biological conclusions directly but also to build models. Such models help to better understand biological data and to extract even more biological meaning from an experiment. The lecture is devoted to modeling and the first stages of this process.

20*. Standing on the shoulders of giants, or why consortiums are needed (Herman Demidov, Center for Genomic Regulation, Universitat Pompeu Fabra)

Video | Slides

Over the past decades, progress in biology has gone hand in hand with the accumulation of datasets so huge that individual research groups could no longer handle their bioinformatic analysis. To solve this problem, consortia of dozens of laboratories were formed, such as the Human Genome Project, the 1000 Genomes Project (1000GP), ENCODE, and others. Thanks to these collaborations, data of many types, obtained with various technologies, are publicly available. As a result, comparing new experimental data with existing data has become a standard part of any study. Consortia produce not only data but also bioinformatic pipelines for processing it, standard formats, and quality assessment procedures. This lecture discusses how consortia work, how to use the results of their work, and what to do if you suddenly find yourself a member of such a consortium and need to process terabytes of data and then share the results with all the other participants.

21*. Overview of bioinformatics companies in Russia and the world (Andrey Afanasyev, yRisk)

Video | Slides

In the modern world, science and business are increasingly intertwined, and bioinformatics is no exception. Andrey spoke about the expectations and realities of the market, success stories and failures, and the people and places associated with bioinformatics.

22. Advanced variation analysis (SNV, InDel, SV) using the NGB genome browser (Gennady Zakharov, EPAM, I. P. Pavlov Institute of Physiology, RAS)

Video | Slides

The lecture covers visual analysis of simple (SNV, InDel) and structural variations in a genome browser. All examples are demonstrated in the NGB browser, which meets most requirements and recommendations for analyzing structural variations, including various types of visualization and retrieval of annotations from external databases. Using real examples, the lecture shows scenarios for validating simple and structural variations and analyzing their consequences.

Afterword

For those who ~~still don't get it~~ want to grow in the field of bioinformatics: applications for this year's summer school (2018) are open until May 27. The school itself will be held July 23–28 near St. Petersburg. There is still a chance to jump onto the last car and, next year, proudly show everyone the post reviewing the lectures, saying you saw them in person.

In 2017, the school was held with the support of our regular partners, JetBrains, BIOCAD, and EPAM Systems, to whom we are very grateful.

By the way, there is also a post with the lectures from the school before last.

Bioinformatics for all!

Source: https://habr.com/ru/post/412453/
