Content of review 1, reviewed on October 25, 2022

This book discusses the fundamental theory of data science, its methods, validity, and scope. The author introduces and addresses, in an appropriate order, key concepts of data science as an inductive methodology in order to guide scientists and researchers who may use data science in their scientific work. The author starts the book by defining the term “inductivism” and then arguing against it.

The introductory chapter defines data science as the scientific practice of gaining information or knowledge from data, and includes ten theses on data science, the first of which presents data science as the application of machine learning methods. Unfortunately, the author fails to explain to readers--especially those with less experience in data science and machine learning--what these machine learning methods are. The remaining nine theses introduce the key concepts of data science, for example, conventional statistics and causality.

The author further presents data science as an inductive framework: it should start with facts and rely on inductive inferences (inductivism). He addresses the recent emergence of inductivist paradigms and then presents arguments against them--hypothetico-deductivism is discussed as a broad argument against inductivism. He further discusses the distinction between theoretical and phenomenological science, where knowledge in phenomenological science is causal and aims to predict and manipulate. Chapter 3 presents a case study of successful data science as machine learning, covering a considerable number of algorithms (for example, convolutional neural networks) and raising epistemological questions such as the interpretation of the hierarchy of layers in deep neural networks.

The history of variational induction is briefly introduced. The author notes that machine learning relies on variational induction, although enumerative and eliminative induction may occasionally play a role. He distinguishes these three classes of induction, defines them, and provides classic examples of each.

The book is divided into nine chapters and is well structured. Readers are taken on a journey in which they discover, step by step, methodologies for data-driven research. Each key concept of data science is concisely defined, with examples and guidance on when, why, and how to use it. The reader will gain a broad knowledge of key concepts such as causation and evidence by reading chapters 5 to 9.

This book provides good support for researchers, such as computer scientists and data scientists in all fields, and I fully recommend it. It provides readers with an epistemological perspective and a conceptual framework for data analysis.

Source

    © 2022 the Reviewer (CC BY 4.0).

References

    Pietsch, W. 2022. On the Epistemology of Data Science. Collective Agency and Cooperation in Natural and Artificial Systems: Explanation, Implementation and Simulation.