Content of review 1, reviewed on April 22, 2020

Review of 'Exploiting Social Media and Tagging for Social Book Search: Simple Query Methods for Retrieval Optimization'

Summary and Contributions

The paper deals with the problem of book retrieval in the context of Social Book Search (SBS), aiming to improve query processing. It attempts to produce more effective queries by exploiting Named Entity Recognition (NER) and Part-of-Speech (POS) tagging to extract relevant topics (queries), which the authors hypothesize will return more relevant results than the original topic sets of the INEX 2014/15 SBS dataset.
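
To make the query-construction idea concrete for readers of this review, the following is a minimal sketch (my own, not the authors' code) of how NER and POS tagging can be used to distill a query from a topic narrative. It assumes spaCy with the en_core_web_sm model; the paper itself may use a different toolkit and different selection rules.

```python
# Reviewer's sketch of NER/POS-based query extraction (not the authors'
# implementation). Assumes: pip install spacy
#                           python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

def extract_query(topic_text: str) -> str:
    """Build a short query from named entities plus content-bearing nouns."""
    doc = nlp(topic_text)
    # Named entities (people, works, places) are strong topical anchors.
    entities = [ent.text for ent in doc.ents]
    # Nouns/proper nouns outside entities add further topical terms.
    nouns = [tok.text for tok in doc
             if tok.pos_ in {"NOUN", "PROPN"} and not tok.ent_type_]
    # De-duplicate while preserving order.
    seen, terms = set(), []
    for term in entities + nouns:
        if term.lower() not in seen:
            seen.add(term.lower())
            terms.append(term)
    return " ".join(terms)

# Example of a LibraryThing-forum-style request (hypothetical text):
print(extract_query("I loved The Name of the Rose and I am looking for "
                    "more historical mysteries set in medieval monasteries."))
```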

Strengths and Impact

The research aims to support users in book search by exploiting the professional and social metadata of the Amazon/LibraryThing (A/LT) dataset, together with its topic sets and relevance judgments. It attempts to enhance search queries by using NER and POS tagging to extract relevant topics, with the goal of returning more relevant results and improving the users' search experience. The work has implications for those working in Information Retrieval, Book Search, Information Science and libraries. Overall, the paper is sufficient in conveying its message of using simple query methods for retrieval optimization. However, I suggest the following major and minor revisions to further improve its content, structure and presentation.

Major Points/Revisions/Suggestions

  1. The introduction should have been written in the context of Social Book Search (SBS). The section opens well but gives unnecessary details about Amazon AWS, mobile phones, the Z39.50 protocol, etc., which introduce topic drift from the main theme of the paper. The introduction, as the title suggests, should start from SBS, i.e., from the third paragraph, and with reference to the latest literature, especially from Koolen et al., rather than citing old papers dating back to 2012. Since the paper appeared in 2017, it would have been better to write this section with citations to the following papers. Note that some of these papers are already cited elsewhere in the paper; the need is to bring them forward here.

          1. Koolen M. et al. (2016) Overview of the CLEF 2016 Social Book Search Lab. In: Fuhr N. et al. (eds) Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2016. Lecture Notes in Computer Science, vol 9822. Springer, Cham
          2. Koolen M. et al. (2015) Overview of the CLEF 2015 Social Book Search Lab. In: Mothe J. et al. (eds) Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2015. Lecture Notes in Computer Science, vol 9283. Springer, Cham
          3. Koolen, M., Bogers, T., Kazai, G., Kamps, J., & Preminger, M. (2014). Overview of the INEX 2014 Social Book Search Track. In L. Cappellato, N. Ferro, M. Halvey, & W. Kraaij (Eds.), Proceedings of the CLEF 2014 Working Notes (Vol. 1180, pp. 462-479). CEUR Workshop Proceedings.
    
  2. The introduction fails to put the problem into context: it offers neither a clear problem statement and research question nor a proper justification. The related work is also written passively and does not make the case for the proposed solution. SBS research has mostly been conducted under the INEX/CLEF SBS Labs/Tracks. Although this paper appeared in 2017, it misses several important papers from these proceedings that, if included, could further improve this work and would have enabled the authors to compare it accurately against the state of the art using suitable evaluation metrics.

  3. The paragraph next to Fig. 2 gives the impression that the authors themselves gathered the descriptions from Amazon and LibraryThing, whereas these are actually already part of the A/LT dataset. It should therefore be rephrased and merged into the second paragraph of Section 3 (Methodology).

  4. Fig. 1 is introduced in the first paragraph of Section 3 with no explanation of how the required experiments were carried out, although this detail becomes available only from the fourth paragraph. The authors should first discuss Fig. 2 and Fig. 3 and then Fig. 1, renumbering these three figures as Fig. 2 -> Fig. 1, Fig. 3 -> Fig. 2, and Fig. 1 -> Fig. 3. Finally, Fig. 4 should be discussed in relation to Fig. 3 (new numbering; originally Fig. 1).

  5. Fig. 3 (Fig. 2 after the renumbering suggested in the previous comment) needs significant attention. It should be re-created from a screenshot of a single XML document of the A/LT collection. In an original A/LT document, the element listing the LibraryThing tags appears at the very end of the record, just before the closing tag of the book element; all tag elements occur inside the book element, so it is not possible for tag elements to appear after it is closed. This is a serious mistake: it appears that screenshots of two different books were combined to represent a single book. The tags indicate that the book is about flowers, which is also evident from its title, whereas the rest of the description shows that the book is about Christ's life.

  6. The authors mention evaluating results by MAP, but in SBS research nDCG@10 has been used as the official, more authoritative evaluation metric for comparison with existing runs (experiments). Other metrics, including P@10, MRR and R@1000, were also used. The authors should have evaluated their work on all these metrics, with nDCG@10 used for comparison with the state of the art. These metrics are already part of the trec_eval tool, so I wonder why only MAP was used as the authoritative metric (a minimal evaluation sketch is given after this list).

  7. The authors compare their results on the basis of the 2014 and 2015 topic sets and relevance judgments. However, Section 4 states that their results on the 2015 topic set outperform the existing runs on the 2014 topic set, which is not a fair comparison; a fair comparison is possible only when it is made on the same topic set. The authors also claim that they produce better results in terms of MAP and would rank first among the published runs on the 2015 topic set, whereas the published runs of both years report far better MAP scores than this paper (see the following two papers). The proposed approach therefore appears less promising than the published ones. Moreover, since nDCG@10 is the authoritative evaluation metric in the SBS literature, comparing results using MAP alone seems pointless.

           1. Koolen M. et al. (2015) Overview of the CLEF 2015 Social Book Search Lab. In: Mothe J. et al. (eds) Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2015. Lecture Notes in Computer Science, vol 9283. Springer, Cham
           2. Koolen, M., Bogers, T., Kazai, G., Kamps, J., & Preminger, M. (2014). Overview of the INEX 2014 Social Book Search Track. In L. Cappellato, N. Ferro, M. Halvey, & W. Kraaij (Eds.), Proceedings of the CLEF 2014 Working Notes (Vol. 1180, pp. 462-479). CEUR Workshop Proceedings.
    
  8. The authors should discuss the limitations of the proposed work and frame them as opportunities for future work.
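
Regarding point 6 above, computing the full set of SBS metrics requires little extra effort once the run and qrels files exist. Below is a minimal sketch using the pytrec_eval wrapper around trec_eval; the file names are hypothetical, and the exact measure identifiers may vary slightly across trec_eval versions.

```python
# Reviewer's sketch: score a run against qrels with the metrics used in SBS
# evaluations (MAP, nDCG@10, P@10, MRR, R@1000). Assumes: pip install pytrec_eval
import pytrec_eval

def load_qrels(path):
    """qrels format: <topic> 0 <doc_id> <relevance>"""
    qrels = {}
    with open(path) as f:
        for line in f:
            topic, _, doc_id, rel = line.split()
            qrels.setdefault(topic, {})[doc_id] = int(rel)
    return qrels

def load_run(path):
    """run format: <topic> Q0 <doc_id> <rank> <score> <run_tag>"""
    run = {}
    with open(path) as f:
        for line in f:
            topic, _, doc_id, _, score, _ = line.split()
            run.setdefault(topic, {})[doc_id] = float(score)
    return run

qrels = load_qrels("sbs2015.qrels")            # hypothetical file names
run = load_run("simple_query_run.txt")
evaluator = pytrec_eval.RelevanceEvaluator(
    qrels, {"map", "ndcg_cut.10", "P.10", "recip_rank", "recall.1000"})
per_topic = evaluator.evaluate(run)

# Macro-average each measure over all topics.
for measure in ("map", "ndcg_cut_10", "P_10", "recip_rank", "recall_1000"):
    mean = sum(t[measure] for t in per_topic.values()) / len(per_topic)
    print(f"{measure}: {mean:.4f}")
```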

Minor Points/Suggestions/Revisions

  1. Table 1 should be revised in terms of content and structure to justify its purpose and convey its meaning and message. It should be placed under Section 4.

  2. Okapi BM25 should be given a proper citation. The authors also need to justify why they selected BM25 as the scoring function (the standard formula is reproduced after this list for reference).

  3. Fig. 2 should be captioned as "An Example Topic from 2014 Social Book Search Topic Set."

  4. Fig. 3 should be captioned as "A Sample XML document (book description) from A/LT book corpus."

  5. A footnote/end note should be provided for the Lemur Project/Indri in Section 3 (Methodology). References 19 and 20 can be used for this purpose.

  6. References 2, 3, 10, 20, 21, 22 and 25 should be made footnotes or end notes.

  7. Reference 13 has missing details.

  8. References 5, 6, 7, 10, 11, 14, 15, 16 and 23 need DOIs or URLs.

  9. Reference 9 reports the publication year as 2015 but actually refers to an older paper published in 2012.
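
For reference on point 2 above, the standard Okapi BM25 scoring function (commonly cited to Robertson and colleagues' Okapi/TREC work) is

    score(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i) \cdot \frac{f(q_i, D)\,(k_1 + 1)}{f(q_i, D) + k_1 \left(1 - b + b \, \frac{|D|}{\mathrm{avgdl}}\right)}

where f(q_i, D) is the frequency of query term q_i in document D, |D| is the document length, avgdl is the average document length in the collection, and k_1 and b are free parameters (typically k_1 between 1.2 and 2.0 and b = 0.75). Stating which parameter values were used would also help readers reproduce the reported runs.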
