Content of review 1, reviewed on October 11, 2023

The paper by Xie et al. addresses the important question of how useful fine-tuned LLMs can be for predicting chemical properties. This is a highly significant topic, and the experiments have been designed and executed with great care. The manuscript is well written, and the insights from the study will be valuable to the scientific community. However, I recommend addressing the following two concerns before this work is published:

  • It would be helpful if the authors could include simpler baselines in their analyses and comparisons. I am curious to know how much model complexity is required to achieve high accuracy on the tasks studied in this manuscript. For instance, how well would a basic machine learning classifier based on molecular weight alone perform? Such a comparison would help reveal how much the fine-tuned LLM actually contributes to the property predictions.

  • When creating chemical descriptors, it is important to consider physical symmetries such as translation, rotation, and permutation invariance, as well as uniqueness. SMILES strings respect translation and rotation symmetries, but they are not unique: the same molecule can be written as many different valid SMILES strings. Can the authors generate all possible SMILES strings for a given molecule and then analyze how the predictions of a fine-tuned LLM vary across the different SMILES representations of that molecule?
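
To make the second request concrete, the consistency analysis could be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the authors' method: `prediction_consistency` and `toy_predict` are hypothetical names, `toy_predict` is a deliberately order-sensitive stand-in for the fine-tuned LLM, and the three ethanol strings are written by hand. In practice, a cheminformatics toolkit such as RDKit could be used to enumerate randomized SMILES for each molecule.

```python
from collections import Counter
from typing import Callable, Sequence


def prediction_consistency(
    predict: Callable[[str], str], smiles_variants: Sequence[str]
) -> float:
    """Return the fraction of SMILES variants that receive the majority label.

    A value of 1.0 means the model gives the same prediction for every
    representation of the molecule; lower values indicate sensitivity
    to how the SMILES string happens to be written.
    """
    if not smiles_variants:
        raise ValueError("need at least one SMILES variant")
    labels = [predict(s) for s in smiles_variants]
    majority_count = Counter(labels).most_common(1)[0][1]
    return majority_count / len(labels)


# Three valid SMILES strings that all describe the same molecule (ethanol).
ETHANOL_VARIANTS = ["CCO", "OCC", "C(O)C"]


def toy_predict(smiles: str) -> str:
    # Hypothetical stand-in for the fine-tuned LLM (assumption, for
    # illustration only): the label depends on the first character,
    # so it is intentionally NOT invariant to the SMILES ordering.
    return "high" if smiles.startswith("C") else "low"


score = prediction_consistency(toy_predict, ETHANOL_VARIANTS)
print(f"consistency across variants: {score:.2f}")
```

Reporting the distribution of such a score over the test set, for canonical versus non-canonical inputs, would show how strongly the predictions depend on the chosen SMILES representation.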

Source

    © 2023 the Reviewer.

Content of review 2, reviewed on November 15, 2023

The authors addressed all my concerns. The new discussion on canonicalization of SMILES and performance fine-tuning of GPT was particularly enjoyable. The paper can be accepted for publication without further revision.

Source

    © 2023 the Reviewer.

References

    Xie, Z., Evangelopoulos, X., Omar, Ö. H., Troisi, A., Cooper, A. I., Chen, L. 2024. Fine-tuning GPT-3 for machine learning electronic and functional properties of organic molecules. Chemical Science.