Review of Stochastic rounding: implementation, error analysis and applications

Content of review 1, reviewed on November 08, 2021

This paper presents a survey of stochastic rounding (SR), covering the theory, hardware and software implementations, and, above all, applications. I was excited to read this paper because I wanted to learn more about SR, but it let me down. A survey should make the topic easier to understand and analyze the literature in depth to offer new insights, i.e., new knowledge, not just a summary of individual papers. For instance, the hardware implementation sections should compare the effectiveness of the different implementation options, not just enumerate them. In general, I think the drawbacks of using SR are not adequately addressed.
Moreover, the paper is not well written or organized. In many parts it is convoluted and hard to follow. Sometimes the explanation is obscured by the mathematics, yet the mathematics is not used rigorously. Many references are merely mentioned rather than analyzed in depth or compared; that analysis should be the main contribution of a survey.
Below are some specific examples to support the comments above. This is not an exhaustive list; many other changes are likely required before the paper is ready for publication.

Page 2, line 9. I think this is the perfect spot to explain stagnation by continuing the example.
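To make this request concrete, here is a minimal sketch of the kind of stagnation example I have in mind. It is my own toy setup, not taken from the paper: the helper `round_to_bits`, the 11-bit significand, and the increment 2^-12 are all assumptions chosen so that every increment is smaller than half the spacing of the grid around 1.

```python
import math
import random

def round_to_bits(x, p, mode, rng=None):
    """Round x to p significant bits (hypothetical helper, unbounded exponent).

    mode "RN": round to nearest, ties to even.
    mode "SR": round up with probability equal to the discarded fraction.
    """
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)        # x = m * 2**e with 0.5 <= |m| < 1
    scaled = math.ldexp(m, p)   # exact, since p is far below 53 bits
    lo = math.floor(scaled)
    frac = scaled - lo          # discarded fraction, in [0, 1)
    if mode == "RN":
        if frac > 0.5 or (frac == 0.5 and lo % 2 == 1):
            lo += 1
    elif mode == "SR":
        if rng.random() < frac:
            lo += 1
    else:
        raise ValueError("mode must be 'RN' or 'SR'")
    return math.ldexp(lo, e - p)

P = 11                   # assumed significand width (as in binary16)
INCREMENT = 2.0 ** -12   # a quarter of the grid spacing just above 1
STEPS = 2048

def accumulate(mode, rng=None):
    total = 1.0
    for _ in range(STEPS):
        total = round_to_bits(total + INCREMENT, P, mode, rng)
    return total

rn_sum = accumulate("RN")                    # every addition is rounded away
sr_sum = accumulate("SR", random.Random(0))  # rounds up about 1 time in 4
exact_sum = 1.0 + STEPS * INCREMENT

print("exact:", exact_sum, "RN:", rn_sum, "SR:", sr_sum)
```

With RN the sum never leaves 1.0, because each increment is below half the spacing and is rounded away. With SR the sum drifts toward the exact value, although the precise result depends on the random stream.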
Page 2, line 19. "…can attenuate the growth of worst-case error bounds" is misleading: it suggests that the worst-case error is reduced, when it is actually doubled. Rephrase to make clear that the error may improve statistically, but the worst-case error does not.
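A small worked example of this distinction, under my own simplifying assumption that the target grid is the integers (spacing 1): for a single rounding, RN is off by at most half the spacing, whereas SR can be off by almost the full spacing, even though its expected value is exact. The function names below are hypothetical.

```python
import math

def rn(x):
    """Round to the nearest integer (ties go up; enough for this illustration)."""
    return math.floor(x + 0.5)

def sr_outcomes(x):
    """Return the two possible SR results of x with their probabilities."""
    lo = math.floor(x)
    frac = x - lo
    return [(lo, 1.0 - frac), (lo + 1, frac)]

x = 2.75

rn_error = abs(rn(x) - x)
sr_worst_error = max(abs(v - x) for v, prob in sr_outcomes(x) if prob > 0)
sr_expected = sum(v * prob for v, prob in sr_outcomes(x))

print("RN error:", rn_error)              # 0.25
print("SR worst error:", sr_worst_error)  # 0.75
print("SR expected value:", sr_expected)  # 2.75
```

Here SR returns 2 with probability 0.25, an error of 0.75, which exceeds the half-spacing bound that RN guarantees. Only the expectation matches x.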
Page 2, line 31. "…denote a number system." Although understandable, this lacks rigor. It would be more precise to say: "denote the subset of numbers that are exactly representable in a number system."

Fig. 2.1 does not really help the reader understand SR better. It needs further elaboration.
Page 3, lines 20-29. I guess these are the properties common to RN and SR, but the paper does not say so clearly. Moreover, these properties are not demonstrated, or their demonstrations are all cited.
Section 4 comes too late in the paper, given that floating-point numbers have been used since Section 2. It also provides too many details that are not used later in the paper. For example, Table 4.1 and Table 4.2 do not help in understanding the paper: the bit widths for each precision and the definitions of RN, RZ, and RA would be enough. The same applies to subsection 4.b.
Section 5 begins with Table 5.1 without first explaining how SR is implemented in a basic or more familiar way.
I do not see the need to introduce Eq. (5.1) and (5.2), since they are identical to Eq. (2.1) and (2.2): different symbols representing the same operation.

Page 7, line 40. Why does P have to be a floating-point number? A fixed-point number may work adequately. In fact, footnote 3 says something similar, so there is no reason to complicate it.
Eq. (5.3) seems to express how SR is implemented, but it does not agree with the implementations explained afterwards. The authors should elaborate on how Eq. (5.3) justifies the implementations in the following sections.

I think talking about "precision p" or "precision k" is confusing. It would be better to refer to the bit width of the significand or of any given number.

Section 5.c only enumerates different implementations without explaining how they work. Only reference [35] is explained, and in a very convoluted way.

The same applies to Section 5.d. A basic hardware implementation should be explained before reviewing all these different implementations. Moreover, deeper insight into how these units are implemented would be very useful, instead of the focus on number sizes, inputs, and outputs. For instance, how does the adder [44] actually round stochastically? Only [47] is explained; it would be best to do the same for the others.
Page 10, lines 46-49. The explanation is very convoluted. You could just say, "the 30-bit random number is added to the 30 least significant bits of the internal register before they are set to zero". It would be even better to have explained this procedure earlier, since it seems to be very similar for all implementations.
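The procedure in my suggested sentence can be sketched in a few lines. This is only an illustration of that sentence, not the design of any unit cited in the paper: the function name is hypothetical, and I use k = 4 bits instead of 30 so that all random values can be enumerated.

```python
def add_random_and_truncate(register, random_bits, k):
    """Add a k-bit random integer to the register, then zero the k low bits.

    A carry out of the low k bits happens exactly when random_bits is at
    least 2**k minus the low part, so the result rounds up with probability
    (low part) / 2**k when random_bits is uniform.
    """
    mask = (1 << k) - 1
    assert 0 <= random_bits <= mask
    return (register + random_bits) & ~mask

K = 4
register = 0b1011_0100      # low part is 0100, i.e. 4/16 of the way up
round_down = 0b1011_0000
round_up = 0b1100_0000

# Enumerate every possible k-bit random value instead of sampling.
results = [add_random_and_truncate(register, r, K) for r in range(1 << K)]
up_count = results.count(round_up)
down_count = results.count(round_down)

print("rounds up for", up_count, "of", 1 << K, "random values")
```

Of the 16 possible random values, exactly 4 produce a carry, which matches the 4/16 probability of rounding up that SR requires for this input.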
Why is subsection 5.d.(II) in the hardware section when it is stated (line 14) that the algorithm is valid for both software and hardware implementations? Should it come before Sections 5.c and 5.d? Besides, it is unclear whether this is the typical way of implementing it or an author's proposal.
Page 11, lines 20-22 add nothing. They are incomplete and are repeated later.
Page 11, line 58. "…the whole significand has to be shifted by p + k places". I think this is a mistake.

Page 12, line 13. These three cases correspond to an effective subtraction, but this is not stated.

Page 12, line 15. "…and this does not violate the bottom k bits." Should they be shifted by one position?

Page 15, line 10. I think this is misleading, because this is the result obtained only if the computation is repeated N times, with N large enough, and the mean of all results is computed.

It is not clear how the data shown in Fig. 7.1 and 7.2 were obtained. Are these the results of a single computation?

The review in Section 7.b is not very helpful. The different works are presented in no particular order. The references are grouped into paragraphs without any apparent criterion, mixing fixed- and floating-point arithmetic, and training and deployment. Moreover, the comments are very superficial, and the works are not compared with one another.

Page 16, line 57. "Half precision fixed-point arithmetic" does not exist.

Elaborating a little more on Monte Carlo arithmetic would be helpful.

In Section 7.d, why are truncation and rounding errors separated? Truncation error (RZ) is a specific case of rounding error.

The examples in Sections 7.a, 7.d and 7.e show better performance with SR than with RN but, since SR is a stochastic process, the results may sometimes be much worse than with RN. Is that true? How often does that happen?
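The kind of evidence I am asking for could come from a repeated experiment such as the sketch below. Everything in it is my own assumption rather than the paper's setup: the helper `round_to_bits`, the 11-bit significand, the summand 0.001, and the number of trials. It reports how often a single SR run ends with a larger error than RN, and the mean signed SR error over all runs.

```python
import math
import random

def round_to_bits(x, p, mode, rng=None):
    """Round x to p significant bits with RN (ties to even) or SR."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)        # x = m * 2**e with 0.5 <= |m| < 1
    scaled = math.ldexp(m, p)
    lo = math.floor(scaled)
    frac = scaled - lo          # discarded fraction, in [0, 1)
    if mode == "RN":
        if frac > 0.5 or (frac == 0.5 and lo % 2 == 1):
            lo += 1
    elif mode == "SR":
        if rng.random() < frac:
            lo += 1
    else:
        raise ValueError("mode must be 'RN' or 'SR'")
    return math.ldexp(lo, e - p)

P = 11          # assumed significand width
TERM = 0.001    # assumed summand
TERMS = 1000
TRIALS = 200

def low_precision_sum(mode, rng=None):
    total = 0.0
    for _ in range(TERMS):
        total = round_to_bits(total + TERM, P, mode, rng)
    return total

reference = TERM * TERMS   # computed in double precision
rn_error = abs(low_precision_sum("RN") - reference)

rng = random.Random(2021)
sr_signed_errors = [low_precision_sum("SR", rng) - reference
                    for _ in range(TRIALS)]

# Fraction of individual SR runs that end up worse than the single RN run.
worse_fraction = sum(abs(err) > rn_error for err in sr_signed_errors) / TRIALS
# Averaging over runs is what the expected-value results describe.
mean_sr_error = sum(sr_signed_errors) / TRIALS

print("RN error:", rn_error)
print("fraction of SR runs worse than RN:", worse_fraction)
print("mean signed SR error:", mean_sr_error)
```

Reporting a quantity like `worse_fraction`, or the spread of the SR errors, alongside each example would answer the question directly and make clear whether the figures show one run or an average.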
In Section 7.h, when discussing [85], clarify whether "standard rounding" means truncation, which is very likely in DSP. A comparison of SR with RN would be fairer. Elaborate on how SR is implemented.

References

Croci, M., Fasi, M., Higham, N. J., Mary, T., Mikaitis, M. 2022. Stochastic rounding: implementation, error analysis and applications. Royal Society Open Science.
