Review of Why did an effective Dutch complex psycho-social intervention for people with dementia not work in the German healthcare context? Lessons learnt from a process evaluation alongside a multicentre RCT

Content of review 1, reviewed on February 17, 2011

THE STUDY

I was confused by the description of which variables were the primary outcome or outcomes, and by the inconsistent use of the terms "baseline", "pre-treatment" and "pre-treatment baseline" in the paper. I am sure these descriptions can be straightforwardly amended. I would also suggest not correlating change from baseline with baseline scores, as I suspect was done in Table 1 on page 9 (see my comments below).

RESULTS & CONCLUSIONS

The descriptions of variables in the results just need a bit of tightening up (see my comments later).

GENERAL COMMENTS

The authors use ANCOVA and regression analyses to look at the association between participant characteristics, including baseline data, and various primary outcomes. Explicitly and consistently naming the outcomes of interest and the covariates used in each analysis would significantly enhance understanding of the results. I also warn against correlating baseline scores with change from baseline and suggest some alternatives.

Pages 6 and 9. "Primary outcome" is used to refer to two change scores (should this be the plural, primary outcomes?). I would rewrite this as "The outcomes of interest were the differences in IDDD and PRPP scores after intervention compared to those at baseline". I would also rewrite the last sentence on page 9 along the lines of "There was no difference in experimental and control group PRPP or IDDD mean change from baseline after controlling for baseline values of patient mood (CSDD) and daily functioning (PRPP; IDDD) using an ANCOVA", i.e. explicitly mentioning the names of the outcome measures.

Page 6. I think the sentence running from the end of page 6 onto page 7 can be rewritten more clearly, along the lines of "An ANCOVA was used to investigate mean changes from baseline in IDDD and PRPP between the COTiD and control groups controlling for ??, ?? and ??. Percentage variance explained in post-treatment IDDD and PRPP scores was assessed using multiple regression."

Page 9. I assume the correlations in Table 1 and the ANCOVA and regression results are based on a pooled German and Dutch sample? If so, a rationale for this pooling, reassuring us about the consistency of results in the two pooled samples, would be welcome. Tables should state whether the sample is pooled or of a particular nationality.

Page 9. An explanation should be given of why a nonparametric test (Mann-Whitney) was used to compare groups. In particular, which t-test assumptions were violated? I suspect the distributions of healthcare resources may have been positively skewed in the two groups.

Page 9. The variable correlating with patients' mood to give a minor correlation of 0.21 is not mentioned. Is this one of the primary outcomes?

Page 10. A clinically relevant change of 20% improvement is described on page 10 as a change from baseline (post-treatment minus baseline) of 'approximately 1 point on item level'. I would instead mention the exact change here (of 0.8) to complement the footnote to Table 2 on page 11.

Page 11: Table 2. 'Overall mean' can replace 'Sum', and sample sizes and SDs should be given for this overall mean. T0-T1 differences that are above the clinical threshold of 20% improvement can be highlighted in bold font.

Page 16 & Table 6. I would add a range and skewness coefficient to describe each healthcare resource in each group in Table 6. This will also enable the reader to see what the distributions of each resource were. I suspect most people used a relatively small number of resources but a few may have needed more. If this is the case there will be high values of positive skewness (a long upper tail). Any asymmetry in the distributions of healthcare resources should be reported in the results part of section 3 on page 16 and interpreted.

References

Hair, J.F., Tatham, R.L., Anderson, R.E. and Black, W. (1998). Multivariate data analysis. (Fifth Ed.) Prentice-Hall: London.
Harris, R.J. (2001). A primer of multivariate statistics. Lawrence Erlbaum: Mahwah, NJ, USA.
Jin, P. (1992). Toward a reconceptualization of the law of initial value. Psychological Bulletin 111(1) 176-184.
Myrtek, M. and Foerster, F. (1986). The law of initial value: a rare exception. Biological Psychology 22 227-237.
Tu, Y-K. and Gilthorpe, M.S. (2007). Revisiting the relation between change and initial value: A review and evaluation. Statistics in Medicine 26 443-457. (A free pdf copy of this paper is available at http://dionysus.psych.wisc.edu/lit/Topics/Statistics/RegressionToMean/tu_RegressionToTheMean_SiM2007.pdf).
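The range and skewness coefficient requested for Table 6 could be computed along the following lines. This is only a minimal sketch: the counts are invented for illustration and are not the study's data, `describe_counts` is a hypothetical helper, and the moment coefficient of skewness is assumed as the measure (statistical packages often apply a small-sample adjustment to it). Under this convention a long upper tail, with most participants using few resources and a few using many, gives a positive coefficient.

```python
def describe_counts(values):
    """Return (minimum, maximum, skewness) for a list of counts.

    Skewness is the moment coefficient g1 = m3 / m2**1.5, where m2 and
    m3 are the second and third central moments of the sample.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    skew = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    return min(values), max(values), skew


# Hypothetical resource-use counts (NOT study data): most participants
# use few resources, a handful use many.
resource_use = [0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 4, 12, 15]

low, high, skew = describe_counts(resource_use)
print(f"range {low} to {high}, skewness {skew:.2f}")
```

Reporting the range next to the coefficient lets a reader see at a glance whether a few heavy users dominate a group, which is also the usual motivation for preferring a rank-based test such as Mann-Whitney.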

Content of review 2, reviewed on April 14, 2011

THE STUDY

I would remove the correlations with change from baseline in Table 1, and their description in the text on pages 9, 13 and 14, as they are based on a German sample whose low (healthy) baseline IDDD/PRPP scores span too restricted a range to inform us about correlations with change from baseline.

GENERAL COMMENTS

The authors have, to their credit, clarified many points. These include why they used the Mann-Whitney test, the samples upon which the results (particularly those in Table 1) are based, and the definition of the change scores as changes from baseline. They also give a more concise account of how they performed the ANCOVAs and highlight possible confounding differences between the Dutch and German studies, which are very important for enabling comparisons of results. Skewnesses and ranges have also been added to Table 5 (page 15). The link between a 20% improvement and a pre-post treatment difference of 0.8 is now stated in the text on page 15 and referenced, with the reference presumably able to explain how this link was arrived at.

The main result (Table 6 and page 18), however, is actually a very simple one: for most IDDD items, at least, the Dutch sample have higher baseline scores. Table 6 (page 16) supports the further key discussion point (page 18) that for very high baseline scores, where there is more room for improvement, as occurs for certain items in the Dutch sample but not in the German sample, an improvement is made at 6 weeks. The authors hint at an important point (page 17): this paper does NOT preclude similar changes over time in both Dutch and German individuals, as there appear to be very few, if any, German individuals with baseline scores as high as those of the Dutch and a consequent potential to improve. We can, therefore, only conclude that the Dutch have higher baseline scores than the Germans on most items.
Given the difference in baselines between the studies, I would, therefore, not compare changes from baseline between the Dutch and German samples or compute correlations with change from baseline in the German study. I think dropping these analyses would not change the conclusion that there are differences in 'health' between the nationalities, with German people not benefitting on any measure (by showing clinically meaningful change) from treatment because they are so 'healthy' on all the measures (Table 6). I, therefore, query the usefulness of the correlations in Table 1 (pages 9 to 10) and the associated results on page 9, and also of the results described on pages 13 to 14, and make a few remaining suggestions for additions to the description and results of the ANCOVA analyses mentioned on page 10. My conclusion from the above is that, provided there is no likelihood of confounding variables between the German and Dutch studies, the result of this paper lies in showing that at baseline the Dutch tend to be 'less healthy' than comparable German individuals (page 18).

Pages 9, 10, 13 and 14. Given the low baseline scores in the German sample, as given in Table 6, it follows that the low correlations described on page 9, in Table 1 and on pages 13 and 14 are not informative. Low correlations would be expected because the limited, low range of baseline PRPP and IDDD scores produces a restriction of range, under which correlations tend to be underestimated. The changes from baseline are so small, due to the low baselines (Table 6), as to have no clinical relevance in IDDD, at least. A larger range of baseline scores would be needed to assess correlations with change from baseline, but such scores are sadly unavailable (page 17). As I mentioned in my previous review, the two largest correlations (between PRPP/IDDD baseline and their change from baseline) could, in any case, be artifacts of measurement error, indicating regression to the mean.
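The attenuation caused by restriction of range can be illustrated with a short simulation. This is a sketch under stated assumptions, not a reanalysis of the study: the data are simulated bivariate Normal pairs with an assumed population correlation of 0.6, the restricted sample keeps only the lowest quarter of the first variable, and the names `pearson`, `RHO` and `N` are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(2011)  # fixed seed so the run is reproducible
RHO = 0.6                  # assumed population correlation
N = 5000                   # simulated sample size

# Simulate pairs (x, y) whose population correlation is RHO.
pairs = []
for _ in range(N):
    x = rng.gauss(0, 1)
    y = RHO * x + math.sqrt(1 - RHO ** 2) * rng.gauss(0, 1)
    pairs.append((x, y))

r_full = pearson([x for x, _ in pairs], [y for _, y in pairs])

# Restrict the range: keep only the quarter of pairs with the lowest x,
# mimicking a sample in which only low baseline scores are observed.
restricted = sorted(pairs)[: N // 4]
r_restricted = pearson([x for x, _ in restricted],
                       [y for _, y in restricted])

print(f"full range:       r = {r_full:.2f}")
print(f"restricted range: r = {r_restricted:.2f}")
```

The correlation in the restricted subsample comes out clearly smaller than in the full sample even though both are drawn from the same population, which is why a low correlation in a narrow-baseline sample says little about the relationship over a wider range.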
There are methods for adjusting correlations for restriction of range (e.g. Chan and Chan, 2004), but you need to know the variance of one of the variables in the unrestricted range. This approach would also assume that there were Germans with baselines comparable to the Dutch, which may be questionable given the authors' unsuccessful attempt to find Germans with higher baselines (page 17).

Page 10. The authors do correctly (following e.g. Senn, 2006) remove the effects of baseline PRPP and IDDD, as well as mood, by treating them as covariates in the ANCOVAs mentioned on page 10 that compare change between the COTiD and control groups. I assume, as Senn and others advocate, that the dependent variables in the ANCOVAs described are the follow-up scores at week 6. This could be stated on page 10. This analysis does, it seems to me, address the additional point that there are no differences in change scores when mood scores are taken into consideration.

Page 10. I wonder about the magnitude of the (COTiD minus control group) difference between the average IDDD and PRPP 6 week follow-up scores, adjusted for baseline, as tested by the ANCOVAs on page 10 in the German sample. A sentence or two could be added to reassure the reader that these baseline-adjusted average IDDD and PRPP group differences are small, given that they are not statistically significant. These covariate-adjusted group means can be output by ANCOVA software, e.g. by asking for estimated marginal means in the ANCOVA options in SPSS.

Pages 13 and 14. Do the results described, relating to the association between COTiD and primary outcome, apply to the German sample only? I suspect they do, but this should be stated in the section heading.

Page 16. I think, to be precise, we could say in the footnote to Table 6 that T0 represents the IDDD score on entry to the study and T1 is the score at a point 5 weeks after entry time (T0). This emphasises the fact that T0 and T1 are points in time rather than periods of time.

References

Chan, W. and Chan, D.W-L. (2004). Bootstrap standard error and confidence intervals for the correlation corrected for range restriction: a simulation study. Psychological Methods 9(3) 369-385.
Senn, S. (2006). Change from baseline and analysis of covariance revisited. Statistics in Medicine 25(24) 4334-4344.
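A baseline-adjusted group comparison of the kind discussed under Page 10 can be sketched as a least-squares fit of follow-up score on group and baseline. This is an illustration under stated assumptions rather than the authors' analysis: the scores are invented, `solve` and `ancova` are hypothetical helpers, the model has a single covariate and no interaction term, and the adjusted means are evaluated at the overall mean baseline, which is analogous to what estimated marginal means report.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination
    with partial pivoting (a must be square and non-singular)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ancova(group, baseline, followup):
    """Fit followup = b0 + b1*group + b2*baseline by least squares.

    group is coded 0 (control) or 1 (intervention). Returns
    (b1, adjusted_means): b1 is the baseline-adjusted group difference
    and adjusted_means maps each group code to its predicted follow-up
    score at the overall mean baseline.
    """
    rows = [[1.0, float(g), float(x)] for g, x in zip(group, baseline)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
           for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, followup))
           for i in range(3)]
    b0, b1, b2 = solve(xtx, xty)
    xbar = sum(baseline) / len(baseline)
    adjusted_means = {g: b0 + b1 * g + b2 * xbar for g in (0, 1)}
    return b1, adjusted_means


# Hypothetical item-level scores (NOT study data).
group = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
baseline = [1.2, 1.5, 0.9, 1.8, 1.1, 1.6, 1.9, 1.3, 2.0, 1.4]
followup = [1.1, 1.6, 0.8, 1.7, 1.2, 1.5, 1.8, 1.2, 2.1, 1.3]

adjusted_diff, adjusted_means = ancova(group, baseline, followup)
print(f"baseline-adjusted difference: {adjusted_diff:.3f}")
print(f"adjusted means: {adjusted_means}")
```

Reporting the adjusted difference together with the two adjusted means lets a reader judge the size of a non-significant effect directly, rather than inferring it from a p-value.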

Content of review 3, reviewed on April 27, 2011

THE STUDY

I would not suggest correlating a baseline score with change from baseline in that same score, for reasons I mention later.

GENERAL COMMENTS

The authors have further improved this paper, in particular by acknowledging the restriction of range in the German sample in the discussion (page 18) and by indicating that baseline score was used as a covariate in the ANCOVAs (page 9). It is good that the discussion emphasises the differences in German and Dutch baseline measures and that there is more of a mention of differences in variance. I make an additional couple of points below based upon the relative sizes of the IDDD item variances at baseline and after treatment. The authors may, consequently, wish to comment upon these in their paper.

The authors refer to variances on page 18. In this spirit, the variances of the IDDD high-responsiveness items at T0 and T1 in Table 6 (page 16) appear similar in most cases in both the Dutch and German samples. This is particularly interesting in the Dutch COTiD sample, in that we have a change (improvement) in the means of highly responsive daily living items but, with the possible exception of 'Making tea or coffee', not in their variances. I would have thought that if high scorers were improving (downwards) more than lower scorers, the variance of scores would be smaller at T1 than at T0. In other words, there is no evidence of a floor effect, in which people with low baseline item scores stay low and are at least partly caught up at T1 by higher scorers responding to the treatment, thus reducing the overall item variance at T1 compared to T0. A difference between T0 and T1 variance would be expected if the rate of change in an item between T0 and T1 was related to the T0 (baseline) score (Myrtek and Foerster, 1986). Myrtek and Foerster and others do mention that it is misleading to make inferences from correlating the baseline with change from baseline, as is done in Table 1 (page 10) for PRPP and IDDD items.

The misleading nature of the correlation between baseline and change from baseline is also supported by the similar variances in IDDD items at T0 and T1 in the German sample (Table 6, page 16). These suggest that the high correlation between IDDD baseline and change from baseline found in Table 1 (page 10), in the German combined sample, is artifactual, as I mentioned in previous reviews, because a relationship between baseline and change from baseline should result in different T0 and T1 variances. In any case, the means in Table 6 for the German samples suggest there is little change between T0 and T1 in any item, so one would not expect to see a relationship between change score and baseline when so little change is observed. I would, therefore, prefer to see the relationship between T0 and the T1-T0 change described in terms of differences in variance rather than by the two correlations of IDDD and PRPP baseline with change given in Table 1 (page 10). The suggestion, from the similar T0 and T1 variances in Table 6, is that there is an improvement with treatment in IDDD items which had high responses in the Dutch COTiD sample but that this is not related to baseline.

To illustrate how correlating baseline with change from baseline can be misleading, I generated two sets of 10 standard Normal random variables (i.e. with the same mean of zero and variance of one), correlated one of these columns with the difference between the two columns, and obtained a Pearson correlation of -0.38 (similar to the correlations obtained in Table 1 for PRPP and IDDD baseline with change from baseline). As I mentioned previously, the key point is the difference in baseline scores between the nationalities rather than the relationship between baseline and change scores, so the above does not affect the discussion.

Reference

Myrtek, M. and Foerster, F. (1986). The law of initial value: a rare exception. Biological Psychology 22 227-237.
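The simulation described above can be reproduced along the following lines. This is a minimal sketch rather than the original calculation: the seed, the number of replicates and the helper names are assumptions, so individual values will differ from the -0.38 quoted. For two independent columns with equal variance the population correlation between the first column and the difference is -1/sqrt(2), about -0.71, so samples of 10 scatter widely around a clearly negative value even though the columns are unrelated.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def baseline_change_correlation(n, rng):
    """Correlate a random 'baseline' column with 'follow-up minus
    baseline', where the two columns are independent standard Normal
    draws (so no true relationship between them exists)."""
    t0 = [rng.gauss(0, 1) for _ in range(n)]
    t1 = [rng.gauss(0, 1) for _ in range(n)]
    change = [after - before for before, after in zip(t0, t1)]
    return pearson(t0, change)


rng = random.Random(1986)  # fixed seed so the run is reproducible

# A few replicates at the sample size used in the review (n = 10).
small_samples = [baseline_change_correlation(10, rng) for _ in range(5)]

# One large sample to show the value the small samples scatter around.
large_sample = baseline_change_correlation(20000, rng)

print("n = 10 replicates:", [round(r, 2) for r in small_samples])
print("n = 20000:", round(large_sample, 2))
print("theoretical value:", round(-1 / math.sqrt(2), 2))
```

Because the negative correlation arises purely from the baseline appearing on both sides of the comparison, a value of this size in real data is not evidence that change depends on baseline.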

Pre-publication Review of

Why did an effective Dutch complex psycho-social intervention for people with dementia not work in the German healthcare context? Lessons learnt from a process evaluation alongside a multicentre RCT

Reviewed on February 17, 2011, April 14, 2011, and April 27, 2011
