  • Kang et al. developed a valuable tool to visualize differences in effect sizes between the studies included in a meta-analysis and to understand where these differences stem from. I only have a few minor comments and suggestions: 1. As the P- and M-values have similar roles and the M-value is a better indicator of whether there is an effect, it seems to me that the PM plot shows redundant information. Would there not be more value in displaying the effect size (log odds ratio) on the y-axis of the PM plot rather than the P-value?

    2. To interpret potential causes of heterogeneity, it would help to be able to visualise the covariates (e.g. sex) on the PM plot. A colour code could be used to visualise if/how the studies cluster by covariate (at the moment the colour code is redundant with the grey shading indicating the M < 0.1 / 0.1 < M < 0.9 / M > 0.9 sections).

    3. In the example section, please give the exact P-value rather than P > 0.03. How is the P-value for differential effect obtained? Is it based on studies 3 and 4 only, or on all studies? If it is based on all studies, as the M-value is too, why is study 3 blue and study 4 red? If it is based on studies 3 and 4 only, I understand the result, but the reasoning developed in the Example section could be explained even more clearly.

    4. A few study numbers are hard to read, especially when they appear in blue dots.

    5. What does the 10^-6 threshold correspond to in the PM plot?

    6. It would help to know how the studies are ordered in the Forest plot.

    7. Running ./ initially failed because the plotrix package was not installed on my computer (the error comes from forestpmplot.R). After installing the package,
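The covariate colour-coding suggested above can be sketched in a few lines of matplotlib. This is a hypothetical illustration only (made-up study data, colours, and file name; it is not ForestPMPlot's actual code or API): the covariate drives the point colour, the grey bands keep marking the M < 0.1 and M > 0.9 regions, and the effect size sits on the y-axis.

```python
# Hypothetical per-study meta-analysis summaries (not from ForestPMPlot itself).
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

studies = [1, 2, 3, 4]
m_values = [0.95, 0.88, 0.07, 0.92]      # posterior probability of an effect
log_odds = [0.42, 0.35, -0.05, 0.51]     # effect size (log odds ratio)
sex = ["F", "M", "F", "M"]               # example covariate

# Colour by covariate instead of by M-value, so clustering by covariate is visible.
colours = {"F": "tab:orange", "M": "tab:blue"}
fig, ax = plt.subplots()
for s, m, b, cov in zip(studies, m_values, log_odds, sex):
    ax.scatter(m, b, color=colours[cov])
    ax.annotate(str(s), (m, b))          # study number next to each point
ax.axvspan(0.0, 0.1, alpha=0.15, color="grey")  # M < 0.1: likely no effect
ax.axvspan(0.9, 1.0, alpha=0.15, color="grey")  # M > 0.9: likely an effect
ax.set_xlabel("M-value")
ax.set_ylabel("Log odds ratio")          # effect size on the y-axis
fig.savefig("pm_plot_sketch.png")
```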

  • Tsaih et al. report the results of experiments aimed at identifying (one of) the causal genes underlying a previously identified locus for glucose and insulin traits on chromosome 1 in Heterogeneous Stock rats.

    They collected expression data for ~18,000 genes, including 58 of the 86 genes in the region of interest, in 46 HS rats and the HS founders. They initiate the study by looking for genes that are differentially expressed between glucose-tolerant and glucose-intolerant rats, where glucose tolerance is defined as a complex function of measurable traits (e.g. fasting levels, slope of the curve, etc.), despite having shown in previous work differences in the genetic basis of these measurable traits in this region (Solberg Woods et al. 2010, Solberg Woods et al. 2012; different confidence intervals for the different traits, different founders contributing to different traits, etc.). The authors found that the only gene 1) measured by the expression array, 2) within the region of interest, and 3) differentially expressed is Tpcn2 (nominal p-value 0.05). Evaluation of Tpcn2 by qPCR in a larger, non-independent sample (including the same 46 rats plus another 72) confirmed differential expression of Tpcn2 (p-value 0.05/0.017 depending on the housekeeping gene used for quantification). When tested for correlation with each measurable trait, Tpcn2 expression only correlates with fasting glucose (p < 0.01). The authors do not report on the potential correlation of other genes in the region with these individual traits. The authors also map a strong cis-eQTL for Tpcn2 using haplotype mapping, the method recommended for mapping in Heterogeneous Stocks.

    After looking for candidate SNPs at the QTL and finding no strong candidate, they use a non-synonymous SNP to test for association with the phenotypes and expression levels.

    The authors also report phenotypic differences between Tpcn2 KO and wild-type mice (fasting glucose, p-value = 0.05; insulin_AUC, p < 0.0001; QUICKI, p = 0.036).

    Finally, they report significant associations (without giving the p-values) of a few SNPs within Tpcn2 with fasting insulin and insulin sensitivity in humans, but not with fasting glucose.

    This study has the potential to illustrate how one can move towards identifying causal genes at QTLs mapped in multi-parental populations. However, by reporting associations with a single SNP and not making full use of the founders' sequences (see below), the authors do not fully acknowledge and exploit the multi-parental characteristics of the HS rats.

    Major points:

    • p20: “We also demonstrate a significant correlation between Tpcn2 expression levels and fasting glucose (r = 0.255, p < 0.01; see figure 5).” p23: “in the HS rats, in which decreased Tpcn2 expression levels are associated with increased, as opposed to decreased, fasting glucose”. These two sentences contradict each other. The legend of Figure 5 suggests the contradiction arises from confusion between the qPCR Ct score and the expression level: the legend says “expression of Tpcn2” while the Ct score is plotted (and the Ct score is inversely related to expression). So: is Tpcn2 expression positively or negatively correlated with fasting glucose in the HS rats, and does this contrast with the results from the KO mice? Support from the mouse experiment is very important to the claim made by this article. In addition: why does the Ct score vary between 0 and 10 in Figure 5 panels A, B and C, but between 0 and 150 in panel D?
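The inverse relationship between Ct score and expression, which seems to be at the root of the contradiction, can be made concrete with the standard Livak 2^(-ddCt) calculation (illustrative numbers only, not the authors' data):

```python
# Illustrative qPCR arithmetic only: a higher Ct means MORE amplification cycles
# were needed to reach the detection threshold, i.e. LESS transcript was present.
def relative_expression(ct_target, ct_housekeeping, ct_target_ref, ct_housekeeping_ref):
    """Livak 2^(-ddCt): target expression relative to a reference sample."""
    d_ct = ct_target - ct_housekeeping              # normalise to housekeeping gene
    d_ct_ref = ct_target_ref - ct_housekeeping_ref  # same for the reference sample
    return 2 ** -(d_ct - d_ct_ref)

# Sample whose target Ct is two cycles HIGHER than the reference (same housekeeping Ct):
high_ct = relative_expression(ct_target=26.0, ct_housekeeping=20.0,
                              ct_target_ref=24.0, ct_housekeeping_ref=20.0)
print(high_ct)  # 0.25: two extra cycles => four-fold LOWER expression
```

So a positive correlation between Ct score and a trait is a negative correlation between expression and that trait; the figure legend and the text need to agree on which quantity is plotted.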

    • statistical evidence for differential expression of Tpcn2 and for the correlation between its expression and the individual traits is not very strong (nominal p-values of 0.05 and 0.01, respectively). Because this study uses expression analysis to investigate candidate variants, the relationship between gene expression and phenotypic variation is critical to supporting the authors' claim. It would be worth exploring whether other genes in the region are significantly correlated with each individual trait (e.g. fasting glucose), rather than excluding them because they were not differentially expressed between tolerant and intolerant rats (especially because the authors showed that glucose tolerance is likely to have a complex genetic basis in this region).
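The suggested screen, correlating each region gene's expression with each individual trait and adjusting the nominal p-values for multiplicity, could look roughly like this (synthetic data and placeholder gene/trait names; scipy's pearsonr is assumed for the per-pair test, and one gene is deliberately simulated as correlated so the screen has something to find):

```python
# Sketch of the suggested per-gene screen on synthetic data (placeholder names).
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_rats = 46                                   # sample size from the study summary
traits = {"fasting_glucose": rng.normal(size=n_rats)}
expression = {f"gene_{i}": rng.normal(size=n_rats) for i in range(58)}
# Make one gene genuinely correlated with the trait, for illustration only.
expression["gene_0"] = traits["fasting_glucose"] * 0.8 + rng.normal(scale=0.6, size=n_rats)

results = []
for gene, expr in expression.items():
    for trait, values in traits.items():
        r, p = pearsonr(expr, values)         # nominal per-pair test
        results.append((gene, trait, r, p))

# Nominal p-values should be multiplicity-adjusted, e.g. Bonferroni over all tests.
bonferroni = 0.05 / len(results)
hits = [(g, t, r, p) for g, t, r, p in results if p < bonferroni]
print(hits)
```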

    • in light of the above, the authors should seek additional evidence for their claim. It could come from running a merge analysis, as they mention.

    • because Tpcn2 has a strong cis-eQTL, the correlation between its expression and phenotypic variation may result from linkage between Tpcn2 and a nearby causal gene whose expression affects the phenotypes. This should be addressed or discussed.

    • unless it is further exploited (for example in a merge-type analysis), testing for association between one SNP in the region and the phenotypes/expression levels is a step back from haplotype mapping (performed in previous articles for the phenotypes, and in this article for the association with Tpcn2 levels).

    Minor points:

    • I suggest the authors report the p-value for each test rather than using p < x, with different x in different sections of the paper

    • they should indicate how the subset of 39 SNPs tested for association in humans was chosen. If they were chosen based on rat SNPs: are these positions also variable in humans (e.g. in the 1000 Genomes Project)?

    • mouse KO: why were the heterozygotes not included in the analysis?

    • the authors need to point more clearly to the results in their previous work that suggest particular attention should be given to the F344 allele, if they want to justify their choice of the SNP at position chr1:205,715,459 to test for association.

    • the title “Five independent clusters of glucose and insulin are identified in HS rats” is not clear to me

  • This note tackles an important, interesting and challenging question, that of obtaining significance thresholds for phenotypes mapped in populations with unequal relatedness. The work reported here is of high quality. The results provide interesting insights into the problem, and raise questions that will require further work to be answered.

The simulation study performed for this note shows that, in a population that includes individuals with different levels of relatedness, permutations can provide reliable significance thresholds if the phenotypes are mapped with a statistical model that takes polygenic variation into account. This is demonstrated on phenotypes simulated to have polygenic variation and normally distributed errors. The authors further investigate the robustness of their finding by simulating phenotypes with non-normally distributed errors, and by simulating different relatedness patterns and allelic frequencies.

The demonstration that permutations can provide reliable thresholds when the phenotype is mapped with a model that takes polygenic variation into account is conceptually interesting, and the deduction that the permutations can be analysed with a simple linear model with no random polygenic term is well done too. However, it seems to me that the main interest of this strategy resides in its applicability to real data. Since the authors investigate the robustness of their finding across different scenarios, I believe they share this view. If so, I reiterate and clarify the main comment of my previous review (1) and give a suggestion (2):

(1) Hereafter I will call “larger effect QTLs” QTLs with effect sizes large enough that they are detectable, in this study explaining 2-5% of the phenotypic variation (based on the effect size chosen for the power study). By “larger effect QTLs” I do not mean major QTLs such as the MHC for immune phenotypes.
Confounding from “larger effect QTLs” in structured populations is established (Valdar et al., Segura et al., for example). It is not clear, though, how well mixed models, which primarily account for polygenic variation, correct for the confounding due to correlations with “larger effect QTLs”. Therefore, it is not clear whether permutations will provide reliable significance thresholds for phenotypes with such QTLs. Since phenotypes with “larger effect QTLs” are the type of phenotypes one can hope to identify QTLs for, it is important to consider their case and establish whether the strategy suggested by the authors is appropriate. In my previous review, I requested that the authors, in their study of the type I error rate and power, simulate phenotypes with “larger effect QTLs” in addition to polygenic variation. I acknowledged that this would mean new simulations (possibly with different definitions of true and false positives, type I error rate and power, to accommodate simulation and detection of QTLs on the same chromosome, where confounding is worst). If the authors decided not to undertake this, I suggested they add a comment to warn the reader of the limitations of their study and therefore of the generality of their conclusion. The authors added the following comment in the discussion: “In addition, a major QTL may result in false positives due to uncontrolled confounding between the QTL and a scanning locus. In such a case, incorporating major QTLs as covariates in the model may address this concern (Valdar et al., 2009; Segura et al., 2012)”. This comment addresses the case of a major QTL whose presence would be evident, in which case it could easily be included as a covariate in the model. This was not the point of my review, which might not have been precise enough in terms of effect sizes.
I suggest the comment in the discussion is modified to acknowledge that mixed models might not correct for confounding from “larger effect QTLs” well enough for permutations to provide reliable thresholds. (Please note: “larger effect QTLs” is not a suggestion of phrasing for the note, but I could not find a better one.)

(2) In my previous review I also suggested (minor point) that the authors investigate larger sibships to support their claim that permutations are appropriate under different population structures. The results they provide in their response show that permutations control the type I error rate correctly at 2 out of 3 significance thresholds. An interesting addition to the manuscript is an analysis with different allele frequencies at the founder generation. In that case permutations fail to control the false positive rate appropriately at 2 out of 3 significance thresholds too, and things get worse when the residuals are non-normally distributed. “Fail” here means that the type I error rate is significantly different (p-value < 0.05 for the failures mentioned above) from the expected level. However, it is difficult to get an idea of the magnitude of the problem, because the results are scattered between Table 2S (impact of simulating non-normally distributed residuals), Table 1 (DAF), and the response to my review (sibship size). I suggest the results from the simplest scenario (polygenic variation + normally distributed errors with sigma in {0.7, 1, 1.5}) are reported in one paragraph and in Figure 1, and the results for the alternative scenarios (non-normally distributed errors, DAF, different population structures) are gathered in another paragraph, so that the robustness of the strategy can be easily examined. This suggestion is one of minor reorganization and should not make the note longer. Simplifying Figure 1 by considering only the normally distributed residuals would also make it easier to read (less dense).
For comparison purposes, the standard deviation of the residuals (or the proportion of phenotypic variation explained by the polygenic term) used for Table 1 should be indicated.

Other points raised in my previous review:

- In my previous review (minor point), I also wished to have an idea of the complexity and organization of the relatedness pattern in the population. The authors provided a heat map of the relatedness for 100 animals. However, I could not relate the heat map to the pedigree, since there was no information about which animals were siblings/half-sibs/cousins etc., nor could I see the magnitude of the differences in relatedness, because no scale was provided. The heat map looks quite homogeneously green.

- My point on non-normally distributed phenotypes and their simulation through non-normally distributed residuals is indeed out of the scope of this note, so I won’t discuss it further here.

New important point: The fact that the significance thresholds obtained by parametric bootstrap and unrestricted permutations are very similar to each other and very different from those obtained with the other two methods (when the phenotypes are not mapped with mixed models) is puzzling, as is the fact that the thresholds obtained by parametric bootstrap are very different from those expected (again, when the phenotypes are not mapped with mixed models). I previously wrongly assumed that the model used for bootstrapping was a simple sibship model, rather than one based on the full pedigree. While the results were not too surprising to me under this assumption, they are now. I will explain below why these results are surprising to me: the phenotypes simulated by parametric bootstrap to calculate significance thresholds are simulated in the same way as the phenotypes to be analysed, except for their polygenic component.
Indeed, the polygenic component of the former is drawn from a multivariate normal distribution with covariance matrix G (from the pedigree), while the polygenic component of the latter, as I understand from the supplemental section “Simulation details”, was simulated by adding two components: one drawn from a multivariate normal distribution with covariance matrix G, and one simulated from one in every five markers on chromosomes 11 to 20. The use of both components to simulate polygenic variation is unusual (I believe it was not used in Cheng et al., Genetics 2010), and, in light of the failure of parametric bootstrap to provide adequate thresholds (see below), might not be without consequences. It should therefore be explained. It also follows that the sentence “The phenotype was generated such that polygenic variation approximately accounted for 56%, 46%, or 32% of the total phenotypic variation” (page 2, line 5, main text) is not quite right (this holds only for the first component). The inability of parametric bootstrap to control for false positives therefore seems to reside in the modeling of polygenic variation. The authors should attempt to explain the failure of parametric bootstrap (the sentence “The permutation (or bootstrap) test largely dissolves the confounding” is not satisfactory). I suggest the authors investigate the correlation between genotypic similarity (based on the markers used to simulate polygenic QTLs) and the coefficients of G, so that, if the correlation is poor, the failure of parametric bootstrap can be attributed to poor modeling of the simulated polygenic variation by the coefficients of G (which could be due to the markers not capturing relatedness correctly, or to the coefficients being inaccurate).
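For reference, the parametric-bootstrap step in question, drawing a polygenic component with covariance proportional to a pedigree-derived matrix G, can be sketched as follows (a toy block-diagonal G for two full sibships, not the authors' pedigree, code, or variance estimates):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy additive relationship matrix G for two full sibships of three animals each
# (full sibs: 0.5 off-diagonal, 1 on the diagonal; NOT the real pedigree).
block = np.full((3, 3), 0.5) + 0.5 * np.eye(3)
G = np.block([[block, np.zeros((3, 3))],
              [np.zeros((3, 3)), block]])

sigma_g2 = 0.56   # example polygenic variance (56% of phenotypic variation)
sigma_e2 = 0.44   # residual variance

# Draw the polygenic component u ~ N(0, sigma_g2 * G) via the Cholesky factor of G.
L = np.linalg.cholesky(G)
u = np.sqrt(sigma_g2) * (L @ rng.standard_normal(6))
e = np.sqrt(sigma_e2) * rng.standard_normal(6)
y = u + e          # one bootstrap phenotype under the null (no QTL term)
print(y.round(2))
```

The point of the reviewer's comment is that the analysed phenotypes contain a second polygenic component (simulated from markers) that this single MVN draw does not reproduce, so the bootstrap null may mismatch the data-generating model.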
New minor points:

- The sentence “The take-home message is that if the model is appropriate for a genome-wide scan, we may ignore the random polygenic effect to reduce computation when performing permutation tests to estimate the significance threshold.” is not clear/precise enough. What is meant is: if the phenotype is mapped with a model that appropriately controls for confounding, the permutations can be analysed with a model that has no polygenic random term. Since this interesting point is stated only once, it really needs to be clear, otherwise the reader might miss it through misunderstanding (at least I did on my first study of this note).

- The section “Simulation Details” is not structured or precise enough: the fact that the genotype sets are simulated by gene dropping should come after the choice of the number of chromosomes and markers, and before moving on to the phenotypes. Since it is said that equation (1) is used to generate the phenotypes, it should be indicated whether covariate effects were simulated or not, and indicating when a detectable QTL was simulated (i.e. for the power study but not for the study of the type I error rates) would make things clearer. Finally, it should be stressed that a second component of polygenic variation (in addition to the random term with G as covariance matrix) was added by simulating polygenic QTLs.

- It should be made clear in the main text that power is misleading unless the type I error rate is controlled. Looking at Figure 1 and Table 1, the reader will pay attention to the values for power no matter what the type I error rate is.

- “Computational approximation” part: why “computational”? Also, this part is lengthy in its explanation of why the variance components need to be estimated, and the important point that the variance components are estimated in the null model does not stand out. This paragraph would sit best in context at the end of the “Statistical Model” part.
- Bootstrap part of the supplemental material: it is not clear what model is fitted to estimate the variance components when the phenotypes studied are simulated with a detectable QTL (power study): is the detectable QTL part of the model or not? The phrase “Under the hypothesis of no QTL” is ambiguous. Moreover, I do not understand the following sentence: “We can generate a bootstrap sample in a similar way when polygenic variation is ignored”.

- I believe the “pooling procedure” part is much less important/interesting than other points that could be detailed in the supplemental material, and so could be omitted if space were an issue there as well.

- Page 2, line 17: “heritability of the QTL” should be changed to “proportion of phenotypic variation explained”, as discussed with reviewer #2.

- Page 3, line 1: “different allele (A/a) frequencies at the founder generation: 3/1 in F26 vs 1/3 in F34”. At the founder generation, or at F26 and F34?

- Supplemental material, page 3, Permutation tests: at the end of paragraph 2, it could be said again that unrestricted permutations were used for this note.

- It could be indicated in the title or legend of Figure 1 that it corresponds to simulations with DAF.

- Figure 1: “relatedness (not) ignored” instead of “relateness (not) ignored”.
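The note's take-home message, that the permuted scans can be analysed with a simple linear model carrying no random polygenic term, can be illustrated with a max-statistic permutation threshold on toy data (made-up genotypes and a null phenotype; a sketch of the general technique, not the authors' implementation):

```python
# Genome-wide permutation threshold via the max statistic, scored with a
# simple linear scan (toy data; no mixed-model machinery in the permuted scans).
import numpy as np

rng = np.random.default_rng(2)
n, n_loci, n_perm = 200, 50, 200

genotypes = rng.integers(0, 3, size=(n_loci, n))   # 0/1/2 allele counts per locus
y = rng.normal(size=n)                             # null phenotype (no QTL)

def max_abs_corr(pheno, genos):
    """Largest |correlation| between the phenotype and any locus in the scan."""
    g = genos - genos.mean(axis=1, keepdims=True)
    yc = pheno - pheno.mean()
    r = g @ yc / (np.linalg.norm(g, axis=1) * np.linalg.norm(yc))
    return np.abs(r).max()

# Null distribution of the genome-wide maximum under phenotype permutation.
null_max = np.array([max_abs_corr(rng.permutation(y), genotypes)
                     for _ in range(n_perm)])
threshold = np.quantile(null_max, 0.95)    # 5% genome-wide significance threshold
print(round(float(threshold), 3))
```

In the note's setting the observed scan would use the mixed model; the sketch only shows why scoring the permuted datasets with a plain linear scan keeps the procedure cheap.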
