• Thermal Shift: Destabilization could also be an Outcome

    In their report “Monitoring Drug Target Engagement in Cells and Tissues Using the Cellular Thermal Shift Assay” (5 July, p. 84), D. M. Molina et al. describe a novel assay that takes advantage of the increased thermostability of proteins upon ligand binding in intact cells. Notably, formation of a protein-ligand complex in a simple solution might increase thermostability for some protein-ligand pairs but, on the contrary, might have a destabilizing effect for others. A thermodynamic model has been described that explains these two experimentally observed possibilities by preferential ligand binding to the native (more stable) protein conformation or to the unfolded (less stable) protein state, respectively [1]. An additional level of complexity arises in the context of the intact cell-based assay developed by Molina et al.: ligand binding-induced protein degradation, as described for example for the nuclear receptors RXR [2] and PPARgamma [3], is an often observed phenomenon that likely constitutes a physiologically relevant negative-feedback mechanism for eliminating some receptor proteins immediately after ligand-induced signalling has been initiated. Last but not least, since Molina et al. recommend the newly developed method as a tool for drug discovery, it should be pointed out that, when a particular protein target needs to be blocked, it could be advantageous to discover ligands that have a destabilizing rather than a stabilizing effect on that target. The benefit of such selective target-destabilizing action is demonstrated, for example, by ICI-182,780 (Faslodex), a dual-action estrogen receptor antagonist and destabilizer used for breast cancer treatment [4].


    [1] P. Cimmperman et al., Biophys J 95, 3222 (2008); [2] D. L. Osburn et al., Mol Cell Biol 21, 4909 (2001); [3] S. Hauser et al., J Biol Chem 275, 18527 (2000); [4] M. Fan et al., Mol Endocrinol 17, 356 (2003).

  • Referee 1:

    This paper presents an interesting and important result in the field of protein sequence analysis and classification. Since Chothia presented his result that the majority of proteins fall into no more than 1000 families, groups around the world have been trying to collect them in databases of sequence profiles. Recently, several studies have reported apparently unbounded growth in the number of protein families as the number of sequences grows. This report reconciles these results with the rather slower growth of the protein family databases. The major result, in my view, is that the number of single-domain families is largely saturated while the number of multi-domain families is effectively unbounded.

    I do have some specific concerns about the methods that I will outline below. If these concerns can be addressed then I think this is an important result.

    Major points

    What the authors describe as single domain architectures may possess more than one domain. Many of the CDART profiles are based on a sequence definition of a protein family that is actually a multi-domain protein. While I agree that the presentation as it stands is simple, the authors need to make this point clear.

    I found the statement that 1 in every 64 deposited sequences represents a new MDA family extremely worrying. More than anything else in this paper, this number makes me worry that the authors have something fundamentally wrong with their analysis. Later the authors mention that the majority of MDAs come from eukaryotes. My concern is that this may be due to a number of factors that confound the matching of sequences to profiles. I will focus on this issue in the following paragraphs.

    Lack of sensitivity of profiles (false negatives)

    Profile methods are great at finding distant matches to proteins. However, for very large families such as Ig domains or P-loop hydrolases, there are many cases where, for highly related proteins (>90% identity), a single profile will predict the occurrence of a domain in one but not the other. This effect is quite well known, and in works such as that by Bornberg-Bauer and Elofsson an approach called refinement is used to try to address these issues (Moore et al TiBS 2008). I suspect that the large number of MDAs might be attributable to annotation artefacts of the profile assignment, particularly for large modular proteins with 3+ domains.

    Poor gene predictions

    Many of the eukaryotic genomes available have very poorly predicted gene sets. For example, the protein predictions for Vitis vinifera, the wine grape, are riddled with genes incorrectly run together (causing MDAs) and missing exons (causing false variation in MDAs).

    False positives

    Using a blanket E-value of 0.01 for RPS-BLAST will give a large number of false positive hits. With 30,000 profile searches we might statistically expect 300 false positives, but in my experience this number will be much higher, because any large profile collection contains some pathological profiles that give large numbers of false positives. One could run the entire library against a randomized NR-like sequence database to get a feel for what the false positive rates might be.
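
The estimate and the suggested control can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the expectation treats the E-value threshold as the expected number of chance hits per search, and residue shuffling is just one simple way to build a randomized database that preserves the length and composition of each sequence. It is not the authors' pipeline.

```python
import random
from typing import Dict


def expected_false_positives(n_searches: int, evalue_threshold: float) -> float:
    """Expected chance hits if each search contributes `evalue_threshold` on average."""
    return n_searches * evalue_threshold


def shuffle_sequence(seq: str, rng: random.Random) -> str:
    """Return a residue-shuffled copy of `seq` (same length, same composition)."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def shuffled_database(records: Dict[str, str], seed: int = 0) -> Dict[str, str]:
    """Build a randomized decoy database from {name: sequence} records.

    Any profile hit against these decoys is, by construction, a false positive.
    """
    rng = random.Random(seed)
    return {
        f"shuffled_{name}": shuffle_sequence(seq, rng)
        for name, seq in records.items()
    }


if __name__ == "__main__":
    # The figure quoted above: 30,000 searches at an E-value threshold of 0.01.
    print(expected_false_positives(30_000, 0.01))

    # Toy sequences, invented purely for illustration.
    toy = {"seqA": "MKTAYIAKQRQISFVKSHFSRQ", "seqB": "GSHMLEDPVDAFKLAGG"}
    for name, seq in shuffled_database(toy, seed=1).items():
        print(name, seq)
```

Counting the hits per profile against such a decoy set would show which profiles are pathological, rather than relying on a single library-wide expectation.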

    In the supplementary materials section I do not understand the rationale for the SCORE. In its printed form, the E-value makes essentially no contribution to the equation, which is dominated by the length of the match. As far as I can tell, if you use a 0.01 E-value threshold, the term *eval/100 can never contribute more than 0.0001 to the score. So longer (multi-domain) matches will always win, even if only 1 residue longer. Very strange! This would mean that MDAs might be called as SDAs too often.
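
The arithmetic behind this concern can be checked with a small sketch. The scoring function below is hypothetical: the manuscript's SCORE formula is not reproduced here, and I simply assume a length term plus an E-value term bounded by threshold/100, which is how I read the printed form.

```python
E_VALUE_THRESHOLD = 0.01


def max_evalue_term(threshold: float = E_VALUE_THRESHOLD) -> float:
    """Upper bound on an eval/100 term when every accepted hit has eval <= threshold."""
    return threshold / 100


def hypothetical_score(match_length: int, evalue: float) -> float:
    """HYPOTHETICAL stand-in for the SCORE: match length minus a small E-value penalty.

    This is an assumed form for illustration only, not the authors' equation.
    """
    return match_length - evalue / 100


# A 100-residue match with an extremely significant E-value ...
short_strong = hypothetical_score(100, 1e-50)
# ... against a match one residue longer that only just passes the threshold.
long_weak = hypothetical_score(101, E_VALUE_THRESHOLD)

print(max_evalue_term())
print(long_weak > short_strong)
```

Under this assumed form, the E-value term is at most about 0.0001, so it cannot offset a length difference of even one residue.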

    Minor points

    "The number of family's" should be changed to "The number of families" on page 1.

    In figure 2, I cannot differentiate the symbols used for the different plots. Presentation should be reconsidered.

    In the section on Relations to earlier work, I would suggest removing the part on whole sequence matching methods. Almost every analysis has used local search methods such as BLAST, and these are not fooled by the A-B- to B-A- case raised by the authors.

    "Dark matter" first appears on line 10 of page 2 without any explanation to the reader. Dark matter is not properly described until page 3.

    I don't understand the final paragraph of the Dark Matter section. I find it difficult to believe that 43% of the pre-1997 sequences are still uncharacterized. Probably this needs to be explained more clearly. Perhaps you mean 43% of the dark matter sequences from pre-1997 have been characterized since that date.

    In table S1 I think the term "number of sequence in PDB" is misleading. Perhaps this should be described as "number of sequences similar to a PDB".
