Imagine a forest, with trees forming a canopy and herbs growing in the understory. The amount of light passing through the canopy is limited, and herbs may need to compete for light by growing taller. One may ask: is the height of plants in the understory related to the amount of light passing through the canopy? Such a question is an example of a common task in community ecology: relating the attributes of species in the community (e.g. plant height) to the attributes of the sites at which the community occurs (e.g. the amount of light). One way to do this is the community weighted mean (CWM) approach, in which the values of species attributes are averaged across the species occurring in the community at each site (weighted or not by species abundance), and these means are then related to site attributes, e.g. by correlation or regression. In the forest example above, this amounts to averaging the heights of the species present at each site and relating this mean to the amount of light passing through the canopy (see the figure above).
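The calculation described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the data are invented for the forest example (four sites, three herb species), and the names community_weighted_mean, pearson, abundances, height_cm and light are hypothetical, not taken from any particular software.

```python
from math import sqrt

def community_weighted_mean(abundances, trait):
    """Abundance-weighted mean of a species attribute, one value per site.

    abundances: list of rows (one per site) holding species abundances.
    trait: list of species attribute values, in the same species order.
    """
    return [
        sum(a * t for a, t in zip(row, trait)) / sum(row)
        for row in abundances
    ]

def pearson(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: rows are sites, columns are herb species.
abundances = [
    [5, 1, 0],
    [3, 3, 1],
    [1, 4, 3],
    [0, 2, 6],
]
height_cm = [10.0, 30.0, 60.0]     # species attribute: plant height
light = [0.40, 0.30, 0.20, 0.10]   # site attribute: fraction of light passing the canopy

cwm_height = community_weighted_mean(abundances, height_cm)
r = pearson(cwm_height, light)
print(cwm_height)
print(r)
```

Replacing the abundances by 1 (present) and 0 (absent) gives the unweighted variant, i.e. the plain mean height of the species present at each site.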
Although the CWM approach has been, and still is, widely used across a broad range of ecological disciplines, only relatively recently did it become clear that it suffers from a statistical problem. If tested by standard parametric (or analogous permutation) tests, the CWM approach returns results that tend to be overly optimistic, i.e. more significant than is warranted by the data. This does not sound like good news: the scientific literature is flooded with reports of species-site relationships, some of which are mere artefacts. However, are all studies affected, and how serious is the problem for those that are?
In this study, I suggest that whether results based on the CWM approach are overly optimistic depends very much on the exact formulation of the tested hypothesis. Species attributes are related to site attributes via the species composition of the community at each site. Testing the relationship therefore involves two questions at the same time: is the species composition related to site attributes, and is the species composition related to species attributes? If one of these relationships is known or can be assumed to exist, only the other needs to be tested. Three categories of hypotheses tested by the CWM approach can therefore be distinguished; two of them return overly optimistic results if tested in the standard way, while one does not. This distinction is essential for evaluating how far the results of published studies can be trusted. I also show that if a study falls into a category where overly optimistic results should be expected, how optimistic the results are depends on the properties of the data, mainly on how much the species composition differs among communities.
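To make the two questions more concrete, here is a hedged sketch of how each link could be probed by permutation. It is an assumption made for illustration, not a description of the tests used in the study: shuffling the site attribute among sites breaks its link to species composition, while shuffling the species attribute among species breaks the link between species attributes and composition. All names (cwm, pearson, permutation_p) and all data are hypothetical.

```python
import random
from math import sqrt

def cwm(abundances, trait):
    """Abundance-weighted mean of a species attribute for each site."""
    return [sum(a * t for a, t in zip(row, trait)) / sum(row) for row in abundances]

def pearson(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def permutation_p(abundances, trait, site_attr, shuffle, n_perm=999, seed=1):
    """Two-sided permutation p-value for the CWM vs. site attribute correlation.

    shuffle="sites":   permute the site attribute among sites.
    shuffle="species": permute the species attribute among species.
    """
    if shuffle not in ("sites", "species"):
        raise ValueError("shuffle must be 'sites' or 'species'")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    r_obs = abs(pearson(cwm(abundances, trait), site_attr))
    hits = 0
    for _ in range(n_perm):
        if shuffle == "sites":
            r = pearson(cwm(abundances, trait), rng.sample(site_attr, len(site_attr)))
        else:
            r = pearson(cwm(abundances, rng.sample(trait, len(trait))), site_attr)
        if abs(r) >= r_obs:
            hits += 1
    # The observed arrangement counts as one of the permutations.
    return (hits + 1) / (n_perm + 1)

# Hypothetical data: 6 sites (rows) x 5 herb species (columns).
abundances = [
    [6, 3, 1, 0, 0],
    [4, 4, 2, 0, 0],
    [2, 4, 3, 1, 0],
    [1, 2, 4, 3, 1],
    [0, 1, 3, 4, 3],
    [0, 0, 1, 4, 6],
]
height_cm = [8.0, 15.0, 25.0, 40.0, 60.0]      # species attribute
light = [0.45, 0.38, 0.30, 0.22, 0.15, 0.08]   # site attribute

p_sites = permutation_p(abundances, height_cm, light, shuffle="sites")
p_species = permutation_p(abundances, height_cm, light, shuffle="species")
print(p_sites, p_species)
```

With so few sites and species the number of distinct permutations is small (720 and 120 respectively), so the p-values from this toy example are coarse; the point is only to show that the two schemes ask different questions of the same data.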
When using the results of studies based on the CWM approach, one needs to pay attention to which hypothesis the authors were testing and which test they used for it, and apply the guidelines introduced here to get an informed estimate of how reliable the results of the study are.
See also the related post "Five years with community weighted mean" at blog.davidzeleny.net.