- Research
- Open access
On the use of prior distributions in bayesian inference applied to Ecology: an ecological example using binomial proportions in exotic plants, Central Chile
Revista Chilena de Historia Natural volume 96, Article number: 6 (2023)
Abstract
Background
The use of Bayesian inference (BI) is a common methodology for data analysis in Ecology and Evolution. This statistical approach is particularly useful when information is scarce, because it allows the formalization of sources of information other than sampling data (priors), such as technical reports, expert opinion and beliefs. Recent reviews found that most ecological studies use non-informative priors without any justification, ignoring other sources of independent information available to construct informative priors. In this study, we examined how the choice of informative or non-informative priors affects hypothesis testing. We compared the proportion of occupied sites (occupancy) of four exotic plant species living in two contrasting environments in Central Chile. Given that occupancy is related to binomial proportions, we developed a statistical procedure based on the beta distribution to compare occupancies using the Bayes factor.
Results
Bayes factors obtained from different non-informative priors led to similar inferences about H0. Using informative priors drastically changed our decisions about H0 in three of the four plant species.
Conclusions
The selection of priors is critical because priors can determine the outcome of hypothesis testing. The use of independent information will improve our inferences, which is precisely the strength of BI. We hypothesize that the reluctance to use informative priors in ecological studies reflects extreme positivism, and that the use of non-informative priors is a strategy to avoid subjectivity; by doing so, ecologists depart from the philosophy of BI, which accepts subjective knowledge as a valid, and sometimes the only, way to know the world.
Background
Bayesian Inference (BI) is increasingly recognized as a useful statistical method in Ecology and Evolution [9, 16]. BI differs from Frequentist Inference (FI) in that BI integrates sampling data with independent information obtained from other sources. BI is particularly appropriate for Conservation Biology and Restoration Ecology because of the data deficiencies that currently exist for endangered species and ecosystems [18]; in these cases, statistical methods that consider other sources of information are strongly recommended to anticipate appropriate conservation decisions [2].
Two kinds of functions are required to conduct BI: (i) the likelihood function, which represents the sampling data, and (ii) a prior distribution, which represents independent information. In BI, the parameters of interest are random variables whose probabilities are conditioned on the data; in FI, parameters are fixed while data are random variables. Multiplying the likelihood function by the prior distribution, and normalizing the product so that it integrates to one, yields the posterior distribution, \(f\left(\theta |data\right)\), which represents updated knowledge of the probabilistic properties of the parameters.
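The multiply-and-normalize step can be illustrated for a single occupancy \(\theta\). The minimal sketch below (Python, standard library only) evaluates likelihood × prior on a grid, normalizes it, and checks the resulting posterior mean against the closed-form Beta(a + s, b + n − s) posterior. The counts (7 occupied plots out of 20) and the function names are hypothetical illustrations, not data from this study.

```python
def grid_posterior(successes, trials, a, b, n_grid=2001):
    """Grid approximation: posterior is proportional to likelihood x prior, then normalized."""
    # Midpoints of n_grid equal cells avoid evaluating 0**(negative) when a or b < 1.
    thetas = [(i + 0.5) / n_grid for i in range(n_grid)]
    unnormalized = []
    for t in thetas:
        likelihood = t**successes * (1 - t)**(trials - successes)  # Bernoulli/binomial kernel
        prior = t**(a - 1) * (1 - t)**(b - 1)                      # unnormalized Beta(a, b) density
        unnormalized.append(likelihood * prior)
    total = sum(unnormalized)
    weights = [u / total for u in unnormalized]  # normalization: weights sum to 1
    return thetas, weights


def grid_mean(thetas, weights):
    """Posterior mean from grid points and their normalized weights."""
    return sum(t * w for t, w in zip(thetas, weights))


# Hypothetical data: 7 occupied plots out of 20, uniform Beta(1, 1) prior.
thetas, weights = grid_posterior(successes=7, trials=20, a=1, b=1)
print(round(grid_mean(thetas, weights), 3))  # → 0.364, the mean 8/22 of the conjugate Beta(8, 14)
```

With a beta prior the grid is not needed, because the product stays in the beta family; the grid version only makes the "likelihood × prior, then normalize" logic visible.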
Another difference between BI and FI lies in hypothesis testing. BI enables the calculation of the probability of different hypotheses \({H}_{i}\) given the available data, \(P\left({H}_{i}|data\right)\), whereas FI evaluates the probability of the data given a hypothesis, \(P(data|{ H}_{i})\) [9, 28]. BI uses these probabilities to decide in favor of or against hypotheses [17]; in particular, the Bayes Factor (BF) quantifies the evidence for one hypothesis relative to another. BI is conceptually supported by Bayes' theorem, a basic result of conditional probability theory [20].
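For two competing hypotheses, Bayes' theorem gives the posterior probability of each one; dividing the two expressions yields the standard identity linking prior odds, posterior odds and the Bayes factor. This is a textbook result added for reference, assuming \(H_0\) and \(H_1\) are the only hypotheses under consideration:

```latex
\[
P(H_i \mid data) = \frac{P(data \mid H_i)\,P(H_i)}{P(data \mid H_0)\,P(H_0) + P(data \mid H_1)\,P(H_1)},
\qquad i = 0, 1
\]
\[
\underbrace{\frac{P(H_0 \mid data)}{P(H_1 \mid data)}}_{\text{posterior odds}}
= \underbrace{\frac{P(data \mid H_0)}{P(data \mid H_1)}}_{BF_{01}}
\times \underbrace{\frac{P(H_0)}{P(H_1)}}_{\text{prior odds}}
\]
```

The posterior odds therefore equal \(BF_{01}\) only when the prior odds are 1.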
The use of independent information constitutes the foundation of BI [27]; prior distributions summarize this independent information, and their investigation is a critical part of BI research [27]. Prior distributions are classified as informative when they reflect some knowledge about the parameters of interest that, properly analyzed, allows us to estimate the mean, median or mode as well as the standard deviation of those parameters [3]. Non-informative priors, on the other hand, represent situations in which we have no previous knowledge about the parameters of interest. Among the suite of non-informative prior distributions, the uniform distribution is one of the most widely used; it is an extreme case in which the absence of knowledge about the parameter of interest is expressed by assigning equal probability density to its whole range [3, 15, 19].
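When independent information yields an estimate of the mean and standard deviation of a proportion, one common way to turn it into beta hyperparameters is the method of moments. This is an assumption for illustration, not necessarily the procedure used in the studies cited: it uses mean = a/(a + b) and variance = mean(1 − mean)/(a + b + 1). The function name `beta_from_mean_sd` is hypothetical.

```python
import math


def beta_from_mean_sd(mean, sd):
    """Method-of-moments Beta(a, b) whose mean and standard deviation match the inputs."""
    if not 0 < mean < 1:
        raise ValueError("mean must lie strictly between 0 and 1")
    var = sd ** 2
    max_var = mean * (1 - mean)  # a beta variance is always below mean * (1 - mean)
    if not 0 < var < max_var:
        raise ValueError("sd is incompatible with a beta distribution for this mean")
    concentration = max_var / var - 1  # this is a + b
    return mean * concentration, (1 - mean) * concentration


# Example: mean 0.5 and variance 0.05 correspond to Beta(2, 2).
a, b = beta_from_mean_sd(0.5, math.sqrt(0.05))
print(round(a, 6), round(b, 6))  # → 2.0 2.0
```

A larger elicited standard deviation gives a smaller a + b, i.e., a flatter, less informative prior.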
A recent review of the use of BI in Ecology reported that prior distributions are mostly used without further justification [3]. Moreover, only 9% of studies published in five influential ecological journals between 2014 and 2018 (n = 187) used informative priors [19]. This situation is noteworthy because the outcome of hypothesis testing can change drastically depending on the choice of prior distributions [12]. The usual reason given for using non-informative priors in Ecology is the absence of independent information. We contend that independent information available in technical reports and from expert opinion should be used more often to construct informative priors [21].
One way to evaluate the impact of prior selection in BI is through sensitivity analysis, that is, by comparing the effects of different priors on the inference process [7]. The Bayes factor (BF) is a useful tool for sensitivity analysis because it allows the comparison of results obtained from different priors [30]. Briefly, \(BF_{01}\) is the ratio of the posterior odds of \(H_0\) against \(H_1\), \(P(H_0|data)/P(H_1|data)\), to their prior odds, \(P(H_0)/P(H_1)\); equivalently, it is the ratio of the marginal likelihoods \(P(data|H_0)/P(data|H_1)\), and it equals the posterior odds only when both hypotheses are equally probable a priori. If \(BF_{01}\) is greater than 1, the evidence supports \(H_0\); if it is lower than 1, the evidence supports \(H_1\). The BF is therefore an appropriate tool to test whether different priors change our decisions during hypothesis testing [19].
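Following the odds-ratio definition of the Bayes factor used in the Methods, \(BF_{01}\) can be computed from the posterior and prior probabilities of \(H_0\) alone, because \(P(H_1) = 1 - P(H_0)\) when the two hypotheses are complementary. A minimal sketch, assuming complementary hypotheses; the function name and the probabilities are hypothetical:

```python
def bayes_factor_01(post_h0, prior_h0):
    """BF01 = posterior odds of H0 divided by prior odds of H0 (H0 and H1 complementary)."""
    for p in (post_h0, prior_h0):
        if not 0 < p < 1:
            raise ValueError("probabilities must lie strictly between 0 and 1")
    posterior_odds = post_h0 / (1 - post_h0)
    prior_odds = prior_h0 / (1 - prior_h0)
    return posterior_odds / prior_odds


bf01 = bayes_factor_01(post_h0=0.8, prior_h0=0.5)
bf10 = 1 / bf01  # the reciprocal reports the same evidence in favour of H1
print(round(bf01, 3), round(bf10, 3))  # → 4.0 0.25

# Same posterior probability, but H0 was already favoured a priori: the data add nothing.
print(round(bayes_factor_01(post_h0=0.8, prior_h0=0.8), 3))  # → 1.0
```

The second call shows why the ratio of posterior probabilities alone is not the Bayes factor: the prior odds must be divided out.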
In this study, we aimed to compare BFs obtained with informative vs. non-informative priors. We hypothesized that decisions in hypothesis testing would differ depending on whether informative or non-informative priors were used. As an ecological example, we compared occupancy (i.e., the fraction of occupied sites in a region) of four exotic plant species living in two contrasting environments in Central Chile: the Coast and the Central Valley. Given that occupancy is essentially a binomial proportion [8, 29], we used the beta distribution to construct a BF that allowed us to compare species occupancy between the Coast and the Central Valley.
Methods
Conceptual background
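Let us consider two vectors of independent random variables, \(\underline{X}=(X_1,\ldots,X_{n_1})\) and \(\underline{Y}=(Y_1,\ldots,Y_{n_2})\), which record the presence (1) or absence (0) of a species in the \(n_1\) plots of one environment and the \(n_2\) plots of the other, such that
\[X_i \sim \mathrm{Bernoulli}(\theta_1),\quad i=1,\ldots,n_1; \qquad Y_j \sim \mathrm{Bernoulli}(\theta_2),\quad j=1,\ldots,n_2.\]
Let us define the joint vector of parameters \(\underline\theta=\left(\theta_1,\theta_2\right)\in{\lbrack0,1\rbrack}^2\). If \(\underline{X}\) and \(\underline{Y}\) are independent, then the likelihood function is given by
\[L(\underline\theta \mid data)=\prod_{i=1}^{n_1}\theta_1^{x_i}(1-\theta_1)^{1-x_i}\prod_{j=1}^{n_2}\theta_2^{y_j}(1-\theta_2)^{1-y_j}=\theta_1^{s_1}(1-\theta_1)^{n_1-s_1}\,\theta_2^{s_2}(1-\theta_2)^{n_2-s_2},\]
where \(s_1=\sum_{i}x_i\) and \(s_2=\sum_{j}y_j\) are the numbers of occupied plots.
Under the assumption that \({\theta }_{1}\) and \({\theta }_{2}\) are independent, the prior bivariate distribution is the product
\[\pi(\underline\theta)=\pi_1(\theta_1)\,\pi_2(\theta_2).\]
In our case, we assume that the prior marginal distributions of \({\theta }_{k}\), with \(k = 1, 2\), are beta distributions:
\[\pi_k(\theta_k)=Beta(\theta_k \mid a_k,b_k)=\frac{\theta_k^{a_k-1}(1-\theta_k)^{b_k-1}}{B(a_k,b_k)},\qquad k=1,2,\]
where \(B(a,b)\) is the beta function. These are informative beta priors when the hyperparameters \(a_k\) and \(b_k\) are elicited from other sources of information; for comparison, we can use non-informative priors that are special cases (or, for Haldane, a limiting case) of the beta family, which is conjugate to the Bernoulli likelihood: Uniform (Beta 1, 1), Jeffreys (Beta 0.5, 0.5) and Haldane (Beta 0, 0).
The posterior bivariate distribution is given by the product of two univariate beta distributions:
\[f(\underline\theta \mid data)=Beta(\theta_1 \mid a_1+s_1,\,b_1+n_1-s_1)\;Beta(\theta_2 \mid a_2+s_2,\,b_2+n_2-s_2), \tag{3}\]
where \(Beta(\theta |a,b)\) is the univariate density function of a beta distribution with parameters \(a\) and \(b\).
For the purposes of our study, we proposed statistical hypotheses \(H_0\) and \(H_1\) comparing the occupancies \(\theta_1\) and \(\theta_2\) of each species; below, \(\Theta_0\) and \(\Theta_1=[0,1]^2\setminus\Theta_0\) denote the regions of the parameter space specified by \(H_0\) and \(H_1\), respectively.
Then the Bayes factor is defined as an odds ratio:
\[BF_{01}=\frac{P(H_0\mid data)/P(H_1\mid data)}{P(H_0)/P(H_1)}=\frac{P(H_0\mid data)\,\left[1-P(H_0)\right]}{\left[1-P(H_0\mid data)\right]P(H_0)}. \tag{4}\]
The posterior probability of \({H}_{0}\) is given by
\[P(H_0\mid data)=\iint_{\Theta_0} f(\underline\theta \mid data)\,d\theta_1\,d\theta_2, \tag{5}\]
and the prior probability of \({H}_{0}\) is
\[P(H_0)=\iint_{\Theta_0}\pi(\underline\theta)\,d\theta_1\,d\theta_2. \tag{6}\]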
Likelihood functions
We used presence/absence data of four exotic leguminous plants: Acacia dealbata, Cytisus striatus, Teline monspessulana and Ulex europaeus. These data were recorded from 30° to 43° S using two transects, one along the Coast and the other in the Central Valley. Along each transect, we placed plots (2 × 50 m) on the verges of secondary or tertiary roads subject to little management; plots were located every 10 km, for a total of 264 plots (132 per transect). This information allowed us to estimate the occupancy of each species at the regional level for both the Coast and the Central Valley. For hypothesis testing, we compared occupancy between the Coast and the Central Valley for each species. We expected exotic species to perform better at the Coast than in the Central Valley, because the oceanic influence at the Coast reduces temperature variation and increases air humidity relative to the Central Valley (the basic information gathered to construct the likelihood function is in Additional file 1: Appendix section).
To construct the informative priors, we used an independent study [10] that followed a protocol similar to our field work. Briefly, its authors collected presence/absence data for the four species at the Coast and in the Central Valley of the Biobío and La Araucanía Regions, south-central Chile, between 36° 35′ and 38° 25′ S. They placed 109 plots along four of the main highways crossing the study area, systematically spaced approximately 5 km apart. We assigned these plots to the Coast or the Central Valley using the coastal mountain range as the dividing line: plots from the highest altitude toward the ocean were assigned to the Coast, and plots from the highest altitude eastward were assigned to the Central Valley.
Given the nature of the state variable (occupancy), the informative priors are beta distributions (the basic information gathered to construct the informative prior distributions is in Additional file 1: Appendix section). For the non-informative priors, we used three distributions: Uniform (Beta 1, 1), Jeffreys (Beta 0.5, 0.5) and an approximation of the Haldane distribution (Beta 0.001, 0.001). Of these, only the Haldane prior itself, Beta(0, 0), is improper, i.e., its integral over [0, 1] is infinite; the Jeffreys prior is proper (its integral is \(B(0.5, 0.5) = \pi\)), and so is the Beta(0.001, 0.001) approximation, although nearly all of its mass lies close to 0 and 1. Impropriety is one of the problems with non-informative priors, because in many cases they are not probability distributions [12]; working with improper priors can lead to the marginalization paradox [6], and the Bayes factor can become ill-defined because the prior probabilities of \({H}_{0}\) and \({H}_{1}\) are infinite. This is another argument for caution in the selection of priors. Even so, improper priors can still be useful if the resulting posterior distributions are proper [31].
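Whether a Beta(a, b) prior is proper can be checked from its normalizing constant, the beta function \(B(a,b) = \Gamma(a)\Gamma(b)/\Gamma(a+b)\), which is finite exactly when a > 0 and b > 0. A minimal sketch using only Python's `math.lgamma`; the function name `beta_normalizer` is hypothetical:

```python
import math


def beta_normalizer(a, b):
    """B(a, b) = Γ(a)Γ(b)/Γ(a + b), the integral of θ^(a-1)(1-θ)^(b-1) over [0, 1]."""
    if a <= 0 or b <= 0:
        raise ValueError("the integral diverges unless a > 0 and b > 0 (e.g. Haldane's Beta(0, 0))")
    # Work on the log scale so that very small a, b do not overflow intermediate values.
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


for label, a, b in [("Uniform", 1.0, 1.0), ("Jeffreys", 0.5, 0.5), ("Haldane approx.", 0.001, 0.001)]:
    print(f"{label:16s} B({a}, {b}) = {beta_normalizer(a, b):.4f}")

# As a = b -> 0 the normalizer grows roughly like 2/a, so the Haldane limit Beta(0, 0) is improper.
for eps in (1e-2, 1e-4, 1e-6):
    print(eps, beta_normalizer(eps, eps))
```

The Uniform and Jeffreys normalizers are 1 and π, and the Beta(0.001, 0.001) normalizer is roughly 2000: large but finite, so that approximation is a proper, if extreme, prior.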
To obtain the posterior distributions, we multiplied the likelihood function by each of the specified priors, using Eq. (3) (see above). Finally, to obtain the Bayes factor \({BF}_{01}\), we applied Eqs. (4) to (6) (see above). Following convention [17], results are reported as \({BF}_{10} = 1/{BF}_{01}\), i.e., with the evidence for \({H}_{1}\) in the numerator and the evidence for \({H}_{0}\) in the denominator.
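With beta priors and posteriors, the prior and posterior probabilities of \(H_0\) are double integrals over a region of \([0,1]^2\). One simple way to approximate them, shown here as an assumed alternative to exact integration and not necessarily the authors' computation, is Monte Carlo sampling from the two independent beta distributions. The sketch assumes the hypothetical pair \(H_0: \theta_1 \le \theta_2\) vs \(H_1: \theta_1 > \theta_2\); the function names and the counts (12 of 30 vs 8 of 30 occupied plots) are illustrative only.

```python
import random


def prob_h0(a1, b1, a2, b2, draws=100_000, seed=1):
    """Monte Carlo estimate of P(theta1 <= theta2) for independent
    theta1 ~ Beta(a1, b1) and theta2 ~ Beta(a2, b2)."""
    rng = random.Random(seed)
    hits = sum(rng.betavariate(a1, b1) <= rng.betavariate(a2, b2) for _ in range(draws))
    return hits / draws


def bf10_one_sided(s1, n1, s2, n2, prior1=(1.0, 1.0), prior2=(1.0, 1.0), draws=100_000, seed=1):
    """BF10 for the hypothetical H0: theta1 <= theta2 vs H1: theta1 > theta2,
    given s1 of n1 and s2 of n2 occupied plots and independent beta priors."""
    (a1, b1), (a2, b2) = prior1, prior2
    prior_p0 = prob_h0(a1, b1, a2, b2, draws, seed)
    # Conjugate update of each marginal: Beta(a, b) -> Beta(a + s, b + n - s)
    post_p0 = prob_h0(a1 + s1, b1 + n1 - s1, a2 + s2, b2 + n2 - s2, draws, seed)
    if not (0 < prior_p0 < 1 and 0 < post_p0 < 1):
        raise ValueError("Monte Carlo estimate hit 0 or 1; increase draws")
    # BF10 = posterior odds of H1 divided by prior odds of H1
    return ((1 - post_p0) / post_p0) / ((1 - prior_p0) / prior_p0)


print(round(bf10_one_sided(s1=12, n1=30, s2=8, n2=30), 1))
```

Very small hyperparameters such as Beta(0.001, 0.001) make many draws numerically equal to 0 or 1, which can bias the comparison; for such priors, exact numerical integration of the beta densities is safer.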
Results
In three species (Acacia dealbata, Cytisus striatus and Teline monspessulana), the posterior distributions constructed from informative priors were quite different from those constructed from non-informative priors (Figs. 1 and 2; Table 1). For instance, the means of the posteriors constructed from informative priors were more than 10% lower than the means obtained from non-informative priors. The exception was Ulex europaeus, for which the posterior distributions constructed from informative and non-informative priors were quite similar (Figs. 1 and 2; Table 1).
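The way a prior shifts the posterior mean can be seen from the conjugate formula (a + s)/(a + b + n). The sketch below compares the three non-informative priors used here with a hypothetical informative prior, Beta(4, 36) (prior mean 0.1), for hypothetical counts of 30 occupied plots out of 132; it illustrates the mechanism and does not reproduce Table 1.

```python
def posterior_mean(s, n, a, b):
    """Mean of the conjugate Beta(a + s, b + n - s) posterior for s occupied plots out of n."""
    return (a + s) / (a + b + n)


priors = {
    "Uniform Beta(1, 1)": (1.0, 1.0),
    "Jeffreys Beta(0.5, 0.5)": (0.5, 0.5),
    "Haldane approx. Beta(0.001, 0.001)": (0.001, 0.001),
    "Hypothetical informative Beta(4, 36)": (4.0, 36.0),  # illustrative values, not elicited
}
s, n = 30, 132  # hypothetical: 30 occupied plots out of 132 in one transect
for label, (a, b) in priors.items():
    print(f"{label:38s} posterior mean = {posterior_mean(s, n, a, b):.3f}")
```

The three non-informative priors give nearly identical means (about 0.23), whereas the informative prior, which carries the weight of 40 pseudo-observations, pulls the mean down to about 0.20.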
Using the Bayes factor, our decision about the most plausible hypothesis changed in three species. For instance, in A. dealbata, non-informative priors led us to decide in favor of H0, while the informative prior led us to decide moderately in favor of H1 (Table 2); in T. monspessulana and C. striatus, non-informative priors led us to decide in favor of H1, but with the informative prior we decided strongly in favor of H0 (Table 2); only in U. europaeus did both informative and non-informative priors lead us to decide strongly in favor of H1 (Table 2). Note that in this last species, although our decision about H0 did not change, we observed a notable reduction in the Bayes factor using informative priors (criteria to decide in favor of or against hypotheses were obtained from Andraszewicz et al. [1]).
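Verbal labels such as "moderate" or "strong" come from a lookup scale on the Bayes factor. The sketch below assumes the widely used cut-offs of 1, 3, 10, 30 and 100 (adapted from Jeffreys) and applies them symmetrically to \(BF_{10}\) and \(BF_{01} = 1/BF_{10}\); these thresholds should be checked against Andraszewicz et al. [1] before use, and the handling of values exactly on a boundary is an arbitrary choice here.

```python
def classify_bf10(bf10):
    """Verbal evidence label for BF10 using assumed 1/3/10/30/100 cut-offs."""
    if bf10 <= 0:
        raise ValueError("a Bayes factor must be positive")
    if bf10 == 1:
        return "no evidence"
    favoured = "H1" if bf10 > 1 else "H0"
    strength = bf10 if bf10 > 1 else 1 / bf10  # use BF01 = 1/BF10 when H0 is favoured
    if strength > 100:
        label = "extreme"
    elif strength > 30:
        label = "very strong"
    elif strength > 10:
        label = "strong"
    elif strength > 3:
        label = "moderate"
    else:
        label = "anecdotal"
    return f"{label} evidence for {favoured}"


for bf in (150, 12, 5, 1.5, 1, 0.2, 0.05):
    print(bf, classify_bf10(bf))
```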
Discussion
We have shown that in three of four species, switching from non-informative to informative priors changed our interpretation of the results. U. europaeus was the exception. This case is interesting because the sampling data were sufficient to characterize the posterior distribution and to conduct hypothesis testing. Our results reinforce the need for caution in the selection of priors in ecological studies.
If the selection of priors has been widely discussed in Bayesian analysis as a potential source of confusion in hypothesis testing [27], why has it not received sufficient attention in Ecology? One possible explanation is the preeminence of positivism in Ecology and the false presumption that BI ought to be objective [23]. From a positivist standpoint, data obtained from a well-designed sampling procedure constitute the only valid source of information; sources other than data are not considered in statistical analysis [3]. Given that non-informative priors are well-specified mathematical constructs, they supposedly add “objectivity” to the analysis, in contrast to informative priors, which emerge from information gathered in other contexts or are simply opinions and beliefs [3, 22, 26].
Subjectivity is present in any statistical approach, whether BI or FI [13]. In BI, however, subjectivity is explicitly recognized as part of the analysis and constitutes a strength rather than a weakness, because BI assumes that knowledge is always incomplete and preliminary [3, 14, 22]. We contend that an ecologist who prefers non-informative over informative priors is refusing to use a vast source of available information that exists outside peer-reviewed journals and books (i.e., grey literature [4]). As noted above, this situation is particularly critical in Conservation and Restoration Biology [24]; in these disciplines, data are scarce and sometimes the opinion of people (peasants, experts) constitutes the only source of information. BI can assist the formalization of ecologically based informative priors using sophisticated techniques based on probability theory [5].
We strongly suggest that ecological studies use informative priors. BI is regarded as a continuous process that updates our knowledge about the parameters of interest in a virtuous circle of learning; for that reason, knowledge is always preliminary [20]. If we obtain new information in the future, we can use the posterior distributions obtained in the first study as informative priors and thus conduct a new BI that will update our knowledge about the parameters of interest. During this process, every piece of information is important, and BI provides a proper conceptual background for integrating a variety of information. The only requirement is that the rationale for selecting each prior be made explicit.
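For a beta prior and presence/absence data, this "posterior becomes the next prior" cycle has a simple property: updating survey by survey gives the same posterior as pooling all surveys at once. A minimal sketch, assuming each new survey is an independent Bernoulli sample of the same occupancy; the function name and the survey counts are hypothetical.

```python
def update(prior, occupied, surveyed):
    """Conjugate beta-binomial update: Beta(a, b) -> Beta(a + occupied, b + surveyed - occupied)."""
    a, b = prior
    if not 0 <= occupied <= surveyed:
        raise ValueError("occupied must be between 0 and surveyed")
    return (a + occupied, b + surveyed - occupied)


prior = (1, 1)                              # start from a uniform prior
after_first = update(prior, 12, 40)         # hypothetical first survey: 12 of 40 plots occupied
after_second = update(after_first, 9, 25)   # posterior of survey 1 reused as the prior for survey 2
all_at_once = update(prior, 12 + 9, 40 + 25)
print(after_second, all_at_once)  # → (22, 45) (22, 45)
```

The equality holds because the beta family is conjugate to the Bernoulli likelihood; with non-conjugate models the same logic applies, but the intermediate posterior usually has to be approximated.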
In summary, prior distributions are a fundamental part of BI, in its philosophy, interpretation and model fitting; therefore, their selection should be considered carefully. We contend that non-informative priors can be mathematically adequate for ecologists, but they do not account for the vast complexity of ecosystems. We encourage ecologists to initiate a debate about the use of informative priors when they are accessible [19]. As a guide to initiate this conversation, we suggest two ideas: (i) to consider grey literature [4, 25] and to learn about the elicitation process with experts or the public for the construction of informative priors [11], and (ii) to accept that subjectivity is part of BI, which offers adequate procedures to formalize uncertainty and our beliefs about reality.
Conclusion
Bayesian inference in ecological research largely uses non-informative rather than informative prior distributions. In this study, we showed that the selection of priors is crucial for hypothesis testing in Bayesian inference. We compared occupancy of four exotic plants living in contrasting habitats in Central Chile and found that our inferences changed depending on the kind of prior used in 75% of cases (three of four species). We encourage ecologists to be very explicit when selecting prior distributions. We also suggest that informative priors should be used more frequently in this kind of analysis.
Availability of data and materials
Data used in this study are summarized in the Appendix (see Supplementary materials).
Abbreviations
- BI: Bayesian Inference
- FI: Frequentist Inference
- BF: Bayes Factor
References
Andraszewicz S, Scheibehenne B, Rieskamp J, et al. An Introduction to Bayesian Hypothesis Testing for Management Research. J Manag. 2015;41:521–43. https://doi.org/10.1177/0149206314560412.
Applestein C, Caughlin TT, Germino MJ. Bayesian modeling can facilitate adaptive management in restoration. Restor Ecol. 2022;30. https://doi.org/10.1111/rec.13596.
Banner KM, Irvine KM, Rodhouse TJ. The use of Bayesian priors in Ecology: The good, the bad and the not great. Methods Ecol Evol. 2020;11:882–9. https://doi.org/10.1111/2041-210X.13407.
Battisti C, Amori G, Angelici FM, et al. Can the grey literature help us understand the decline and extinction of the Near Threatened Eurasian otter Lutra lutra in Latium, central Italy? Oryx. 2011;45:281–7. https://doi.org/10.1017/S0030605310001055.
Choy SL, O’Leary R, Mengersen K. Elicitation by design in ecology: using expert opinion to inform priors for Bayesian statistical models. Ecology. 2009;90:265–77.
Dawid AP, Stone M, Zidek JV. Marginalization paradoxes in Bayesian and structural inference. J Roy Stat Soc: Ser B (Methodol). 1973;35:189–213.
Depaoli S, Winter SD, Visser M. The Importance of Prior Sensitivity Analysis in Bayesian Statistics: Demonstrations Using an Interactive Shiny App. Front Psychol. 2020;11:608045. https://doi.org/10.3389/fpsyg.2020.608045.
Douma JC, Weedon JT. Analysing continuous proportions in ecology and evolution: A practical introduction to beta and Dirichlet regression. Methods Ecol Evol. 2019;10:1412–30. https://doi.org/10.1111/2041-210X.13234.
Ellison AM. Bayesian inference in ecology. Ecol Letters. 2004;7:509–20. https://doi.org/10.1111/j.1461-0248.2004.00603.x.
García RA, Pauchard A, Escudero A. French broom (Teline monspessulana) invasion in south-central Chile depends on factors operating at different spatial scales. Biol Invasions. 2014;16:113–24. https://doi.org/10.1007/s10530-013-0507-y.
Garthwaite PH, Kadane JB, O’Hagan A. Statistical Methods for Eliciting Probability Distributions. J Am Stat Assoc. 2005;100:680–701. https://doi.org/10.1198/016214505000000105.
Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB. Bayesian Data Analysis. 3rd ed. Chapman and Hall/CRC; 2013. p. 519–44. https://doi.org/10.1201/b16018.
Gelman A, Hennig C. Beyond subjective and objective in statistics. J R Stat Soc A. 2017;180:967–1033. https://doi.org/10.1111/rssa.12276.
Goldstein M. Subjective Bayesian Analysis: Principles and Practice. Bayesian Anal. 2006;1(3):403–20. https://doi.org/10.1214/06-BA116.
Grzenda W. Informative Versus Non-Informative Prior Distributions and their Impact on the Accuracy of Bayesian Inference. Statistics in Transition New Series. 2016;17:763–80.
Hooten MB, Hobbs NT. A guide to Bayesian model selection for ecologists. Ecol Monogr. 2015;85:3–28. https://doi.org/10.1890/14-0661.1.
Kass RE, Raftery AE. Bayes Factors. J Am Stat Assoc. 1995;90:773–95. https://doi.org/10.1080/01621459.1995.10476572.
Kindsvater HK, Dulvy NK, Horswill C, et al. Overcoming the Data Crisis in Biodiversity Conservation. Trends Ecol Evol. 2018;33:676–88. https://doi.org/10.1016/j.tree.2018.06.004.
Lemoine NP. Moving beyond noninformative priors: why and how to choose weakly informative priors in Bayesian analyses. Oikos. 2019;128:912–28. https://doi.org/10.1111/oik.05985.
Link WA, Barker RJ. Bayesian inference: with ecological applications. Academic Press; 2009.
McCarthy MA, Masters P. Profiting from prior information in Bayesian analyses of ecological data. J Appl Ecol. 2005;42:1012–9. https://doi.org/10.1111/j.1365-2664.2005.01101.x.
Northrup JM, Gerber BD. A comment on priors for Bayesian occupancy models. PLoS ONE. 2018;13:e0192819.
Norton BG. Beyond Positivist Ecology: Toward an Integrated Ecological Ethics. Sci Eng Ethics. 2008;14:581–92. https://doi.org/10.1007/s11948-008-9095-0.
Robertson DP, Hull RB. Beyond Biology: toward a More Public Ecology for Conservation. Conserv Biol. 2001;15:970–9. https://doi.org/10.1046/j.1523-1739.2001.015004970.x.
Rothstein H, Hopewell S. Grey Literature. In: Cooper HM, Hedges LV, Valentine JC, editors. The handbook of research synthesis and meta-analysis. 2nd ed. New York: Russell Sage Foundation; 2009.
Torsen E. Objective versus subjective bayesian inference: a comparative study. Int J. 2015;3:56–65.
Van Dongen S. Prior specification in Bayesian statistics: three cautionary tales. J Theor Biol. 2006;242:90–100.
van Zyl CJJ. Frequentist and Bayesian inference: A conceptual primer. New Ideas Psychol. 2018;51:44–9. https://doi.org/10.1016/j.newideapsych.2018.06.004.
Warton DI, Hui FKC. The arcsine is asinine: the analysis of proportions in ecology. Ecology. 2011;92:3–10. https://doi.org/10.1890/10-0340.1.
Wei Z, Yang A, Rocha L, et al. A Review of Bayesian Hypothesis Testing and Its Practical Implementations. Entropy. 2022;24:161. https://doi.org/10.3390/e24020161.
Zhu M, Lu AY. The Counter-intuitive Non-informative Prior for the Bernoulli Family. J Stat Educ. 2004;12:3. https://doi.org/10.1080/10691898.2004.11910734.
Acknowledgements
We thank Marco Méndez and Alex Fajardo for their critical comments on the final version of the manuscript.
Funding
Sandra Flores-Alvarado’s Ph.D. studies are supported by ANID CONICYT-PFCHA/Doctorado Nacional/2020–21200398. RG and RO Bustamante were funded by Grant ANID/BASAL FB210006; RO Bustamante was funded by the Project Technological Center of Excellence CHIC-AND/BASAL PFB210018.
Author information
Contributions
ROB conceived the idea; AI and SF formalized it in statistical terms; RG contributed to the construction of the informative priors; Estefany Goncalves conducted the statistical analysis; all authors contributed to the writing and editing of the manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The author(s) declare(s) that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Bustamante, R.O., Iturriaga, A., Flores-Alvarado, S. et al. On the use of prior distributions in bayesian inference applied to Ecology: an ecological example using binomial proportions in exotic plants, Central Chile. Rev. Chil. de Hist. Nat. 96, 6 (2023). https://doi.org/10.1186/s40693-023-00118-0
DOI: https://doi.org/10.1186/s40693-023-00118-0