MONTE CARLO ANALYSIS FOR MICROBIAL GROWTH CURVES

Three most commonly used primary models (Gompertz, Baranyi and three-phase linear models) to describe the microbial growth curves were applied to three different isothermal growth data of Listeria monocytogenes. Further Monte Carlo analysis was performed with 100, 1000 and 10000 simulations. The results indicated that there was no reason to use higher number of simulations since the simulations produced almost identical means of the model parameter values for all models. Moreover, the models had similar coefficient of variation values for the initial (log10N0) and maximum (log10Nmax) number of bacteria. On the other hand, the Gompertz model had the highest coefficient of variation for the growth rate (μmax) and the Baranyi model had the highest coefficient of variation for the lag time (λ). Correlations between the parameters log10N0 and λ, and μmax and λ could be easily observed after the Monte Carlo analysis for all models. Deviation from normal distribution for the parameter λ for the three-phase linear model was evident, other than that all parameters for all models had normal distribution. It was concluded that Monte Carlo analysis can be used as a simple yet an effective method to describe the uncertainty in model parameters and correlation between the parameters as well as the spread of the possible parameter values.


INTRODUCTION
Mathematical models have been used to evaluate microbial behavior such as inactivation, survival or growth under different environmental conditions (Gwak et al., 2015). The growth of a homogenous microbial population in a closed habitat can be described by a sigmoidal curve with three phases: lag, exponential and death (Lebert and Lebert, 2006). Growth of microorganisms can be described by several models and one can find many state of the art reviews on the primary growth models (McKellar and Lu, 2004;Peleg and Corradini, 2011). Growth models such as Baranyi and Gompertz models are deterministic viz., models always give the same output when provided with the same input (no uncertainty) (Heldman and Newsome, 2003). However, biological variability in strains of a bacterial specie and in model parameters, and experimental uncertainty play an important role. Hence, stochastic modeling i.e., a model that gives a variable output when provided with the same input (variability is a part of the model) (Heldman and Newsome, 2003) should be investigated more deeply. Monte Carlo (MC) analysis is based on random computer simulations of the experimental data which are described by a mathematical model. It is most probably the simplest and the best analysis to describe the uncertainty in model parameters (van Boekel, 2009). The usage of MC in the field of predictive microbiology can be found in literature Cassin et al., 1998;Coleman and Marks, 1999;Koyama et al., 2019;Nauta, 2000;Nicolaï and Van Impe, 1996). Furthermore, it was also used for primary growth models such as Baranyi model (Poschet et al., 2004;2003) and Gompertz model (Lambert et al., 2012).
The main objective of this study was to investigate MC analysis for most commonly used isothermal microbial growth models, namely the Gompertz, the Baranyi and the three-phase linear models with three different published isothermal growth data sets of Listeria monocytogenes. Further objective was to make comparisons between the models based on the results of MC simulations such as correlations of the model parameters.

Data sets
Growth data of Listeria monocytogenes in three different studies were used as the data bases for this study.

Modified Gompertz model
The British mathematician Benjamin Gompertz introduced a formula to describe the mortality of humans in 1825 (Gompertz, 1825). A century later it was applied as a growth model [Eq.(1)] (Winsor, 1932) and although it was more popular at the beginning of 1990s, today it is still a commonly used model to describe microbial growth data (Peleg and Corradini, 2011). Zwietering et al. (1990) reparametrized the original Gompertz equation in order to obtain interpretable parameters. This modified version of the Gompertz model was used in the present study and for simplicity we used the term "Gompertz model" not the modified Gompertz model: Three most commonly used primary models (Gompertz, Baranyi and three-phase linear models) to describe the microbial growth curves were applied to three different isothermal growth data of Listeria monocytogenes. Further Monte Carlo analysis was performed with 100, 1000 and 10000 simulations. The results indicated that there was no reason to use higher number of simulations since the simulations produced almost identical means of the model parameter values for all models. Moreover, the models had similar coefficient of variation values for the initial (log10N0) and maximum (log10Nmax) number of bacteria. On the other hand, the Gompertz model had the highest coefficient of variation for the growth rate (µmax) and the Baranyi model had the highest coefficient of variation for the lag time (λ). Correlations between the parameters log10N0 and λ, and µmax and λ could be easily observed after the Monte Carlo analysis for all models. Deviation from normal distribution for the parameter λ for the three-phase linear model was evident, other than that all parameters for all models had normal distribution. It was concluded that Monte Carlo analysis can be used as a simple yet an effective method to describe the uncertainty in model parameters and correlation between the parameters as well as the spread of the possible parameter values. where log10N0 is the initial number of bacteria while log10Nmax is the maximum number of bacteria reached during the growth. The maximum growth rate and lag time are denoted by µmax and λ, respectively.
Baranyi model Baranyi and Roberts (1994) proposed a microbial growth model, which is perhaps the most commonly used microbial growth model today: and where q(t) represents the concentration of a critical substance for bacterial growth, and the term q(t)/[1+q(t)] in Eq.
(2) represents the physiological state of the cells and also associated with the with the lag time (λ) through the introduced parameter h0 = ′ ·λ which appears in the solution of the rate equation (Peleg and Corradini, 2011). The curvature parameter, m, is generally set to 1 for simplicity. Note that ′ is the maximum specific growth rate.
For static environmental conditions such as the constant temperature the following analytical solution could be obtained (Poschet et al., 2005): If ≤ log 10 ( ) = log 10 0 If < < log 10 ( ) = log 10 0 + • ( − ) (6) If ≥ log 10 ( ) = log 10 0 + • ( − ) Note that log10Nmax can be calculated as log10N0 + µ·(tmaxλ) where tmax is the time at which maximum population (log10Nmax) is reached. Moreover, since the slope of the exponential phase is constant i.e., exponential phase is described by a linear line, µ was used instead of µmax contrary to Gompertz and Baranyi models.

Monte Carlo analysis
To perform MC analysis or simulation two information are necessary: (i) adequate mathematical model or models; (ii) uncertainty or standard deviation in the experimental data (van Boekel, 2009). Since the aforementioned models were used to describe the growth data of L. monocytogenes, only experimental uncertainty should be determined. For the plate count data experimental error is in the order of 1 log10 unit (Jarvis, 1989;Mossel et al., 1995) therefore, datum point ± 0.5 log10 unit was used for MC simulation (Poschet et al., 2003) for the data of Lambert et al. (2012) and Alavi et al. (1999). For the data of McKellar (1997) datum point ± 0.35 log10 unit was used because generated data with ± 0.5 log10 unit made the models inappropriate (several negative lag times were obtained for each number of simulations)data not shown. MC analysis were performed as follows: 1. Models were fitted to each data set and parameters (log10N0, log10Nmax, µmax and λ) were estimated as usually. 2. By using the models and their estimated parameters, perfect data were obtained. 3. Data were randomly generated on the perfect data with ± 0.5 log10 unit [± 0.35 log10 unit for the data of McKellar (1997)]. Since many generations should be repeated depending on the problem, we ran the simulations for 100, 1000 and 10000 times in order to make comparisons between the simulations and the models. 4. Generated data sets were fitted with the suitable model one by one and the parameters were tabulated. 5. Tabulated parameters were transformed into histograms and correlation plots between the parameters were sketched for 100, 1000 and 10000 simulations separately. MATLAB version 9.3 (The MathWorks, Inc., Natick) was used for data generation and model fitting, and SigmaPlot version 12.00 (Chicago, IL, USA) was used for plotting the graphs.

Comparison of the models
Three models, namely the Gompertz [Eq.(1)], the Baranyi [Eq. (5)] and the threephase linear [Eq.(6)] models were fitted to each data set and parameters were estimated by non-linear regression. Fig. 1 shows the fit of the models for the data set of Lambert et al. (2012) and parameter estimates are given in Table 1. Adjusted coefficient of determination (R 2 adj) and root mean square error (RMSE) values were used to determine the goodness-of-fit of the models. Although the difference between the Gompertz model and the Baranyi model was very small, the Gompertz model was the best model since it had the highest R 2 adj and lowest RMSE values (Table 1). Based on R 2 adj the Baranyi and the three phase linear models had both the best and identical goodness-of-fits for the data set of  Table S1. For the data set of McKellar (1997) the Baranyi model was again the best model followed by the Gompertz and the three phase linear models (Table S2).  Similar initial number of bacteria (log10N0) was obtained for all models whereas the Gompertz model gave the highest maximum number of bacteria (log10Nmax).
On the other hand, the Baranyi and the three phase linear models had almost identical log10Nmax values (Table 1). The Gompertz model produced the highest maximum growth rate (µmax) and lag time (λ) values while the three phase linear model had the lowest and the Baranyi model was in between being closer to the three phase linear model (Table 1). This discussion was also valid for the other data sets (Alavi et al., 1999;McKellar, 1997) where the model fits and parameter estimates are given in Figs. S1 and S2, and Table S1 and S2, respectively. The Gompertz model has a specific curvature around the inflection point of the "S" shaped curve (Baranyi et al., 1993) and therefore it was not surprising that it produced highest µmax and λ values. Lag time calculation with the Gompertz model could give wrong outcomes since growth may occur before the estimated λ. Many researchers pointed out the limitations of the Gompertz equation, especially the overestimation of µmax and λ (Dalgaard, 1995 1992). Moreover, unlike the Baranyi and the three phase linear models the Gompertz model did not reach to a plateau (Figs. 1, S1 and S2) that is why log10Nmax was also highest for the Gompertz model (Tables 1, S1 and S2).   Table 2. There was no significant difference (p > 0.05) between 100, 1000 and 10000 simulations in terms of parameters' means for all the models tested which could be also said for the other two data sets (Tables S3 and S4). This result indicated that there was no reason to use high number of simulations: 100 simulations was enough for microbial growth data.  (Gibson et al., 1987)] and showed that parameter estimates (means) of 100 simulations with Excel and 10000 simulations with Mathematica were almost identical. Our finding with same data set but with different growth models applied were similar to those of Lambert et al. (2012). Moreover, results of two more data sets did support this judgement. The results of Lambert et al. (2012) and the present study also brought out that Excel which is easy to access, could be safely use to generate 100 simulations randomly plus SOLVER Add-In package of Excel could be used to fit the non-linear models to the generated data Lambert et al. (2012). Note that SOLVER in Excel could not produce parameter estimates with their uncertainties, that is to say standard errors or confidence intervals of the parameters could not be obtained. However, MC simulation can be used together with the SOLVER tool in Excel to obtain confidence intervals (Lambert et al., 2012). Also note that the Bayesian approach may also be used instead of the procedure presented here. However, the aim here was to show the MC method. Moreover, Bayesian procedure requires learning and applying of a programming language like R or Phyton (van Boekel, 2020).

Monte Carlo analysis for the growth models
Poschet et al. (2003) performed MC analysis 5 times (for the growth data of L. innocua and Escherichia coli fitted with the Baranyi model) for different number of iterations and observed that 10000 iterations, compared to 100 and 1000, produce more stable mean values of parameters and standard deviations i.e., less variation on the mean value of the parameters of the Baranyi model is seen (Poschet et al., 2003). Nevertheless, the gap between the mean values of the parameters obtained from 100, 1000 or 10000 iterations were low. If the mean of the goodness-of-fit indices (R 2 adj and RMSE) given in Tables 2, S3 and S4 were investigated comparison between the models could be made. Although such a comparison may not be appropriate since MC analysis was based on the perfect fit of each model, it may be useful to differentiate between the number of simulations. For example, the Gompertz model was the best model followed by the Baranyi and three-phase linear models for the data of Lambert et al. (2012) when standard non-linear fitting procedures were appliedsee Table 1. This ranking was unchanged after the MC analysis with 100, 1000 and 10000 simulations. On the other hand, although there was a slight difference between the Baranyi model and the three phase linear models, the Baranyi model produced the best fit while the Gompertz model had the worst fit for the second data set (Alavi et al., 1999). MC analysis changed this sequence: the three-phase linear model was the first, the Gompertz model was the second and the Baranyi model was the third for 100 simulations. For 1000 and 10000 simulations threephase linear model was again the best model followed by the Baranyi and the Gompertz models (see Tables S1 and S3).
Comparing coefficient of variation (CV) (SD/mean×100) of all iterations revealed that similar CVs were obtained for log10N0 and log10Nmax for the models tested (about 1 to 3%) whereas the Gompertz model had the highest CV (about 10%) for µmax followed by the Baranyi (about 8%) and the three phase linear (about 6 %) models. For the parameter λ, the Baranyi model had the CV (about 12%) followed by the three phase linear (about 10%) and the Gompertz (about 9%) models. This could be better visualized by looking at Fig. 3 where the results of 100 iterations are shown for three models: similar CVs for log10N0 and log10Nmax for all models can be observed in Fig. 3 for the data of Lambert et al. (2012). Furthermore, lower CV values of the parameter µ for three phase linear model could also be seen; however, interpreting the parameters µmax and λ was not that easy for the Gompertz and Baranyi models.  MC analysis can also be used to assess the pairwise correlations between parameters (van Boekel, 2009). Figs. 4, 5 and 6 display the results of 1000 iterations of MC analysis for the correlations between the parameters of the Gompertz, the Baranyi and the three phase linear models, respectively. It can be clearly seen that there was a strong correlation between the parameters log10N0 and λ, and µmax and λ. This was also valid for the other data setsdata not shown. Structural correlations between log10N0 and λ, and between µmax and λ for the Gompertz and the Baranyi models were also shown by Baty and Delignette-Muller (2004) by plotting 95 % confidence regions for the λ estimates compared to log10N0, log10Nmax and µmax. We have shown the same result by using MC analysis with three different data sets including the three-phase linear model. Note that correlations had little or no effect on parameter estimationssee Tables 1, S1 and S2 for parameter values and their corresponding standard errors. High correlation between parameters does not always indicate failure of parameter estimation. Furthermore, the scaled sensitivity coefficients (SSCs) for all 4 parameters for each model (see Dolan and Mishra, 2013 for example) could be used to understand at which part of the experiment there is a correlation between the parameters. SSCs would show that the log10N0 and λ were not correlated in the first 1/3 of the experiment, so they can be estimated easily, and the peak of the µmax SSC occurred well before the peak of the λ SC, that is why these two parameters too can be estimated separately.    Tabulated parameter values given in Table 2 were transformed into histograms for 10000 simulations (Figs. 7 and 8). It was clear from Fig. 7 that distributions for the parameters log10N0 and log10Nmax were normal for all models. Similarly, for the growth rate parameter (µmax/µ) all models had normal distribution. However, histogram for three-phase linear model given in Fig. 8 clearly indicated the deviation from normality for the parameter λ which were also true for the other two data setssee Figs. S5 and S6.

CONCLUSION
Three most commonly used growth models were fitted to three different data sets and MC analysis were also performed in the present study. It was observed that 100 MC simulations produced almost identical parameter values with those of 1000 and 10000 simulations. Therefore, it was concluded that using high number of simulations was unnecessary for microbial growth models. All models had similar CV values for the parameters log10N0 and log10Nmax. While the Gompertz and the Baranyi models had the highest CV for the parameters µmax and λ, respectively. MC analysis also showed that there were strong correlations between the parameters log10N0 and λ, and µmax and λ. All models' parameters had normal distribution except the parameter λ for three-phase linear model.