This study applies the SVM, ANN, RT, and GP, for forecasting monthly reservoirs inflow with 1 and 2month time lags. The historical data for inflow to the Dez, Karkheh, and Gotvand reservoirs were collected and used to build the ML algorithms. The inputs to the algorithms for the Dez, Karkheh, and Gotvand reservoirs are the monthly inflows for 1965–2019, 1957–2019, and 1961–2019, respectively. Four projections were designed for the 1month time lag and the 2month time lag patterns based on the input and output months, as depicted in Fig. 1. Figure 2 displays the flowchart of this paper’s methodology.
Support vector machine
Support Vector Machine was introduced by Vapnik et al.^{43}. SVM performs classification and regression based on statistical learning theory^{44}. The regression form of SVM is named support vector regression (SVR). Vapnik et al.^{45} defined two functions for SVR design. The first function is the error function. (Eq. (1), see Fig. 3). The second function is a linear function that calculates output values for input, weight, and deviation values (Eq. 2):
$$ \left {y – f\left( x \right)} \right = \left\{ {\begin{array}{ll} 0 & {if\;\left {y – f\left( x \right)} \right \le \varepsilon } \\ {\left {y – f\left( x \right)} \right – \varepsilon = \xi } & {otherwise} \\ \end{array} } \right. $$
(1)
$$ f\left( x \right) = W^{T} x + b $$
(2)
where \(y\), \(f(x)\), \(\varepsilon\), \(\xi\), \(W\), \(b\), \(T\) denote respectively the observational value, the output value calculated by SVR, a function sensitivity value, a model penalty, the weight applied to the variable \(x\), the deviation of \(W^{T} x\) from the \(y\), and the vector/matrix transpose operator.
It is seen in Fig. 3 that the first function (Eq. 1) does not apply a penalty to the points where the difference between the observed value and the calculated value falls within the range of \(( – \varepsilon , + \varepsilon )\). Otherwise, a penalty \(\xi\) is applied. SVR solves an optimization problem that minimizes the forecast error (Eq. 3) to improve the model’s forecast accuracy. Equations (4) and (5) represent the constraints of the optimization problem.
$$ minimize\frac{1}{2}\left\ W \right\^{2} + C\sum\limits_{{i = 1}}^{m} {\left( {\xi _{i}^{ – } + \xi _{i}^{ + } } \right)} $$
(3)
Subject to:
$$ (W^{T} x + b) – y_{i} < \varepsilon + \xi_{i}^{ + } \;\;\;i = 1, \, 2, \ldots , \, m $$
(4)
$$ y_{i} – \left( {W^{T} x + b} \right) \le \varepsilon + \xi_{i}^{ – } \;\;\;i = 1, \, 2, \ldots , \, m $$
(5)
where \(C\), m, \(\xi_{i}^{ – }\), \(\xi_{i}^{ + }\), \(y_{i}\), and   denote respectively the penalty coefficient, the number of input data to the model in the training phase, the penalty for the lower bound \(( – \varepsilon , + \varepsilon )\), the penalty for the upper bound \(( – \varepsilon , + \varepsilon )\), the ith observational value, and vectorial magnitude. The values of W and b are calculated by solving the optimization problem embodied by Eqs. (3)–(5) with the Lagrange method, and they are substituted in Eq. (2) to calculate the SVR output. SVR is capable of modeling nonlinear data, in which case it relies on transfer functions to transform the data to such that linear functions can be fitted to the data. Reservoirs inflow is forecasted with SVR was performed with the Tanagra software. The transfer function selected and used in this study is the Radial Basis Function (RBF), which provided better results than other transfer functions. The weight vector W is calculated using the Soft Margin method^{46}, and the optimal values of the parameters \(\xi_{i}^{ – } , + \xi_{i}^{ + }\) and C were herein estimated by trial and error.
Regression tree (RT)
RT involves a clustering tree with postpruning processing (CTP). The clustering tree algorithm has been reported in various articles as the forecasting clustering tree^{47} and the monothetic clustering tree^{48}. The clustering tree algorithm is based on the topdown induction algorithm of decision trees^{49}; This algorithm takes a set of training data as input and forms a new internal node, provided the best acceptable test can be placed in a node. The algorithm selects the best test scores based on their lower variance. The smaller the variance, the greater the homogeneity of the cluster and the greater the forecast accuracy. If none of the tests significantly reduces the variance the algorithm generates a leaf and tags it as being representative of data^{47,48}.
The CTP algorithm is similar to the clustering tree algorithm, except that its postpruning process is performed with a pruning set to create the right size of the tree^{50}.
RT used in this study is programmed in the MATLAB software. The minimum leaf size, the minimum node size for branching, the maximum tree depth, and the maximum number of classification ranges are set by trial and error in this paper’s application.
Genetic programming (GP)
GP, developed by Cramer^{51} and Koza^{52}, is a type of evolutionary algorithm that has been used effectively in water management to carry out single and multiobjective optimization^{53}. GP finds functional relations between input and output data by combining operators and mathematical functions relying on structured tree searches^{44}. GP starts the searching process by generating a random set of trees in the first iteration. The tree’s length creates a function called the depth of the tree which the greater the depth of the tree, the more accurate the GP functional relation is^{54}. In a tree structure, all the variables and operators are assumed to be the terminal and function sets, respectively. Figure 4 shows mathematical relational functions generated by GP. Genetic programming consists of the following steps:

Select the terminal sets: these are the problemindependent variables and the system state variables.

Select a set of functions: these include arithmetic operators (÷ , ×, −, +), Boolean functions (such as “or” “and”), mathematical functions (such as sin and cos), and argumentative expressions (such as if–thenelse), and other required statements based on problem objectives.

Algorithmic accuracy measurement index: it determines to what extent the algorithm is performing correctly.

Control components: these are numerical components, and qualitative variables are used to control the algorithm’s execution.

Stopping criterion: which determines when the execution of the algorithm is terminated.
The Genexprotools software was implemented in this study to program GP. The GP parameters, operators, and linking functions were chosen based on the lowest RMSE in this study. The GP model’s parameters and operators applied in this study are listed in Table 1.
Artificial Neural Network (ANN)
ANN, developed by McCollock and Walterpits^{55}, is an artificial intelligencebased computational method that features an information processing system that employs interconnected data structures to emulate information processing by the human brain^{56}. A neural network does not require precise mathematical algorithms and, like humans, can learn through input/output analysis relying on explicit instructions^{57}. A simple neural network contains one input layer, one hidden layer, and one output layer. Deeplearning networks have multiple hidden layers^{58}. ANN introduces new inputs to forecast the corresponding output with a specific algorithm after training the functional relations between inputs and outputs.
This study applies the MultiLayer Perceptron (MLP). A threelayer feedforward ANN that features a processing element, an activation function, and a threshold function, as shown in Fig. 5. In MLP, the weighted sum of the inputs and bias term is passed to activation level through a transfer function to create the one output.
The output is calculated with a nonlinear function as follows:
$$ Y = f\left( {\mathop \sum \limits_{i = 1}^{n} W_{i} X_{i} + b} \right) $$
(6)
where \(W_{i}\), \(X_{i}\), \(b\), \(f\), and \(Y\) denote the ith weight factor, the ith input vector, the bias, the conversion function, and the output, respectively.
The ANN was coded in MATLAB. The number of epochs, the optimal number of hidden layers, and the number of neurons of the hidden layers were found through a trialanderror procedure. The model output sensitivity was assessed with various algorithms; however, the best forecasting skill was achieved with the Levenberg–Marquardt (LM) algorithm^{59}, and the weight vector W is calculated using the Random Search method^{60}. Furthermore, the Tangent Sigmoid and linear transfer function were chosen by trial and error and used in the hidden and output layers, respectively.
70% of the total data were randomly selected and used for training SVM, ANN, RT, and GP. The remaining 30% of the data were applied for testing the forecasting algorithms.
Performanceevaluation indices
The forecasting skill of the ML algorithms (SVM, ANN, RT, and GP) was evaluated with the Correlation Coefficient (R), the Nash–Sutcliffe Efficiency (NSE), the Root Mean Square Error (RMSE), and the Mean Absolute Error (MAE) in the training and testing phases. The closer the R and NSE values are to 1, and the closer the RMSE and MAE values are to 0, the better the performance of the algorithms^{20}. Equations (7)–(10) describe the performance indices:
$$ NSE = 1 – \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {Q_{fore,\;i} – Q_{obs,i} } \right)^{2} }}{{\mathop \sum \nolimits_{i = 1}^{n} \left( {Q_{obs,i} – Q_{mean \; obs} } \right)^{2} }} $$
(7)
$$ R = \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {Q_{fore,i} – Q_{mean \; fore} } \right)\left( {Q_{obs,i} – Q_{mean \; obs} } \right)}}{{\sqrt {\mathop \sum \nolimits_{i = 1}^{n} \left( {Q_{fore,i} – Q_{mean \; fore} } \right)^{2} } \sqrt {\mathop \sum \nolimits_{i = 1}^{n} \left( {Q_{obs,i} – Q_{mean \; obs} } \right)^{2} } }} $$
(8)
$$ MAE = \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \left {Q_{fore,i} – Q_{obs,i} } \right $$
(9)
$$ RMSE = \sqrt {\frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {Q_{fore,i} – Q_{obs,i} } \right)^{2} }}{n}} $$
(10)
in which \( Q_{fore,i}\), \(Q_{obs,i}\), \(Q_{mean \; fore}\), \(Q_{mean \; obs}\), \(i\), and \(n\) denote the forecasted inflow, observed inflow, mean forecasted inflow, mean observed inflow, time step, and the total number of time steps during training and testing phases, respectively.
Ethics approval
All authors complied with the ethical standards.
Consent to participate
All authors consent to participate.
Consent for publish
All authors consent to publish.