Discussion: Machine Learning and Neural Networks for Process Simulation: Difference between revisions

From SysCAD Documentation
Jump to navigation Jump to search
mNo edit summary
m (→‎NN API: updated NN API section for clarity)
Line 101: Line 101:
Alternatively, as in the example we show below, we can incorporate a NN model as a unit model within a SysCAD project, where the dataset used for training and validation has been generated externally to SysCAD or from another SysCAD model. To be able to import and calculate a forward pass with the optimised set of parameters (weights and bias for each layer, kernel and or pooling layers), SysCAD can use a NN API to load the parameters and calculate the output of a NN model at each SysCAD iteration while interacting with other SysCAD unit models, in the form of a controller or reactor unit model. In the simplest case, custom PGM code can be implemented to perform the NN forward pass, or as shown in the example below, a custom made unit model running a development C++ NN API can be used directly in place.
Alternatively, as in the example we show below, we can incorporate a NN model as a unit model within a SysCAD project, where the dataset used for training and validation has been generated externally to SysCAD or from another SysCAD model. To be able to import and calculate a forward pass with the optimised set of parameters (weights and bias for each layer, kernel and or pooling layers), SysCAD can use a NN API to load the parameters and calculate the output of a NN model at each SysCAD iteration while interacting with other SysCAD unit models, in the form of a controller or reactor unit model. In the simplest case, custom PGM code can be implemented to perform the NN forward pass, or as shown in the example below, a custom made unit model running a development C++ NN API can be used directly in place.


===NN API===
===KWA's NN API===
This is a C++ dynamic library (DLL) that enables SysCAD to load an optimised DFF or ConvNet model from a file or use a Mass Balance Reactor based on transformation coefficients. The optimised parameters can be generated using the same NN API or some of the publicly available libraries such as [https://www.tensorflow.org/ TensorFlow], [https://pytorch.org/ Pytorch], etc.
This is a set of C++ dynamic libraries (DLL) that enable SysCAD to load an optimised DFF or ConvNet model from a file or set of files or use a Mass Balance Reactor based on a System Component matrix <math>S</math> and its corresponding basis vectors <math>b_i</math>, where the transformation coefficients <math>\lambda_i</math> are calculated based on the optimised DFF parameters for a given set of input amounts and target temperature. An example of this is shown in the next section.<br>
The optimised parameters for the DFF or ConvNet models can be generated using the same KWA NN API outside SysCAD or some of the publicly available libraries such as [https://www.tensorflow.org/ TensorFlow], [https://pytorch.org/ Pytorch], etc.


Once the NN model parameters are loaded, the NN API is used by a SysCAD controller or reactor unit model, providing the necessary inputs to the NN model, calculating the outputs and transferring those outputs back to the SysCAD model.
Once the NN model parameters are loaded, the NN API is used by a SysCAD controller or reactor unit model, providing the necessary inputs to the NN model, calculating the outputs and transferring those outputs back to the SysCAD model.

Revision as of 15:30, 8 December 2023

BlogIcon.png This is a Discussion Page with supplementary user information. It is not part of the core SysCAD Help Documentation - please refer to the User Guide for full documentation links.

Navigation: Product Blog ➔ Discussion Pages ➔ Machine Learning and AI in SysCAD

Related Links:


NOTE: The following discussion page uses TensorFlow as an example for training and optimisation of neural network models for use in SysCAD. However, many other tools are available which can be used to create neural networks.

Introduction to Machine Learning

Machine Learning (ML) is a branch of Artificial Intelligence (AI) based on the idea that systems are capable of learning from data, recognising patterns, and making decisions with minimal human intervention. In ML there are three main categories:

  • Supervised Learning uses datasets for training which contain labelled or true observations for each input set. The algorithm makes predictions that are compared with the actual output values. If the predictions are not within a certain tolerance range, the algorithm is modified until it achieves the correct output. Typically, this type of machine learning requires a large dataset and is typically associated with the use of artificial neural networks (NN).
  • Unsupervised Learning uses unlabelled data, that is, the dataset does not include true or known outputs. In unsupervised learning the algorithm tries to discover a pattern to solve by either clustering (grouping data based on similarities or differences), association (finding relationships between variables), or dimensionality reduction (e.g. principal component analysis).
  • Reinforcement Learning is similar to supervised learning but the model is not trained using a sample dataset of true values. The model learns as it goes in a sort of trial and error whereby a system of reward or "scoring" is used to reinforce the behaviour of the model to achieve certain objective.

In this discussion page we will focus on Supervised Learning, specifically looking at the use of artificial neural network models used within a SysCAD project. In later instalments of this discussion series we will look at other types of ML with SysCAD. For Part 2, we will focus on a step-by-step example of using supervised learning neural network model for green steelmaking. For Part 3, the focus will be on developing a fully trained convolutional neural network for dynamic simulation.

Supervised Learning and Neural Networks

What are Neural Networks?

Artificial Neural Networks (NN) mimic the structure of real neurons in the brain, perceiving/sensing inputs and firing signals through a net of connected neurons. Each of the neurons, also called nodes or perceptrons, combine weighted inputs into a single output. In terms of overall structure, there are various different arrangements for NNs. At a very high-level, NNs can be distinguished based on their intended purpose:

  • Categorical NNs are used for classification (e.g. identifying what number an image represents, assigning a binary true/false output given an input or question).
  • Regression NNs are used to predict one or a set of numerical values, similar to a mathematical function (e.g. mass of compounds in a process stream, temperature, size fraction, etc.).

Both NN types have applications in process modelling. A regression NN could be implemented in SysCAD to solve a mass balance in a unit model representing a reactor of some kind, while a categorical NN could be used to predict a quality (e.g. colour of a given stream) or state of a process (e.g. in-spec or out-of-spec product, equipment failure or not, etc.).

In this discussion we will focus on two NN structures that have been used with SysCAD: Deep Feed Forward (DFF) and Convolutional Neural Networks (ConvNet).

Deep Feed Forward Neural Network

This is the most common type of neural network. The Deep Feed Forward Neural Network (DFF) structure includes an input layer, output layer, and at least one hidden layer containing several neurons, weights, bias, and an activation function. Data flows from the input to the output via a series of calculations (described later in detail). The image below represents the structure of a DFF neural network.

DFF (Neural Network).jpg

Convolutional Neural Network

Convolutional Neural Networks (ConvNet or CNN) is an extended form of DFF and is primarily used for feature extraction from a grid-like matrix dataset. ConvNets are very powerful tools, typically used for image recognition or signal processing. In SysCAD, we have used ConvNet for dynamic time-based process simulation with great success, with much better performance (lowest loss) than DFF using the same training and validation dataset.

ConvNet has many layers which include an input layer (typically a 2D or greater dimension matrix-type), one or more convolutional and pooling layers followed by a fully connected layer represented by a DFF. Essentially this is a DFF with pre-processing steps to simplify and summarise the input data.

ConvNet.jpg

How do Neural Networks Work?

Weight bias.png

A neural network is a network of connected neurons. Each connection is given a weight, and each neuron in the network calculates the sum of weighted inputs, plus a bias'. The calculated value is then subject to an activation function. The output from each neuron is then sent to neurons in the next layer, and so on. The mathematical computation done for each node is represented by the formula below:

[math]\displaystyle{ output= f_{act}\left({\sum_{i=1}^{n_{inputs}}({input_i*weight_i}}) +bias\right) }[/math]

Neural Network Parameters

Neural Network parameters are the learnable or trainable values. The final value of these parameters is obtained by an optimisation process (training or fit) where successive adjustments to these values are made in order to minimise the error (or loss) between the predicted values of the output layer and the true or ground values for each sample in the dataset.

For a DFF network, these are weights and biases which impact the behaviour of each neuron. Weights are usually randomised and biases are zeroed before the learning session begins. Together with an activation functions, they allow the model to propagate forward and produce an acceptable output.

Weight: The weight is multiplied by the input value entering the node. The weights represents the strength of a node connection.
Bias: The output value from a neuron can be shifted using a bias. The bias can be compared to the y-intercept in a linear equation.

For a ConvNet, there are additional trainable parameters. These are the coefficients of a kernel which are used to extract features from the input matrix and also reduce the dimensions or size of the inputs.

Kernel: This is a matrix or set of matrices with a smaller size than the input layer, used to calculate the dot product between a section of the input layer and the kernel. Then, the kernel is displaced, sweeping through the input matrix and calculating again the dot product. This process is repeated until the entire input is processed, producing a new layer containing only the dot products which is typically of a smaller dimension than the input. These act as filters, simplifying and summarising the input data into to more computationally manageable chunks. The vales of this kernel need to be optimised during the training process.

Neural Network Hyperparameters

Below are some of the hyperparameters that can be modified when setting up a neural network model. These are not trainable parameters but rather design parameters defining the structure or makeup of the NN model. Initially, one might not know how many layers, or neurons per layer, are going to result in the lowest validation loss. Similarly, the use of one or more convolutional layers, number of kernels (each convolutional layer might have several kernels, called filters), the size of the kernel window, whether to use or not pooling layer at all, etc.

The process by which one determines the final NN design is often referred to as hyperparameter tuning or optimisation and involves running parametrically, several combinations of designs, training each NN model using the same dataset and evaluating which combination of hyperparameters results in the lowers training and validation losses.

Neuron: A neuron in a neural network is where the sum of the weight multiplied by the input is computed and a bias is added. A large number of neurons can be used, however, some popular number of neurons used are 32, 64 or 128 (aligning with hardware for efficient calculation).
Activation Function: The net output from the neuron is passed through an activation function. The type of activation function can be selected as hyperparameter by the user, whereas the weights and biases are adjusted during the training process. The choice of activation function(s) depends on the problem you are trying to solve. Some examples are Rectified linear function (ReLU), SoftMax and Sigmoid Function. The purpose of this activation function is to introduce non-linearity into the output of a neuron, making the model much more versatile and capable of modelling very complex problems despite the very simple mathematical structure.
Loss: In the neural network you also need to define the loss which is the metric actually used during the optimisation or training of a model. Depending on the type of model, different loss functions can be defined such as Categorical Cross-Entropy and Binary Cross-Entropy (for categorical-type problems) and Mean Squared Error or Mean Absolute Error (for regression-type problems).
Optimiser: These are crucial in assisting the network in learning to generate ever-better predictions during training. They assist in determining the optimal set of model parameters (weights and biases) so that the model can generate the best results for the problem it is solving. There are many types of optimisers that can be used such as stochastic gradient descent (SGD), SGD with decay, SGD with momentum, AdaGrad (Adaptive Gradient), RMSProp (Root Mean Square Propagation), and Adam (Adaptive Momentum). Adaptive Momentum is the most widely used optimiser as is often capable of finding the global minimum and avoiding getting stuck in local minima.
Input Layer: The input layer is the first layer where all the inputs are given for the model. The number of neurons for the input layer depends on the number of features or inputs in the dataset.
Hidden Layer(s): The hidden layer obtains the data from the input or previous layers. In the model there can be as many hidden layers as necessary and each hidden layer can have a different number of neurons.
Output Layer: The output layer is the the last layer of the neural network. The number of neurons in the output corresponds to the number of outputs or variables representing true values there are.
Learning Rate: Choosing the correct value for learning rate is important because if the learning rate is too small then this can result in the training process being too long or it could get stuck. However, if the value is too large then it could result in a sub-optimal set of weights. Learning rate values range between 0 to 1. The learning rate determines the amount that the model will change in response to the estimated error every time the model weights are changed. This is similar to the gain in a PID controller.
Epochs: In neural networks a forward and a backward pass together is counted as one iteration during the training sessions. The number of epochs may be interpreted as the total number of iterations the algorithm has made across the training dataset.

The additional parameters below are only for Convolutional Neural Networks.

Convolutional Layer: In the convolutional layer the dot product between two matrices corresponding to a section of the input layer and the kernel is preformed. The kernel slides across the height and width of the dataset, each time performing a dot product. The result of the multiple dot products is called an activation map. Usually an activation function, similar to that used for DFF NN, is applied to the activation map.
Pooling Layer: The pooling layer summarises statistics of nearby outputs from the activation map. There are different types of pooling functions that can be used such as average rectangular neighbourhood, L2 norm of the rectangular neighbourhood, or max pooling. Max pooling is the most popular type of pooling function.

Creating a Neural Network

In order to create and run a neural network you first need to collect data for all the inputs and outputs you require. For example, process data such as temperature, pressure, input flows, tank levels, etc. The image below shows the workflow of how a NN can be set up and trained:

NN workflow.jpg

Typically, it is a good idea to pre-process the data, such as by shuffling and scaling the data before passing it through the neural network. To avoid overfitting, it is also recommended to take about 20% of the data collected for validation while the remaining 80% will be run through the neural network during training.

Many different tools can be used to obtain the final optimised parameters for a NN. For example, in Python some available tools include TensorFlow, TensorBoard and PyTorch. Other programming languages that can be used are R, Java, C++, and many more.

When running a neural network there are many parameters you can change. For DFF it is recommended to manipulate the following (example values given in parentheses): learning rate (0.01), number of epochs (10000), number of hidden layers (3) and number of neurons (64). Depending on your project, you may want to adjust the values and compare which model gives the best results (lowest loss). If you created a Convolutional Neural Network, then there are a few more parameters to adjust, such as time window size, number for convolutional layers, number of convolutional filters, size of kernel, and pooling layer size.

Finally, it is very important to save the model structure, including the final weights and biases!

Neural Network Examples in SysCAD

There are several ways to use SysCAD with Machine Learning problems. For example, we could use a detailed calibrated SysCAD model of a process to generate a large dataset by running scenarios covering a wide range of input conditions and generating the corresponding outputs, then use that dataset in a NN model.

Alternatively, as in the example we show below, we can incorporate a NN model as a unit model within a SysCAD project, where the dataset used for training and validation has been generated externally to SysCAD or from another SysCAD model. To be able to import and calculate a forward pass with the optimised set of parameters (weights and bias for each layer, kernel and or pooling layers), SysCAD can use a NN API to load the parameters and calculate the output of a NN model at each SysCAD iteration while interacting with other SysCAD unit models, in the form of a controller or reactor unit model. In the simplest case, custom PGM code can be implemented to perform the NN forward pass, or as shown in the example below, a custom made unit model running a development C++ NN API can be used directly in place.

KWA's NN API

This is a set of C++ dynamic libraries (DLL) that enable SysCAD to load an optimised DFF or ConvNet model from a file or set of files or use a Mass Balance Reactor based on a System Component matrix [math]\displaystyle{ S }[/math] and its corresponding basis vectors [math]\displaystyle{ b_i }[/math], where the transformation coefficients [math]\displaystyle{ \lambda_i }[/math] are calculated based on the optimised DFF parameters for a given set of input amounts and target temperature. An example of this is shown in the next section.
The optimised parameters for the DFF or ConvNet models can be generated using the same KWA NN API outside SysCAD or some of the publicly available libraries such as TensorFlow, Pytorch, etc.

Once the NN model parameters are loaded, the NN API is used by a SysCAD controller or reactor unit model, providing the necessary inputs to the NN model, calculating the outputs and transferring those outputs back to the SysCAD model.

Mass Balance Neural Network Reactor

An example of where a mass balance neural network reactor can be used is for a combustion system of CH4 + O2. A mass balance system can be defined by system components (SC) and phase constituents (phC). Both the system components and phase constituents are related by a stoichiometric matrix ([math]\displaystyle{ S }[/math]). A set of independent orthonormal basis vectors [math]\displaystyle{ \vec{(b_{i})} }[/math] can be obtained for the [math]\displaystyle{ S }[/math] matrix by calculating the nullspace of [math]\displaystyle{ S }[/math]. The number of vectors (nullity) represents the DOF's (degree of freedom). Any mass transfer (generation or consumption) in the system with no net overall mass change can be calculated by a set of transformation coefficients ([math]\displaystyle{ \lambda_{i} }[/math]):

[math]\displaystyle{ \vec{y_{out}}[moles]= \vec{y_{in}} + \sum_{i=1}^{DOF} \lambda^{}_{i} \cdot \vec{(b_{i})} }[/math]

The Stoichiometric Matrix and Basis Vector for a CH4+ O2 combustion system (gas phase only) is shown below, where 6 phase constituents (rows in [math]\displaystyle{ S }[/math]) and 3 system components (columns in [math]\displaystyle{ S }[/math]) are considered.

It is important to note that there are infinite sets of independent basis vectors and for this example a rref (reduced row echelon form) of the basis vectors was chosen for convenience only, as it makes easy to calculate the transformation coefficients given a training data set.

[math]\displaystyle{ S =\matrix{H2 \\ CH4\\ O2\\ H2O\\ CO\\ CO2\\} \pmatrix{O & C & H \\ 0 & 0 & 2 \\ 0 & 1 & 4 \\ 2 & 0 & 0 \\ 1 & 0 & 2 \\ 1 & 1 & 0 \\ 2 & 1 & 0 \\} }[/math] ; [math]\displaystyle{ \vec{b} = \pmatrix{1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ -1 & -2 & 0 \\ -1 & -4 & 2 \\ 1 & 3 & -2} }[/math]

The overall mass balance equation for a CH4 + O2 combustion system is shown below.

[math]\displaystyle{ \pmatrix{{y}_{H_2} \\{y}_{CH_4} \\{y}_{O_2} \\{y}_{H_2O} \\{y}_{CO} \\{y}_{CO_2}}_{out} = \pmatrix{{y}_{H_2} \\{y}_{CH_4} \\{y}_{O_2} \\ {y}_{H_2O} \\{y}_{CO} \\{y}_{CO_2}}_{in} + \lambda_{1} \pmatrix{1 \\ 0 \\ 0 \\-1 \\-1 \\1} + \lambda_{2} \pmatrix{0 \\1 \\0 \\-2 \\-4 \\3} + \lambda_{3} \pmatrix{ 0 \\0 \\1 \\0 \\2 \\-2} }[/math]

For any initial stream amount vector, arbitrary transformation coefficients [math]\displaystyle{ \lambda }[/math] will produce a "valid" out amount vector, ensuring strict mass balance is conserved, even considering the natural or expected uncertainties (losses) from a regression neural network model. This is, the total mass of the system is strictly conserve, however, this does not guaranties all phase constituents will have positive mass, that is a problem that needs to be address by improving accuracy of the model and by implementing mechanisms to ensure all output masses are positive.

The problem we are demonstrating now, consists of finding a set of [math]\displaystyle{ \lambda }[/math] values given a known set of input amounts, T and pressure. This is similar to solve a thermodynamic equilibrium problem for the CH4 O2 system, where given a set of input amounts, T and pressure, the thermodynamic model (GFEM or TCE) can calculate the set of output amounts representing the most stable state. However, in this case, in order to find a reasonable set of [math]\displaystyle{ \lambda }[/math] values for each input set, we can use a trained Neural Network, where the training and validation datasets are generated using an equilibrium thermodynamics model for this system.

When creating the neural network, true values for [math]\displaystyle{ \lambda }[/math] need to be calculated for each set of input amounts, T, P and true output amounts. This can be done using the following equation for [math]\displaystyle{ \lambda_{1} }[/math], [math]\displaystyle{ \lambda_{2} }[/math] and [math]\displaystyle{ \lambda_{3} }[/math]:

[math]\displaystyle{ \lambda_{1}= y_{1,out}-y_{1,in}=y_{H_2,out}-y_{H_2,in} }[/math]

[math]\displaystyle{ \lambda_{2}= y_{2,out}-y_{2,in}=y_{CH_4,out}-y_{CH_4,in} }[/math]

[math]\displaystyle{ \lambda_{3}= y_{3,out}-y_{3,in}=y_{O_2,out}-y_{O_2,in} }[/math]

Once the input and true output dataset is either collected or calculated, this can be used to train or fit a neural network model. For this example, a set of 2000 random combinations of input amounts for all 6 phase constituents and random temperature between 300 and 6000 K were generated as input/training data.

The input amounts were normalized so that the sum of all inputs added up to 1, whereas the temperature was divided (scaled down) by the maximum temperature used in this example (6000 K). This ensures that the NN model will work for any set of inputs and temperature, as long as normalized and scaled inputs are provided before performing a forward pass through the trained model. This step is integrated in the NN API.

To train the neural network, almost any programming language can be used such as Python or C# (we will show examples of this in the next part of this discussion). Various parameters such as the number of epochs, number of hidden layers, type of optimizer, learning rate, and decay rate were adjusted to determine the optimal set of weights and bias, and the structure of the neural network.

For this example, the best the parameters were 2 hidden layers, 64 nodes for the hidden layers, Adaptive Momentum (Adam) for the optimizer, a learning rate of 6e-3, a decay rate of 5e-3, and 100000 epochs. The results for the training session for [math]\displaystyle{ \lambda_{1} }[/math], [math]\displaystyle{ \lambda_{2} }[/math] and [math]\displaystyle{ \lambda_{3} }[/math] are shown in the following figures below, where the output of the initially randomized weights is shown at iteration 0, then the results after the 1st and 2nd iterations and finally the results after the last iteration (after 100000 epochs).

As seen below the results for [math]\displaystyle{ \lambda_{1} }[/math], [math]\displaystyle{ \lambda_{2} }[/math] and [math]\displaystyle{ \lambda_{3} }[/math] for the last iteration was very accurate.

Training Results [math]\displaystyle{ \lambda_{1} }[/math]
Training Results [math]\displaystyle{ \lambda_{2} }[/math]
Training Results [math]\displaystyle{ \lambda_{3} }[/math]

The accuracy and loss throughout the training was also calculated. The following graphs shows the accuracy and the loss of the model where the overall accuracy was approximately 70% and the loss was 1e-5 over 100000 epochs.

Training Accuracy
Training Loss

Once the model has been trained, the NN API allows for the optimised parameters to be loaded and used by a SysCAD Mass Balance Reactor as shown in the image below.

NN Mass Balance Reactor on SysCAD

A validation dataset was use to test the model. In this case, fixed amounts of inputs, for example 1 mol of CH4 and 2 moles of O2 within a range of temperature were tested in the SysCAD model.

On the left side is the results using a TCE Model in SysCAD (considered as true values) and on the right side is the results using the NN Mass Balance Reactor Model with the optimised model.

TCE Model
Neural Network Mass Balance Trained Model

When comparing both models, the results are very close and with very good agreement, showing how NN can be used in various different scenarios for process simulation given a good set of supervised data is available.

Why Use a NN Model?

So after looking at this high-level description of supervised machine learning application in SysCAD and looking at the mass balance example for the combustion of methane, one might ask, is it worthwhile? is it more advantageous than just use a GFEM or TCE model or another high definition model. Well, in some cases, yes.

There are various pros and also some cons to this approach, listed below are some advantages and disadvantages of using NN:

Advantages:

  • Neural Networks have simple structures and calculations (sums and multiplications)
  • They are relatively fast to calculate since they do not require an iterative process once trained and does not require highly complex calculations. This is where they really make a different when compared to solving the same problem but using a highly complex and non-ideal thermodynamic solver. A NN model can be seen as an accelerator. It does not replace entirely the thermodynamic model but when implemented and embedded in a large model with multiple recirculation streams and NN models, the gain in speed can drastically reduce the overall model computation time
  • Neural Networks can be used for a wide range of applications and several resources are available. Can be used to model non-linear relationships, pattern recognition, and classification problems. This versatility, allows to add more features to a model that otherwise would be difficult to incorporate.
  • Depending on the amount and quality of training data, the neural network can achieve highly accurate results.
  • Neural Network models can be use for forecasting model outputs, for example, in a time-dependent model, a NN model can be use to predict the immediate outcome, whereas at the same time, a second model that was trained to predict 10, 30, 60 minutes or 1 day ahead of time, can make predictions (probably with lower accuracy but correct trend) can be used in parallel to act as early warning and support in decision making.
  • Neural Networks are the backbone of most Machine Learning algorithms.

Disadvantages:

  • Require large and supervised (checked) dataset. Depending on the problem you are trying to solve it might take a long time to collect all the data.
  • Training might be difficult and lengthy.
  • Limited extrapolation (might not perform as well outside of training dataset). This of course is something to address during training sessions, having a proper balance between training and validation datasets. It is easy to fall into overtraining a model, making it too rigid and compromising the performance of the model outside the training dataset.
  • Despite of the flexibility that ML and NN models provide, there is also a certain rigidity. For example, in the case of the mass balance problem shown above, if we decide to add more phase constituents or system components to the model, then we would need to generate a new training and validation dataset and fit the model again before we can use it in a SysCAD model.

What's Next?

In the next part of this series, we will go in detail in how to create a NN model for steady state process simulation from scratch, going through each step about creating a GFEM model to generate the training and validation dataset, training the model and optimising the model's hyperparameters and then implementing the newly trained model back in a SysCAD project.

References

  1. I. Goodfellow, Y. Bengio, and A. Courville, “Chapter 9: Convolutional Networks,” in Deep learning, MIT Press, 2018, pp. 326–366
  2. M. Mishra, “Convolutional Neural Networks, explained,” Medium, https://towardsdatascience.com/convolutional-neural-networks-explained-9cc5188c4939
  3. J. Brownlee, “Understand the impact of learning rate on neural network performance,” MachineLearningMastery.com, https://machinelearningmastery.com/understand-the-dynamics-of-learning-rate-on-deep-learning-neural-networks/
  4. K. Nyuytiymbiy, “Parameters and hyperparameters in machine learning and Deep Learning,” Medium, https://towardsdatascience.com/parameters-and-hyperparameters-aa609601a9ac
  5. J. Brownlee, “Understand the impact of learning rate on neural network performance,” MachineLearningMastery.com, https://machinelearningmastery.com/understand-the-dynamics-of-learning-rate-on-deep-learning-neural-networks/
  6. R. Bagheri, “An introduction to Deep Feedforward Neural Networks,” Medium, https://towardsdatascience.com/an-introduction-to-deep-feedforward-neural-networks-1af281e306cd