Discussion: Machine Learning and Neural Networks for Process Simulation


This is a Discussion Page with supplementary user information. It is not part of the core SysCAD Help Documentation - please refer to the User Guide for full documentation links.

NOTE: The following Discussion Page uses TensorFlow to train or optimise neural network models for use in SysCAD; however, other tools can be used to create deep fully-connected or convolutional neural networks.

What is Machine Learning?

Machine Learning (ML) is a branch of artificial intelligence (AI) based on the idea that systems are capable of learning from data, recognizing patterns, and making decisions with minimal human intervention. ML is commonly divided into three main categories: supervised learning, unsupervised learning and reinforcement learning.

  • Supervised Learning uses datasets for training which contain labelled or true observations for each input set. The algorithm then makes predictions that are compared with the actual output values. If the predictions are not within a certain tolerance range, the algorithm is adjusted until it achieves the correct output. This type of machine learning typically requires, or at least benefits from, a large dataset and is usually associated with the use of artificial neural networks.
  • Unsupervised Learning uses unlabelled data, that is, the dataset does not include true or known outputs. In unsupervised learning the algorithm tries to discover patterns on its own, by clustering (grouping data based on similarities or differences), association (finding relationships between variables), or dimensionality reduction (for example, principal component analysis).
  • Reinforcement Learning is similar to supervised learning, but the model is not trained using a sample dataset of true values. The model learns as it goes by trial and error, whereby a system of rewards or "scoring" is used to reinforce the behavior of the model towards a certain objective.

In this discussion page we will focus on Supervised Learning, specifically looking at the use of artificial neural network models within a SysCAD project. In later installments of this discussion series we will look at other types of ML with SysCAD.

Part 1 - Supervised Learning and Neural Network applications in SysCAD, an overview

What are Neural Networks?

Neural Networks (or Artificial NN) mimic the structure of real neurons, perceiving (or sensing) inputs and firing signals through a net of neurons. There are various types of neural networks that can be used.
At a high level, Neural Networks (NN) can be distinguished by their intended purpose. There are categorical NNs, used for classification (e.g. classifying an image to identify what number it represents, or assigning a binary output such as true or false to a given input or question), and regression NNs, used to predict one or more numerical values, similar to a mathematical function (e.g. mass of compounds in a process stream, temperature, size fraction, etc.). The structure and components of an artificial neural network are similar in both cases (categorical vs regression), and both can be implemented in SysCAD, for example to solve a mass balance in a unit model representing a reactor of some kind, or to predict a quality (e.g. the color of a given stream) or the state of a process (e.g. out-of-spec product, equipment failure or not, etc.).
In terms of structure, there are several types and arrangements of NN. Below we describe the two that have been used with SysCAD: Deep Feed Forward (DFF) and Convolutional (ConvNet) Neural Networks.

Deep Feed Forward Neural Network

The Deep Feed Forward Neural Network (DFF) is the most common type of neural network. Its structure includes an input layer, an output layer, and at least one hidden layer containing several neurons, each with weights, a bias and an activation function. The image below represents the structure of a DFF neural network.

Deep Feed Forward Neural Network

Convolutional Neural Network

Convolutional Neural Networks (ConvNet or CNN) are an extended form of the deep feed forward neural network and are primarily utilised for feature extraction from a grid-like matrix dataset. ConvNets are very powerful tools, typically used for image recognition or signal processing; however, in SysCAD we have used ConvNets for dynamic or time-based process simulation with great success, achieving much better performance (lower loss) than a DFF NN trained on the same training and validation dataset.
A CNN has many layers, including an input layer (typically a matrix of 2D or higher dimension) and one or more convolutional and pooling layers, followed by a fully connected part represented by a DFF.

Convolutional Neural Network

Neural Network Parameters

Neural Network parameters are learnable or trainable values. The final value of these parameters is obtained by an optimization process (training or fitting) where successive adjustments to these values are made in order to minimize the error (or loss) between the predicted values of the output layer and the true or ground-truth values for each sample in the dataset. For a DFF network, these parameters are the weights and biases, which determine the behavior of each neuron. Weights are usually randomized and biases are zeroed before the learning session begins. Together with an activation function, they allow the model to propagate forward and produce an acceptable output.

Weight: The weight is multiplied by the input value entering the node. The weights represent the strength of a node.

Bias: The output value from a neuron can be shifted left or right using a bias. The bias can be compared to the y-intercept in a linear equation.

For a ConvNet, there are additional trainable parameters. These are the coefficients of a kernel, which are used to extract features from the input matrix and also reduce the dimensions or size of the inputs.

Kernel: A kernel is a matrix (or set of matrices) smaller than the input layer, used to calculate the dot product between a section of the input layer and the kernel. The kernel is then displaced, sweeping across the input matrix and calculating the dot product again; this process is repeated until the entire input is processed. All the results from this sweeping dot product operation form a new layer, typically of smaller dimension than the input. The values of the kernel need to be optimised during the training process.
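As an illustration of this sweeping dot product, below is a minimal NumPy sketch of a single-channel convolution; the 4x4 input, the 2x2 kernel values and the unit stride are illustrative assumptions only, not values used by SysCAD or the NN API.

<syntaxhighlight lang="python">
import numpy as np

x = np.arange(16, dtype=float).reshape(4, 4)   # example 4x4 input "layer"
k = np.array([[1.0, 0.0], [0.0, -1.0]])        # example 2x2 kernel

h = np.zeros((3, 3))                           # the result is smaller than the input
for i in range(3):
    for j in range(3):
        # dot product between the kernel and the overlapping input section
        h[i, j] = np.sum(x[i:i+2, j:j+2] * k)

print(h)                                       # the resulting (3x3) layer
</syntaxhighlight>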

Neural Network Hyperparameters

Below are some of the hyperparameters that can be modified when setting up a neural network model. These are not trainable parameters but rather design parameters defining the structure or makeup of the NN model.
Initially, one might not know how many layers and how many neurons per layer will result in the lowest validation loss. The same applies to the use of one or more convolutional layers, the number of kernels (each convolutional layer might have several kernels, called filters), the size of the kernel window, whether to use a pooling layer at all, etc.
The process by which one determines the final NN design is often referred to as hyperparameter tuning or optimization, and involves parametrically running several combinations of designs, training each NN model using the same dataset, and evaluating which combination of hyperparameters results in the lowest training and validation losses.

Weights and Bias

Neuron: A neuron in a neural network is where the sum of each input multiplied by its weight is computed and a bias is added. A large number of neurons can be used; some popular choices are 32, 64 or 128 neurons per layer. The mathematical computation done for each neuron is represented by the formula below:


[math]\displaystyle{ output = \sum_{i=1}^{n_{inputs}} (input_i \cdot weight_i) + bias }[/math]


Activation Function: The net output from the neuron is then passed through an activation function that acts on the entire layer. The type of activation function can be selected as a hyperparameter by the user, whereas the weights and biases are adjusted during the training process. The activation functions you can choose from depend on the problem you are trying to solve; some examples are the Rectified Linear Unit (ReLU), Softmax and Sigmoid functions.
The purpose of adding an activation function to the model is to introduce non-linearity into the output of a neuron, making the model much more versatile and capable of modelling very complex problems with a very simple mathematical structure.
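For example, the neuron formula above followed by a ReLU activation can be written in a few lines of NumPy; the input, weight and bias values below are arbitrary illustrative numbers.

<syntaxhighlight lang="python">
import numpy as np

inputs  = np.array([0.5, -1.2, 3.0])
weights = np.array([0.8,  0.1, -0.4])   # one weight per input
bias    = 0.2

net = np.sum(inputs * weights) + bias   # output = sum(input_i * weight_i) + bias
out = np.maximum(0.0, net)              # ReLU activation: max(0, net)
print(net, out)
</syntaxhighlight>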

Loss: In the neural network you also need to define the loss, which is the metric actually used during the optimization or training of a model. Depending on the type of model, different loss functions can be defined, such as Categorical Cross-Entropy and Binary Cross-Entropy for categorical problems, and Mean Squared Error or Mean Absolute Error for regression problems.

Optimizer: Another hyperparameter that can be changed is the optimizer. Optimizers are a critical part of the neural network architecture: they assist the network in learning to generate ever-better predictions during training, by determining the optimal set of model parameters (weights and biases) so that the model can generate the best results for the problem it is solving. There are many types of optimizers, such as stochastic gradient descent (SGD), SGD with decay, SGD with momentum, AdaGrad (Adaptive Gradient), RMSProp (Root Mean Square Propagation), and Adam (Adaptive Momentum). If you are unsure of which optimizer to use, Adam is the most widely used, as it is often capable of avoiding getting stuck in local minima and finding the global minimum instead.

Input Layer: The input layer is the first layer, through which all the inputs are given to the model. The number of neurons in the input layer depends on the number of features or inputs in the dataset.

Hidden Layer: The hidden layer obtains the data from the input or previous layers. In the model there can be as many hidden layers as necessary and each hidden layer can have a different number of neurons.

Output Layer: The output layer is the last layer of the neural network. The number of neurons in the output layer corresponds to the number of outputs, i.e. the variables for which true values exist in the dataset.

Learning rate: The learning rate determines how much the model changes in response to the estimated error each time the model weights are updated. Choosing the correct value is important: if the learning rate is too small, the training process can be too long or can get stuck, whereas if it is too large, training can converge to a sub-optimal set of weights. Learning rate values range between 0 and 1.

Epochs: In neural networks, a forward and a backward pass together count as one iteration during training. The number of epochs can be interpreted as the number of passes the algorithm has made across the entire training dataset.
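To see how these hyperparameters fit together, below is a minimal sketch of a small regression DFF model in TensorFlow/Keras. The input/output sizes, layer counts, learning rate and epoch count are illustrative assumptions, not recommended values.

<syntaxhighlight lang="python">
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(7,)),                   # input layer, e.g. 7 features
    tf.keras.layers.Dense(64, activation="relu"), # hidden layer 1: 64 neurons, ReLU
    tf.keras.layers.Dense(64, activation="relu"), # hidden layer 2
    tf.keras.layers.Dense(3),                     # output layer, e.g. 3 outputs
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-2),  # optimizer + learning rate
    loss="mse",                                   # mean squared error (regression loss)
)
# history = model.fit(x_train, y_train, epochs=10000, validation_data=(x_val, y_val))
</syntaxhighlight>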

The additional parameters below are only for Convolutional Neural Networks.

Convolutional Layer: In the convolutional layer, the dot product between two matrices, corresponding to a section of the input layer and the kernel, is performed. The kernel slides across the height and width of the dataset, each time performing a dot product. The result of these multiple dot products is called an activation map. Usually an activation function, similar to the one used for a DFF NN, is applied to the activation map.

Pooling Layer: The pooling layer summarizes statistics of nearby outputs from the activation map. There are different types of pooling functions, such as averaging over a rectangular neighborhood, the L2 norm of the rectangular neighborhood, or max pooling. Max pooling is the most popular type of pooling function.
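A ConvNet simply places these layers in front of the fully connected part. Below is a sketch of a small 1D ConvNet in TensorFlow/Keras for time-window inputs; the window size, number of filters, kernel size and pooling size are illustrative assumptions only.

<syntaxhighlight lang="python">
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 7)),                # e.g. 32 time steps x 7 features
    tf.keras.layers.Conv1D(16, kernel_size=3,
                           activation="relu"),    # convolutional layer, 16 filters
    tf.keras.layers.MaxPooling1D(pool_size=2),    # max pooling layer
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"), # fully connected (DFF) part
    tf.keras.layers.Dense(3),                     # output layer
])
</syntaxhighlight>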

Overview of Creating an Optimised Neural Network Model

In order to create and run a neural network, you first need to collect data for all the inputs and outputs you need, for example temperature or pressure, input amounts, levels, etc. The image below shows the workflow for how a Neural Network can be set up and trained.

Creating a Neural Network

Typically, it is a good idea to preprocess the data, such as shuffling and scaling it, before passing it through the neural network. In addition, it is recommended to set aside about 20% of the collected data for validation, while the remaining 80% is run through the neural network during training.
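A minimal sketch of this shuffle/scale/split step in Python is shown below; the placeholder arrays and the min-max scaling are assumptions for illustration, as the appropriate scaling depends on your data.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((2000, 7))                # placeholder inputs (samples x features)
y = rng.random((2000, 3))                # placeholder true outputs

idx = rng.permutation(len(x))            # shuffle the samples
x, y = x[idx], y[idx]

x = (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))  # scale features to [0, 1]

split = int(0.8 * len(x))                # 80% training, 20% validation
x_train, x_val = x[:split], x[split:]
y_train, y_val = y[:split], y[split:]
</syntaxhighlight>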

Many different tools can be used to create a neural network and obtain the final optimised parameters. For example, in Python available tools include TensorFlow and PyTorch (with TensorBoard for monitoring training). Other programming languages can be used if you are familiar with them, such as R, Java, C++ and many more.

When training a neural network there are many hyperparameters you can change. For a Deep Feed Forward network it is recommended to adjust the learning rate, number of epochs, number of hidden layers and number of neurons; example starting values for these parameters are 1e-2, 10000, 3 and 64 respectively. However, depending on your project you might want to adjust the values and compare which model gives the best results (lowest loss), as in the sketch below. For a Convolutional Neural Network there are a few more hyperparameters to choose in addition to those stated above: it may be of interest to vary the time window size, number of convolutional layers, number of convolutional filters, kernel size, and pooling layer size.
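For example, a simple parametric sweep over a few hyperparameter combinations might look like the following sketch, reusing x_train, y_train, x_val and y_val from the preprocessing step above; build_model is a hypothetical helper that assembles and compiles a Keras model from the given hyperparameters.

<syntaxhighlight lang="python">
import itertools

best = (None, float("inf"))
for layers, neurons, lr in itertools.product([1, 2, 3], [32, 64, 128], [1e-2, 1e-3]):
    # build_model is a hypothetical user-written helper, not a library function
    model = build_model(hidden_layers=layers, neurons=neurons, learning_rate=lr)
    hist = model.fit(x_train, y_train, epochs=500,
                     validation_data=(x_val, y_val), verbose=0)
    val_loss = hist.history["val_loss"][-1]      # final validation loss of this design
    if val_loss < best[1]:
        best = ((layers, neurons, lr), val_loss)
print("best (layers, neurons, lr):", best)
</syntaxhighlight>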

Finally, it is very important to save the model structure, including the final weights and biases!
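In Keras, for example, a single call stores both the model structure and the final weights and biases (the file name here is illustrative):

<syntaxhighlight lang="python">
# Saves architecture + trained weights and biases in one file;
# reload later with tf.keras.models.load_model("nn_model.keras")
model.save("nn_model.keras")   # use "nn_model.h5" with older TensorFlow versions
</syntaxhighlight>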

Neural Networks examples in SysCAD

There are several ways to use SysCAD with Machine Learning problems. One option, for example, is to use a fully calibrated or validated high-definition SysCAD model of a process to generate a large dataset, by running several scenarios covering a wide range of input conditions and recording the corresponding outputs, and then use that dataset to train a NN model.
Alternatively, as in the example shown below, a NN model can be incorporated as a unit model within a SysCAD project, where the dataset used for training and validation may have been generated externally to SysCAD or from another SysCAD model. To import the optimised set of parameters (weights and biases for each layer, kernel and/or pooling layers) and calculate a forward pass, SysCAD can use a NN API, which loads the parameters and calculates the output of the NN model at each SysCAD iteration, while interacting with other SysCAD unit models in the form of a controller or reactor unit model.
In the simplest case, custom PGM code can be implemented to perform the NN forward pass or, as shown in the example below, a custom-made unit model running a development C++ NN API can be used directly.

NN API

This is a C++ dynamic library (DLL) that enables SysCAD to load an optimised DFF or ConvNet model from a file, or to use a Mass Balance Reactor based on transformation coefficients as described in more detail below.
The optimised parameters can be generated using the same NN API or one of the publicly available libraries such as TensorFlow, PyTorch, etc.
Once the NN model parameters are loaded, the NN API is used by a SysCAD controller or reactor unit model, providing the necessary inputs to the NN model, calculating the outputs and transferring those outputs back to the SysCAD model.

Mass Balance Neural Network Reactor

An example of where a mass balance neural network reactor can be used is a CH4 + O2 combustion system. A mass balance system can be defined by system components (SC) and phase constituents (phC). The system components and phase constituents are related by a stoichiometric matrix ([math]\displaystyle{ S }[/math]). A set of independent basis vectors [math]\displaystyle{ \vec{b}_{i} }[/math] can be obtained for the [math]\displaystyle{ S }[/math] matrix by calculating the nullspace of [math]\displaystyle{ S }[/math]. The number of vectors (the nullity) represents the DOFs (degrees of freedom). Any mass transfer (generation or consumption) in the system with no net overall mass change can be calculated from a set of transformation coefficients ([math]\displaystyle{ \lambda_{i} }[/math]):


[math]\displaystyle{ \vec{y}_{out}\,[moles] = \vec{y}_{in} + \sum_{i=1}^{DOF} \lambda_{i} \cdot \vec{b}_{i} }[/math]

The stoichiometric matrix and basis vectors for a CH4 + O2 combustion system (gas phase only) are shown below, where 6 phase constituents (rows in [math]\displaystyle{ S }[/math]) and 3 system components (columns in [math]\displaystyle{ S }[/math]) are considered.
It is important to note that there are infinitely many sets of independent basis vectors; for this example the rref (reduced row echelon form) basis was chosen for convenience only, as it makes it easy to calculate the transformation coefficients given a training dataset.

[math]\displaystyle{ S =\matrix{H_2 \\ CH_4 \\ O_2 \\ H_2O \\ CO \\ CO_2} \pmatrix{O & C & H \\ 0 & 0 & 2 \\ 0 & 1 & 4 \\ 2 & 0 & 0 \\ 1 & 0 & 2 \\ 1 & 1 & 0 \\ 2 & 1 & 0} }[/math] ; [math]\displaystyle{ \vec{b} = \pmatrix{1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ -1 & -2 & 0 \\ -1 & -4 & 2 \\ 1 & 3 & -2} }[/math]

The overall mass balance equation for a CH4 + O2 combustion system is shown below.

[math]\displaystyle{ \pmatrix{{y}_{H_2} \\{y}_{CH_4} \\{y}_{O_2} \\{y}_{H_2O} \\{y}_{CO} \\{y}_{CO_2}}_{out} = \pmatrix{{y}_{H_2} \\{y}_{CH_4} \\{y}_{O_2} \\ {y}_{H_2O} \\{y}_{CO} \\{y}_{CO_2}}_{in} + \lambda_{1} \pmatrix{1 \\ 0 \\ 0 \\-1 \\-1 \\1} + \lambda_{2} \pmatrix{0 \\1 \\0 \\-2 \\-4 \\3} + \lambda_{3} \pmatrix{ 0 \\0 \\1 \\0 \\2 \\-2} }[/math]

For any initial stream amount vector, arbitrary transformation coefficients [math]\displaystyle{ \lambda }[/math] will produce a "valid" output amount vector, ensuring strict mass balance is conserved, even considering the natural or expected uncertainties (losses) from a regression neural network model. That is, the total mass of the system is strictly conserved; however, this does not guarantee that all phase constituents will have positive mass. That is a problem that needs to be addressed by improving the accuracy of the model and by implementing mechanisms to ensure all output masses are positive.
The problem we are demonstrating now consists of finding a set of [math]\displaystyle{ \lambda }[/math] values given a known set of input amounts, temperature and pressure. This is similar to solving a thermodynamic equilibrium problem for the CH4 + O2 system, where, given a set of input amounts, temperature and pressure, the thermodynamic model (GFEM or TCE) can calculate the set of output amounts representing the most stable state. In this case, however, to find a reasonable set of [math]\displaystyle{ \lambda }[/math] values for each input set, we can use a trained Neural Network, where the training and validation datasets are generated using an equilibrium thermodynamics model for this system.

When creating the neural network, true values for [math]\displaystyle{ \lambda }[/math] need to be calculated for each set of input amounts, T, P and true output amounts. Thanks to the rref basis, this can be done using the following equations for [math]\displaystyle{ \lambda_{1} }[/math], [math]\displaystyle{ \lambda_{2} }[/math] and [math]\displaystyle{ \lambda_{3} }[/math] (a small numerical check is sketched after these equations):

[math]\displaystyle{ \lambda_{1}= y_{1,out}-y_{1,in}=y_{H_2,out}-y_{H_2,in} }[/math]

[math]\displaystyle{ \lambda_{2}= y_{2,out}-y_{2,in}=y_{CH_4,out}-y_{CH_4,in} }[/math]

[math]\displaystyle{ \lambda_{3}= y_{3,out}-y_{3,in}=y_{O_2,out}-y_{O_2,in} }[/math]
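The sketch below checks these equations numerically with NumPy for the basis matrix shown above; the input and output amounts correspond to the complete combustion of 1 mol of CH4 with 2 mol of O2 and are for illustration only.

<syntaxhighlight lang="python">
import numpy as np

# Columns are b1, b2, b3; rows are H2, CH4, O2, H2O, CO, CO2 (amounts in moles)
B = np.array([[ 1,  0,  0],
              [ 0,  1,  0],
              [ 0,  0,  1],
              [-1, -2,  0],
              [-1, -4,  2],
              [ 1,  3, -2]], dtype=float)

y_in  = np.array([0.0, 1.0, 2.0, 0.0, 0.0, 0.0])   # e.g. 1 mol CH4 + 2 mol O2
y_out = np.array([0.0, 0.0, 0.0, 2.0, 0.0, 1.0])   # complete combustion: 2 H2O + 1 CO2

# lambda_i = y_i,out - y_i,in for the first three species (H2, CH4, O2)
lam = y_out[:3] - y_in[:3]
print(lam)                  # -> [ 0. -1. -2.]
print(y_in + B @ lam)       # -> [0. 0. 0. 2. 0. 1.], reconstructing y_out exactly
</syntaxhighlight>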

Once the input and true output dataset has been collected or calculated, it can be used to train or fit a neural network model. For this example, a set of 2000 random combinations of input amounts for all 6 phase constituents, with random temperatures between 300 and 6000 K, was generated as training data.
The input amounts were normalized so that the sum of all inputs added up to 1, whereas the temperature was divided (scaled down) by the maximum temperature used in this example (6000 K). This ensures that the NN model will work for any set of inputs and temperatures, as long as normalized and scaled inputs are provided before performing a forward pass through the trained model; this step is integrated in the NN API.
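As a sketch, this normalization step could look as follows in Python (the amounts and temperature are illustrative values):

<syntaxhighlight lang="python">
import numpy as np

amounts = np.array([0.0, 1.0, 2.0, 0.0, 0.0, 0.0])  # moles of H2, CH4, O2, H2O, CO, CO2
T = 1500.0                                           # temperature in K

# Normalize amounts to sum to 1 and scale T by the maximum training temperature
x = np.concatenate([amounts / amounts.sum(), [T / 6000.0]])  # the 7 NN inputs
</syntaxhighlight>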
To train the neural network, almost any programming language can be used, such as Python or C# (we will show examples of this in the next part of this discussion). Various hyperparameters, such as the number of epochs, number of hidden layers, type of optimizer, learning rate, and decay rate, were adjusted to determine the optimal set of weights and biases and the structure of the neural network.

For this example, the best parameters were 2 hidden layers, 64 nodes per hidden layer, Adaptive Momentum (Adam) as the optimizer, a learning rate of 6e-3, a decay rate of 5e-3, and 100000 epochs. The results of the training session for [math]\displaystyle{ \lambda_{1} }[/math], [math]\displaystyle{ \lambda_{2} }[/math] and [math]\displaystyle{ \lambda_{3} }[/math] are shown in the figures below, where the output of the initially randomized weights is shown at iteration 0, then the results after the 1st and 2nd iterations, and finally the results after the last iteration (after 100000 epochs).
As seen below, the results for [math]\displaystyle{ \lambda_{1} }[/math], [math]\displaystyle{ \lambda_{2} }[/math] and [math]\displaystyle{ \lambda_{3} }[/math] at the last iteration were very accurate.

Training Results [math]\displaystyle{ \lambda_{1} }[/math]
Training Results [math]\displaystyle{ \lambda_{2} }[/math]
Training Results [math]\displaystyle{ \lambda_{3} }[/math]

The accuracy and loss throughout training were also calculated. The following graphs show the accuracy and loss of the model, where the overall accuracy was approximately 70% and the loss was 1e-5 over 100000 epochs.

Training Accuracy
Training Loss

Once the model has been trained, the NN API allows for the optimised parameters to be loaded and used by a SysCAD Mass Balance Reactor as shown in the image below.

NN Mass Balance Reactor on SysCAD

A validation dataset was used to test the model. In this case, fixed input amounts, for example 1 mol of CH4 and 2 mol of O2, over a range of temperatures, were tested in the SysCAD model.
On the left are the results using a TCE Model in SysCAD (considered the true values) and on the right are the results using the NN Mass Balance Reactor Model with the optimised parameters.

TCE Model
Neural Network Mass Balance Trained Model

When comparing both models, the results are very close and in very good agreement, showing how NNs can be used in many different process simulation scenarios, provided a good set of supervised data is available.

Why Use a NN model?

So, after looking at this high-level description of supervised machine learning applications in SysCAD and at the mass balance example for the combustion of methane, one might ask: is it worthwhile? Is it more advantageous than just using a GFEM or TCE model or another high-definition model? Well, in some cases, yes.
There are various pros and also some cons to this approach; listed below are some advantages and disadvantages of using NN:

Advantages:

  • Neural Networks have simple structures and calculations (sums and multiplications).
  • They are relatively fast to calculate since, once trained, they do not require an iterative process or highly complex calculations. This is where they really make a difference compared with solving the same problem using a highly complex and non-ideal thermodynamic solver. A NN model can be seen as an accelerator: it does not entirely replace the thermodynamic model, but when implemented and embedded in a large model with multiple recirculation streams and NN models, the gain in speed can drastically reduce the overall model computation time.
  • Neural Networks can be used for a wide range of applications and several resources are available. They can be used to model non-linear relationships, pattern recognition, and classification problems. This versatility allows adding features to a model that would otherwise be difficult to incorporate.
  • Depending on the amount and quality of training data, the neural network can achieve highly accurate results.
  • Neural Network models can be used for forecasting model outputs. For example, in a time-dependent model, a NN model can be used to predict the immediate outcome, while a second model, trained to predict 10, 30 or 60 minutes or 1 day ahead of time, can run in parallel (probably with lower accuracy but the correct trend) to act as an early warning and support decision making.
  • Neural Networks are the backbone of most Machine Learning algorithms.

Disadvantages:

  • They require a large and supervised (checked) dataset. Depending on the problem you are trying to solve, it might take a long time to collect all the data.
  • Training might be difficult and lengthy.
  • Limited extrapolation (they might not perform as well outside the training dataset). This, of course, is something to address during training sessions by having a proper balance between training and validation datasets. It is easy to fall into overtraining a model, making it too rigid and compromising its performance outside the training dataset.
  • Despite the flexibility that ML and NN models provide, there is also a certain rigidity. For example, in the case of the mass balance problem shown above, if we decide to add more phase constituents or system components to the model, we would need to generate a new training and validation dataset and fit the model again before using it in a SysCAD model.

References

I. Goodfellow, Y. Bengio, and A. Courville, “Chapter 9: Convolutional Networks,” in Deep learning, MIT Press, 2018, pp. 326–366

M. Mishra, “Convolutional Neural Networks, explained,” Medium, https://towardsdatascience.com/convolutional-neural-networks-explained-9cc5188c4939

J. Brownlee, “Understand the impact of learning rate on neural network performance,” MachineLearningMastery.com, https://machinelearningmastery.com/understand-the-dynamics-of-learning-rate-on-deep-learning-neural-networks/

K. Nyuytiymbiy, “Parameters and hyperparameters in machine learning and Deep Learning,” Medium, https://towardsdatascience.com/parameters-and-hyperparameters-aa609601a9ac

R. Bagheri, “An introduction to Deep Feedforward Neural Networks,” Medium, https://towardsdatascience.com/an-introduction-to-deep-feedforward-neural-networks-1af281e306cd