Artificial Intelligence (AI) and Its Applications
Artificial Intelligence (AI) is already an integral part of our everyday lives. For example, search engines such as Google and Bing use AI to rank search results based on relevance, location, past search history, search settings, etc. Another example is personalized recommendation engines, like the ones used by Amazon and Netflix, where AI recommends items or content based on user behavior, preferences, historical data, etc. In addition, Large Language Models (LLMs), like ChatGPT and Gemini, use AI to answer questions, summarize text, and create content such as poetry based on patterns and relationships learned from massive amounts of training data, including books, articles, web pages, and video transcripts.
This blog and the associated 30-slide presentation provide a reasonably comprehensive overview of AI fundamentals, along with its applications and ethical considerations.
Artificial Intelligence (AI), Machine Learning (ML), Deep Learning (DL), and Generative AI
Artificial Intelligence (AI): The science and engineering of developing machines/computers that mimic human intelligence [1], [2], [3]. We are not yet fully there.
Machine Learning (ML): A subfield of AI that enables machines/computers to learn from past data to make predictions, recommendations, or classifications. ML uses traditional algorithms such as K-means clustering, decision trees, and Principal Component Analysis (PCA). ML requires a considerable amount of human intervention. Example: predicting whether a student will drop out of school (yes/no).
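As a concrete illustration of the dropout example above, the sketch below trains a decision stump (a one-split decision tree) on labeled historical data. The feature name and all data values are hypothetical, chosen purely for illustration; a real pipeline would learn from many features and far more records.

```python
# A minimal decision-stump sketch of the dropout prediction example.
# "attendance" and all values below are hypothetical illustration data.

def train_stump(records, feature, labels):
    """Find the threshold on one feature that best separates the labels."""
    best_threshold, best_errors = None, len(records) + 1
    for candidate in sorted({r[feature] for r in records}):
        # Predict "drop out" when the feature value is below the candidate split
        errors = sum(
            (r[feature] < candidate) != label
            for r, label in zip(records, labels)
        )
        if errors < best_errors:
            best_threshold, best_errors = candidate, errors
    return best_threshold

# Hypothetical historical data: attendance rate and whether the student dropped out
students = [{"attendance": 0.95}, {"attendance": 0.40},
            {"attendance": 0.88}, {"attendance": 0.35}]
dropped = [False, True, False, True]

threshold = train_stump(students, "attendance", dropped)

def predict(student):
    return student["attendance"] < threshold  # True -> likely to drop out

print(predict({"attendance": 0.30}))
```

The human intervention that ML requires is visible here: someone had to choose the feature and label every training record by hand.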
Deep Learning (DL): A subset of ML. DL also enables machines/computers to learn from past data to make predictions, recommendations, or classifications. However, it uses complex multilayered neural networks such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Long Short-Term Memory (LSTM) networks. DL requires less human intervention than ML. Example: facial emotion recognition (happy, sad, angry, etc.).
Generative AI: A subset of DL that focuses on creating new content (images, text, audio, and more) from existing content. It uses techniques such as Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), and Transformers. Example: generating an image of a rabbit wearing sunglasses and a sun hat.


AI Is Not New, So Why Is It Booming Now?
AI, ML, and DL were introduced in the 1950s, 1980s, and 2010s, respectively [4], [5].
| Specialized Computing Hardware (e.g., GPUs) | Abundance of Data | Algorithms | Cloud Computing |
| ![]() | ![]() | ![]() | ![]() |
| A GPU (Graphics Processing Unit) has several advantages over a CPU (Central Processing Unit) • a GPU has thousands of processor cores • the cores run simultaneously, allowing for parallel processing • a GPU has hundreds of gigabytes per second of memory bandwidth. | Data is the most important factor, as it is the fuel for creating AI models • Vast amounts of data are now available from social media, sensors, online transactions, medical imaging, etc. | Advancements in algorithms and techniques • Since 2010, remarkable algorithms and techniques have been developed to enhance the accuracy of neural networks. | Availability of a large number of public cloud providers • Organizations no longer need their own computing infrastructure because scalable and cost-effective infrastructure to store and process large datasets is available. |
Types of Machine Learning
| Supervised Learning • Model training using labeled data • Human intervention needed to label data • Higher accuracy • Lower computational complexity [6], [7] | Unsupervised Learning • Model training using unlabeled data • No human intervention to label data • Lower accuracy • Higher computational complexity [8] | Reinforcement Learning • No data is directly available for training the model, only an interactive environment • A trial-and-error method in which an agent takes actions in an environment and learns via rewards for correct actions and punishments for wrong ones [9] |
| Regression ![]() | Classification ![]() | Clustering ![]() | Association ![]() | Reinforcement Learning ![]() |
| Prediction of continuous values (e.g., price, salary, etc.) • Predict a company’s stock price • Predict a house price • Predict hourly energy consumption in a hospital | Assigning unknown instances to one of the known discrete categories • Binary classification: spam, not spam • Multi-class classification: small, medium, large • Multi-label classification: an e-mail is internal AND urgent AND confidential | Grouping of similar data items based on their characteristics • Customer segmentation: dividing customers into distinct clusters based on attributes such as age, gender, income, location, etc. | Discovering relationships between data items • X bought bread, milk, fruits, and wheat • Y bought bread, milk, rice, and butter • If C buys bread, it is highly likely that C will also buy milk | Agents are trained via a reward and punishment mechanism • AlphaGo (created by DeepMind) is based on RL techniques. By playing millions of games against itself, AlphaGo improved its strategies and defeated Ke Jie, the world’s number one player of the ancient board game Go. |
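The customer-segmentation example above can be sketched with a tiny K-means implementation. This is a simplified one-dimensional version with made-up income values; real segmentation would cluster over several attributes (age, income, location, ...).

```python
# A minimal 1-D K-means sketch of the customer-segmentation example.
# Incomes and k=2 are hypothetical illustration values.

def kmeans_1d(values, k, iterations=20):
    # Initialize centroids with the first k distinct values
    centroids = sorted(set(values))[:k]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

incomes = [21, 23, 25, 88, 90, 95]          # hypothetical annual incomes (k$)
centroids, clusters = kmeans_1d(incomes, k=2)
print(sorted(round(c) for c in centroids))   # two segment centers
```

Note that no labels were provided: the algorithm discovers the two income segments on its own, which is exactly what distinguishes unsupervised from supervised learning.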
Machine Learning vs. Deep Learning
“Traditional” Machine Learning
![]() | Advantages ✓ Training can be done using small to medium datasets ✓ Quick to train ✓ Requires moderate computing power. It can run on a traditional Central Processing Unit (CPU) | Disadvantages ✗ Human intervention is required to extract features ✗ Manual feature extraction (e.g., shapes, sizes, colors, etc.) can be a time-consuming and tedious task. |
| When to use Use when dealing with structured data, limited data size, interpretable models, and handcrafted features [10], [11], [12]. | |
Deep, “end-to-end” learning
![]() | Advantages ✓ No human intervention is required to extract features ✓ Feature extraction step is done automatically and implicitly by the deep learning model | Disadvantages ✗ Training requires large datasets ✗ Computationally intensive to train ✗ Requires significant computing power. It uses multiple Graphics Processing Units (GPUs) |
| When to use Use when dealing with complex tasks, unstructured data (e.g., text, images, audio, and video), and large datasets [10], [11], [12]. | ||
Biological Neuron and Artificial Neuron
Biological Neuron

- The biological neuron is the smallest functional unit of the nervous system
- Different signals from pre-neurons pass through synapses and are collected by dendrites
- Dendrites carry signals toward the cell body (called the soma)
- The soma integrates the incoming signals from the dendrites and, if the result is above a threshold, generates an electrical impulse that is transmitted down the axon
- The axon is a single long nerve fiber that carries impulses from the cell body toward the axon terminals
- The axon terminal is the end of the axon
- The synapse is the junction or gap between the axon terminal of one neuron and the dendrite or cell body of another neuron [15], [16]
Artificial Neuron

- An artificial neuron is the smallest building block of an Artificial Neural Network
- The summation function Z(·) sums the bias (b) and all weighted inputs
- Each input is associated with a weight value. The weight determines how important an input is relative to the others
- The bias, b, is not connected to a specific input. It is a value added to the sum of the weighted inputs to account for factors that cannot be captured by the inputs alone. It is a form of offset that ensures the summation function’s output is large enough even when the sum of the weighted inputs is not sufficient on its own.
- Many real-world problems involve non-linear relationships between inputs and outputs. The non-linear activation function f(·) breaks the linearity introduced by the summation function, allowing the network to learn complex relationships between inputs and outputs [19], [20]
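The steps above (weighted sum, bias, non-linear activation) can be sketched in a few lines. The weights, bias, and inputs below are arbitrary illustration values; sigmoid is used as the activation simply because it is introduced later in this post.

```python
import math

# A minimal sketch of one artificial neuron: weighted sum plus bias,
# passed through a non-linear activation (sigmoid here).
# All numeric values are arbitrary illustration values.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    # Summation function: Z = b + sum of weighted inputs
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    # Non-linear activation breaks the linearity of the summation
    return sigmoid(z)

output = neuron(inputs=[1.0, 2.0], weights=[0.5, -0.25], bias=0.1)
print(round(output, 3))
```

Training a network amounts to adjusting exactly these weights and the bias until the outputs match the desired targets.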
Artificial Neural Network (ANN)

An Artificial Neural Network (ANN) consists of interconnected nodes, called “artificial neurons”, that are organized in layers. Usually, these layers are “fully connected”, meaning neurons in adjacent layers have full pair-wise connections while neurons within a layer are not connected. That is, each node in a layer receives inputs from all the nodes in the previous layer, processes them (i.e., applies an activation function to the sum of its weighted inputs), and passes the output to the neurons in the next layer. There are three types of layers in an ANN [21 – 25].
Input layer: This layer is the very first layer of the network (i.e., the first entry point of the ANN). It is a “passive” layer because its nodes do not change the raw data (i.e., nodes simply receive the raw data and pass it unchanged to the next layer). The number of neurons in the input layer equals the number of input variables (also called “independent variables”, features, or attributes).
Hidden layer: a layer of neurons that is neither the input nor the output layer. An ANN can have one or more hidden layers. The hidden layer(s) lie between the input layer and the output layer and are not directly visible or accessible to a typical user (e.g., the person writing a prompt for ChatGPT), hence the name.
Output layer: This layer is the final layer of the network. It receives input from the last hidden layer and produces the network’s prediction or classification.
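The layer structure described above can be sketched as a forward pass through fully connected layers, where every neuron receives all outputs of the previous layer. Layer sizes and all weight values are arbitrary illustration choices.

```python
import math

# A minimal sketch of forward propagation through fully connected layers.
# All weights, biases, and layer sizes are arbitrary illustration values.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    # One output per neuron: activation of (bias + weighted sum of ALL inputs)
    return [sigmoid(b + sum(w * x for w, x in zip(ws, inputs)))
            for ws, b in zip(weights, biases)]

# 2 inputs -> hidden layer of 3 neurons -> output layer of 1 neuron
x = [0.5, -1.0]                      # input layer passes the raw data unchanged
hidden = layer_forward(x,
                       weights=[[0.1, 0.4], [-0.3, 0.2], [0.6, -0.1]],
                       biases=[0.0, 0.1, -0.2])
output = layer_forward(hidden, weights=[[0.3, -0.5, 0.8]], biases=[0.05])
print(len(hidden), len(output))      # 3 hidden activations, 1 prediction
```

Note how the input layer does no computation at all; it simply hands the raw values to the first hidden layer, matching the “passive layer” description above.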
Shallow Neural Network (SNN) vs. Deep Neural Network (DNN)

Shallow Neural Network (SNN)
A Shallow Neural Network (SNN) is an Artificial Neural Network (ANN) with one hidden layer between the input layer and the output layer. In some cases, a neural network with 2 or 3 hidden layers is still considered shallow if each hidden layer has a small number of neurons [26 – 30].
SNNs are simple, require less data to train effectively, need fewer computational resources, are more interpretable due to their simpler architecture, and are less prone to overfitting.
SNNs are typically used for simpler tasks such as binary classification (e.g., classifying emails as either “spam” or “not spam”), linear regression (e.g., predicting house prices), and simple pattern recognition (e.g., handwritten digit recognition). For example, an SNN with a hidden layer of several dozen nodes (e.g., 50 nodes) can achieve good recognition accuracy for handwritten digits.
Deep Neural Network (DNN)
A Deep Neural Network (DNN) is an Artificial Neural Network (ANN) with more than one hidden layer [31], [32].
While both SNNs and DNNs share the same fundamental architecture, the additional hidden layers in a DNN increase its capacity to learn complex, non-linear relationships. In other words, each additional layer allows the DNN to extract more complex features from the data, enabling it to learn more complex patterns and relationships.
When compared to SNNs, DNNs require more data for effective training. They need more computational resources and are slower to train due to the increased number of parameters to tune and the larger training datasets. Also, DNNs are more susceptible to overfitting and therefore require techniques to prevent it, such as regularization (L1, L2, dropout), early stopping, data augmentation, and simplifying the DNN. Further, they are often viewed as “black boxes” due to their many hidden layers (there can be hundreds or even thousands of hidden layers).
DNNs are typically used for complex tasks that require high accuracy, assuming the availability of vast amounts of training data and computing power. DNNs have a wide range of applications, such as image classification (e.g., recognizing cats vs. dogs), machine translation (translating text from one language to another using models like Transformers), medical imaging (e.g., analyzing X-rays, MRIs, and other imaging data for diagnosis), and generative modeling (e.g., using Generative Adversarial Networks (GANs) to create highly realistic fake images and videos, used in entertainment and advertising).
Activation Functions

- An activation function is a mathematical function that controls the output of a neuron [33 – 36]
- Most activation functions are non-linear. If we use a linear activation function, all layers of the neural network collapse into one (i.e., the output is a linear combination of the input). In this case, the ANN can learn only linear patterns and cannot learn the complex, non-linear patterns hidden in real-world data.
- Typically, all hidden layers use the same activation function.
- Typically, the output layer uses a different activation function from the hidden layers. The choice depends on the type of prediction required by the model (e.g., binary classification, multi-class classification, etc)
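The claim that stacked linear layers collapse into one can be checked numerically: two linear layers with weight matrices W1 and W2 (and no activation) produce exactly the same output as a single layer with weight matrix W2·W1. All matrix values below are arbitrary.

```python
# A small numerical check: two stacked linear layers (no activation)
# are equivalent to one linear layer with weight matrix W2 * W1.
# All matrix and vector values are arbitrary illustration values.

def matvec(m, v):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

W1 = [[1.0, 2.0], [0.0, -1.0]]
W2 = [[0.5, 1.0], [2.0, 0.0]]
x = [3.0, 4.0]

two_layers = matvec(W2, matvec(W1, x))   # layer 1 then layer 2, no activation
one_layer = matvec(matmul(W2, W1), x)    # a single equivalent linear layer
print(two_layers == one_layer)           # the two layers collapsed into one
```

This is precisely why a non-linear activation between layers is essential: without it, depth adds no expressive power.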
The table below provides the most common activation functions and their pros and cons [37 – 40].
![]() | Sigmoid σ(x) ✓ Great for binary classification ✗ Vanishing gradient (the derivative is at most 0.25 and the function saturates at 0 or 1) ✗ Outputs are not zero-centered (causing zig-zagging in gradient updates) ✗ e^x is computationally expensive • Used in the output layer for binary classification problems | Tanh(x) = 2σ(2x) – 1 ✓ Outputs are zero-centered ✗ Vanishing gradient ✗ e^x is computationally expensive • Used in hidden layers |
![]() | ReLU (Rectified Linear Unit) ✓ Computationally efficient ✓ Does not saturate for positive inputs ✓ Converges faster than Sigmoid and Tanh ✗ “Dead neuron” / “dying ReLU” (for negative inputs, the gradient is 0 and thus no learning occurs) ✗ Outputs are not zero-centered • Commonly used in hidden layers | Leaky ReLU ✓ Prevents the dying-ReLU problem ✓ Same advantages as ReLU ✗ Outputs are not zero-centered • Commonly used in hidden layers |
![]() | Softmax ✓ Outputs are in the range 0 to 1 and sum to 1, so they can be interpreted as class probabilities ✗ Class imbalance: softmax assumes each class is adequately represented in the training data; with a significant class imbalance, it may struggle to assign appropriate probabilities to the minority classes • Used in the output layer for multi-class classification | |
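The functions in the table above can be written in a few lines each. The Leaky ReLU slope (alpha = 0.01) is a common default, not something prescribed by the table; the max-subtraction in softmax is a standard numerical-stability trick.

```python
import math

# Plain-Python sketches of the activation functions in the table above.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return 2.0 * sigmoid(2.0 * x) - 1.0      # the identity used in the table

def relu(x):
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    return x if x > 0 else alpha * x          # small slope prevents "dying ReLU"

def softmax(xs):
    # Subtracting the max improves numerical stability; outputs sum to 1
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(round(sum(probs), 6))                   # probabilities sum to 1
```

Note how the tanh sketch is built directly from sigmoid via the 2σ(2x) – 1 identity from the table, and agrees with the standard tanh.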
ANN/DNN Learning Process
Fig. 7. Iterative process of ANN/DNN learning using forward and backward propagation [41] | Learning process 1. Forward propagation (Forward pass) to get estimate/prediction 2. Loss/cost function to evaluate the performance of the model (e.g., computing the difference between actual output and predicted output) 3. Backward propagation (Backward pass/Backpropagation) to update the weights 4. Repeat steps 1, 2, and 3 until the loss function is minimized Fig. 8. A typical loss curve |
Loss/Cost Function, Gradient Descent, and Learning Rate
| Loss Function • a measure of how well the model fits the training data • defined based on the difference between the actual values and the predicted values. | Mean Squared Error (MSE) loss: ![]() | |
| Gradient Descent • an iterative optimization method for finding the minimum of a function (e.g., the loss function) • the gradient is the first-order derivative of the loss function with respect to the weights and biases • to find a local minimum of the loss function, we take steps in the direction opposite to the gradient at the current point | ![]() | |
| Learning Rate (α) • a small positive value (often smaller than 1.0) that controls the amount by which the weights are updated during training: new weight = old weight – (learning rate × gradient) [42], [43] | ![]() ![]() ![]() | |
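The update rule above can be sketched end to end by fitting a one-parameter model y = w·x with gradient descent on the MSE loss. The data points, learning rate, and iteration count are arbitrary illustration values.

```python
# A minimal sketch of gradient descent with the update rule
# new_weight = old_weight - learning_rate * gradient, on the MSE loss.
# Data and hyperparameters are arbitrary illustration values.

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]           # generated by the "true" relationship y = 2x

def mse_gradient(w):
    # d/dw of mean((w*x - y)^2) = mean(2 * (w*x - y) * x)
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n

w = 0.0                         # initial weight
learning_rate = 0.05
for _ in range(200):            # repeat: forward pass, loss gradient, update
    w = w - learning_rate * mse_gradient(w)

print(round(w, 4))              # converges toward the true weight 2.0
```

With a much larger learning rate the steps overshoot the minimum and the loss can diverge instead of following the typical decreasing curve shown in Fig. 8.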
AI Ethics – Key Principles
Fairness
- The elimination of bias and discrimination based on name, skin color, cultural background, disability, mental illness, etc.
- The AI Model/System must be diverse, inclusive, and representative of all segments of society [44], [45].
Privacy and Data Protection
- Privacy must be protected and promoted throughout the AI lifecycle. Uphold the highest levels of data security processes and procedures to keep data confidential, preventing data and system breaches.
Transparency and Explainability
- Transparency: The developer of the AI model must inform end-users about the intention behind developing the AI Model/System.
- Explainability: In general, we don’t blindly trust those who can’t explain their reasoning. The same goes for AI, perhaps even more so. Thus, the developer(s) of the AI Model/System must be able to explain the “reasons” behind all of its predictions and suggestions.
Accountability and Responsibility
- This is to address the case when the AI Model/system makes a mistake.
- All parties involved (designers, developers, vendors, etc.) in the AI Model/System are both responsible and accountable/liable for decisions and actions that may result in potential risk and negative effects on individuals, communities, and society.
Reliability and Safety
- Reliability: The AI Model/System must always remain consistent and true to its actual development purpose, in both the short and long term, and after it trains and learns from new data.
- Safety: The AI Model/System must not pose a risk of harm or danger to individuals, communities, and society.
Applications of AI in Smart Cities
| Security AI-enabled cameras that can analyze footage in real time and recognize the faces of criminals, criminal behavior, and potential threats. | Road infrastructure maintenance inspection Automate and scale maintenance inspections of primary roads without sending out maintenance crews to visually inspect roads or waiting for complaints from people. |
| Waste recognition and sorting Recognize different types of waste and place each item into its respective bin for recycling or reuse. | Intelligent living and workspaces Learn from the user’s habits and preferences, then seamlessly instruct devices to adapt to the user’s lifestyle, providing a comfortable, touchless experience tailored to the user’s individual needs. |
More details are available in the following presentation and the references.

References:
[1] Deep learning vs machine learning | Google Cloud
[2] Deep learning vs. machine learning: A complete guide
[3] Generative AI Defined: How It Works, Benefits, and Limitations
[4] Artificial Intelligence – What is it and Why Does it Matter?
[5] Why Is AI booming Now|2024 blogs
[6] Supervised vs. Unsupervised Learning: What’s the Difference? | IBM
[7] Supervised vs. Unsupervised vs. Reinforcement Learning: What’s the Difference? | phData
[8] 7 of the Most Used Regression Algorithms and How to Choose the Right One | Towards Data Science
[9] What Is Reinforcement Learning? – MATLAB & Simulink
[10] Understanding The Difference Between AI, ML, And DL: Using An Incredibly Simple Example — Advancing Analytics
[11] What’s the difference between Machine Learning and Deep Learning? – viso.ai
[12] Machine learning vs deep learning | Rudderstack
[13] Why are Neuron Axons Long and Spindly?
[14] Photonic Neural Networks Based on Integrated Silicon Microresonators | Intelligent Computing
[15] site.uottawa.ca/~petriu/NN_basics-tutorial.pdf
[16] What’s the difference between the axon terminal and a synapse? – Quora
[17] terms
[18] A typical structure of a biological neuron and synapse. | Download Scientific Diagram
[19] What are Weights and Biases? — Klu
[20] Introduction to Activation Functions in Neural Networks | DataCamp
[21] COMPSCI 682 Neural Networks: A Modern Introduction
[22] (6) A Beginner’s Guide to Neural Networks | LinkedIn
[23] What is Artificial Neural Network – Startup House | Startup House
[24] Blog Theme – Details
[25] What is the difference between the output layer and the hidden layers in a neural network model in TensorFlow? – EITCA Academy
[26] Types of ML – Questions and Answers in MRI
[27] What are the differences between 'deep’ neural networks with one hidden layer and those with two or more hidden layers? – Quora
[28] Difference between Shallow and Deep Neural Networks | GeeksforGeeks
[29] How to classify MNIST digits with different neural network architectures | by Tyler Elliot Bettilyon | Teb’s Lab | Medium
[30] How to calculate the carbon footprint of training/running a large AI model in the cloud Blog Devoteam Rebirth
[31] Overfitting in Deep Neural Networks & how to prevent it. | Analytics Vidhya
[32] 5 Techniques to Prevent Overfitting in Neural Networks – KDnuggets
[33] Activation functions in Neural Networks | GeeksforGeeks
[34] What, Why and Which?? Activation Functions | by Snehal Gharat | Medium
[35] Activation Functions in Neural Networks [12 Types & Use Cases]
[36] A Friendly Introduction to [Deep] Neural Networks | KNIME
[37] What is an activation function? What are the different types of activation functions? Discuss their pros and cons – AIML.com
[38] Demystifying Activation Functions in Neural Networks – Analytics Vidhya
[39] Activation Function In Neural Networks – djinit.ai
[40] Common activation functions in artificial neural networks (NNs) that… | Download Scientific Diagram
[41] https://medium.com/data-science-365/overview-of-a-neural-networks-learning-process-61690a502fa
[42] https://www.analyticsvidhya.com/blog/2020/10/how-does-the-gradient-descent-algorithm-work-in-machine-learning/
[43] https://www.semanticscholar.org/paper/Comparison-of-Various-Learning-Rate-Scheduling-on-Konar-Khandelwal/2a70c38db475a610fc44eb0705b6b339235e70a6
[44] AI Ethics Principles
[45] Ethics of Artificial Intelligence | UNESCO
Author Bio
Youssouf Ould Cheikh Mouhamedou is a seasoned academic and telecom professional with 20+ years of experience in applied R&D and innovation, IoT, smart city technology strategy, wireless access technologies, AI/ML/DL, the telco industry, 5G, 6G, and cooperation with globally leading technology companies.
Currently, he is working at NEOM as senior manager of cognitive city and AI, where he focuses on strategy, digital solutions, and proof of concepts (PoCs). Prior to NEOM, he was a senior R&D expert at STC, where he worked on a wide range of technologies such as 5G, NB-IoT, eSim, LiFi, and public Wi-Fi. In addition, Youssouf was responsible for fostering an innovation culture across the Technology Unit of STC. Before joining STC, he worked for 5 years as an assistant professor at KSU, Saudi Arabia, where he was working on applied research related to wireless access technologies. Youssouf also worked at TELECOM Bretagne, France, and the Communications Research Centre (CRC), Canada, on practical wireless aspects of satellite communications. Moreover, he worked as a software quality assurance manager at MAS GmbH, Germany, where he was responsible for the delivery of software packages that monitor various SIEMENS Fiber Optic Transmission Systems.
He received a Dipl.-Ing. degree and Ph.D. degree in electrical and computer engineering from the Technical University of Munich (TUM), Germany, and McGill University, Canada, in 2001 and 2005, respectively.
Since June 2020, he has served as an Advisory Board Member at Rimedo Labs.

















