Course Summary and Next Steps

Published on: 09 June 2021

AI Services Artificial Intelligence Cloud AI Python

The Artificial Intelligence Market Outlook

The International Data Corporation (IDC) has predicted that worldwide revenues for the artificial intelligence industry, including hardware, software, cloud, and related services, will grow 16% in 2021 to reach a market value of $328 billion. IDC also predicts that the market will cross the $500 billion mark by 2024. Software is the largest segment, accounting for 88% of the total revenue. These figures indicate that AI is on an upward trend and is likely to play a dominant role in the market in the coming years. Industries of every kind have started adopting AI in their daily business routines, so it is a vital skill to possess whether or not you work in computer science or information technology.

To examine this trend more closely, let us perform an exploratory data analysis (EDA) to see how AI has grown over the years. There are a few prerequisites to cover before performing this analysis.

Dataset: arXiv has been a one-stop public research repository for about 30 years. It provides open access to articles, scholarly reviews, and research papers from across the world. It is discipline independent and covers a wide range of research domains. Cornell University has published the metadata of this entire archive on Kaggle as a publicly available JSON dataset, which can be found here. We will use this dataset to analyze the research published in the AI and ML space over the last 25 years.

Logic Used: The key assumption is that the rise of AI can be gauged by the rise in the number of research papers published across the world: the more a concept is studied and researched, the more it grows. We will use the number of published papers as a metric for the growth of AI-based technologies. The dataset can either be downloaded and used locally or be used in a Kaggle notebook, as shown in the code below.

# Import the necessary packages for processing and visualization

import pandas as pd

import numpy as np

import plotly.express as px



# Importing the JSON and dask bag libraries to read and format from the JSON input

import json

import dask.bag as db



# Read from the Json Input provided in the Kaggle working repository

database_records = db.read_text("/kaggle/input/arxiv/*.json").map(lambda inp:json.loads(inp))

 

# Topics from the JSON document that could mean a Paper related to AI

topics_AI = ['stat.ML','cs.LG','cs.AI']

 

# Filter the records for only those with AI components

documents_AI = database_records.filter(lambda x: any(a in x['categories'] for a in topics_AI))

 

# Rename columns for understandability (note: the 'Issue Date' column holds the paper's DOI)

data = lambda x: {'Paper Id': x['id'], 'Paper Title': x['title'], 'Category':x['categories'], 'Paper Abstract':x['abstract'], 'Revision Version':x['versions'][-1]['created'], 'Issue Date':x["doi"], 'Authors':x['authors_parsed']}

 

# Process the above into a dataframe and then save it as an Excel File

data_proc = documents_AI.map(data).to_dataframe().compute()

data_proc.to_excel("AI and ML Archive Papers.xlsx", index=False)

 

print("Papers found in the AI Domain",data_proc.shape[0])

>>> Papers found in the AI Domain 107016

 

data_proc.head()

>>>

|   | Paper Id | Paper Title | Category | Paper Abstract | Revision Version | Issue Date | Authors |
|---|---|---|---|---|---|---|---|
| 0 | 0704.0047 | Intelligent location of simultaneously active ... | cs.NE cs.AI | The intelligent acoustic emission locator is... | Sun, 1 Apr 2007 13:06:50 GMT | None | [[Kosel, T., ], [Grabec, I., ]] |
| 1 | 0704.0050 | Intelligent location of simultaneously active ... | cs.NE cs.AI | Part I describes an intelligent acoustic emi... | Sun, 1 Apr 2007 18:53:13 GMT | None | [[Kosel, T., ], [Grabec, I., ]] |
| 2 | 0704.0304 | The World as Evolving Information | cs.IT cs.AI math.IT q-bio.PE | This paper discusses the benefits of describ... | Wed, 13 Oct 2010 19:49:16 GMT | 10.1007/978-3-642-18003-3_10 | [[Gershenson, Carlos, ]] |
| 3 | 0704.0671 | Learning from compressed observations | cs.IT cs.LG math.IT | The problem of statistical learning is to co... | Thu, 5 Apr 2007 02:57:15 GMT | 10.1109/ITW.2007.4313111 | [[Raginsky, Maxim, ]] |
| 4 | 0704.0954 | Sensor Networks with Random Links: Topology De... | cs.IT cs.LG math.IT | In a sensor network, in practice, the commun... | Fri, 6 Apr 2007 21:58:52 GMT | 10.1109/TSP.2008.920143 | [[Kar, Soummya, ], [Moura, Jose M. F., ]] |


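# To process time-based changes in data, extract the time from the Revision Version column

data_proc['DateTime'] = pd.to_datetime(data_proc['Revision Version'])

# Derive the Year and Date columns used below (stands in for the unshown utils.extractDateFeatures helper)

data_proc['Year'] = data_proc['DateTime'].dt.year

data_proc['Date'] = data_proc['DateTime'].dt.date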

 

# Project the rise in paper from this domain Year-on-Year

yoy_paper_growth = data_proc.groupby(['Year']).size().reset_index().rename(columns={0:'Count of Papers'})

px.line(x="Year",y="Count of Papers", data_frame=yoy_paper_growth, title="Y-O-Y Growth in AI")

>>>

# Now count the papers published each day

daily_counts = data_proc.groupby(['Date']).size().reset_index().rename(columns={0:'Papers Published per Day'})

px.line(x="Date",y="Papers Published per Day",data_frame=daily_counts, title="Daily Count of Published Papers")

>>>

This analysis shows that artificial intelligence research and development has surged over the last 15 years, and the trend suggests that this growth will continue.
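To put a number on this kind of surge, the yearly counts can be converted into year-over-year growth rates. The following is a minimal sketch that uses only the standard library. The helper name `yoy_growth` is hypothetical, and the counts in `sample_counts` are made-up placeholder values for illustration, not figures from the arXiv dataset.

```python
def yoy_growth(counts_by_year):
    """Return {year: percent change versus the previous year in the data}."""
    years = sorted(counts_by_year)
    growth = {}
    for prev, curr in zip(years, years[1:]):
        previous_count = counts_by_year[prev]
        if previous_count == 0:
            continue  # growth is undefined when the baseline is zero
        change = counts_by_year[curr] - previous_count
        growth[curr] = 100.0 * change / previous_count
    return growth


# Placeholder counts for illustration only (not real arXiv figures)
sample_counts = {2018: 100, 2019: 150, 2020: 300}
print(yoy_growth(sample_counts))
```

With the real data, the same helper could be fed a dictionary built from the Year and Count of Papers columns of the yearly table computed above.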

Building AI Software in Today’s World

In the machine-driven world we live in today, systems that depend on artificial intelligence are on the rise, and almost every kind of machine now uses AI in some form. In this section, let us revise the core AI concepts that form the backbone of these systems. The list is not exhaustive, but most AI applications draw on one or more of these branches.

Synopsis of Artificial Intelligence Systems and Technologies

Machine Learning (ML) Systems: These are algorithms built to recognize patterns in data and to extract knowledge from new datasets based on what they have learned from large training datasets. In other words, machine learning systems learn from historical data, much as human intellect learns from past observations. From this learning, the machines can then predict the outcome of future events in the same domain.
(Deep) Neural Networks: Neural networks are a family of machine learning algorithms that aim to mimic the processing capability of the human brain. They are trained on even larger datasets and perform better on tasks that call for more human-like abilities, such as learning languages and reading handwriting. Neural networks are a subset of machine learning but are more advanced in the way they imitate human behavior.
Natural Language Processing (NLP): Language is an important aspect of our lives since it is our primary form of communication. NLP systems model the language used by humans, with algorithms that analyze several languages and interact with people in spoken or written form. These algorithms are also built on top of ML and deep learning (DL) but are specific to language processing.
Computer Vision (CV): These systems process images and videos. They are adept at extracting useful patterns from images, animations, or videos. This processing aims to mimic the human eye and the rest of the human visual system. CV systems are used for identifying parts of an image or predicting its contents.
Cognitive Search Systems: Cognitive search systems help evaluate complex situations and support human decision-making. As part of their operation, they usually collect, analyze, and contextualize various categories of data, using rule-based reasoning or a variety of machine learning algorithms trained on data from cognitive systems.
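The idea of learning from historical data and then predicting a future outcome can be illustrated with one of the simplest models, a straight line fitted by ordinary least squares. This is only a sketch in plain Python: the function names `fit_line` and `predict` are hypothetical, and the toy observations are invented so that they follow y = 2x + 1 exactly.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Toy "historic" observations that follow y = 2x + 1 exactly
history_x = [1, 2, 3, 4]
history_y = [3, 5, 7, 9]

model = fit_line(history_x, history_y)  # the "learning" step
print(predict(model, 5))                # the "prediction" step, prints 11.0
```

Real ML systems replace the straight line with far richer models, but the two phases, fitting on past observations and predicting on unseen inputs, stay the same.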

Continuing the Programming Practice

Programming is a skill that always needs polishing and tuning. Every artificial intelligence application requires writing code in one of the many supported languages. Throughout this course we focused on Python, but others such as R, Scala, and Perl also allow programmers to build AI models. In this section, let us look at some ways to stay motivated and keep the programming practice active.

Stay Persistent: Programming is like any other activity: the more you practice, the better you get. To put this into practice, you could participate in hackathons or build applications of your choice. For instance, think of things you do in everyday life that can be automated, like researching a topic. Build a text scraper that queries Google, Bing, or any other search engine and returns the most relevant content for your search, without you having to browse every website.
Revisit Code: Revisiting a piece of code written a month ago can help you optimize it. Every program has scope for improvement, which is usually missed in the first attempt, when the goal is simply to solve the problem. Once the problem is solved, you can optimize the code and improve its performance. The best way to revisit and revise is to maintain a repository with a version control system such as Git. This will motivate you to come back, stay active, and continue to improve.
Lifelong Learning: Technology grows at a rapid pace. What is news today might not exist a few years from now. In such a rapidly advancing environment, it is crucial to keep your learning shoes on at all times. All problems have solutions, and every solution can be improved.
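As a starting point for the text scraper idea mentioned above, here is a rough sketch of only the ranking step: scoring snippets of text by how often they mention the words of a query. The function names are hypothetical, the snippets are stand-ins, and the part that actually fetches pages from a search engine is deliberately left out.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def relevance(query, document):
    """Count how often the query's distinct words occur in the document."""
    words = Counter(tokenize(document))
    return sum(words[term] for term in set(tokenize(query)))


def rank(query, documents):
    """Return the documents sorted from most to least relevant."""
    return sorted(documents, key=lambda doc: relevance(query, doc), reverse=True)


# Stand-in snippets; a real tool would collect these from web pages
pages = [
    "A recipe for tomato soup",
    "Neural networks learn patterns from data",
    "Deep neural networks and neural architecture search",
]
print(rank("neural networks", pages)[0])
```

Plain word counting is a deliberately simple choice. Once the basic loop works, it can be revisited and improved, for example by weighting rare words more heavily.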

With the rise of cloud-based mechanisms for building AI applications, the need to hire highly skilled AI scientists for every project has gone down. Cloud computing services are often bundled with complex AI models that do the work for us, so we do not have to build and maintain them ourselves. In the next section, we will discuss how cloud computing has influenced AI and is taking it to the next level.

Cloud-Based Implementation of Powerful Processing Analytics

Cloud computing is now more powerful and more accessible than ever. With the rise of artificial intelligence, the compute power these systems use has also risen, and with it the cost of operating AI algorithms. AI systems run complex neural networks and machine learning tasks that need intense processing and fast data movement. Since on-premises computation and storage are costly, the cloud is increasingly becoming the new home for AI and ML workloads. There are numerous ways in which cloud computing is helping build faster and better AI algorithms, and we will glance through a few of them below.

Google Colab: Colab is one of the most commonly used notebook environments for running AI code on the cloud. It is similar to an Anaconda-powered JupyterLab environment but runs entirely on Google’s servers, giving the user substantial compute power. An added benefit is that it can connect directly to Git and Google Drive, so data and work can be saved on the cloud as well.
Kaggle: Started as a host of machine learning competitions, Kaggle now provides kernels to run ML, DL, and all kinds of AI code, with the commonly required libraries already built in. Like Google Colab, it provides compute power, but it does not offer the same storage options. Kaggle is best suited for working on datasets that already exist on Kaggle.
Microsoft Azure Notebooks: This results from a collaboration between Microsoft and GitHub that allows the desktop version of Visual Studio Code to act as a front end for running code on the cloud. An added advantage is that, like Visual Studio Code, this environment supports extensions for multiple languages besides R and Python. Since it is hosted on Azure, it also provides storage capabilities to the user.
GitHub Codespaces: This is an extension of Visual Studio Code to the cloud, provided by Microsoft. It can host applications in the browser, just like a Jupyter Notebook, but runs on GitHub’s hosted services rather than on local resources.
Facebook PyTorch: This is Facebook’s offering for machine learning developers. PyTorch is an open-source framework meant for running deep learning tasks with optimized GPU acceleration. It also supports asynchronous execution of programs through both Python and C++. In addition, PyTorch extends its features to Android and iOS, which means an end-to-end mobile ML application can be deployed using this technology.
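Because the same notebook may be run on Kaggle, on Colab, or on a local machine, it can help to detect the environment before choosing where to read data from. The sketch below relies on two assumptions that are common conventions rather than documented guarantees: that Kaggle kernels set the `KAGGLE_KERNEL_RUN_TYPE` environment variable, and that Colab sessions have the `google.colab` module loaded. The function names and the local `data` folder are hypothetical.

```python
import os
import sys


def detect_environment(environ=None, modules=None):
    """Guess which hosted notebook service, if any, is running this code."""
    environ = os.environ if environ is None else environ
    modules = sys.modules if modules is None else modules
    if "KAGGLE_KERNEL_RUN_TYPE" in environ:  # assumed Kaggle marker
        return "kaggle"
    if "google.colab" in modules:            # assumed Colab marker
        return "colab"
    return "local"


def dataset_pattern(environment, local_dir="data"):
    """Pick a glob pattern for the arXiv JSON files."""
    if environment == "kaggle":
        return "/kaggle/input/arxiv/*.json"   # path used earlier in this chapter
    return os.path.join(local_dir, "*.json")  # assumed download folder elsewhere


print(dataset_pattern(detect_environment()))
```

The returned pattern could then be passed to `db.read_text` in place of the hard-coded Kaggle path used in the analysis earlier.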

AI & ML as Services on Google Cloud, AWS, and Azure

Technology companies such as Apple, Microsoft, Google, Amazon, and Facebook all use AI in their own work. They have also developed tools that users can take advantage of to build complex machine learning systems at a lower price than an on-premises solution. Artificial Intelligence as a Service (AIAAS), more commonly called AI-as-a-service, is a popular segment of cloud subscriptions today. The advantages of using AIAAS include the following:

Companies do not need to build their own AI models. Initiating and developing an AI project can be costly and time-consuming. On the other hand, the benefits of using AI software are tremendous. By using AI services from a cloud provider, companies can reap these benefits without having to invest in building and maintaining the technology.
End users of the application do not require specialized skills. Building core AI applications often requires a substantial background in the field, which is hard to find and costly to invest in. Consuming the same technology as a service helps companies, and because users only work with an interactive GUI, they do not need specialized skills.
Leveraging automation using AI: Numerous companies today are investing in automation technology. From voice assistants to chat responders and system maintenance, all of this is being automated. AI-based applications are designed to improve this automation and boost its performance.

Artificial Intelligence as a Service (AIAAS)

Conclusions

Through the course of this series, we worked with Python and its application to mathematics and statistics, along with programming concepts for data acquisition, cleaning, and processing. We then looked at building machine learning models and exploring cognitive science through Python. With this series of chapters, we have built a strong foundation in Python programming and its uses in artificial intelligence. As important as it is to learn these topics, it is equally important to stay in practice. Tasks like feature engineering and data processing require a significant amount of programming knowledge, and these are the parts of AI that ultimately guide the outcome of any model. Therefore, it is vital to keep pace with programming. The aim of this course was to use Python as a programming language and perform case studies on artificial intelligence through it. We have followed this through and are now equipped with knowledge of Python and its practices in the universe of AI.
