Machine Learning Interview Questions 2022

This page will help you brush up on your machine learning skills to crack the interview.

Here, our focus will be on real-world scenario ML interview questions asked at companies such as Microsoft and Amazon, and how to answer them.

Let's get started!

Firstly, Machine Learning refers to the process of training a computer program to build a statistical model based on data. The goal of machine learning (ML) is to identify the key patterns in data and turn them into useful insights.

For example, if we have a historical dataset of actual sales figures, we can train machine learning models to predict sales for the coming future.

Why is the Machine Learning trend emerging so fast?
Machine Learning solves real-world problems. Instead of hard-coding rules to solve a problem, machine learning algorithms learn from the data.

What is learned can later be used to predict the future, and it is paying off for early adopters.

A full 82% of enterprises adopting machine learning and Artificial Intelligence (AI) have gained a significant financial advantage from their investments.

According to Deloitte, companies have an impressive median ROI of 17%.

Machine Learning Interview Questions For Freshers

1. Why was Machine Learning Introduced?

The simplest answer is to make our lives easier. In the early days of "intelligent" applications, many systems used hardcoded rules of "if" and "else" decisions to process data or adjust the user input. Think of a spam filter whose job is to move the appropriate incoming email messages to a spam folder.

With machine learning algorithms, we instead provide ample data and let the algorithm learn to identify the patterns in it.

Unlike with traditional problems, we don't need to write new rules for each problem in machine learning; we just reuse the same workflow with a different dataset.

Let's talk about Alan Turing. In his 1950 paper, "Computing Machinery and Intelligence", Turing asked, "Can machines think?"

The paper describes the "Imitation Game", which includes three participants -

A human acting as the judge,
Another human, and
A computer attempting to convince the judge that it is human.
The judge converses with the other two participants. As they respond, the judge must decide which responses came from the computer. If the judge cannot tell the difference, the computer wins the game.

The test continues today as an annual competition in artificial intelligence. The aim is simple enough: convince the judge that they are chatting to a human instead of a computer chatbot program.

2. What are Different Types of Machine Learning algorithms?

There are various types of machine learning algorithms. Broadly, they are categorized by whether they are trained with human supervision: supervised, unsupervised, and reinforcement learning.

These categories are not exclusive; we can combine them any way we like.

3. What is Supervised Learning?

Supervised learning is the machine learning task of inferring a function from labeled training data. The training data consists of a set of training examples.

Example:

Identifying the gender of a person from their height and weight. Below are some popular supervised learning algorithms:

Support Vector Machines.
Naive Bayes.
Decision Trees.
K-nearest Neighbours.
Neural Networks.
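The height/weight example above can be sketched with a minimal k-nearest-neighbours classifier in plain Python. The data points and labels below are invented purely for illustration:

```python
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.
    `train` is a list of ((height, weight), label) pairs."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)

# Toy data: (height in cm, weight in kg) -> label; values are illustrative only
train = [((170, 65), "male"), ((180, 80), "male"), ((175, 75), "male"),
         ((155, 50), "female"), ((160, 55), "female"), ((165, 60), "female")]

print(knn_predict(train, (178, 78)))   # "male"
print(knn_predict(train, (158, 52)))   # "female"
```

The only "training" here is storing the data; all the work happens at prediction time, which is characteristic of instance-based methods like k-NN.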

4. What is Unsupervised Learning?

Unsupervised learning is a type of machine learning algorithm used to find patterns in a given set of data. Here we do not have any dependent variable or label to predict. Common unsupervised learning techniques:

Clustering.
Anomaly Detection.
Neural Networks and Latent Variable Models.

For example, clustering T-shirts would group them into categories such as "collar style", "V-neck style", "crew neck style", and by sleeve type.
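Clustering of this kind can be sketched with a minimal k-means implementation in plain Python. The 2-D points below are invented toy data standing in for product features:

```python
import math

def kmeans(points, k, iters=20):
    """Plain k-means: repeatedly assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    centroids = [list(p) for p in points[:k]]  # naive init: first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[i].append(p)
        centroids = [
            [sum(dim) / len(cluster) for dim in zip(*cluster)] if cluster
            else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Two well-separated blobs; k-means should recover their centres
points = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (9, 9)]
centroids, clusters = kmeans(points, 2)
```

Note that no labels were supplied; the grouping emerges from the geometry of the data alone, which is the defining trait of unsupervised learning.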

5. What is 'Naive' in a Naive Bayes?

The Naive Bayes method is a supervised learning algorithm. It is "naive" because it applies Bayes' theorem under the assumption that all attributes are independent of each other.

Bayes' theorem states the following relationship, given a class variable y and a dependent feature vector x1 through xn:

P(y | x1, ..., xn) = P(y) P(x1, ..., xn | y) / P(x1, ..., xn)

Using the naive conditional independence assumption that each xi is independent of the other features given the class:

P(xi | y, x1, ..., xi-1, xi+1, ..., xn) = P(xi | y)

for all i, this relationship simplifies to:

P(y | x1, ..., xn) = P(y) ∏(i=1 to n) P(xi | y) / P(x1, ..., xn)

Since P(x1, ..., xn) is constant given the input, we can use the following classification rule:

y = argmax over y of P(y) ∏(i=1 to n) P(xi | y)

and we can use Maximum A Posteriori (MAP) estimation to estimate P(y) and P(xi | y); the former is then the relative frequency of class y in the training set.

The different naive Bayes classifiers mainly differ in the assumptions they make regarding the distribution of P(xi | y): it can be Bernoulli, multinomial, Gaussian, and so on.
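The MAP classification rule can be sketched with a small Gaussian naive Bayes classifier written from scratch. The toy dataset below is invented for illustration:

```python
import math
from collections import defaultdict

def fit_gnb(X, y):
    """Estimate the class prior P(y) and per-feature Gaussian parameters
    (mean, variance) for P(xi | y) from the training data."""
    groups = defaultdict(list)
    for xi, yi in zip(X, y):
        groups[yi].append(xi)
    model = {}
    for c, rows in groups.items():
        params = []
        for col in zip(*rows):
            m = sum(col) / len(col)
            var = sum((v - m) ** 2 for v in col) / len(col) + 1e-9  # avoid /0
            params.append((m, var))
        model[c] = (len(rows) / len(X), params)
    return model

def predict_gnb(model, x):
    """Classification rule: argmax over y of log P(y) + sum_i log P(xi | y)."""
    def log_posterior(c):
        prior, params = model[c]
        total = math.log(prior)
        for v, (m, var) in zip(x, params):
            total += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return total
    return max(model, key=log_posterior)

# Toy data: two feature columns, two classes (values invented for illustration)
X = [[1.0, 1.1], [1.2, 0.9], [0.9, 1.0], [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]]
y = ["a", "a", "a", "b", "b", "b"]
model = fit_gnb(X, y)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, which is the standard trick for implementing the product rule above.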

6. What is PCA? When do you use it?

Principal component analysis (PCA) is most commonly used for dimension reduction.

PCA finds new, uncorrelated axes (principal components) ordered by how much of the data's variation they capture; components that capture little variation can be dropped, reducing the number of dimensions.

This makes the dataset easier to visualize. PCA is used in finance, neuroscience, and pharmacology.

It is very useful as a preprocessing step, especially when there are linear correlations between features.
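A minimal sketch of the core idea, assuming we only want the first principal component: power iteration on the covariance matrix of the centered data converges to the direction of maximum variance.

```python
import math

def first_principal_component(X, iters=100):
    """Power iteration on the covariance matrix: the dominant eigenvector
    is the direction of maximum variance (the first principal component)."""
    n, d = len(X), len(X[0])
    means = [sum(col) / n for col in zip(*X)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]  # center data
    # Covariance matrix C = (Xc^T Xc) / n
    C = [[sum(Xc[i][a] * Xc[i][b] for i in range(n)) / n for b in range(d)]
         for a in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Points stretched roughly along the line y = x, so the first PC
# should be close to the unit vector (0.707, 0.707)
X = [[1, 1.1], [2, 1.9], [3, 3.2], [4, 3.9], [5, 5.1]]
pc = first_principal_component(X)
```

Projecting each centered point onto this vector gives the one-dimensional representation that preserves the most variance, which is exactly the reduction PCA performs.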

7. Explain SVM Algorithm in Detail.

A Support Vector Machine (SVM) is a very powerful and versatile supervised machine learning model, capable of performing linear or non-linear classification, regression, and even outlier detection.

Suppose we are given some data points that each belong to one of two classes, and the goal is to separate the two classes based on a set of examples.

In SVM, a data point is viewed as a p-dimensional vector (a list of p numbers), and we want to know whether we can separate such points with a (p-1)-dimensional hyperplane. This is called a linear classifier.

There are many hyperplanes that could classify the data. We choose the hyperplane that represents the largest separation, or margin, between the two classes.
If such a hyperplane exists, it is known as the maximum-margin hyperplane, and the linear classifier it defines is known as a maximum-margin classifier. In the classic illustration of candidate hyperplanes H1, H2, and H3, the best hyperplane dividing the data is H3.

We have data (x1, y1), ..., (xn, yn), where each xi is a feature vector (xi1, ..., xip) and each yi is either 1 or -1.

The hyperplane H3 is the set of points x satisfying:

w · x - b = 0

where w is the normal vector of the hyperplane. The parameter b / ||w|| determines the offset of the hyperplane from the origin along the normal vector w.

For each i, xi must lie on the correct side of the margin. That is, each xi satisfies:

w · xi - b ≥ 1 (for yi = 1) or w · xi - b ≤ -1 (for yi = -1).
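A linear SVM of this form can be sketched by minimising the regularised hinge loss with subgradient descent. This is a simplification of the quadratic-programming solvers real libraries use, and the toy data is invented for illustration:

```python
def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=200):
    """Subgradient descent on the regularised hinge loss:
    minimise lam*||w||^2 + mean(max(0, 1 - y_i * (w . x_i - b)))."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) - b)
            if margin < 1:  # point violates the margin: move the boundary
                w = [wj + lr * (yi * xj - lam * wj) for wj, xj in zip(w, xi)]
                b -= lr * yi
            else:           # correct side of the margin: only apply shrinkage
                w = [wj - lr * lam * wj for wj in w]
    return w, b

def svm_predict(w, b, x):
    """Classify by the side of the hyperplane w . x - b = 0."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) - b >= 0 else -1

# Two linearly separable toy classes (values invented for illustration)
X = [[2, 2], [3, 3], [2.5, 3.5], [-2, -2], [-3, -3], [-2.5, -3.5]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
```

After training, points well inside each class are classified by the sign of w · x - b, matching the margin constraints above.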

8. What are Support Vectors in SVM?

A Support Vector Machine (SVM) is an algorithm that tries to fit a line (or plane or hyperplane) between the different classes that maximizes the distance from the line to the points of the classes.

In this way, it tries to find a robust separation between the classes. The support vectors are the points lying on the edges of the margin; they are the training points closest to the dividing hyperplane and are what determine its position.

9. What are Different Kernels in SVM?

Some commonly used kernels in SVM are:

Linear kernel - used when the data is linearly separable.
Polynomial kernel - used when the data is discrete and has no natural notion of smoothness.
Radial basis function (RBF) kernel - creates a decision boundary that can do a much better job of separating two classes than the linear kernel.
Sigmoid kernel - behaves like the activation function used in neural networks.
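Each kernel is just a similarity function of two feature vectors; a sketch of their usual formulas in plain Python (parameter values are illustrative defaults):

```python
import math

def linear_kernel(x, z):
    """Plain dot product: k(x, z) = x . z."""
    return sum(a * b for a, b in zip(x, z))

def polynomial_kernel(x, z, degree=3, c=1.0):
    """k(x, z) = (x . z + c)^degree."""
    return (linear_kernel(x, z) + c) ** degree

def rbf_kernel(x, z, gamma=0.5):
    """k(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def sigmoid_kernel(x, z, alpha=0.1, c=0.0):
    """k(x, z) = tanh(alpha * (x . z) + c)."""
    return math.tanh(alpha * linear_kernel(x, z) + c)
```

Replacing the dot product in the SVM formulation with one of these functions lets the model learn a non-linear boundary without explicitly mapping the data into a higher-dimensional space (the "kernel trick").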

10. What is Cross-Validation?

Cross-validation is a method of assessing a model by repeatedly splitting the data into training and testing parts. In k-fold cross-validation, the data is split into k subsets, and the model is trained on k-1 of those subsets while the remaining subset is held out for testing.

This is repeated so that each subset serves once as the test set. Finally, the scores from all k folds are averaged to produce the final score.
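Generating the k-fold splits by hand is a short exercise; here is an index-based sketch:

```python
def kfold_splits(n, k):
    """Split indices 0..n-1 into k folds; each fold serves once as the
    held-out test set while the remaining folds form the training set."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return [
        ([i for j, fold in enumerate(folds) if j != f for i in fold], folds[f])
        for f in range(k)
    ]

splits = kfold_splits(10, 5)   # 5 pairs of (train indices, test indices)
```

In practice the indices are shuffled first so each fold is a random sample; the structure of the k train/test pairs is the same either way.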