Multivariate Analysis
Section- A
1-Discuss the Wishart distribution. Also state its additive property.
The Wishart distribution is a multivariate generalisation of the univariate χ2 distribution, and it plays an analogous role in multivariate statistics. It arises as the distribution of the sample sums-of-squares-and-products matrix for a sample from a multivariate normal distribution: if X1, X2, …, Xn are independent Np(0, Σ) vectors, then M = Σi Xi Xi^T follows the Wishart distribution Wp(Σ, n) with n degrees of freedom. For this reason it is also described as the distribution of the (unnormalised) sample covariance matrix.
The Wishart distribution has an additive property, which states that
if M1∼Wp(Σ,n1) and M2∼Wp(Σ,n2) are independent,
then M1+M2∼Wp(Σ,n1+n2).
In other words, the sum of two independent Wishart-distributed matrices with the same scale matrix Σ is also Wishart distributed, with degrees of freedom equal to the sum of the degrees of freedom of the individual matrices. This follows directly from the representation above: writing M1 = Σi Xi Xi^T (a sum of n1 terms) and M2 = Σj Yj Yj^T (a sum of n2 terms) for independent Np(0, Σ) vectors, M1 + M2 is a sum of n1 + n2 independent terms of the same form.
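The additive property can be checked numerically. The sketch below (illustrative scale matrix and degrees of freedom, using scipy.stats.wishart) verifies that the mean of M1 + M2 matches the mean (n1 + n2)Σ of a Wp(Σ, n1 + n2) distribution:

```python
import numpy as np
from scipy.stats import wishart

Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])      # illustrative scale matrix
n1, n2 = 5, 7                       # illustrative degrees of freedom

# Draw many independent Wishart matrices with the same scale matrix.
M1 = wishart(df=n1, scale=Sigma).rvs(size=5000, random_state=1)
M2 = wishart(df=n2, scale=Sigma).rvs(size=5000, random_state=2)

# If M1 + M2 ~ W_p(Sigma, n1 + n2), its expectation is (n1 + n2) * Sigma.
mean_sum = (M1 + M2).mean(axis=0)
print(mean_sum)
print((n1 + n2) * Sigma)
```

Matching means does not by itself prove the full distributional identity; the theoretical argument above guarantees it, and the simulation simply illustrates the addition of degrees of freedom.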
2-Discuss the Mahalanobis D2 with its various applications.
Mahalanobis D^2 is a squared distance measure between a point and a distribution (or between two groups) that takes the covariance structure of the data into account. It can be used for various applications such as calculating genetic distances between populations, classifying observations into predefined groups, detecting multivariate outliers, and clustering data.
For a p-dimensional vector X from a population with mean vector μ and covariance matrix Σ, the Mahalanobis distance is:
d_{M}\left(X\right)=\sqrt{{\left( X-\mu \right)}^{T}\Sigma^{-1}\left( X-\mu \right)}
so that D^2 = (X − μ)^T Σ^{-1} (X − μ). Because the inverse covariance matrix appears in the quadratic form, the distance corrects for correlations between the variables and for differences in scale; when Σ is the identity matrix it reduces to the ordinary Euclidean distance. Between two groups with mean vectors μ1 and μ2 and a common covariance matrix Σ, the group distance is D^2 = (μ1 − μ2)^T Σ^{-1} (μ1 − μ2).
This distance measure is used to find the distance between data points and cluster centres in a data set. In k-means-style clustering, the Mahalanobis distance between each data point and the cluster centres is computed, and the point is assigned to the cluster with the minimum distance. Used in place of the Euclidean distance, it accommodates elliptical rather than spherical clusters, and comparing assignments under different numbers of centres can also help select the number of clusters.
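A minimal sketch of the distance computation in Python (the sample and the query point are illustrative; scipy.spatial.distance.mahalanobis takes the inverse covariance matrix as its third argument):

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

rng = np.random.default_rng(42)
# Illustrative sample: 200 correlated 2-D observations.
X = rng.multivariate_normal([0.0, 0.0], [[2.0, 0.8], [0.8, 1.0]], size=200)

mu = X.mean(axis=0)                            # estimated mean vector
VI = np.linalg.inv(np.cov(X, rowvar=False))    # inverse sample covariance

point = np.array([1.0, 1.0])
d = mahalanobis(point, mu, VI)   # sqrt((x - mu)^T S^{-1} (x - mu))
print(d)
```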
Used in this way, D^2 can aid in selecting the best clustering approach for a given data set, resulting in a more accurate model of the given dataset.
Clustering is the process of
grouping a set of objects into subsets such that objects in the same subset are
more similar to each other than to those in other subsets. A clustering
technique may be deterministic or stochastic. Deterministic techniques are
based on predefined criteria and algorithmic methods while stochastic
techniques use random sampling techniques to group similar objects together.
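As one concrete illustration of the clustering use described above, the sketch below assigns each point to the nearer of two centres by squared Mahalanobis distance (the centres, covariance, and data are illustrative assumptions; this is the assignment step only, not a full k-means implementation):

```python
import numpy as np

rng = np.random.default_rng(3)
cov = np.array([[1.0, 0.6],
                [0.6, 1.0]])        # shared, correlated covariance
X = np.vstack([
    rng.multivariate_normal([0.0, 0.0], cov, size=50),
    rng.multivariate_normal([4.0, 4.0], cov, size=50),
])
centers = np.array([[0.0, 0.0], [4.0, 4.0]])
VI = np.linalg.inv(cov)             # inverse covariance for the quadratic form

# Assignment step: each point goes to the centre with minimum D^2.
diffs = X[:, None, :] - centers[None, :, :]          # shape (n, k, p)
D2 = np.einsum('nkp,pq,nkq->nk', diffs, VI, diffs)   # squared distances
labels = D2.argmin(axis=1)
print(np.bincount(labels))
```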
3-Discuss the Hotelling’s T2 distribution and its applications.
Hotelling's T-squared distribution is the multivariate generalisation of Student's t-distribution. It was developed by Harold Hotelling in 1931, who introduced it to test hypotheses about the mean vector of a multivariate normal population when the covariance matrix is unknown.
In this answer, we will explore the Hotelling's T-squared distribution and its applications: first the hypothesis tests it supports, and then the form of the test statistic and how it is used in practice.
Hotelling's T-squared statistic is used to test hypotheses about mean vectors. In the one-sample case it tests whether a population mean vector equals a hypothesised vector; in the two-sample case it tests whether two populations have the same mean vector. It can be used in business, finance, economics, quality control, and medicine whenever several correlated response variables must be compared simultaneously, and, as the multivariate analogue of the t-test, it controls the overall type I error rate that would be inflated by running a separate univariate t-test on each variable.
Formally, for a sample of n observations from a p-variate normal distribution with sample mean vector x̄ and sample covariance matrix S, the one-sample statistic for testing H0: μ = μ0 is
T^{2}=n{\left( \bar{x}-\mu_0 \right)}^{T}S^{-1}\left( \bar{x}-\mu_0 \right)
and under H0 the rescaled quantity \frac{n-p}{p(n-1)}T^{2} follows an F-distribution with p and n − p degrees of freedom. This link to the F-distribution is what makes the test practical: the statistic is computed, rescaled, and compared against standard F tables.
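A one-sample Hotelling's T-squared test can be sketched as follows; the toy data, hypothesised mean vector, and sample size are illustrative assumptions. The statistic T^2 = n(x̄ − μ0)^T S^{-1} (x̄ − μ0) is rescaled to an F statistic with p and n − p degrees of freedom:

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(1)
X = rng.multivariate_normal([0.2, 0.1, 0.0], np.eye(3), size=30)  # toy data
n, p = X.shape
mu0 = np.zeros(p)                       # hypothesised mean vector

xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)             # unbiased sample covariance
T2 = n * (xbar - mu0) @ np.linalg.inv(S) @ (xbar - mu0)

# Under H0, T2 * (n - p) / (p * (n - 1)) ~ F(p, n - p).
F_stat = T2 * (n - p) / (p * (n - 1))
p_value = f.sf(F_stat, p, n - p)
print(T2, p_value)
```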
Section - B
2-Describe the multiple and partial correlation.
In statistics, multiple correlation refers to the correlation between one variable and the best linear combination of two or more other variables. The multiple correlation coefficient R measures how well the dependent variable can be predicted from the set of predictors taken together; its square, R^2, is the coefficient of determination of the corresponding multiple regression. R always lies between 0 and 1, and it can never be smaller than the absolute value of the simple correlation between the dependent variable and any single predictor.
Partial correlation is a statistical technique that measures the correlation between two variables while controlling for the effects of one or more other variables. This allows for the determination of the unique relationship between the two variables of interest, as opposed to the relationship that may be confounded by the effects of other variables. It can be calculated using statistical software.
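A minimal sketch of partial correlation via residuals (the simulated data and the helper partial_corr are illustrative, not a standard library function): x and y are both driven by z, so their raw correlation is large but largely disappears once z is controlled for.

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z.

    Implemented by correlating the residuals from regressing x on z
    and y on z (equivalent to the standard partial-correlation formula).
    """
    Z = np.column_stack([np.ones_like(z), z])
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return np.corrcoef(rx, ry)[0, 1]

rng = np.random.default_rng(0)
z = rng.normal(size=500)
x = z + rng.normal(scale=0.5, size=500)   # both x and y are driven by z
y = z + rng.normal(scale=0.5, size=500)

raw = np.corrcoef(x, y)[0, 1]
partial = partial_corr(x, y, z)
print(raw, partial)
```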
3-Write short notes on Discriminant Analysis.
Discriminant analysis assigns observations to one of several predefined groups. In linear discriminant analysis each group is modelled as a multivariate normal population, so the group parameters must be estimated by maximum likelihood. The likelihood function for a multivariate normal distribution with a mean vector of μ and a covariance matrix of Σ is given by:
L(μ,Σ|X) = (2π)^(-nm/2) |Σ|^(-n/2) exp[-1/2 Σ_i (X_i-μ)^T Σ^(-1) (X_i-μ)]
Where X is the n×m matrix of the observations (with rows X_i), n is the number of observations and m is the number of variables.
The MLEs of the mean
vector are found by taking the partial derivative of the log-likelihood
function with respect to μ and setting it to zero.
Then, solving this equation for the mean vector, we get
the MLEs of the mean vector as the sample mean of the observations.
μ_MLE = (1/n) Σ_{i=1}^{n} X_i
This is the point in the parameter space that maximizes
the likelihood function given the observations.
In summary, the MLE of the mean vector for a multivariate
normal distribution is the sample mean of the observations. It is the point in
the parameter space that maximizes the likelihood function given the
observations.
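This estimate is easy to verify numerically; a minimal sketch with simulated data (the true mean and covariance below are illustrative). Note that the MLE of Σ divides by n, unlike the unbiased estimator, which divides by n − 1:

```python
import numpy as np

rng = np.random.default_rng(7)
true_mu = np.array([1.0, -2.0])                  # illustrative true mean
true_Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])  # illustrative true covariance
X = rng.multivariate_normal(true_mu, true_Sigma, size=1000)
n = X.shape[0]

mu_mle = X.mean(axis=0)                          # MLE of the mean vector
Sigma_mle = (X - mu_mle).T @ (X - mu_mle) / n    # MLE of Sigma (divides by n)
print(mu_mle)
print(Sigma_mle)
```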