STATISTICS IN PSYCHOLOGY (MPC-006) 

Section A


1-Explain the meaning of descriptive statistics and describe organisation of data.


Descriptive statistics is a branch of statistics that deals with the organization, presentation, and summarization of data. It is used to describe the basic features of a dataset, such as the mean, median, and standard deviation, and to create visual representations of data, such as histograms, box plots, and scatter plots. The goal of descriptive statistics is to provide a clear and concise summary of a dataset that can be easily understood by a wide audience.

The organization of data is a crucial step in the process of descriptive statistics. Data can be organized in a variety of ways, depending on the type of data and the research question being addressed. Some common ways to organize data include:

Tabulation: This method involves creating a table of data, where each row represents a single observation and each column represents a variable. This is a simple and easy way to organize data, but it can become unwieldy for large datasets.


Classification: This method involves grouping data into categories based on certain characteristics. For example, data on the age of participants in a study might be classified into groups such as "under 18", "18-24", "25-34", and so on (a short code sketch of this step appears after this list).


Stratification: This method involves dividing data into subgroups (strata) that are then analysed or sampled separately. For example, data on the income of participants in a study might be stratified into groups such as "low income", "middle income", and "high income" so that each income level can be examined in its own right.


Clustering: This method involves grouping together observations that are similar to one another on one or more characteristics. For example, data on the height and weight of participants in a study might be grouped into clusters based on body mass index (BMI).
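
To make the tabulation and classification steps concrete, here is a minimal sketch, assuming Python with the pandas library is available (the ages and the band boundaries are made up for illustration):

```python
import pandas as pd

# hypothetical ages of participants in a study
ages = pd.Series([17, 19, 23, 31, 26, 16, 42, 28])

# classification: assign each age to one of the bands described above
bands = pd.cut(ages,
               bins=[0, 17, 24, 34, 120],
               labels=["under 18", "18-24", "25-34", "35+"])

# tabulation: a frequency table of the resulting categories
print(bands.value_counts().sort_index())
```

The same value_counts idea underlies the simple frequency distribution, which is the most common way of organizing raw scores before any further analysis.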

Once data is organized, descriptive statistics can be used to summarize the data and make it easier to understand. Some common descriptive statistics include:

Measures of central tendency: These statistics describe the "center" of a dataset. The most common measures of central tendency are the mean, median, and mode. The mean is the average of all the values in the dataset, the median is the middle value when the data is arranged in numerical order, and the mode is the most common value in the dataset.


Measures of spread: These statistics describe how spread out the data is. The most common measures of spread are the range, variance, and standard deviation. The range is the difference between the largest and smallest values in the dataset, the variance is the average of the squared deviations of each value from the mean, and the standard deviation is the square root of the variance, which expresses the spread in the original units of the data.


Measures of shape: These statistics describe the shape of the data distribution. The most common measures of shape are skewness and kurtosis. Skewness is a measure of the asymmetry of a distribution, and kurtosis is a measure of the peakedness or flatness of a distribution.
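
As an illustration of these summary measures, the following sketch computes each of them for a small made-up dataset, assuming Python's standard statistics module and scipy are available:

```python
import statistics
from scipy.stats import skew, kurtosis

scores = [12, 15, 15, 18, 20, 22, 25, 40]  # hypothetical raw scores

# measures of central tendency
print("mean:", statistics.mean(scores))
print("median:", statistics.median(scores))
print("mode:", statistics.mode(scores))

# measures of spread
print("range:", max(scores) - min(scores))
print("variance:", statistics.variance(scores))  # sample variance (divides by n - 1)
print("std dev:", statistics.stdev(scores))

# measures of shape
print("skewness:", skew(scores))      # positive here: the value 40 creates a right tail
print("kurtosis:", kurtosis(scores))  # excess kurtosis (a normal distribution scores 0)
```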

Visual representations of data, such as histograms, box plots, and scatter plots, can also be used to summarize data and make it easier to understand. Histograms are used to show the frequency of different values in a dataset, box plots are used to show the distribution of data, and scatter plots are used to show the relationship between two variables.

In conclusion, descriptive statistics deals with the organization, presentation, and summarization of data. By combining well-organized tables, numerical summaries such as the mean, median, and standard deviation, and visual displays such as histograms, box plots, and scatter plots, it provides a clear and concise picture of a dataset that can be easily understood by a wide audience.

2-Explain the concept of normal curve with help of a diagram. Explain the characteristics of normal probability curve.

The normal probability curve, also known as the normal distribution or Gaussian distribution, is a graphical representation of a statistical distribution that is symmetric around the mean and follows a bell-shaped pattern. The curve is defined by two parameters: the mean (μ) and the standard deviation (σ). The mean represents the center of the distribution, while the standard deviation represents the spread of the distribution. A larger standard deviation indicates a wider spread of data, while a smaller standard deviation indicates a narrower spread of data.

A diagram of a normal probability curve is shown below:
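
A rough sketch of how such a diagram can be generated, assuming Python with numpy and matplotlib is available (the mean and standard deviation values are arbitrary):

```python
import numpy as np
import matplotlib.pyplot as plt

mu, sigma = 0, 1  # standard normal curve; any mu and sigma give the same shape

# normal probability density: f(x) = (1 / (sigma * sqrt(2*pi))) * exp(-(x - mu)^2 / (2*sigma^2))
x = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 400)
y = (1 / (sigma * np.sqrt(2 * np.pi))) * np.exp(-0.5 * ((x - mu) / sigma) ** 2)

plt.plot(x, y)
plt.axvline(mu, linestyle="--")  # the mean, where the curve peaks
plt.xlabel("x")
plt.ylabel("Probability density")
plt.title("Normal probability curve")
plt.show()
```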


As can be seen in the diagram, the curve is symmetric around the mean (represented by the dashed vertical line in the middle of the graph). The total area under the curve is equal to 1, reflecting the fact that the probabilities of all possible values sum to 1. The curve reaches its maximum height at the mean and gradually decreases as it moves away from the mean, approaching but never quite touching the horizontal axis (the curve is asymptotic to the x-axis).

One of the most important characteristics of the normal probability curve is that it is unimodal, meaning that it has only one peak. This is in contrast to bimodal or multimodal distributions, such as a mixture of two distinct populations, which can have multiple peaks. The normal probability curve is also continuous, meaning that there are an infinite number of possible values within the distribution.

Another important characteristic of the normal probability curve is that it is symmetric. This means that the values to the left of the mean are mirror images of the values to the right of the mean. The standard deviation (σ) measures the spread of the distribution and can be used to calculate the probability that a value falls within a certain range of the mean. Whatever the values of μ and σ, approximately 68% of the values in the distribution fall within 1 standard deviation of the mean, approximately 95% fall within 2 standard deviations, and approximately 99.7% fall within 3 standard deviations (the empirical or 68-95-99.7 rule).

The normal probability curve is also useful for making predictions about future events. If a population of data follows a normal distribution, the curve can be used to state the likelihood of certain outcomes. For example, if a population has a mean of 100 and a standard deviation of 20, it can be predicted that approximately 68% of the values in the population will fall between 80 and 120.

The normal probability curve is also central to hypothesis testing. Many common tests, such as the z-test, the t-test, and ANOVA, assume that the underlying population is at least approximately normally distributed. Whether a sample plausibly comes from a normal distribution is checked not by simply comparing the sample mean and standard deviation to the population values, but with normality tests such as the Shapiro-Wilk or Kolmogorov-Smirnov test, or with visual tools such as a Q-Q plot.

The normal probability curve also makes probability calculations straightforward. Any normal variable can be standardized using the z-score, z = (x - μ) / σ, and the probability of a value falling in a given range can then be read from a standard normal table. In the example above, the scores 80 and 120 correspond to z = -1 and z = +1, so the probability of a value falling between them is approximately 68%.
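
These probabilities can also be computed directly from the cumulative distribution function; a minimal sketch, assuming scipy is available:

```python
from scipy.stats import norm

mu, sigma = 100, 20  # the example distribution from the text

# P(80 < X < 120): area under the curve within one standard deviation of the mean
p = norm.cdf(120, loc=mu, scale=sigma) - norm.cdf(80, loc=mu, scale=sigma)
print(round(p, 4))  # 0.6827, i.e. roughly 68%
```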



In conclusion, the normal probability curve is a useful tool for understanding and analyzing data. It is symmetric, unimodal, continuous, and can be used for prediction, hypothesis testing, and statistical calculations. It is defined by two parameters: the mean and the standard deviation. Understanding and applying the normal probability curve can provide valuable insights into a wide range of data sets and help make more informed decisions.

3-The scores obtained by four groups of employees on occupational stress are given below. Compute ANOVA for the same. 

Group A: 34, 22, 21, 22, 34, 32, 44, 55, 12, 12

Group B: 12, 15, 12, 23, 45, 43, 33, 44, 54, 37

Group C: 45, 56, 65, 67, 67, 76, 54, 23, 21, 34

Group D: 34, 55, 66, 76, 54, 34, 23, 22, 11, 23

ANOVA, or Analysis of Variance, is a statistical method used to determine if there is a significant difference between the means of two or more groups. In this case, we are looking to see if there is a significant difference in the scores of occupational stress between the four groups of employees (Group A, Group B, Group C, and Group D).

To calculate ANOVA, we first need to find the overall (grand) mean of the entire dataset, which is the sum of all the scores divided by the total number of scores. The group sums are 288 (Group A), 318 (Group B), 508 (Group C), and 398 (Group D), so the overall sum is 1512 and the overall mean is 1512 / 40 = 37.8.

Next, we need to find the mean for each group. 

For Group A, the mean is (34+22+21+22+34+32+44+55+12+12) / 10 = 288 / 10 = 28.8

For Group B, the mean is (12+15+12+23+45+43+33+44+54+37) / 10 = 318 / 10 = 31.8

For Group C, the mean is (45+56+65+67+67+76+54+23+21+34) / 10 = 508 / 10 = 50.8

For Group D, the mean is (34+55+66+76+54+34+23+22+11+23) / 10 = 398 / 10 = 39.8

Next, we need to calculate the sum of squares for each group, which is the sum of the squared differences between each score and the group mean. 

For Group A, the sum of squares is (34-28.8)^2 + (22-28.8)^2 + (21-28.8)^2 + (22-28.8)^2 + (34-28.8)^2 + (32-28.8)^2 + (44-28.8)^2 + (55-28.8)^2 + (12-28.8)^2 + (12-28.8)^2 = 1699.6

For Group B, the sum of squares is (12-31.8)^2 + (15-31.8)^2 + (12-31.8)^2 + (23-31.8)^2 + (45-31.8)^2 + (43-31.8)^2 + (33-31.8)^2 + (44-31.8)^2 + (54-31.8)^2 + (37-31.8)^2 = 2113.6

For Group C, the sum of squares is (45-50.8)^2 + (56-50.8)^2 + (65-50.8)^2 + (67-50.8)^2 + (67-50.8)^2 + (76-50.8)^2 + (54-50.8)^2 + (23-50.8)^2 + (21-50.8)^2 + (34-50.8)^2 = 3375.6

For Group D, the sum of squares is (34-39.8)^2 + (55-39.8)^2 + (66-39.8)^2 + (76-39.8)^2 + (54-39.8)^2 + (34-39.8)^2 + (23-39.8)^2 + (22-39.8)^2 + (11-39.8)^2 + (23-39.8)^2 = 4207.6

Adding these gives the within-groups sum of squares: 1699.6 + 2113.6 + 3375.6 + 4207.6 = 11396.4

Next, we need to calculate the total sum of squares, which is the sum of the squared differences between each score and the overall mean. Rather than writing out all 40 terms, we can use the computational shortcut:

Total SS = (sum of all squared scores) - (sum of all scores)^2 / N = 71450 - (1512)^2 / 40 = 71450 - 57153.6 = 14296.4

As a check, the total sum of squares must equal the between-groups sum of squares plus the within-groups sum of squares.

Finally, we can calculate the F ratio using the following formula: F = [(Between-Groups Sum of Squares) / (Between-Groups Degrees of Freedom)] / [(Within-Groups Sum of Squares) / (Within-Groups Degrees of Freedom)], that is, F = Mean Square Between / Mean Square Within.

Where: Between-Groups Sum of Squares = (Group A mean - overall mean)^2 * Group A sample size + (Group B mean - overall mean)^2 * Group B sample size + (Group C mean - overall mean)^2 * Group C sample size + (Group D mean - overall mean)^2 * Group D sample size; Between-Groups Degrees of Freedom = k - 1 (where k is the number of groups); Within-Groups Sum of Squares = total sum of squares - between-groups sum of squares; and Within-Groups Degrees of Freedom = N - k (where N is the total number of observations and k is the number of groups).

Plugging in the values, we get:

Between-Groups SS = (28.8-37.8)^2 * 10 + (31.8-37.8)^2 * 10 + (50.8-37.8)^2 * 10 + (39.8-37.8)^2 * 10 = 810 + 360 + 1690 + 40 = 2900

Within-Groups SS = 14296.4 - 2900 = 11396.4 (matching the sum of the four group sums of squares above)

F = (2900 / 3) / (11396.4 / 36) = 966.67 / 316.57 = 3.05

With this value of F, we can compare it to the critical value of F from an F-distribution table with the appropriate degrees of freedom. The critical value depends on the level of significance chosen for the analysis (usually 0.05) and the degrees of freedom for the numerator (3) and denominator (36); here it is approximately 2.87. Since the calculated F of 3.05 is greater than 2.87, we reject the null hypothesis that there is no difference between the group means and conclude that there is a significant difference in occupational stress scores between at least two of the groups.
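
These hand calculations can be verified in a few lines; a sketch, assuming scipy is installed:

```python
from scipy.stats import f_oneway

group_a = [34, 22, 21, 22, 34, 32, 44, 55, 12, 12]
group_b = [12, 15, 12, 23, 45, 43, 33, 44, 54, 37]
group_c = [45, 56, 65, 67, 67, 76, 54, 23, 21, 34]
group_d = [34, 55, 66, 76, 54, 34, 23, 22, 11, 23]

# one-way ANOVA across the four independent groups
f_stat, p_value = f_oneway(group_a, group_b, group_c, group_d)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")  # F is about 3.05, p just under 0.05
```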

It's important to note that this is just one example of how to conduct an ANOVA and the results should always be interpreted in the context of the research question and other relevant factors.

Section B

4-Discuss the assumptions of parametric and nonparametric statistics.


Parametric statistics assumes that the data being analyzed follows a specific probability distribution, such as the normal distribution. This assumption allows for the use of mathematical models and statistical tests that rely on the properties of the chosen distribution. Additionally, parametric statistics assumes that the sample being analyzed is a random sample from a larger population, and that the sample size is large enough to accurately represent the population.

Nonparametric statistics, on the other hand, does not make any assumptions about the underlying probability distribution of the data. Instead, it uses methods that are based on the ranks or frequencies of the data, rather than on the actual values. Nonparametric statistics is often used when the data does not meet the assumptions of parametric statistics, such as when the sample size is small or the data is not normally distributed.

Beyond normality, parametric statistics rests on several further assumptions. The observations should be independent of one another, the groups being compared should have roughly equal variances (homogeneity of variance), and the dependent variable should be measured on at least an interval scale. These assumptions matter because tests such as t-tests and ANOVA rely on them; when they are violated, the results of these tests may not be accurate.

Nonparametric statistics relaxes most of these requirements. It still assumes that the observations are independent and randomly sampled, but it makes no assumption about the shape of the population distribution, which is why nonparametric tests are often called "distribution-free". Because they work on the ranks or frequencies of the data rather than on the actual values, nonparametric tests such as the Mann-Whitney U test, the Kruskal-Wallis test, and the chi-square test can be applied to ordinal or even nominal data.

One of the main advantages of nonparametric statistics is that it is more robust to violations of assumptions. For example, if the data is not normally distributed, nonparametric methods can still be used to draw meaningful conclusions. Additionally, nonparametric methods are often simpler and easier to apply than parametric methods, making them a good choice for data that is difficult to analyze.
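
In practice, the choice between the two families of tests often follows a simple decision rule; here is a simplified sketch with invented scores for two independent groups, assuming scipy is available (a real analysis would also consider variances and sample sizes):

```python
from scipy import stats

# hypothetical scores for two independent groups
group1 = [12, 15, 14, 10, 8, 16, 13, 11]
group2 = [9, 7, 11, 6, 8, 10, 5, 7]

# check the normality assumption in each group (Shapiro-Wilk test)
normal = all(stats.shapiro(g).pvalue > 0.05 for g in (group1, group2))

if normal:
    result = stats.ttest_ind(group1, group2)      # parametric: independent t-test
else:
    result = stats.mannwhitneyu(group1, group2)   # nonparametric: Mann-Whitney U
print(result)
```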

In conclusion, parametric and nonparametric statistics make different assumptions about the underlying probability distribution of the data and the sample size. Parametric statistics assumes that the data follows a specific probability distribution and that the sample size is large enough to accurately represent the population. Nonparametric statistics, on the other hand, does not make any assumptions about the underlying probability distribution of the data, and is often used when the data does not meet the assumptions of parametric statistics. Both methods have their own advantages and limitations, and the choice of method depends on the specific characteristics of the data and the research question.

5-Using Spearman’s rank order correlation for the following data: 

Data 1: 21, 12, 32, 34, 23, 34, 21, 22, 12, 29

Data 2: 6, 8, 21, 23, 33, 22, 43, 34, 21, 11


To calculate the Spearman's rank order correlation for the given data, we first need to rank the data for both Data 1 and Data 2, giving tied values the average of the ranks they would otherwise occupy.

Data 1 (tied values share the average of their ranks):
21 = 3.5
12 = 1.5
32 = 8
34 = 9.5
23 = 6
34 = 9.5
21 = 3.5
22 = 5
12 = 1.5
29 = 7

Data 2:
6 = 1
8 = 2
21 = 4.5
23 = 7
33 = 8
22 = 6
43 = 10
34 = 9
21 = 4.5
11 = 3

Next, we calculate the difference in ranks between Data 1 and Data 2 for each observation (d) and square these differences (d^2).

Σd^2 = (3.5-1)^2 + (1.5-2)^2 + (8-4.5)^2 + (9.5-7)^2 + (6-8)^2 + (9.5-6)^2 + (3.5-10)^2 + (5-9)^2 + (1.5-4.5)^2 + (7-3)^2 = 6.25 + 0.25 + 12.25 + 6.25 + 4 + 12.25 + 42.25 + 16 + 9 + 16 = 124.5

Finally, we use the following formula to calculate the Spearman's rank order correlation:

rho = 1 - (6 * Σd^2) / (n(n^2 - 1))

where n is the number of observations (in this case, 10)

rho = 1 - (6 * 124.5) / (10 * (10^2 - 1)) = 1 - 747 / 990 = 1 - 0.7545 = 0.2455 ≈ 0.25

Therefore, the Spearman's rank order correlation between Data 1 and Data 2 is approximately 0.25. This indicates a weak positive correlation between the two data sets.

To interpret the coefficient of 0.25: it is positive, meaning that as the ranks of Data 1 increase, the ranks of Data 2 also tend to increase, but it is much closer to 0 than to 1, so the relationship is weak rather than strong. Because the coefficient is based on ranks, it captures monotonic (consistently increasing or decreasing) relationships rather than strictly linear ones; and when ties are present, as here, the d^2 formula is only approximate, with the exact value (obtained by correlating the ranks directly) coming out very slightly lower.

It is also worth noting that, while the rank correlation is less sensitive to outliers than Pearson's correlation, with only 10 observations a coefficient of this size is unlikely to be statistically significant, so little should be read into it beyond a weak positive tendency.

In conclusion, the Spearman's rank order correlation between Data 1 and Data 2 is approximately 0.25, indicating a weak positive correlation between the two data sets. The presence of tied ranks and the small sample size should both be kept in mind when interpreting this coefficient.
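
The calculation can be verified with scipy (assumed to be available). Note that spearmanr handles the tied ranks exactly, by correlating the ranks directly, so its result comes out very slightly lower than the value from the d^2 formula:

```python
from scipy.stats import spearmanr

data1 = [21, 12, 32, 34, 23, 34, 21, 22, 12, 29]
data2 = [6, 8, 21, 23, 33, 22, 43, 34, 21, 11]

rho, p_value = spearmanr(data1, data2)
print(f"rho = {rho:.3f}, p = {p_value:.3f}")  # rho is about 0.24 and not significant
```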

6-Describe various levels of measurement with suitable examples.

Measurement is the process of assigning numbers to characteristics or attributes of objects or individuals. The level of measurement refers to the type of numbers used and the type of data that can be obtained from those numbers. There are four levels of measurement: nominal, ordinal, interval, and ratio.

The first level of measurement is nominal measurement. Nominal measurement involves assigning labels or names to objects or individuals. For example, assigning a person's gender as "male" or "female" is a nominal measurement. Similarly, assigning a car's color as "red," "blue," or "green" is a nominal measurement. Nominal measurements do not have a specific order or hierarchy, and the numbers used are simply used as labels or names.

The second level of measurement is ordinal measurement. Ordinal measurement involves assigning numbers or labels to objects or individuals in a specific order or hierarchy. For example, assigning a person's educational level as "less than high school," "high school," "college," or "graduate degree" is an ordinal measurement. Similarly, assigning a car's engine size as "small," "medium," or "large" is an ordinal measurement. Ordinal measurements have an order or hierarchy, but the distance between the numbers or labels is not equal.

The third level of measurement is interval measurement. Interval measurement involves assigning numbers to objects or individuals in a specific order or hierarchy with equal distances between the numbers, but with no true zero point. For example, temperature in degrees Fahrenheit or Celsius is an interval measurement: the difference between 20 and 30 degrees is the same as the difference between 30 and 40 degrees, but zero degrees does not mean a complete absence of temperature. Calendar years and IQ scores are other common examples. Because there is no true zero, ratios are not meaningful; 40 degrees Celsius is not "twice as hot" as 20 degrees Celsius.

The fourth level of measurement is ratio measurement. Ratio measurement involves assigning numbers to objects or individuals in a specific order or hierarchy with equal distances between the numbers and a true zero point. For example, a person's weight in pounds or kilograms is a ratio measurement, as are a person's age in years and a car's speed in miles or kilometres per hour. Because zero means a complete absence of the attribute, ratios are meaningful: 100 kilograms is twice as heavy as 50 kilograms.

In summary, the four levels of measurement are nominal (labels with no order), ordinal (ordered categories with unequal distances), interval (equal distances but no true zero), and ratio (equal distances with a true zero). Each successive level carries more information about what is measured and permits more powerful statistical operations.
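
The first two levels can be illustrated in code with unordered and ordered categorical data; a small sketch, assuming the pandas library is available and reusing the examples above:

```python
import pandas as pd

# nominal: labels only, with no meaningful order
colors = pd.Categorical(["red", "blue", "green", "blue"], ordered=False)
print(colors.value_counts())  # counting frequencies is the only meaningful operation

# ordinal: ordered categories, but the "distances" between them are not equal
education = pd.Categorical(
    ["college", "high school", "graduate degree"],
    categories=["less than high school", "high school",
                "college", "graduate degree"],
    ordered=True,
)
print(education.min())  # ordering is meaningful: "high school"

# interval data (e.g., temperature in Celsius) supports differences but not ratios;
# ratio data (e.g., weight in kg) supports both, because zero means "none"
weights = pd.Series([50.0, 75.0, 100.0])
print(weights.iloc[2] / weights.iloc[0])  # 2.0 - "twice as heavy" makes sense
```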

7-Explain Kruskal-Wallis ANOVA test and compare it with ANOVA.


The Kruskal-Wallis ANOVA test (more fully, the Kruskal-Wallis one-way analysis of variance by ranks) is a non-parametric statistical test used to determine whether there is a significant difference between three or more independent groups, usually interpreted as a difference in their medians rather than their means. It is similar to the one-way ANOVA test, but it is used when the data does not meet the assumptions of the ANOVA test, such as normality and equal variances.

The Kruskal-Wallis test works by ranking the data from all groups together and then comparing the average ranks between groups. It calculates a test statistic known as the H statistic: H = [12 / (N(N + 1))] * Σ(Rj^2 / nj) - 3(N + 1), where N is the total number of observations, nj is the number of observations in group j, and Rj is the sum of the ranks in group j. For reasonably large samples, H approximately follows a chi-square distribution with k - 1 degrees of freedom, so it is compared to a chi-square critical value. If the H statistic is greater than the critical value, it indicates that there is a significant difference between the groups.

One of the main differences between the Kruskal-Wallis test and ANOVA is the type of data that can be used. The ANOVA test requires that the data be normally distributed with equal variances and measured on an interval or ratio scale, while the Kruskal-Wallis test can be used with non-normally distributed or ordinal data and does not assume equal variances. This makes the Kruskal-Wallis test more flexible and useful in situations where the data does not meet the assumptions of ANOVA.

Another difference concerns power and scope. When the assumptions of ANOVA hold, ANOVA is somewhat more powerful than the Kruskal-Wallis test because it uses the actual scores rather than their ranks. Both tests can in principle be applied to two or more groups; with exactly two groups, the Kruskal-Wallis test is equivalent to the Mann-Whitney U test, just as one-way ANOVA with two groups is equivalent to the t-test. Additionally, factorial ANOVA can be used to test for interactions between variables, while the Kruskal-Wallis test handles only a single factor.

Overall, the Kruskal-Wallis test and ANOVA are similar in that they both test for significant differences between groups, but they differ in the assumptions they make about the data. The Kruskal-Wallis test is a useful alternative to ANOVA when the data does not meet the assumptions of ANOVA, and can provide valuable insights into the differences between groups in these situations, as the sketch below illustrates.
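
To see the two tests side by side, here is a brief sketch with invented scores for three groups, assuming scipy is available:

```python
from scipy.stats import kruskal, f_oneway

# hypothetical scores for three independent groups
g1 = [12, 15, 14, 10, 8]
g2 = [22, 25, 19, 24, 27]
g3 = [16, 18, 17, 20, 15]

h_stat, p_kw = kruskal(g1, g2, g3)   # rank-based; no normality assumption
f_stat, p_f = f_oneway(g1, g2, g3)   # assumes normality and equal variances

print(f"Kruskal-Wallis: H = {h_stat:.2f}, p = {p_kw:.4f}")
print(f"One-way ANOVA:  F = {f_stat:.2f}, p = {p_f:.4f}")
```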

8-Compute Chi-square for the following data:


                Emotional Intelligence scores
Students        High    Low
School A        23      22
School B        12      18



To compute the chi-square for this data, we first need the row, column, and grand totals: the row totals are 45 for School A (23 + 22) and 30 for School B (12 + 18); the column totals are 35 for High (23 + 12) and 40 for Low (22 + 18); and the grand total is 75. We then calculate the expected value for each cell using the following formula:

Expected value = (row total * column total) / grand total

For School A: High emotional intelligence: (45 * 35) / 75 = 21.0
Low emotional intelligence: (45 * 40) / 75 = 24.0

For School B: High emotional intelligence: (30 * 35) / 75 = 14.0
Low emotional intelligence: (30 * 40) / 75 = 16.0

Next, we calculate each cell's contribution to the chi-square statistic using the following formula:

(observed value - expected value)^2 / expected value

For School A, High emotional intelligence:

(23 - 21)^2 / 21 = 4 / 21 = 0.19

For School A, Low emotional intelligence:

(22 - 24)^2 / 24 = 4 / 24 = 0.17

For School B, High emotional intelligence:

(12 - 14)^2 / 14 = 4 / 14 = 0.29

For School B, Low emotional intelligence:

(18 - 16)^2 / 16 = 4 / 16 = 0.25

Finally, we add up the contributions from all four cells:

0.19 + 0.17 + 0.29 + 0.25 = 0.89 (0.893 before rounding)

So the chi-square value for this data is approximately 0.89. For a 2 x 2 table, the degrees of freedom are (2 - 1) * (2 - 1) = 1, and the critical value at the 0.05 level of significance is 3.841. Since 0.89 is less than 3.841, we fail to reject the null hypothesis: there is no significant association between school and emotional intelligence scores.
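
The same result can be obtained with scipy (assumed to be available); correction=False is needed to match the hand calculation, because scipy applies Yates' continuity correction to 2 x 2 tables by default:

```python
from scipy.stats import chi2_contingency

observed = [[23, 22],   # School A: high, low
            [12, 18]]   # School B: high, low

chi2, p, dof, expected = chi2_contingency(observed, correction=False)
print(f"chi-square = {chi2:.3f}, df = {dof}, p = {p:.3f}")  # chi-square is about 0.89
print(expected)  # [[21. 24.], [14. 16.]]
```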

Section C

9-Type I and type II errors.

Type I error, also known as a false positive, occurs when a test incorrectly finds evidence for a certain condition or hypothesis when it does not actually exist. Type II error, also known as a false negative, occurs when a test fails to find evidence for a certain condition or hypothesis when it actually does exist. In simple terms, Type I error is when you reject a true null hypothesis, and Type II error is when you fail to reject a false null hypothesis.

10-Skewness and kurtosis.

Skewness and kurtosis are statistical measures that describe the shape of a distribution. Skewness measures the asymmetry of a distribution: positive skewness indicates a tail extending to the right, and negative skewness indicates a tail extending to the left. Kurtosis is traditionally described as the peakedness or flatness of a distribution; more precisely, it reflects the heaviness of the tails, with high kurtosis indicating heavier tails and more extreme values, and low kurtosis indicating lighter tails. Both skewness and kurtosis are important for understanding the characteristics of a dataset and can be used to identify departures from normality, outliers, or other anomalies.
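
A quick sketch of both measures on made-up data, assuming scipy is available:

```python
from scipy.stats import skew, kurtosis

symmetric = [1, 2, 3, 4, 5, 6, 7]        # balanced around its mean
right_skewed = [1, 2, 2, 3, 3, 3, 15]    # one long tail to the right

print(skew(symmetric), skew(right_skewed))  # about 0 versus clearly positive
print(kurtosis(right_skewed))  # excess kurtosis: the heavy tail raises the value
```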

11-Point and interval estimations.

Point estimation is the process of estimating an unknown parameter by a single value, known as a point estimate; for example, using a sample mean of 72 as the estimate of a population mean. Interval estimation, on the other hand, is the process of estimating an unknown parameter by a range of values, known as an interval estimate, such as a 95% confidence interval of 68 to 76. Interval estimation provides a range of plausible values for the unknown parameter together with a stated level of confidence, whereas point estimation provides a single "best guess" with no indication of its uncertainty.
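
A minimal sketch of both kinds of estimate for a population mean, assuming numpy and scipy are available and using invented sample data:

```python
import numpy as np
from scipy import stats

sample = [68, 72, 75, 70, 69, 74, 71, 73]  # hypothetical measurements

point_estimate = np.mean(sample)  # a single "best guess" for the population mean

# 95% confidence interval: a range of plausible values for the population mean
interval_estimate = stats.t.interval(0.95, df=len(sample) - 1,
                                     loc=point_estimate,
                                     scale=stats.sem(sample))
print(point_estimate, interval_estimate)
```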

12-Null hypothesis

The null hypothesis is a statement that there is no significant difference or relationship between two variables. It is used in statistical testing to determine whether an observed difference or relationship is due to chance or is statistically significant. For example, a null hypothesis may be that there is no difference in success rates between two different teaching methods. To test this hypothesis, data would be collected and analyzed to see whether there is a significant difference in success rates between the two methods. If the data show no significant difference, we fail to reject the null hypothesis (strictly speaking, it is never "accepted" as proven); if there is a significant difference, the null hypothesis is rejected.

13-Scatter diagram

A scatter diagram is a visual representation of the relationship between two variables. It plots each data point on a coordinate plane, with one variable on the x-axis and the other variable on the y-axis. Scatter diagrams can be used to identify patterns and trends in data, such as positive or negative correlation, and to explore potential relationships between variables.
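
A scatter diagram takes only a few lines to draw; a sketch with invented data, assuming matplotlib is available:

```python
import matplotlib.pyplot as plt

hours_studied = [1, 2, 3, 4, 5, 6, 7]        # hypothetical x variable
test_scores = [52, 58, 61, 66, 70, 75, 79]   # hypothetical y variable

plt.scatter(hours_studied, test_scores)  # one point per (x, y) pair
plt.xlabel("Hours studied")
plt.ylabel("Test score")
plt.title("Scatter diagram showing a positive relationship")
plt.show()
```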

14-Outliers

Outliers are data points that differ markedly from the rest of the data in a dataset. They can be caused by measurement errors, data entry errors, or genuinely unusual events. Because they can skew the results of statistical analyses and distort summary measures such as the mean and variance, outliers should be identified, for example with the 1.5 x IQR rule or with z-scores, and handled appropriately.
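
One common way to flag outliers is the 1.5 x IQR rule mentioned above; a sketch with made-up data, assuming numpy is available:

```python
import numpy as np

data = np.array([12, 14, 15, 13, 16, 14, 13, 98])  # 98 looks suspicious

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1  # interquartile range
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = data[(data < lower) | (data > upper)]
print(outliers)  # [98]
```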

15-Biserial correlation

Biserial correlation is a statistical technique used to measure the relationship between a dichotomous variable and a continuous variable (such as test scores or income). Strictly speaking, the biserial coefficient assumes the dichotomy is artificial, created by splitting an underlying continuous, normally distributed variable (such as pass/fail based on a mark threshold); when the dichotomy is natural (such as gender), the closely related point-biserial correlation is used instead. The value of the biserial correlation ranges between -1 and 1, where -1 represents a perfect negative relationship, 0 represents no relationship, and 1 represents a perfect positive relationship.
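
scipy implements the closely related point-biserial coefficient directly; a sketch with invented pass/fail data:

```python
from scipy.stats import pointbiserialr

passed = [0, 0, 1, 1, 1, 0, 1, 1]            # binary variable (0 = fail, 1 = pass)
scores = [45, 50, 65, 70, 72, 48, 60, 68]    # continuous variable

r, p = pointbiserialr(passed, scores)
print(f"r = {r:.3f}, p = {p:.4f}")  # strongly positive: passers score higher
```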

16-Variance

Variance is a statistical measure that describes the spread of a set of data. It is calculated by averaging the squared differences between each data point and the mean of the set; for a sample, the sum of squared deviations is divided by n - 1 rather than n to give an unbiased estimate. A high variance indicates that the data points are spread out over a large range, while a low variance indicates that they are clustered closely around the mean. Variance is commonly used in financial analysis, risk management, and other fields to quantify the level of risk or uncertainty in a set of data, and it is also important in assessing the reliability and accuracy of statistical models and predictions. Its square root, the standard deviation, expresses the same spread in the original units of the data.
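
Python's standard library computes both the population and the sample version; a small sketch with made-up numbers:

```python
import statistics

data = [4, 8, 6, 5, 3]

print(statistics.pvariance(data))  # population variance (divide by n): 2.96
print(statistics.variance(data))   # sample variance (divide by n - 1): 3.7
print(statistics.stdev(data))      # sample standard deviation, sqrt(3.7)
```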

17-Interactional effect

An interaction effect (also called an interactional effect) in statistics refers to the phenomenon where the relationship between two variables changes depending on the level of one or more other variables. This can occur in psychology research when examining, for example, the relationship between stress and anxiety, which may differ by gender or age. Interaction effects can be difficult to detect and interpret, and are typically tested in factorial ANOVA or in regression models with product terms, but they are important to consider in order to fully understand the complex nature of human behavior.

18-Wilcoxon matched-pairs signed-rank test.

The Wilcoxon matched-pairs signed-rank test is a non-parametric statistical test used to determine whether there is a significant difference between two related samples, such as the same participants measured before and after a treatment. It is the non-parametric counterpart of the paired t-test and is typically used when the differences are not normally distributed or the sample size is small. The test ranks the absolute differences between the paired observations, compares the sums of ranks for positive and negative differences, and yields a p-value indicating the probability that the observed difference is due to chance. If the p-value is less than the significance level (usually 0.05), it can be concluded that there is a significant difference between the two samples. The test is commonly used in medical research, psychology, and other fields where the data is not normally distributed.
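
A brief sketch of the test on invented before/after scores, assuming scipy is available:

```python
from scipy.stats import wilcoxon

before = [10, 12, 9, 11, 14, 13, 8, 10]  # hypothetical pre-treatment scores
after = [12, 14, 10, 13, 13, 16, 9, 12]  # matched post-treatment scores

stat, p = wilcoxon(before, after)  # ranks the paired differences
print(f"W = {stat}, p = {p:.3f}")  # p < 0.05 would suggest a reliable change
```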

----------------------------------------------------------------------------------------------------------------------------------------------
Please read the answers carefully; if you find any error, please point it out in the comments. All the assignment answers appear above. If you like the answers, please comment and follow for more, and if you have any suggestion, please comment or e-mail me.

Thank You!