Posts

Showing posts from November, 2018

Clear Explanation on "PROBABILITY"

Probability is a value between 0 and 1 that expresses how likely a certain event is to occur. When all outcomes are equally likely, we can calculate the probability of a single event by dividing the number of outcomes in which the event occurs by the total number of possible outcomes.

Example: the probability that a fair coin will come up heads is 0.5. Mathematically we write:

    P(E_heads) = 0.5

The act of flipping a coin is called a "trial". Each trial of flipping a coin can be called an "experiment". Each mutually exclusive outcome is called a "simple event". The sample space is the set of all possible simple events.

Experiments and sample space: consider rolling a six-sided die. One roll is an experiment. The simple events are:

    E1 = 1, E2 = 2, E3 = 3, E4 = 4, E5 = 5, E6 = 6

Therefore, the sample space is:

    S = {E1, E2, E3, E4, E5, E6}

The probability that a fair die will roll a six: the simple event is E6 = 6(
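The counting rule described in this post can be sketched in a few lines of Python. This is only an illustration, assuming every simple event is equally likely; the helper name `probability` is hypothetical and not part of the original post.

```python
from fractions import Fraction

def probability(event, sample_space):
    """Probability of `event` when every simple event is equally likely."""
    favorable = event & sample_space  # outcomes that belong to the event
    return Fraction(len(favorable), len(sample_space))

die = {1, 2, 3, 4, 5, 6}      # sample space for one roll of a six-sided die
coin = {"heads", "tails"}     # sample space for one coin flip

p_six = probability({6}, die)
p_heads = probability({"heads"}, coin)
print(p_six, float(p_heads))
```

Using `Fraction` keeps the result exact, so the die probability stays as one sixth instead of a rounded decimal.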

Correlation, Covariance and Causation

"Correlation" is a measure of how strongly two random variables are related. "Covariance" is a closely related measure: correlation is the scaled form of covariance. Correlation is dimensionless, i.e., it is a unit-free measure of the relationship between variables.

Covariance indicates how two variables are related. A positive covariance means the variables are positively related, while a negative covariance means the variables are inversely related.

To understand covariance clearly, we should know the difference between covariance and variance. "Variance" refers to the spread of a data set: how far apart the numbers are in relation to the mean. "Covariance" refers to the measure of how two random variables change together, and it is used to calculate the correlation between variables.

So, from the above, we can clearly understand that to find the correla
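To make the link between covariance and correlation concrete, here is a minimal sketch. It assumes the sample versions of both measures (dividing by n - 1) and the Pearson correlation coefficient; the data and function names are made up for illustration.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def covariance(x, y):
    """Sample covariance: average product of deviations, dividing by n - 1."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def correlation(x, y):
    """Covariance scaled by both standard deviations, so the units cancel."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))

# Hypothetical data: y rises with x, z falls as x rises.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
z = [10, 8, 6, 4, 2]

print(covariance(x, y), correlation(x, y))
print(covariance(x, z), correlation(x, z))
```

Note that `covariance(x, x)` is simply the variance of `x`, which is why variance and covariance are usually introduced together.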

Interquartile Range (IQR)

Image
A way to describe data is through quartiles and the interquartile range (IQR). Above we have some values that are divided by three quartiles:

    Lower quartile (Q1) = 14
    Median (Q2) = 22
    Upper quartile (Q3) = 35

A boxplot makes the quartiles and outliers in the data easy to see. In the above boxplot, we can see the minimum value, the three quartiles, and the maximum value. Here 60 may be an outlier. To decide whether a value really is an outlier, we set fences based on the IQR, which we find with the formula:

    IQR = Q3 - Q1 = 35 - 14
    Interquartile range = 21

Outliers here are defined as observations that fall below Q1 - 1.5 IQR or above Q3 + 1.5 IQR.

    Lower fence = Q1 - 1.5(IQR) = 14 - 1.5(21) = -17.5
    Upper fence = Q3 + 1.5(IQR) = 35 + 1.5(21) = 66.5

Here in this data, the perfect outliers
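The fence calculation above can be checked with a short sketch. It takes the quartiles from the post (Q1 = 14, Q3 = 35) as given rather than computing them from raw data; the function names are hypothetical.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the IQR and the lower/upper outlier fences for multiplier k."""
    iqr = q3 - q1
    return iqr, q1 - k * iqr, q3 + k * iqr

# Quartiles taken from the example in the post.
iqr, lower_fence, upper_fence = iqr_fences(14, 35)

def is_outlier(value):
    """A value counts as an outlier only if it lies outside both fences."""
    return value < lower_fence or value > upper_fence

print(iqr, lower_fence, upper_fence, is_outlier(60))
```

With these fences, a value of 60 lies below the upper fence of 66.5, so the 1.5 IQR rule would not flag it.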

Measures of Dispersion

Image
The purpose of measures of dispersion is to find out how spread out the data values are on the number line.

The range of a set of data is the difference between the largest and smallest values:

    Range = Maximum value - Minimum value
    Range = 39 - 9 = 30

VARIANCE is the expectation of the squared deviation of a random variable from its mean. It measures how far a set of numbers is spread out from their average value. There are two kinds:

    Sample variance
    Population variance

1. Calculating the sample variance for the values 4, 7, 8, 9, 11

We first have to find the mean of the values:

    Mean = (4 + 7 + 8 + 9 + 11) / 5 = 39 / 5 = 7.8

    S^2 = (4-7.8)^2 + (7-7.8)^2 + (
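As a rough check of the arithmetic, the sketch below computes the range and both kinds of variance for the values 4, 7, 8, 9, 11. Since the formula in the excerpt is cut off, the divisors are an assumption: the usual n - 1 for the sample variance and n for the population variance, as implemented by Python's standard `statistics` module.

```python
import statistics

values = [4, 7, 8, 9, 11]

# Range: largest value minus smallest value.
value_range = max(values) - min(values)

# Mean and sum of squared deviations from the mean.
mean = sum(values) / len(values)
squared_deviations = sum((v - mean) ** 2 for v in values)

# Assumed divisors: n - 1 for a sample, n for a whole population.
sample_variance = squared_deviations / (len(values) - 1)
population_variance = squared_deviations / len(values)

# Cross-check against the standard library.
print(value_range, mean, sample_variance, statistics.variance(values))
print(population_variance, statistics.pvariance(values))
```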

Measures of Central Tendency

Image
In statistics, a central tendency is a central or typical value for a probability distribution. It may also be called a center or location of the distribution. The most common measures of central tendency are:

1. Arithmetic mean
2. Median
3. Mode

1. MEAN: The mean is the "calculated average".

   Ex: {3, 4, 5}
   Mean = (sum of observations / total number of observations)
   Mean = (3 + 4 + 5) / 3 = 4

2. MEDIAN: The median is the "middle value" in the given observations.

   Ex: Median for an odd number of observations
       Median for an even number of observations

3. MODE: The "most occurring value in the given observations".

The main advantage of these measures is that they find the location of the data, but they fail to describe the shape of the data.

Note: The mean can be influenced by outliers so it is highly recomme
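The three measures can be tried out with Python's standard `statistics` module. The data sets below are made up for illustration (only {3, 4, 5} comes from the post), and the last one shows how a single outlier pulls the mean while leaving the median almost unchanged.

```python
import statistics

odd_count = [3, 4, 5]            # example set from the post
even_count = [3, 4, 5, 6]        # hypothetical set with an even number of values
with_repeats = [1, 2, 2, 3]      # hypothetical set with a repeated value
with_outlier = [3, 4, 5, 100]    # hypothetical set with one extreme value

mean_value = statistics.mean(odd_count)
median_odd = statistics.median(odd_count)     # the middle value
median_even = statistics.median(even_count)   # average of the two middle values
mode_value = statistics.mode(with_repeats)    # the most frequent value

# The outlier drags the mean far from the bulk of the data; the median barely moves.
mean_outlier = statistics.mean(with_outlier)
median_outlier = statistics.median(with_outlier)

print(mean_value, median_odd, median_even, mode_value)
print(mean_outlier, median_outlier)
```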

Intro to Probability & Statistics for Data science

To become an expert in the domain of data science, we should have sound knowledge of probability and statistics.

STATISTICS

Statistics is the mathematical science behind the problem "What can I know about a population if I'm unable to reach every member?"

If we could measure the height of every resident of India, then we could make a statement about the average height of Indians at the time we took our measurement. Since measuring everyone is impractical, this is where RANDOM SAMPLING comes in. If we take a reasonably sized random sample of Indians and measure their heights, we can form a statistical inference about the population of Indians. Probability helps us know how sure we are of our conclusion.

DATA

Data is nothing but the collected observations we have about something. Data can be of two types: 1. Continuous data
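The idea of random sampling can be illustrated with a small simulation. Everything here is synthetic: the "population" is generated from a normal distribution with an assumed mean of 165 cm and standard deviation of 7 cm, chosen only for illustration and not taken from real height data.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

# Synthetic population of heights in cm (assumed parameters, not real data).
population = [rng.gauss(165, 7) for _ in range(100_000)]

# Measure only a random sample instead of every member.
sample = rng.sample(population, 1_000)

population_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)

print(round(population_mean, 2), round(sample_mean, 2))
```

The sample mean lands close to the population mean even though only 1% of the members were measured, which is the kind of inference the post describes.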