DatamindS

Posts

Correlation, Covariance and Causation

November 26, 2018

Correlation is a measure of how strongly two random variables are related. Covariance indicates the direction in which two variables vary together: a positive covariance means the variables are positively related, while a negative covariance means they are inversely related. Correlation is the scaled form of covariance: dividing the covariance by the product of the two standard deviations gives a dimensionless, unit-free measure of the relationship between the variables. To understand covariance clearly, we should know the difference between covariance and variance. "Variance" refers to the spread of a data set, that is, how far the numbers are from the mean. "Covariance" is a measure of how two random variables change together ...
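The relationship between covariance and correlation described in this excerpt can be sketched in a few lines of Python. This is a minimal illustration, not the post's own code: the helper names and the `hours`/`scores` values are made up for the example, and the sample (n - 1) form of covariance is assumed.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)


def sample_covariance(xs, ys):
    """Sample covariance: summed products of deviations, divided by n - 1."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Covariance scaled by both standard deviations (unit free)."""
    var_x = sample_covariance(xs, xs)  # covariance of x with itself is its variance
    var_y = sample_covariance(ys, ys)
    return sample_covariance(xs, ys) / sqrt(var_x * var_y)


# Hypothetical data, chosen only for illustration.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 6, 8, 10]
minutes = [h * 60 for h in hours]  # same quantity in a different unit

cov_hours = sample_covariance(hours, scores)
cov_minutes = sample_covariance(minutes, scores)
corr_hours = correlation(hours, scores)
corr_minutes = correlation(minutes, scores)

print(cov_hours, cov_minutes)    # covariance changes with the unit
print(corr_hours, corr_minutes)  # correlation does not
```

Changing the unit from hours to minutes multiplies the covariance by 60 but leaves the correlation unchanged, which is what "unit free" means here.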

Interquartile Range (IQR)

November 23, 2018

One way to describe data is through quartiles and the interquartile range (IQR). The three quartiles divide the values into four parts: Lower Quartile (Q1) = 14, Median (Q2) = 22, Upper Quartile (Q3) = 35. A boxplot makes the quartiles and outliers in the data easy to see: it shows the minimum value, the three quartiles, and the maximum value. Here the value 60 may be an outlier. To decide, we set fences that mark where outliers begin, and the fences are computed from the IQR: IQR = Q3 - Q1 = 35 - 14 = 21. Outliers are defined as observations that fall below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR. Lower fence = Q1 - 1.5 × IQR = 14 - 1.5(21) = -17.5 ...
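The fence calculation in this excerpt can be checked with a short Python sketch. It takes the quartiles quoted above (Q1 = 14, Q3 = 35) as given rather than recomputing them from data, since the underlying values are not listed here; the function names are hypothetical.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence) for the k * IQR outlier rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(value, q1, q3, k=1.5):
    """True if value falls outside the fences."""
    lower, upper = iqr_fences(q1, q3, k)
    return value < lower or value > upper


# Quartiles quoted in the excerpt.
lower_fence, upper_fence = iqr_fences(14, 35)
print(lower_fence, upper_fence)  # -17.5 66.5
print(is_outlier(60, 14, 35))    # False
```

With these quartiles the upper fence is 66.5, so under the 1.5 × IQR rule the value 60 falls inside the fences.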

Measurement type Dispersion

November 22, 2018

The purpose of measures of dispersion is to find out how spread out the data values are on the number line. The range of a set of data is the difference between the largest and smallest values: Range = Maximum value - Minimum value, for example Range = 39 - 9 = 30. Variance is the expectation of the squared deviation of a random variable from its mean; it measures how far a set of numbers is spread out from its average value. There are two forms: sample variance and population variance. 1. Calculating the sample variance for the values 4, 7, 8, 9, 11: first find the mean of the values: (4 + 7 + 8 + 9 + 11) / 5 = 39 / 5 ...
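The calculation started in this excerpt can be carried through with Python's standard `statistics` module. This is a sketch using the five values quoted above; the variable names are illustrative only.

```python
import statistics

values = [4, 7, 8, 9, 11]  # the values quoted in the excerpt

# Range: largest value minus smallest value.
data_range = max(values) - min(values)

# Mean: (4 + 7 + 8 + 9 + 11) / 5 = 39 / 5.
mean = statistics.mean(values)

# Sample variance divides the summed squared deviations by n - 1.
sample_var = statistics.variance(values)

# Population variance divides the same sum by n.
population_var = statistics.pvariance(values)

print(data_range, mean, sample_var, population_var)
```

The summed squared deviations from the mean come to 26.8, so the sample variance is 26.8 / 4 (about 6.7) and the population variance is 26.8 / 5 (about 5.36).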

Measures of Central Tendency

November 21, 2018

In statistics, a central tendency is a central or typical value for a probability distribution; it may also be called a center or location of the distribution. The most common measures of central tendency are: 1. Arithmetic mean 2. Median 3. Mode. 1. Mean: the mean is the calculated average. Ex: {3, 4, 5}. Mean = (Sum of observations / Total number of observations), so Mean = (3 + 4 + 5) / 3 = 4. 2. Median: the median is the middle value of the observations when they are sorted. Ex: median for an odd number of observations, median for an even number of observations ...
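The three measures can be tried out with Python's standard `statistics` module. In this sketch, {3, 4, 5} is the example from the excerpt; the even-length and mode data sets are illustrative values that are not in the original post.

```python
import statistics

odd_values = [3, 4, 5]        # example from the excerpt
even_values = [3, 4, 5, 6]    # illustrative: even number of observations
repeated_values = [1, 2, 2, 3]  # illustrative: 2 occurs most often

# Mean: sum of observations divided by the number of observations.
mean_value = statistics.mean(odd_values)

# Median, odd count: the single middle value of the sorted data.
median_odd = statistics.median(odd_values)

# Median, even count: the average of the two middle values.
median_even = statistics.median(even_values)

# Mode: the most frequently occurring value.
mode_value = statistics.mode(repeated_values)

print(mean_value, median_odd, median_even, mode_value)
```

For an even number of observations there is no single middle value, so the median of 3, 4, 5, 6 is taken as (4 + 5) / 2.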

Intro to Probability & Statistics for Data science

November 21, 2018

To become an expert in data science, we should have sound knowledge of probability and statistics. STATISTICS: Statistics is the mathematical science behind the question "What can I know about a population if I'm unable to reach every member?" If we could measure the height of every resident of India, we could make a statement about the average height of Indians at the time we took our measurement. Measuring everyone is not practical, and this is where RANDOM SAMPLING comes in. If we take a reasonably sized random sample of Indians and measure their heights, we can form a statistical inference about the population of Indians. Probability helps us know how sure we are of our conclusion ...
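The idea of random sampling can be illustrated with a small simulation. Everything in this sketch is synthetic: the population is generated from a normal distribution with arbitrary parameters (165 and 10), not from real height measurements, and the seed and sample size are assumptions made only so the example is repeatable.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

# Synthetic "population": 100,000 made-up heights in centimetres.
population = [rng.gauss(165.0, 10.0) for _ in range(100_000)]
population_mean = sum(population) / len(population)

# Random sample without replacement: every member is equally likely to be picked.
sample = rng.sample(population, 1_000)
sample_mean = sum(sample) / len(sample)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

The sample uses only 1% of the population, yet its mean usually lands close to the population mean. How close we can expect it to be is the question probability answers.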

Welcome to Data Science

September 27, 2018

For the past few years, we have been hearing the buzzword Artificial Intelligence. In this post I am going to explain what exactly AI is. As we travel further into the 21st century, there are a lot of technological advancements, and automation cuts down the operating cost of a company. Statistics, programming, machine learning, and neural networks combine to form artificial intelligence. I am going to explain each element of AI. Statistics: Statistics is a collection of tools used to quantitatively describe or summarize a collection of data. Descriptive statistics aim to summarize, and as such can be distinguished from inferential statistics, which are more predictive in nature. Programming: Python and R are extensively used in analyzing data; R is a statistical programming language for analyzing and visualizing data. Machine learning: ML is a field of computer science that uses statistical techniques to give computer systems th...
