Intro to Probability & Statistics for Data Science


To become an expert in data science, we need a sound knowledge of probability and statistics.
                                                                          STATISTICS

Statistics is the mathematical science behind the question "What can I know about a population if I am unable to reach every member?"
If we could measure the height of every resident of India, we could state the average height of Indians at the time of our measurement. In practice we cannot reach everyone, and that is where RANDOM SAMPLING comes in.
If we take a reasonably sized random sample of Indians and measure their heights, we can make a statistical inference about the whole population.
Probability tells us how confident we can be in that conclusion.
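A minimal Python sketch of this idea, assuming a simulated population: the heights are drawn from a normal distribution with made-up parameters (mean 165 cm, standard deviation 7 cm), not real data about India. The helper names `simulate_population` and `estimate_mean` are hypothetical. A random sample of 1,000 people gives an estimate of the average, and the approximate 95% confidence interval (sample mean ± 1.96 standard errors) is one common way probability expresses how sure we can be.

```python
import random
import statistics

def simulate_population(size, mean_cm=165.0, sd_cm=7.0, seed=0):
    """Hypothetical population of heights in cm (simulated, NOT real survey data)."""
    rng = random.Random(seed)
    return [rng.gauss(mean_cm, sd_cm) for _ in range(size)]

def estimate_mean(population, n, seed=1):
    """Estimate the population mean from a simple random sample of size n.

    Returns (sample_mean, (low, high)), where (low, high) is an approximate
    95% confidence interval from the normal approximation: mean +/- 1.96 * SE.
    """
    rng = random.Random(seed)
    sample = rng.sample(population, n)              # random sample without replacement
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / n ** 0.5        # standard error of the mean
    return mean, (mean - 1.96 * se, mean + 1.96 * se)

population = simulate_population(100_000)
true_mean = statistics.mean(population)             # known here only because we simulated it
est, (low, high) = estimate_mean(population, n=1_000)
print(f"true mean {true_mean:.2f} cm, estimate {est:.2f} cm, 95% CI ({low:.2f}, {high:.2f})")
```

Increasing `n` narrows the interval, because the standard error shrinks in proportion to 1/√n.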

                                                                                    DATA

Data is simply the collection of observations we have about something.
Data can be of two types:
1. Continuous data:
Continuous data is quantitative data that is measured; it can take an infinite number of possible values within a given range.
Ex: "What is the stock price?", temperature

2. Categorical data:
Categorical variables represent types of data that can be divided into groups.
Ex: "What car has the best repair history?", race, sex, age group, education level
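To make the distinction concrete, here is a small Python sketch using invented observations (the temperatures and car makes are hypothetical): continuous values support arithmetic such as an average, while categorical values are summarized by counting how often each group appears.

```python
from collections import Counter
import statistics

# Hypothetical observations, invented for illustration.
temperatures_c = [21.4, 22.9, 19.8, 23.1, 20.6]                  # continuous
car_makes = ["Toyota", "Honda", "Toyota", "Ford", "Honda", "Toyota"]  # categorical

# Continuous values support arithmetic, so an average is meaningful.
print(statistics.mean(temperatures_c))

# Categorical values are summarized by counting members of each group.
print(Counter(car_makes).most_common())  # [('Toyota', 3), ('Honda', 2), ('Ford', 1)]
```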

Here you may ask: why do we need data, and what can we get out of it?

Data helps us answer questions such as:
"What relationships, if any, exist between two events?"
"Do people who eat an apple a day have fewer doctor's visits than those who don't?"
"Based on a user's click history, which ad is more likely to bring them to our site?"
It also helps us predict future behavior to guide business decisions.
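As a sketch of how data can speak to the apple-a-day question, the snippet below compares the average number of doctor's visits in two groups. The visit counts and the `mean_difference` helper are hypothetical, invented for illustration. A negative result means the first group averaged fewer visits in this sample; on its own, such a difference shows an association, not that apples cause fewer visits.

```python
import statistics

# Hypothetical yearly doctor's visits, invented for illustration only.
visits_apple_eaters = [1, 2, 0, 3, 1, 2]
visits_non_eaters = [2, 4, 3, 1, 5, 3]

def mean_difference(group_a, group_b):
    """Average of group_a minus average of group_b."""
    return statistics.mean(group_a) - statistics.mean(group_b)

diff = mean_difference(visits_apple_eaters, visits_non_eaters)
print(diff)  # -1.5: apple eaters averaged 1.5 fewer visits in this made-up sample
```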

                                                                          MEASURING DATA

Levels of Measurement:
There are four levels of measurement:

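Nominal: values fall into predetermined categories with no natural order, so they can't be sorted
Ex: Animal classification (mammal, fish, reptile)
       Political party (Republican, Democrat, Independent)
Ordinal: values can be sorted, but the gaps between them have no consistent scale
Ex: Survey responses (poor, fair, good, excellent)
Interval: differences between values are meaningful, but there is no true zero point
Ex: Temperature in °C or °F
Ratio: values have a consistent scale and a true zero point, so ratios such as "twice as heavy" are meaningful
Ex: Age, Weight, Salary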
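The level of measurement decides which summaries and comparisons make sense. The Python sketch below illustrates this with hypothetical values (the party list, survey scale, temperatures and weights are invented): the mode for nominal data, the median for ordinal data, differences for interval data, and ratios for ratio data.

```python
import statistics

# Nominal: only equality and counting make sense; the mode is a valid summary.
parties = ["Democrat", "Republican", "Democrat", "Independent"]
print(statistics.mode(parties))                  # Democrat

# Ordinal: values can be ranked, so a median is meaningful,
# but the "distance" between neighbouring ranks is not defined.
SCALE = ["poor", "fair", "good", "excellent"]    # hypothetical survey scale, low to high
responses = ["good", "poor", "excellent", "good", "fair"]
ranks = sorted(SCALE.index(r) for r in responses)
print(SCALE[statistics.median_low(ranks)])       # good

# Interval: differences are meaningful, ratios are not (0 °C is not "no heat").
t1_c, t2_c = 10.0, 20.0
print(t2_c - t1_c)                                # 10.0 degrees warmer
# 20 °C is not "twice as hot" as 10 °C; on the Kelvin scale (a ratio scale)
# the ratio is only about 1.035.
print((t2_c + 273.15) / (t1_c + 273.15))

# Ratio: a true zero makes ratios meaningful.
w1_kg, w2_kg = 40.0, 80.0
print(w2_kg / w1_kg)                              # 2.0, i.e. twice as heavy
```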


I hope this explanation is clear. I will be back with another topic: "Measures of Central Tendency".






