Intro to Probability & Statistics for Data Science
To become an expert in data science, we need a sound knowledge of probability and statistics.
STATISTICS
Statistics is the mathematical science behind the question "What can I know about a population if I'm unable to reach every member?"
If we could measure the height of every resident of India, we could state the average height of Indians at the time we took our measurement. In practice we cannot reach everyone, and this is where RANDOM SAMPLING comes in.
If we take a reasonably sized random sample of Indians and measure their heights, we can form a statistical inference about the whole population of India.
Probability helps us quantify how sure we are of our conclusion.
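To make the sampling idea concrete, here is a minimal sketch in Python. The population is simulated, and the mean of 165 cm, the spread of 10 cm, and the sample size of 500 are made-up values chosen only for illustration, not real measurements. It estimates the population mean from a random sample and uses a normal approximation to attach a rough 95% confidence interval to that estimate.

```python
import random
import statistics

random.seed(42)  # fixed seed so repeated runs give the same numbers

# Hypothetical population: 100,000 simulated heights in cm.
# The parameters (165, 10) are illustrative assumptions, not real data.
population = [random.gauss(165, 10) for _ in range(100_000)]

# Simple random sample without replacement.
sample = random.sample(population, 500)

sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)

# Standard error of the mean: how much the sample mean varies from sample to sample.
standard_error = sample_sd / len(sample) ** 0.5

# Approximate 95% confidence interval (normal approximation, z = 1.96).
margin = 1.96 * standard_error
ci_low, ci_high = sample_mean - margin, sample_mean + margin

# We can only compute this because the population is simulated.
population_mean = statistics.mean(population)

print(f"Sample mean:     {sample_mean:.2f} cm")
print(f"95% interval:    {ci_low:.2f} to {ci_high:.2f} cm")
print(f"Population mean: {population_mean:.2f} cm")
```

With real data the population mean is unknown; the interval is what tells us how far off the sample mean is likely to be.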
DATA
Data is nothing but the collected observations we have about something
Data can be of two types:
1. Continuous data-
Continuous data is quantitative data that can be measured. It has an infinite number of possible values within a selected range.
Ex: "What is the stock price?", temperature range
2. Categorical data-
Categorical data represents characteristics that can be divided into groups.
Ex: "What car has the best repair history?", race, sex, age group, education level
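The two types are usually summarized differently, which the following sketch tries to show. The records and the field names `temperature_c` and `education` are invented for illustration. A continuous column can be averaged, while a categorical column is counted per group.

```python
import statistics
from collections import Counter

# Made-up records for illustration only.
records = [
    {"temperature_c": 21.5, "education": "Bachelor"},
    {"temperature_c": 23.0, "education": "Master"},
    {"temperature_c": 19.5, "education": "Bachelor"},
    {"temperature_c": 22.0, "education": "High school"},
]

temperatures = [r["temperature_c"] for r in records]  # continuous
education = [r["education"] for r in records]         # categorical

# Continuous data: arithmetic such as the mean is meaningful.
mean_temperature = statistics.mean(temperatures)

# Categorical data: we count how many observations fall in each group.
education_counts = Counter(education)

print("Mean temperature:", mean_temperature)
print("Education counts:", dict(education_counts))
```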
At this point you may ask why we need data, or what we can get out of it.
Data helps us answer questions such as:
"What relationships, if any, exist between two events?"
"Do people who eat an apple a day enjoy fewer doctor's visits than those who don't?"
"Based on a user's click history, which ad is more likely to bring them to our site?"
Data also helps us predict future behavior to guide business decisions.
Measuring Data
Levels of Measurement:
There are four levels of measurement:
Nominal: Predetermined categories that can't be sorted
Ex: Animal classification (mammal, fish, reptile)
Political party (republican, democrat, independent)
Ordinal: Can be sorted but lacks a scale
Ex: Survey responses
Interval: Provides a scale but lacks a true zero point
Ex: Temperature in Celsius or Fahrenheit
Ratio: Values have a scale and a true zero point
Ex: Age, weight, salary
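One way to see the difference between the four levels is to look at which operations make sense at each one. The sketch below uses invented values and a hypothetical four-step survey scale. It picks the most common category for nominal data, a median for ordinal data, and then shows that a ratio of temperatures changes with the unit (interval) while a ratio of weights does not (ratio).

```python
from collections import Counter

# Nominal: categories have no order, so we can only count them.
parties = ["democrat", "republican", "independent", "democrat"]
most_common_party = Counter(parties).most_common(1)[0][0]

# Ordinal: categories have an order, so a median is meaningful,
# but the distance between steps is not defined.
scale = ["poor", "fair", "good", "excellent"]  # hypothetical survey scale
responses = ["good", "poor", "excellent", "good", "fair"]
ranks = sorted(scale.index(r) for r in responses)
median_response = scale[ranks[len(ranks) // 2]]

# Interval: differences are meaningful, ratios are not,
# because zero is an arbitrary point on the scale.
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

ratio_celsius = 20 / 10
ratio_fahrenheit = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)

# Ratio: a true zero makes ratios independent of the unit.
KG_TO_LB = 2.20462
ratio_kg = 80 / 40
ratio_lb = (80 * KG_TO_LB) / (40 * KG_TO_LB)

print("Most common party:", most_common_party)
print("Median response:", median_response)
print("20 C vs 10 C:", ratio_celsius, "in Celsius,", ratio_fahrenheit, "in Fahrenheit")
print("80 kg vs 40 kg:", ratio_kg, "in kg,", ratio_lb, "in lb")
```

The same two temperatures give a different ratio in each unit, so "twice as hot" is not a meaningful statement, whereas 80 kg is twice 40 kg in any unit.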
I hope this explanation is clear. I will be back with another topic, "Measures of central tendency".