30 Cards in this Set

Datum
an item of information
Data Warehouse
large database of information collected by a company
Data mining
using data to make predictions or decisions
Metadata- information about the data that gives it context
Who- the individual cases the data describe
What- the variables recorded/measured about each case
When- when the data were collected
Why- the reason for examining the data
Where- where the data were collected
Rows- cases
Columns- variables
Respondents- individuals who answer a survey
Subjects/participants- people in an experiment
Experimental units- the individuals in an experiment when they are not people (e.g., animals, plots of land)
Relational database
two or more tables linked together so info can be merged across them.
-adds clarity by storing each kind of information in its own table
-keeps track of transactions better than one huge data table with many columns for each customer
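To see what "linked together so info can be merged" means, here is a minimal Python sketch. The table contents and the names `customers`, `transactions`, `customer_id`, and `merge_tables` are hypothetical; a real relational database would do this with a SQL join rather than a loop.

```python
# Hypothetical tables: one row per customer, one row per transaction.
customers = {
    101: {"name": "Ana", "city": "Austin"},
    102: {"name": "Ben", "city": "Boston"},
}
transactions = [
    {"txn_id": 1, "customer_id": 101, "amount": 25.00},
    {"txn_id": 2, "customer_id": 102, "amount": 40.00},
    {"txn_id": 3, "customer_id": 101, "amount": 15.50},
]

def merge_tables(customers, transactions):
    """Join each transaction to its customer's details via customer_id."""
    merged = []
    for txn in transactions:
        customer = customers[txn["customer_id"]]  # look up the linked row
        merged.append({**txn, **customer})         # combine both rows
    return merged

for row in merge_tables(customers, transactions):
    print(row)
```

Customer details are stored once, and each transaction only carries the shared `customer_id` key, which is why the database stays clear instead of growing one huge table.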
Categorical Vs. Quantitative
Categorical- names categories; you can't do math with it and it has no numerical units.
Nominal- categorical with no natural order.
Ordinal- categorical with an intrinsic order, such as Freshman, Sophomore, Junior, Senior.
Quantitative- numerical, with UNITS; PERCENTAGES are quantitative.
Identifier variable
unique type of categorical variable, assigned to each individual
Examples- Social Security number, student ID number
Time Series VS Cross Sectional
Time series- a variable measured at regular intervals of time. Ex- every week, month, year
Cross-sectional- several variables measured at about the same point in time.
3 Rules of Sampling
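1) Examine a part of the whole- study a sample to learn about the population.
2) Randomize- random selection gives every member of the population a chance to be chosen and protects against bias, so the sample is representative on average.
3) It's the sample size- the size of the sample is what matters, not the size of the population.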
Population
the entire group of individuals we hope to learn about
Population parameter
the true numerical value that describes the population (what we are trying to estimate, e.g., the population mean)
Sampling frame
the list of individuals from which the sample is drawn
Sample
the subset of the population that we actually examine (the individuals who respond) and learn from

The size of the sample is what matters, not the size of the population (as long as the sample is representative).
Voluntary Response bias
when individuals choose for themselves whether to participate in the sample
-People who choose to participate are more likely to hold strong, polarized opinions on the issue.
Undercoverage bias
when some portion of the population is not sampled at all
Nonresponse bias
a large fraction of those sampled fail to respond
Response Bias
when a survey design influences responses
Sampling error/ variability
natural differences in responses from one random sample to another
Measurement error
error built into how responses are measured or recorded, separate from which individuals were sampled
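Sampling variability can be seen by drawing several random samples from the same population and comparing their means. A minimal Python sketch; the population of values 1..100 and the helper `sample_means` are hypothetical.

```python
import random
import statistics

# Hypothetical population: 100 values, e.g. quiz scores labeled 1..100.
population = list(range(1, 101))
print("Population mean (the parameter):", statistics.mean(population))

def sample_means(population, n, draws, seed=0):
    """Mean of each of several simple random samples of size n."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    return [statistics.mean(rng.sample(population, n)) for _ in range(draws)]

for m in sample_means(population, n=10, draws=5):
    print("Sample mean:", m)
```

The sample means scatter around the population mean of 50.5 even though nothing went wrong; that spread is sampling variability, not a mistake.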
Convenience sampling
sample consisting of individuals who are conveniently available
Simple Random Sample
a sample drawn so that every possible sample of the size we plan to draw has an equal chance of being selected
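The "every possible sample has an equal chance" idea can be checked by simulation. A minimal Python sketch, assuming a hypothetical frame of four people; Python's `random.sample` draws without replacement so that every subset of the requested size is equally likely.

```python
import random
from collections import Counter

def simple_random_sample(frame, n, rng):
    """Draw n individuals without replacement from the sampling frame."""
    return rng.sample(frame, n)

frame = ["A", "B", "C", "D"]  # hypothetical sampling frame
rng = random.Random(42)
counts = Counter(
    tuple(sorted(simple_random_sample(frame, 2, rng))) for _ in range(60_000)
)
for pair, count in sorted(counts.items()):
    print(pair, count)  # each of the 6 possible pairs appears about 10,000 times
```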
Stratified Random Sampling
Divide the population into homogeneous groups (strata), then take a simple random sample within each stratum
-ensures the sample represents the different groups in the population
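A minimal Python sketch of stratified sampling, assuming a hypothetical list of students tagged by class year and an equal number drawn per stratum (proportional allocation is another common choice). The helper `stratified_sample` is illustrative only.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, per_stratum, seed=0):
    """Split the population into strata, then take a simple random sample
    of size per_stratum from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in population:
        strata[stratum_of(person)].append(person)
    sample = []
    for key in sorted(strata):
        sample.extend(rng.sample(strata[key], per_stratum))
    return sample

# Hypothetical population of students tagged by class year.
years = ["Freshman", "Sophomore", "Junior", "Senior"]
students = [(f"student{i}", years[i % 4]) for i in range(40)]
picked = stratified_sample(students, stratum_of=lambda s: s[1], per_stratum=3)
print(picked)
```

Every class year is guaranteed to appear in the sample, which a single simple random sample cannot promise.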
Cluster Sampling
Divide the population into clusters that each roughly represent the population, randomly select a few clusters, and take a census of every individual in the selected clusters
-Useful when you don't have a complete list to choose from, ex- taking polls from counties
-more practical
-saves money
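A minimal Python sketch of cluster sampling, assuming hypothetical counties that each list their households. The randomness is in which clusters are chosen; inside a chosen cluster, everyone is included.

```python
import random

def cluster_sample(clusters, k, seed=0):
    """Randomly choose k whole clusters, then take a census of every
    individual inside the chosen clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), k)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical counties, each listing its households.
counties = {
    "County A": ["a1", "a2", "a3"],
    "County B": ["b1", "b2"],
    "County C": ["c1", "c2", "c3", "c4"],
    "County D": ["d1", "d2"],
}
result = cluster_sample(counties, k=2)
for county, households in result.items():
    print(county, households)
```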
Systematic Sampling
selecting individuals systematically from a list, e.g., every 10th person after a randomly chosen start
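A minimal Python sketch of systematic sampling, assuming a hypothetical frame of 50 people and a step of k = 10. Only the starting point is random; after that, every k-th person is taken.

```python
import random

def systematic_sample(frame, k, seed=0):
    """Pick a random starting point among the first k entries, then take
    every k-th individual after it."""
    start = random.Random(seed).randrange(k)
    return frame[start::k]

frame = [f"person{i}" for i in range(50)]  # hypothetical sampling frame
print(systematic_sample(frame, k=10))
```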
Multistage Sampling
a more complicated form of cluster sampling in which larger clusters are further subdivided into smaller, more targeted groupings for the purposes of surveying.
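A minimal Python sketch of a three-stage design, assuming a hypothetical nesting of states, counties, and residents; at each stage a random subset is taken from the level below.

```python
import random

def multistage_sample(regions, n_regions, n_subgroups, n_people, seed=0):
    """Stage 1: randomly pick regions. Stage 2: randomly pick subgroups
    within each chosen region. Stage 3: randomly pick people within each
    chosen subgroup."""
    rng = random.Random(seed)
    sample = []
    for region in rng.sample(sorted(regions), n_regions):
        subgroups = regions[region]
        for sub in rng.sample(sorted(subgroups), n_subgroups):
            sample.extend(rng.sample(subgroups[sub], n_people))
    return sample

# Hypothetical nesting: states -> counties -> residents.
regions = {
    state: {
        f"{state}-county{c}": [f"{state}-county{c}-resident{r}" for r in range(20)]
        for c in range(5)
    }
    for state in ["TX", "MA", "OH"]
}
print(multistage_sample(regions, n_regions=2, n_subgroups=2, n_people=3))
```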
Frequency Vs Relative Frequency table
Frequency- shows the count of cases in each category
Relative frequency- shows the percentage (proportion) of all cases that fall in each category
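Both tables come from the same counts. A minimal Python sketch, assuming hypothetical class-year responses; the helper `frequency_tables` is illustrative only.

```python
from collections import Counter

def frequency_tables(values):
    """Return (frequency, relative frequency in %) for each category."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}

# Hypothetical class-year responses.
years = ["Freshman", "Senior", "Junior", "Freshman", "Sophomore",
         "Freshman", "Senior", "Junior"]
for cat, (freq, rel) in frequency_tables(years).items():
    print(f"{cat:<10} {freq:>3} {rel:6.1f}%")
```

Freshman has a frequency of 3 out of 8 responses, so its relative frequency is 37.5%; the relative frequencies always add up to 100%.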
Area Principle
the area occupied by a part of a graph should correspond to the size of the value it represents
-so don't change 2 dimensions at once: fix the height or the width and change only the other
Contingency table
table that shows how individuals are distributed along one variable, contingent on the value of another variable
Independent
no relationship between 2 variables: the distribution of one variable is the same for every category of the other
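A contingency table and the independence idea can be sketched together in Python. The survey data and helper names are hypothetical, and comparing conditional distributions for exact equality is a simplification: with real data they rarely match exactly, and a formal test (such as chi-square, not shown) is used to judge independence.

```python
from collections import Counter

def contingency_table(pairs):
    """Count how often each (row, column) combination occurs."""
    return Counter(pairs)

def conditional_distributions(table, rows, cols):
    """For each row category, the % of its cases in each column category."""
    dist = {}
    for r in rows:
        row_total = sum(table[(r, c)] for c in cols)
        dist[r] = {c: 100 * table[(r, c)] / row_total for c in cols}
    return dist

def looks_independent(dist, tol=1e-9):
    """True if every row has the same conditional distribution."""
    rows = list(dist.values())
    return all(abs(row[c] - rows[0][c]) <= tol for row in rows for c in rows[0])

# Hypothetical survey: (class year, prefers online classes?)
data = ([("Junior", "Yes")] * 6 + [("Junior", "No")] * 4
        + [("Senior", "Yes")] * 3 + [("Senior", "No")] * 2)
table = contingency_table(data)
dist = conditional_distributions(table, ["Junior", "Senior"], ["Yes", "No"])
print(dist)
print("Independent?", looks_independent(dist))
```

Juniors and seniors both answer "Yes" 60% of the time, so knowing class year tells you nothing about the answer in this made-up sample.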
Simpson's Paradox
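when a comparison that holds within every category reverses when the categories are combined, usually because the groups have very different numbers of cases in each category
- Group A) 90/100 = 90% and 10/20 = 50%; Total: 100/120 ≈ 83%
- Group B) 19/20 = 95% and 75/100 = 75%; Total: 94/120 ≈ 78%
-B does better in both categories, yet A does better overall.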
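The example's arithmetic can be checked directly. A minimal Python sketch using the card's numbers; `Fraction` keeps the rates exact, and `overall_rate` is an illustrative helper.

```python
from fractions import Fraction

# The card's numbers: (successes, attempts) in category 1 and category 2.
groups = {
    "A": [(90, 100), (10, 20)],
    "B": [(19, 20), (75, 100)],
}

def overall_rate(categories):
    """Pool the categories: total successes / total attempts."""
    return Fraction(sum(s for s, _ in categories), sum(n for _, n in categories))

for name, cats in groups.items():
    per_category = [f"{float(Fraction(s, n)):.0%}" for s, n in cats]
    print(name, per_category, f"overall {float(overall_rate(cats)):.1%}")
```

This prints 83.3% overall for A and 78.3% for B. The reversal happens because most of A's attempts fall in category 1, where both groups do well, while most of B's attempts fall in category 2, where both do worse.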