24 Cards in this Set
- Front
- Back
Data mining definition
|
Extracting previously unknown, potentially useful patterns from large amounts of data
|
|
Data mining goal
|
a single target or outcome variable
|
|
supervised learning
|
Has a target outcome and uses training data; used for classification and prediction
|
|
Unsupervised learning
|
Segments data with no target variable; used for association rules, visualization, and data reduction
|
|
overfitting
|
The model fits the training dataset too closely, so it won't fit new data
|
|
address overfitting issue with
|
training and validation sets
|
|
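A minimal sketch of splitting records into training and validation sets, assuming a simple random partition. The function name `partition`, the 60/40 split, and the seed are illustrative choices, not something the cards specify.

```python
import random

def partition(records, train_fraction=0.6, seed=0):
    """Randomly split records into (training, validation) lists."""
    rng = random.Random(seed)      # fixed seed so the split is repeatable
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical data: ten record IDs.
train, valid = partition(range(10))
print(len(train), len(valid))
```

The model is fit on `train` only; a large gap between its error on `train` and its error on `valid` suggests overfitting.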
normalizing data
|
puts all variables on same scale
|
|
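One common way to put variables on the same scale is z-score normalization (subtract the mean, divide by the standard deviation). The sketch below assumes that method; the cards do not say which one is meant, and the sample values are made up.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Rescale values to mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation; must be nonzero
    return [(v - mu) / sigma for v in values]

# Hypothetical variables measured on very different scales.
incomes = [30000, 45000, 60000]
ages = [25, 40, 55]

print(z_scores(incomes))
print(z_scores(ages))
```

After rescaling, income and age are directly comparable, so neither dominates a distance calculation just because its raw numbers are larger.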
Association Rule, supervised or unsupervised?
|
Unsupervised
|
|
AR interpret Confidence
|
A confidence of 60% means that 60% of customers who purchased A also bought B
|
|
Association rules: the IF and THEN parts are called...
|
antecedent and consequent
|
|
Confidence % =
|
support(a,b)/support(a)
|
|
AR Lift =
|
confidence/support(b)
|
|
If lift < 1 then...
|
The rule does worse than chance; you are better off choosing at random to get B
|
|
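The confidence and lift formulas on the cards above can be checked with a small sketch. The basket data and function names are hypothetical, chosen only to illustrate the arithmetic.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, a, b):
    """support(a, b) / support(a) for the rule IF a THEN b."""
    return support(transactions, a | b) / support(transactions, a)

def lift(transactions, a, b):
    """confidence / support(b); below 1 means worse than random."""
    return confidence(transactions, a, b) / support(transactions, b)

# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread"},
    {"milk"},
    {"eggs"},
]
antecedent, consequent = {"bread"}, {"milk"}

print(confidence(baskets, antecedent, consequent))
print(lift(baskets, antecedent, consequent))
```

Here bread appears in 3 of 5 baskets and bread with milk in 2 of 5, so confidence is 2/3. Milk alone has support 3/5, so lift is (2/3)/(3/5), about 1.11, slightly better than random.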
Supervised learning
|
you are trying to predict a variable, a specific outcome
|
|
This model can handle missing values
|
CART
|
|
Limitation of Logit
|
ANN is quicker; an ANN with no hidden layer is equivalent to logit or MLR
|
|
Calculate Logit P
|
p=1/(1+e^-z)
|
|
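A quick sketch of the logit probability formula, assuming z is the linear combination of predictors and coefficients. The function name `logistic` is illustrative.

```python
import math

def logistic(z):
    """p = 1 / (1 + e^-z); maps any moderate real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

print(logistic(0))    # z = 0 gives p = 0.5
print(logistic(2))    # positive z gives p above 0.5
print(logistic(-2))   # negative z gives p below 0.5
```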
A benefit of ANN
|
can compute MLR or Logit with no hidden layers
|
|
MAE (Mean Absolute Error)
|
average absolute value of errors
|
|
Average error
|
average of all errors
|
|
RMSE
|
square errors, find average, take sqrt
|
|
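The three error measures (average error, MAE, RMSE) can be compared on one set of errors. This is a sketch with made-up values, where each error is assumed to be actual minus predicted.

```python
import math

def average_error(errors):
    """Plain average; positive and negative errors cancel out."""
    return sum(errors) / len(errors)

def mae(errors):
    """Mean absolute error: average of the absolute values."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Root mean squared error: square, average, take the square root."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

errors = [2, -2, 1, -1]  # hypothetical prediction errors

print(average_error(errors))
print(mae(errors))
print(rmse(errors))
```

With these values the average error is 0 even though every prediction is off, while MAE is 1.5 and RMSE is a little higher because squaring weights the larger errors more.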
odds
|
p/(1-p), where p is the probability
|
|
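A short sketch converting between probability and odds. The helper names are illustrative, and p is assumed to be a fraction strictly between 0 and 1 rather than a percentage.

```python
def odds(p):
    """Odds of an event with probability p (0 < p < 1)."""
    return p / (1 - p)

def probability(o):
    """Inverse conversion: probability from odds o."""
    return o / (1 + o)

print(odds(0.5))          # even odds
print(odds(0.8))          # about 4, i.e. 4 to 1
print(probability(4.0))   # back to 0.8
```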
How many clusters if trying to extract sub-populations?
|
<10
|
|
How many clusters if trying to understand the major population?
|
>10
|