34 Cards in this Set

  • Front
  • Back

The Pearson coefficient of correlation r equals 1 when there is no:

unexplained variation. The unexplained variation is based on the residuals. The relationship is deterministic (all points fall on a straight line) when r = 1, so all residuals will be 0.

In a regression problem, if the coefficient of determination is 0.95, this means that:

b. 95% of the variation in y can be explained by the variation in x

In simple linear regression, which of the following statements indicates no linear relationship between the variables x and y?

b. Coefficient of correlation is 0.0

A scatter diagram includes the following data points:
x   3   2   5   4   5
y   8   6  12  10  14
Two regression models are proposed:
(1) ŷ = 1.2 + 2.5x, and
(2) ŷ = 3 + 2.0x.
Using the least squares method, which of these regression models provides the better fit to the data? Why?

The better equation is (1). It is the one that results in the LOWER SSE. Find the residuals using both equations; square them; sum the squared residuals.
SSE = Σ(y − ŷ)^2, which is 4.95 for model (1) and 5.00 for model (2).
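
As a check on the comparison in this card, the short Python sketch below computes the sum of squared residuals for both proposed lines using the five data points given. The function name `sse` is an illustrative choice, not something from the card.

```python
# Data points from the card.
x = [3, 2, 5, 4, 5]
y = [8, 6, 12, 10, 14]


def sse(intercept, slope, xs, ys):
    """Sum of squared residuals: sum((y - y_hat)^2) with y_hat = intercept + slope * x."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(xs, ys))


sse_model_1 = sse(1.2, 2.5, x, y)  # y_hat = 1.2 + 2.5x
sse_model_2 = sse(3.0, 2.0, x, y)  # y_hat = 3 + 2.0x

print(round(sse_model_1, 2), round(sse_model_2, 2))  # 4.95 5.0
```

The margin is small, which is why the residuals have to be computed rather than judged by eye.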

In a least squares regression using 1 predictor variable and 30 observations, the sum of squares for error is 60 and the sum of squares for regression is 140. The coefficient of determination (R^2) is:

R^2 = SSR/SST = SSR/(SSR + SSE)


R^2 = 140/(140 + 60) = 0.70
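
The same arithmetic can be written as a small helper. This is only a sketch; the name `r_squared` is hypothetical, and it relies on the identity SST = SSR + SSE used in this card.

```python
def r_squared(ssr, sse):
    """Coefficient of determination from the regression and error sums of squares."""
    sst = ssr + sse  # total variation = explained + unexplained
    return ssr / sst


r2 = r_squared(ssr=140, sse=60)
print(r2)  # 0.7
```

Grouping the denominator explicitly avoids the precedence slip that `140/140+60` would cause if typed into a calculator as written.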

If the coefficient of determination is equal to 1, then the coefficient of correlation:

Can be either -1 or +1

R^2 = 1 implies a deterministic relationship, i.e., all residuals = 0. Perfect relationships exist when correlation is either 1 or -1.
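
To see why the sign is not determined, the sketch below computes Pearson's r for points that lie exactly on a downward-sloping line. The data are made up for illustration (y = 10 - 2x), and `pearson_r` is a hypothetical helper written out by hand.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
ys_down = [10 - 2 * v for v in xs]  # illustrative points on a perfect downward line

r = pearson_r(xs, ys_down)
print(r, r ** 2)  # r is -1 while r squared is 1
```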

If an estimated regression line has a y-intercept of 10 and a slope of 4, then when x = 2 the actual value of y is:

d. Unknown. We can make a prediction, ŷ, but we don't have enough information (the sample data, for example) to know the actual values of y.

In a simple linear regression problem, the following sums of squares are produced: Σ(y_i − ȳ)^2 = 200, Σ(y_i − ŷ_i)^2 = 50, and Σ(ŷ_i − ȳ)^2 = 150. The percentage of the variation in y that is explained by the variation in x is:

75% --- R^2 = SSR ÷ SST = 150 ÷ 200

A regression analysis between sales (in $1000) and advertising (in $100) resulted in the following least squares line: ŷ = 75 + 6x. This implies that if advertising is $800, the predicted amount of sales (in dollars) is:

$123,000 = (75 + 6(8)) × $1000
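
The unit conversions are the easy place to slip in this card, so here is a minimal sketch that makes them explicit. The function name `predicted_sales_dollars` is an assumption for illustration.

```python
def predicted_sales_dollars(advertising_dollars):
    """Apply y_hat = 75 + 6x, where x is in $100 units and y_hat is in $1000 units."""
    x = advertising_dollars / 100  # convert dollars to the model's $100 units
    y_hat = 75 + 6 * x             # predicted sales in $1000 units
    return y_hat * 1000            # convert back to dollars


print(predicted_sales_dollars(800))  # 123000.0
```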

The adjusted multiple coefficient of determination is adjusted for

the number of independent variables

In multiple regression analysis, the correlation among the independent variables is termed

multicollinearity.

A marketing manager of a pharmacy chain wants a regression model to predict sales in the greeting card department. Her data set includes two qualitative variables: the pharmacy neighborhood (urban, suburban, and rural) and lighting level in the greeting card department (soft, medium, and bright). The number of dummy variables needed in the regression model is

4. It will take 2 dummy (indicator) variables to model neighborhood's 3 categories, and it will require 2 dummy variables to model the 3 categories of lighting level.

Which of the following statistics and procedures can be used to determine whether a simple linear model should be employed?

a. The standard error of estimate


b. The coefficient of determination


c. The t-test of the slope


d. All of the above

In a multiple regression analysis involving 25 data points, the standard error of estimate squared is calculated as 1.8 and the sum of squares for error as SSE = 36. Then, the number of the independent variables (p) must be:

(standard error of estimate)^2 = SSE/(n − p − 1)


1.8 = 36/(25 − p − 1), so 24 − p = 20 and p = 4
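
Rearranging the squared standard error formula for p gives p = n − 1 − SSE/s^2. The sketch below applies that rearrangement to the numbers in this card; `num_predictors` is a hypothetical name.

```python
def num_predictors(n, sse, s_squared):
    """Solve s^2 = SSE / (n - p - 1) for p."""
    return n - 1 - sse / s_squared


p = num_predictors(n=25, sse=36, s_squared=1.8)
print(round(p))  # 4
```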



In a multiple regression model, the mean of the probability distribution of the error variable is assumed to be:

0.0

Rank order the predictors, from highest to lowest, in terms of their strength of linear relationship with Catch.

Answer: Strength of linear relationship is measured by correlation. Look at the matrix of correlations. Remember, the top number in each cell is the correlation. Higher correlation indicates stronger relationship. Ranking: Structure, Access, Homes, Lakesize

What population model has been estimated? (Variables: catch, access, homes, lakesize, structure)

y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + ε

For the test of overall significance of this model, what are the correct null and alternative hypotheses, the value of the test statistic, the value of the critical point(s), decision and conclusion?

Hypotheses: H0: β1 = β2 = β3 = β4 = 0; Ha: at least one βi ≠ 0


T. S. = F-ratio = 27.98


Critical Point = (use table, df1 = # variables, df2 = n − # variables − 1)


F0.05, 4, 15 = 3.06 (Reject H0 if the value of F-T.S. > F-critical)


Decision: Reject H0


Conclusion: At least one of the predictors, # of homes, lake size, public access, and/or structure index, provides significant explanation of the catch of bass

According to the tests of individual significance, which predictors are useful? Use appropriate statistical information to explain/support your answer

Use the P-values and compare them to α = 0.05

Create and identify indicator variables to represent the nominal variable Working Shift (8:00 am to 4:00 pm, 4:00 pm to 12:00 am, 12:00 am to 8:00 am) in a regression model.

Since Working Shift has 3 categories, we must define 2 indicator variables. You can choose to define any two of the three categories with indicators. For example,
X1 = 1 if the shift is 8:00 am to 4:00 pm, 0 otherwise
X2 = 1 if the shift is 4:00 pm to 12:00 am, 0 otherwise
If X1 and X2 are both assigned the value 0, then the shift must be 12:00 am to 8:00 am
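
The coding scheme in this card can be sketched in Python as follows. The names `SHIFTS` and `encode_shift` are hypothetical, and the choice of the 12:00 am to 8:00 am shift as the baseline simply follows the example in the card.

```python
SHIFTS = ["8:00 am to 4:00 pm", "4:00 pm to 12:00 am", "12:00 am to 8:00 am"]


def encode_shift(shift):
    """Return (X1, X2) for a shift; the third shift is the baseline (0, 0)."""
    if shift not in SHIFTS:
        raise ValueError(f"unknown shift: {shift!r}")
    x1 = 1 if shift == SHIFTS[0] else 0  # 8:00 am to 4:00 pm
    x2 = 1 if shift == SHIFTS[1] else 0  # 4:00 pm to 12:00 am
    return x1, x2


for s in SHIFTS:
    print(s, encode_shift(s))
```

Two indicators are enough because the three shifts map to three distinct pairs: (1, 0), (0, 1), and (0, 0).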

Which of the following does not constitute a time series?

A. the number of kilowatts of electricity used by a firm each week last year


B. the daily high temperature in a city for the past month


C. annual revenues last year for all the companies in an industry ---- only one measurement in time (last year) for each company


D. annual household income of one family from 1960 through last year

Decreased sales due to a fire at a meat packing plant is an example of a(n) ______ component.

D. irregular ---- rare, unpredictable event

The time series component that reflects a long-term, relatively smooth pattern or direction exhibited by a time series over a long time period (more than one year) is called:

A. long-term trend

In exponentially smoothed time series, the smoothing constant ω is chosen on the basis of how much smoothing is required. In general, which of the following statements is true?

A. A small value of ω such as ω = 0.1 results in very little smoothing, while a large value such as ω = 0.9 may result in excessive smoothing.


B. A small value of ω such as ω = 0.1 may result in excessive smoothing, while a large value such as ω = 0.9 results in very little smoothing.


C. A small value of ω such as ω = 0.1 and a large value such as ω = 0.9 may both result in very little smoothing


D. A small value of ω such as ω = 0.1 and a large value such as ω = 0.9 may both result in excessive smoothing
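
The effect of ω can be checked numerically. The sketch below assumes the common convention S_t = ω·y_t + (1 − ω)·S_{t−1} with S_1 = y_1; the card does not state which convention its textbook uses, so treat that as an assumption. The series is made-up illustrative data, and `roughness` (total absolute change between consecutive values) is a hypothetical measure of how jagged a series is.

```python
def exp_smooth(series, w):
    """Simple exponential smoothing, assuming S_t = w*y_t + (1 - w)*S_{t-1}, S_1 = y_1."""
    smoothed = [series[0]]
    for y in series[1:]:
        smoothed.append(w * y + (1 - w) * smoothed[-1])
    return smoothed


def roughness(values):
    """Total absolute change between consecutive values; smaller means smoother."""
    return sum(abs(b - a) for a, b in zip(values, values[1:]))


series = [10, 14, 9, 15, 8, 16, 9, 14]  # illustrative data, not from the card

rough_small_w = roughness(exp_smooth(series, 0.1))
rough_large_w = roughness(exp_smooth(series, 0.9))
print(rough_small_w, rough_large_w, roughness(series))
```

Under this convention the ω = 0.1 series is much flatter than the ω = 0.9 series, which stays close to the raw data.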

In general, the trend component of a time series can be well specified by using:

C. regression analysis

If we want to measure the trend and seasonal variations on stock market performance by month using regression analysis, how many indicator variables would be required?

11 indicator variables --- We would need to "identify" in which month, out of 12 possibilities, each observation occurs. Eleven of the months are identified explicitly by the 11 indicator variables. The 12th month would be implicitly specified when each of the 11 indicator variables is assigned the value 0.
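
One way to picture the resulting regression inputs is a row per observation holding the time index plus the 11 month indicators. This is a sketch only: `design_row` is a hypothetical helper, and treating December as the implicit 12th month is an arbitrary choice made here for illustration.

```python
def design_row(t, month):
    """Return [t, m1, ..., m11] for month in 1..12; month 12 is the implicit baseline."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    indicators = [1 if month == m else 0 for m in range(1, 12)]
    return [t] + indicators


print(design_row(3, 3))    # March: third indicator is 1
print(design_row(12, 12))  # December: all 11 indicators are 0
```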

Which of the following methods is/are appropriate for forecasting a time series when the trend, cyclical, and seasonal components of the series are not significant?

A. Moving averages


B. Simple exponential smoothing


C. Mean absolute deviation


D. Decomposition

The trend line ŷ = 0.70 + 0.005t was calculated from quarterly data for 2000–2004, where t = 1 for the first quarter of 2000. The trend value for the second quarter of the year 2005 is:

0.810 ---- t = 22 in Q2 of 2005 ---- count the observations in order --- 4 observations per year for 5 years = 20 observations plus 2 for quarters 1 and 2 of 2005: ŷ = 0.70 + 0.005(22)
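
The counting step can be expressed as a formula for the time index. The sketch below uses a hypothetical helper `quarter_index` with t = 1 at the first quarter of 2000, as stated in the card.

```python
def quarter_index(year, quarter, start_year=2000):
    """Time index t, with t = 1 for the first quarter of start_year."""
    return (year - start_year) * 4 + quarter


t = quarter_index(2005, 2)
trend_value = 0.70 + 0.005 * t
print(t, round(trend_value, 3))  # 22 0.81
```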

Discuss when simple exponential smoothing is not recommended as a forecasting tool

Simple exponential smoothing should be used only with data that exhibit no notable components such as trend or seasonality. It is not recommended here because these data exhibit a noticeable trend and very strong seasonal patterns.

A trend forecast of 125 for a future time period has been made. The seasonal index for that time period is 1.10. The seasonally adjusted forecast is

(125)(1.10) = 137.5.

In an autoregression forecasting model, the independent variable(s) is (are)

A. time-lagged values of the dependent variable
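
To make "time-lagged values of the dependent variable" concrete, the sketch below pairs each observation with the value one period earlier, which is the (predictor, response) layout a first-order autoregression would be fitted on. The helper name `lagged_pairs` and the sample series are illustrative assumptions.

```python
def lagged_pairs(series, lag=1):
    """Return (y_{t-lag}, y_t) pairs: the lagged value is the predictor, y_t the response."""
    return [(series[t - lag], series[t]) for t in range(lag, len(series))]


series = [5, 7, 6, 9]  # illustrative data, not from the card
print(lagged_pairs(series))  # [(5, 7), (7, 6), (6, 9)]
```

Note that the first `lag` observations have no predictor value, so they drop out of the fitted data.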

A quadratic equation fitted to annual time series data is ŷ = 7.5 - 0.25t + 3.5t^2, where t = 1 for 1997. The forecasted value for 2004 is:

229.5 = 7.5 - 0.25(8) + 3.5(8^2)
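
A short sketch of the same calculation, with the year-to-index conversion made explicit. The function name `quadratic_forecast` is hypothetical.

```python
def quadratic_forecast(year, base_year=1997):
    """Evaluate y_hat = 7.5 - 0.25t + 3.5t^2, where t = 1 in base_year."""
    t = year - base_year + 1  # 1997 -> 1, ..., 2004 -> 8
    return 7.5 - 0.25 * t + 3.5 * t ** 2


print(quadratic_forecast(2004))  # 229.5
```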

Model                MAD
Linear Trend         1.38
Quadratic Trend      1.22
Exponential Trend    1.39
Autoregressive       0.71

Based on the MAD criterion, the most appropriate model is

autoregressive --- MAD is an average of the forecast errors in absolute value. Errors are never desired. Lowest amount of error is best
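
The selection rule amounts to taking the minimum over the reported MAD values. The sketch below also includes a hypothetical `mad` helper showing how the statistic would be computed from actual and forecast values; the MAD figures in the dictionary are the ones given in this card.

```python
def mad(actual, forecast):
    """Mean absolute deviation: average of |actual - forecast|."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


# MAD values as reported in the card.
mad_by_model = {
    "Linear Trend": 1.38,
    "Quadratic Trend": 1.22,
    "Exponential Trend": 1.39,
    "Autoregressive": 0.71,
}

best_model = min(mad_by_model, key=mad_by_model.get)
print(best_model)  # Autoregressive
```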

How can you tell which model produced more accurate forecasts?

The model with smaller MAD and MAPE