Regression and Factor Analysis

Please use the attached data to tackle the case “Customer Satisfaction at Haver & Boecker” and answer the following questions:

  1. Using regression analysis, locate those variables that best explain the customers’ overall satisfaction. Evaluate the model fit and assess the impact of each variable on the criterion variable. Remember to use collinearity diagnostics.
  2. Determine the factors that characterize the respondents by means of a factor analysis. Consider the following issues:

(a) Are FA assumptions met?

(b) How many factors should be extracted?

(c) Try to find suitable labels for the extracted factors.

(d) Evaluate the solution’s goodness-of-fit.

  3. Use the factor scores and regress the customers’ overall satisfaction (overall) on them.

Solution

/* Point the ASSIGN library at the project folder and import the survey data */
libname assign 'C:\Users\Mohamed\Desktop\sas';
proc import datafile="C:\Users\Mohamed\Desktop\data.xls"
   out=assign.data
   dbms=xls
   replace;
   sheet="Sheet1";
   getnames=yes;
run;

/* Check the first 10 observations */
proc print data=assign.data(obs=10);
run;

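ods graphics on;
/* Question 1: regress overall satisfaction on the twelve performance items */
proc reg data=assign.data;
   model overall = s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 s11 s12;
run;
quit;

/* Same model with collinearity diagnostics (tolerance, VIF, condition indices) */
proc reg data=assign.data;
   model overall = s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 s11 s12 / tol vif collin;
run;
quit;
ods graphics off;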

ods graphics on;
/* Question 2: principal factor analysis with SMC priors, MSA, residual correlations and promax rotation */
proc factor data=assign.data
   priors=smc msa residual
   rotate=promax reorder
   outstat=fact_all
   plots=(scree initloadings preloadings loadings);
   var s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 s11 s12;
run;
ods graphics off;

/* Keep the unrotated loadings and relabel them as the factor pattern */
data fact2(type=factor);
   set fact_all;
   if _TYPE_ in ('PATTERN' 'FCORR') then delete;
   if _TYPE_ = 'UNROTATE' then _TYPE_ = 'PATTERN';
run;

/* Rotate the same solution with quartimax */
proc factor data=fact2 rotate=quartimax reorder;
run;

/* Question 3: save scoring coefficients for the first three factors (default method, no rotation) */
proc factor data=assign.data nfactors=3 score outstat=fact;
   var s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 s11 s12;
run;

/* Compute the factor scores Factor1 to Factor3 for every respondent */
proc score data=assign.data score=fact out=scores;
   var s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 s11 s12;
run;

ods graphics on;
/* Regress overall satisfaction on the three factor scores */
proc reg data=scores;
   model overall = Factor1 Factor2 Factor3;
run;
quit;
ods graphics off;

Haver & Boecker is one of the world’s leading providers of filling and screening systems. The company operates a number of facilities in Germany, as well as production plants in the UK, Belgium, the USA, Canada, and Brazil. It is a recognized specialist in weighing, filling, and material-handling technology.

Haver & Boecker designs, produces, and markets systems and plants for filling and processing loose bulk materials of every type and thus operates solely in industrial markets. The company’s relationships with its customers are usually long-term and complex.

Since the company’s philosophy is to assist customers and business partners in solving technical problems and developing innovative solutions, its products are often customized to the buyers’ needs. The customer is therefore not a passive buyer but an active partner. Against this background, customer satisfaction plays an important role in establishing, developing, and maintaining successful customer relationships.

Very early on, the company’s management realized the importance of customer satisfaction and decided to commission a market research project to identify marketing activities that can positively contribute to the business’s overall success. Based on a thorough literature review as well as interviews with experts, the company developed a short survey to explore its customers’ satisfaction with specific performance features and their overall satisfaction. All items were measured on 7-point scales, with higher scores denoting higher levels of satisfaction. A standardized survey was mailed to customers in 12 countries worldwide, which yielded 281 fully completed questionnaires. The survey contained the following items (variable names in parentheses):

  • Reliability of the machines and systems (s1)
  • Life-time of the machines and systems (s2)
  • Functionality and user-friendly operation of the machines and systems (s3)
  • Appearance of the machines and systems (s4)
  • Accuracy of the machines and systems (s5)
  • Timely availability of the after-sales service (s6)
  • Local availability of the after-sales service (s7)
  • Fast processing of complaints (s8)
  • Composition of quotations (s9)
  • Transparency of quotations (s10)
  • Fixed product price for the machines and systems (s11)
  • Cost/performance ratio of the machines and systems (s12)
  • Overall, how satisfied are you with the supplier (overall)? 
  1. Using regression analysis, let us locate those variables that best explain the customers’ overall satisfaction. The following tables show the regression output from SAS:

SAS command:

ods graphics on;
proc reg data=assign.data;
   model overall = s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 s11 s12;
run;
quit;
ods graphics off;

The SAS System

The REG Procedure

Model: MODEL1

Dependent Variable: OVERALL overall

Number of Observations Read 281
Number of Observations Used 281
Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             12        198.91066      16.57589     15.69   <.0001
Error            268        283.07510       1.05625
Corrected Total  280        481.98577

Root MSE          1.02774   R-Square   0.4127
Dependent Mean    5.00712   Adj R-Sq   0.3864
Coeff Var        20.52559
Parameter Estimates
Variable   Label       DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept  Intercept    1              2.68730          0.22756     11.81     <.0001
S1         s1           1              0.17898          0.05198      3.44     0.0007
S2         s2           1              0.03091          0.04280      0.72     0.4709
S3         s3           1              0.05274          0.05177      1.02     0.3092
S4         s4           1              0.06009          0.05042      1.19     0.2344
S5         s5           1              0.02594          0.04602      0.56     0.5735
S6         s6           1             -0.00967          0.04832     -0.20     0.8416
S7         s7           1             -0.02486          0.04157     -0.60     0.5504
S8         s8           1              0.06262          0.04669      1.34     0.1810
S9         s9           1              0.06358          0.04729      1.34     0.1799
S10        s10          1              0.01909          0.04504      0.42     0.6720
S11        s11          1             -0.11662          0.04563     -2.56     0.0112
S12        s12          1              0.16684          0.04634      3.60     0.0004

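From the parameter estimates above, the variables s1, s11, and s12 are significantly associated with the dependent variable overall, because their t-test p-values are below the 5% significance level (p < 0.05). Holding the other items constant, s1 (reliability, b = 0.179) and s12 (cost/performance ratio, b = 0.167) have positive effects, whereas s11 (fixed product price, b = -0.117) has a negative effect. The remaining nine items are not significant (all p > 0.17).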
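The following Python sketch makes the 5% decision rule concrete (p < 0.05, not 0.005). It copies the estimates and p-values from the table above and filters them. It only re-reads the reported SAS output and does not re-estimate the model.

```python
# (parameter estimate, p-value) pairs copied from the Parameter Estimates table
coefs = {
    "s1": (0.17898, 0.0007), "s2": (0.03091, 0.4709), "s3": (0.05274, 0.3092),
    "s4": (0.06009, 0.2344), "s5": (0.02594, 0.5735), "s6": (-0.00967, 0.8416),
    "s7": (-0.02486, 0.5504), "s8": (0.06262, 0.1810), "s9": (0.06358, 0.1799),
    "s10": (0.01909, 0.6720), "s11": (-0.11662, 0.0112), "s12": (0.16684, 0.0004),
}

def significant(coefs, alpha):
    """Return the items whose two-sided t-test p-value is below alpha."""
    return [name for name, (_, p) in coefs.items() if p < alpha]

sig_05 = significant(coefs, 0.05)    # the 5% level used in the text
sig_005 = significant(coefs, 0.005)  # a 0.5% level, shown for contrast

for name in sig_05:
    est = coefs[name][0]
    direction = "raises" if est > 0 else "lowers"
    # Unstandardized effect of a one-point increase, other items held constant
    print(f"{name}: b = {est:+.5f}; one point more {direction} predicted overall by {abs(est):.3f}")
```

At the 0.5% level, s11 (p = 0.0112) would drop out, so the threshold matters for the conclusion.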

The coefficient of determination (R-Square) is 0.4127. The regression model therefore explains 41.27% of the variability in overall satisfaction (overall), and the adjusted R-Square is 0.3864. The overall F-test (F = 15.69, p < .0001) shows that the model as a whole is significant.
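As a cross-check, the fit statistics above can be recomputed from the Analysis of Variance table. This is a small worked example using only the reported sums of squares, n = 281 and k = 12 predictors.

```python
import math

# Sums of squares copied from the Analysis of Variance table
ss_model, ss_error, ss_total = 198.91066, 283.07510, 481.98577
n, k = 281, 12  # observations and predictors (s1 to s12)

df_model, df_error = k, n - k - 1  # 12 and 268
ms_model = ss_model / df_model
ms_error = ss_error / df_error

r2 = ss_model / ss_total                                    # share of variance explained
adj_r2 = 1 - (ss_error / df_error) / (ss_total / (n - 1))   # penalized for the 12 predictors
f_value = ms_model / ms_error                               # overall F-test statistic
root_mse = math.sqrt(ms_error)

print(f"R-Square {r2:.4f}, Adj R-Sq {adj_r2:.4f}, F {f_value:.2f}, Root MSE {root_mse:.5f}")
```

The gap between R-Square and adjusted R-Square (about 0.026) reflects the penalty for nine predictors that contribute little.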

Collinearity diagnostics

The following table reports the tolerance and the variance inflation factor (VIF) of each predictor:

SAS command:

ods graphics on;
proc reg data=assign.data;
   model overall = s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 s11 s12 / tol vif collin;
run;
quit;
ods graphics off;

Parameter Estimates
Variable   Label       DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Tolerance   Variance Inflation
Intercept  Intercept    1              2.68730          0.22756     11.81     <.0001           .                    0
S1         s1           1              0.17898          0.05198      3.44     0.0007     0.36902              2.70989
S2         s2           1              0.03091          0.04280      0.72     0.4709     0.41655              2.40068
S3         s3           1              0.05274          0.05177      1.02     0.3092     0.44269              2.25890
S4         s4           1              0.06009          0.05042      1.19     0.2344     0.43494              2.29917
S5         s5           1              0.02594          0.04602      0.56     0.5735     0.51240              1.95160
S6         s6           1             -0.00967          0.04832     -0.20     0.8416     0.39505              2.53133
S7         s7           1             -0.02486          0.04157     -0.60     0.5504     0.52572              1.90214
S8         s8           1              0.06262          0.04669      1.34     0.1810     0.38187              2.61872
S9         s9           1              0.06358          0.04729      1.34     0.1799     0.36659              2.72783
S10        s10          1              0.01909          0.04504      0.42     0.6720     0.38487              2.59826
S11        s11          1             -0.11662          0.04563     -2.56     0.0112     0.50294              1.98829
S12        s12          1              0.16684          0.04634      3.60     0.0004     0.41761              2.39456

From the table above, all variance inflation factors are far below the common cut-off of 10. The largest is 2.73, for s9, with a tolerance of 0.367. Multicollinearity therefore does not appear to be a problem in this regression model.
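The two collinearity columns carry the same information. The VIF is the reciprocal of the tolerance, and the tolerance is 1 minus the R² from regressing an item on the other eleven items. The Python sketch below checks this relationship on the reported values and applies the rule-of-thumb cut-off of 10 used in the text.

```python
# (tolerance, VIF) pairs copied from the collinearity table above
tol_vif = {
    "s1": (0.36902, 2.70989), "s2": (0.41655, 2.40068), "s3": (0.44269, 2.25890),
    "s4": (0.43494, 2.29917), "s5": (0.51240, 1.95160), "s6": (0.39505, 2.53133),
    "s7": (0.52572, 1.90214), "s8": (0.38187, 2.61872), "s9": (0.36659, 2.72783),
    "s10": (0.38487, 2.59826), "s11": (0.50294, 1.98829), "s12": (0.41761, 2.39456),
}
VIF_CUTOFF = 10  # rule of thumb used in the text

# VIF_j = 1 / tolerance_j; differences only come from rounding in the printed output
max_gap = max(abs(1 / tol - vif) for tol, vif in tol_vif.values())
# tolerance_j = 1 - R^2_j, where R^2_j comes from regressing item j on the other items
r2_on_others = {item: 1 - tol for item, (tol, _) in tol_vif.items()}
flagged = [item for item, (_, vif) in tol_vif.items() if vif >= VIF_CUTOFF]
worst = max(tol_vif, key=lambda item: tol_vif[item][1])

print(f"max |1/tol - VIF| = {max_gap:.5f}")
print(f"largest VIF: {worst} ({tol_vif[worst][1]:.2f}), R^2 on other items = {r2_on_others[worst]:.3f}")
print("items at or above the cut-off:", flagged)
```

Even for s9, the other items explain only about 63% of its variance.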

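  2. Let us determine the factors that characterize the respondents by performing a factor analysis:
  (a) Are the FA assumptions met? Before performing a factor analysis, the assumptions should be checked. Kaiser’s measure of sampling adequacy (MSA) and the partial correlations, reported in the goodness-of-fit output below, address this.
  (b) How many factors should be extracted? The following results were obtained using SAS: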

SAS command:

ods graphics on;
proc factor data=Assign.Data
   priors=smc msa residual
   rotate=promax reorder
   outstat=fact_all
   plots=(scree initloadings preloadings loadings);
   var s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 s11 s12;
run;
ods graphics off;

Eigenvalues of the Reduced Correlation Matrix: Total = 6.84974007 Average = 0.57081167
  Eigenvalue Difference Proportion Cumulative
1 4.99825280 3.87778870 0.7297 0.7297
2 1.12046410 0.32833702 0.1636 0.8933
3 0.79212708 0.29569712 0.1156 1.0089
4 0.49642996 0.21084773 0.0725 1.0814
5 0.28558224 0.32084982 0.0417 1.1231
6 -.03526758 0.02317729 -0.0051 1.1179
7 -.05844487 0.03761719 -0.0085 1.1094
8 -.09606207 0.01664101 -0.0140 1.0954
9 -.11270307 0.03596681 -0.0165 1.0789
10 -.14866989 0.03274139 -0.0217 1.0572
11 -.18141127 0.02914608 -0.0265 1.0307
12 -.21055735   -0.0307 1.0000

3 factors will be retained by the PROPORTION criterion.

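The three largest eigenvalues of the reduced correlation matrix account for 100.89% of the common variance. The proportion can exceed 100% because the reduced matrix, which has the squared multiple correlations on its diagonal, also has negative eigenvalues (factors 6 to 12). These pull the total below the sum of the positive eigenvalues. The cumulative proportion first exceeds 100% at the third factor, which is why the PROPORTION criterion retains three factors. The scree and variance-explained plots support the conclusion that three common factors are present.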
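The Python sketch below illustrates the retention rule under one assumption: that the PROPORTION criterion keeps factors until the cumulative share of the common variance first reaches 1.0. That threshold is consistent with the three factors SAS retained. The eigenvalues are copied from the table above.

```python
# Eigenvalues of the reduced correlation matrix (SMCs on the diagonal), copied from the output
eigenvalues = [4.99825280, 1.12046410, 0.79212708, 0.49642996, 0.28558224,
               -0.03526758, -0.05844487, -0.09606207, -0.11270307,
               -0.14866989, -0.18141127, -0.21055735]

total = sum(eigenvalues)  # total common variance; the negative eigenvalues reduce it

# Cumulative proportion of common variance after each factor
cumulative = []
running = 0.0
for ev in eigenvalues:
    running += ev
    cumulative.append(running / total)

# Assumed PROPORTION threshold of 1.0: keep factors until the cumulative share reaches 100%
THRESHOLD = 1.0
n_factors = next(i + 1 for i, share in enumerate(cumulative) if share >= THRESHOLD)

print(f"total = {total:.6f}")
print("cumulative:", [round(share, 4) for share in cumulative[:5]])
print("factors retained:", n_factors)
```

The sum of the three retained eigenvalues (about 6.9108) matches the total final communality reported with the rotated solution.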

  (c) Factor labels. The following tables show the quartimax rotation of the solution, obtained from the TYPE=FACTOR data set:
The SAS System

The FACTOR Procedure

Rotation Method: Quartimax

Orthogonal Transformation Matrix
  1 2 3
1 0.95280 0.23208 0.19575
2 -0.29862 0.83273 0.46626
3 -0.05480 -0.50270 0.86272
Rotated Factor Pattern
  Factor1 Factor2 Factor3
S3 s3 0.76878 0.01891 0.01975
S1 s1 0.76676 0.08647 -0.17922
S6 s6 0.74527 -0.01233 0.20478
S2 s2 0.73721 -0.02689 -0.20578
S4 s4 0.72969 0.16241 -0.06462
S8 s8 0.69994 0.06638 0.26004
S5 s5 0.64419 0.15131 0.00818
S7 s7 0.59822 -0.14136 0.31048
S9 s9 0.31265 0.75647 0.13789
S10 s10 0.35498 0.73627 0.10118
S11 s11 0.27322 0.17054 0.65092
S12 s12 0.51801 0.14334 0.53542
Variance Explained by Each Factor
Factor1 Factor2 Factor3
4.6398200 1.2463468 1.0246772
Final Communality Estimates: Total = 6.910844
S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12
0.62751549 0.58655527 0.59176624 0.56300275 0.43794391 0.59751792 0.47425167 0.56194856 0.68900934 0.67834519 0.52742980 0.57555786

From the rotated factor pattern, we can suggest the following labels for the extracted factors:

  • Factor1: performance of the machines and systems and of the after-sales service (variables s1 to s8)
  • Factor2: composition and transparency of quotations (variables s9 and s10)
  • Factor3: price and cost/performance ratio of the machines and systems (variables s11 and s12; s12 also loads 0.52 on Factor1)
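The labels follow from assigning each item to the factor on which it loads most strongly. The Python sketch below reproduces that assignment from the quartimax pattern. It also checks that each item's communality equals its sum of squared loadings, which an orthogonal rotation leaves unchanged. The 0.4 cross-loading cut-off is an assumed rule of thumb, not something stated in the original analysis.

```python
# Quartimax-rotated loadings (Factor1, Factor2, Factor3) copied from the Rotated Factor Pattern
loadings = {
    "s1": (0.76676, 0.08647, -0.17922), "s2": (0.73721, -0.02689, -0.20578),
    "s3": (0.76878, 0.01891, 0.01975), "s4": (0.72969, 0.16241, -0.06462),
    "s5": (0.64419, 0.15131, 0.00818), "s6": (0.74527, -0.01233, 0.20478),
    "s7": (0.59822, -0.14136, 0.31048), "s8": (0.69994, 0.06638, 0.26004),
    "s9": (0.31265, 0.75647, 0.13789), "s10": (0.35498, 0.73627, 0.10118),
    "s11": (0.27322, 0.17054, 0.65092), "s12": (0.51801, 0.14334, 0.53542),
}
CROSS_CUTOFF = 0.4  # assumed threshold for a salient loading

assignment, cross_loading, communality = {}, [], {}
for item, row in loadings.items():
    # Assign the item to the factor with the largest absolute loading
    best = max(range(3), key=lambda j: abs(row[j]))
    assignment[item] = f"Factor{best + 1}"
    # Flag items with salient loadings on more than one factor
    if sum(abs(x) > CROSS_CUTOFF for x in row) > 1:
        cross_loading.append(item)
    # Communality = sum of squared loadings (invariant under orthogonal rotation)
    communality[item] = sum(x * x for x in row)

print(assignment)
print("cross-loadings:", cross_loading)
print("total communality:", round(sum(communality.values()), 4))
```

Only s12 loads saliently on two factors, which is why its placement with s11 is less clear-cut than the others.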
  (d) Goodness of fit

The FACTOR Procedure

Initial Factor Method: Principal Factors

Partial Correlations Controlling all other Variables
  S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12
S1 s1 1.00000 0.53364 0.08440 0.21083 -0.06904 0.02151 -0.05023 0.12656 0.04780 0.03591 -0.17162 0.19072
S2 s2 0.53364 1.00000 0.21205 0.02173 0.03700 0.08820 0.01627 0.00369 -0.12243 0.06437 -0.00185 -0.05719
S3 s3 0.08440 0.21205 1.00000 0.16996 0.21912 0.12463 0.04091 0.06178 0.01031 -0.04926 0.01954 0.09331
S4 s4 0.21083 0.02173 0.16996 1.00000 0.39920 0.02519 0.00325 0.02253 -0.03544 0.11584 -0.05966 0.09087
S5 s5 -0.06904 0.03700 0.21912 0.39920 1.00000 0.09408 0.08136 -0.07459 0.13282 -0.03575 -0.01364 0.03805
S6 s6 0.02151 0.08820 0.12463 0.02519 0.09408 1.00000 0.25105 0.40322 -0.04941 0.07276 0.00267 0.01977
S7 s7 -0.05023 0.01627 0.04091 0.00325 0.08136 0.25105 1.00000 0.32910 -0.14730 0.01204 0.07300 0.06058
S8 s8 0.12656 0.00369 0.06178 0.02253 -0.07459 0.40322 0.32910 1.00000 0.28431 -0.12448 0.03698 0.01487
S9 s9 0.04780 -0.12243 0.01031 -0.03544 0.13282 -0.04941 -0.14730 0.28431 1.00000 0.70788 0.13231 -0.09990
S10 s10 0.03591 0.06437 -0.04926 0.11584 -0.03575 0.07276 0.01204 -0.12448 0.70788 1.00000 -0.01613 0.12426
S11 s11 -0.17162 -0.00185 0.01954 -0.05966 -0.01364 0.00267 0.07300 0.03698 0.13231 -0.01613 1.00000 0.62577
S12 s12 0.19072 -0.05719 0.09331 0.09087 0.03805 0.01977 0.06058 0.01487 -0.09990 0.12426 0.62577 1.00000
Kaiser’s Measure of Sampling Adequacy: Overall MSA = 0.83846511
S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12
0.84308697 0.84677623 0.93908778 0.90005427 0.88364757 0.90375303 0.88854400 0.85856369 0.66694994 0.72117052 0.69422859 0.79622332

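The data are appropriate for the common factor model. Most partial correlations (controlling for all other variables) are small compared with the original correlations; the main exceptions are the pairs s9/s10 (0.71), s11/s12 (0.63), and s1/s2 (0.53). Kaiser’s overall MSA is 0.838, and every item-level MSA exceeds 0.5; the lowest is 0.667, for s9.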
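Kaiser’s (1974) verbal scale offers a common way to read MSA values: 0.90s marvelous, 0.80s meritorious, 0.70s middling, 0.60s mediocre, 0.50s miserable, and below 0.50 unacceptable. The Python sketch below applies that scale to the reported values. The function name `kaiser_label` is hypothetical.

```python
# Item-level MSA values copied from the "Kaiser's Measure of Sampling Adequacy" output
overall_msa = 0.83846511
item_msa = {
    "s1": 0.84308697, "s2": 0.84677623, "s3": 0.93908778, "s4": 0.90005427,
    "s5": 0.88364757, "s6": 0.90375303, "s7": 0.88854400, "s8": 0.85856369,
    "s9": 0.66694994, "s10": 0.72117052, "s11": 0.69422859, "s12": 0.79622332,
}

def kaiser_label(msa):
    """Map an MSA value to Kaiser's verbal category (hypothetical helper)."""
    for cutoff, label in [(0.9, "marvelous"), (0.8, "meritorious"), (0.7, "middling"),
                          (0.6, "mediocre"), (0.5, "miserable")]:
        if msa >= cutoff:
            return label
    return "unacceptable"

all_acceptable = all(v >= 0.5 for v in item_msa.values())   # no item below 0.5
below_070 = [item for item, v in item_msa.items() if v < 0.7]  # weakest items

print("overall:", kaiser_label(overall_msa))
print("all items >= 0.5:", all_acceptable)
print("items below 0.7:", below_070, [kaiser_label(item_msa[i]) for i in below_070])
```

The two weakest items, s9 and s11, belong to the two small factors, whose pairs (s9/s10 and s11/s12) also show the largest partial correlations.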

  3. Let us use the factor scores and regress the customers’ overall satisfaction (overall) on them. First we compute the scores of the first three factors with PROC FACTOR and PROC SCORE. Then we run the regression.

SAS commands:

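/* Default extraction (no rotation), three factors, scoring coefficients saved to FACT */
proc factor data=Assign.Data nfactors=3 score outstat=fact;
   var s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 s11 s12;
run;

/* Apply the scoring coefficients to every respondent */
proc score data=Assign.Data score=fact out=scores;
   var s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 s11 s12;
run;

ods graphics on;
proc reg data=scores;
   model overall = Factor1 Factor2 Factor3;
run;
quit;
ods graphics off;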
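In essence, when the SCORE= data set contains MEAN and STD rows, as the PROC FACTOR OUTSTAT= data set does, PROC SCORE standardizes each variable and multiplies by the SCORE coefficients. The Python sketch below illustrates that step. Both the data and the coefficients are hypothetical toy values, not the Haver & Boecker survey or the actual SAS coefficients.

```python
import statistics

# Hypothetical toy data: 4 respondents x 3 items (NOT the survey data)
data = [[5, 6, 4], [7, 6, 6], [3, 4, 5], [5, 4, 5]]
# Hypothetical scoring coefficients: one row per item, one column per factor
coef = [[0.5, 0.1], [0.4, -0.2], [0.1, 0.6]]

# Column means and standard deviations (n - 1 denominator)
means = [statistics.mean(col) for col in zip(*data)]
stds = [statistics.stdev(col) for col in zip(*data)]

def score(row):
    """Standardize one respondent's answers, then form the weighted sum for each factor."""
    z = [(x - m) / s for x, m, s in zip(row, means, stds)]
    return [sum(z[i] * coef[i][f] for i in range(len(z))) for f in range(len(coef[0]))]

scores = [score(r) for r in data]
for r, s in zip(data, scores):
    print(r, [round(v, 3) for v in s])
```

Because the inputs are centered, every score column has mean zero. This is why the intercept in the regression that follows equals the mean of overall (5.00712).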

Results:

The SAS System

The REG Procedure

Model: MODEL1

Dependent Variable: OVERALL overall

Number of Observations Read 281
Number of Observations Used 281
Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              3        175.32101      58.44034     52.79   <.0001
Error            277        306.66476       1.10709
Corrected Total  280        481.98577

Root MSE          1.05218   R-Square   0.3637
Dependent Mean    5.00712   Adj R-Sq   0.3569
Coeff Var        21.01378
Parameter Estimates
Variable   Label       DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept  Intercept    1              5.00712          0.06277     79.77     <.0001
Factor1                 1              0.76425          0.06288     12.15     <.0001
Factor2                 1             -0.05840          0.06288     -0.93     0.3539
Factor3                 1             -0.19663          0.06288     -3.13     0.0020

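From the regression table, Factor1 and Factor3 are significantly associated with overall satisfaction, because their t-test p-values are below 0.05 (the 5% significance level). Factor2 is not significant (p = 0.3539). Factor1 has a positive effect (b = 0.764), and Factor3 has a negative one (b = -0.197). These scores come from the final PROC FACTOR step, which uses the default extraction without rotation. Factor1 to Factor3 here therefore need not coincide with the rotated, labelled factors above.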

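The coefficient of determination is 0.3637, meaning that the three factor scores explain 36.37% of the variance in overall satisfaction (adjusted R-Square 0.3569). This is somewhat less than the 41.27% explained by the twelve original items.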
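The Python sketch below recomputes the fit of the factor-score regression from its ANOVA table. It also checks a pattern in the output: all three slopes share the standard error 0.06288. That value is consistent with uncorrelated scores of mean 0 and variance 1, for which each slope's standard error is √(MSE/(n−1)) and the intercept's is Root MSE/√n. The check is arithmetic on the reported numbers only.

```python
import math

n, k = 281, 3  # observations and factor-score predictors
# Sums of squares copied from the Analysis of Variance table
ss_model, ss_error, ss_total = 175.32101, 306.66476, 481.98577

ms_error = ss_error / (n - k - 1)                  # 277 error degrees of freedom
r2 = ss_model / ss_total
adj_r2 = 1 - ms_error / (ss_total / (n - 1))

# Standard errors implied by uncorrelated, standardized predictors
se_slope = math.sqrt(ms_error / (n - 1))
se_intercept = math.sqrt(ms_error) / math.sqrt(n)

# Fit of the 12-item model, copied from question 1
r2_items, adj_r2_items = 0.4127, 0.3864

print(f"R-Square {r2:.4f} (items: {r2_items}), Adj R-Sq {adj_r2:.4f} (items: {adj_r2_items})")
print(f"slope SE {se_slope:.5f}, intercept SE {se_intercept:.5f}")
```

With orthogonal predictors, each factor's coefficient stays the same whether or not the other factors are in the model. Dropping Factor2 would therefore leave the other two estimates unchanged.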