Regression and Factor Analysis
Please use the attached data to tackle the case “Customer Satisfaction at Haver & Boecker” and answer the following questions:
- Using regression analysis, locate those variables that best explain the customers’ overall satisfaction. Evaluate the model fit and assess the impact of each variable on the criterion variable. Remember to use collinearity diagnostics.
- Determine the factors that characterize the respondents by means of a factor analysis. Consider the following issues:
(a) Are FA assumptions met?
(b) How many factors should be extracted?
(c) Try to find suitable labels for the extracted factors.
(d) Evaluate the solution’s goodness-of-fit.
- Use the factor scores and regress the customers’ overall satisfaction (overall) on these.
Solution
libname assign 'C:\Users\Mohamed\Desktop\sas';
proc import datafile="C:\Users\Mohamed\Desktop\data.xls"
out=assign.data
dbms=xls
replace;
sheet="Sheet1";
getnames=yes;
run;
ods graphics on;
proc reg data=Assign.Data;
model overall = s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 s11 s12;
run;
ods graphics off;
ods graphics on;
proc reg data=Assign.Data;
model overall = s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 s11 s12 / tol vif collin;
run;
ods graphics off;
ods graphics on;
proc factor data=Assign.Data
priors=smc msa residual
rotate=promax reorder
outstat=fact_all
plots=(scree initloadings preloadings loadings);
var s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 s11 s12 ;
run;
ods graphics off;
data fact2(type=factor);
set fact_all;
if _TYPE_ in('PATTERN' 'FCORR') then delete;
if _TYPE_='UNROTATE' then _TYPE_='PATTERN';
run;
proc factor data=fact2 rotate=quartimax reorder;
run;
ods graphics on;
proc factor data=Assign.Data
priors=smc msa residual
rotate=promax reorder
outstat=fact_all
score
plots=(scree initloadings preloadings loadings);
var s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 s11 s12 ;
run;
ods graphics off;
proc factor data=Assign.Data score outstat=fact;
var s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 s11 s12 ;
run;
proc score data=Assign.Data score=fact out=scores;
var s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 s11 s12 ;
run;
ods graphics on;
proc reg data=Scores;
model overall = Factor1 Factor2 Factor3;
run;
ods graphics off;
Haver & Boecker is one of the world’s leading providers of filling and screening systems. The company operates a number of facilities in Germany, as well as production plants in the UK, Belgium, USA, Canada, and Brazil. It is a recognized specialist in the fields of weighing, filling, and material handling technology.
Haver & Boecker designs, produces, and markets systems and plants for filling and processing loose bulk materials of every type and thus operates solely in industrial markets. The company’s relationships with its customers are usually long-term oriented and complex.
Since the company’s philosophy is to assist customers and business partners in solving technical problems and innovating new solutions, their products are often customized to the buyers’ needs. Therefore, the customer is no longer a passive buyer, but an active partner. Given this background, the customer’s satisfaction plays an important role in establishing, developing, and maintaining successful customer relationships.
Very early on, the company’s management realized the importance of customer satisfaction and decided to commission a market research project to identify marketing activities that can positively contribute to the business’s overall success. Based on a thorough literature review as well as interviews with experts, the company developed a short survey to explore their customers’ satisfaction with specific performance features and their overall satisfaction. All items were measured on 7-point scales with higher scores denoting higher levels of satisfaction. A standardized survey was mailed to customers in 12 countries worldwide, which yielded 281 fully completed questionnaires. The following items (names in parentheses) were listed in the survey:
- Reliability of the machines and systems (s1)
- Life-time of the machines and systems (s2)
- Functionality and user-friendly operation of the machines and systems (s3)
- Appearance of the machines and systems (s4)
- Accuracy of the machines and systems (s5)
- Timely availability of the after-sales service (s6)
- Local availability of the after-sales service (s7)
- Fast processing of complaints (s8)
- Composition of quotations (s9)
- Transparency of quotations (s10)
- Fixed product price for the machines and systems (s11)
- Cost/performance ratio of the machines and systems (s12)
- Overall, how satisfied are you with the supplier (overall)?
- Using regression analysis, let us locate those variables that best explain the customers’ overall satisfaction. The following tables represent the regression outputs from SAS:
SAS command:
ods graphics on;
proc reg data=Assign.Data;
model overall = s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 s11 s12;
run;
ods graphics off;
The SAS System
The REG Procedure
Model: MODEL1
Dependent Variable: OVERALL overall
Number of Observations Read | 281 |
Number of Observations Used | 281 |
Analysis of Variance
Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
Model | 12 | 198.91066 | 16.57589 | 15.69 | <.0001 |
Error | 268 | 283.07510 | 1.05625 | | |
Corrected Total | 280 | 481.98577 | | | |
Root MSE | 1.02774 | R-Square | 0.4127 |
Dependent Mean | 5.00712 | Adj R-Sq | 0.3864 |
Coeff Var | 20.52559 | | |
Parameter Estimates
Variable | Label | DF | Parameter Estimate | Standard Error | t Value | Pr > |t| |
Intercept | Intercept | 1 | 2.68730 | 0.22756 | 11.81 | <.0001 |
S1 | s1 | 1 | 0.17898 | 0.05198 | 3.44 | 0.0007 |
S2 | s2 | 1 | 0.03091 | 0.04280 | 0.72 | 0.4709 |
S3 | s3 | 1 | 0.05274 | 0.05177 | 1.02 | 0.3092 |
S4 | s4 | 1 | 0.06009 | 0.05042 | 1.19 | 0.2344 |
S5 | s5 | 1 | 0.02594 | 0.04602 | 0.56 | 0.5735 |
S6 | s6 | 1 | -0.00967 | 0.04832 | -0.20 | 0.8416 |
S7 | s7 | 1 | -0.02486 | 0.04157 | -0.60 | 0.5504 |
S8 | s8 | 1 | 0.06262 | 0.04669 | 1.34 | 0.1810 |
S9 | s9 | 1 | 0.06358 | 0.04729 | 1.34 | 0.1799 |
S10 | s10 | 1 | 0.01909 | 0.04504 | 0.42 | 0.6720 |
S11 | s11 | 1 | -0.11662 | 0.04563 | -2.56 | 0.0112 |
S12 | s12 | 1 | 0.16684 | 0.04634 | 3.60 | 0.0004 |
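From the parameter estimates above, s1 (reliability), s11 (fixed product price), and s12 (cost/performance ratio) are significantly associated with the dependent variable overall, since the p-values of their t-tests are below the 5% significance level (p < 0.05). The coefficients of s1 (0.179) and s12 (0.167) are positive, whereas the coefficient of s11 (-0.117) is negative. The remaining nine items are not significant at the 5% level.
The model as a whole is significant (F = 15.69, p < .0001). The coefficient of determination (R-Square) is 0.4127, meaning that 41.27% of the variability in overall satisfaction (overall) is explained by the regression model; the adjusted R-Square is 0.3864.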
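The fit statistics can be checked by hand from the Analysis of Variance table. The following data step is only a sketch of that arithmetic: the data set name fit_check and the variable names are hypothetical, and the sums of squares and degrees of freedom are typed in from the output above.

```sas
/* Sketch: recompute the fit statistics from the ANOVA table above.  */
/* fit_check and all variable names are hypothetical.                */
data fit_check;
   ss_model = 198.91066;   /* Sum of Squares, Model           */
   ss_total = 481.98577;   /* Sum of Squares, Corrected Total */
   df_model = 12;
   df_error = 268;
   df_total = 280;

   ss_error = ss_total - ss_model;
   r_square = ss_model / ss_total;
   adj_rsq  = 1 - (1 - r_square) * df_total / df_error;
   f_value  = (ss_model / df_model) / (ss_error / df_error);
   p_value  = 1 - probf(f_value, df_model, df_error);
run;

proc print data=fit_check noobs;
   var r_square adj_rsq f_value p_value;
   format r_square adj_rsq 6.4 f_value 8.2 p_value pvalue6.4;
run;
```

Up to rounding, the printed values should agree with the R-Square (0.4127), Adj R-Sq (0.3864), and F Value (15.69) reported by PROC REG.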
Collinearity diagnostics
The following table shows the tolerance and the variance inflation factor (VIF) of each predictor.
SAS command:
ods graphics on;
proc reg data=Assign.Data;
model overall = s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 s11 s12 / tol vif collin;
run;
ods graphics off;
Parameter Estimates
Variable | Label | DF | Parameter Estimate | Standard Error | t Value | Pr > |t| | Tolerance | Variance Inflation |
Intercept | Intercept | 1 | 2.68730 | 0.22756 | 11.81 | <.0001 | . | 0 |
S1 | s1 | 1 | 0.17898 | 0.05198 | 3.44 | 0.0007 | 0.36902 | 2.70989 |
S2 | s2 | 1 | 0.03091 | 0.04280 | 0.72 | 0.4709 | 0.41655 | 2.40068 |
S3 | s3 | 1 | 0.05274 | 0.05177 | 1.02 | 0.3092 | 0.44269 | 2.25890 |
S4 | s4 | 1 | 0.06009 | 0.05042 | 1.19 | 0.2344 | 0.43494 | 2.29917 |
S5 | s5 | 1 | 0.02594 | 0.04602 | 0.56 | 0.5735 | 0.51240 | 1.95160 |
S6 | s6 | 1 | -0.00967 | 0.04832 | -0.20 | 0.8416 | 0.39505 | 2.53133 |
S7 | s7 | 1 | -0.02486 | 0.04157 | -0.60 | 0.5504 | 0.52572 | 1.90214 |
S8 | s8 | 1 | 0.06262 | 0.04669 | 1.34 | 0.1810 | 0.38187 | 2.61872 |
S9 | s9 | 1 | 0.06358 | 0.04729 | 1.34 | 0.1799 | 0.36659 | 2.72783 |
S10 | s10 | 1 | 0.01909 | 0.04504 | 0.42 | 0.6720 | 0.38487 | 2.59826 |
S11 | s11 | 1 | -0.11662 | 0.04563 | -2.56 | 0.0112 | 0.50294 | 1.98829 |
S12 | s12 | 1 | 0.16684 | 0.04634 | 3.60 | 0.0004 | 0.41761 | 2.39456 |
From the table above, all variance inflation factors are below 3 (the largest is 2.73, for s9), far below the common rule-of-thumb threshold of 10, and all tolerance values are above 0.36. Hence there is no indication of a serious multicollinearity problem in our regression model.
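Since only three of the twelve items are individually significant, one possible follow-up is to let PROC REG search for a more parsimonious model and then refit the significant predictors on their own. The sketch below is not part of the original solution: the 0.05 entry and stay thresholds are an assumption, and no results are claimed for either run. Recall that the VIF of a predictor is the reciprocal of its tolerance, so the two columns carry the same information.

```sas
/* Sketch: stepwise selection over the twelve items.               */
/* The 0.05 entry/stay thresholds are an assumption, not taken     */
/* from the case description.                                      */
proc reg data=Assign.Data;
   model overall = s1-s12
         / selection=stepwise slentry=0.05 slstay=0.05;
run;
quit;

/* Refit only the predictors that were individually significant    */
/* in the full model and request tolerance and VIF again.          */
proc reg data=Assign.Data;
   model overall = s1 s11 s12 / tol vif;
run;
quit;
```

The second run makes it easy to compare the adjusted R-Square of the reduced model with the 0.3864 of the full model.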
- Let us determine the factors that characterize the respondents by performing a factor analysis:
- Before performing a factor analysis, it is recommended to check the FA assumptions
- The following results were obtained using SAS:
SAS command:
ods graphics on;
proc factor data=Assign.Data
priors=smc msa residual
rotate=promax reorder
outstat=fact_all
plots=(scree initloadings preloadings loadings);
var s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 s11 s12 ;
run;
ods graphics off;
Eigenvalues of the Reduced Correlation Matrix: Total = 6.84974007 Average = 0.57081167
Factor | Eigenvalue | Difference | Proportion | Cumulative |
1 | 4.99825280 | 3.87778870 | 0.7297 | 0.7297 |
2 | 1.12046410 | 0.32833702 | 0.1636 | 0.8933 |
3 | 0.79212708 | 0.29569712 | 0.1156 | 1.0089 |
4 | 0.49642996 | 0.21084773 | 0.0725 | 1.0814 |
5 | 0.28558224 | 0.32084982 | 0.0417 | 1.1231 |
6 | -.03526758 | 0.02317729 | -0.0051 | 1.1179 |
7 | -.05844487 | 0.03761719 | -0.0085 | 1.1094 |
8 | -.09606207 | 0.01664101 | -0.0140 | 1.0954 |
9 | -.11270307 | 0.03596681 | -0.0165 | 1.0789 |
10 | -.14866989 | 0.03274139 | -0.0217 | 1.0572 |
11 | -.18141127 | 0.02914608 | -0.0265 | 1.0307 |
12 | -.21055735 | | -0.0307 | 1.0000 |
3 factors will be retained by the PROPORTION criterion.
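As we can see, the three largest eigenvalues of the reduced correlation matrix account for 100.89% of the common variance. The cumulative proportion exceeds 100% because the reduced correlation matrix (with squared multiple correlations on the diagonal) also has negative eigenvalues, which bring the total back to 100%. Together with the scree and variance explained plots, this supports the conclusion that three common factors should be extracted.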
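The Proportion and Cumulative columns can be reproduced directly from the eigenvalues, which also shows where the value above 100% comes from. The data step below is a sketch only: the data set name eigen_check and the variable names are hypothetical, and the five positive eigenvalues and the total are typed in from the table above.

```sas
/* Sketch: recompute Proportion and Cumulative for the positive    */
/* eigenvalues. eigen_check and the variable names are hypothetical.*/
data eigen_check;
   total = 6.84974007;   /* sum of all twelve eigenvalues */
   array ev[5] _temporary_
         (4.99825280 1.12046410 0.79212708 0.49642996 0.28558224);
   cumulative = 0;
   do factor = 1 to 5;
      eigenvalue = ev[factor];
      proportion = eigenvalue / total;
      cumulative = cumulative + proportion;
      output;
   end;
   format proportion cumulative 7.4;
run;

proc print data=eigen_check noobs;
   var factor eigenvalue proportion cumulative;
run;
```

The third row should show a cumulative proportion of about 1.0089, which is the point at which the PROPORTION criterion stops and retains three factors.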
- The following tables show the quartimax rotation obtained from the TYPE=FACTOR data set fact2:
The SAS System
The FACTOR Procedure
Rotation Method: Quartimax
Orthogonal Transformation Matrix
Factor | 1 | 2 | 3 |
1 | 0.95280 | 0.23208 | 0.19575 |
2 | -0.29862 | 0.83273 | 0.46626 |
3 | -0.05480 | -0.50270 | 0.86272 |
Rotated Factor Pattern
Variable | Label | Factor1 | Factor2 | Factor3 |
S3 | s3 | 0.76878 | 0.01891 | 0.01975 |
S1 | s1 | 0.76676 | 0.08647 | -0.17922 |
S6 | s6 | 0.74527 | -0.01233 | 0.20478 |
S2 | s2 | 0.73721 | -0.02689 | -0.20578 |
S4 | s4 | 0.72969 | 0.16241 | -0.06462 |
S8 | s8 | 0.69994 | 0.06638 | 0.26004 |
S5 | s5 | 0.64419 | 0.15131 | 0.00818 |
S7 | s7 | 0.59822 | -0.14136 | 0.31048 |
S9 | s9 | 0.31265 | 0.75647 | 0.13789 |
S10 | s10 | 0.35498 | 0.73627 | 0.10118 |
S11 | s11 | 0.27322 | 0.17054 | 0.65092 |
S12 | s12 | 0.51801 | 0.14334 | 0.53542 |
Variance Explained by Each Factor
Factor1 | Factor2 | Factor3 |
4.6398200 | 1.2463468 | 1.0246772 |
Final Communality Estimates: Total = 6.910844
S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 | S9 | S10 | S11 | S12 |
0.62751549 | 0.58655527 | 0.59176624 | 0.56300275 | 0.43794391 | 0.59751792 | 0.47425167 | 0.56194856 | 0.68900934 | 0.67834519 | 0.52742980 | 0.57555786 |
From the rotated factor pattern, we can suggest labels for the extracted factors:
- Factor1: quality of the machines and systems and of the after-sales service (variables s1 to s8)
- Factor2: composition and transparency of quotations (variables s9 and s10)
- Factor3: price and cost/performance ratio of the machines and systems (variables s11 and s12); note that s12 loads almost equally on Factor1 (0.518) and Factor3 (0.535)
- FA assumptions and goodness of fit
The FACTOR Procedure
Initial Factor Method: Principal Factors
Partial Correlations Controlling all other Variables
Variable | Label | S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 | S9 | S10 | S11 | S12 |
S1 | s1 | 1.00000 | 0.53364 | 0.08440 | 0.21083 | -0.06904 | 0.02151 | -0.05023 | 0.12656 | 0.04780 | 0.03591 | -0.17162 | 0.19072 |
S2 | s2 | 0.53364 | 1.00000 | 0.21205 | 0.02173 | 0.03700 | 0.08820 | 0.01627 | 0.00369 | -0.12243 | 0.06437 | -0.00185 | -0.05719 |
S3 | s3 | 0.08440 | 0.21205 | 1.00000 | 0.16996 | 0.21912 | 0.12463 | 0.04091 | 0.06178 | 0.01031 | -0.04926 | 0.01954 | 0.09331 |
S4 | s4 | 0.21083 | 0.02173 | 0.16996 | 1.00000 | 0.39920 | 0.02519 | 0.00325 | 0.02253 | -0.03544 | 0.11584 | -0.05966 | 0.09087 |
S5 | s5 | -0.06904 | 0.03700 | 0.21912 | 0.39920 | 1.00000 | 0.09408 | 0.08136 | -0.07459 | 0.13282 | -0.03575 | -0.01364 | 0.03805 |
S6 | s6 | 0.02151 | 0.08820 | 0.12463 | 0.02519 | 0.09408 | 1.00000 | 0.25105 | 0.40322 | -0.04941 | 0.07276 | 0.00267 | 0.01977 |
S7 | s7 | -0.05023 | 0.01627 | 0.04091 | 0.00325 | 0.08136 | 0.25105 | 1.00000 | 0.32910 | -0.14730 | 0.01204 | 0.07300 | 0.06058 |
S8 | s8 | 0.12656 | 0.00369 | 0.06178 | 0.02253 | -0.07459 | 0.40322 | 0.32910 | 1.00000 | 0.28431 | -0.12448 | 0.03698 | 0.01487 |
S9 | s9 | 0.04780 | -0.12243 | 0.01031 | -0.03544 | 0.13282 | -0.04941 | -0.14730 | 0.28431 | 1.00000 | 0.70788 | 0.13231 | -0.09990 |
S10 | s10 | 0.03591 | 0.06437 | -0.04926 | 0.11584 | -0.03575 | 0.07276 | 0.01204 | -0.12448 | 0.70788 | 1.00000 | -0.01613 | 0.12426 |
S11 | s11 | -0.17162 | -0.00185 | 0.01954 | -0.05966 | -0.01364 | 0.00267 | 0.07300 | 0.03698 | 0.13231 | -0.01613 | 1.00000 | 0.62577 |
S12 | s12 | 0.19072 | -0.05719 | 0.09331 | 0.09087 | 0.03805 | 0.01977 | 0.06058 | 0.01487 | -0.09990 | 0.12426 | 0.62577 | 1.00000 |
Kaiser’s Measure of Sampling Adequacy: Overall MSA = 0.83846511
S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 | S9 | S10 | S11 | S12 |
0.84308697 | 0.84677623 | 0.93908778 | 0.90005427 | 0.88364757 | 0.90375303 | 0.88854400 | 0.85856369 | 0.66694994 | 0.72117052 | 0.69422859 | 0.79622332 |
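The data are appropriate for the common factor model: most partial correlations (controlling all other variables) are small, and Kaiser’s overall measure of sampling adequacy is 0.838, above the 0.8 level that Kaiser described as meritorious. The individual MSA values range from 0.667 (s9) to 0.939 (s3). The largest partial correlations occur within the pairs s1/s2 (0.534), s9/s10 (0.708), and s11/s12 (0.626), which is consistent with the lower MSA values of s9, s10, and s11. Regarding the fit of the three-factor solution, the final communality estimates range from 0.438 (s5) to 0.689 (s9) and sum to 6.91.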
- Let us use the factor scores and regress the customers’ overall satisfaction (overall) on these. First we need to extract the first three factors then run a regression analysis.
SAS commands:
proc score data=Assign.Data score=fact out=scores;
var s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 s11 s12 ;
run;
ods graphics on;
proc reg data=Scores;
model overall = Factor1 Factor2 Factor3;
run;
ods graphics off;
Results:
The SAS System
The REG Procedure
Model: MODEL1
Dependent Variable: OVERALL overall
Number of Observations Read | 281 |
Number of Observations Used | 281 |
Analysis of Variance
Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
Model | 3 | 175.32101 | 58.44034 | 52.79 | <.0001 |
Error | 277 | 306.66476 | 1.10709 | | |
Corrected Total | 280 | 481.98577 | | | |
Root MSE | 1.05218 | R-Square | 0.3637 |
Dependent Mean | 5.00712 | Adj R-Sq | 0.3569 |
Coeff Var | 21.01378 | | |
Parameter Estimates
Variable | Label | DF | Parameter Estimate | Standard Error | t Value | Pr > |t| |
Intercept | Intercept | 1 | 5.00712 | 0.06277 | 79.77 | <.0001 |
Factor1 | | 1 | 0.76425 | 0.06288 | 12.15 | <.0001 |
Factor2 | | 1 | -0.05840 | 0.06288 | -0.93 | 0.3539 |
Factor3 | | 1 | -0.19663 | 0.06288 | -3.13 | 0.0020 |
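From the regression table, we see that Factor1 and Factor3 are significantly associated with the dependent variable, since the p-values of their t-tests are below the 5% significance level (p < 0.05), whereas Factor2 is not significant (p = 0.3539). Factor1 has a positive coefficient (0.764) and Factor3 a negative one (-0.197). Because the scores were produced by PROC FACTOR with its default settings (principal components without rotation), these three factors need not coincide with the rotated factors labelled earlier.
The coefficient of determination is 0.3637, meaning that 36.37% of the variability in overall satisfaction is explained by the three factor scores, compared with 41.27% for the regression on all twelve items.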
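The scores used above come from a PROC FACTOR run with default settings, that is, unrotated principal components. If scores from the rotated three-factor solution are wanted instead, one option is to request them directly with the OUT= option, which requires NFACTORS= to be set. This is a sketch under that assumption, not the procedure used for the results above; the data set name rot_scores is hypothetical.

```sas
/* Sketch: score the rotated three-factor solution directly.       */
/* rot_scores is a hypothetical output data set name.              */
proc factor data=Assign.Data
            method=principal priors=smc
            nfactors=3
            rotate=promax
            out=rot_scores;
   var s1-s12;
run;

/* Promax factors are allowed to correlate, so tolerance and VIF   */
/* are requested alongside the estimates.                          */
proc reg data=rot_scores;
   model overall = Factor1 Factor2 Factor3 / tol vif;
run;
quit;
```

With this variant the factor labels suggested earlier apply more directly to the regression coefficients, at the cost of correlated predictors.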