Introduction
The use of partial least squares (PLS) as an alternative to factor-based structural equation modeling (SEM) has been increasing in various fields (Hair et al., 2022; Jöreskog & Wold, 1982). PLS-SEM applications have grown significantly, especially in the social sciences (e.g., Ali et al., 2018; Ringle et al., 2020; Willaby et al., 2015), but also in other scientific disciplines such as agricultural science, engineering, environmental science, and medicine. This growth can be attributed to user-friendly software, guideline articles, and textbooks that make the method accessible to non-technical users.
The paper by Hair et al. (2012) has received a significant number of citations, indicating its lasting influence on marketing and consumer behavior research. The authors examined over 200 PLS-SEM studies published in the top 30 marketing journals between 1981 and 2010, assessing the applications on factors such as model properties and assessment procedures (Hair et al., 2012, p. 428). They drew attention to misapplications of the technique, even in top-tier marketing journals, pointing out that researchers sometimes misapply measures and do not fully capitalize on the criteria available for model assessment. The authors also provided comprehensive recommendations for algorithmic settings, measurement and structural model assessment criteria, and supplementary studies. These guidelines served as a foundation for the method's subsequent development and application.
Since then, there have been numerous methodological advancements in PLS-SEM (Hwang et al., 2020; Khan et al., 2019). Research has contributed to a better understanding of the approach (e.g., Rigdon, 2012), introduced new measures (e.g., Liengaard et al., 2021), and addressed model formulation and data concerns (e.g., Sarstedt et al., 2016). These developments include guidelines for diagnosing and treating endogeneity (Bendler & Huang, 2014; Dijkstra & Henseler, 2015; Kock, 2019) and new techniques for evaluating discriminant validity (Henseler et al., 2015), which emerged from debates about the overall effectiveness of PLS-SEM (Evermann & Rönkkö, 2021). Additionally, best practices in PLS-SEM have become more established, particularly in the areas of measurement and structural model assessment (Hair et al., 2020b). But is the use of PLS-SEM in marketing research understood and accepted? Have researchers adopted the most recent recommendations for best practices? Are misapplications still occurring, as previously noted by Hair et al. (2012)?
To understand the relationship between data, measurement, and model estimation in partial least squares structural equation modeling (PLS-SEM), three key points are important. First, PLS-SEM treats all indicators of formative measurement models as composite indicators. A formatively specified construct in PLS-SEM therefore has no error term, unlike a construct measured with causal indicators in factor-based SEM (Diamantopoulos, 2011).
Second, when PLS-SEM is used with data from a common factor model population, the parameter estimates may deviate from the specified population values: measurement model parameters tend to be overestimated and structural model parameters underestimated, a pattern known as PLS-SEM bias. This deviation diminishes as the sample size and the number of indicators per construct increase, a property referred to as consistency at large. However, because this characterization rests on specific assumptions about the nature of the data that may or may not hold, recent research suggests that researchers should avoid the term "PLS-SEM bias" (e.g., Rigdon, 2016). In particular, PLS-SEM estimates are unbiased and consistent when the data stem from a composite model population, in which linear combinations of the indicators define the nature of the data (Sarstedt et al., 2016b). Studies have also shown that PLS-SEM tends to produce lower absolute bias than factor-based SEM, particularly when estimating data from composite model populations (Reinartz et al., 2009; Sarstedt et al., 2016b).
Third, PLS-SEM relies on composites, which shapes both the method's measurement philosophy and its application. Unlike factor-based SEM, PLS-SEM always produces a determinate score for each case on each construct once the weights have been derived, whereas factor-based SEM produces indeterminate construct scores, which can affect the validity of results (Rigdon et al., 2019). Using these scores as input, PLS-SEM runs a series of ordinary least squares regressions to estimate the model parameters, maximizing the explained variance (R² values) of the endogenous constructs. Although this process targets explanatory power, PLS-SEM is also well suited for prediction: because construct scores are determinate, model parameters estimated on a training sample can be applied to generate testable predictions for observations not used in model estimation, known as hold-out cases (Hwang et al., 2020). Several studies have provided evidence of PLS-SEM's effectiveness for prediction (Becker et al., 2013a; Evermann & Tate, 2016; Cho et al., 2021). Through PLS-SEM, researchers therefore gain an understanding both of causal relationships grounded in theory and logic (explanation) and of the model's predictive power, which is essential for establishing its practical relevance.
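The prediction logic described above can be illustrated with a minimal sketch. It assumes a single exogenous and a single endogenous construct, each measured by its own block of indicators, and treats the outer weights `w_x` and `w_z` as already estimated on the training sample (in the usage they are simply hypothetical equal weights). The hypothetical helper `predict_holdout` standardizes the hold-out data with training-sample moments, forms determinate construct scores, and applies the training-sample path coefficient to predict the endogenous scores of the hold-out cases. It is a simplified illustration, not a full PLS-SEM prediction procedure.

```python
import numpy as np


def predict_holdout(X_train, Z_train, X_test, w_x, w_z):
    """Predict endogenous construct scores for hold-out cases (hypothetical helper).

    X_*: indicators of the exogenous construct; Z_train: indicators of the
    endogenous construct. w_x and w_z are outer weights assumed to have been
    estimated on the training sample.
    """
    # Standardize with training-sample moments only
    mx, sx = X_train.mean(axis=0), X_train.std(axis=0)
    mz, sz = Z_train.mean(axis=0), Z_train.std(axis=0)
    x_train = ((X_train - mx) / sx) @ w_x  # determinate exogenous scores
    z_train = ((Z_train - mz) / sz) @ w_z  # determinate endogenous scores
    # OLS path coefficient; the scores are mean-centered, so no intercept is needed
    beta = (x_train @ z_train) / (x_train @ x_train)
    # Apply the training-sample parameters to the hold-out indicators
    return beta * (((X_test - mx) / sx) @ w_x)


rng = np.random.default_rng(7)
n = 600
xi = rng.standard_normal(n)
eta = 0.6 * xi + 0.8 * rng.standard_normal(n)
X = np.column_stack([0.8 * xi + 0.6 * rng.standard_normal(n) for _ in range(3)])
Z = np.column_stack([0.8 * eta + 0.6 * rng.standard_normal(n) for _ in range(3)])
w = np.full(3, 1 / 3)  # hypothetical equal outer weights
train, test = slice(0, 400), slice(400, None)
pred = predict_holdout(X[train], Z[train], X[test], w, w)

# "Observed" hold-out scores, built with the same weights and training moments
mz, sz = Z[train].mean(axis=0), Z[train].std(axis=0)
actual = ((Z[test] - mz) / sz) @ w
print(round(float(np.corrcoef(pred, actual)[0, 1]), 2))
```

The correlation between predicted and observed hold-out scores gives a simple out-of-sample check of the model's predictive power.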
This study aimed to examine the use of PLS to obtain latent variable scores and factor-based SEM results for common factor models (e.g., when the model and/or data do not meet the requirements of factor-based SEM).
Methodology
Data for this study were obtained from a primary source. A total of 392 questionnaires were distributed to selected individuals in the Wuse, Dei-Dei, Kado, and Garki markets in Abuja, FCT. The aim was to gather information from business owners, who are considered major stakeholders in business growth and standards in the FCT. The questionnaire focused on the factors affecting business growth in Abuja markets and consisted of two sections: Section A gathered information on the demographic profiles of respondents, while Section B obtained information on factors influencing business growth, such as individual training in business, functional skills, experience in the business area, adaptability to the environment, drive, ambition, family influence on business growth, business environment, financial institutions, and government influence.
Research Design
The following equation formally illustrates the relationship between a latent variable and one of its observed (reflective) indicators:
$$d = \lambda p + e$$
where d is the observed indicator variable, p is the latent variable, λ is the loading, a regression coefficient quantifying the strength of the relationship between d and p, and e represents the random measurement error.
A measurement model with causal indicators can be formally described as follows:
$$p = \sum_{k=1}^{K} w_k d_k + z$$
where w_k indicates the contribution of indicator d_k (k = 1, ..., K) to p, and z is an error term associated with p.
In formative measurement models with composite indicators, the error term, which in causal indicator models represents "omitted causes," is set to zero. A measurement model with composite indicators therefore takes the following form, where p is a linear combination of the indicators d_k (k = 1, ..., K), each weighted by an indicator weight w_k (Bollen, 2011):
$$p = \sum_{k=1}^{K} w_k d_k$$
According to Henseler (2017, p. 180), measurement models with composite indicators “are a prescription of how the ingredients should be arranged to form a new entity,” which he refers to as artifacts. That is, composite indicators define the construct’s empirical meaning.
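To make the contrast between the reflective and composite specifications described above concrete, the following sketch simulates data in the document's notation (indicators d, latent variable p). The loadings 0.9, 0.8, 0.7 and weights 0.5, 0.3, 0.2 are illustrative assumptions, not estimates from this study. Each reflective indicator carries random measurement error e, so it correlates with p only to the degree set by its loading; the composite, in contrast, is an exact linear combination of its indicators, so regressing it on d recovers the weights with essentially zero residual, mirroring the absence of an error term.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
p = rng.standard_normal(n)  # latent variable p with unit variance

# Reflective measurement: d_k = lambda_k * p + e_k (hypothetical loadings)
lam = np.array([0.9, 0.8, 0.7])
e = rng.standard_normal((n, 3)) * np.sqrt(1 - lam**2)  # keeps var(d_k) close to 1
d = p[:, None] * lam + e

# Each indicator's correlation with p is close to its loading
corrs = np.array([np.corrcoef(d[:, k], p)[0, 1] for k in range(3)])
print(np.round(corrs, 2))

# Composite measurement: p_c = sum_k w_k d_k with no error term (hypothetical weights)
w = np.array([0.5, 0.3, 0.2])
p_c = d @ w
coef, residuals, *_ = np.linalg.lstsq(d, p_c, rcond=None)
print(np.round(coef, 6))  # the weights are recovered exactly
```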
Population and Sample Size
Yamane's simplified formula for proportions was used to calculate the sample size, with 0.05 as the level of precision:
$$n = \frac{N}{1 + N e^2}$$
where n is the sample size, N is the population size, and e is the level of precision.
We utilize the Yamane simplified formula as it is frequently employed in survey research to determine the necessary sample size, particularly when dealing with a relatively large population.
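The calculation can be reproduced with a minimal sketch, assuming the population is the 7,587 shops listed in Table 6 and that the result is rounded up to a whole respondent. The function name `yamane_sample_size` is hypothetical.

```python
import math


def yamane_sample_size(population: int, precision: float = 0.05) -> int:
    """Yamane's simplified formula n = N / (1 + N * e**2), rounded up to a whole respondent."""
    return math.ceil(population / (1 + population * precision**2))


# N = 7,587 shops across the four markets (Table 6), e = 0.05
print(yamane_sample_size(7587, 0.05))  # 380
```

The unrounded value is about 379.97, so rounding up gives the 380 respondents used in this study.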
Estimation of Model
A three-stage method from the family of (alternating) least squares algorithms is used for model estimation in PLS-SEM (Mateos-Aparicio, 2011). The PLS-SEM algorithm, as described by Lohmöller (1989), is illustrated in the following steps.
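Initialization: set all outer weights $\tilde{w}_{k_j}$ to 1 (unit weights)
Stage 1: Iterative estimation of weights and latent variable scores
Starting at step 1d, repeat steps 1a to 1d until convergence is obtained.
1a Inner weights (here obtained by using the factor weighting scheme): $v_{ji} = \operatorname{cov}(Y_j; Y_i)$ if $Y_j$ and $Y_i$ are adjacent, and $v_{ji} = 0$ otherwise
1b Inside approximation: $\tilde{Y}_{jn} := \sum_i v_{ji} Y_{in}$
1c Outer weights; solve for $x_{k_j n} = \tilde{w}_{k_j} \tilde{Y}_{jn} + e_{k_j n}$ (Mode A) or $\tilde{Y}_{jn} = \sum_{k_j} \tilde{w}_{k_j} x_{k_j n} + u_{jn}$ (Mode B)
1d Outside approximation: $Y_{jn} := \sum_{k_j} \tilde{w}_{k_j} x_{k_j n}$
Stage 2: Estimation of outer weights, outer loadings, and path coefficients
Stage 3: Estimation of location parameters
Here, $x_{k_j n}$ is the standardized value of indicator k of latent variable j for case n, $Y_{jn}$ is the corresponding latent variable score, $\tilde{Y}_{jn}$ is its inner proxy, $\tilde{w}_{k_j}$ are the outer weights, and $e_{k_j n}$ and $u_{jn}$ are residual terms.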
The algorithm begins with an initialization stage, during which it establishes preliminary latent variable scores. To compute these scores, the algorithm typically uses unit weights (i.e., 1) for all indicators in the measurement models (Hair et al., 2017b). Stage 1 of the PLS-SEM algorithm iteratively determines the inner weights and latent variable scores using a four-step procedure, consistent with the algorithm's original presentation (Lohmöller, 1989). Inner weights refer to path coefficients, while outer weights and outer loadings refer to indicator weights and loadings in the measurement models. Step #1 (1a) uses the initial latent variable scores to determine the inner weights $v_{ji}$ between adjacent latent variables $Y_j$ (the dependent one) and $Y_i$ (the independent one) in the structural model. The literature suggests three approaches to determining the inner weights (Lohmöller, 1989; Chin, 1998; Tenenhaus et al., 2005). In the centroid scheme, the inner weight is set to +1 if the covariance between $Y_j$ and $Y_i$ is positive and to -1 if it is negative; if two latent variables are unconnected, the weight is set to 0. In the factor weighting scheme, the inner weight equals the covariance between $Y_j$ and $Y_i$ and is set to 0 if the latent variables are not connected. The path weighting scheme, in contrast, takes the direction of the inner model relationships into account (Lohmöller, 1989). Chin (1998, p. 309) explains that the path weighting scheme aims to create a component that can be ideally predicted and, at the same time, acts as a good predictor for subsequent dependent variables. Consequently, the path weighting scheme tends to yield slightly higher R² values for the endogenous latent variables than the other schemes and is generally preferred. However, in most cases, the choice of inner weighting scheme has minimal impact on the results (Noonan & Wold, 1982; Lohmöller, 1989).
In Step #2 (1b), the inner approximation calculates proxies for all latent variables as the weighted sum of their adjacent latent variable scores. Then, in Step #3 (1c), new outer weights, representing the strength of the relationship between each latent variable and its indicators, are computed for all indicators in the measurement models. The PLS-SEM algorithm uses two estimation modes for this purpose. In Mode A (correlation weights), the outer weights are the bivariate correlations between each indicator and the construct. In contrast, Mode B (regression weights) computes the indicator weights by regressing each construct on its associated indicators.
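The three inner weighting schemes can be compared in a short sketch. It assumes standardized latent variable scores `Y` (cases in rows) and a hypothetical 0/1 matrix `paths` in which `paths[i, j] = 1` means construct i points to construct j. For the path weighting scheme it follows the usual textbook rule: a predecessor of Y_j receives its coefficient from the multiple regression of Y_j on all of its predecessors, and a successor receives its correlation with Y_j. Because the scores are standardized, the (population) covariance and the correlation coincide, so correlations are used throughout. The function name `inner_weights` is illustrative.

```python
import numpy as np


def inner_weights(Y, paths, scheme="path"):
    """Inner weights for standardized latent variable scores Y of shape (n, J).

    paths[i, j] = 1 means construct i -> construct j (hypothetical input format).
    Returns V with V[j, i] = weight of Y_i in the inner proxy of Y_j.
    """
    paths = np.asarray(paths)
    J = Y.shape[1]
    C = np.corrcoef(Y, rowvar=False)  # equals the covariance for standardized scores
    adjacent = (paths + paths.T) > 0
    if scheme == "centroid":
        return np.where(adjacent, np.sign(C), 0.0)  # +1/-1 by sign of the covariance
    if scheme == "factor":
        return np.where(adjacent, C, 0.0)  # the covariance itself
    if scheme == "path":
        V = np.zeros((J, J))
        for j in range(J):
            preds = np.flatnonzero(paths[:, j])  # constructs pointing to Y_j
            if preds.size:
                V[j, preds] = np.linalg.lstsq(Y[:, preds], Y[:, j], rcond=None)[0]
            succs = np.flatnonzero(paths[j, :])  # constructs that Y_j points to
            V[j, succs] = C[j, succs]
        return V
    raise ValueError(f"unknown scheme: {scheme}")


rng = np.random.default_rng(0)
raw = rng.standard_normal((200, 3))
raw[:, 2] += 0.5 * raw[:, 0] + 0.3 * raw[:, 1]  # construct 3 depends on constructs 1 and 2
Y = (raw - raw.mean(axis=0)) / raw.std(axis=0)
paths = np.zeros((3, 3), dtype=int)
paths[0, 2] = paths[1, 2] = 1
for scheme in ("centroid", "factor", "path"):
    print(scheme, np.round(inner_weights(Y, paths, scheme), 3), sep="\n")
```

Only the path weighting scheme treats predecessors and successors differently, which is how it incorporates the direction of the structural relationships.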
Reflectively specified constructs are typically estimated with Mode A, whereas PLS-SEM uses Mode B for formatively specified constructs. However, Becker et al. (2013a) demonstrated that this routine use of Mode A and Mode B is not always ideal. For instance, when constructs are specified formatively, Mode A estimation produces better out-of-sample prediction under specific conditions: when the model estimation draws on more than 100 observations and when the endogenous construct's R² value is 0.30 or higher. The algorithm outline above (step 1c) gives the formal representation of these two modes. The PLS-SEM algorithm takes standardized data as input and always standardizes the generated latent variable scores in Step #2 and Step #4. The algorithm terminates when the weights obtained in Step #3 change only marginally from one iteration to the next (typically by less than $1 \times 10^{-7}$), or when the maximum number of iterations (typically 300) is reached (Henseler, 2010).
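The two estimation modes of Step #3 can be contrasted directly for a single construct. In this minimal sketch, `X` holds standardized indicators and `proxy` is a standardized inner proxy; both are simulated, and the proxy is deliberately built from hypothetical weights 0.6, 0.3, 0.1. Mode A returns each indicator's correlation with the proxy, while Mode B regresses the proxy on all indicators jointly and therefore recovers the proxy's weights up to a scale factor. The names `standardize` and `outer_weights` are illustrative.

```python
import numpy as np


def standardize(a):
    """Center and scale columns to unit (population) variance."""
    return (a - a.mean(axis=0)) / a.std(axis=0)


def outer_weights(X, proxy, mode="A"):
    """Outer weights of one construct.

    X: (n, K) standardized indicators; proxy: (n,) standardized inner proxy.
    """
    n = X.shape[0]
    if mode == "A":
        # Correlation weights: each indicator's correlation with the proxy
        return X.T @ proxy / n
    if mode == "B":
        # Regression weights: OLS of the proxy on all indicators jointly
        return np.linalg.lstsq(X, proxy, rcond=None)[0]
    raise ValueError(f"unknown mode: {mode}")


rng = np.random.default_rng(3)
base = rng.standard_normal(300)
X = standardize(np.column_stack([base + rng.standard_normal(300) for _ in range(3)]))
proxy = standardize(X @ np.array([0.6, 0.3, 0.1]))  # hypothetical inner proxy
print("Mode A:", np.round(outer_weights(X, proxy, "A"), 3))
print("Mode B:", np.round(outer_weights(X, proxy, "B"), 3))
```

With correlated indicators, Mode A gives similar weights to all three indicators, whereas Mode B separates their unique contributions, which is why it is associated with formatively specified constructs.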
Stages 2 and 3 use the final latent variable scores from Stage 1 as input for a series of ordinary least squares regressions. These regressions produce the final outer loadings, outer weights, and path coefficients, as well as related elements such as indirect and total effects, R² values of the endogenous latent variables, and the indicator and latent variable correlations (Lohmöller, 1989).
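The three stages can be put together in a compact NumPy sketch of the basic algorithm. It is an illustration under stated assumptions, not SmartPLS's implementation: it uses only the factor weighting scheme (as in step 1a of the outline), unit-weight initialization, Mode A or Mode B per construct, the $10^{-7}$ stop criterion, and at most 300 iterations. The input format (`blocks`, `paths`, `modes`) and the function name `pls_sem` are hypothetical, and Stage 3 is omitted because all data are standardized.

```python
import numpy as np


def standardize(a):
    """Center each column and scale it to unit (population) variance."""
    return (a - a.mean(axis=0)) / a.std(axis=0)


def pls_sem(X, blocks, paths, modes, max_iter=300, tol=1e-7):
    """Sketch of the basic PLS-SEM algorithm using the factor weighting scheme.

    X      : (n, p) indicator data
    blocks : one list of column indices per construct
    paths  : (J, J) 0/1 array; paths[i, j] = 1 means construct i -> construct j
    modes  : "A" (correlation weights) or "B" (regression weights) per construct
    Assumes every construct is connected to at least one other construct.
    """
    X = standardize(np.asarray(X, dtype=float))
    n, J = X.shape[0], len(blocks)
    paths = np.asarray(paths)
    adjacent = (paths + paths.T) > 0

    weights = [np.ones(len(b)) for b in blocks]  # initialization: unit weights
    for iteration in range(1, max_iter + 1):
        # Step 1d: outside approximation (standardized weighted sums of indicators)
        Y = standardize(np.column_stack([X[:, b] @ w for b, w in zip(blocks, weights)]))
        # Step 1a: inner weights = covariance of adjacent constructs, 0 otherwise
        V = np.where(adjacent, Y.T @ Y / n, 0.0)
        # Step 1b: inside approximation (standardized weighted sums of neighbours)
        Y_tilde = standardize(Y @ V.T)
        # Step 1c: new outer weights via Mode A or Mode B
        new_weights = []
        for j, b in enumerate(blocks):
            Xj, proxy = X[:, b], Y_tilde[:, j]
            if modes[j] == "A":
                new_weights.append(Xj.T @ proxy / n)
            else:
                new_weights.append(np.linalg.lstsq(Xj, proxy, rcond=None)[0])
        change = sum(np.abs(new - old).sum() for new, old in zip(new_weights, weights))
        weights = new_weights
        if change < tol:  # stop criterion on the change in outer weights
            break

    # Stage 2: final scores, rescaled outer weights, outer loadings, path coefficients
    raw = np.column_stack([X[:, b] @ w for b, w in zip(blocks, weights)])
    Y = standardize(raw)
    outer_w = [w / s for w, s in zip(weights, raw.std(axis=0))]
    loadings = [X[:, b].T @ Y[:, j] / n for j, b in enumerate(blocks)]
    B, r2 = np.zeros((J, J)), {}
    for j in range(J):
        preds = np.flatnonzero(paths[:, j])
        if preds.size:
            coef = np.linalg.lstsq(Y[:, preds], Y[:, j], rcond=None)[0]
            B[preds, j] = coef
            r2[j] = 1 - np.var(Y[:, j] - Y[:, preds] @ coef)  # Y_j has unit variance
    # Stage 3 (location parameters) is omitted: all data are standardized here.
    return {"weights": outer_w, "loadings": loadings, "paths": B,
            "r2": r2, "iterations": iteration}


# Usage with simulated data: two exogenous constructs and one endogenous construct
rng = np.random.default_rng(42)
n = 500
xi1, xi2 = rng.standard_normal(n), rng.standard_normal(n)
eta = 0.5 * xi1 + 0.3 * xi2 + np.sqrt(0.66) * rng.standard_normal(n)


def indicators(lv):
    """Three reflective indicators with loading 0.8 (hypothetical)."""
    return np.column_stack([0.8 * lv + 0.6 * rng.standard_normal(n) for _ in range(3)])


X = np.hstack([indicators(xi1), indicators(xi2), indicators(eta)])
paths = np.zeros((3, 3), dtype=int)
paths[0, 2] = paths[1, 2] = 1
res = pls_sem(X, [[0, 1, 2], [3, 4, 5], [6, 7, 8]], paths, ["A", "A", "A"])
print(np.round(res["paths"][:, 2], 3), round(res["r2"][2], 3), res["iterations"])
```

Because the simulated data come from a common factor population, the estimated loadings tend to exceed 0.8 and the path coefficients tend to fall below 0.5 and 0.3, the pattern described earlier as PLS-SEM bias.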
Test of Reliability Using Cronbach’s Alpha
Cronbach's alpha is a statistic used to assess the internal consistency reliability of a set of test items or a scale, that is, the extent to which the items in the set are interrelated. In other words, it indicates how consistently a measurement represents a concept. Cronbach's alpha is computed by comparing the sum of the individual item variances with the variance of the total score across observations (usually individual test takers or survey respondents).
$$\alpha = \frac{K}{K-1}\left(1 - \frac{\sum_{i=1}^{K} \sigma^2_{Y_i}}{\sigma^2_X}\right)$$
where K is the number of scale items, $\sigma^2_{Y_i}$ is the variance of item i, and $\sigma^2_X$ is the variance of the observed total score.
Cronbach's alpha thus reflects the number of items in the test, the average covariance between pairs of items, and the variance of the total score. The resulting reliability coefficient usually ranges from 0 to 1 (it can be negative when items covary negatively), providing an overall assessment of the measure's reliability. If all scale items are completely independent (i.e., uncorrelated and sharing no covariance), then α = 0. If, on the other hand, all items have high covariances, α approaches 1 as the number of items in the scale approaches infinity.
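The formula translates directly into a few lines of NumPy. This minimal sketch assumes respondents in rows and items in columns; it uses sample variances (`ddof=1`), although the choice cancels out in the ratio as long as it is applied consistently. The function name `cronbach_alpha` is illustrative.

```python
import numpy as np


def cronbach_alpha(items) -> float:
    """Cronbach's alpha for an (n_respondents, k_items) array of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)      # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of the total score
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)


# Two items answered by four respondents
print(round(cronbach_alpha([[1, 2], [2, 1], [3, 4], [4, 3]]), 3))  # 0.75
```

Here each item has variance 5/3 and the total score has variance 16/3, so α = 2 × (1 - (10/3)/(16/3)) = 0.75.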
Data Presentation and Results
Data Presentation
Table 2: Dei-Dei Group of Markets

| Name of market | No. of shops |
|---|---|
| Building Materials Market | 3,066 |
| Tomato Market | 807 |
| Regional Market | 8,668 |
| Timber (Carpark) | 1,489 |
| Lorry Park/Corner Shop | 671 |
| Buffer Zone | 66 (canopy restaurants); 12 (GP tanks rent spaces) |
| Pantaker | 72 |

Table 3: Kado Fish Market

| Shop category | No. of shops |
|---|---|
| Open Stalls | 86 |
| Lock-Up Shop | 257 |
| Warehouse | 26 |
| Total | 369 |

Table 4: Wuse Market

| Shop category | No. of shops |
|---|---|
| Formal Shops | 1,592 |
| Wet Informal | 500 |
| Dry Informal | 197 |
| Hair Dresser | 232 |
| Sitout | 6 |
| Total | 2,527 |

Table 5: Garki Market

| Shop category | No. of shops |
|---|---|
| Formal Shops | 1,255 |
| Plaza | 367 |
| Old Informal | 125 |
| Informal Shops | 1,625 |
| Total | 3,723 |

Source: Abuja Market Management, 2024
Table 6: Four Major Markets

| Market | No. of shops | Sample (proportional allocation) |
|---|---|---|
| Dei-Dei Market | 3,066 | 153 |
| Kado Fish Market | 369 | 19 |
| Wuse Market | 2,527 | 127 |
| Garki Market | 1,625 | 81 |
| Total | 7,587 | 380 |

Source: Abuja Market Management
Sample size
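Yamane's simplified formula for proportions gives the sample size needed for a given population, particularly when estimating the proportion of a characteristic or attribute within that population. With N = 7,587 shops (Table 6) and e = 0.05, n = 7,587 / (1 + 7,587 × 0.05²) ≈ 379.97, which is rounded up to 380. The required sample size for this study is therefore 380, allocated across the four markets in proportion to their number of shops (Table 6).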
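The allocation of the 380 respondents across markets can be sketched as a proportional split of the shop counts in Table 6. This is a minimal illustration assuming simple rounding of each market's share; the function name `proportional_allocation` is hypothetical.

```python
def proportional_allocation(shops: dict[str, int], n: int) -> dict[str, int]:
    """Allocate a sample of size n to strata in proportion to their size (simple rounding)."""
    total = sum(shops.values())
    return {market: round(n * count / total) for market, count in shops.items()}


shops = {"Dei-Dei": 3066, "Kado Fish": 369, "Wuse": 2527, "Garki": 1625}
alloc = proportional_allocation(shops, 380)
print(alloc, sum(alloc.values()))
```

Simple rounding gives 154 for Dei-Dei and 18 for Kado, one unit away from the 153 and 19 in Table 6; both allocations sum to 380.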
Table 7: Demographic information of respondents

| Variable | Item | Frequency | Percentage (%) |
|---|---|---|---|
| Gender | Male | 265 | 70 |
| | Female | 115 | 30 |
| | Total | 380 | 100 |
| Years in business | 1-5 | 52 | 14 |
| | 6-10 | 91 | 24 |
| | 11-15 | 105 | 27 |
| | 16-20 | 75 | 20 |
| | 21-25 | 22 | 6 |
| | 26-30 | 15 | 4 |
| | 31 and above | 20 | 5 |
| | Total | 380 | 100 |
| Business location | Garki Market | 81 | 21 |
| | Wuse Market | 127 | 34 |
| | Dei-Dei Market | 153 | 40 |
| | Kado Fish Market | 19 | 5 |
| | Total | 380 | 100 |
| Business type | Supplier | 94 | 25 |
| | Distributor | 19 | 5 |
| | Wholesaler | 92 | 24 |
| | Retailer | 175 | 46 |
| | Total | 380 | 100 |

Model 1
Effects of family, business environment, financial institutions, and government on the business performance of individuals (the endogenous construct Individual).
Table 8: Path Coefficients/Total Effects

| Path | Path coefficient |
|---|---|
| Business Environment → Individual | 0.360 |
| Family → Individual | 0.317 |
| Financial Institutions → Individual | 0.166 |
| Government → Individual | 0.183 |

Indirect effects: The model does not contain indirect effects.
Table 9: Outer Loadings

| Indicator | Construct | Outer loading |
|---|---|---|
| 1a | Individual | 0.444 |
| 1b | Individual | 0.525 |
| 1c | Individual | 0.702 |
| 1d | Individual | 0.663 |
| 1e | Individual | 0.767 |
| 2a | Family | 0.641 |
| 2b | Family | 0.881 |
| 2c | Family | 0.357 |
| 3a | Financial Institutions | 0.841 |
| 3b | Financial Institutions | 0.887 |
| 3c | Financial Institutions | 0.459 |
| 3d | Financial Institutions | 0.445 |
| 4a | Business Environment | 0.484 |
| 4b | Business Environment | 0.599 |
| 4c | Business Environment | 0.512 |
| 4d | Business Environment | 0.645 |
| 5a | Government | 0.059 |
| 5b | Government | -0.146 |
| 5c | Government | 0.334 |
| 5d | Government | -0.603 |
| 5e | Government | 0.851 |

Quality Criteria
Table 10: R Square

| Construct | R-square | R-square adjusted |
|---|---|---|
| Individual | 0.465 | 0.459 |

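The R² value of 0.465 indicates that the four exogenous constructs (business environment, family, financial institutions, and government) jointly explain 46.5% of the variance in the endogenous construct Individual. It measures how well the independent variables explain the variation in the dependent variable; the adjusted R² (0.459) corrects this value for the number of predictors in the model.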
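The adjusted value in Table 10 can be reproduced with the usual adjustment for the number of predictors. This minimal sketch assumes n = 380 respondents (the study's sample size) and k = 4 exogenous constructs; the function name `adjusted_r2` is illustrative.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Model 1 (Table 10): R^2 = 0.465, n = 380 respondents, k = 4 exogenous constructs
print(round(adjusted_r2(0.465, 380, 4), 3))  # 0.459
```

With 380 observations and only four predictors, the penalty is small, so R² and adjusted R² differ by less than 0.01.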
Table 11: f Square

| Path | f-square |
|---|---|
| Business Environment → Individual | 0.221 |
| Family → Individual | 0.150 |
| Financial Institutions → Individual | 0.039 |
| Government → Individual | 0.057 |

Following Cohen (1988), f² values of 0.02, 0.15, and 0.35 represent small, medium, and large effects of an exogenous latent variable, respectively; values below 0.02 indicate no effect. Table 11 shows that business environment (0.221) and family (0.150) have medium effects on Individual, while financial institutions (0.039) and government (0.057) have small effects.
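The f² effect size compares R² with and without a given exogenous construct, f² = (R²_included - R²_excluded) / (1 - R²_included), and is then classified with the thresholds above. This minimal sketch defines both steps and applies the classification to the Table 11 values; the function names are illustrative.

```python
def f_squared(r2_included: float, r2_excluded: float) -> float:
    """Cohen's f^2: change in R^2 when a construct is omitted, relative to unexplained variance."""
    return (r2_included - r2_excluded) / (1 - r2_included)


def effect_label(f2: float) -> str:
    """Classify f^2 with Cohen's (1988) thresholds 0.02, 0.15, and 0.35."""
    if f2 >= 0.35:
        return "large"
    if f2 >= 0.15:
        return "medium"
    if f2 >= 0.02:
        return "small"
    return "no effect"


table_11 = {"Business Environment": 0.221, "Family": 0.150,
            "Financial Institutions": 0.039, "Government": 0.057}
for construct, f2 in table_11.items():
    print(f"{construct}: f2 = {f2:.3f} ({effect_label(f2)})")
```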
Table 12: Construct Reliability and Validity

| Construct | Cronbach's alpha | Composite reliability (rho_a) | Composite reliability (rho_c) | Average variance extracted (AVE) |
|---|---|---|---|---|
| Business Environment | 0.301 | 0.290 | 0.648 | 0.318 |
| Family | 0.433 | 0.557 | 0.677 | 0.438 |
| Financial Institutions | 0.641 | 0.804 | 0.767 | 0.475 |
| Government | 0.080 | 0.308 | 0.061 | 0.245 |
| Individual | 0.649 | 0.669 | 0.762 | 0.399 |

In Model 1, a path analysis was conducted using SmartPLS. Individual is the endogenous construct, and the model contains no intervening variable. Consequently, there are no indirect effects, and the total effect of each construct equals its direct effect (path coefficient). The model aims to explain the impact of family, business environment, financial institutions, and government on the growth and performance of individual businesses in four major markets in Abuja. The numbers on the path relationships represent standardized regression coefficients, while the number within the circle of the endogenous latent variable indicates its R² value. The business environment exerts the strongest influence on business growth and performance (0.360), followed by family (0.317), government (0.183), and financial institutions (0.166), which have the least effect on individual business performance and growth. Collectively, these four constructs account for 46.5% of the variance in the endogenous construct Individual.
The reflective measurement model is assessed through the outer loadings of the indicators of Individual (1a to 1e), Family (2a to 2c), Financial Institutions (3a to 3d), Business Environment (4a to 4d), and Government (5a to 5e). These loadings represent the absolute contribution of each indicator to the definition of its latent variable; generally, higher loadings indicate a stronger and more reliable measurement model. The measurement model meets the criterion of sufficient indicator reliability (>0.50) only in part, as several indicators exhibit weak reliability, indicating a minimal contribution to their construct. For Individual, indicator 1a (0.444) contributes least, followed by 1b (0.525), 1d (0.663), 1c (0.702), and 1e (0.767), which has the highest loading. For Business Environment, 4a (0.484) has the weakest loading, followed by 4c (0.512), 4b (0.599), and 4d (0.645). For Financial Institutions, indicators 3d (0.445) and 3c (0.459) have weak loadings, while 3a (0.841) and 3b (0.887) have the strongest.
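As a cross-check on Table 12, composite reliability and AVE can be recomputed from the Table 9 loadings with the standard formulas rho_c = (Σλ)² / ((Σλ)² + Σ(1 - λ²)) and AVE = mean(λ²). This is a minimal sketch assuming standardized loadings; because the loadings in Table 9 are rounded to three decimals, the results match Table 12 to within about 0.001. It also counts the indicators that reach the commonly used loading threshold of 0.708, which corresponds to an indicator reliability (λ²) of about 0.50. The function names are illustrative.

```python
import numpy as np


def composite_reliability(loadings) -> float:
    """rho_c = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    lam = np.asarray(loadings, dtype=float)
    return float(lam.sum() ** 2 / (lam.sum() ** 2 + (1 - lam**2).sum()))


def ave(loadings) -> float:
    """Average variance extracted: mean of the squared loadings."""
    lam = np.asarray(loadings, dtype=float)
    return float((lam**2).mean())


table_9 = {
    "Business Environment": [0.484, 0.599, 0.512, 0.645],
    "Family": [0.641, 0.881, 0.357],
    "Financial Institutions": [0.841, 0.887, 0.459, 0.445],
    "Government": [0.059, -0.146, 0.334, -0.603, 0.851],
    "Individual": [0.444, 0.525, 0.702, 0.663, 0.767],
}
for name, lam in table_9.items():
    passing = sum(l >= 0.708 for l in lam)
    print(f"{name}: rho_c = {composite_reliability(lam):.3f}, "
          f"AVE = {ave(lam):.3f}, loadings >= 0.708: {passing}/{len(lam)}")
```

Every construct's AVE falls below 0.50, consistent with the weak indicator loadings discussed above.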
Model 2
The relationship between demographic factors of respondents and individual perspectives on factors influencing business performance
Path Coefficients/Total Effects

| Path | Path coefficient |
|---|---|
| Business Location → Latent Variable | -0.444 |
| Business type → Latent Variable | -0.041 |
| Gender → Latent Variable | 0.181 |
| Years in business → Latent Variable | -0.047 |

Indirect effects: The model does not contain indirect effects.
Outer Loadings

| Indicator | Construct | Outer loading |
|---|---|---|
| 1 | Gender | 1.000 |
| 1a | Latent Variable | -0.102 |
| 1b | Latent Variable | -0.070 |
| 1c | Latent Variable | 0.364 |
| 1d | Latent Variable | -0.028 |
| 1e | Latent Variable | 0.413 |
| 2 | Years in business | 1.000 |
| 2a | Latent Variable | -0.446 |
| 2b | Latent Variable | 0.011 |
| 2c | Latent Variable | -0.576 |
| 3 | Business Location | 1.000 |
| 3a | Latent Variable | 0.031 |
| 3b | Latent Variable | 0.467 |
| 3c | Latent Variable | -0.009 |
| 3d | Latent Variable | 0.535 |
| 4 | Business type | 1.000 |
| 4a | Latent Variable | -0.152 |
| 4b | Latent Variable | 0.701 |
| 4c | Latent Variable | 0.449 |
| 4d | Latent Variable | 0.145 |
| 5a | Latent Variable | 0.024 |
| 5b | Latent Variable | 0.434 |
| 5c | Latent Variable | 0.336 |
| 5d | Latent Variable | -0.196 |
| 5e | Latent Variable | 0.104 |

Quality Criteria
R Square

| Construct | R-square | R-square adjusted |
|---|---|---|
| Latent Variable | 0.202 | 0.194 |

f Square

| Path | f-square |
|---|---|
| Business Location → Latent Variable | 0.210 |
| Business type → Latent Variable | 0.002 |
| Gender → Latent Variable | 0.008 |
| Years in business → Latent Variable | 0.002 |

Construct Reliability

| Construct | Cronbach's alpha | Composite reliability (rho_a) | Composite reliability (rho_c) | Average variance extracted (AVE) |
|---|---|---|---|---|
| Latent Variable | 0.673 | 0.643 | 0.242 | 0.116 |

In Model 2, a path analysis was conducted using SmartPLS. The endogenous construct, Latent Variable, captures the different factors that affect business performance and the growth of individual businesses. The model contains no intervening variables, so there are no indirect effects, and the total effect (path coefficient) of each construct equals its direct effect. The purpose of this model is to explain how demographic factors affect the way business owners respond regarding business growth and performance.
The numbers along the paths represent standardized regression coefficients, while the number within the circle of the endogenous latent variable represents its R² value. Business location has the strongest effect on Latent Variable, and this effect is negative (-0.444); years in business (-0.047) and business type (-0.041) have weak negative effects, while gender has the only positive effect (0.181). These four factors collectively account for 20.2% of the variance in the endogenous construct Latent Variable.
The reflective measurement model of Latent Variable is assessed through the outer loadings of indicators 1a to 1e, 2a to 2c, 3a to 3d, 4a to 4d, and 5a to 5e, which represent the absolute contribution of each indicator to the definition of the latent variable; generally, higher loadings indicate a stronger and more reliable measurement model. The measurement model does not meet the relevant assessment criteria: all outer loadings are below 0.70 except for 4b (0.701), so the indicators do not reach a sufficient level of indicator reliability (0.50). Among the indicators, 2c (-0.576) and 5d (-0.196) load negatively, whereas 3d (0.535) and 4b (0.701) have the highest loadings on Latent Variable. Beyond these reliability concerns, the model is considered adequate. Overall, the results suggest that the demographic factors gender, years in business, business location, and business type explain part of the variance in Latent Variable, but that the construct does not exhibit sufficient internal consistency reliability.
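The same calculations applied to the Model 2 loadings reproduce the Latent Variable row of the construct reliability table (rho_c = 0.242, AVE = 0.116) and show that no indicator reaches an indicator reliability of 0.50; even 4b (0.701) gives λ² ≈ 0.491. This is a minimal sketch using the rounded loadings from the Model 2 outer loadings table; the variable names are illustrative.

```python
import numpy as np

# Outer loadings of the Latent Variable indicators in Model 2
loadings = {
    "1a": -0.102, "1b": -0.070, "1c": 0.364, "1d": -0.028, "1e": 0.413,
    "2a": -0.446, "2b": 0.011, "2c": -0.576,
    "3a": 0.031, "3b": 0.467, "3c": -0.009, "3d": 0.535,
    "4a": -0.152, "4b": 0.701, "4c": 0.449, "4d": 0.145,
    "5a": 0.024, "5b": 0.434, "5c": 0.336, "5d": -0.196, "5e": 0.104,
}
lam = np.array(list(loadings.values()))
indicator_reliability = lam**2  # share of indicator variance explained by the construct

ave = float(indicator_reliability.mean())
rho_c = float(lam.sum() ** 2 / (lam.sum() ** 2 + (1 - indicator_reliability).sum()))
reliable = [k for k, l in loadings.items() if l**2 >= 0.5]

print(f"AVE = {ave:.3f}, rho_c = {rho_c:.3f}")
print("indicators with reliability >= 0.50:", reliable)
print("4b reliability:", round(loadings["4b"] ** 2, 3))
```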