The differences between principal component analysis and factor analysis


By admin

PCA and factor analysis are both statistical methods for working with many variables at once. They simplify multivariate problems, either by reducing the number of dimensions or by giving a simpler, easier-to-read representation of the data. Although the two methods are closely related, they differ in what they model and where they are applied. This article looks at the technical details of each method and then compares PCA with FA.



  • What is PCA?


PCA stands for principal component analysis. It is most useful for high-dimensional problems. Because we can only picture data in up to three dimensions, PCA helps by reducing the number of dimensions to a manageable few. It does this by finding new axes, the principal components, along which the data varies the most, and keeping only the most important ones. The method is easiest to understand through a simple example.

For example, suppose a buyer is comparing apartments. The most important aspects are space, location and quality. Because the buyer is only looking at options in a similar price range, quality does not differ much from one apartment to another. Data for N apartments can be plotted on a three-dimensional graph. If PCA is applied to simplify this picture, it will most likely drop the quality dimension: PCA keeps the directions along which the data varies most, and since quality barely varies, it does little to tell the apartments apart or to inform the decision.
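
To make the example concrete, here is a minimal Python sketch that runs PCA on eight hypothetical apartments. The ratings are invented for illustration, and the sketch assumes all three aspects are scored on the same 0-10 scale, so the data are only centred, not standardized (standardizing would rescale quality to unit variance and hide the very effect the example describes). It computes the sample covariance matrix and its eigen-decomposition with a small hand-written Jacobi eigenvalue routine (`jacobi_eigh` is our own helper, not a library function) so that only the standard library is needed.

```python
import math

# Hypothetical 0-10 ratings for eight apartments in one price range.
# Columns: space, location, quality. Quality barely changes between listings.
apartments = [
    [3, 6, 7.0], [5, 2, 7.1], [7, 8, 6.9], [9, 5, 7.0],
    [4, 9, 7.1], [8, 3, 7.0], [6, 4, 6.9], [2, 7, 7.0],
]


def covariance_matrix(rows):
    """Sample covariance matrix (n - 1 denominator) of the columns of `rows`."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def jacobi_eigh(a, tol=1e-20, max_sweeps=50):
    """Eigenpairs of a small symmetric matrix via cyclic Jacobi rotations.

    Returns (eigenvalues, eigenvectors), sorted by decreasing eigenvalue.
    """
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that entry (p, q) becomes zero.
                diff = a[q][q] - a[p][p]
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                rot = [[float(i == j) for j in range(n)] for i in range(n)]
                rot[p][p], rot[q][q], rot[p][q], rot[q][p] = c, c, s, -s
                rot_t = [list(col) for col in zip(*rot)]
                a = matmul(rot_t, matmul(a, rot))  # A <- R^T A R
                v = matmul(v, rot)                 # eigenvectors accumulate as columns
    order = sorted(range(n), key=lambda i: a[i][i], reverse=True)
    return [a[i][i] for i in order], [[v[r][i] for r in range(n)] for i in order]


cov = covariance_matrix(apartments)
eigenvalues, components = jacobi_eigh(cov)
total = sum(eigenvalues)
for k, (lam, vec) in enumerate(zip(eigenvalues, components), start=1):
    weights = ", ".join(f"{w:+.3f}" for w in vec)
    print(f"PC{k}: {lam / total:6.2%} of variance; weights (space, location, quality): {weights}")
```

With this made-up data, the third component points almost entirely along quality and accounts for well under 1% of the total variance, which is why PCA would drop it. In practice a library routine such as `numpy.linalg.eigh` would replace the hand-written solver.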



  • Why use PCA?




  • Reduction of dimensions 


In data analytics, PCA is used to make data simpler and easier to handle. In the example, PCA dropped the third dimension because quality hardly varied across the apartments. Discarding such low-variance dimensions makes the data more succinct, simpler and easier to store.
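
How many dimensions to drop is a judgment call. One common rule of thumb, sketched below as an assumption rather than a fixed rule, keeps the smallest number of components whose eigenvalues add up to a chosen share of the total variance (95% here). `components_to_keep` is a hypothetical helper, and the eigenvalues in the example are made-up numbers in the spirit of the apartment data.

```python
def components_to_keep(eigenvalues, threshold=0.95):
    """Smallest k whose top-k components explain at least `threshold` of total variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, lam in enumerate(ordered, start=1):
        running += lam
        if running / total >= threshold:
            return k
    return len(ordered)


# Hypothetical eigenvalues: two informative directions and one (quality) that barely varies.
print(components_to_keep([8.1, 3.9, 0.005]))  # → 2
```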



  • Easy representation of data 


Fewer dimensions also make the data easier to represent. In the apartment example, PCA would turn the three-dimensional graph into a two-dimensional one. Removing parameters that carry little information makes the result simpler without losing much of what matters for the decision.
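
A minimal sketch of the three-to-two-dimension step, assuming the same kind of hypothetical 0-10 ratings: centre each apartment's ratings and take dot products with the two leading principal components. To stay self-contained it finds those components by power iteration with deflation (`top_components` and `project` are our own helpers; a linear-algebra library would normally be used instead).

```python
import math

# Hypothetical 0-10 ratings: space, location, quality (quality barely varies).
apartments = [
    [3, 6, 7.0], [5, 2, 7.1], [7, 8, 6.9], [9, 5, 7.0],
    [4, 9, 7.1], [8, 3, 7.0], [6, 4, 6.9], [2, 7, 7.0],
]


def column_means(rows):
    return [sum(r[j] for r in rows) / len(rows) for j in range(len(rows[0]))]


def covariance_matrix(rows):
    """Sample covariance matrix (n - 1 denominator) of the columns of `rows`."""
    n, d, means = len(rows), len(rows[0]), column_means(rows)
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]


def top_components(cov, k, iterations=500):
    """Leading k eigenvectors of a symmetric PSD matrix (power iteration + deflation)."""
    d = len(cov)
    m = [row[:] for row in cov]
    found = []
    for _ in range(k):
        # Arbitrary start vector; assumed not orthogonal to the target eigenvector.
        v = [1.0 / (i + 1) for i in range(d)]
        for _ in range(iterations):
            w = [sum(m[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * m[i][j] * v[j] for i in range(d) for j in range(d))
        found.append(v)
        # Deflate: subtract the found direction so the next pass finds the next one.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return found


def project(rows, components):
    """Coordinates of each centred row along the given components."""
    means = column_means(rows)
    return [[sum((x - mu) * c for x, mu, c in zip(r, means, comp)) for comp in components]
            for r in rows]


coords_2d = project(apartments, top_components(covariance_matrix(apartments), k=2))
for ratings, (x, y) in zip(apartments, coords_2d):
    print(ratings, "->", f"({x:+.2f}, {y:+.2f})")
```

Each apartment ends up as a point in two dimensions. The variance along each new axis equals the corresponding eigenvalue of the covariance matrix, and the discarded third direction held only a tiny fraction of the total.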



  • Feature extraction 


Correlated features, such as price and quality in the apartment case, can be combined because they tend to move together. Building new variables out of combinations of the original ones in this way is known as feature extraction; each principal component is such a new variable. It is another way of making the data simpler and easier to understand.
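
As an illustration of feature extraction, the sketch below combines two correlated features into one. For two standardized variables with correlation r, the first principal component is their sum (or difference, if r is negative) divided by the square root of 2, and it carries a share (1 + |r|) / 2 of their combined variance. The price and quality figures are hypothetical.

```python
import math
import statistics

# Hypothetical data: price (in thousands) and a 0-10 quality rating for six apartments.
price = [210, 245, 260, 300, 320, 355]
quality = [5.5, 6.0, 6.8, 7.1, 7.9, 8.4]


def standardize(xs):
    """Rescale to mean 0 and sample standard deviation 1."""
    mu, sd = statistics.mean(xs), statistics.stdev(xs)
    return [(x - mu) / sd for x in xs]


zp, zq = standardize(price), standardize(quality)
r = sum(a * b for a, b in zip(zp, zq)) / (len(zp) - 1)  # Pearson correlation

# First principal component of two standardized variables: (z1 + sign(r) * z2) / sqrt(2).
sign = 1.0 if r >= 0 else -1.0
value_score = [(a + sign * b) / math.sqrt(2) for a, b in zip(zp, zq)]
share = (1 + abs(r)) / 2
print(f"r = {r:.3f}; the single extracted feature keeps {share:.1%} of the variance")
```

For this made-up data r is roughly 0.98, so the single extracted "value score" keeps about 99% of the variance of the two standardized features.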



  • Factor analysis 


Factor analysis is also a simplification method, but it concentrates on the correlations between variables. The apartment example helps here too. Take location: it is not a single, simple variable but is made up of several observed variables, such as distance from a hospital, availability of transport, the quality of basic services like electricity and water, and pollution. The variance of each of these observed variables has three components. The first is the shared (common) part, which the variable has in common with the other variables in its group. The other two are a unique factor specific to that variable and error. Factor analysis works only with the shared part. The common factors behind it, such as location itself, are never observed directly, which is why they are called latent variables.
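
The three-part split can be written as the standard common factor model. Using our own symbols, for one latent factor f (location) and centred observed variables x_i (hospital distance, transport, utilities, pollution): λ_i is the loading of x_i on f, s_i is its unique (specific) factor and e_i its error, all assumed uncorrelated with f and with each other.

```latex
\begin{aligned}
x_i &= \lambda_i f + s_i + e_i, \qquad \operatorname{Var}(f) = 1,\\
\operatorname{Var}(x_i) &= \underbrace{\lambda_i^2}_{\text{shared}}
  + \underbrace{\operatorname{Var}(s_i)}_{\text{unique}}
  + \underbrace{\operatorname{Var}(e_i)}_{\text{error}},\\
\operatorname{Cov}(x_i, x_j) &= \lambda_i \lambda_j \qquad (i \neq j).
\end{aligned}
```

Only the loadings, the shared part, appear in the covariances between different variables, which is why factor analysis can be fitted from the correlations alone.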



  • Why use factor analysis?


Factor analysis reduces the number of variables by grouping them under a smaller number of factors. This simplification makes decisions easier. It works by exploiting the variance the variables share and grouping them accordingly, while setting aside the errors and unique factors. Like PCA, it is used to simplify data and make decision-making easier.
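
The sketch below simulates that idea under stated assumptions: three hypothetical location indicators are generated from one latent factor plus independent unique noise, with loadings we chose ourselves. Because a one-factor model implies r_ij = λ_i λ_j, the loadings can be recovered from the three pairwise correlations alone (λ_1² = r_12 r_13 / r_23, and likewise for the others). This moment-based shortcut only works for exactly one factor, three indicators and positive correlations; it stands in for the maximum-likelihood or principal-axis fitting that statistical packages use.

```python
import math
import random

random.seed(0)

# Hypothetical loadings of three location indicators on one latent "location" factor.
true_loadings = {"hospital_access": 0.8, "transport": 0.7, "low_pollution": 0.6}
names = list(true_loadings)
n = 20_000

# Simulate x_i = loading_i * f + unique_i with Var(f) = 1 and Var(x_i) = 1.
data = {name: [] for name in names}
for _ in range(n):
    f = random.gauss(0, 1)  # latent factor: generates the data but is never "observed"
    for name, lam in true_loadings.items():
        unique = random.gauss(0, math.sqrt(1 - lam ** 2))  # specific factor + error
        data[name].append(lam * f + unique)


def corr(xs, ys):
    """Pearson correlation of two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


a, b, c = names
r_ab, r_ac, r_bc = corr(data[a], data[b]), corr(data[a], data[c]), corr(data[b], data[c])

# One factor implies r_ij = l_i * l_j, so each loading follows from the three correlations.
est = {
    a: math.sqrt(r_ab * r_ac / r_bc),
    b: math.sqrt(r_ab * r_bc / r_ac),
    c: math.sqrt(r_ac * r_bc / r_ab),
}
for name in names:
    shared = est[name] ** 2
    print(f"{name:16s} loading ~ {est[name]:.2f}  shared ~ {shared:.2f}  "
          f"unique+error ~ {1 - shared:.2f}")
```

The estimated loadings come out close to the chosen 0.8, 0.7 and 0.6, and the unique-plus-error variance of each indicator is simply what the shared factor leaves unexplained; the unique parts are never modelled individually.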


The purpose: PCA vs. FA

  • In PCA, the outcome accounts for the total variance: taken together, the principal components reproduce all of the variance in the original variables, so every variable can be explained from the outcome. Because every variable contributes to the components, all of them are reflected in the result. PCA therefore concentrates on variance.
  • In FA, however, only the common (shared) variance, which is the part the latent factors explain, contributes to the outcome, so that is the only part of the variables the outcome can explain. An FA result does not reveal anything about the errors and unique factors.

The outcome of an FA is shared variance, so FA concentrates on the correlations between variables.
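
In matrix form, with Σ the covariance matrix of the p observed variables (notation ours): PCA splits all of Σ into eigenvalues ℓ_k and orthonormal eigenvectors w_k, so the components together account for the total variance tr(Σ). FA instead writes Σ as a shared part ΛΛ^T built from m < p factors plus a diagonal matrix Ψ of unique-plus-error variances, so the factors only have to explain the covariances between different variables.

```latex
\begin{aligned}
\text{PCA:}\quad & \Sigma = \sum_{k=1}^{p} \ell_k\, w_k w_k^{\top},
  \qquad \operatorname{tr}(\Sigma) = \sum_{k=1}^{p} \ell_k,\\
\text{FA:}\quad & \Sigma = \Lambda \Lambda^{\top} + \Psi,
  \qquad \Psi = \operatorname{diag}(\psi_1, \dots, \psi_p),
  \qquad \Sigma_{ij} = (\Lambda \Lambda^{\top})_{ij} \ \ (i \neq j).
\end{aligned}
```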