What is high-dimensional data? How do we define it?

The concept of high-dimensional data is not difficult. Simply put, it is data with many dimensions (attributes). We usually work with one-dimensional data, or two-dimensional data that can be written as a table; high-dimensional data extends this by analogy. Once the number of dimensions grows large, however, the data becomes hard to visualize.

At present, high-dimensional data mining is a major focus of research.

Its characteristics are as follows:

High-dimensional data mining is data mining performed on high-dimensional data; its main difference from traditional data mining lies in the high dimensionality of the data. It has become both a focus and a difficulty of data mining research. As technology advances, data collection becomes easier, producing ever larger and more complex databases: trade transaction data, Web documents, gene expression data, document word-frequency data, user rating data, Web usage data, multimedia data, and so on, whose dimensionality (number of attributes) can reach hundreds or more.

Because high-dimensional data is so ubiquitous, research on high-dimensional data mining is very important. However, because of the "curse of dimensionality", high-dimensional data mining is extremely difficult and requires special methods. As the dimensionality increases, the performance of high-dimensional index structures degrades rapidly. In low-dimensional space we often use Euclidean distance to measure the similarity between data points, but in high-dimensional space this notion of similarity often breaks down, which poses a severe challenge: on the one hand, mining algorithms based on index structures lose performance; on the other hand, many mining methods based on full-space distance functions become invalid. Possible solutions include: reducing the data from high to low dimensionality and then applying low-dimensional methods; designing more efficient index structures, and using incremental or parallel algorithms, to improve performance; and redefining problems that become meaningless in high dimensions so that they make sense again.

High-dimensional data processing

PCA

Unsupervised

Use the covariance matrix to find a projection vector w such that the dispersion (variance) of the data after projection into the low-dimensional space is maximized; the constrained maximization is solved with Lagrange multipliers, which reduces to an eigenvalue problem.

Select eigenvectors according to the magnitude of their eigenvalues

Generally, keep the set of eigenvectors whose cumulative explained variance (information rate) exceeds 90%

For data where N is much larger than D, solve with SVD (singular value decomposition)

First reduce the dimensionality of the features, then train the model
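The PCA steps above can be sketched as follows. This is a minimal illustration, not a production implementation; the function name, the toy data, and the way the 90% information rate is applied are my own choices:

```python
import numpy as np

def pca(X, info_rate=0.90):
    """PCA via eigendecomposition of the covariance matrix.
    Keeps the fewest components whose cumulative explained
    variance reaches `info_rate`."""
    Xc = X - X.mean(axis=0)                    # center the data
    cov = np.cov(Xc, rowvar=False)             # D x D covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]          # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum() # cumulative explained variance
    k = np.searchsorted(ratio, info_rate) + 1  # smallest k reaching the rate
    W = eigvecs[:, :k]                         # D x k projection matrix
    return Xc @ W, W                           # projected data and projection

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10)) @ rng.normal(size=(10, 10))  # correlated toy data
Z, W = pca(X)
```

For very large or very wide data one would instead take the (truncated) SVD of the centered data matrix, whose right singular vectors are the same principal directions.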

LDA

Supervised

Seeks a projection that minimizes the within-class variance while maximizing the between-class separation
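For the two-class case this criterion has a closed form (Fisher's linear discriminant): the optimal direction is proportional to the inverse within-class scatter times the difference of the class means. A small sketch, with toy data of my own:

```python
import numpy as np

def fisher_lda(X0, X1):
    """Two-class Fisher LDA: find w maximizing between-class
    separation relative to within-class variance.
    Closed form: w is proportional to Sw^{-1} (m1 - m0)."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # within-class scatter = sum of per-class scatter matrices
    Sw = np.cov(X0, rowvar=False) * (len(X0) - 1) \
       + np.cov(X1, rowvar=False) * (len(X1) - 1)
    w = np.linalg.solve(Sw, m1 - m0)       # optimal projection direction
    return w / np.linalg.norm(w)

rng = np.random.default_rng(1)
X0 = rng.normal(loc=0.0, size=(100, 5))   # class 0 around the origin
X1 = rng.normal(loc=2.0, size=(100, 5))   # class 1 shifted away
w = fisher_lda(X0, X1)
```

Projecting both classes onto `w` (i.e. `X0 @ w`, `X1 @ w`) should yield clearly separated one-dimensional distributions.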

SOM

A clustering method

- For each input, find the best-matching node and use the difference between input and node to update that node and its neighbors within a surrounding range
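That update rule can be sketched with a minimal one-dimensional SOM. This is only an illustration under my own assumptions (a 1-D node grid, Gaussian neighborhood, linearly decaying learning rate and radius):

```python
import numpy as np

def som_train(X, nodes=8, epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Minimal 1-D SOM: for each sample, find the best-matching unit
    (BMU), then pull the BMU and its grid neighbors toward the sample,
    with a neighborhood radius that shrinks over time."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(nodes, X.shape[1]))           # node weight vectors
    idx = np.arange(nodes)                             # node grid positions
    for t in range(epochs):
        frac = 1.0 - t / epochs
        lr, sigma = lr0 * frac, sigma0 * frac + 0.5    # decaying rate/radius
        for x in X:
            bmu = np.argmin(((W - x) ** 2).sum(axis=1))         # winning node
            h = np.exp(-((idx - bmu) ** 2) / (2 * sigma ** 2))  # neighborhood
            W += lr * h[:, None] * (x - W)             # pull nodes toward x
    return W

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 0.3, (50, 2)),            # two toy clusters
               rng.normal(3, 0.3, (50, 2))])
W = som_train(X)
```

After training, the node weights should lie near the two clusters, with neighboring nodes on the grid mapping to nearby regions of the data.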

MDS

Unsupervised dimensionality reduction

Focuses on the relative distances (relationships) between data points, which makes it well suited to dimensionality reduction and visualization of manifold data.

However, the global structure of the original data can be seriously distorted.

Three basic steps:

Compute the stress

Update the projected configuration

Check the disparities for convergence
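The three steps above form the core loop of metric MDS. A compact sketch using the SMACOF (Guttman-transform) update, with a toy distance matrix of my own; the variable names and stopping rule are assumptions:

```python
import numpy as np

def mds(D, dim=2, iters=100, tol=1e-9, seed=0):
    """Toy metric MDS: iteratively minimize the stress
    sum_{i<j} (D_ij - d_ij)^2 via the SMACOF (Guttman) update."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    Z = rng.normal(size=(n, dim))              # random initial configuration
    prev = np.inf
    for _ in range(iters):
        diff = Z[:, None, :] - Z[None, :, :]
        d = np.sqrt((diff ** 2).sum(-1))       # current low-dim distances
        np.fill_diagonal(d, 1.0)               # avoid division by zero
        stress = ((D - d) ** 2)[np.triu_indices(n, 1)].sum()  # 1) stress
        B = -D / d
        np.fill_diagonal(B, 0.0)
        np.fill_diagonal(B, -B.sum(axis=1))
        Z = B @ Z / n                          # 2) update the configuration
        if prev - stress < tol:                # 3) check convergence
            break
        prev = stress
    return Z, stress

rng = np.random.default_rng(3)
P = rng.normal(size=(30, 2))                   # true 2-D points
D = np.sqrt(((P[:, None] - P[None]) ** 2).sum(-1))  # their distance matrix
Z, stress = mds(D, dim=2)
```

Since the distances here come from genuinely two-dimensional points, the recovered configuration should reproduce them with low stress (up to rotation and translation).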

ReliefF

ReliefF handles multi-class problems, whereas the original Relief can only handle two-class problems.

Used to assign weights to features and to filter features by those weights.

Algorithm input: data set D with c classes; number of sampled instances m; weight threshold δ; nearest-neighbor count k.
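The steps can be sketched as: sample m instances; for each, find its k nearest hits (same class) and k nearest misses (each other class); decrease a feature's weight by its average difference to the hits and increase it, weighted by class priors, by its average difference to the misses. A simplified sketch, assuming features scaled to [0, 1] and Manhattan distance (the function name and toy data are mine):

```python
import numpy as np

def relieff(X, y, m=50, k=5, seed=0):
    """Simplified ReliefF: weight each feature by how much it differs
    toward nearest misses (other classes) versus nearest hits (same
    class). Assumes features are scaled to [0, 1]."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes, counts / n))     # class prior probabilities
    W = np.zeros(d)
    for i in rng.choice(n, size=m, replace=False):
        x, c = X[i], y[i]
        for cls in classes:
            cand = np.where(y == cls)[0]
            cand = cand[cand != i]                       # exclude the sample
            dist = np.abs(X[cand] - x).sum(axis=1)       # Manhattan distance
            nn = cand[np.argsort(dist)[:k]]              # k nearest of class
            diff = np.abs(X[nn] - x).mean(axis=0)        # mean feature diff
            if cls == c:
                W -= diff / m                            # hits: penalize
            else:
                W += prior[cls] / (1 - prior[c]) * diff / m  # misses: reward
    return W

rng = np.random.default_rng(4)
y = rng.integers(0, 2, size=200)
X = rng.random((200, 4))
X[:, 0] = 0.8 * y + 0.2 * rng.random(200)    # feature 0 tracks the class
W = relieff(X, y)
```

Features whose weight falls below the threshold δ would then be filtered out; here the class-informative feature 0 should receive the largest weight.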

LLE and ISOMAP

Summary

The basic idea of high-dimensional data modeling is to find a function f(x) such that:

f(x) projects the data into a low-dimensional space

Certain features of the data are preserved in the low-dimensional space

Method selection:

To reduce dimensionality and improve the analyzability of the data, use PCA; for very large data sets, use SVD

To emphasize between-class and within-class distinctions, use LDA

To preserve the interrelationships between data points when the data is not linearly separable, use MDS

For manifold data, use LLE and ISOMAP
