Data Analytics

We address techniques and methods for analysing Big Data sets, focusing on scalable machine-learning and data mining techniques, query processing approaches, and data-privacy methods. These include classification, clustering, regression, subspace search, and outlier detection, as well as the development of methods for identifying and describing clusters and outliers in Big Data sets and for automating this process. Research elements include the parallelization of existing approaches and the optimization of already parallelized ones, with the eventual aim of replacing traditional sampling with analysis of complete Big Data sets while balancing the benefits and risks of noisy data.

These techniques, methods, and approaches target three essential characteristics of Big Data: Volume, Velocity, and Variety. They are also important for large-scale data because they reduce the amount of data that must subsequently be analysed in detail, both by identifying relevant subsets and by cleaning the data of anomalies or entropy. Query processing includes developing selectivity-estimation techniques and assessing their uncertainty. Supervised and unsupervised learning models are used to explain datasets, to derive knowledge from them, or to build predictive models.
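As an illustration of cleaning data of anomalies before further analysis, the following minimal sketch filters a list of values with a modified z-score based on the median absolute deviation. The function name `drop_outliers`, the threshold of 3.5, and the sample readings are assumptions chosen for this example and do not describe a specific method used by the group.

```python
import statistics

def drop_outliers(values, threshold=3.5):
    """Return the values whose modified z-score is at most `threshold`."""
    median = statistics.median(values)
    # Median absolute deviation: a spread estimate that is robust to outliers.
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        return list(values)  # no spread to measure against; keep everything
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [v for v in values if 0.6745 * abs(v - median) / mad <= threshold]

# Hypothetical sensor readings with one obvious anomaly.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 55.0, 10.2]
cleaned = drop_outliers(readings)
```

The median-based score is used here instead of the mean and standard deviation because a single extreme value distorts the median far less.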

One objective in Analytics is the development of improved and more efficient data mining methods for correlation analysis. In particular, scalable and intelligent correlation analysis on Big Data is envisaged to reduce its volume and to make it accessible to subsequent analysis such as clustering or outlier detection.
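How correlation analysis can reduce data volume can be sketched as follows: columns that are almost perfectly correlated with an already kept column carry little additional information and can be dropped before clustering or outlier detection. This is a simple pairwise Pearson sketch, not a scalable method; the names `pearson` and `prune_correlated`, the threshold of 0.95, and the sample columns are assumptions for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no defined correlation; treat as 0
    return cov / math.sqrt(var_x * var_y)

def prune_correlated(columns, threshold=0.95):
    """Keep a column only if it is not strongly correlated with a kept one."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical data: temp_f is a linear function of temp_c, so it is redundant.
columns = {
    "temp_c": [20.0, 22.0, 19.0, 25.0, 23.0],
    "temp_f": [68.0, 71.6, 66.2, 77.0, 73.4],
    "humidity": [40.0, 55.0, 48.0, 42.0, 60.0],
}
selected = prune_correlated(columns)
```

The pairwise comparison is quadratic in the number of columns, which is exactly the kind of cost that scalable correlation analysis would need to avoid on Big Data.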

Another objective is privacy-aware Big Data analytics. Here, the main goal is to carry out data analytics tasks on user-related data while guaranteeing privacy.
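The text does not name a particular privacy technique. One widely used option is differential privacy, and the sketch below, under that assumption, answers a counting query with Laplace noise. The names `laplace_noise` and `private_count`, the value of `epsilon`, and the sample ages are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)

def private_count(records, predicate, epsilon, rng):
    """Return a noisy count of the records that satisfy `predicate`."""
    true_count = sum(1 for r in records if predicate(r))
    # Adding or removing one record changes a count by at most 1,
    # so the sensitivity is 1 and the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Hypothetical user-related data: ages of eight users.
ages = [23, 35, 41, 29, 52, 38, 61, 19]
rng = random.Random(42)  # fixed seed only to make the example reproducible
noisy = private_count(ages, lambda age: age >= 30, epsilon=1.0, rng=rng)
```

A smaller `epsilon` gives stronger privacy but a noisier answer, so the value has to be chosen per analysis task.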

Research results obtained together with user communities from various scientific disciplines are contributed to the Research Data Alliance (RDA) Big Data Analytics (BDA) Interest Group.