

IEEE BigData 2014 Coimbra Satellite Session

Publication date: 08-05-2014 09:45


Venue: Amphitheatre B1, DEI

Date: 16 May (Friday)

Time: 09:00 to 18:30

The purpose of the meeting is to discuss Big Data topics. It is intended to be open to the community and is expected to include the participation of researchers and students.

IEEE BigData 2014 Coimbra Satellite Session, an event held in conjunction with the IEEE 2014 3rd International Congress on Big Data

Theme: Big Data, Cloud Data and Applications

URL: http://www.ieeebigdata.org/2014/satellite/coimbra/index.html

Preliminary version of the program:

http://www.ieeebigdata.org/2014/satellite/coimbra/program.html

IEEE BigData 2014 - Coimbra Satellite Session - Keynote/Invited Talks

Paulo Marques

FeedZai

Using Machine Learning and Big-Data to Fight Payment Fraud

Payment fraud and data breaches are on the rise. Every year, millions of credit-card numbers are compromised. In the recent Target breach, between 70 and 110 million people had their information stolen and purchases made in their name. In this talk we will discuss how, at Feedzai, we leverage machine learning and big-data techniques to make commerce safe. In particular, we'll discuss fraud trends, the challenges of dealing with high-volume, high-velocity, skewed data, and how machine learning and big data provide a strong foundation for fighting fraudsters.
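The abstract stays at a high level, but the core difficulty it names, learning from high-volume and heavily skewed transaction data, can be made concrete with a minimal sketch. The example below is a hypothetical illustration using synthetic data and scikit-learn's class weighting; it is not Feedzai's actual pipeline.

    # Minimal sketch: training a fraud classifier on skewed data.
    # Hypothetical illustration with synthetic data, not Feedzai's pipeline.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import classification_report
    from sklearn.model_selection import train_test_split

    # Simulate 100k transactions where only ~1% are fraudulent.
    X, y = make_classification(n_samples=100_000, n_features=20,
                               weights=[0.99, 0.01], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y,
                                              random_state=0)

    # class_weight="balanced" re-weights the rare fraud class so the
    # model is not dominated by the mass of legitimate payments.
    clf = LogisticRegression(class_weight="balanced", max_iter=1000)
    clf.fit(X_tr, y_tr)
    print(classification_report(y_te, clf.predict(X_te),
                                target_names=["legitimate", "fraud"]))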

------------------------------------------------------------------------------------------

Orlando Belo

“Small and Big Data” – Opposite Vectors on Dimensional Modelling

Department of Informatics

ALGORITMI R&D Centre

School of Engineering

University of Minho

The implementation of a data warehousing system depends largely on the way it is designed to meet business requirements and decision-makers' needs, accompanied by a huge host of hellish conditions and operational aspects. However, if we consider the usefulness and utility of the system, there is a very important aspect that we should take strongly into consideration, not only because it represents the materialization of business and analytical exploration requirements, but also because it determines the performance of the system in serving users' ad hoc operations. We are talking, of course, about the way we do dimensional modelling. It provides an effective way to develop successful multidimensional data schemas, with the ability to host all analysis dimensions as well as the facts that sustain sophisticated temporal analysis over them. Stars, snowflakes and constellations are common terms used to represent the types of configurations that dimensional schemas can assume.

By default, dimensional schemas are designed to accommodate large volumes of data. Usually they are arranged to support and satisfy any query (almost) immediately, even when it is very complex and demanding, such as a star-join. With the advent of "big data", that is, intensive data processing, the volume of information managed by a data warehouse today seems to be positioned in another kind of dimension: "smaller". However, the influence of application scenarios labelled as big data is not very clear, and we do not know what their real impact on data warehouse components is. Thus, in this talk we are interested in identifying such influence in a more concrete way, taking a typical big data scenario and discussing some of the most relevant aspects in the development of a data warehouse system, with particular emphasis, obviously, on the design and implementation of dimensional data models.
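To make the abstract's vocabulary concrete, the following sketch builds a toy star schema (one fact table and two dimension tables) and runs a star-join aggregation in pandas. All table and column names are hypothetical illustrations, not material from the talk.

    # Minimal sketch of a dimensional (star) schema and a star-join,
    # illustrated with pandas; all table and column names are hypothetical.
    import pandas as pd

    # Dimension tables: descriptive attributes for each analysis dimension.
    dim_date = pd.DataFrame({"date_key": [1, 2], "year": [2013, 2014]})
    dim_product = pd.DataFrame({"product_key": [10, 11],
                                "category": ["books", "music"]})

    # Fact table: foreign keys into each dimension plus a numeric measure.
    fact_sales = pd.DataFrame({"date_key": [1, 1, 2, 2],
                               "product_key": [10, 11, 10, 11],
                               "amount": [120.0, 80.0, 200.0, 50.0]})

    # A star-join: join the fact table to its dimensions, then aggregate
    # the measure along the chosen analysis dimensions.
    star = (fact_sales
            .merge(dim_date, on="date_key")
            .merge(dim_product, on="product_key"))
    print(star.groupby(["year", "category"])["amount"].sum())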

-----------------------------------------------------------------------------------------

Alfredo Cuzzocrea

ICAR-CNR & University of Calabria, Rende, Cosenza, Italy

Data Warehousing and OLAP over Big Data

Data warehousing and OLAP over Big Data is becoming one of the emerging challenges for next-generation research, with special emphasis on data-intensive Cloud infrastructures. As a consequence, several studies are focusing their attention on this relevant issue, and various open problems arise. This evidence has inspired our study, which provides a comprehensive overview of current open research problems in the context of data warehousing and OLAP over Big Data, along with a deep critical discussion of the future research directions to be taken along this challenging road.
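As a concrete, if simplified, picture of the workload in question, the sketch below computes a small OLAP-style roll-up (an aggregation with subtotals) over a toy dataset in pandas; in a Big Data setting the same group-by/roll-up pattern would run on a distributed engine. The data and names are hypothetical.

    # Minimal sketch of an OLAP-style roll-up over a toy dataset.
    # Hypothetical data; a Big Data setting would distribute this pattern.
    import pandas as pd

    sales = pd.DataFrame({
        "region":  ["north", "north", "south", "south"],
        "quarter": ["Q1", "Q2", "Q1", "Q2"],
        "revenue": [100.0, 150.0, 90.0, 120.0],
    })

    # margins=True adds the "All" subtotals, i.e. the coarser
    # aggregation levels that an OLAP roll-up would materialize.
    cube = pd.pivot_table(sales, values="revenue", index="region",
                          columns="quarter", aggfunc="sum", margins=True)
    print(cube)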

------------------------------------------------------------------------------------------

Pedro Martins

University of Coimbra

Automatically scaling the ETL process for freshness-preserving in high-rate data warehousing

We investigate how to provide scalability and data freshness automatically, and how to handle high-rate data efficiently in very large data warehouses. In general, data freshness is not guaranteed in these contexts, since data loading, transformation and integration are heavy tasks that are performed only periodically, instead of row by row.

Many current data warehouse deployments are designed to work on a single server, although for many applications, problems related to data volume, processing times, data rates, and requirements for fresh and fast responses increasingly make this approach inadequate. The solution is to use or build parallel architectures and mechanisms to speed up data integration and handle fresh data efficiently.

Ideally, users developing data warehouses should be able to concentrate solely on the conceptual and logical design (e.g. business-driven requirements, logical warehouse schemas, workload analysis and the ETL process), while physical details, including mechanisms for scalability, freshness and the integration of high-rate data, should be left to automated tools.

We propose a universal data warehouse parallelization solution, that is, an approach that enables the automatic scalability and freshness of any data warehouse.
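The abstract describes goals rather than mechanisms, but the general idea of scaling the loading stage with the incoming data rate can be sketched as follows. The queue-based design, the thresholds and all names below are illustrative assumptions, not the authors' actual solution.

    # Illustrative sketch of rate-driven ETL scaling: loader workers are
    # added while the input queue grows, so data is integrated in small
    # micro-batches instead of large periodic loads. Hypothetical design.
    import queue
    import threading
    import time

    inbox = queue.Queue()
    warehouse = []                     # stands in for the warehouse tables
    lock = threading.Lock()

    def loader(stop):
        # Each worker drains small micro-batches, keeping data fresh.
        while not stop.is_set():
            batch = []
            try:
                for _ in range(100):
                    batch.append(inbox.get_nowait())
            except queue.Empty:
                pass
            if batch:
                with lock:
                    warehouse.extend(batch)   # transform + load would go here
            else:
                time.sleep(0.01)

    stop = threading.Event()
    workers = [threading.Thread(target=loader, args=(stop,))]
    workers[0].start()

    # Naive controller: add a worker whenever the backlog exceeds a bound.
    for i in range(10_000):
        inbox.put({"row": i})
        if inbox.qsize() > 1_000 and len(workers) < 8:
            w = threading.Thread(target=loader, args=(stop,))
            w.start()
            workers.append(w)

    while not inbox.empty():
        time.sleep(0.01)
    stop.set()
    for w in workers:
        w.join()
    print(f"loaded {len(warehouse)} rows with {len(workers)} workers")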