Budapest Data 2015: Full Schedule

17:00 CEST

Data Job Fair

The free Data Job Fair brings together people from all kinds of data-related jobs and the companies looking for them.

Data Cinema
As a warm-up exercise we will show short movies on the world of data and data analysis. The program will include, among others, several spectacular data visualization clips and TED lectures. The final selection will be decided by audience vote.

Data Careers talks
There are many different paths leading to the world of data. We will invite several data professionals from different companies, who will tell us how they found their own way.

HR Pitch Competition
We are running an HR pitch competition for the exhibitors. Five of these companies will have the opportunity to present themselves and try to persuade the audience that they are the best place to work. The title of best pitch will be decided by the audience.

Online Data Job Board
The open positions will be available online as well, so anyone who cannot attend the event can check them out later.

Detailed program, list of exhibitors and speakers available here: datajobfair.hu

This event is FREE.

Tuesday June 2, 2015 17:00 - 20:00 CEST
Hotel

Plenary

19:00 CEST

Big Data Meetup

Tuesday June 2, 2015 19:00 - 21:00 CEST
Mátyás I-II.

Plenary

08:00 CEST

Registration / Regisztráció

Wednesday June 3, 2015 08:00 - 09:00 CEST
Hotel

Break

09:00 CEST

Welcome Talk

Welcome notes by the conference host

Wednesday June 3, 2015 09:00 - 09:15 CEST
Mátyás I-II.

Plenary

09:15 CEST

Innovations in Big Data

This talk gives an overview of the Big Data innovation landscape in Hungary using several examples of successful Hungarian companies working in this field.

Speakers

Bence Arató

Managing Director, BI Consulting

Managing Director of BI Consulting Hungary. He has been in the BI industry since 1995 as an analyst, architect and consultant. He advises companies on general BI strategy, project and architecture planning, and vendor and tool selection. Also provides QA and on-the-job mentoring services...

Wednesday June 3, 2015 09:15 - 09:45 CEST
Mátyás I-II.

Plenary

09:45 CEST

The Future of Our Past, or How to Build a Time Machine

We live our lives through digital services and connected devices and generate data at an unprecedented volume. The footsteps we leave behind in the digital snow of our lives define fundamentally who we are. The digital data traces of our existence form the photo album of a lifetime. For both consumers and creators of digital products all this comes with far-reaching implications regarding ownership, privacy and longevity – with great data comes great responsibility. This talk will give consumers some principles by which to make an informed decision about whom they entrust their digital memories to, and offer practical advice to digital product makers about how to address these issues responsibly.

Speakers

Pascal Raabe

Designer, Ustwo

Paz is a designer at the digital product studio Ustwo in London. He has a passion for emerging technology and an ambition to help shape the future through truly meaningful design across platforms and disciplinary boundaries. This has led to a fascination with the pervasiveness of...

Wednesday June 3, 2015 09:45 - 10:30 CEST
Mátyás I-II.

Plenary

10:30 CEST

Break / Szünet

Wednesday June 3, 2015 10:30 - 11:00 CEST
Mátyás I-II.

Break

11:00 CEST

Big Mistakes to Avoid When Performing Big Data Analytics

Mark Twain, a famous American author, once stated that there are “Lies, Damned Lies, and Statistics.” This phrase is used to describe the persuasive power of numbers, particularly the use of statistics, to lead people to draw incorrect conclusions. This workshop describes the subtle mistakes that can easily be made when interpreting the results from an analytic study or report. We describe logically sound processes for deciphering data using methods designed to illuminate actionable information for data scientists without distracting or misleading the knowledge worker from the relevant facts needed for effective decision-making.

Learn about common mistakes in interpreting data that lead to incorrect conclusions.

Learn best practice for avoiding errors when conducting analytics with Big Data.

Learn about the skill sets that data scientists must possess to avoid big mistakes when performing big data analytics.

Speakers

Stephen Brobst

Chief Technology Officer, Teradata

Stephen performed his graduate work in Computer Science at the Massachusetts Institute of Technology where his Masters and PhD research focused on high-performance parallel processing. He also completed an MBA with joint course and thesis work at the Harvard Business School and the...

Wednesday June 3, 2015 11:00 - 12:00 CEST
Mátyás I-II.

Plenary

12:00 CEST

Lunch / Ebéd

Wednesday June 3, 2015 12:00 - 13:30 CEST
Hotel

Break

13:30 CEST

"Houston! Do we have a problem?" or, a Social Media Command Centre Built on Big Data

Social mining, the analysis and monitoring of data generated on social media channels, has featured among Big Data trends for years. This talk presents an in-house corporate function that uses the social mining toolkit to keep its finger on the pulse of the digital world, much like an air traffic control centre, and that can also initiate various actions built on social media.

Speakers

Ponori-Thewrewk Ajtony

Project Manager, T-Systems Magyarország

A member of the BI team at T-Systems and its predecessor companies since 2007. First as a developer, then from 2008 as development lead, lead architect and project manager on several data warehouse projects, he has gained experience in data warehouse development, exploitation...

Wednesday June 3, 2015 13:30 - 14:00 CEST
Mátyás II.

Big Data

13:30 CEST

Heisenberg and the uncertainty laws of BI

Heisenberg's uncertainty principle is any of a variety of mathematical inequalities asserting a fundamental limit to the precision with which certain pairs of physical properties of a particle known as complementary variables, such as position x and momentum p, can be known simultaneously.
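
For reference, the best-known form of the principle, for the position and momentum named above, can be written as follows. This is the standard textbook statement, added here as an illustration rather than taken from the talk; \sigma_x and \sigma_p denote the standard deviations of position and momentum, and \hbar is the reduced Planck constant.

```latex
\sigma_x \, \sigma_p \;\ge\; \frac{\hbar}{2}
```

The smaller the spread in one quantity, the larger the spread in the other must be, which is the kind of trade-off the talk uses as an analogy.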

In simple words, there are situations in which achieving two seemingly independent goals becomes impossible (or is possible only up to a certain limit).
In this presentation I will investigate a few classic BI problems of a similar nature. In each case there are two approaches, or dual goals, that cannot both be fulfilled without a trade-off. These problems manifest in most BI/DWH/data management systems.
I will highlight the reasons behind these classic dilemmas. We will check industry best practices and Teradata-specific answers for handling these situations.

The data modelling paradox
A data model can't be flexible AND user-friendly/simple at the same time.

The Business' BI vs IT's BI paradox
Business can't wait for IT development cycles. IT doesn't support non-standardized, hard-to-operate solutions.

The Classic DWH vs Big Data paradox
Websites, meters, mobile apps, etc. generate more and ever-changing data that can't be handled with the classic BI toolset. Failing fast is important. New emerging technologies and packs of fresh data scientists promise to solve the problem, but can they do the same with good old DWH/BI?

Speakers

Vágó Zoltán

Senior DWH Consultant, Teradata Hungary

Zoltán is a BI & DWH expert. In his previous positions he acted as the head of the BI team at Vodafone, participated in the mobile / landline data warehouse consolidation at Magyar Telekom. Recently he has been working for Teradata Hungary.

Wednesday June 3, 2015 13:30 - 14:00 CEST
Mátyás I

Data Warehousing

14:05 CEST

Machine learning on Big Data – big benefits or wasting resources?

In the era of Big Data, it is easy to get carried away by the hype and statements like one should collect and use all the data that is available. For data collection, with the falling prices of data storage, this seems to be a completely valid statement. However, different types of analytics methods have different scale economies, and they do not necessarily benefit from running on billions of records. Machine learning has always been considered the holy grail in Big Data analytics. New platforms like Apache Spark are finally enabling data scientists all around the world to solve large-scale machine learning problems. Is it a power tool that we should use for all problems? Are there any drawbacks? This session shows some eye-opening results from Apache Spark experiments that challenge our intuition on machine learning. The session includes a short introduction of all subjects covered, but a basic understanding of machine learning is certainly helpful for the audience.

Speakers

Prekopcsák Zoltán

VP Big Data, RapidMiner

Zoltan Prekopcsak is the Vice President of Big Data at RapidMiner, the leader in Modern Analytics. Previously, he was co-founder and CEO of Radoop, before its acquisition by RapidMiner. Prekopcsak has experience in data-driven projects in various industries including telecommunications...

Wednesday June 3, 2015 14:05 - 14:35 CEST
Mátyás II.

Big Data

14:05 CEST

What makes a good ETL system

There is no BI without data warehouses and there is no data warehouse without an ETL system. ETL processes are crucial in the life of data-driven companies. Several ETL tools are available, both open source and commercial, but none of them is widely adopted and there is no standard tool for this problem. In my talk I will point out the characteristics of good ETL frameworks, compare the existing ones and outline their best use cases.

Speakers

Göbölös-Szabó Julianna

Data Infrastructure Engineer, Prezi

Prior to joining Prezi, Julianna studied mathematics and then researched large networks from a data mining perspective. At Prezi she is responsible for the stability of the data infrastructure, which includes developing and operating Prezi's own ETL framework that runs thousand...

Wednesday June 3, 2015 14:05 - 14:35 CEST
Mátyás I

Data Warehousing

14:40 CEST

Agile Big Data Management with SAS Data Loader for Hadoop

Technological innovations and agile methodology are having an ever stronger effect on the foundational levels of information management. But what does this mean for data management? The talk presents SAS's vision for self-service data management.

Speakers

Szász Viktor

Business Analytics Presales Consultant, SAS

Szász Viktor has spent 10 years implementing business analytics solutions and providing professional consulting. He began his career as a software consultant on the consulting team of SAS Institute Kft., and later, as a project manager, project management and consulting...

Wednesday June 3, 2015 14:40 - 15:10 CEST
Mátyás II.

Big Data

14:40 CEST

Data Warehouses in 2015, Extension and Acceleration: Big Data and the Relational World, In-Memory, Exadata

The talk shows how modern technologies such as application-transparent, columnar, memory-centric data management and other advances can make the data management architecture even more powerful, how the integration of "traditional" relational data and Big Data-type data can be put to use in data warehouse environments, and what present and future outlook Exadata technology offers for data warehouses.

Speakers

Fekete Zoltán

Principal Pre-sales Consultant, Oracle

Among Oracle products, Fekete Zoltán first started working with the Oracle Express multidimensional technology in 1996. He has worked in pre-sales at Oracle since 1998. In the business intelligence and data warehousing area, with analysis and planning tools, reporting...

Wednesday June 3, 2015 14:40 - 15:10 CEST
Mátyás I

Data Warehousing

15:10 CEST

Book signing - Hadoop: The Definitive Guide

Book signing by Tom White - Hadoop: The Definitive Guide
Conference attendees can get a free, signed copy of the book. Only a limited number of copies are available!

Speakers

Tom White

Cloudera

Tom White is one of the foremost experts on Hadoop. He has been an Apache Hadoop committer since February 2007, and is a Member of the Apache Software Foundation. Tom is a software engineer at Cloudera, where he has worked, since its foundation, on the core distributions from Apache...

Wednesday June 3, 2015 15:10 - 15:40 CEST
Hotel

Break

15:10 CEST

Break / Szünet

Break

Wednesday June 3, 2015 15:10 - 15:40 CEST
Mátyás I

Break

15:40 CEST

Being a Data Janitor for 10m+ Users - Tips and Tools from the Trenches

The evolution of the data setup at 6Wunderkinder, makers of Wunderlist: a case study covering the architecture, the tool suite and the processes we use every day.

Speakers

Molnár Dániel

Data Scientist, 6Wunderkinder GmbH

Data scientist at 6Wunderkinder in Berlin, Germany working on Wunderlist. Generalist in a tight-knit data team enabling data driven company culture and operations as a data janitor, data analyst and occasional data scientist. Doing ETL and Data Quality, defining company-wide KPIs...

Wednesday June 3, 2015 15:40 - 16:10 CEST
Mátyás II.

Big Data

15:40 CEST

Apache Spark – The modern data analytics platform

Apache Spark is one of the fastest-developing tools in the Hadoop world, so it is no surprise that this fast, fault-tolerant data analytics system for batch and stream processing has become a popular choice among data scientists and data engineers. Its fault tolerance and scalability cover all aspects of data analysis, from small databases to petabytes of data.
In this talk the speaker introduces the basic functionality of Apache Spark through a use case, to help users who have no experience with this tool yet to use it easily when implementing data analytics solutions. In the second part of the talk the speaker will demonstrate how to run an Apache Spark program in a cloud environment on large databases.

Speakers

Gulyás Máté

CTO, enbrite.ly

Wednesday June 3, 2015 15:40 - 16:10 CEST
Mátyás I

Spark

16:15 CEST

Applying Hadoop to Scientific Computing

Some of the largest datasets in the world are generated from scientific experiments. For example, around 20 petabytes of genomic data was created in 2014, and many times that figure will be created this year. Big data tools like Hadoop are increasingly important for making sense of such datasets. In this talk, I look at some examples where Hadoop has been used for processing large scientific datasets, and what technologies and tools from the Hadoop ecosystem are being used.
The tools are still young, and there are many improvements needed to make Hadoop and related technologies more accessible to the practicing scientist. I will look at some of the trends that we are seeing in interactive scientific notebooks, cloud, and machine learning and how their convergence is good for users.

Speakers

Tom White

Cloudera

Wednesday June 3, 2015 16:15 - 16:45 CEST
Mátyás II.

Big Data

16:15 CEST

Hive powered by Spark

Apache Hive has become the de facto standard for SQL on big data in the Hadoop ecosystem. It is used extensively in data warehousing and data analytics with big data. Not long ago, Hive queries could only run on MapReduce and Tez. As Apache Spark has matured as an open-source cluster computing framework for data analytics, it has also been introduced to Apache Hive as a new, powerful execution engine. The obvious benefits are making Hive available to Spark users and providing better performance and response times for existing Hive users. This presentation will cover the motivation, design principles, architecture, etc., followed by a demo.

Speakers

Xuefu Zhang

Software Engineer, Cloudera

Xuefu Zhang has over 10 years' experience in software development. Working for Cloudera since May 2013, he spends a lot of his efforts on Apache Hive and Pig. He also worked in the Hadoop team at Yahoo when the majority of the development on Hadoop was still there. Xuefu Zhang is...

Wednesday June 3, 2015 16:15 - 16:45 CEST
Mátyás I

Spark

16:50 CEST

Big fast data in high-energy particle physics

Experiments at CERN (the European Organization for Nuclear Research) generate colossal amounts of data. Physicists must sift through about 30 petabytes of data produced annually in their search for new particles and interesting physics. The tidal wave of data produced by the Large Hadron Collider (LHC) at CERN poses an unprecedented challenge for experiments' data acquisition systems, and it is the need to select rare physics processes with high efficiency while rejecting high-rate background processes that drives the architectural decisions and technology choices. Although filtering and managing large data sets is of course not exclusive to particle physics, the approach that has been taken is somewhat unique. In this talk, I will describe the typical journey taken by data from the readout electronics of one experiment to the results of a physics analysis.

Speakers

Andrew Lowe

Scientific Research Fellow, Wigner Research Centre for Physics, Hungarian Academy of Sciences

Andrew Lowe is a particle physicist at the Wigner Research Centre for Physics, Hungarian Academy of Sciences, in Budapest. He spent several years based at the European Organization for Nuclear Research (CERN) in Geneva and was a member of the collaboration that discovered the Higgs...

Wednesday June 3, 2015 16:50 - 17:20 CEST
Mátyás II.

Big Data

16:50 CEST

Interactive Graph Analytics with Spark

The Spark community has a lot of experience using Spark for offline batch analysis tasks coming from a broad range of use cases. But creating an interactive web application that aims for sub-second response times using Spark as the computation backend is still somewhat unexplored territory. We at Lynx Analytics wandered into this territory when we built LynxKite, our big graph analysis tool. The tool enables users to interactively explore graphs of hundreds of millions of vertices and billions of edges. Exploration includes global and local views of the graph featuring visualization of attributes, connections and distributions. This talk is about the technical challenges (both general and domain-specific) that we faced while building this software, and about our solutions. We will talk about problems like scheduler delay, GC pauses and interoperability with other Akka-based libraries, and solutions like sorted RDDs, prefix sampling and column-based attribute representation.

Speakers

Darabos Dániel

Programmer, Lynx Analytics

Dániel has been a member of the LynxKite developer team in Budapest since the very beginning of the project. Prior to this he worked on Google's SRE team in Dublin.

Wednesday June 3, 2015 16:50 - 17:20 CEST
Mátyás I

Spark

17:20 CEST

Welcome reception

Wednesday June 3, 2015 17:20 - 19:20 CEST
Hotel

Break

08:00 CEST

Registration / Regisztráció

Thursday June 4, 2015 08:00 - 09:00 CEST
Hotel

Break

09:00 CEST

Welcome Talk

Welcome notes by the conference host

Speakers

Bence Arató

Managing Director, BI Consulting

Thursday June 4, 2015 09:00 - 09:15 CEST
Mátyás I-II.

Plenary

09:15 CEST

Big Data Roundtable

What is the key to success in the world of Big Data? A good idea? Its execution? The right timing? A great team? Or perhaps just a pinch of luck?

Our roundtable discussion looks for answers to these questions with participants who are happy to share their own experience of how to run a successful startup.

Prekopcsák Zoltán, VP of Big Data at RapidMiner, ex-co-founder & CEO of Radoop.eu
Nagy István, CTO and Co-Founder, Enbrite.ly
Papp Lajos, co-founder at SequenceIQ

Speakers

Nagy István

Chief Technology Officer (CTO) • senior partner, data scientist, enbrite.ly • Dmlab

Founder and CTO of the startup company enbrite.ly to fight against fraudsters on the online advertising market, and founder and senior partner of Dmlab, one of the leading data mining companies in Hungary. His more than 8 years of experience in data analysis combines the mindset of a...

Papp Lajos

devops, SequenceIQ

Lajos is a co-founder of SequenceIQ. He is a fetishist of automation, be it cluster provisioning or sending a birthday SMS. Lately he has become a Docker evangelist, promoting virtualization into every aspect of a software project. His topics of interest are continuous integration...

Prekopcsák Zoltán

VP Big Data, RapidMiner

Thursday June 4, 2015 09:15 - 09:45 CEST
Mátyás I-II.

Plenary

09:45 CEST

Watson Analytics - a new way to do data analysis

Thursday June 4, 2015 09:45 - 10:25 CEST
Mátyás I-II.

Plenary

10:25 CEST

Break / Szünet

Thursday June 4, 2015 10:25 - 10:55 CEST
Mátyás I-II.

Break

10:55 CEST

SQL Engines on Hadoop - The case for Impala

This talk will go through the history and current state of processing engines for Hadoop, focusing in particular on SQL engines on Hadoop. We will then dive deep into one of the SQL processing engines for Hadoop: Cloudera Impala.

The Cloudera Impala project is pioneering the next generation of Hadoop capabilities: the convergence of fast SQL queries with the capacity, scalability, and flexibility of a Hadoop cluster. With Impala, the Hadoop community now has an open-sourced codebase that helps users query data stored in HDFS and Apache HBase in real time, using familiar SQL syntax. In contrast with other SQL-on-Hadoop initiatives, Impala's operations are fast enough to run interactively on native Hadoop data rather than in long-running batch jobs. Now you have the freedom to discover relationships and explore what-if scenarios on Big Data datasets. By taking advantage of Hadoop's infrastructure, Impala lets you avoid traditional data warehouse obstacles like rigid schema design and the cost of expensive ETL jobs.

This talk starts out with an overview of Impala from the user's perspective, followed by a presentation of Impala's architecture and implementation. It concludes with a summary of Impala's benefits when compared with the available SQL-on-Hadoop alternatives.

Speakers

Mark Grover

Software Engineer, Cloudera

Mark is the co-author of O'Reilly's Hadoop Application Architectures book, a committer on Apache Bigtop and a committer and PMC member on Apache Sentry (incubating). He has contributed code to Apache Hadoop, Apache Hive, Apache Sqoop and Apache Flume projects. He is also a section...

Thursday June 4, 2015 10:55 - 11:35 CEST
Mátyás I-II.

Plenary

11:35 CEST

The Evolution of Big Data at Spotify - Through Failures and Pain

The quickest way to learn and evolve infrastructure is by encountering obstacles and being forced to overcome limitations that keep you inches away from project goals. At Spotify, we’ve encountered many of these obstacles and frustrations as we grew our Big Data systems from a few machines in an office closet aggregating played song events for financial reports, to our current 1300 node Hadoop cluster and a complex architecture that plays a large role in many features that you see in our application today and processes PBs of data every week.

A member of Spotify’s Hadoop ‘squad’ will weave in war stories, failures, frustrations and lessons learned to describe the Hadoop/Big Data architecture at Spotify and talk about how that architecture has evolved over time.

Speakers

Josh Baer

Hadoop Product Owner, Spotify

Josh ‘joined the band’ at Spotify in early 2013 and has worked on a small team focusing on stabilizing and enhancing the Hadoop infrastructure: performing multiple migrations, upgrades and growing the cluster from 190 nodes to over 1300. Josh holds a BS in Computer Science/Philosophy...

Thursday June 4, 2015 11:35 - 12:15 CEST
Mátyás I-II.

Plenary

12:15 CEST

Book signing - Hadoop Application Architectures

Mark is co-authoring an O'Reilly book called Hadoop Application Architectures. It gives Hadoop developers, users and admins a higher-level view of using Hadoop: the typical use cases being solved with Hadoop and how to architect solutions for similar use cases (what technologies to choose, how to design schemas, etc.).
During this event some attendees can get a free, signed copy of the book.

Speakers

Mark Grover

Software Engineer, Cloudera

Thursday June 4, 2015 12:15 - 12:35 CEST
Hotel

Break

12:15 CEST

Lunch / Ebéd

Thursday June 4, 2015 12:15 - 13:30 CEST
Hotel

Break

13:30 CEST

Pension Forecasting with a Modern Microsimulation Method

Individual life courses, the society-wide distribution of pension entitlement accrual and the effects of measures affecting this area cannot be captured with deterministic equations or modelled with macro models. The talk shows how, as the result of a recently completed European Union project, the Central Administration of National Pension Insurance (Országos Nyugdíjbiztosítási Főigazgatóság, ONYF) can examine and forecast the pension system, and the impact of possible future changes, using microsimulation modelling that counts as state of the art even at the European level.

Speakers

Tóth Krisztián

Microsimulation Pension Model Developer, ONYF

Graduated as an actuary from the joint BCE-ELTE master's programme. As an ONYF staff member, he has worked on developing the MIDAS_HU microsimulation pension model since development began in 2012.

Puskás Péter

DWH Developer and Consultant, Omnit Solutions

Over the course of his career, data warehouse, BI, database and application development projects gave him the opportunity to get to know both the development side and the business side. As a BI expert on the Omnit Solutions team, data warehouse and business intelligence solutions...

Thursday June 4, 2015 13:30 - 14:00 CEST
Mátyás I

Data Warehousing

13:30 CEST

Designing Agile Data Pipelines

Agile software development values responding to change over following a plan. Responding to change means allowing data scientists to experiment with data and allowing developers to easily modify data processing and even make mistakes without taking huge risks. Well-designed data pipelines give organizations flexible data analysis. In this session we'll show how to design architectures that make it easy and safe to extend and modify data analysis software.

We will look at how to design an agile data processing architecture using Apache Hadoop, Apache Kafka and stream processing frameworks. The architectures we’ll discuss make it easy to add new data sources, experiment with new analysis algorithms and correct data processing errors. All this makes the data pipeline both flexible and safe.

Speakers

Ashish Singh

Software Engineer, Cloudera

Ashish Singh is a Software Engineer, working with Cloudera to empower the Hadoop ecosystem to answer bigger questions. He contributes to Apache Kafka, Hive, Parquet and Sentry. Prior to joining Cloudera, he worked on optimizing MPI collective communications on High Performance Computing...

Thursday June 4, 2015 13:30 - 14:00 CEST
Mátyás II.

Stream

14:05 CEST

A Short Introduction to Data Governance

Big Data and analytics, data warehouses and self-service BI are today's hot topics. But to manage and analyse the data we have collected efficiently, and to avoid spending on unnecessary things, we need to know what we have, what each item is worth and how much it costs. Data governance helps answer these questions. The talk presents the basics of DG as a taster.

Speakers

Gollnhofer Gábor

Lead Consultant, DMS Consulting

An experienced data warehouse professional, he has been designing Hungarian and international DW/BI systems and providing related consulting since 1996. His key specialities are system design and data modelling, for both data warehouses and traditional IT...

Thursday June 4, 2015 14:05 - 14:35 CEST
Mátyás I

Data Warehousing

14:05 CEST

Real-time data processing with Apache Flink

Flink Streaming is a distributed, fault-tolerant, real-time data processing engine provided by the Apache Flink data analytics platform. It is currently programmable in Java and Scala using stateful functional operators including map, aggregations and temporal joins amongst many others. The streaming API also features flexible windowing semantics to express a wide variety of business logic.

In the Flink runtime layer both batch and streaming jobs are executed as a common data flow graph, thus unifying batch and stream processing in an elegant way. Flink provides a more straightforward and transparent approach than the lambda architecture or other state-of-the-art solutions. Flink also provides exactly-once processing guarantees for streaming programs with a combination of upstream backup and consistent user state snapshots.

The highly efficient runtime layer offers competitive performance compared to current streaming solutions with a rich and expressive API. This talk will focus on the API and runtime features of Flink Streaming in comparison with current industry standard streaming solutions.

Speakers

Gyula Fóra

Researcher, Distributed Systems, SICS

Gyula is a committer and PMC member for the Apache Flink project, currently working as a researcher at the Swedish Institute of Computer Science. His main expertise and interest is real-time distributed data processing frameworks, and their connections to other big data applications...

Thursday June 4, 2015 14:05 - 14:35 CEST
Mátyás II.

Stream

14:40 CEST

How Prezi uses Amazon Redshift

Redshift is a fast, fully managed, petabyte-scale data warehouse solution. At Prezi we chose it as our data warehouse technology. We have a couple of terabytes of data in it, and it is available to everybody doing interactive data analysis, with blazing-fast response times. In my talk I will show why we chose Redshift as our distributed SQL database and what best practices we applied to scale to our needs as the amount of data started to reach 10TB and the user base increased.

Speakers

Németh Tamás

Data Engineer, Prezi

Tamás has more than 10 years of prior experience as a software engineer in various fields like PKI and investment banking. Now at Prezi as a data engineer he makes sure the data infrastructure rocks: it is reliable and a joy to work with.

Thursday June 4, 2015 14:40 - 15:10 CEST
Mátyás I

Data Warehousing

14:40 CEST

STREAMLINE: learning from data streams with Apache Flink

STREAMLINE is the research project of TU Berlin, SICS Stockholm and SZTAKI Budapest for reducing system and human latencies in the analytics of high speed data streams. On top of Apache Flink, we:

Develop automatic optimization, parallelization, and system adaptation technologies that reduce the programming expertise required by data scientists, thereby enabling them to more freely focus on domain specific matters.

Overcome the complexity of the so-called ‘lambda architecture’ by delivering simplified operations that jointly support “data at rest” and “data in motion” in a single system that is compatible with the Hadoop ecosystem.

Develop new machine learning technologies capable of reacting very quickly to changes in the stream.

In the presentation we show results of our experiments over telecommunication and recommendation use cases.

Speakers

Benczúr András

Head of Big Data Research Group, MTA SZTAKI

András Benczúr is the head of Informatics Laboratory of 30 doctoral students, post-docs and developers. Benczúr received his Ph.D. at the Massachusetts Institute of Technology in 1997, since then his interest turned to Information Retrieval and Web Search. He was representing SZTAKI...

Thursday June 4, 2015 14:40 - 15:10 CEST
Mátyás II.

Stream

15:10 CEST

Break / Szünet

Thursday June 4, 2015 15:10 - 15:40 CEST
Mátyás II.

Break

15:40 CEST

Big Data & DWH modernization

Thursday June 4, 2015 15:40 - 16:10 CEST
Mátyás I

Data Warehousing

15:40 CEST

Bootstrap Real Time pipeline in 30 minutes

In a world where every "Thing" produces lots of data, ingesting and processing that volume of data becomes a big problem. In today's dynamic world, firms have to react to changing conditions very fast, or better still, in real time. In this talk we will take on this challenge using the latest tools from the Big Data community. We will combine Kafka, a resilient pub-sub messaging system, with Spark Streaming for scalable, high-throughput, fault-tolerant processing of live data streams. Combining different systems into a more powerful one is great, but it brings its own complexity. With a demo of building a pipeline that ingests and processes real-time data using these two systems, we will explore how they can be combined to get the most out of both.
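The demo itself is not part of this listing. As a rough illustration of the micro-batch model that Spark Streaming is built on, the following pure-Python sketch groups a time-ordered stream into fixed-length batches and counts events per key in each batch. It uses neither Kafka nor Spark; the event format, the `micro_batches` and `count_per_batch` helpers, and the one-second batch interval are assumptions made for illustration only.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

# Hypothetical event format: (timestamp in seconds, device id).
Event = Tuple[float, str]


def micro_batches(events: Iterable[Event], interval: float) -> Iterator[List[Event]]:
    """Group a time-ordered event stream into consecutive fixed-length batches."""
    batch: List[Event] = []
    batch_end = None
    for ts, key in events:
        if batch_end is None:
            batch_end = ts + interval
        # Emit the current batch (and any empty ones) once an event falls past its end.
        while ts >= batch_end:
            yield batch
            batch = []
            batch_end += interval
        batch.append((ts, key))
    if batch:
        yield batch


def count_per_batch(events: Iterable[Event], interval: float = 1.0) -> List[Counter]:
    """Per-batch event counts by key, a stand-in for the processing step."""
    return [Counter(key for _, key in batch) for batch in micro_batches(events, interval)]


# A tiny in-memory list stands in for a message topic.
stream = [(0.0, "a"), (0.4, "b"), (0.9, "a"), (1.2, "b"), (3.1, "a")]
counts = count_per_batch(stream, interval=1.0)
```

In a real pipeline the in-memory list would be replaced by a consumer reading from a message broker, and the per-batch function would run distributed across a cluster.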

Speakers

Ashish Singh

Software Engineer, Cloudera

Thursday June 4, 2015 15:40 - 16:10 CEST
Mátyás II.

Stream

16:15 CEST

BDD: The Visual Face of Hadoop, The Hidden Face of Spark

Big Data Discovery (BDD) is a new Oracle product for visually cataloging, enriching and analyzing data sets stored in Hadoop, without the need to code. Data scientists can speed up preparation tasks, business analysts can find correlations faster, and business users can tap quickly into Hadoop data. But what's really under the covers of BDD? This 30-minute session will try to give you a glimpse.

Speakers

Luis Moreno Campos

EMEA Big Data Solutions Lead, Oracle

Luis is a Big Data Solutions director at Oracle for EMEA doing Business Development, Marketing Campaigns, Partner development and Sales Enablement. Regular speaker at Industry and technology events, CIO roundtables, technology user groups, University seminars, and marketing events... Read More →

Thursday June 4, 2015 16:15 - 16:45 CEST
Mátyás I

Big Data

16:15 CEST

Data in fashion - Solving problems of apparel e-commerce with data

More than half of all new televisions are sold online in the US, but online sales only account for 10% in the $400bn US apparel retail market. Why? It's a lot harder to find an apparel item that you still like and that fits you after you unpack the box. Apparel retailers face average return rates of 20-40% and conversion rates of 2-3%.
I will talk about how we ended up tackling the problems of a market three times the size of Hungary's GDP and I will share my personal and professional experience leading a small team of engineers working on products integrated into the largest ecommerce stores in the US and Europe.

Speakers

Rátky Gábor

CTO, Secret Sauce Partners Inc.

As a member of the founding team and the technical lead of Secret Sauce, Gabor has been instrumental in assembling and growing the engineering team in Budapest that provides services to some of the most iconic brands in the US. As CTO, Gabor ensures that the team can tackle the... Read More →

Thursday June 4, 2015 16:15 - 16:45 CEST
Mátyás II.

Big Data

16:50 CEST

Finding Hijacked Accounts: Anomaly Detection in User Behavior Analysis

My team and I are currently developing a novel IT security product that employs user behavior analytics. With this product, security professionals can sustain a high level of security in complex IT environments by detecting abnormal activities that could indicate masquerade attacks, malicious insiders or other security threats. Common SIEM (Security Information and Event Management) solutions achieve this by comparing incoming activities to a manually defined rule database. Our solution instead identifies reference patterns through unsupervised machine learning, which gives more flexibility in specifying normal behavior. An ensemble of multiple algorithms then scores incoming activities, highlighting those that differ most significantly from the previously learned baseline patterns.

I will present the most important high-level problems of this field and the data science challenges that follow from them. After defining these challenges, I will give a broad overview of the tools and algorithms we develop and the methods we use to address them.
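The abstract does not name the algorithms used. As a minimal sketch of the general idea (learn a baseline from past activity without labels, then let an ensemble of scorers rate new activity), consider the toy example below. The record format, both scorers, and the simple averaging are assumptions chosen for illustration and are not the product's actual method.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical activity record: (login hour 0-23, host name).
history = [(9, "web01"), (10, "web01"), (9, "db01"), (11, "web01"), (10, "web01")]


def learn_baseline(activities):
    """Learn a per-user baseline from past activity, with no labels or hand-written rules."""
    hours = [hour for hour, _ in activities]
    hosts = Counter(host for _, host in activities)
    return {
        "hour_mean": mean(hours),
        "hour_std": pstdev(hours) or 1.0,  # fall back to 1.0 to avoid dividing by zero
        "host_freq": {host: n / len(activities) for host, n in hosts.items()},
    }


def hour_score(baseline, activity):
    """Score in [0, 1): grows as the login hour moves away from the usual hours.

    Simplification: hours are treated as linear, so 23:00 and 00:00 count as far apart.
    """
    z = abs(activity[0] - baseline["hour_mean"]) / baseline["hour_std"]
    return z / (1.0 + z)


def host_score(baseline, activity):
    """Score in [0, 1]: 1.0 for a host never seen before, lower for frequent hosts."""
    return 1.0 - baseline["host_freq"].get(activity[1], 0.0)


def ensemble_score(baseline, activity):
    """Combine the individual scorers by plain averaging (one of many possible choices)."""
    return mean([hour_score(baseline, activity), host_score(baseline, activity)])


baseline = learn_baseline(history)
usual = ensemble_score(baseline, (10, "web01"))
odd = ensemble_score(baseline, (3, "backup07"))
```

Here `odd` comes out higher than `usual`, so a 03:00 login on an unfamiliar host would be ranked above routine activity for review.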

Speakers

Kovács László

Data Scientist, BalaBit-Europe

László works as a Data Scientist at BalaBit-Europe. His main responsibilities include researching, developing, customizing, and testing of algorithms for an IT security product that detects anomalous activities in user behavior data. Prior to BalaBit he participated in data warehousing... Read More →

Thursday June 4, 2015 16:50 - 17:05 CEST
Mátyás I

Big Data Quick talk

16:50 CEST

Humans vs. Bots: The Story of enbrite.ly

enbrite.ly, a team fighting to clean up the online advertising market, analyzes the behavior of web users to expose the market's dark side: the fraudsters who take millions of dollars out of advertisers' pockets every year by selling artificially generated web traffic.
In my talk I will present the methods we use to detect these fraudsters and walk the audience through the path, dotted with a variety of technologies, that led to our current architecture.

Speakers

Nagy István

Chief Technology Officer (CTO) • senior partner, data scientist, enbrite.ly • Dmlab

Thursday June 4, 2015 16:50 - 17:20 CEST
Mátyás II.

Big Data

08:00 CEST

Registration / Regisztráció

Friday June 5, 2015 08:00 - 09:00 CEST
Hotel

Break

09:00 CEST

A Data Warehouse in One Day I.

This introductory workshop covers the most important areas of building a data warehouse, giving a one-day overview of the questions that typically arise in a DW development project.

The workshop presenters are DW experts with decades of professional experience and very broad consulting and teaching experience.

Part I topics:

Introduction and DW architecture
Data modeling
ETL
Hands-on demonstration

Speakers

Bence Arató

Managing Director, BI Consulting

Fekszi Csaba

Managing Partner, Omnit Solutions Kft.

A 40-year-old data warehouse and BI expert. After completing the computer science program at KKVMF and the accounting program at PSZF in Budapest, he earned a master's degree in computer science at the University of Veszprém. Over his career, numerous, mainly banking and financial processes... Read More →

Gollnhofer Gábor

Lead Consultant, DMS Consulting

Friday June 5, 2015 09:00 - 12:00 CEST
Mátyás II.

Workshop

09:00 CEST

Introduction to Apache Hadoop

Originally inspired by Google's GFS and MapReduce papers, Apache Hadoop is an open source framework offering scalable, distributed, fault-tolerant data storage and processing on standard hardware. This session explains what Hadoop is and where it best fits into the modern data center. You'll learn the basics of how it offers scalable data storage and processing, some important "ecosystem" tools that complement Hadoop's capabilities, and several practical ways organizations are using these tools today. Additionally, you'll learn about the basic architecture of a Hadoop cluster and some recent developments that will further improve Hadoop's scalability and performance.

Basic knowledge: None

Preparation:
This tutorial will use the Cloudera QuickStart VM for the demo (http://www.cloudera.com/content/cloudera/en/downloads/quickstart_vms.html). Attendees are welcome to download the VM to their laptops beforehand and follow along with the demo instructions. This is, however, not required.

Speakers

Mark Grover

Software Engineer, Cloudera

Friday June 5, 2015 09:00 - 12:00 CEST
Mátyás I

Workshop

09:00 CEST

MongoDB I.

The workshop presents the features, architecture, and application areas of MongoDB, the leading NoSQL database. It covers the basics of data modeling, queries and data-modifying operations, as well as high availability and scalability.
MongoDB is the leading NoSQL database, enabling companies to be more agile and to grow more efficiently. Fortune 500 companies and startups alike use it to build new types of applications, improve the customer experience, shorten time to market, and reduce costs.
MongoDB is an agile database that lets schemas change as quickly as applications evolve, while still providing the functionality developers expect from traditional databases, such as secondary indexes, a full query language, and strict consistency.
MongoDB's most prominent advantages are scalability, performance, and high availability, whether it is deployed on a single server or on a large, complex, multi-site architecture. By taking advantage of in-memory computing, MongoDB delivers high performance for both reads and writes. Native replication and automatic failover provide enterprise-grade reliability and operational flexibility.

Workshop 1 topics

Getting to know MongoDB:

Basic concepts
Installation
JSON / BSON
MongoDB Shell

First database operations:

Basic concepts (documents, collections)
CRUD (Create, Read, Update, Delete)
Indexing basics

Prerequisites for attending the workshop:

Your own laptop
MongoDB 2.4.14 installed (downloadable from: https://www.mongodb.org/downloads#previous)

Speakers

Izsák Tamás

Managing director of APPWORKS, with more than 10 years of experience as a database expert and lead developer in the field of relational and NoSQL databases (Oracle Database, MongoDB). APPWORKS was the first in Hungary to obtain the MongoDB... Read More →

Friday June 5, 2015 09:00 - 12:00 CEST
István

Workshop

12:00 CEST

Lunch / Ebéd

Friday June 5, 2015 12:00 - 13:00 CEST
Hotel

Break

13:00 CEST

A Data Warehouse in One Day II.

Designing data marts
Designing the BI layer
Building the BI front end (demo)
Operational issues

Speakers

Bence Arató

Managing Director, BI Consulting

Fekszi Csaba

Managing Partner, Omnit Solutions Kft.

Gollnhofer Gábor

Lead Consultant, DMS Consulting

Friday June 5, 2015 13:00 - 16:00 CEST
Mátyás II.

Workshop

13:00 CEST

Introduction to Data Science

We hear everywhere that data scientist will be one of the most sought-after professions of the coming years, but what skills does it take to become a scientist of data? The workshop presents the areas, technologies, and methods that can mark the start of the path to becoming a data scientist.

Using concrete business problems, the workshop presents:

the skills and technologies needed to manipulate data;
the basics of the data mining methods needed to analyze data;
the technologies needed to handle large data sets;
the tools and communication techniques needed to communicate the results of data analysis.

We welcome applicants who are interested in data science. No prior training is required to attend the workshop.

Speakers