Presented by:

Marshall Presser

Pivotal Software

Marshall Presser is a Data Engineer in Pivotal's Data Labs where he helps customers solve complex analytic problems with the Greenplum Database.

Prior to coming to Pivotal (formerly Greenplum), he spent 12 years at Oracle, specializing in High Availability, Business Continuity, Clustering, Parallel Database Technology, Disaster Recovery and Large Scale Database Systems. Marshall has also worked for a number of hardware vendors implementing clusters and other parallel architectures. His background includes parallel computation, operating system and compiler development, as well as private consulting for organizations in health care, financial services, and federal and state governments.

Marshall holds a B.A. in Mathematics and an M.A. in Economics and Statistics from the University of Pennsylvania and an M.Sc. in Computing from Imperial College, London.

Andreas Scherbaum has been working with PostgreSQL since 1997. He is involved in several PostgreSQL-related community projects, is a member of the Board of Directors of the European PostgreSQL User Group, and has written a PostgreSQL book (in German). Since 2011 he has been working for EMC/Greenplum/Pivotal, where he tackles very big databases.

It's more than just storing and retrieving data. Equally important are loading high-volume data in parallel and running analytics in the database. This hands-on session will lead you through the entire process of creating, loading, and analyzing data in the Greenplum MPP database. It's PostgreSQL, but bigger and DWH-focused.

By the end of this workshop, attendees will have learned modern DWH techniques on a PostgreSQL-based Massively Parallel Processing platform. This includes the basic architecture of the Greenplum Database, the parallel techniques for loading, querying, and analyzing structured and semi-structured data, as well as the tools Greenplum provides for doing analytics in the database.

Workshop Agenda:

Introduction to MPP and Greenplum

Distribution -- a key to good performance in Greenplum

Parallel loading -- loading multiple terabytes per hour

Loading from S3 and external connectivity

Polymorphic storage and external partitions

Compare external tables to Foreign Data Wrappers

Partitioning vs. Distribution -- how they interact

Difference between PG and GP partitions

Query response time exercises

Running Analytics in Greenplum: MADlib exercise

Analyzing Free-Form Text with Solr and GPText

Monitoring and Managing Greenplum with Command Center

Managing Concurrency with Resource Groups and Workload Manager

Running PL/Python and PL/R as Trusted Languages with PL/Container

Pre-requisites: Laptop with a modern browser and SSH client; Instruction on using SSH on Windows; Basic knowledge of SQL

Users will connect to a cloud-based Greenplum cluster.

There will be a maximum of 25 attendees.

Suggested pre-work:

Videos on the YouTube channel

GP Database basics: https://www.youtube.com/watch?v=cCuGX_fLNl8&list=PL4duir3J-8GUodk1uS9ONPU_eWvfCeVjT

GP & analytics: https://www.youtube.com/watch?v=3K1PRZNYHZE&list=PL4duir3J-8GXgVNvHVE8Y86W79Gzu5oEk

GP & MADlib: https://www.youtube.com/watch?v=Nza2F2dU-Q0&list=PL4duir3J-8GUcubGGpudx6KCCxp8onTI8

Date:
Duration: 7 h
Room:
Conference: PostgresConf US 2018
Language: English
Track: Greenplum Summit
Difficulty: Medium
Requires Registration: Yes (Registered: 11)