Customize and Secure the Runtime and Dependencies of Your Procedural Languages using PL/Container
I'm currently a staff software engineer at Pivotal. My work includes PL/Container, Greenplum, and HAWQ (a SQL-on-Hadoop system). I received my master's degree from Peking University, majoring in artificial intelligence. I'm interested in database systems and distributed computing platforms.
Jack Wu is a Principal Product Manager at Pivotal Inc. He has focused on Greenplum Database for 7 years. He is also interested in cloud computing and is working on extending Greenplum Database with cloud technology. Before joining Pivotal, he worked at IBM as a developer on DB2. He received an M.S. degree in computer science from Tsinghua University in 2003.
Python and R are widely used among data scientists. Greenplum supports both as procedural languages, PL/Python and PL/R, but both run as untrusted languages, which creates a security problem. Because they are untrusted, only administrators can create UDFs (user-defined functions) in them, which makes it inconvenient for data scientists to train and debug models.
We propose a new way of sandboxing the execution of PL/Python and PL/R in Greenplum. PL/Container allows a user to run a UDF inside a Docker container. The container isolates the executing process in a separate environment with its own namespaces, which makes it safe to let a non-admin user create the UDF. This also lets us decouple the data processing: SQL operators such as "scan," "filter," and "project" are executed on the query executor side, while advanced data analysis is executed on the container side. Additionally, PL/Container can support multiple versions of Python and R at the same time. For example, a user could build an Anaconda image to simplify data analysis tasks. In the future, we plan to support more languages in PL/Container, as well as to support them on Postgres.
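To make the split between executor and container concrete, here is a minimal sketch of what a PL/Container UDF might look like. It is an illustration under assumptions, not material from the talk: the function name `log10_of` is hypothetical, and the container name `plc_python_shared` depends on how the runtime is configured on a given cluster. The Python function stands in for the body that would run on the container side, and the string holds the assumed SQL wrapper a user would submit to Greenplum.

```python
import math


def log10_of(value):
    """Hypothetical UDF body: under PL/Container, logic like this would run
    inside the Docker container, not in the query executor process."""
    return math.log10(value)


# Assumed SQL wrapper for the same logic. The "# container:" comment on the
# first line of the body selects the container runtime; the runtime name
# plc_python_shared is an assumption and must match the local configuration.
CREATE_UDF_SQL = """
CREATE OR REPLACE FUNCTION log10_of(value double precision)
RETURNS double precision AS $$
# container: plc_python_shared
import math
return math.log10(value)
$$ LANGUAGE plcontainer;
"""

if __name__ == "__main__":
    # Local check of the body only; no database or container is involved.
    print(log10_of(100))
```

In a query such as `SELECT log10_of(x) FROM t WHERE x > 0`, the scan and filter would stay with the query executor, and only the function call would be handed to the container.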
- 50 min
- PostgresConf US 2018
- Greenplum Summit