Conference Program

JUNE 17-18, 2020 | MESSAGING & STREAMING EVERYWHERE

Pulsar Storage on BookKeeper - Seamless Evolution

10:10 AM - 10:50 AM PDT, June 17, 2020 | Track 3

Apache Pulsar's architecture is distinct from that of other messaging systems. It clearly separates the compute layer, which processes and dispatches messages, from the storage layer, which handles persistent message storage using Apache BookKeeper. This separation of concerns leads to a very efficient design in terms of performance and cost.
Messaging systems that provide guaranteed delivery, when used in production, place demands on the underlying storage that are very different from those of simple benchmark scenarios that test write throughput. Pulsar, with both I/O isolation and separation of concerns, performs better than other messaging systems in production use cases. The strategy of I/O isolation provides better performance from each storage node at lower cost, and the separation between compute and storage means that compute nodes can be scaled independently of storage. Irrespective of the choice of storage, Pulsar can be configured to get the best performance from that storage configuration.
This paper also discusses, with some data from real use cases, how Pulsar can leverage recent technologies such as NVMe and persistent memory at very low cost overhead, without any architectural or design changes. Our experience validates the fundamental choice of BookKeeper as the storage layer for Pulsar.

Technology Deep Dive

Joe Francis

Joe Francis heads the team that provides Key-Value Storage and Messaging systems for all of Verizon Media. Joe has led the development and operation of multiple large-scale distributed systems, including Pulsar, while at Yahoo. Prior to joining Yahoo, Joe was the architect for database kernel storage and recovery at Sybase. He is a member of the PMC of Apache Pulsar. Joe is well versed in Pulsar's development from its inception, through its growing pains on the way to a large-scale system, to its operation at scale in production.

Rajan Dhabalia

Rajan Dhabalia is a Principal Software Engineer at Verizon Media working on messaging and distributed key-value storage technologies. His interests lie in building reliable, scalable distributed data processing systems. He is a PMC member of Apache Pulsar and the lead developer of Pulsar at Verizon Media, and he spent several years developing Pulsar at Yahoo before it was made open source.