Pulsar Pipeline

How Pulsar Pipeline Works

Pulsar Pipeline is a highly scalable and reliable event-driven data pipeline for real-time analytics. It was created primarily for user behavior analytics but can be applied to many other problems. User behavior events contain structured information, such as user-agent or native application identifier (IP address). Applications can extend this information with attributes relevant to a specific event type. These events are captured by native or web applications for real-time and offline analysis. In Pulsar, events are transported asynchronously across the pipeline in stages; each stage can be built and operated independently, with its own deployment and release cycle.
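The idea of independent stages passing events asynchronously can be illustrated with a minimal sketch. This is not Jetstream or Pulsar code: the `Event` fields, the `run_stage` helper, and the two handlers are hypothetical names chosen for illustration, and in-process queues stand in for whatever transport the real pipeline uses between stages.

```python
import queue
import threading
from dataclasses import dataclass, field
from typing import Any, Dict

_STOP = object()  # sentinel that tells a stage to shut down


@dataclass
class Event:
    """Hypothetical event: common fields plus application-defined attributes."""
    event_type: str
    user_agent: str
    attributes: Dict[str, Any] = field(default_factory=dict)


def run_stage(handler, inbox, outbox):
    """Start a worker thread that applies handler to every event in inbox."""
    def loop():
        while True:
            item = inbox.get()
            if item is _STOP:
                outbox.put(_STOP)  # propagate shutdown to the next stage
                return
            result = handler(item)
            if result is not None:  # a handler may drop an event
                outbox.put(result)

    worker = threading.Thread(target=loop, daemon=True)
    worker.start()
    return worker


def drop_untyped(event):
    """Filter stage: discard events that carry no event type."""
    return event if event.event_type else None


def tag_mobile(event):
    """Enrichment stage: add an attribute derived from the user agent."""
    event.attributes["is_mobile"] = "Mobile" in event.user_agent
    return event


def run_pipeline(events):
    """Push events through two queue-connected stages and collect the output."""
    q_in, q_mid, q_out = queue.Queue(), queue.Queue(), queue.Queue()
    workers = [
        run_stage(drop_untyped, q_in, q_mid),
        run_stage(tag_mobile, q_mid, q_out),
    ]
    for event in events:
        q_in.put(event)
    q_in.put(_STOP)

    results = []
    while True:
        item = q_out.get()
        if item is _STOP:
            break
        results.append(item)
    for worker in workers:
        worker.join()
    return results
```

Because each stage only reads from its inbox and writes to its outbox, a stage can be replaced or redeployed without touching its neighbours, which is the property the paragraph above describes.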

 

Features

 

All Pulsar Pipeline applications are built on top of Jetstream, infrastructure software built by eBay for stream processing. Jetstream provides a Java-based, distributed, complex event processing framework as well as tooling to build, deploy, and manage complex event-processing applications in a cloud environment.

The Pulsar Pipeline includes the following components:

  • Collector: Ingests events through a REST endpoint
  • Sessionizer: Sessionizes the events, maintaining the session state and generating marker events
  • Distributor: Filters and mutates events and routes them to different consumers; acts as an event router
  • Metrics calculator: Calculates metrics by various dimensions and persists them in the metrics store
  • Replay: Replays the failed events on other stages
  • ConfigApp: Configures dynamic provisioning for the whole pipeline
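As a rough illustration of how three of these components relate, the sketch below chains a sessionizer, a distributor, and a metrics count. It is a simplified model, not the Pulsar implementation: the class names, the dictionary event shape, the marker event types (`session_begin`, `session_end`), and the inactivity-gap rule for closing a session are all assumptions made for the example.

```python
from collections import Counter

SESSION_GAP_SECONDS = 1800  # assumed inactivity timeout, not taken from Pulsar


class Sessionizer:
    """Keeps per-user session state and emits marker events."""

    def __init__(self, gap=SESSION_GAP_SECONDS):
        self.gap = gap
        self.last_seen = {}  # user id -> timestamp of that user's last event

    def process(self, event):
        """Return any marker events triggered by this event, then the event."""
        user, ts = event["user"], event["ts"]
        out = []
        previous = self.last_seen.get(user)
        if previous is None or ts - previous > self.gap:
            if previous is not None:
                # The earlier session ended at its last observed event.
                out.append({"type": "session_end", "user": user, "ts": previous})
            out.append({"type": "session_begin", "user": user, "ts": ts})
        self.last_seen[user] = ts
        out.append(event)
        return out


class Distributor:
    """Routes each event to every consumer whose filter accepts it."""

    def __init__(self):
        self.routes = []  # (predicate, sink list) pairs

    def add_route(self, predicate, sink):
        self.routes.append((predicate, sink))

    def dispatch(self, event):
        for predicate, sink in self.routes:
            if predicate(event):
                sink.append(event)


def count_by(events, dimension):
    """Metrics step: count events per value of one dimension."""
    return Counter(e[dimension] for e in events if dimension in e)
```

A usage sketch: feed each raw event through `Sessionizer.process`, pass every resulting event to `Distributor.dispatch`, and call `count_by` on whichever sink holds the events of interest. A real metrics calculator would persist the counts to a metrics store rather than return them.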

What's New

For a full set of documentation and implementation guides to get up and running, visit the Pulsar Pipeline Wiki.