Pulsar FAQ

Can't find your answer? Contact us

General FAQ

Q1: What is Pulsar?

Pulsar provides a realtime analytics platform and stream processing framework. It can be used to collect and process user and business events in realtime, providing key insights and enabling systems to react to user activities within seconds. Pulsar provides realtime sessionization, multi-dimensional metrics aggregation over time windows, and custom stream creation through data enrichment, mutation and filtering using a SQL-like event processing language. Pulsar scales to millions of events per second with high availability. It can be integrated with metrics stores like Druid and Cassandra. After integrating with Druid, Pulsar can easily generate multi-dimensional and interactive reports with drill-down and slice-and-dice in real-time.
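
To give a flavor of the SQL-like event processing language, here is a minimal standalone sketch using Esper, the CEP engine Pulsar integrates with. The PageView event type, its fields and the rule itself are illustrative assumptions for this example, not statements shipped with Pulsar; inside a Pulsar pipeline such rules are normally supplied through configuration rather than registered by hand.

    import com.espertech.esper.client.Configuration;
    import com.espertech.esper.client.EPServiceProvider;
    import com.espertech.esper.client.EPServiceProviderManager;
    import com.espertech.esper.client.EPStatement;
    import com.espertech.esper.client.UpdateListener;

    import java.util.HashMap;
    import java.util.Map;

    public class EplExample {
        public static void main(String[] args) {
            // Declare a hypothetical "PageView" event type as a map of named fields.
            Map<String, Object> pageViewType = new HashMap<>();
            pageViewType.put("userId", String.class);
            pageViewType.put("country", String.class);
            pageViewType.put("pageId", String.class);

            Configuration config = new Configuration();
            config.addEventType("PageView", pageViewType);
            EPServiceProvider epService = EPServiceProviderManager.getDefaultProvider(config);

            // A SQL-like rule: count page views per country over a sliding 60-second window.
            EPStatement stmt = epService.getEPAdministrator().createEPL(
                    "select country, count(*) as views "
                  + "from PageView.win:time(60 sec) "
                  + "group by country");

            // Print each aggregation update as events arrive.
            stmt.addListener((UpdateListener) (newEvents, oldEvents) -> {
                if (newEvents != null) {
                    System.out.println(newEvents[0].get("country") + " -> " + newEvents[0].get("views"));
                }
            });

            // Feed one sample event into the engine.
            Map<String, Object> event = new HashMap<>();
            event.put("userId", "u1");
            event.put("country", "US");
            event.put("pageId", "home");
            epService.getEPRuntime().sendEvent(event, "PageView");
        }
    }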

Q2: Why Pulsar?

Pulsar provides several key systemic qualities that are important for large-scale real-time analytic processing and visualization:

  • Scalability - Scale to tens of millions of events per second. Events are partitioned across cluster nodes dynamically. Pulsar will automatically detect and adjust event partitioning when a node joins or leaves a cluster.
  • Availability - No downtime during software upgrades, stream processing rule changes, and topology changes
  • Flexibility - SQL-like language and annotations for defining stream processing rules. Support for declarative pipeline topology definition to enable dynamic topology changes at runtime. Complex event processing (CEP) support is enabled through integration with Esper.
  • Visualization - Multi-dimensional and interactive reports with drill-down and slice-and-dice in real-time.

Q3: What are the use cases for Pulsar?

Pulsar supports use cases that require near-realtime collection and processing of vast amounts of events to derive actionable insights and generate signals for immediate action. Some common use cases include:

  • Real-time reporting and dashboards
  • Business activity monitoring
  • Personalization and targeting
  • Fraud and bot detection
  • Query within OLAP engines

Q4: What is a Pulsar Streaming event?

Events in Pulsar are a set of user-defined tuples. They can represent user interactions, business events or system events. Once events are ingested into the Pulsar pipeline, data enrichment, filtering and processing can be applied to drive various real-time analytic use cases.
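
As an illustration only, a user-interaction event might look like the following; the field names are hypothetical, since Pulsar does not mandate a fixed schema.

    import java.util.Map;

    public class SampleEvent {
        public static void main(String[] args) {
            // A hypothetical page-view event expressed as a flat set of named fields (tuples).
            // The field names are illustrative; define whatever fields your use case needs.
            Map<String, Object> event = Map.of(
                    "eventType", "PageView",
                    "userId", "u1",
                    "pageId", "home",
                    "timestamp", System.currentTimeMillis());
            System.out.println(event);
        }
    }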

Q5: What are the components of Pulsar Streaming?

Pulsar Streaming includes the following components to perform common realtime analytics:

  • Collector - RESTful endpoint for event ingestion
  • Sessionizer - Sessionizes the events, maintains session states and generates session-begin/session-end marker events. Includes a pluggable interface for persistent session store integration
  • Distributor - Filters and mutates events and routes them to different consumers; acts as an event router
  • Metric Calculator - Calculates metrics by various dimensions and persists them in the metrics store
  • Replay - Replays undelivered or unprocessed events from other stages to avoid data loss
  • ConfigApp - Configures and provisions processing logic for the Pulsar pipeline

These components are built on top of a realtime stream processing framework which is also open sourced as part of Pulsar. Through dynamic topology changes, the Pulsar pipeline can be easily extended with additional components or logic.

Q6: Where can I start and/or participate in a discussion or pose a question?

Visit us at Pulsar Google Group.

Q7: How do I report a bug?

You can file a ticket on GitHub.

Q8: What is Pulsar licensed under?

Pulsar source code is divided into multiple code repositories in GitHub. The realtime-analytics and jetstream-esper components of Pulsar are licensed under GPLv2. The jetstream component is dual licensed under the Apache 2.0 and MIT licenses. The Pulsar Reporting API and UI are dual licensed under the Apache 2.0 and MIT licenses.

Technical FAQ

Q1: How many metrics can be calculated in one MC node?

Based on our testing, one MC node running on commodity hardware can support up to 1M (cardinality) metrics in one metric collection cycle (1-5 minutes).

Q2: What is the difference between JetStream Message and Kafka Message?

JetStream messaging is preferred over Kafka messaging because it is cheaper: you do not have to set up a Kafka cluster to subscribe to Kafka topics and receive messages. You can have your JetStream cluster subscribe to JetStream topics directly, with lower latency.

Q3: Is rest channel supported to dispatch messages?

Yes. The Jetstream framework has a built-in OutboundRESTChannel and HTTP client; users can extend the ED wire-on flow to add a REST channel and dispatch messages through REST.

Q4: How can I simulate some testing data?

You can simply use any HTTP client to post requests in JSON format to the collector HTTP server. To help guide you, review the event model and take a look at some sample code.
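
For instance, here is a minimal sketch using the JDK 11 HttpClient. The collector URL and the event fields are placeholders for this example; substitute your deployment's collector endpoint and your own event model.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class CollectorPostExample {
        public static void main(String[] args) throws Exception {
            // Placeholder: point this at your collector's ingestion endpoint.
            String collectorUrl = "http://localhost:8080/";

            // A hypothetical event in JSON form; use the fields from your own event model.
            String json = "{\"eventType\":\"PageView\",\"userId\":\"u1\",\"pageId\":\"home\"}";

            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(collectorUrl))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(json))
                    .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());

            System.out.println("Collector responded with HTTP " + response.statusCode());
        }
    }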

Q5: What kind of permission control is included in Pulsar Reporting UI?

When a request is performed, it may fail if authentication or authorization is not granted. Additionally, UI developers may enable or disable sending credentials with requests in Angular’s config step (by calling prApiProvider.useWithCredentialsDatasource). If your setup includes the popular UI-Router, consider integrating angular-permission to control permissions on the application pages.

Q6: What kind of authentication and authorization are included in Pulsar Reporting API?

1. For authentication, we use simple database authentication based on Spring Security.
2. For authorization, there are two kinds of permissions: SYS permissions and resource permissions.
    a) SYS permissions: refer to SysPermission under the pulsarquery-admin module.
    b) Resource permissions: each dashboard, data source and group is treated as a RESOURCE.
3. There are two rights for each resource: <resource_name>_MANAGE and <resource_name>_VIEW.
    a) <resource_name>_MANAGE means you can view, edit or delete the resource.
    b) <resource_name>_VIEW means you can view the resource.
    c) Users can only do what they have permission to do, and can only grant rights they already hold to groups they have the right to manage.
    d) A resource’s owner has the manage right on that resource by default.

Q7: Is Pulsar Reporting UI a chart library or a dashboard builder?

It’s a dashboard builder. It’s a pure AngularJS application. It includes many services and components to make building reports easy.

Q8: What is the procedure to add a new dashboard using the self-service tool?

1. Ensure you have included the ui.creator module dependency.
2. Go to the Reports Creator page.
3. Click the “+” button, then type a dashboard name and choose a data source.
4. Click the edit button to add, remove or edit widgets for your dashboard.
5. Do not forget to save it after you update the dashboard.

Q9: I would like to add a new type of chart in my customized dashboard. What are the steps to do so?

Three components must be added in order to create a new type of widget.
1. Call the widget method in prDashboardProvider. This will configure the next two components.
2. The directive that renders the new widget.
3. The two controllers and templates for the widget view and edit modes.
We recommend checking the source code; prDashboardWidgets provides several examples of dashboard widgets.

Q10: What’s the difference between Pulsar Reporting UI and existing AngularJS dashboard templates?

Pulsar Reporting UI is built to work directly with the Pulsar Reporting API, providing an end-to-end solution to visualize data and make it easy for the user to create reports according to their needs.

Q11: How to extend Dashboard template?

Dashboard templates are simple collections of columns with CSS classes attached to them. You can use Angular’s config step to call the prDashboardProvider.layout method to define new ones for yourself.

Q12: How to add a new data source, what is the step to inject data and show it on the dashboard?

New data sources can be added dynamically from the Admin Dashboard. Data sources can also be added directly through an API call. For testing purposes, users can choose to configure data sources in properties files and then restart the API server. Please refer to the Pulsar Reporting API User Guide for details.

Q13: How to use MySQL as metadata storage?

1. Create and initialize the database; please refer to mysql.sql under the pulsarquery-admin project
2. Add the mysql-connector-java dependency to the pom.xml of the pulsarquery-admin module as well as the parent pom.xml
3. Specify the configuration under ${user.home}/pulsarquery-config.properties (a quick connection check is sketched after this list):
    pulsarquery.db.driver=com.mysql.jdbc.Driver
    pulsarquery.db.url=jdbc:mysql://<mysql server>/<database>
    pulsarquery.db.user=<username>
    pulsarquery.db.password=<password>
    pulsarquery.db.table.name.prefix=<prefix of each table, empty if none>
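
Before restarting the API server, it can help to confirm that the JDBC settings actually connect. The following is a minimal standalone check using plain JDBC; the URL, database name and credentials are placeholders standing in for whatever you put in pulsarquery-config.properties.

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class MetadataDbCheck {
        public static void main(String[] args) throws Exception {
            // Use the same values you configured in pulsarquery-config.properties.
            String url = "jdbc:mysql://localhost:3306/pulsarquery"; // placeholder server and database
            String user = "pulsar";                                 // placeholder user
            String password = "changeme";                           // placeholder password

            // Requires mysql-connector-java on the classpath, as described in step 2.
            try (Connection conn = DriverManager.getConnection(url, user, password)) {
                System.out.println("Connected to MySQL " + conn.getMetaData().getDatabaseProductVersion());
            }
        }
    }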

Q14: How can I add new dimensions and/or metrics?

To add new dimensions, you can modify the EPLs in the Event Distributor (a stage in the Pulsar pipeline) that flows data to Druid through Kafka. To add new metrics, you can modify both the EPLs and the Druid configuration (ingestion spec) to change the Druid aggregation method for the metrics.
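
As a sketch of the kind of change involved, the statement below selects an extra deviceType dimension into an output stream. The stream and field names are hypothetical for this example, not the EPL shipped with Pulsar, and the matching dimension would still need to be added to the Druid ingestion spec.

    import com.espertech.esper.client.EPServiceProvider;
    import com.espertech.esper.client.EPServiceProviderManager;

    public class NewDimensionExample {
        public static void main(String[] args) {
            EPServiceProvider epService = EPServiceProviderManager.getDefaultProvider();

            // Declare a hypothetical input event type (illustrative field names only).
            epService.getEPAdministrator().createEPL(
                    "create schema PageView (userId string, country string, deviceType string)");

            // Select the new deviceType dimension into the stream that would be forwarded
            // to Druid through Kafka, and aggregate by it.
            epService.getEPAdministrator().createEPL(
                    "insert into DruidKafkaStream "
                  + "select country, deviceType, count(*) as pageViews "
                  + "from PageView.win:time_batch(60 sec) "
                  + "group by country, deviceType");
        }
    }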