Apache Zeppelin
Apache Zeppelin is a web-based notebook platform that enables interactive data analytics with interactive data visualizations and notebook sharing. You can make data-driven, interactive and collaborative documents with SQL, Scala, Python, R in a single notebook document.
It shares a single SparkContext
between languages (so you can pass data between Scala and Python easily).
This is an excellent tool for prototyping Scala/Spark code with SQL queries to analyze data (by data visualizations) that could be used by non-Scala developers like data analysts using SQL and Python.
Note
|
Zeppelin aims at more analytics and business people (not necessarily for Spark/Scala developers for whom Spark Notebook may appear a better fit). |
Clients talk to the Zeppelin Server using HTTP REST or Websocket endpoints.
Available basic and advanced display systems:
-
text (default)
-
HTML
-
table
-
Angular
Features
-
Apache License 2.0 licensed
-
Interactive
-
Web-Based
-
Data Visualization (charts)
-
Collaboration by Sharing Notebooks and Paragraphs
-
Multiple Language and Data Processing Backends called Interpreters, including the built-in Apache Spark integration, Apache Flink, Apache Hive, Apache Cassandra, Apache Tajo, Apache Phoenix, Apache Ignite, Apache Geode
-
Display Systems
-
Built-in Scheduler to run notebooks with cron expression
Further reading or watching
-
(video) Data Science Lifecycle with Zeppelin and Spark from Spark Summit Europe 2015 with the creator of the Apache Zeppelin project — Moon soo Lee.