Getting Started

In this section, we describe how to set up a Saasfee installation. Afterwards, you can continue with the programming tutorial or run an example workflow.

What is Saasfee?

The ScAlAble ScientiFic workflow Execution Engine (Saasfee) is a scientific workflow management system that focuses on data parallelism and scalability on the one hand and on the reusability of legacy code and workflows on the other. Saasfee encompasses the functional scientific workflow language Cuneiform as well as the Hi-WAY scientific workflow scheduler for Hadoop YARN.

[Figure: stack_puzzle.png]

Cuneiform allows the composition of data-parallel workflows that seamlessly integrate foreign code (e.g., Bash, Python, or R). Cuneiform workflows can then be executed either locally (possibly on multiple cores) or on top of Apache Hadoop via Hi-WAY, which is also able to interpret legacy workflows from the Pegasus and Galaxy workflow systems.

Set up a Saasfee installation

Routines for installing Saasfee, including the Cuneiform workflow language, the Hi-WAY workflow scheduler for Hadoop YARN, and Apache Hadoop, are provided in the form of Chef cookbooks. These cookbooks can easily be configured and executed using the Chef orchestration toolkit Karamel.

Saasfee can be installed on a single physical machine, a local cluster, or Amazon EC2, and the procedure is the same in each case. A Saasfee installation, which builds on top of Apache Hadoop, has a master and possibly several worker nodes, where the master can be a worker itself. While the Saasfee installation can be launched from a Linux, Mac, or Windows client, the target machines, i.e., the machines on which Saasfee is to be installed, require Ubuntu 12.04 or later.
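The Ubuntu requirement for the target machines can be checked before launching an installation. The following is a minimal sketch, not part of Saasfee or Karamel: it assumes the target machine provides an os-release file (typically /etc/os-release) with the standard ID and VERSION_ID fields, and the function names are made up for illustration.

```python
def parse_os_release(text):
    """Parse the KEY=value lines of an os-release file into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def meets_ubuntu_requirement(text, minimum=(12, 4)):
    """Return True if the os-release text describes Ubuntu >= minimum.

    The default minimum (12, 4) corresponds to Ubuntu 12.04.
    """
    fields = parse_os_release(text)
    if fields.get("ID") != "ubuntu":
        return False
    try:
        version = tuple(
            int(part) for part in fields.get("VERSION_ID", "").split(".")
        )
    except ValueError:
        # Missing or non-numeric version string.
        return False
    return version >= minimum
```

On a target machine, the check could be run by passing the contents of /etc/os-release to meets_ubuntu_requirement.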

To install Saasfee, download Karamel along with one of the YAML files provided below. Then, following the instructions on the Karamel website, launch the Karamel web UI, load the downloaded YAML file into Karamel, and adjust the configuration to your preferences (for instance, the type and number of EC2 virtual machines, the IPs of physical machines, or the user name). Finally, press the launch button to start the installation.

Install on a physical machine or cluster

When installing Saasfee on a physical (or manually set up virtual) machine or cluster (baremetal), use one of the YAML files listed here and set the IP addresses of the target machines in Karamel. Also make sure that passwordless SSH is set up from the client on which Karamel runs to the target machines. Provide Karamel with the appropriate SSH keypair, its password, and the sudo password of the machines on which Karamel is to install Saasfee. Finally, you might have to use Karamel to adjust the amount of memory that your worker nodes provide to Hadoop.
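Whether passwordless SSH works can be verified from the client before Karamel is launched. The sketch below is one possible pre-flight check, not something Karamel provides: it assumes an OpenSSH client is installed on the client machine, and the function names, user name, and IP address are hypothetical. The BatchMode option makes ssh fail rather than prompt, so a key protected by a password only passes if it has been loaded into an SSH agent, and a host whose key is not yet known to the client may be rejected as well.

```python
import subprocess


def build_ssh_check(user, host, timeout=5):
    """Build an ssh command that fails instead of prompting for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",                      # never prompt interactively
        "-o", "ConnectTimeout={}".format(timeout),  # give up on unreachable hosts
        "{}@{}".format(user, host),
        "true",                                     # harmless remote command
    ]


def passwordless_ssh_works(user, host, timeout=5):
    """Return True if the remote `true` command ran without any prompt."""
    try:
        result = subprocess.run(
            build_ssh_check(user, host, timeout),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        # No ssh executable was found on this client.
        return False
    return result.returncode == 0
```

For example, passwordless_ssh_works("ubuntu", "192.168.0.10") could be called once for each target machine whose IP address is entered in Karamel.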

saasfee_standalone_baremetal.yml

All Saasfee baremetal download links:

A detailed video on how to install Saasfee on a local machine can be found below as well as in the Video Tutorials section.

Install on a virtual cluster in Amazon EC2

When installing Saasfee on a new virtual cluster in Amazon EC2, use one of the YAML files listed here and provide Karamel with your AWS secret access key.

saasfee_distributed_ec2.yml

All Saasfee EC2 download links:

A detailed video on how to install Saasfee on Amazon EC2 can be found below as well as in the Video Tutorials section.

Continue Reading

From here, you may want to get started with programming Cuneiform in the Hello World section of the tutorial. Alternatively, you may opt to run one of the provided Example Workflows or have a look at the Video Tutorials.