Serverless Computing With OSv

By: Nadav Har’El, Benoît Canet

Serverless computing, a.k.a. Function-as-a-Service

The traditional approach to implementing applications on the cloud is the IaaS (Infrastructure-as-a-Service) approach. In an IaaS cloud, application authors rent virtual machines and install their own software to run their application. However, when an application needs, for example, a database, the application writer often does not have the necessary expertise to choose the database, install it, configure and tweak it, and dynamically change the number of VMs running this database. This is where the “PaaS” (Platform-as-a-Service) cloud steps in: The PaaS cloud does not give application writers virtual machines, but rather a new platform with various services. One of these services can be a database service: The application makes database requests – whether one each second or a million each second – and does not have to care or worry whether one machine or 1,000 machines are actually needed to provide this service. The cloud provider charges the application owner for these requests and for the amount of work they actually do.

But it is not enough that the PaaS cloud provides building blocks such as databases, queue services, object stores, and so on. An application also needs glue code combining all these building blocks into the operations the application needs to perform. So even on PaaS, application writers start virtual machines to run this glue code. Yes, again VMs and all the problems associated with them (installation, scaling, etc.). But recently there is a trend towards a serverless PaaS cloud, where the application developer does not need to rent VMs. Instead, the cloud provides Function-as-a-Service (FaaS). FaaS implementations (such as Amazon Lambda, Google Cloud Functions, or Microsoft Azure Functions) run short functions, which the application author writes in high-level languages like JavaScript or Java, in response to certain events. These functions in turn use the various PaaS services (such as database requests) to perform their job. The application author is freed from worrying about how or where these functions are run – it is up to the cloud implementation to ensure that whether one or a million of these functions need to run per second, they will get the necessary resources to do so.

Implementation, and why OSv is a winner

How could function-as-a-service be implemented by the cloud provider?

It is very inefficient to start a VM for every invocation of a function, which could last for a fraction of a second. A more reasonable approach is to start a VM running the runtime environment, e.g., Node.js or Java, and then send it many different requests. But if we were to start a single instance of the runtime environment to run the functions of many different clients, this would carry significant security risks: An exploit found in the runtime implementation may lead to one application being able to view or modify the functions run by another application.

So instead of having one VM serve multiple applications of different clients, it is safer to start separate VMs for each application: A single VM will run multiple functions before shutting down, but all of these functions will be the same one, or at least belong to the same application. Having a VM dedicated to the application and its small set of functions also makes it more efficient to run these functions – this VM can load and compile the functions and relevant libraries once, before running the same function or functions many times. Having the VM dedicated to the client also makes it easier to charge the client for the actual CPU and memory usage of the VMs started on their behalf.

But the hard part of this implementation is scaling: When the number of functions being run by one application changes from second to second, we also need to change the number of VMs dedicated to running these functions. Leaving behind too many of these VMs as spares costs money, as resources (especially memory) are being wasted. Moreover, in the event of cloud bursting – a sudden, unexpected burst of requests – we may need to start many more VMs than we had previously. For these two reasons, it is very important that we are able to boot and shut down these function-running VMs as quickly as possible, preferably in a fraction of a second.

OSv, similar to other unikernels, boots and shuts down very quickly. But what makes OSv a better fit for this use case than any of the other unikernels is the fact that it can run unmodified Linux executables, and in particular the complex run-time environments and languages we wish FaaS to support, such as Node.js and Java, as well as user-provided native code.

A FaaS implementation using OSv might work as follows:

  1. When the FaaS needs to run a certain application’s function, if a VM belonging to this application is ready to accept more requests, we send it the request to run the function. Otherwise, when all the application’s VMs are busy, we start a new VM (a sketch of this dispatch logic follows the list).
  2. Starting a new VM will take only a fraction of a second. Beyond OSv’s quick boot, another reason for this speed is that the VM image will not have to be sent over the network: All these VMs, regardless of which application they work for, boot from the same image (containing OSv, Node.js or Java, and the FaaS glue), and the image is immutable – these VMs cannot write back to it. This immutable image also means that for this use case, OSv does not need the read-write ZFS file system, and that further reduces OSv’s boot time and memory overhead.
  3. To ensure that the end-user doesn’t experience even a fraction-of-a-second latency when a new VM is started, we may choose to preemptively start new VMs as soon as the existing VMs are about to get filled up, before they actually do get filled up. The fact we can start new VMs very quickly allows us to keep the number of spare VMs low.
  4. When the rate of function executions for a particular application diminishes, the FaaS system will stop sending new requests to some of the VMs, and very soon such VMs will become idle and can be shut down. OSv’s shutdown is very quick, but in this case we don’t even have to bother with a clean shutdown – we can stop an idle VM instantaneously because we know there is not even a disk that needs to be flushed.
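To make the flow above concrete, here is a minimal, hypothetical C++ sketch of the per-application dispatch logic. The vm and app_pool types, the capacity threshold, and the boot and teardown details are illustrative assumptions, not part of any actual FaaS implementation.

#include <algorithm>
#include <vector>

// Hypothetical sketch only: all names and thresholds are illustrative.
struct vm {
    int in_flight = 0;                       // requests currently running on this VM
    static const int capacity = 8;           // assumed per-VM concurrency limit
    bool has_room() const { return in_flight < capacity; }
};

struct app_pool {                            // the VMs dedicated to one application
    std::vector<vm> vms;

    vm& pick_vm() {
        for (auto& v : vms) {
            if (v.has_room()) {
                return v;                    // reuse a warm VM that can take more work
            }
        }
        vms.emplace_back();                  // all busy: boot a fresh OSv VM, which
        return vms.back();                   // boots from the shared immutable image
    }

    void reap_idle() {
        // Idle VMs are simply destroyed: with no writable disk to flush,
        // no clean shutdown is needed.
        vms.erase(std::remove_if(vms.begin(), vms.end(),
                                 [](const vm& v) { return v.in_flight == 0; }),
                  vms.end());
    }
};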

Existing FaaS implementations, like Amazon’s Lambda, charge the application for each function’s wall-clock run time (and in coarse 100 ms increments). Paying for idle time makes it very expensive to run functions which need to make a request, wait for its response, and do something with it. We’ve seen bloggers recommend working around this problem with tricks such as starting multiple unrelated requests in the same lambda and then waiting for all of them to respond. We believe, however, that FaaS needs more natural support for functions which block, which we expect will be the typical use of FaaS. This support could be provided with Node.js’s futures and continuations (the application starts an asynchronous operation and runs a non-blocking function when it completes; https://serverless.com does this on Amazon Lambda), or alternatively by the implementation transparently running multiple application functions in parallel on the same VM. In any case, the client should pay only for the actual CPU time used by the function and by VM bringup, as well as for the memory used by those VMs.

Note that although the FaaS implementation we propose is very scalable, at the low end of the scale – e.g., just one request each second – it is not cost-effective: It does not make sense to bring up the VM and the runtime environment each second, as the better part of that second will be wasted just on this bringup; the alternative is to leave the VM up but idle most of the time. In either case, the memory required by the runtime environment will be reserved for the application continuously, so the cost of this memory puts a lower limit on the price of a low-usage function. Note that if the function’s usage becomes even lower – say, just once a minute – it again becomes cost-effective to bring the VMs up and down each time.

Epilogue

We believe that the difficulties of running code on VMs will drive more and more application developers to look for alternatives for running their code, such as Function-as-a-Service (FaaS). We explored this and related directions in this paper from 2013.

We showed in this post that it makes sense to implement FaaS on top of VMs, and that OSv is a better fit for running those VMs than either Linux or other unikernels. That is because OSv has the unique combination of allowing very fast boot and instantaneous shutdowns, at the same time as being able to run the complex runtime environments we wish to support (such as Node.js and Java).

An OSv-based implementation of FaaS will support “cloud bursting” – an unexpected, sudden, increase of load on a single application, thanks to our ability to boot many new OSv VMs very quickly. Cloud bursting is one of the important use cases being considered by the MIKELANGELO project, a European H2020 research project which the authors of this post contribute to, and which is based on OSv as we previously announced.

NFS on OSv or “How I Learned to Stop Worrying About Memory Allocations and Love the Unikernel”

By Benoît Canet and Don Marti

A new type of OSv workload

The MIKELANGELO project aims to bring High Performance Computing (HPC) to the cloud. HPC traditionally involves bleeding-edge technologies, including lots of CPU cores, Infiniband interconnects between nodes, MPI libraries for message passing, and, surprisingly, NFS – a venerable old-timer of the UNIX universe.

In an HPC context this networked filesystem is used to get the data inside the compute node before doing the raw computation, and then to extract the data from the compute node.

Some OSv NFS requirements

For HPC, NFS is a must, so we worked to make it happen. We had some key requirements:

  • The NFS driver must go reasonably fast
  • The implementation of the NFS driver must be done very quickly to meet the schedule of the rest of the MIKELANGELO project
  • There is no FUSE (Filesystem in User Space) implementation in OSv
  • OSv is a C++ unikernel, so the implementation must make full use of its power
  • The implementation must use the OSv VFS (Virtual File System) layer, and so be transparent for the application

Considering alternatives

The first possibility, which we can exclude right away, is writing an NFS implementation from scratch. The subproject’s schedule is simply too tight for that.

The second possibility is to leverage an implementation from an existing mainstream kernel and simply port it to OSv. The pro would be code reuse, but this comes with a lot of cons.

  • Some implementation licenses do not match well with the unikernel concept where everything can be considered a derived work of the core kernel
  • Every operating system has its own flavor of VFS. The NFS subproject would be at risk of writing wrappers around another operating system’s VFS idiosyncrasies
  • Most mainstream kernel memory allocators are very specific and complex, which would lead to even more convoluted wrappers.

The third possibility would be to use some userspace NFS implementation, as their code is usually straightforward POSIX and they provide a nice API designed to be embedded easily in another application. But wait! Didn’t we just say the implementation must be in the VFS, right in the middle of the OSv kernel? There is no FUSE on OSv.

Enter the Unikernel

Traditional UNIX-like operating system implementations are split in two:

  • Kernel space: a kernel doing the low level plumbing everyone else will use
  • User space: a bunch of applications using the facilities provided by the kernel in order to accomplish some tasks for the user

One twist of this split is that kernel space and user space memory addresses are totally separated using the MMU (Memory Management Unit) hardware of the processor. It also usually implies two totally different sets of programming APIs, one for kernel space and one for user space, and, needless to say, a lot of memory copies each time data must cross the frontier from kernel space to user space.

A unikernel such as OSv is different. There is only one big address space and only one set of programming APIs. Therefore you can use POSIX and Linux userspace APIs right in an OSv driver. So there are no API wrappers to write and no memory copies.

Another straightforward consequence of this is that standard memory management functions, including malloc(), posix_memalign(), free() and friends, will just work inside an OSv driver. There are no separate kernel-level functions for managing memory, so no memory allocator wrappers are needed.
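For illustration, here is a small hypothetical snippet of the kind of driver-side code this makes possible; the function name and the 4 KB alignment are assumptions for the example, not code taken from the actual OSv NFS driver.

#include <stdlib.h>
#include <string.h>

// Hypothetical example: allocate a page-aligned, zeroed buffer for a read
// request using plain POSIX calls, exactly as a Linux userspace program
// would, yet this code can sit directly inside an OSv driver.
static void* alloc_read_buffer(size_t len)
{
    void* buf = NULL;
    if (posix_memalign(&buf, 4096, len) != 0) {
        return NULL;             // allocation failed
    }
    memset(buf, 0, len);         // start from a clean buffer
    return buf;                  // the caller releases it with free()
}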

Meet libnfs

libnfs, by Ronnie Sahlberg, is a user space NFS implementation for Linux, designed to be embedded easily in an application.

It’s already used in successful programs like Fabrice Bellard’s QEMU, and the author is an established open source developer who will not disappear in a snap.

Last but not least, the libnfs license is LGPL. So far so good.

The implementation phase

The implementation phase went fast for a networked filesystem. Reviewing went smoothly thanks to Nadav Har’El’s help, and the final post on the osv-devel mailing list was the following:

OSV nfs client

Some extra days were spent fixing the occasional bug and polishing the result, and now the MIKELANGELO HPC developers have a working NFS client.

Some code highlights

Almost 1:1 mapping

Given the unikernel nature of OSv, a useful system call like truncate(), used to adjust the size of a file, boils down to:

static int nfs_op_truncate(struct vnode *vp, off_t length)
{
    int err_no;
    auto nfs = get_nfs_context(vp, err_no);

    if (err_no) {
        return err_no;
    }

    int ret = nfs_truncate(nfs, get_node_name(vp), length);
    if (ret) {
        return -ret;
    }

    vp->v_size = length;

    return 0;
}

OSv allowed us to implement this syscall with a very thin shim without involving any additional memory allocation wrapper.

C++ lets you do powerful things in kernel code

One of the known limitations of libnfs is that it is not thread-safe. See this mailing list posting on multithreading and performance. OSv is threaded – so heavily threaded that there is no concept of a process in OSv, just threads. Clearly this is a problem, but OSv is written in modern C++, which provides us with modern tools.

This single declaration allows us to work around the libnfs single-threaded limitation.

thread_local std::unordered_map<std::string,
                               std::unique_ptr<mount_context>> _map;

Here the code makes an associative map between the mount point (the place in the filesystem hierarchy where the remote filesystem appears) and the libnfs mount_context.

The one twist to notice here is thread_local: this single C++ keyword automatically makes a separate instance of this map per thread. The consequence is that every thread/mount point pair can have its own separate mount_context. Although an individual mount_context is not thread-safe, that is no longer an issue.
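For illustration, here is a hypothetical helper showing how such a per-thread, per-mount-point context might be looked up, building on the thread_local _map declared above; the mount_context constructor and its nfs() accessor are assumptions made for this sketch, not the actual OSv code.

// Hypothetical sketch, building on the thread_local _map declared above.
// mount_context's constructor and nfs() accessor are assumed for the example.
static struct nfs_context* get_context_for(const std::string& mount_point)
{
    auto it = _map.find(mount_point);
    if (it == _map.end()) {
        // First use of this mount point on this thread: create a fresh,
        // thread-private libnfs context and remember it in the map.
        auto ctx = std::make_unique<mount_context>(mount_point);
        it = _map.emplace(mount_point, std::move(ctx)).first;
    }
    return it->second->nfs();   // per-thread handle, so no locking is needed
}

Because _map is thread_local, two OSv threads calling this helper for the same mount point get two independent contexts, which is exactly what sidesteps the libnfs thread-safety limitation.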

Conclusion

As we have seen here, the OSv unikernel is different in a lot of good ways, and allows you to write kernel code fast.

  • Standard POSIX functions just work in the kernel.
  • C++, which is rarely an option in other kernels, brings real benefits.

Scylla will keep improving OSv with the various MIKELANGELO partners, and we should see exciting new technologies like vRDMA on OSv in the not-so-distant future.

The MIKELANGELO research project is a three-year research project sponsored by the European Commission’s Horizon 2020 program. The goal of MIKELANGELO is to make the cloud more useful for a wider range of applications, and in particular make it easier and faster to run high-performance computing (HPC) and I/O-intensive applications in the cloud. For project updates, visit the MIKELANGELO site, or subscribe to this blog’s RSS feed.

Project Mikelangelo Update

By Nadav Har’El

A year ago, we reported (see Researching the Future of the Cloud) that ScyllaDB and eight other industrial and academic partners started the MIKELANGELO research project. MIKELANGELO is a three-year research project sponsored by the European Commission’s Horizon 2020 program. The goal of MIKELANGELO is to make the cloud more useful for a wider range of applications, and in particular make it easier and faster to run high-performance computing (HPC) and I/O-intensive applications in the cloud.

company logos

Last week, representatives of all MIKELANGELO partners (see company logos above, and group photo below) met with the Horizon 2020 reviewers in Brussels to present the progress of the project during the last year. The reviewers were pleased with the project’s progress, and especially pointed out its technical innovations.

project participants group photo

Represented by Benoît Canet and yours truly, ScyllaDB presented Seastar, our new C++ framework for efficient yet complex server applications. We demonstrated the sort of amazing performance improvements Seastar can bring using ScyllaDB, our Seastar-based implementation of the familiar NoSQL database Apache Cassandra. In the specific use case we demonstrated, an equal mixture of reads and writes, ScyllaDB was 7 times faster (!) than Cassandra. And we didn’t even pick ScyllaDB’s best benchmark to demonstrate (we’d seen even better speedups in several other use cases). Seastar-based middleware applications such as ScyllaDB hold the promise of making it significantly easier and cheaper to deploy large-scale Web or Mobile applications in the cloud.

Another innovation that ScyllaDB brought to the MIKELANGELO project is OSv, our Linux-compatible kernel specially designed and optimized for running on cloud VMs. Several partners demonstrated running their applications on OSv. One of the cool use cases was aerodynamic simulations done by XLAB and Pipistrel. Pipistrel is a designer and manufacturer of innovative and award-winning light aircraft (like the one in the picture below), and running their CFD simulations on the cloud, using OSv VMs and various automation tools developed by XLAB, will significantly simplify their simulation workflow and make it easier for them to experiment with new aircraft designs.

Pipistrel aircraft photo

Other partners presented their own exciting developments: Huawei implemented RDMA virtualization for KVM, which allows an application spread across multiple VMs on multiple hosts to communicate using RDMA (remote direct-memory-access) hardware in the host. In a network-intensive benchmark, virtualized RDMA improved performance 5-fold. IBM presented improvements to their earlier ELVIS research, which allow varying the number of cores dedicated to servicing I/O, and achieve incredible amounts of I/O bandwidth in VMs. Ben-Gurion University security researchers implemented a scary “cache side-channel attack” where one VM can steal secret keys from another VM sharing the same host. Obviously their next research step will be stopping such attacks! Intel developed a telemetry framework called “snap” to collect and to analyse all sorts of measurements by all the different cloud components – VM operating systems, hypervisors, and individual applications. HLRS and GWDG, the super-computer centers of the universities of Stuttgart and Göttingen, respectively, built the clouds on which the other partners’ developments will be run, and brought in use cases of their own.

Like ScyllaDB, all partners in the MIKELANGELO project believe in openness, so all technologies mentioned above have already been released as open-source. We’re looking forward to the next year of the MIKELANGELO project, when all these exciting technologies will continue to improve separately, as well as be integrated together to form the better, faster, and more secure cloud of the future.

For more updates, follow the ScyllaDB blog.

Building OSv Images Using Docker

By David Jorm and Don Marti

Why build OSv images under Docker?

Building OSv from source has several advantages, including the ability to build images targeting different execution environments. The CloudRouter project is working on integrating the build script into its continuous integration system, automatically producing nightly rebuilds of all supported OSv application images.

The main problem with this approach is that it requires a system to be appropriately configured with all the necessary dependencies and source code to run builds. To build a scalable and reproducible continuous integration system, we really want to automate the provisioning of new build servers. An ideal way to achieve this goal is by creating a Docker image that includes all the necessary components to produce OSv image builds.

osv-builder: a Docker based build/development environment for OSv

The osv-builder Docker image provides a complete build and development environment for OSv, including OSv application images and appliances. It has been developed by Arun Babu Neelicattu from IIX. To download the image, run:

docker pull cloudrouter/osv-builder

OSv includes a range of helper scripts for building and running OSv images. The build script will compile OSv from the local source tree, then create a complete OSv image for a given application, based on the application’s Makefile. For more on scripts/build, see the recent blog post on the OSv build system.

Capstan

The image comes with Capstan pre-installed. Note that to use Capstan, you’ll have to run the container with the --privileged option, as it requires the KVM kernel module. For example, to build and run the iperf application:

$ sudo docker run -it \
  --privileged \
  cloudrouter/osv-builder
bash-4.3# cd apps/iperf
bash-4.3# capstan build
Building iperf...
Downloading cloudius/osv-base/index.yaml...
154 B / 154 B [=================================================================================================================] 100.00 % 0
Downloading cloudius/osv-base/osv-base.qemu.gz...
20.09 MB / 20.09 MB [=======================================================================================================] 100.00 % 1m27s
Uploading files...
1 / 1 [=========================================================================================================================] 100.00 %
bash-4.3# capstan run
Created instance: iperf
OSv v0.19
eth0: 192.168.122.15
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 64.0 KByte (default)
------------------------------------------------------------

Launching an interactive session

HOST_BUILD_DIR=$(pwd)/build
docker run -it \
  --volume ${HOST_BUILD_DIR}:/osv/builder \
  cloudrouter/osv-builder

This will place you into the OSv source clone. You’ll see the prompt:

bash-4.3#

Now, you can work with it as you normally would when working on the OSv source. You can build apps, edit build scripts, and so on. For example, you can run the following commands, once the above docker run command has been executed, to build and run a Tomcat appliance.

./scripts/build image=tomcat,httpserver
./scripts/run -V

The osv Command

Note that the commands you run can be prefixed with osv, the source for which is available at assets/osv. For example, you can build with:

docker run \
  --volume ${HOST_BUILD_DIR}:/osv/build \
  osv-builder \
  osv build image=opendaylight

The osv script, by default, provides the following convenience wrappers:

Command                                 Mapping
build args                              scripts/build
run args                                scripts/run.py
appliance name components description   scripts/build-vm-img
clean                                   make clean

If any other command is used, it is simply passed on as scripts/$CMD "$@" where $@ is the arguments following the command.

You could also run commands as:

docker run \
  --volume ${HOST_BUILD_DIR}:/osv/build \
  osv-builder \
  ./scripts/build image=opendaylight

Building appliance images

If using the pre-built version from Docker Hub, use cloudrouter/osv-builder instead of osv-builder.

HOST_BUILD_DIR=$(pwd)/build
docker run \
  --volume ${HOST_BUILD_DIR}:/osv/build \
  osv-builder \
  osv appliance zookeeper apache-zookeeper,cloud-init "Apache Zookeeper on OSv"

If everything goes well, the images should be available in ${HOST_BUILD_DIR}. This will contain appliance images for QEMU/KVM, Oracle VirtualBox, Google Compute Engine, and VMware Virtual Machine Disk.

Note that we explicitly disable the build of VMware ESXi images since ovftool is not available.

Building locally

As an alternative, you can build locally with a docker build command, using the Dockerfile for osv-builder.

docker build -t osv-builder .

Then you can use a plain osv-builder image name instead of cloudrouter/osv-builder.

For more information regarding OSv Appliances and pre-built ones, check the OSv virtual appliances page.

Volume Mapping

Volume        Description
/osv          This directory contains the OSv repository.
/osv/apps     The OSv apps directory. Mount this if you are testing local applications.
/osv/build    The OSv build directory containing release and standalone directories.
/osv/images   The OSv image build configurations.

Sending OSv patches

If you’re following the Formatting and sending patches guide on the OSv web site, just copy your patches into the builder directory in the container, and they’ll show up under your $HOST_BUILD_DIR, ready to be sent to the mailing list.

Conclusion

With the osv-builder docker image, building your own OSv images is now easier than ever before. If you are looking for a high-performance operating system to run your applications, go ahead and give it a try!

Questions and comments welcome on the osv-dev mailing list.

About the authors

David is a product security engineer based in Brisbane, Australia. He currently leads product security efforts for IIX, a software-defined interconnection company. David has been involved in the security industry for the last 15 years. During this time he has found high-impact and novel flaws in dozens of major Java components. He has worked for Red Hat’s security team, led a Chinese startup that failed miserably, and wrote the core aviation meteorology system for the southern hemisphere. In his spare time he tries to stop his two Dachshunds from taking over the house.

Don is a technical marketing manager for Cloudius Systems, the OSv company. He has written for Linux Weekly News, Linux Journal, and other publications. He co-founded the Linux consulting firm Electric Lichen, which was acquired by VA Linux Systems. Don has served as president and vice president of the Silicon Valley Linux Users Group and on the program committees for Uselinux, Codecon, and LinuxWorld Conference and Expo.

Re-“make”-ing OSv

OSv 0.19 is out, with a rewrite of the build system. The old OSv build system was fairly complex, but the rewrite makes it simpler and faster.

Simpler Makefile

The old OSv build system had several makefiles including each other, playing tricks with the current directory and VPATH, dynamically rewriting makefiles, and running submakes.

The old Makefile was responsible not only for building the kernel: it also built tests, called various Python scripts to build modules for different applications, and carried out other tasks.

In the new build system, there is just one “Makefile” for building the entire OSv kernel. Everything is in one file, and also better commented.

Separate kernel building from application building

In the old build system, we used “make” to do everything: building the OSv kernel, building various applications, and building images containing OSv and a collection of applications. This complicated the Makefile and resulted in unexpected build requirements (for example, building OSv always built some Java tests, and thus required Maven and a working Internet connection).

In the new system, make only builds the OSv kernel, and scripts/build builds applications and images. In the future, you could use Capstan instead of scripts/build to make an image that you would like to manage with Capstan.

Most make command lines that worked in the previous build system will continue to work unchanged with scripts/build. For example:

# build image with default OSv application
scripts/build

# build the rogue image
scripts/build image=rogue
# or
scripts/build modules=rogue

# clean kernel and all modules
scripts/build clean

# make parameters can also be given to build
scripts/build mode=debug

# build image with tests, and run them
scripts/build check

 

Additional benefits of this rewrite include:

  1. Faster rebuilds. For example, “touch loader.cc; scripts/build image=rogue” takes just 6 seconds (14 seconds previously). Running make after “make clean” (with ccache) takes just 10 seconds (30 seconds previously).

  2. It should be fairly easy to add additional build scripts which will build different types of images using the same OSv kernel. One frequently requested option is the ability to create a bootfs-only image, without ZFS.

  3. Some smaller improvements, like more accurate setting of the desired image size (covered in issue #595), and supporting setting CROSS_PREFIX without also needing to specify ARCH.

What happened to make test?

The OSv tests are a module like any other module – they won’t be compiled unless someone builds the tests module, and they have a separate Makefile in tests/. The ant and mvn tools, both currently used in our makefile just for building tests, will no longer be run every time the kernel is compiled, but only when the “tests” module is being built. To build and run the tests:

scripts/build check

Try it out

The good news is that now, running make is faster, and the image build process is simpler and easier to extend. Check it out — questions and comments welcome on the osv-dev mailing list.

Wiki Watch: CloudRouter Images Available

Interested in running the CloudRouter VMs that we covered in a blog post earlier this week? Details are available on the CloudRouter wiki. (Pre-built images are available on the CloudRouter site, or you can build one yourself with Capstan.)

Questions on these or any other images are welcome, on the osv-dev mailing list. For more info on working with the CloudRouter project, see the CloudRouter community page.

For future updates, please follow @CloudiusSystems on Twitter.

Software-defined Interconnection: The Future of Internet Peering, Powered by OpenDaylight and OSv

By David Jorm and Don Marti

Most users are aware of cloud computing as a general term behind such trends as “Software as a Service,” where sites such as Salesforce.com can replace software run by a company IT department, or “Infrastructure as a Service” where virtual machines rented by the hour can replace conventional servers. But today, the technologies behind the cloud are changing the way that we connect the Internet at the most fundamental level, through Software Defined Interconnection (SDI).

What is SDI? A lot of manual work goes into hooking up the Internet between providers. The routers that send Internet traffic from one place to another can be configured to use “paid transit”, where a single provider will route packets to any destination. But the more Internet traffic you’re responsible for, the more you can benefit from another arrangement, called “direct interconnection” where you set up your company’s routers to directly connect to another company’s. Most networks will always need to buy transit from somebody; the best you can hope for is that a portion of your traffic bypasses the transit provider and is directly delivered to the destination. Maximizing the amount of traffic that is directly peered leads to better performance, lower latency, lower packet loss, and greater security.

Today, most direct interconnection is set up manually, with a physical fiber cable connecting one organization’s network to another. Agreements to interconnect and peer are also reached manually, typically via email or face-to-face at peering conferences. When agreement is reached, network admins must ssh in to routers in order to manually configure the peering. It’s not efficient or scalable, and it depends on individuals or select groups.

Once an organization has agreed that they want to directly connect with another organization, how do you handle changes to router and switch configuration? Probably the same way you used to manage your httpd.conf back in the 1990s! Network managers ssh in, and update config manually. Some networks have sophisticated management tools, but for many, “the state of the router is the canonical state.”

Software-defined interconnection

Software-defined interconnection, under test in the IIX lab

SDI aims to improve all that. The OpenDaylight project is a common platform for network management that facilitates breaking traditional network devices such as switches and routers into separate “data plane” devices that handle high traffic volume and “control plane” devices that do management. Because the control plane device, or software-defined networking (SDN) controller, does not have the extreme throughput requirements of the data plane, it’s easy to virtualize.

IIX is currently prototyping this next generation of devices in the lab, while more traditional network gear runs in production. The prototype system uses an OpenFlow switch for the data plane and a separate OpenDaylight server for the control plane. The switches rely on the SDN controller: when they encounter an unknown packet, they forward it to the controller.

No configuration changes are needed on the data plane hardware, only on the SDN controller, which can be a virtual machine. OpenDaylight manages both layer 2, switching, and layer 3, routing, and the same OpenDaylight APIs can be used to change configuration at both levels.

OpenDaylight is a pure Java application. It only requires the ability to run a JVM on the virtual machine. For security and ease of management, it can be advantageous to run an individual controller per customer. This means a lightweight, easy-to-manage guest OS is a big advantage. With OSv, IIX can deploy identical simple VMs for each customer, and the OpenDaylight APIs can be used to configure each one appropriately.

OSv’s high performance and low overhead allows for high density of VMs on standard physical hardware. And any compromise or configuration error should only affect one customer, because strong isolation is provided by a standard hypervisor, without the complex security model of containerization.

Conclusion

While Internet applications have gained from cloud technologies, the fundamental lower layers are still coming up to speed. OpenDaylight and OSv are bringing cloud economics to the lower levels of the stack.

About the authors

David is a product security engineer based in Brisbane, Australia. He currently leads product security efforts for IIX, a software-defined interconnection company. David has been involved in the security industry for the last 15 years. During this time he has found high-impact and novel flaws in dozens of major Java components. He has worked for Red Hat’s security team, led a Chinese startup that failed miserably, and wrote the core aviation meteorology system for the southern hemisphere. In his spare time he tries to stop his two Dachshunds from taking over the house.

Don is a technical marketing manager for Cloudius Systems, the OSv company. He has written for Linux Weekly News, Linux Journal, and other publications. He co-founded the Linux consulting firm Electric Lichen, which was acquired by VA Linux Systems. Don has served as president and vice president of the Silicon Valley Linux Users Group and on the program committees for Uselinux, Codecon, and LinuxWorld Conference and Expo.

Seastar: New C++ Framework for Web-scale Workloads

Today, we are releasing Seastar, a new open-source C++ framework for extreme high-performance applications on OSv and Linux. Seastar brings a 5x throughput improvement to web-scale workloads, at millions of transactions per second on a single server, and is optimized for modern physical and virtual hardware.

seastar Memcache graph

Benchmark results are available from the new Seastar project site.

Today’s server hardware is substantially different from the machines for which today’s server software was written. Multi-core design and complex caching now require us to make new assumptions to get good performance. And today’s more complex workloads, where many microservices interact to fulfil a single user request, are driving down the latencies required at all layers of the stack. On new hardware, the performance of standard workloads depends more on locking and coordination across cores than on performance of an individual core. And the full-featured network stack of a conventional OS can also use a majority of a server’s CPU cycles.

Seastar reaches linear scalability, as a function of core count, by taking a shard-per-core approach. Seastar tasks do not depend on synchronous data exchange with other cores, which is usually implemented with compare-exchange and similar locking schemes. Instead, each core owns its resources (RAM, NIC queue, CPU) and exchanges asynchronous messages with remote cores. Seastar includes its own user-space network stack, which runs on top of the Data Plane Development Kit (DPDK). All network communications can take place without system calls, and no data copying ever occurs. Seastar is event-driven and supports writing non-blocking, asynchronous server code in a straightforward manner that facilitates debugging and reasoning about performance.
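As a rough illustration of the shard-per-core idea (and not the actual Seastar API), the sketch below shows each shard owning its state outright while other shards hand it work as messages instead of touching that state under a lock; in real Seastar the mailbox below is replaced by lock-free queues, futures, and a polling reactor.

#include <deque>
#include <functional>
#include <mutex>

// Rough sketch of the shard-per-core idea; not the real Seastar API.
struct shard {
    long requests_served = 0;                  // shard-private state, never shared
    std::deque<std::function<void()>> inbox;   // work posted by other shards
    std::mutex inbox_lock;                     // stand-in for a lock-free queue

    // Called by *other* shards: hand this shard a message instead of
    // reaching into its data.
    void post(std::function<void()> task) {
        std::lock_guard<std::mutex> guard(inbox_lock);
        inbox.push_back(std::move(task));
    }

    // Called only by the owning core: drain the mailbox and run the work,
    // touching shard-private state with no further synchronization.
    void run_pending() {
        std::deque<std::function<void()>> batch;
        {
            std::lock_guard<std::mutex> guard(inbox_lock);
            batch.swap(inbox);
        }
        for (auto& task : batch) {
            task();
            ++requests_served;
        }
    }
};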

Seastar is currently focused on high-throughput, low-latency network applications. For example, it is useful for NoSQL servers, for data caches such as memcached, and for high-performance HTTP serving. Seastar is available today, under the Apache license version 2.0.

Please follow @CloudiusSystems on Twitter for updates.

Researching the Future of the Cloud

By Nadav Har’El

What will the IaaS cloud of the future look like? How can we improve the hypervisor to reduce the overhead it adds to virtual machines? How can we improve the operating system on each VM to make it faster, smaller, and more agile? How do we write applications that run more efficiently and conveniently on the modern cloud? How can we run on the cloud applications which traditionally required specialized hardware, such as supercomputers?

Project Mikelangelo

Cloudius Systems, together with eight leading industry and university partners, announced this month the Mikelangelo research project, which sets out to answer exactly these questions. Mikelangelo is funded by the European Union’s flagship research program, “Horizon 2020”.

Cloudius Systems brings to this project two significant technologies:

The first is OSv, our efficient and light-weight operating-system kernel optimized especially for VMs in the cloud. OSv can run existing Linux applications, but often with significantly improved performance and lower memory and disk footprint.

Our second contribution to the cloud of the future is Seastar, a new framework for writing complex asynchronous applications while achieving optimal performance on modern machines. Seastar could be used to write the building blocks of modern user-facing cloud applications, such as HTTP servers, object caches and NoSQL databases, with staggering performance: Our prototype implementations already showed a 4-fold increase in server throughput compared to the commonly used alternatives, and linear scalability of performance on machines with up to 32 cores.

The other partners who joined us in the Mikelangelo project are an exciting bunch, and include some ground-breaking European (and global) cloud researchers and practitioners:

 • Huawei

 • IBM

 • Intel

 • The University of Stuttgart’s supercomputing center (HLRS)

 • The University of Goettingen’s computing center (GWDG)

 • Ben-Gurion University

 • XLAB, the coordinator of the project

 • Pipistrel, a light aircraft manufacturer

Pipistrel’s intended use case, of moving HPC jobs to the cloud, is particularly interesting. Pipistrel is an innovative manufacturer of light aircraft that holds several cool world records, and won NASA’s 2011 “Green Flight Challenge” by building an all-electric airplane achieving the equivalent of 400 miles per gallon per passenger. The aircraft design process involves numerous heavy numerical simulations. If a typical run requires 100 machines for two hours, running it on the cloud means they would not need to own 100 machines, but rather just pay for the computer time they use. Moreover, on the cloud they could just as easily deploy 200 machines and finish the job in half the time, for exactly the same price!

Last week, researchers from all these partners met to kick off the project, and also enjoyed a visit to Ljubljana which, as its name implies, is a lovely city. The project will span 3 years, but we expect to see some encouraging results from the project—and from the individual partners comprising it—very soon. The future of the cloud looks very bright!

Visit The Mikelangelo Project’s official site for updates.

Watch this space (feed), or follow @CloudiusSystems on Twitter, for more links to research in progress.

Unikernel Research at the University of Utah

Ian Briggs, Matt Day, Eric Eide, Yuankai Guo, and Peter Marheine are conducting performance research on unikernels, and have thoughtfully posted some preliminary work on OSv performance.

The team tested OSv for DNS and HTTP, and got some encouraging results.

HTTP server comparison

The lighttpd web server on OSv performs consistently well up through 5000 requests/second. And on DNS tests, Linux can sustain a response rate of about 19000 per second, while OSv can handle approximately 28000 requests per second, with slightly lower latency.

The preliminary paper is Performance Evaluation of OSv for Server Applications (local copy).

The researchers did run into a bug running OSv on Xen, so we’re all looking forward to helping them track that down on the osv-dev mailing list. In the meantime, watch this space, or follow @CloudiusSystems on Twitter, for more links to OS research in progress.