Raspberry Pi at Amazon re:Invent

Want to see us build a virtual machine for the Amazon cloud at the re:Invent show? Come on over to our booth (K14) and we’ll show you. It’s simple to add a virtual appliance build step to any application build.

Well, that took nine seconds. What else can we do? How about a hardware giveaway? Since your virtual machines are going to take so little of your time, you should have plenty of time to play with a fun hardware kit.

Raspberry Pi media kit

We’ve got some Raspberry Pi media kits to give away. Not just the board: this kit includes

  • Raspberry Pi Model B+ Board

  • 8GB Operating System microSD Card

  • Multicomp Black B+ Case

  • Raspberry Pi Power Supply

  • Wi-Pi Wireless Adapter

  • 3’ HDMI Cable

  • 7’ Ethernet Cable

Just the thing for a home media center, or a starting point for a more ambitious project such as a vintage arcade.

Amazon re:Invent is sold out, but you can sign up for streaming if you’re not able to make it.

You can keep up with the latest OSv news from this blog’s feed, or by following @CloudiusSystems on Twitter. Hope to see you at the show.

OSv 0.14 Alpha Includes Management Dashboard

By Tzach Livyatan

We are pleased to announce the OSv 0.14 Alpha Release.

This is the first release to include the dashboard, a browser-based UI for OSv and virtual appliance status.

OSv dashboard

The dashboard includes (among other features):

  • Main tab with Memory, CPU, Disk

  • Thread tab with interactive visualization of thread status and thread usage. You can select and search for a particular thread by name.

  • Trace tab, including frequency for each tracepoint. Selection by name and search are available.

  • JVM tab for basic JVM information including memory and GC, when a JVM is installed.

  • Virtual Appliance tab for information relevant to the particular appliance. Tabs for Cassandra and Tomcat are currently available, with more to come.

To build the dashboard, just include httpserver in the module list. For example:

    $ make -j 4 image=cassandra,httpserver

Once the appliance is running, the dashboard is available on port 8000 by default. (You can change the port using cloud-init.)

The full 0.14 release announcement is available on the osv-dev mailing list. You can keep up with the latest OSv news from this blog’s feed, or by following @CloudiusSystems on Twitter.

A Cloud Wake-up Call

By Dor Laor

If you use AWS or Rackspace, there is a good chance that you were affected by the cloud reboot. Ten percent of AWS machines were forced to reboot over the weekend because of a simple bug that created a security vulnerability. The reboot could have been prevented or mitigated through the use of sophisticated but handy tools. Such tools have existed for years, but few people use them.

Let’s take a closer look at the particular problem, then move on to a call to action: use the additional, fantastic low-level features that IaaS/PaaS vendors hardly touch.

The cloud-reboot trigger is a tiny off-by-12 KB Xen hypervisor bug. A simple Model Specific Register check had the wrong limit, as you can see in the fix for the security vulnerability:

-    case MSR_IA32_APICBASE_MSR ... MSR_IA32_APICBASE_MSR + 0x3ff:
+    case MSR_IA32_APICBASE_MSR ... MSR_IA32_APICBASE_MSR + 0xff:

The vulnerability allows an attacker to either crash the hypervisor or retrieve data about other tenants. It’s really a long shot since the memory probably belongs to Xen addresses but in theory one can get lucky and read someone else’s passwords/keys.
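The "off-by-12 KB" figure can be reconstructed from the two limits in the patch. This is only a sketch of the arithmetic, and it rests on an assumption the post does not state: that each MSR index in this range corresponds to one 16-byte register slot of a 4 KiB APIC page.

```python
# Arithmetic behind the "off-by-12 KB" description of the patch above.
# Assumption (not from the post): each MSR index in the range maps to one
# 16-byte register slot of a 4 KiB APIC page.
BYTES_PER_REGISTER = 16

old_limit = 0x3ff  # last MSR offset accepted before the fix
new_limit = 0xff   # last MSR offset accepted after the fix


def window_bytes(last_offset: int) -> int:
    """Bytes reachable through MSR offsets 0..last_offset inclusive."""
    return (last_offset + 1) * BYTES_PER_REGISTER


overshoot = window_bytes(old_limit) - window_bytes(new_limit)

print(window_bytes(new_limit))  # 4096  (one 4 KiB page)
print(window_bytes(old_limit))  # 16384 (four pages)
print(overshoot // 1024)        # 12    (KiB past the intended page)
```

Under that assumption, the old limit let a guest reach three pages beyond the one it was meant to see, which is where the 12 KB comes from.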

Kudos to Amazon and Rackspace for being on the safe side. The fix is a huge hassle and pain for such a small chance of being successfully targeted. Kudos to Xen (and the other hypervisor vendors) for developing hypervisors mature enough that such events are rare.

Now how could an IaaS vendor mitigate the problem without a reboot?

Option #1 - Dynamic Code Patching

Dynamic code patching technology for running code has been available for years. Initially it was Ksplice, and that was recently followed by kpatch. Lean cloud providers reported Ksplice deployment for Xen four years ago!

Cloud provider announces Ksplice support, on Twitter

XSA-108, the cloud-reboot bug, could have been the perfect candidate for this. Hold your horses; Ksplice probably hadn’t been integrated into Xen, and Ksplice is only applicable to dom-0. However, didn’t they see it coming? Now 10% of the Internet needs a reboot because no one picked it up. Let’s see whether a quick developer group will come to the rescue. Ouch.

Option #2 - Live migration

Look ma, no hands…live migrate the VMs from an old hypervisor version to a patched hypervisor without service interruption.

At Red Hat, I managed the KVM and Xen development teams. We were heavily invested in live migration development. A great deal of thought was given to cross-release migration, resulting in the ability to migrate a VM running on KVM version x to version x+y. Sometimes even the opposite direction was allowed. We maintained a huge matrix of migration options which included the preservation of the virtual-hardware version. This means that a KVM hypervisor can represent a variety of virtual-hardware versions (combo of cpu+devices) and keep the ABI (Application Binary Interface) compatible across KVM releases and live migration events.

Live migration was constantly optimized to reduce the effect on running workloads as well as to minimize downtime to a few milliseconds. Smart compression, hot-page-transfer prioritization, and even more adventurous post-copy migration were deployed.
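To make the downtime claim concrete, here is a toy model of an iterative pre-copy loop, the common scheme in which memory is sent while the guest keeps running and only a small remainder is sent while it is paused. It is a simplified sketch, not how any particular hypervisor implements migration; the function name, the threshold, and all of the numbers are illustrative assumptions.

```python
# Toy model of iterative pre-copy live migration. All numbers are
# illustrative assumptions, not measurements of KVM or Xen.

def precopy_rounds(ram_mb, dirty_mb_per_s, link_mb_per_s,
                   stop_threshold_mb=50, max_rounds=30):
    """Return (rounds, downtime_s) for a simple pre-copy loop.

    Each round sends the memory dirtied during the previous round while
    the guest keeps running. When the remainder is small enough, the guest
    is paused and the last chunk is sent; that pause is the downtime.
    """
    to_send = ram_mb  # round 1 transfers all of guest RAM
    rounds = 0
    while to_send > stop_threshold_mb and rounds < max_rounds:
        round_time = to_send / link_mb_per_s
        # Pages dirtied while this round was in flight must be re-sent,
        # capped at total RAM.
        to_send = min(ram_mb, dirty_mb_per_s * round_time)
        rounds += 1
    downtime_s = to_send / link_mb_per_s  # final stop-and-copy phase
    return rounds, downtime_s


rounds, downtime = precopy_rounds(ram_mb=4096, dirty_mb_per_s=100,
                                  link_mb_per_s=1000)
print(rounds, round(downtime * 1000, 1), "ms")
```

With these made-up figures the loop converges after two rounds and the guest pauses for roughly 41 ms. If the guest dirties memory as fast as the link can carry it, the loop never converges and only stops at `max_rounds`, which is the situation that compression, hot-page prioritization, and post-copy are meant to address.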

To my surprise, several years and millions of hypervisors later, most cloud providers do not implement live migration. That’s rather unfortunate for a couple of reasons:

  • Maintenance mode. Live migration allows the host to be taken down for maintenance while its VMs are migrated to a different host.

  • Dynamic load balancing. It’s possible to over-provision resources such as CPU, network, and memory in order to increase virtual server density. Under load, live migrate VMs to balance the host resources. Over-provisioning can reduce cloud bills dramatically; for a theoretical example, check the cost of a t2.micro instance.

A leading cloud provider does use live migration, mainly because it uses shared storage for the VMs, so the migration only has to move the VM RAM. Other IaaS vendors use local storage, but that ‘excuse’ does not hold, since it has long been possible to live migrate local storage too. Sophisticated scenarios are supported; for example, a VM template image can reside on shared storage. There is no need to copy the image to the local disk when the VM is provisioned. Instead, the VM starts execution locally while its disk is remote. On the fly, the disk requests are served from the network while a background task transfers the entire disk to the local hypervisor. In a similar way, live migration of a VM with local storage can take place.

Even open source projects such as OpenStack and Cloud Foundry do not support live migration. After all the time and effort invested in capturing the state of the virtual machine hardware, it’s pretty sad that the feature isn’t enabled in practice and only data center solutions like vCenter and RHEV support it. Just to finalize this rant, please allow me to list the type of data a live migration captures:

  • Complete configuration of the virtual hardware setup
  • State of all CPU registers (General purpose, FPU, SIMD, MSRs, ...)
  • State of the interrupt controllers
  • State of the disk drive (Registers, in-flight IO, interrupts)
  • State of the network cards (Registers, in-flight IO, interrupts)
  • State of all other devices - keyboard, mouse, USB, GPU, etc

Modern hypervisors manage to deal with the above complexity and send GBs of data underneath the guest execution. In turn, the cloud management software needs only to find a target host and evacuate the source host (in the case of hardware/software maintenance, or something a bit more sophisticated for load balancing needs). This is a fair deal. Now please, go implement it.

<wake up call continues>

Since I started with two important OS features that aren’t implemented (dynamic patching and live migration), let me add to the list the following:

  • Hot (un)plug of memory and CPU. This is a pure scale-up scenario. You start a small VM and if there is a need, add virtual CPUs and/or memory to the mix. Most OS’s and hypervisors support it. Imagine you run a c3.8xlarge during the day, and at night you unplug resources to form a c3.large VM which costs 1/16 as much.

Imagine you’re running a JVM application that needs an immediate garbage collection (GC). Today, the application will experience a Stop-The-World phase which will translate into downtime that can go up to several seconds (a function of heap size). Instead, such a VM can ask to hot plug additional RAM and CPUs 1 second before it really needs to pause. The JVM may even use a silly copying garbage collector to copy the live objects from the original RAM block to the newer hotplugged RAM blocks and unplug the old block entirely (using the extra vCPUs to accelerate the action).

  • Trusted boot/computing. Trusted Computing is a technology for preserving the integrity of an operating system. It is based on a secure chip such as a TPM (Trusted Platform Module) and/or Intel’s TXT (Trusted Execution Technology), which provides a hardware-based root of trust to ensure that a platform boots with a known good configuration of firmware, BIOS, virtual machine monitor, and operating system, forming a fully signed and secure stack.

  • Fast VM provision time. OSv boots in under 1 second! However, it takes significantly more time to provision a VM. If the hypervisor and the OS can boot that fast, I see no reason for the hypervisor management code to be slower.
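For the hot-unplug scenario in the list above, the saving can be worked out in a few lines. The 1/16 price ratio comes from the text; the 12-hour day/night split and the unit price are assumptions chosen only to make the arithmetic concrete.

```python
# Blended daily cost of scaling a VM down at night via hot unplug.
# The 1/16 price ratio is from the text; the 12-hour split and the unit
# price of 1 per hour are assumptions for illustration.
from fractions import Fraction


def blended_cost(big_price, hours_big, small_ratio, hours_small):
    """Daily cost of running the big VM, then the small one."""
    small_price = big_price * small_ratio
    return big_price * hours_big + small_price * hours_small


always_big = blended_cost(Fraction(1), 24, Fraction(1, 16), 0)
day_night = blended_cost(Fraction(1), 12, Fraction(1, 16), 12)

print(always_big)              # 24
print(day_night)               # 51/4
print(day_night / always_big)  # 17/32
```

Under these assumptions the bill drops to 17/32 of the always-large case, a bit over half, without ever restarting the VM.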

</wake up call continues>

Enough rants for one day, now let’s get back to #OSv and make it shine some more.

For more info on OSv, please follow @CloudiusSystems on Twitter.

Merge the OS Into Your App, Not the Other Way Around!

By Glauber Costa

Don Marti and I will soon be presenting OSv at JavaOne. We are excited about the conference, and thought we could give you a small preview of what we will be talking about.

As you already know, OSv is at the same time an operating system, and a library. It is as functional as an operating system, and as invisible as a library. This means that you shouldn’t really install your app into OSv, but rather, merge them both naturally as a single entity.

If you are using Java, there is a huge chance that one way or another, your project is built through Apache Ant, or a similar tool.

Capstan, our image building tool, merges nicely and beautifully with your Ant-based build process. Here is how:

<property name="hypervisor" value="qemu"/>
<basename property="vm-name" file="${basedir}"/>
<property name="capstanpath"
   value="${user.home}/.capstan/repository/${vm-name}"/>

<target name="vm" depends="jar">
<echo file="Capstanfile" append="false">
base:
   cloudius/osv-openjdk
cmdline:
  /java.so -jar ${jarname}
files:
  /${jarname}: build/jar/${jarname}</echo>
<exec executable="capstan">
    <arg value="build"/>
    <arg value="-p"/>
    <arg value="${hypervisor}"/>
</exec>
<copy tofile="HelloWorld.${hypervisor}"
   file="${capstanpath}/${vm-name}.${hypervisor}"/>
<delete file="Capstanfile"/>
</target>

The snippet above assumes that you already have a “jar” target in your build.
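If your build is not driven by Ant, the same three steps (write a Capstanfile, run the build, clean up) can be sketched as a small script. This is only an illustration: the function names and the example jar name are hypothetical, while the Capstanfile fields and the `capstan build -p` invocation mirror the Ant snippet above.

```python
# Sketch of the Ant "vm" target above as a plain script, for builds that
# do not use Ant. Function names and the example jar name are hypothetical;
# the Capstanfile fields and the "capstan build -p" call mirror the snippet.
import subprocess
from pathlib import Path


def render_capstanfile(jarname: str) -> str:
    """Return Capstanfile text that boots the given jar on the OpenJDK base."""
    return (
        "base:\n"
        "   cloudius/osv-openjdk\n"
        "cmdline:\n"
        f"  /java.so -jar {jarname}\n"
        "files:\n"
        f"  /{jarname}: build/jar/{jarname}\n"
    )


def build_vm(jarname: str, hypervisor: str = "qemu") -> None:
    """Write a temporary Capstanfile, run the build, then clean up."""
    capstanfile = Path("Capstanfile")
    capstanfile.write_text(render_capstanfile(jarname))
    try:
        # Requires the capstan binary on PATH.
        subprocess.run(["capstan", "build", "-p", hypervisor], check=True)
    finally:
        capstanfile.unlink()


if __name__ == "__main__":
    # Only prints the generated file; call build_vm() to run the build.
    print(render_capstanfile("HelloWorld.jar"), end="")
```

Keeping the Capstanfile rendering separate from the build call makes it easy to inspect the generated file before handing it to Capstan.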

As you can guess from the image name, this example is a simple HelloWorld. The code, together with the complete build.xml file, can be downloaded on GitHub.

Adding that step to your existing build environment gives you a first-class VM in a format consumable by QEMU/KVM. Booting in your hypervisor of choice becomes just a matter of editing the file for the correct format.

And after that? Fire up Capstan and enjoy your VM!

[glauber@localhost JavaOne]$ capstan run JavaOne
Created instance: JavaOne
OSv v0.12
eth0: 192.168.122.15
Hello World

Alternatively, boot the HelloWorld.qcow2 image that was copied to the local directory, with any tool you want.

More info

Are you attending JavaOne? We will show you this and more, at Parc 55 - Powell I/II, Tuesday the 30th (12:30 PM)

For general questions on OSv, please join the osv-dev mailing list. You can get general updates by subscribing to this blog’s feed, or by following @CloudiusSystems on Twitter.

OSv at JavaOne

Glauber Costa and Don Marti are speaking at the JavaOne conference in San Francisco, on Tuesday, September 30. Hope to see you there.

OSv: The Operating System Designed for Java and the Cloud

Here’s what you’re in for:

A lot of the cloud discussion centers around which hypervisors are the best and which management tools will simplify our life the most. But is it the whole story? When problems are addressed from the lower and higher layers, the middleman—the guest operating system—is usually left behind. This changes with OSv, an operating system designed from the ground up to run Java applications in the cloud. OSv is a library OS. Therefore, you can think of using it as being a way to boot a JVM directly into the cloud. Forget OS management: it’s your application and the end of the story. Besides explaining the architecture, this presentation demonstrates how designing an OS with Java in mind can benefit the Java ecosystem.

We’ll demonstrate a nine-second build of a VM from a Java project. Hope to see you there.

You can keep up with the latest OSv news from this blog’s feed, or by following @CloudiusSystems on Twitter.

Shellshock

By Tzach Livyatan

A new Bash bug is ‘bigger than Heartbleed’ and puts millions of websites at risk. In short, Shellshock can take advantage of any server that calls Bash. You can find a good insight into Shellshock on Michal Zalewski’s blog.

A Bash patch is already available, but there is a bigger question: Why do you want Bash on a production server in the first place? By its nature Bash is a dangerous beast. Wouldn’t it be better to keep it in a cage and off your system?

What did you say?

“I need my Bash for troubleshooting?”

Do you now?

I assume your production server already writes logs and sends traps to a remote machine. If not, you probably do not have many production servers. What if, in addition, you had a secure remote REST API that lets you probe files, get traces, and fetch any other information element you need? Do you still need Bash? And if you don’t, then wouldn’t it be better not to have it in the first place?

Don Marti writes that the need for fast, reliable VM builds is the important lesson from this bug, but I disagree. Why not just remove the shell from the server?

OSv takes a different approach from other OSs on the subject. Recognizing that most cloud servers only run one application, it is designed to run one and only one process. Every interaction with OSv is done via a set of REST APIs, over SSL. You can find the current list of supported endpoints on the OSv site. Since fork is inherently not allowed, there is no way for a Shellshock-like bug to exist. Sure, bugs in OSv may still lead to code injection via the API, but the attack surface is much smaller, and dangerous APIs can be easily disabled. OSv still supports a CLI, but it runs outside the OS, and administrators can use the secure API to access it, just like everybody else.

More info

For general questions on OSv, please join the osv-dev mailing list. You can get general updates by subscribing to this blog’s feed, or by following @CloudiusSystems on Twitter.

Waking Up Late, After Bash Fixing Night

By Don Marti

Yesterday we found out about a remotely exploitable hole in bash from our favorite Linux news sites. For some of us, our schedules on the night of September 24th were disrupted, and not in a good way.

It’s true that my own Internet-facing code, while not perfect, isn’t vulnerable. But for one older web-based application I have to deal with, it’s faster to install a new version of Bash than to trace through the code to make sure it isn’t doing something bad, somewhere.

Clearly this particular bug isn’t a problem on OSv, because OSv doesn’t run Bash. The whole point of Bash is to fork() and exec() other processes, and OSv doesn’t do all that. Everything runs in one process, with no shell available or needed.

As Tzach Livyatan points out, managing OSv doesn’t require a shell either. There’s one REST API for everything, from VM basics such as CPU and memory usage, up to JMX data from the application.

But it looks like Tzach is missing the main point.

The problem isn’t so much that someone discovered a bug in Bash. The problem is what happened to the evening of September 24, 2014? A software bug should be something that you fix, test, check in the fix, and go home, not a full night on duty.

I’m starting to think that what’s more important than any design advantages of OSv is the flow that it enables. The size and, more important, simplicity of an OSv VM means that regenerating one is a matter of, let me time it… 9 seconds. An OSv VM is a build artifact that I can crank out of my regular build system.

It would be full of security hubris to say that OSv will never have to issue a security fix. Yes, there are many fewer lines of code, and yes, the C++ experts on the development team will point to shorter, clearer programming constructs in which fewer old-school bugs can hide. But every software project has to issue a fix sometimes.

The question is how long it takes to get current and put the bug behind you.

Repeatable flow, from commit to deploy

At JavaOne next week, Glauber Costa and I will be speaking about OSv: The Operating System Designed for Java and the Cloud. Glauber summed it up: OSv is a library OS. Therefore, you can think of using it as being a way to boot a JVM directly into the cloud. Forget OS management: it’s your application and the end of the story.

The complexity of maintaining conventional OS environments looks like just a time-suck for developers, not a big problem. But simplicity matters on a Big Security Day.

More info

For general questions on OSv, please join the osv-dev mailing list. You can get general updates by subscribing to this blog’s feed, or by following @CloudiusSystems on Twitter.

Adding a New Relic Agent to Your OSv Appliance

By Tzach Livyatan

New Relic is a popular real-time monitoring service for Web and mobile applications.

In the following post I will describe how to add a New Relic monitoring agent to your OSv virtual appliance, using Tomcat as an example.

As a first step, go to the New Relic web site and log in or open an account. Following the instructions on the site, you should be prompted to download two files:

  • newrelic.yml
  • newrelic.jar

newrelic.yml should already have your license key in it. If you downloaded the file directly, make sure to edit the license line. Also update your application name in the same file. This name will be used in the New Relic GUI.
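Both edits are easy to forget, so a quick check before building the image can help. The sketch below is not part of New Relic or Capstan. It assumes the file uses `license_key:` and `app_name:` keys, that an unedited template still contains a `<%= license_key %>` placeholder, and that the default application name is "My Application"; adjust the patterns if your file differs.

```python
# Pre-flight check for newrelic.yml before building the image.
# Assumptions: the file uses "license_key:" and "app_name:" keys, an
# unedited template holds a "<%= license_key %>" placeholder, and the
# default application name is "My Application".
import re


def check_newrelic_config(text: str) -> list:
    """Return a list of human-readable problems found in the config text."""
    problems = []
    key = re.search(r"^[ \t]*license_key:[ \t]*['\"]?([^'\"\n#]*)", text, re.M)
    if key is None or not key.group(1).strip():
        problems.append("license_key is missing or empty")
    elif "<%" in key.group(1):
        problems.append("license_key still holds the template placeholder")
    name = re.search(r"^[ \t]*app_name:[ \t]*(.+)$", text, re.M)
    if name is None:
        problems.append("app_name is missing")
    elif name.group(1).strip() == "My Application":
        problems.append("app_name still holds the default value")
    return problems


if __name__ == "__main__":
    with open("newrelic.yml") as handle:
        for problem in check_newrelic_config(handle.read()):
            print("newrelic.yml:", problem)
```

An empty list means both values look edited; it does not verify that the license key itself is valid.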

There are two ways to build an OSv appliance:

  1. Using an OSv build from source
  2. Using Capstan

The first requires cloning OSv source code with Git, as described here. The second assumes you are familiar with Capstan and is described below.

Using Capstan to add a New Relic Agent

  • Create a new project directory:

    mkdir my-tomcat-with-newrelic
    cd my-tomcat-with-newrelic

  • Copy newrelic.jar and newrelic.yml to this location.
  • Create a new Capstanfile with the following contents:
base: cloudius/osv-tomcat

cmdline: >
  /java.so
  -javaagent:/tools/newrelic.jar
  -cp /usr/tomcat/bin/bootstrap.jar:/usr/tomcat/bin/tomcat-juli.jar
  -Djava.util.logging.config.file=/usr/tomcat/conf/logging.properties
  -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
  -Dcatalina.base=/usr/tomcat
  -Dcatalina.home=/usr/tomcat
  -Djava.io.tmpdir=/usr/tomcat/temp
  org.apache.catalina.startup.Bootstrap
  start

files:
  /tools/newrelic.jar: newrelic.jar
  /tools/newrelic.yml: newrelic.yml

The base OSv image is Tomcat, the cmdline includes both Tomcat and New Relic options, and the files are the two New Relic files: the JAR and the configuration file.

  • Build the image:

    capstan build

You are done! You now have a ready VM with Tomcat and a New Relic agent. To run the image locally:

    capstan run -n bridge

  • Go to the New Relic web app, monitor your application, and give yourself a pat on the shoulder :)

You can keep up with the latest OSv news from this blog’s feed, or by following @CloudiusSystems on Twitter. Questions always welcome on the osv-dev mailing list.

OSv Meetup Group: Convert Your Application to a Run-anywhere VM

By Don Marti

Fast is better than slow. Simple is better than complex. And easily building a run-anywhere VM from a Java application is better than trying to figure out the whole OS business.

OSv is the simple, fast OS platform designed to run one application in the cloud, without the complexity of old-school OSs with their local users, permissions, and services.

Who: You

What: Hands-on OSv hacking session

Where: San Francisco, California, USA (see link for address)

When: 3pm-8pm Wednesday, September 24, 2014 (drop in any time)

Why: Get your application running in the cloud without the legacy OS complexity.

OSv now includes the Jolokia JMX-via-JSON-REST connector, providing full read/write access to the entire set of Java manageability attributes and operations. Now you no longer need to set up and secure separate JMX-over-RMI connectivity with your Java application to fully manage it. (details here).

Please join us for the next meeting of the OSv Meetup group in San Francisco, and learn to do a quick, three-second build of a first-class OSv VM that will run on your platform of choice, whether it’s VMware, VirtualBox, Amazon, Google, or KVM.

Bring your laptop and your favorite Java project, or just follow along. We’ll conclude with food and a few lightning talks and demos.

Our hosts at OhmData are once again making their cool South of Market office space available. If you’re not already a member of the OSv Meetup group, please join us for this hands-on session. For general questions on OSv, please join the osv-dev mailing list. You can get general updates by subscribing to this blog’s feed, or by following @CloudiusSystems on Twitter.

New Dashboard Tab for Insight on Cassandra Virtual Appliances

By Tzach Livyatan

We are constantly looking for ways to improve the OSv virtual appliance experience. The latest improvement is an integrated dashboard, presenting a combination of:

  • OS related metrics (CPU, memory, threads, …)
  • Profiling related metrics (trace points)
  • JVM related metrics (heap, GC, …)
  • Cassandra related metrics (latency, tasks, cluster status)

The dashboard is yet another example of using the REST API to monitor and control OSv and OSv virtual appliances. These REST APIs are open to the user directly. In particular, the new Cassandra tab takes advantage of the newly added Jolokia connector, exposing JMX information over REST. The OSv REST API makes it simple to manage your OSv virtual appliance with curl(1) or your own script.
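As an illustration of the "your own script" option, here is a minimal sketch that reads one JMX attribute through a Jolokia-style GET request. The base URL (host, port, and the `/jolokia` path) is a placeholder assumption, not a documented OSv endpoint; substitute wherever your appliance exposes the connector. The helper names are hypothetical.

```python
# Sketch of reading one JMX attribute through a Jolokia endpoint.
# Assumption: the base URL used below (host, port 8000, "/jolokia" path)
# is a placeholder for wherever your appliance exposes the connector.
import json
from urllib.request import urlopen


def jolokia_read_url(base_url: str, mbean: str, attribute: str) -> str:
    """Build a Jolokia GET "read" URL for one MBean attribute."""
    # MBean names containing "/" need Jolokia's "!" escaping, which this
    # sketch does not handle.
    return f"{base_url.rstrip('/')}/read/{mbean}/{attribute}"


def parse_value(response_text: str):
    """Return the "value" field of a Jolokia JSON response, or raise."""
    doc = json.loads(response_text)
    if doc.get("status") != 200:
        raise RuntimeError(f"Jolokia error: {doc.get('error', doc)}")
    return doc["value"]


def read_attribute(base_url: str, mbean: str, attribute: str):
    """Fetch and decode one attribute over HTTP."""
    with urlopen(jolokia_read_url(base_url, mbean, attribute)) as resp:
        return parse_value(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Prints the URL only; call read_attribute() against a live appliance.
    print(jolokia_read_url("http://192.168.122.15:8000/jolokia",
                           "java.lang:type=Memory", "HeapMemoryUsage"))
```

The same URL can be passed to curl(1); the script form is useful once you want to poll several attributes or feed them into your own monitoring.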

There are other great tools out there for JVM monitoring and profiling, like VisualVM. The OSv dashboard is unique in unifying the end-to-end virtual appliance functionality: from the application to the JVM, down to the OS and HW status.

In particular, trace points allow a deep dive into the system execution, providing similar functionality on OSv to what DTrace does for Solaris.

We are planning to provide similar application-level tabs for other OSv virtual appliances. Want to build a tab for your favorite application on OSv? Clone the osv-gui repository and start submitting pull requests!

You can keep up with the latest OSv news from this blog’s feed, or by following @CloudiusSystems on Twitter.