Pure Demand-Fetch


Suppose a user arrives unexpectedly at an ISR site. If we wish to keep t4 as short as possible, we can use a pure demand-fetch policy to amortize the cost of retrieving the VM disk state over t5. In this policy, only virtual memory state (the file) is retrieved during t4; the transfer of disk state (the 256 KB chunks corresponding to files) is deferred. As soon as the virtual memory state has arrived, the VM is launched. Then, during t5, disk accesses by the VM may result in Coda cache misses that cause slowdown.

Demand-Fetch with Lookaside

The performance of pure demand-fetch can be improved by using LKA in a number of different ways. If a user is willing to wait briefly at suspend, the file and an LKA index for it can be written to his dongle. He can then remove the dongle and carry it with him.

At the resume site, LKA can use the dongle to reduce t4. If read-only or read-write media with partial VM state are available at the resume site, LKA can use them to reduce the cost of cache misses during t5. This will reduce slowdown after resume. When combined with the use of a dongle, this can improve both resume latency and slowdown.

Experimental Evaluation

Benchmark

ISR is intended for interactive workloads typical of laptop environments. We have developed a benchmark called the Common Desktop Application (CDA) that models an interactive Windows user. CDA uses Visual Basic scripting to drive Microsoft Office applications such as Word, Excel, PowerPoint, Access, and Internet Explorer. The operations mimic typical actions that might be performed by an office worker. CDA pauses between operations to emulate think time. The pause is typically 10 seconds, but is 1 second for a few quick-response operations.

Methodology

Our experimental infrastructure consists of 2.0 GHz Pentium 4 clients connected to a 1.2 GHz Pentium III Xeon server through 100 Mb/s Ethernet. All machines have 1 GB of RAM, and run Red Hat Linux. Clients use VMware Workstation 3.1 and have an 8 GB Coda file cache. The VM is configured to have 256 MB of RAM and 4 GB of disk, and runs Windows XP as the guest OS. We use the NISTNet network emulator to control available bandwidth.

Without ISR support, the benchmark time on our experimental setup is 1071 seconds. In this configuration, the files used by VMware are on the local file system rather than on [...]. The effects of Fauxide, Vulpes and Coda are thus completely eliminated, but the effect of VMware is included. The figure of 1071 seconds is a lower bound on the benchmark time achievable by any state transfer policy in our experiments.

Results: Baseline

Relative to the metrics described in Section, we expect the baseline policy to exhibit poor resume latency because all state transfer takes place during the resume step. We also expect network bandwidth to be the dominant factor in determining this quantity. The column of Figure 4 labelled “Baseline” confirms this intuition. At 100 Mb/s, the resume latency is about 40 minutes. When bandwidth drops to 10 Mb/s, resume latency roughly doubles.

The reason it does not increase by a factor of ten (to match the drop in bandwidth) is that the data transfer rate at 100 Mb/s is limited by Coda rather than by the network. Only below 10 Mb/s does the network become the limiting factor. The results in Figure 4 show that the baseline policy is only viable at LAN speeds, and even then only for a limited number of usage scenarios.

In contrast to resume latency, we expect slowdown to be negligible with the baseline policy because no ISR network accesses should be necessary once execution resumes. The “Baseline” column of Figure 5 confirms that slowdown is negligible at 100 Mb/s. The total running time of the benchmark increases from 1071 seconds to 1105 seconds. This translates to a slowdown of about 3.2%, where slowdown is defined as $(T_{bw} - T_{noisr}) / T_{noisr}$, with $T_{bw}$ being the benchmark running time at the given bandwidth and $T_{noisr}$ its running time in VMware without ISR. As bandwidth drops below 100 Mb/s, the “Baseline” column of Figure 5 shows that slowdown grows slightly. It is about 9.2% at 10 Mb/s, 18.8% at 1 Mb/s, and 31.6% at 100 Kb/s. This slight dependence on bandwidth is due
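As a quick check of the slowdown definition, the following minimal sketch recomputes the 100 Mb/s baseline figure from the two running times quoted above. The function and constant names are illustrative and do not come from the ISR prototype.

```python
def slowdown_percent(t_bw: float, t_noisr: float) -> float:
    """Slowdown in percent: (T_bw - T_noisr) / T_noisr, scaled by 100."""
    return 100.0 * (t_bw - t_noisr) / t_noisr


T_NOISR = 1071.0          # benchmark time without ISR support (seconds)
T_BASELINE_100 = 1105.0   # baseline policy at 100 Mb/s (seconds)

# Prints the baseline slowdown at 100 Mb/s, rounded to one decimal place.
print(f"{slowdown_percent(T_BASELINE_100, T_NOISR):.1f}%")
```

The same function can be applied to any other benchmark time in Figure 5, as long as the 1071-second reference is used as the denominator.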

Results: Fully Proactive

With a fully proactive policy one expects resume latency to be bandwidth-independent and very small because all necessary files are already cached. The “Fully Proactive” column of Figure 4 confirms this intuition. Resume latency is only 10 – 11 seconds at all bandwidths. Post-resume ISR execution under a fully proactive policy is indistinguishable from the baseline policy. The user experience, including slowdown, is identical.

Clearly, the fully proactive policy is very attractive from the viewpoint of resume latency and slowdown. What is the minimum travel time for a fully proactive policy to be feasible? This duration corresponds to t2 + t3 in Figure 3. There are two extreme cases to consider.

In the best case, the resume site is known well in advance and its cache has been closely tracking the cache state at the suspend site. All that needs to be transferred is the residual dirty state at suspend, which is the same state that is transferred to servers during t2. For our experimental configuration, we estimate this state to be about 47 MB at the mid-point of benchmark execution. Using observed throughput values in our prototype, this translates to a minimum best-case travel time of 45 seconds with a 100 Mb/s network, and about 90 seconds with a 10 Mb/s network. Both of these are credible bandwidths and minimum walking distances today between collaborating workers in a university campus, corporate campus or factory. At lower bandwidths, we estimate the best-case travel time to be at least 800 seconds (roughly 14 minutes) at 1 Mb/s, and 8000 seconds (roughly 2 hours and 15 minutes) at 100 Kb/s. The 14 minute travel time is shorter than many commutes between home and work, and bandwidths close to 1 Mb/s are available to many homes today. Over time, network infrastructure will improve, but travel times are unlikely to decrease.

In the worst case, the resume site has a completely cold cache and is only identified at the moment of suspend. In that case, t3 must be long enough to transfer the entire state of the VM.
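The best-case estimates above can be turned around to see what effective throughput they imply. The sketch below is only an illustration: it assumes decimal units (1 MB = 8 Mb) and, by default, treats the travel time as a single transfer of the 47 MB of dirty state. The `hops` parameter is a hypothetical knob for the case where the state crosses the network twice (to the server, then to the resume site).

```python
STATE_MB = 47.0  # residual dirty state at suspend, as estimated in the text

# Best-case travel times from the text, keyed by nominal bandwidth in Mb/s.
TRAVEL_SECONDS = {100.0: 45.0, 10.0: 90.0, 1.0: 800.0, 0.1: 8000.0}


def implied_throughput_mbps(size_mb: float, seconds: float, hops: int = 1) -> float:
    """Effective per-hop throughput in Mb/s if size_mb crosses the network
    `hops` times within `seconds` (assumes 1 MB = 8 Mb)."""
    return hops * size_mb * 8.0 / seconds


for nominal, seconds in TRAVEL_SECONDS.items():
    effective = implied_throughput_mbps(STATE_MB, seconds)
    share = 100.0 * effective / nominal
    print(f"{nominal:6.1f} Mb/s nominal: {effective:6.3f} Mb/s effective ({share:.0f}% of nominal)")
```

Under these assumptions the effective rate at 100 Mb/s is a small fraction of the nominal bandwidth, which is consistent with the earlier observation that Coda rather than the network limits transfers at LAN speed.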

From the baseline resume latencies in Figure 4 and the value of t2 above, we estimate minimum travel time to be 2550 seconds (roughly 43 minutes) for a 100 Mb/s network, and 5250 seconds (88 minutes) for a 10 Mb/s network.

Results: Pure Demand-Fetch

In the pure demand-fetch policy, state transfer begins only at resume. However, in contrast to the baseline policy, only a very small amount of state is transferred. In our prototype, this corresponds to the compressed memory image of the VM at suspend (roughly 41 MB). The transfer time for this file is a lower bound on resume latency for pure demand-fetch at any bandwidth. As the “Pure Demand-Fetch” column of Figure 4 shows, resume latency rises from well under a minute at LAN speeds of 100 Mb/s and 10 Mb/s to well over an hour at 100 Kb/s.

We expect the slowdown for a pure demand-fetch policy to be very sensitive to workload. The “Pure Demand-Fetch” column of Figure 5 confirms this intuition. The total benchmark time rises from 1071 seconds without ISR to 1160 seconds at 100 Mb/s. This represents a slowdown of about 8.3%. As bandwidth drops, the slowdown rises to 30.1% at 10 Mb/s, 340.9% at 1 Mb/s, and well over an order of magnitude at 100 Kb/s. The slowdowns below 100 Mb/s will undoubtedly be noticeable to a user. But this must be balanced against the potential improvement in user productivity from being able to resume work anywhere, even from unexpected locations.

Results: Demand-Fetch with Lookaside

As discussed in Section 3.4, the use of transportable storage can reduce both the resume latency and slowdown of a demand-fetch state transfer policy. Our experiments show that these reductions can be substantial.

The “Dongle LKA” column of Figure 4 presents our results for the case where a dongle is updated with the compressed virtual memory image at suspend, and used as a lookaside device at resume. Comparing the “Dongle LKA” and “Pure Demand-Fetch” columns of Figure 4, we see that the improvement is noticeable below 100 Mb/s, and is dramatic at 100 Kb/s. A resume time of just 12 seconds rather than 317 seconds (at 1 Mb/s) or 4301 seconds (at 100 Kb/s) can make a world of difference to a user with a few minutes of time in a coffee shop or a waiting room.

To explore the impact of LKA on slowdown, we constructed a DVD with the VM state captured after installation of Windows XP and the Microsoft Office suite, but before any user-specific or benchmark-specific customizations. We used this DVD as a lookaside device for LKA during the running of the benchmark. The “DVD LKA” column of Figure 5 presents our results. Comparing the “DVD LKA” and “Pure Demand-Fetch” columns of Figure 5, we see that benchmark time is reduced at all bandwidths. The reduction is most noticeable at lower bandwidths.
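The way a lookaside device can short-circuit a cache miss can be sketched as follows. This is a minimal illustration, not the LKA implementation: it assumes the client already knows the expected content hash of the data it needs, that the device index is keyed by that hash, and that SHA-1 is the hash function. `LookasideDevice`, `fetch_chunk`, and the in-memory `server` dictionary are hypothetical names introduced only for this example.

```python
import hashlib
from typing import Callable, Dict, Iterable, Optional, Tuple

CHUNK_SIZE = 256 * 1024  # disk state is handled in 256 KB chunks, per the text


def digest_of(data: bytes) -> str:
    """Content hash used as the lookup key (SHA-1 is an assumption here)."""
    return hashlib.sha1(data).hexdigest()


class LookasideDevice:
    """Hypothetical stand-in for a dongle or DVD together with its index."""

    def __init__(self, chunks: Iterable[bytes]) -> None:
        self._index: Dict[str, bytes] = {digest_of(c): c for c in chunks}

    def lookup(self, digest: str) -> Optional[bytes]:
        return self._index.get(digest)


def fetch_chunk(
    digest: str,
    device: LookasideDevice,
    fetch_from_server: Callable[[str], bytes],
) -> Tuple[bytes, str]:
    """Serve a cache miss from the lookaside device when possible."""
    candidate = device.lookup(digest)
    # Re-hash before trusting the device: the media may be stale or damaged.
    if candidate is not None and digest_of(candidate) == digest:
        return candidate, "lookaside"
    # Otherwise fall back to the (slow) network fetch.
    return fetch_from_server(digest), "server"


# Usage: one chunk is on the DVD, the other exists only on the server.
on_dvd = b"A" * CHUNK_SIZE
only_on_server = b"B" * CHUNK_SIZE
server = {digest_of(c): c for c in (on_dvd, only_on_server)}
dvd = LookasideDevice([on_dvd])

for chunk in (on_dvd, only_on_server):
    data, source = fetch_chunk(digest_of(chunk), dvd, server.__getitem__)
    print(source, len(data))
```

Because the device is consulted only as a hint and its content is verified against the expected hash, a missing or outdated DVD costs nothing worse than the ordinary server fetch.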


Off-machine Lookaside

We have recently extended LKA to use off-machine content-addressable storage (CAS). The growing popularity of planetary-scale services such as PlanetLab, distributed hash-table storage such as Pastry and Chord, and logistical storage such as the Internet Backplane Protocol, all suggest that CAS will become a widely-supported service in the future. For brevity, we refer to any network service that exports a CAS interface as a jukebox. When presented with a hash, the jukebox returns content matching that hash, or an error code indicating that it does not possess the requested content. The “Jukebox LKA” column of Figure 5 shows the performance benefit of using a LAN-attached jukebox with the same contents as the DVD of Section 6.6. Comparing the “Jukebox LKA” and “DVD LKA” columns of Figure 5, we see that the improvement in the two cases is similar relative to the “Pure Demand-Fetch” column.

Related Work

Although ISR is new, it is only the latest step in a long historical evolution toward user mobility in fixed infrastructure. The earliest form of user mobility, dating back to the early 1960’s, was supported by timesharing systems attached to “dumb” terminals. A user could walk up to any terminal and access his personal environment there. Thin clients are the modern-day realization of this capability, providing just enough compute power to support GUIs. Thick client strategies became possible after the birth of personal computing circa 1980. The vision of walking up to any machine and using it as your own dates back at least to the mid-1980’s. Both location transparency and client caching in AFS were motivated by this consideration. To quote a 1990 AFS paper [5]: “User mobility is supported: A user can walk up to any workstation and access any file in the shared name space. A user’s workstation is ‘personal’ only in the sense that he owns it.”

This capability falls short of ISR in two ways. First, only persistent state is saved and restored; volatile state such as the size and placement of windows is not preserved. Second, the user sees the native operating system and application environment of the client; in many cases, this may not be his preferred environment.

ISR bears a close resemblance to process migration. The key difference lies in the level of abstraction at which the two mechanisms are implemented. ISR operates as a hardware-level abstraction, while process migration operates as an OS-level abstraction. In principle, this would seem to put ISR at a disadvantage because hardware state is much larger. In practice, the implementation complexity and software engineering concerns of process migration have proved to be greater challenges. Although successful implementations of process migration have been demonstrated, no OS in widespread use today supports it as a standard capability.

ISR is a mechanism that accurately encapsulates all the customizations that a typical user cares about, and rapidly transforms generic hardware into a personalized computing environment. Both the accuracy and speed of transformation are important. The design of ISR pays careful attention to ease of deployment, a key requirement for universal infrastructure. Specifically, ISR sites can be set up and easily managed by relatively unskilled personnel. ISR reduces dependence on mobile hardware. A user need only carry hardware larger than a dongle under two conditions: when traveling to destinations with poor network or hardware capability; or when using hardware-integrated applications such as augmented reality. Seamless computation in spite of user movement is the holy grail of mobile computing. ISR is an important step toward this goal.

Exploiting advance knowledge of travel, transferring VM state incrementally, and using transportable storage are all important techniques for making ISR viable. With these techniques, resume latency can be reduced to little more than the typical delay one experiences when opening a laptop. By leveraging the consistency of a distributed file system, ISR is robust in the face of a variety of human errors, and is tolerant of environments that are not well-managed. These attributes give us confidence that ISR can play an important role in the future of mobile computing.

This table shows resume latency (in seconds) for different state transfer policies at various bandwidths. In each case, the mean of three trials is reported, along with the standard deviation in parentheses. The results in the “Baseline” column for 1 Mb/s and 100 Kb/s are estimated rather than measured values.

The OSKit Project


The OSKit is a framework and a set of 34 component libraries oriented to operating systems, together with extensive documentation. By providing in a modular way not only most of the infrastructure “grunge” needed by an OS, but also many higher-level components, the OSKit aims to lower both the barrier to entry to OS R&D and its costs. The OSKit makes it vastly easier to create a new OS, port an existing OS to the x86 (or in the future, to other architectures supported by the OSKit), or enhance an OS to support a wider range of devices, file system formats, executable formats, or network services. The OSKit also works well for constructing OS-related programs, such as boot loaders or OS-level servers atop a microkernel.

For language researchers and enthusiasts, the OSKit lets them concentrate on the real issues raised by using advanced languages such as Java, Lisp, Scheme, or ML inside operating systems, instead of spending six months or years groveling inside ugly code and hardware. With the recent addition of extensive multithreading and sophisticated scheduling support, the OSKit also provides a modular platform for embedded applications, as well as a novel component-based approach to constructing entire operating systems.

Although the OSKit contains substantial machine-independent code, it currently only contains machine-dependent code for the Intel x86 and Digital DNARD (StrongArm SA-110 CPU). Ports to other platforms are being considered; if you’re interested, please contact us! A handy development and debugging capability is the ability to run most kernels on top of Unix.

“Transmuting raw code into robust components since 2000.”

The Alchemy project integrates support for cross-cutting concerns, also called aspects, into component-based programming.

Aspects that span natural component boundaries are particularly pervasive within low-level systems software, embedded software, and middleware: such aspects include concerns such as concurrency, memory management, and real-time scheduling. Alchemy explores new ways of dealing with such cross-cutting issues in realistic systems and embedded software, thus making the software both easy to configure and robust.

The Alchemy project is creating new language and tool suites for componentizing systems and embedded software, with particular attention to the specification, verification, and optimized implementation of cross-cutting dependencies. The Alchemy project is also integrating new quality of service (QoS) aspect technologies with existing component-based systems, such as BBN’s Unmanned Aerial Vehicle (UAV) Open Experimental Platform (OEP). Alchemy is supported by DARPA under the Program Composition for Embedded Systems (PCES) program.