WebVM whitepaper
Unlocking the power of the mobile Web
Introduction
WebVM is an initiative by Aplix to improve the mobile web. This whitepaper sets out to explain why we think WebVM is needed, what problems it solves, and how it works.
The growing adoption of 3G broadband cellular services, and the recent revolution in the power of web-based applications on the desktop, are combining to reinstate belief in the mobile web and stimulate the creation of new mobile-oriented services. In addition, the hugely impressive iPhone has set a new standard for interactivity and functionality in mobile web applications.
In parallel, however, there is a growing recognition that mobile applications are not just about interactivity. As was evident years before in the context of mobile Java environments, mobile apps often depend on the ability to access services or data associated with the phone platform - be it access to information about the phone’s location, or access to personal or other context information, or use of the phone’s built-in messaging services - and application environments need to provide access to these services. This represents a problem because there is no standardised mechanism for a web application (i.e. an app based on html+javascript, or svg+javascript) to gain access to these services.
There have been experimental approaches to addressing this need by several companies but none has emerged as a serious candidate for standardisation. In addition to having limited adoption, these existing efforts universally fail to deal adequately with all of the use cases, and with the attendant security requirements, that are foreseen. There is therefore continuing fragmentation while piecemeal and proprietary solutions are coming into the marketplace.
WebVM is a new approach to the problem. It is based on some unique insights into the requirements and possible solutions, coupled with significant enabling technology. In this paper we explain in detail the needs that WebVM aims to satisfy, how it works, how it will be productised for the market, and how we plan to support developers in the wider content ecosystem.
The need
Everyone is talking about the unparalleled power of the web and its ability to connect individuals, form communities, and mediate between service providers and their customers. This power is changing the technology landscape for nearly everyone who has a PC.
There is naturally a desire to have that same power extended to mobile. After all, a mobile is an inherently connected device and there are already more phone subscribers than there are internet-connected PC users. In theory, there is huge potential in the extension of web technologies and services to mobile.
The reality falls a long way short of that promise right now. Even in the most advanced markets (e.g. Japan) the mobile web is a pale shadow of the mainstream web. In other markets the situation is worse; the vast majority of sites and services make no concession to mobile clients, and are unusable or do not work at all. User experience is poor; accessibility problems - already significant - are magnified; content and devices do not interoperate or perform well; and in certain respects the open ecosystems of the regular web are simply not there. These basic problems began several years ago with the first introduction of WAP, and although the industry is working to solve them, progress is painfully slow.
Now, however, attitudes towards the mainstream web are shifting: it is widely held that the web will soon become the dominant way to deliver services to the desktop. The major technology players (Google, Adobe and Microsoft) have all recently made significant announcements about technologies to support “Rich Internet Applications”. The web is changing from being one medium for service delivery to being the medium for service delivery. Correspondingly, attitudes towards the mobile web are shifting too. As with other desktop technologies, the web is expected to come to mobile in a big way. There is still much work to do, but the outcome is widely believed to be inevitable.
As this happens, web applications will increasingly displace resident (Java or native) applications. There are compelling advantages that make web applications so powerful. Some developers claim that they can be much more productive with web development technologies - especially if they are adapting an existing website aimed at the desktop. For some applications, the power comes from the ability to hook up with existing services already on the web - look at all of the new applications that have been developed, for example, based on the Google maps API. Many believe that web apps scale better, because you can access a thousand different websites, but there would be no way you could install a thousand different clients onto your phone.
The issue that service providers cite most often, however, is this: every single day you can improve the user experience. The user does not have to install a new client to get the benefit - it’s just there the next time he visits the site. This is a huge deal for users, service providers and developers and is one of the practical issues that is fuelling the proliferation of web-based services and applications on the desktop.
In comparison with native and Java applications, however, the mobile web has its shortcomings. To make web-based services possible or useful on mobiles, there are several broad needs that are not satisfied by current technology or standards. Each of these needs is examined in this section.
The need for device APIs
The very connection that potentially has the most value - between the web application and the features of the phone handset such as location or the user’s PIM data - is missing. Java APIs to access these platform features have existed for some time, but right now there is no standardised way for a web-based application to get access to those services. This is the biggest single barrier to the development of the kinds of services that the mobile web has the potential to deliver.
The connection a web developer would want would be a collection of javascript APIs to device services. Right now, the picture is confused.
First, there are multiple ongoing initiatives. Looking at standards and industry bodies, there is an effort by the W3C, some (early exploratory) work within the OMTP, and discussions in the OpenAjax Alliance. In parallel, there are actual deployments by device and browser manufacturers of proprietary APIs.
All of the existing efforts by manufacturers are native implementations, within the browser, of APIs that are then frozen when the device ships. Freezing APIs in this way is limiting at any time, but especially limiting at this early stage of specifying and implementing these Javascript device APIs.
A further limitation is that, for security reasons, these APIs are often only made available to “web applications” (or “widgets”) that are persistently installed in the phone. This undermines the usefulness of the device APIs, bearing in mind that one of the key attractions of web technology is the ability to upgrade and improve services continuously, without requiring the user to reinstall an application.
It is also worth looking at what Android is doing. Android does not itself specify any device APIs. Instead, it provides access to device features indirectly through a language-level mechanism: a Java programmer can specify and implement (in Java) any API, and any object implementing that API can be attached (as a JavaScript global) to a WebView window. However, this mechanism is not accessible to web applications; it is only available in WebViews that are instantiated within the context of a resident (Java) application.
The security need
Disparity in approaches and inconsistent APIs are one problem. However, there is an even bigger hole: security. Without effective security controls, these device APIs have the potential to be the enablers for the most virulent and intrusive viruses yet seen. None of the approaches discussed above has made any genuine headway in solving the security problem. Addressing this in an effective but usable way is critical to the viability of any solution to the platform API requirement.
The security requirements are complex, and a framework needs to be able to deal with a wide range of competing requirements, including:
- the need to handle multiple specific policies of different manufacturers or network operators within a single technical framework;
- the need to maintain security, and ensure that the user remains in control of security-relevant operations that are performed on his behalf, without making the security checks unacceptably obtrusive;
- the need to establish reliably which sites are authorized to perform specific operations, and those that are not, when there are (effectively) infinitely many sites under multiple authorities, in the presence of different connection technologies;
- the need to allow the user to grant very specific rights to individual sites where these are legitimately required, without making the security configuration unwieldy or unmanageable.
The fragmentation need
A further issue in the creation of device APIs is fragmentation. We have already discussed how there are several parallel initiatives to define javascript APIs for location, or for access to PIM data. There will soon be devices available on the market supporting each of these different APIs.
It is tempting to think that the solution to this fragmentation issue is for a standards body to create standardised APIs that all devices and browsers can implement. Is this the right approach? In fact our belief is that this is exactly the wrong approach. There are really two problems with it:
- Centralised definition of APIs by standards groups delivers APIs that are too late to be useful.
- Synchronising the definition of the API with the embedding of the corresponding technology on the handset is itself the root cause of much fragmentation.
The historical development of J2ME exemplifies both of these issues. Ironically, the JCP standardisation efforts themselves became the cause of much fragmentation. Due to cost and the risk of delay, manufacturers and operators do not incorporate new APIs unless there is a clear commercial need; and when they do, they all make different choices as to which capabilities to include. Before standard APIs become available, operators and manufacturers create private APIs anyway. These must then be supported on an ongoing basis, and because their feature sets differ slightly from the eventual standard, content cannot easily migrate to the new API.
Although standardisation of APIs is beneficial, it is also important to allow APIs to be defined and implemented independently, for example by content developers, service providers, manufacturers or network operators. At first sight this would appear to create fragmentation. It does not, provided that APIs are deployable (meaning that they can be downloaded to a device after manufacture) and modular (meaning that there is a way of writing and deploying libraries or wrappers that implement one API on top of another). If these two conditions are met, it is not necessary to wait for a centralised definition of any given API, because developers can adapt to different device features and APIs without creating an explosion in the number of platform variants that need to be maintained.
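The modularity condition can be illustrated with a minimal sketch. It assumes two hypothetical interfaces, VendorPositioning (a private manufacturer API) and StandardLocation (a later standard); neither name comes from WebVM or from any JSR. The wrapper shows how one API could be implemented on top of another and deployed after manufacture.

```java
// Hypothetical private API shipped by one manufacturer (illustrative only).
interface VendorPositioning {
    // Returns "latitude,longitude" in decimal degrees, e.g. "51.5,-0.12".
    String currentFix();
}

// Hypothetical later-standardised API that content would prefer to target.
interface StandardLocation {
    double latitude();
    double longitude();
}

// Deployable wrapper: implements the standard API on top of the vendor API,
// so content written against StandardLocation needs no per-device variant.
final class VendorLocationAdapter implements StandardLocation {
    private final VendorPositioning vendor;

    VendorLocationAdapter(VendorPositioning vendor) {
        this.vendor = vendor;
    }

    @Override
    public double latitude() {
        return parse(0);
    }

    @Override
    public double longitude() {
        return parse(1);
    }

    // Splits the vendor's "lat,lon" string and parses the requested field.
    private double parse(int index) {
        String[] parts = vendor.currentFix().split(",");
        if (parts.length != 2) {
            throw new IllegalStateException("unexpected fix format");
        }
        return Double.parseDouble(parts[index].trim());
    }
}
```

Content written against StandardLocation then depends only on the wrapper, not on which private API a given handset happens to ship.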
Summary
So, to summarise the motivations for WebVM:
- for the system developer or integrator: make javascript APIs available for the platform services, so that web-based applications can exploit the power and context of the mobile platform;
but at the same time:
- for the user: establish effective and usable security mechanisms so that those sites can be used without exposing the user to threats from malware or poorly written web applications;
- for the ecosystem: allow for local API definition as well as supporting definition by conventional standardization processes, and thereby allow API definition to be decoupled from the embedding of the technology into the device.
This paper explains how WebVM addresses these needs.
How WebVM works
WebVM plugin
What WebVM provides is a connection between the web application environment (ie the browser, usually) and the Java runtime that is present on nearly every handset today. What it allows the web developer to do is deploy a Java library along with the web application - so the code in the web application can make calls to the Java library. The real power of this comes from the fact that the Java library can access platform features, through a wide range of existing standardised Java platform APIs defined as profiles (commonly known as JSRs). There is a multitude of profiles already deployed on most phones that allow a Java library to access:
- current location, and a database of “landmarks” (JSR179);
- contact and calendar data, and files in the phone’s filesystem (photos, movies etc) (JSR75);
- local connectivity via Bluetooth (JSR82);
- SMS and MMS (JSR120 and JSR205);
… and many more.
The concept behind WebVM is really quite simple. It hooks into the browser (or other environment - it’s not limited to the browser) in the same way that a traditional browser plugin works.
A web page invokes WebVM by referencing an asset of a particular type - in this case a Java library we’ll call a WebVM control - and the browser invokes a specific registered handler for that content type (in the same way as it would for a particular image or media type). In our case the registered handler is the WebVM plugin. The plugin instantiates a Java VM, which loads the library referenced by the web page.
The WebVM plugin responds when scripts in the web page make function calls on it: it forwards each function call to the Java VM, which in turn makes the corresponding call into the Java library. In effect, therefore, the WebVM plugin makes the API exported by the Java library visible to the web app programmer.
Obviously, the WebVM plugin takes care of all of the details associated with mapping language types between JavaScript and Java, handling language exceptions and other errors when they occur, and so on. It also handles the (extremely important) issue of mapping the security context of the web app to a security context for the Java VM, so arbitrary web pages do not get to make unauthorised calls to sensitive Java APIs.
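The forwarding and type-mapping steps described above can be sketched as follows. This is an illustration only, not the WebVM implementation: ScriptBridge, ScriptError and DemoLibrary are hypothetical names, script numbers are assumed to arrive as Double, and the mapping of security contexts is left out entirely.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical error type surfaced to the calling script.
final class ScriptError extends RuntimeException {
    ScriptError(String message, Throwable cause) {
        super(message, cause);
    }
}

// Illustrative bridge: forwards a script-side call to a Java library object.
final class ScriptBridge {
    // Picks the first public method with a matching name and arity;
    // overloads with the same arity are not disambiguated in this sketch.
    static Object invoke(Object library, String name, Object... scriptArgs) {
        for (Method m : library.getClass().getMethods()) {
            if (!m.getName().equals(name) || m.getParameterCount() != scriptArgs.length) {
                continue;
            }
            Class<?>[] types = m.getParameterTypes();
            Object[] javaArgs = new Object[scriptArgs.length];
            for (int i = 0; i < types.length; i++) {
                javaArgs[i] = toJava(scriptArgs[i], types[i]);
            }
            try {
                return toScript(m.invoke(library, javaArgs));
            } catch (InvocationTargetException e) {
                // The library threw: report it as a script-level error.
                throw new ScriptError(String.valueOf(e.getCause()), e.getCause());
            } catch (IllegalAccessException e) {
                throw new ScriptError("method not accessible: " + name, e);
            }
        }
        throw new ScriptError("no such method: " + name, null);
    }

    // Script value -> Java argument (numbers are assumed to arrive as Double).
    private static Object toJava(Object value, Class<?> target) {
        if (value instanceof Double) {
            double d = (Double) value;
            if (target == int.class || target == Integer.class) return (int) d;
            if (target == long.class || target == Long.class) return (long) d;
            if (target == double.class || target == Double.class) return d;
        }
        if (value instanceof String && target == String.class) return value;
        if (value instanceof Boolean && (target == boolean.class || target == Boolean.class)) return value;
        throw new ScriptError("cannot convert argument to " + target.getSimpleName(), null);
    }

    // Java result -> script value: every Number becomes a Double.
    private static Object toScript(Object result) {
        if (result instanceof Number) return ((Number) result).doubleValue();
        return result; // String, Boolean, or null for void methods
    }
}

// Stand-in for a deployed Java library (hypothetical).
final class DemoLibrary {
    public int add(int a, int b) { return a + b; }
    public String describe(String name) { return "hello " + name; }
    public void fail() { throw new IllegalStateException("device busy"); }
}
```

For instance, ScriptBridge.invoke(new DemoLibrary(), "add", 2.0, 3.0) converts both arguments to int, calls add, and hands the result back to the script side as the Double 5.0. An exception thrown by the library reaches the caller as a ScriptError carrying the original cause.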
WebVM security system
The viability of WebVM depends on being able to define a security policy that makes it possible to expose device APIs whilst controlling the risk of compromise. Without this, device API access would allow hostile sites to steal or corrupt user data, expose the user to additional cost, or spread viruses.
Of course, security models already exist for Java ME and for other environments that permit applications to be installed on a phone - for example the “Symbian Signed” system for Symbian and the “Mobile2Market” system for Windows Mobile. However, web applications present a new set of requirements and it is not as simple as replicating these security models for WebVM.
There are also security systems built into current desktop browsers; but these also have issues when examined in the context of the functionality offered by WebVM and the unique requirements for mobile; they are inflexible (for example, being unable to control permissions at an appropriate level of granularity), differ substantially between the browser frameworks, and are poorly understood by users. What is required is a new approach to the security policy.
In defining this security framework, a number of issues have to be addressed.
First, there is an issue with MIDP2 security (and certain other similar frameworks): it confuses authenticity and trust. Usually, a cryptographic signature (eg on a JAR file) is used to verify the authenticity of the signed entity - ie that it genuinely was authored by the party that is advertised as its author. Once the identity is verified, a policy can be applied to assign a level of trust, and a set of associated permissions, to that entity.
However, in MIDP2, the assigned trust and permissions are not based on the verified identity - instead, they are based on the owner of the root certificate that was used in the signature verification. This system prevents any truly scalable security policy being defined and implemented.
The next issue is that there are complex identities - for example an object embedded in a web page has its own identity, but its security context also includes the identity of its containing page (ie its “referrer”). Also, there are multiple identity systems involved - including the Distinguished Name of a code signer and the domain of a web page.
Next there is the issue of having multiple namespaces of security-relevant actions. There are the actions that Java code might attempt that correspond to MIDP2 permissions, but also security-relevant actions in the browser (eg opening a popup, executing Javascript, or storing a cookie). Finally, there are actions that are part of the operation of WebVM itself that are security-relevant - such as the act of binding a web page to a given WebVM library.
Each of these security-relevant actions corresponds to a property that can be controlled by a security policy - ie a permission. These multiple permissions namespaces must all be addressed by the framework.
Finally, the security framework must take into account the fact that there are multiple provisioning systems in place. It must be possible to establish effective and workable policies not only for locally installed web applications (or “widgets”) but also for web sites loaded in the browser in the usual way.
High-level security models
WebVM allows code running in a web page (whether local or remote) to call a Java library and, ultimately, to perform actions on the underlying platform via Java platform APIs. The WebVM security framework has two different ways of treating those actions of the library in relation to the identity of the library and the identity of the containing page.
The first and simplest model treats the WebVM library as a transparent extension of the containing page. All security-relevant actions attempted by the Java library are regarded by the framework as having been attempted directly by script running in the containing page. Therefore, it is the script’s identity (which, under the browser security model, equates to the page’s identity or origin) that is used in deciding whether or not to permit the action.
The second model aims to deal with the situation in which the WebVM library uses certain low level primitives to make a specific set of higher-level services available to calling scripts. The library is trusted (to some degree) not to abuse the primitive operations it has access to, and therefore the full generality of those lower level primitives cannot be exploited by scripts. For example, the library might use SMS to interact with a specific service - it only sends to a specific known address and can be trusted not to spam the SMS inboxes of other addressees in the user’s phonebook. The trusted library then exposes a different service to the enclosing app - say an SMS voting service. The library becomes a trusted subsystem, shielding the platform from abuse by web apps that make use of it. This is the trusted subsystem model.
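The SMS voting example can be sketched as below, as an illustration of the trusted subsystem idea rather than of any actual WebVM library. SmsSender stands in for a low-level messaging primitive, and the class names, the service address and the message format are all placeholder assumptions.

```java
// Hypothetical low-level primitive: can send any text to any address.
interface SmsSender {
    void send(String address, String text);
}

// Trusted library: exposes only a narrow voting service to calling scripts.
final class SmsVotingService {
    // Placeholder service address; calling scripts cannot change it.
    static final String VOTE_ADDRESS = "sms://+15555550100";

    private final SmsSender sender;
    private final int optionCount;

    SmsVotingService(SmsSender sender, int optionCount) {
        this.sender = sender;
        this.optionCount = optionCount;
    }

    // The only operation visible to the enclosing web app.
    void vote(int option) {
        if (option < 1 || option > optionCount) {
            throw new IllegalArgumentException("unknown option: " + option);
        }
        // Address and message format are fixed by the library, not the caller,
        // so the general-purpose send primitive is never exposed to scripts.
        sender.send(VOTE_ADDRESS, "VOTE " + option);
    }
}
```

A script can call vote but has no way to choose the destination or the message body, which is what allows the library to be granted the SMS permission without granting it to every page that uses the library.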
With the trusted subsystem model, the WebVM library and the containing page are considered as having separate identities. The identity of the library is the relevant identity when determining whether or not to permit security-relevant actions at the MIDP level; the identity of the containing page is then relevant to:
- the act of binding to the WebVM library;
- any new security-relevant actions that the library itself defines, corresponding to the higher-level services that it exposes.
In order for the trusted subsystem model to apply, the library itself must be signed and verified, and the library must be installed locally on the device.
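The difference between the two models comes down to which identity is consulted for a given action. The following sketch makes that choice explicit; the enum and class names are hypothetical, and identities are reduced to plain strings for brevity.

```java
// Which of the two high-level models a library runs under.
enum SecurityModel { TRANSPARENT_EXTENSION, TRUSTED_SUBSYSTEM }

// Illustrative categories of security-relevant action.
enum ActionKind { PLATFORM_PRIMITIVE, BIND_TO_LIBRARY, LIBRARY_DEFINED }

final class IdentitySelector {
    // Returns the identity whose permissions decide whether the action is allowed.
    // librarySigner is null when the library is unsigned or unverified.
    static String decidingIdentity(SecurityModel model, ActionKind kind,
                                   String pageOrigin, String librarySigner,
                                   boolean installedLocally) {
        if (model == SecurityModel.TRANSPARENT_EXTENSION) {
            // The library acts as part of the page: the page origin decides.
            return pageOrigin;
        }
        // The trusted subsystem model needs a signed, verified, locally installed library.
        if (librarySigner == null || !installedLocally) {
            throw new IllegalStateException("library not eligible for trusted subsystem model");
        }
        // MIDP-level primitives are judged against the library's identity;
        // binding and library-defined actions against the containing page's identity.
        return kind == ActionKind.PLATFORM_PRIMITIVE ? librarySigner : pageOrigin;
    }
}
```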
Formal security framework
The WebVM security framework is formally defined in terms of:
- Objects (entities that have an identity and can attempt security-relevant actions);
- Identities: any one or combination of domain, X.509 Distinguished Name or SSL CN;
- Security Properties (from a unified namespace of properties, which is the union of MIDP2, browser core and WebVM permissions namespaces);
- Rules (which grant or deny permission for a security property, either absolutely or based on evaluating data relating to the specific context in which the action was attempted);
- Rulesets, which bind a series of identities to a series of rules.
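One way to picture how these pieces fit together is the minimal sketch below. It is not the WebVM policy format: the class names, the string-based identities and property names, and in particular the default-deny combination strategy are assumptions made for illustration.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// A rule grants or denies one security property. The predicate may inspect
// context data (here a simple string map) or ignore it for absolute rules.
final class Rule {
    final String property;
    final Predicate<Map<String, String>> grants;

    Rule(String property, Predicate<Map<String, String>> grants) {
        this.property = property;
        this.grants = grants;
    }
}

// A ruleset binds a set of identities to a list of rules.
final class Ruleset {
    final Set<String> identities;
    final List<Rule> rules;

    Ruleset(Set<String> identities, List<Rule> rules) {
        this.identities = identities;
        this.rules = rules;
    }
}

final class PolicyEvaluator {
    // Assumed strategy: default deny. The action is permitted only if at least
    // one applicable rule grants the property and no applicable rule denies it.
    static boolean permits(List<Ruleset> policy, Set<String> objectIdentities,
                           String property, Map<String, String> context) {
        boolean granted = false;
        for (Ruleset ruleset : policy) {
            // Skip rulesets that name none of the object's identities.
            if (Collections.disjoint(ruleset.identities, objectIdentities)) {
                continue;
            }
            for (Rule rule : ruleset.rules) {
                if (!rule.property.equals(property)) {
                    continue;
                }
                if (!rule.grants.test(context)) {
                    return false;
                }
                granted = true;
            }
        }
        return granted;
    }
}
```

An absolute rule is simply one whose predicate ignores the context, while a contextual rule might, for example, permit sending a message only when the context names a particular destination address.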
In a practical configuration, however, the full generality of the framework is not exposed directly. Instead, the framework is used to configure a policy along the following lines.
A series of “trust zones” define a default set of permissions for websites in different categories (such as trusted sites, restricted sites, etc), as in Internet Explorer. The zones are managed so that each site resolves to a unique trust zone.
Certain specific sites may have additional specific permissions granted - these might, for example, be those instances where the default policy for a zone required that the user be prompted, but the user requested that the permission be assigned permanently.
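The combination of zone defaults and per-site grants might be modelled as in the sketch below. The class, zone and permission names are hypothetical, and the choice to deny anything that no zone mentions is an assumption rather than documented WebVM behaviour.

```java
import java.util.HashMap;
import java.util.Map;

enum Decision { ALLOW, PROMPT, DENY }

final class ZonePolicy {
    // Zone name -> (permission -> default decision).
    private final Map<String, Map<String, Decision>> zoneDefaults = new HashMap<>();
    // Site -> zone; each site resolves to exactly one zone.
    private final Map<String, String> siteZone = new HashMap<>();
    // Site -> (permission -> decision) for grants the user made permanent.
    private final Map<String, Map<String, Decision>> siteOverrides = new HashMap<>();
    // Zone used for sites that have not been assigned explicitly.
    private final String fallbackZone;

    ZonePolicy(String fallbackZone) {
        this.fallbackZone = fallbackZone;
    }

    void setZoneDefault(String zone, String permission, Decision decision) {
        zoneDefaults.computeIfAbsent(zone, z -> new HashMap<>()).put(permission, decision);
    }

    void assignSite(String site, String zone) {
        siteZone.put(site, zone);
    }

    // Records a user's "always allow" answer to a prompt for one site.
    void rememberGrant(String site, String permission) {
        siteOverrides.computeIfAbsent(site, s -> new HashMap<>()).put(permission, Decision.ALLOW);
    }

    // Site-specific grants take precedence over the zone default.
    Decision decide(String site, String permission) {
        Map<String, Decision> overrides = siteOverrides.get(site);
        if (overrides != null && overrides.containsKey(permission)) {
            return overrides.get(permission);
        }
        String zone = siteZone.getOrDefault(site, fallbackZone);
        Map<String, Decision> defaults = zoneDefaults.get(zone);
        if (defaults == null) {
            return Decision.DENY;
        }
        return defaults.getOrDefault(permission, Decision.DENY);
    }
}
```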
Next, there will be security configuration belonging to those WebVM libraries that operate under the trusted subsystem model. These might be presented to the user as “trusted web extensions” that individual sites can make use of with the user’s permission. These would have their own configured policy (indicating which low-level operations they are permitted to perform), and are able to present their own set of higher-level security-relevant actions to invoking pages. These actions themselves can be the subject of a security policy, governing which web sites are permitted to invoke those operations, and whether or not interactive confirmation is required.
WebVM module system
In practice, a web application developer will not want to develop a Java library each time a new web page wants to access the location data (say); and a user will not want to install that library each time he visits a new site that uses it. (That would take away from one of the main aims of web apps, which is to avoid the need for the user to perform this kind of installation.) WebVM therefore expects that a series of JavaScript APIs will be developed, each of which needs to be deployed to the device only once, no matter how many different web sites make use of it. WebVM is accordingly built to support the deployment of JS APIs (each a combination of a Java library and a JS wrapper that instantiates the library), and the management and coexistence of those APIs.
Of course, within a browser it is already possible to reference a JS API simply by referring to it in a script tag. However, referencing many JS modules this way can be problematic when there are complex apps that could give rise to multiple inclusion, conflicts in the global namespace, load-order dependencies between modules, and contention for certain events (such as the onload event).
The community is beginning to address this problem by defining frameworks to allow modules to be developed independently and then combined within an application without conflict. WebVM includes such a framework, which additionally takes care of the installation of Java libraries, and persistent storage of permissions associated with those libraries.
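Two of the problems named above, multiple inclusion and load-order dependencies, can be illustrated with a small registry sketch. ModuleRegistry is a hypothetical name and this is not the WebVM module framework; installation of Java libraries and persistent storage of permissions are not modelled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ModuleRegistry {
    // Module name -> names of the modules it depends on.
    private final Map<String, List<String>> dependencies = new HashMap<>();
    // Modules already loaded; repeated requests for them are no-ops.
    private final Set<String> loaded = new LinkedHashSet<>();

    void register(String name, String... dependsOn) {
        dependencies.put(name, Arrays.asList(dependsOn));
    }

    // Loads a module after its dependencies and returns the names newly
    // loaded by this call, in load order.
    List<String> require(String name) {
        List<String> newlyLoaded = new ArrayList<>();
        load(name, new LinkedHashSet<>(), newlyLoaded);
        return newlyLoaded;
    }

    private void load(String name, Set<String> inProgress, List<String> out) {
        if (loaded.contains(name)) {
            return; // already loaded: avoids multiple inclusion
        }
        if (!dependencies.containsKey(name)) {
            throw new IllegalArgumentException("unknown module: " + name);
        }
        if (!inProgress.add(name)) {
            throw new IllegalStateException("dependency cycle at: " + name);
        }
        // Depth-first: every dependency is loaded before the module itself.
        for (String dependency : dependencies.get(name)) {
            load(dependency, inProgress, out);
        }
        inProgress.remove(name);
        loaded.add(name);
        out.add(name);
    }
}
```

With modules core, location (depending on core) and maps (depending on location and core) registered, require("maps") loads core, then location, then maps, and a later require("location") loads nothing further.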
With this framework in place, there are further clear benefits to WebVM. The biggest single benefit is that these new APIs (say a location API, or a PIM API) do not need to be embedded in the phone and, more important, do not need to be defined by a single committee as is the case with JSRs in Java. Now, any interested party can define their own APIs to suit their own purposes, so long as the underlying functionality needed is available via a Java profile. These APIs can evolve to meet changing requirements.
Of course, it would be great if industry bodies got together to define useful JS APIs - and some are already doing this (like the W3C). However, exploiting WebVM does not have to wait for this to happen. APIs can be created and deployed by publishers, carriers or manufacturers as needed, and superseded by standard APIs once they become available.
WebVM SDK
Developing for WebVM can be thought of as two essentially separate activities. SDK support is available for each of these, based on the Eclipse ATF (AJAX Tooling Framework) and EclipseME (JavaME) environments.
The primary activity is development of web applications (ie websites, or html/javascript applications packaged as widgets for local installation) that use pre-existing javascript APIs. The core WebVM technology is already accompanied by a series of APIs that are deployable, and as WebVM becomes adopted it is expected that further, more specialist APIs will become available.
Developers are then simply developing web applications in the conventional way, using whatever tools they choose. A WebVM plugin exists for the desktop that works with Safari (Webkit), Mozilla and Opera, and this will work with any web development IDE based on any of these browser cores. Of course, many of the WebVM-deployed APIs will only work in a desktop development environment if the PC is connected to a phone to get access to the relevant phone-specific functionality.
WebVM JavaScript APIs are expected to provide API metadata in the OpenAjax Alliance API metadata format, to allow those APIs to be seamlessly supported in IDEs such as the Eclipse ATF.
The second kind of WebVM development is the development of APIs to be deployed with WebVM. This involves a combination of javascript and Java development. To support this development, WebVM includes an Eclipse plugin that supports all of the elements of creating and debugging a WebVM API, including:
- templates for manifest and JAD files, and for security policy files;
- wizards for outlining Java classes and javascript bindings given an input API;
- plugins for both JavaME (EclipseME) and javascript (Eclipse ATF) perspectives to assist in the debugging of WebVM APIs.
Deploying WebVM
A JBlend extension
The WebVM plugin is closely tied to JBlend, Aplix’s JavaME runtime. As such, WebVM is mainly intended to be deployed alongside JBlend, integrated with a phone’s system software and embedded in the phone at manufacture. WebVM represents a very small additional porting and integration requirement in comparison with the main porting and integration activity for JBlend. Like JBlend, WebVM is structured with a porting layer as an independent library from the main plugin code, and therefore it is straightforward to create a small self-contained project to create and deploy the porting layer implementation on any given platform.
Browser integration
WebVM also requires integration with the browser (or other widget/SVG runtime). WebVM is already available pre-integrated with the primary industry plugin API, the Netscape Plugin API (NPAPI), and is also available as an ActiveX control for use with PocketIE. With many browsers, therefore, no additional integration effort is required.
However, some browsers, and most other target environments, do not support NPAPI. In these cases, it is necessary to implement a porting layer between WebVM and the browser. An SDK, including documentation and header files, is available to support this.
WebVM has been designed to place minimal requirements on the browser or runtime environment; at a minimum, the environment must support scripting of plugins, and must provide hooks that allow a plugin to invoke JavaScript. However, very low-end browsers are likely to be unable to support WebVM. Please contact Aplix to verify that a given browser will be able to support WebVM.
WebVM-JNI
A version of WebVM is also available that can work with any conventional JNI-based Java VM, which includes most commercial CDC VMs. This allows for interoperability of WebVM content between CLDC and CDC-enabled phones.
Validation
A new port of WebVM can be validated using an extensive set of tests that are available as part of the licensed WebVM porting SDK.