The DataSpace Framework
Juan J. Collas
The DataSpace is a generalization and simplification of the DataMaster concept introduced by Dave Neumann in the impressive Monitor suite for WebObjects. DataSpace is a server process that maintains state for client processes. It takes ideas from JavaSoft's JavaSpace, and is designed to be a general distributed persistent store for Foundation objects. Due to its design, it is also transactional in nature, and can be used as a shared store for applications that need to exchange information.
Much of the credit for this work goes to Dave Neumann, for providing an excellent testbed for these ideas.
The following are goals for the DataSpace architecture:
The DataSpace server is implemented by a new class called DataSpace. Clients access the dataspace through a proxy to a DataSpace server. The proxy is obtained with the following method:
id dataSpace = [NSDataSpace defaultSpace];
Once instantiated, the space provides access to domains, which are available as dictionaries. For example, to write into a domain called "Monitor", you can use the following method:
[dataSpace setObject:theChildren forKey:@"Children" domain:@"Monitor"];
To access data in a domain, you can use:
[dataSpace objectForKey:@"Children" domain:@"Monitor"];
To remove a key from a domain, use:
[dataSpace removeObjectForKey:@"Children" domain:@"Monitor"];
The DataSpace class maintains a single DO (Distributed Objects) connection to the dataspace server. The server is deliberately single-threaded, for performance and transactional integrity. All requests for data from a domain are sent to the server. If the server dies, the client class will attempt to connect to a new dataspace server and will invalidate its caches. If that attempt fails, it might make sense for the dataspace client to become a server itself.
This implies that the set of dataspace servers should somehow do their best to replicate information between themselves.
The Dataspace is an excellent way to manage the state of clients. If a client dies, the dataspace is aware of it, and can notify interested clients. If a client attaches to the dataspace, this information can also be made available to interested parties.
There is a domain called NameServer which allows clients to register themselves with the dataspace, and for other clients to quickly locate an instance of a registered name.
For example, here's a client registering itself with the dataspace.
[dataSpace setObject:self forKey:@"0-DLNMonitor" domain:@"NameServer"];
Any clients that want access to this instance can simply do:
remoteMonitor = [dataSpace objectForKey:@"0-DLNMonitor" domain:@"NameServer"];
Of course, if the server for this proxy dies, the dataspace will pass along the connection death notification that it receives.
Since there might be multiple servers registered for a given name, the API for setting and getting values might also include some array-specific methods to reduce communication time:
[dataSpace addObject:anObj forKey:aKey domain:aDomain];
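The saving from an array-specific method can be sketched as follows. This is a minimal in-memory illustration, not the proposed server: the class SketchStore and the accessor objectsForKey:domain: are hypothetical names, and the code assumes a compiler with ARC and the Foundation framework. The point it shows is that the append happens where the data lives, so a client sends one message instead of fetching the array, changing it, and writing it back.

```objc
#import <Foundation/Foundation.h>

// Hypothetical server-side store: domain -> key -> array of values.
@interface SketchStore : NSObject
- (void)addObject:(id)object forKey:(NSString *)key domain:(NSString *)domain;
- (NSArray *)objectsForKey:(NSString *)key domain:(NSString *)domain;
@end

@implementation SketchStore {
    NSMutableDictionary *_domains;
}

- (instancetype)init {
    if ((self = [super init])) {
        _domains = [NSMutableDictionary dictionary];
    }
    return self;
}

// Appends in place, creating the domain and the array on first use.
- (void)addObject:(id)object forKey:(NSString *)key domain:(NSString *)domain {
    NSMutableDictionary *values = _domains[domain];
    if (values == nil) {
        values = [NSMutableDictionary dictionary];
        _domains[domain] = values;
    }
    NSMutableArray *entries = values[key];
    if (entries == nil) {
        entries = [NSMutableArray array];
        values[key] = entries;
    }
    [entries addObject:object];
}

// Returns an immutable snapshot, or an empty array for an unknown key.
- (NSArray *)objectsForKey:(NSString *)key domain:(NSString *)domain {
    NSDictionary *values = _domains[domain];
    NSArray *entries = values[key];
    return entries ? [entries copy] : @[];
}
@end

int main(void) {
    @autoreleasepool {
        SketchStore *store = [[SketchStore alloc] init];
        // Two servers register under the same name.
        [store addObject:@"monitor@hostA" forKey:@"0-DLNMonitor" domain:@"NameServer"];
        [store addObject:@"monitor@hostB" forKey:@"0-DLNMonitor" domain:@"NameServer"];
        NSLog(@"%@", [store objectsForKey:@"0-DLNMonitor" domain:@"NameServer"]);
    }
    return 0;
}
```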
A referent is a proxy that knows how to recreate itself if its connection dies. This implies that the referent contains its host and rootName. The referent acts like a normal proxy, forwarding messages across the network. If its connection dies, it can automatically establish a new proxy from the host and rootName. A referent can be described as a tuple of host, rootName, and proxy.
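A referent along these lines might be built on NSProxy. The sketch below is an assumption-laden illustration rather than the intended implementation: SketchReferent, SketchResolver, and connectionDidDie are hypothetical names, and the resolver block stands in for whatever Distributed Objects lookup a real referent would perform with its host and rootName. It assumes ARC and the Foundation framework.

```objc
#import <Foundation/Foundation.h>

// Hypothetical resolver: given a host and root name, return a fresh target.
// A real referent would ask Distributed Objects for a root proxy here.
typedef id (^SketchResolver)(NSString *host, NSString *rootName);

@interface SketchReferent : NSProxy
- (instancetype)initWithHost:(NSString *)host
                    rootName:(NSString *)rootName
                    resolver:(SketchResolver)resolver;
- (void)connectionDidDie; // stand-in for the connection death notification
@end

@implementation SketchReferent {
    NSString *_host;
    NSString *_rootName;
    SketchResolver _resolver;
    id _proxy; // third element of the (host, rootName, proxy) tuple
}

- (instancetype)initWithHost:(NSString *)host
                    rootName:(NSString *)rootName
                    resolver:(SketchResolver)resolver {
    // NSProxy declares no -init, so there is no [super init] to call.
    _host = [host copy];
    _rootName = [rootName copy];
    _resolver = [resolver copy];
    return self;
}

- (void)connectionDidDie {
    _proxy = nil; // drop the dead proxy; the next message resolves a new one
}

- (id)currentProxy {
    if (_proxy == nil) {
        _proxy = _resolver(_host, _rootName);
    }
    return _proxy;
}

// Messages the referent does not implement are forwarded to the proxy.
- (NSMethodSignature *)methodSignatureForSelector:(SEL)selector {
    return [[self currentProxy] methodSignatureForSelector:selector];
}

- (void)forwardInvocation:(NSInvocation *)invocation {
    [invocation invokeWithTarget:[self currentProxy]];
}
@end

int main(void) {
    @autoreleasepool {
        __block NSUInteger connects = 0;
        SketchReferent *referent = [[SketchReferent alloc]
            initWithHost:@"localhost"
                rootName:@"0-DLNMonitor"
                resolver:^id(NSString *host, NSString *rootName) {
                    connects++; // count how often a proxy is (re)created
                    return [NSString stringWithFormat:@"%@@%@", rootName, host];
                }];
        // Here the "remote object" is just a string, so treat the referent as one.
        NSString *remote = (NSString *)referent;
        NSLog(@"length via referent: %lu", (unsigned long)[remote length]);
        [referent connectionDidDie];
        NSLog(@"length after reconnect: %lu", (unsigned long)[remote length]);
        NSLog(@"proxies created: %lu", (unsigned long)connects);
    }
    return 0;
}
```

Resolving lazily on the next message, rather than eagerly when the connection dies, keeps the referent from reconnecting to a server nobody is talking to anymore. Whether that trade-off suits the DataSpace is an open design choice.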
Domains are a way to partition the dataspace. A domain is created when it is first referenced, and it is removed when it no longer holds any data. If no domain is specified, the domain defaults to the name of the application (whether an AppKit or WebObjects application).
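The domain lifecycle described above can be sketched with a small in-memory stand-in. SketchSpace and its helper methods are hypothetical, the process name is used as an assumed substitute for the application name, and the code assumes ARC and the Foundation framework. In this sketch only writes create a domain; whether a read of a missing domain should also create it is left open.

```objc
#import <Foundation/Foundation.h>

// Hypothetical in-memory stand-in for a dataspace; not the real server.
@interface SketchSpace : NSObject
- (void)setObject:(id)value forKey:(NSString *)key domain:(NSString *)domain;
- (id)objectForKey:(NSString *)key domain:(NSString *)domain;
- (void)removeObjectForKey:(NSString *)key domain:(NSString *)domain;
- (NSArray *)domainNames;
@end

@implementation SketchSpace {
    NSMutableDictionary *_domains; // domain name -> NSMutableDictionary
}

- (instancetype)init {
    if ((self = [super init])) {
        _domains = [NSMutableDictionary dictionary];
    }
    return self;
}

// A nil domain falls back to the process name (assumed application name).
- (NSString *)resolvedDomain:(NSString *)domain {
    return domain ? domain : [[NSProcessInfo processInfo] processName];
}

- (void)setObject:(id)value forKey:(NSString *)key domain:(NSString *)domain {
    NSString *name = [self resolvedDomain:domain];
    NSMutableDictionary *values = _domains[name];
    if (values == nil) {
        // The first write to a domain creates it.
        values = [NSMutableDictionary dictionary];
        _domains[name] = values;
    }
    values[key] = value;
}

- (id)objectForKey:(NSString *)key domain:(NSString *)domain {
    NSDictionary *values = _domains[[self resolvedDomain:domain]];
    return values[key];
}

- (void)removeObjectForKey:(NSString *)key domain:(NSString *)domain {
    NSString *name = [self resolvedDomain:domain];
    NSMutableDictionary *values = _domains[name];
    [values removeObjectForKey:key];
    if (values != nil && values.count == 0) {
        // A domain with no data left is removed.
        [_domains removeObjectForKey:name];
    }
}

- (NSArray *)domainNames {
    return [_domains allKeys];
}
@end

int main(void) {
    @autoreleasepool {
        SketchSpace *space = [[SketchSpace alloc] init];
        [space setObject:@[@"child-1"] forKey:@"Children" domain:@"Monitor"];
        NSLog(@"domains after set: %@", [space domainNames]);
        [space removeObjectForKey:@"Children" domain:@"Monitor"];
        NSLog(@"domains after remove: %@", [space domainNames]);
        [space setObject:@"value" forKey:@"Key" domain:nil];
        NSLog(@"default domain: %@", [space domainNames]);
    }
    return 0;
}
```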
Domains may be smart, which means there are classes tied to a domain name that have additional capabilities, such as cleaning up old data, or communicating with an LDAP store to persist information.
Here are some domains:
We may define a bundle-based architecture to add additional storage types. These may include
Managing the Persistent Store
The default implementation of the persistent store would use an NSPPL. These are apparently designed to provide a simple, Foundation-based backing store capable of incremental addition of data, handling tens of megabytes. NSPPLs already have machinery to manage updates between the RAM cache and the disk.
There needs to be some way to avoid rereading data from the dataspace server that has not changed since the client last accessed it. The NSDataSpace class should use a notification scheme so the server can tell the client when data has changed. Until such a notification arrives, the client can serve reads from a local cache of the data it has fetched from the server.
When data is accessed, the client will cache the data by the (domain, key) tuple and register for a notification on data modification. When that tuple changes value, a notification is sent to the client.
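A client cache of this kind might look like the sketch below. It is an illustration under stated assumptions, not the NSDataSpace implementation: SketchServer, SketchClient, and the notification name SketchValueDidChange are hypothetical, the local NSNotificationCenter stands in for server-to-client notification over the network, and the code assumes ARC and the Foundation framework. For brevity the client registers once for all changes and filters by tuple, instead of registering per (domain, key) on first access.

```objc
#import <Foundation/Foundation.h>

// Hypothetical notification name; userInfo carries "domain" and "key".
static NSString *const SketchValueDidChange = @"SketchValueDidChange";

// Stand-in for the remote server: a store that announces every change.
@interface SketchServer : NSObject
@property (nonatomic) NSUInteger fetchCount; // counts simulated round trips
- (void)setObject:(id)value forKey:(NSString *)key domain:(NSString *)domain;
- (id)objectForKey:(NSString *)key domain:(NSString *)domain;
@end

@implementation SketchServer {
    NSMutableDictionary *_values; // "domain/key" -> value
}
- (instancetype)init {
    if ((self = [super init])) { _values = [NSMutableDictionary dictionary]; }
    return self;
}
- (void)setObject:(id)value forKey:(NSString *)key domain:(NSString *)domain {
    _values[[NSString stringWithFormat:@"%@/%@", domain, key]] = value;
    [[NSNotificationCenter defaultCenter]
        postNotificationName:SketchValueDidChange
                      object:self
                    userInfo:@{@"domain": domain, @"key": key}];
}
- (id)objectForKey:(NSString *)key domain:(NSString *)domain {
    self.fetchCount += 1;
    return _values[[NSString stringWithFormat:@"%@/%@", domain, key]];
}
@end

// Client-side cache keyed by the (domain, key) tuple.
@interface SketchClient : NSObject
- (instancetype)initWithServer:(SketchServer *)server;
- (id)objectForKey:(NSString *)key domain:(NSString *)domain;
@end

@implementation SketchClient {
    SketchServer *_server;
    NSMutableDictionary *_cache; // @[domain, key] -> value
}
- (instancetype)initWithServer:(SketchServer *)server {
    if ((self = [super init])) {
        _server = server;
        _cache = [NSMutableDictionary dictionary];
        [[NSNotificationCenter defaultCenter] addObserver:self
                                                 selector:@selector(valueDidChange:)
                                                     name:SketchValueDidChange
                                                   object:server];
    }
    return self;
}
- (void)dealloc {
    [[NSNotificationCenter defaultCenter] removeObserver:self];
}
- (void)valueDidChange:(NSNotification *)note {
    // Drop only the tuple that changed; it is refetched on the next read.
    NSArray *tuple = @[note.userInfo[@"domain"], note.userInfo[@"key"]];
    [_cache removeObjectForKey:tuple];
}
- (id)objectForKey:(NSString *)key domain:(NSString *)domain {
    NSArray *tuple = @[domain, key];
    id value = _cache[tuple];
    if (value == nil) {
        value = [_server objectForKey:key domain:domain];
        if (value != nil) { _cache[tuple] = value; }
    }
    return value;
}
@end

int main(void) {
    @autoreleasepool {
        SketchServer *server = [[SketchServer alloc] init];
        SketchClient *client = [[SketchClient alloc] initWithServer:server];
        [server setObject:@"BUSY" forKey:@"s1" domain:@"ThreadStore"];
        [client objectForKey:@"s1" domain:@"ThreadStore"]; // fetched from the server
        [client objectForKey:@"s1" domain:@"ThreadStore"]; // served from the cache
        [server setObject:@"DONE" forKey:@"s1" domain:@"ThreadStore"];
        NSLog(@"value: %@, fetches: %lu",
              [client objectForKey:@"s1" domain:@"ThreadStore"],
              (unsigned long)server.fetchCount);
    }
    return 0;
}
```

The sketch invalidates the cached entry and refetches on the next read. Carrying the new value in the notification itself would save that round trip at the cost of larger notifications.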
Changes to a DataSpace should be propagated to all the other dataspaces. This could be either a push or a pull mechanism. Synchronizing them could be interesting.
The first dataspace will be the master, and additional dataspace servers will be clones, with changes to the master pushed to the clones. When the master dies, the next in line will become the writeable store.
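The master and clone arrangement can be sketched in memory as follows. SketchReplica and SketchReplicaGroup are hypothetical names, liveness is simulated with a flag, and "next in line" is assumed to mean join order. The code assumes ARC and the Foundation framework, and it ignores the hard parts (network partitions, writes in flight when the master dies).

```objc
#import <Foundation/Foundation.h>

// Hypothetical replica: a dictionary of values plus a liveness flag.
@interface SketchReplica : NSObject
@property (nonatomic, copy, readonly) NSString *name;
@property (nonatomic) BOOL alive;
@property (nonatomic, strong, readonly) NSMutableDictionary *values;
- (instancetype)initWithName:(NSString *)name;
@end

@implementation SketchReplica
- (instancetype)initWithName:(NSString *)name {
    if ((self = [super init])) {
        _name = [name copy];
        _alive = YES;
        _values = [NSMutableDictionary dictionary];
    }
    return self;
}
@end

// Replicas are kept in join order; the first live one is the master.
@interface SketchReplicaGroup : NSObject
- (void)addReplica:(SketchReplica *)replica;
- (SketchReplica *)master;
- (void)setObject:(id)value forKey:(NSString *)key;
@end

@implementation SketchReplicaGroup {
    NSMutableArray *_replicas;
}

- (instancetype)init {
    if ((self = [super init])) {
        _replicas = [NSMutableArray array];
    }
    return self;
}

- (SketchReplica *)master {
    for (SketchReplica *replica in _replicas) {
        if (replica.alive) { return replica; }
    }
    return nil;
}

- (void)addReplica:(SketchReplica *)replica {
    // A new clone starts from a copy of the master's current state.
    SketchReplica *master = [self master];
    if (master != nil) {
        [replica.values setDictionary:master.values];
    }
    [_replicas addObject:replica];
}

- (void)setObject:(id)value forKey:(NSString *)key {
    SketchReplica *master = [self master];
    if (master == nil) { return; }
    master.values[key] = value;          // write to the master first
    for (SketchReplica *replica in _replicas) {
        if (replica != master && replica.alive) {
            replica.values[key] = value; // then push to each live clone
        }
    }
}
@end

int main(void) {
    @autoreleasepool {
        SketchReplicaGroup *group = [[SketchReplicaGroup alloc] init];
        SketchReplica *first = [[SketchReplica alloc] initWithName:@"A"];
        SketchReplica *second = [[SketchReplica alloc] initWithName:@"B"];
        [group addReplica:first];
        [group addReplica:second];
        [group setObject:@"up" forKey:@"status"];

        first.alive = NO; // simulate the master dying
        [group setObject:@"degraded" forKey:@"status"];
        NSLog(@"master: %@, status: %@",
              [group master].name, [group master].values[@"status"]);
    }
    return 0;
}
```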
Migrating Monitor V4 to DataSpace
The existing Monitor suite uses a set of servers to provide persistence management. It is expected that those servers would become domains in the DataSpace server. For example, to use the nameserver, the client would now use:
[dataSpace objectForKey:@"Hosts" domain:@"NameServer"];
For the ESS, data would be written to the domain 'WOSessionDomain'. For SuperStateStore, the domain would be the name of the application.
The DataSpace is the replacement for the DataMaster. However, the DataSpace server can manage multiple domains, and can thus take over the functionality of the ExternalStateServer, the NameServer, the SuperStateServer and the ThreadStorage server.
A DataSpace server is created by one of two means:
The DMSessionStore class currently deals with the process of connecting to a DataMaster and creating thread records. The entire interface for storing and retrieving data from the DataSpace can be reduced to the following two calls:
[[NSDataSpace defaultSpace] setObject:aSessionData forKey:aSessionKey domain:@"WOSessionStore"];
[[NSDataSpace defaultSpace] objectForKey:aSessionKey domain:@"WOSessionStore"];
Super state storage is the same as DMSessionStore, except the domain is tied to the application name. You can use the simple form of the DataSpace accessors:
[[NSDataSpace defaultSpace] setObject:aSessionData forKey:aSessionKey];
The domain is derived from the process name or the WebObjects name for the application.
The thread store is trickier. The key is still a sessionKey, but the value is an NSDictionary containing the status (BUSY, DONE) and the operationData, or message string. The domain is 'ThreadStore', and the client should register for a notification to receive updates to the value.
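A thread store record might be shaped as in the sketch below. The dictionary key names "status" and "operationData", the session key, and the message strings are assumptions chosen to match the description, and a plain NSMutableDictionary stands in for the 'ThreadStore' domain. The code assumes ARC and the Foundation framework.

```objc
#import <Foundation/Foundation.h>

int main(void) {
    @autoreleasepool {
        // Stand-in for the 'ThreadStore' domain: session key -> record.
        NSMutableDictionary *threadStore = [NSMutableDictionary dictionary];
        NSString *sessionKey = @"session-42"; // hypothetical session key

        // The worker publishes its state when it starts.
        threadStore[sessionKey] = @{@"status": @"BUSY",
                                    @"operationData": @"Generating report"};

        // Later, the worker replaces the record when it finishes.
        threadStore[sessionKey] = @{@"status": @"DONE",
                                    @"operationData": @"Report ready"};

        // A client reacting to the change notification would read it back.
        NSDictionary *record = threadStore[sessionKey];
        if ([record[@"status"] isEqualToString:@"DONE"]) {
            NSLog(@"finished: %@", record[@"operationData"]);
        }
    }
    return 0;
}
```

Replacing the whole record on each update, rather than mutating it in place, means a single changed (domain, key) value carries both the status and the message together.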
Monitor may change to become a client of the DataSpace, setting and retrieving values such as start time and stop times.
Monitorable applications no longer connect to the Monitor. Instead, they write their availability into a 'Monitorable' domain, which the Monitor registers interest in. Once they make themselves available, the Monitor can connect to them to query statistics and such. The monitorable applications also should register for information about themselves in the 'Declared' domain, so they can pick up configuration changes made by the monitor.
The MonitorProxy may be subsumed as a specialized domain in the DataSpace. If we can define a new domain called 'Tasks' which actually starts a process when a value is set for an object and kills it when the object is removed from the domain, this functionality will simply consist of setting a key in a DataSpace on a given host.