1. Gnome website platform issues
(See also GnomeWeb/CmsRequirements)
Currently the Gnome Web consists of several loosely coupled services with no (or minimal) common policy or integration. The goal of this wiki document is to specify the platform on which a more integrated web presence can be built without sacrificing the (physical, technical, administrative) independence of a distributed system. We will not deal with any implementation details (such as a specific CMS) or make any decisions (such as setting policies, content partitioning, assigning responsibilities, theme design, etc).
First we look at the web as a platform. We cannot change this platform, but we can try to use it to its full extent: obey its rules and leverage its advantages.
Then, we look at how content can be managed and represented (CMS). There are many CMS systems and solutions. Our goal is not to use a CMS, but to provide content. If a CMS will get us there, we'll use one.
Finally, we look at content authoring issues.
Based on a guadec2006 BOF meeting.
1.1. 1. Web platform level
This is the lowest level of the platform stack and provides the basic infrastructure for all web applications. See also [1].
1.1.1. Resources
The World Wide Web (WWW, or simply Web) is an information space in which the items of interest, referred to as resources, are identified by global identifiers called Uniform Resource Identifiers (URI). [1]
Note that users don't retrieve resources directly, but representations of resources. This matters when considering multiple formats or multiple languages, see higher in the stack.
1.1.2. URIs
A Uniform Resource Identifier (URI) provides a simple and extensible means for identifying a resource. (...) A URI can be further classified as a locator, a name, or both. The term "Uniform Resource Locator" (URL) refers to the subset of URIs that, in addition to identifying a resource, provide a means of locating the resource by describing its primary access mechanism (e.g., its network "location"). The term "Uniform Resource Name" (URN) has been used historically to refer to both URIs under the "urn" scheme (RFC2141), which are required to remain globally unique and persistent even when the resource ceases to exist or becomes unavailable, and to any other URI with the properties of a name. [2]
We expect the following of URIs:
- a single URI should point to a single resource, i.e. no URI collisions [1]
- but multiple URIs are allowed to point to the same resource
- persistence: URIs should change rarely (preferably never!) so links and bookmarks don't break
- user friendly URIs to ease typing and referencing (no index.php?pageid=$cryptichash)
- namespacing: URIs can enforce namespacing, i.e. they can help us partition content into hierarchical structures with the help of subdomains and subdirectories
- backend independence: URIs should map to resources cleanly, independent of the CMS in use, which allows transparent migration or even the use of multiple CMSs for different parts.
- consistency: we should preferably come up with a consistent naming scheme across the whole gnome web presence.
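The backend-independence and namespacing expectations above could be sketched as a small routing table that maps public URI prefixes to whichever system serves them. This is only an illustration: the prefixes, the backend names, and the `resolve_backend` helper are all hypothetical and not part of any existing Gnome setup.

```python
# Hypothetical routing table: public URI prefix -> backend that serves it.
# Both the prefixes and the backend names are made up for illustration.
ROUTES = {
    "/": "static-files",
    "/news/": "cms-a",
    "/developers/": "wiki",
    "/developers/api/": "generated-docs",
}


def resolve_backend(path: str) -> str:
    """Return the backend for the longest URI prefix that matches path."""
    if not path.startswith("/"):
        raise ValueError("path must be absolute, e.g. '/news/2006/'")
    # "/" always matches, so the list of candidates is never empty.
    matches = [prefix for prefix in ROUTES if path.startswith(prefix)]
    return ROUTES[max(matches, key=len)]


if __name__ == "__main__":
    for path in ("/about", "/developers/tutorial", "/developers/api/gtk"):
        print(path, "->", resolve_backend(path))
```

Because visitors only ever see the public prefix, moving a namespace to another backend would mean changing one table entry rather than changing URIs.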
We have to set policies on what is considered a single resource and what are different resources. For example, the same page in different languages or different formats (XHTML vs. PDF), or "this year's guadec" vs. "guadec2006". To be discussed higher in the stack. This should be reflected in our URL policies (URL naming scheme).
See also the W3C URL style guide [3]
1.1.3. Accessing resources
Content can be either static or dynamically generated. Since this is an implementation detail, it should not be directly visible to the user.
- Static
- Dynamic
- Editable (e.g. wiki)
- Pre-generated (e.g. statistics)
- On-the-fly generated (e.g. cvsview)
Static pages usually cache well. Dynamic pages need more care, because a stale copy may be served if caching rules are too generous. This matters for both browser caches and proxy caches. The implementation details are up to the CMS.
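As a rough sketch of the caching concern, a server could choose a `Cache-Control` header based on which of the kinds of content listed above it is serving. The lifetimes below are assumptions picked for illustration, not agreed policy, and `ContentKind` and `cache_control` are hypothetical names.

```python
from enum import Enum


class ContentKind(Enum):
    STATIC = "static"
    EDITABLE = "editable"            # e.g. wiki
    PREGENERATED = "pre-generated"   # e.g. statistics
    ON_THE_FLY = "on-the-fly"        # e.g. cvsview


# Assumed cache lifetimes in seconds; 0 means "always revalidate".
MAX_AGE = {
    ContentKind.STATIC: 86400,
    ContentKind.PREGENERATED: 3600,
    ContentKind.EDITABLE: 0,
    ContentKind.ON_THE_FLY: 0,
}


def cache_control(kind: ContentKind) -> str:
    """Return a Cache-Control header value for the given kind of content."""
    max_age = MAX_AGE[kind]
    if max_age > 0:
        return f"public, max-age={max_age}"
    # "no-cache" allows storing a copy but requires revalidation before reuse.
    return "no-cache"


if __name__ == "__main__":
    for kind in ContentKind:
        print(kind.value, "->", cache_control(kind))
```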
1.2. 2. Content management (CMS) level
A content management system (CMS) is a computer software system for organizing and facilitating collaborative creation of documents and other content. [4]
As a platform, content management should provide us with an infrastructure to convey the information we want to communicate to our audiences. Different audiences may need different kinds of information, thus we might want to use different CMSs for different purposes.
In this section we discuss all relevant CMS features. We should adopt a system with only as many features as necessary.
1.2.1. Providing content
There are several mechanisms for providing content:
- Online editing
- Offline editing, uploading
- Generating
As with static vs. dynamic serving of pages, this mechanism should be transparent to the visitor. A single web site may host content provided by different mechanisms, so the content management system should provide means to integrate content created externally. This includes linking to, or even including, such content inside CMS-provided pages.
The actual mechanism should not be visible on the URLs, unless this is specifically desirable.
1.2.2. Managing workflows
A CMS can support the authoring and publishing process through workflows. Some examples:
- Draft / Final version
- Approval, Moderation
- Publish on event, e.g. at a specific date and time.
- Feedback loops (e.g. API doc gets edited online, a patch is generated and sent to maintainer, developer integrates, change appears on website)
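The draft, approval, and timed-publication examples above can be read as a small state machine. The following is a minimal sketch under that reading; the state names, action names, and the `advance` and `is_visible` helpers are hypothetical and do not describe any particular CMS.

```python
from datetime import datetime
from typing import Optional

# Hypothetical workflow: (current state, action) -> next state.
TRANSITIONS = {
    ("draft", "submit"): "pending",
    ("pending", "approve"): "approved",
    ("pending", "reject"): "draft",
    ("approved", "retract"): "draft",
}


def advance(state: str, action: str) -> str:
    """Apply an action to a state, refusing transitions the workflow forbids."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action!r} from state {state!r}") from None


def is_visible(state: str, publish_at: Optional[datetime], now: datetime) -> bool:
    """Approved content is shown at once, or from publish_at onwards if set."""
    if state != "approved":
        return False
    return publish_at is None or now >= publish_at


if __name__ == "__main__":
    state = advance(advance("draft", "submit"), "approve")
    release = datetime(2006, 9, 6, 12, 0)
    print(state, is_visible(state, release, datetime(2006, 9, 1)))
```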
1.2.3. Multiple formats
Sometimes it is desirable to provide the same content in multiple formats, e.g. for online viewing, printing, or syndication. This should be managed by the CMS, otherwise the multiple representations can get out of sync.
- (X)HTML
- PDF and others
- XML (for integration, syndication, automation, metadata, RPC, etc)
1.2.4. Other CMS issues
1.2.4.1. I18N
The CMS should support providing the same content in multiple languages. The display language may be selected automatically based on the browser's settings, but the user should also be able to select it explicitly.
We have to come up with a nice way to reflect this in the URL.
Note that language-specific content, i.e. content relevant only for one language, is a different issue from providing the same content in different languages.
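The language selection described above (automatic from the browser's settings, with an explicit user choice taking priority) could look roughly like this. It is a simplified sketch: it reads the HTTP `Accept-Language` header and falls back from a regional tag such as `de-AT` to `de`, but does not implement the full matching rules of the relevant RFCs. The function names and the default language are assumptions.

```python
from typing import Iterable, List, Optional


def parse_accept_language(header: str) -> List[str]:
    """Return the language tags of an Accept-Language header, best first."""
    weighted = []
    for index, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        quality = 1.0  # a tag without a q parameter has full weight
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:
            # Sort by descending quality, then by position in the header.
            weighted.append((-quality, index, tag))
    return [tag for _, _, tag in sorted(weighted)]


def choose_language(available: Iterable[str], accept_header: str = "",
                    explicit: Optional[str] = None, default: str = "en") -> str:
    """Pick a display language; an explicit user choice wins over the browser."""
    available = {lang.lower() for lang in available}
    if explicit and explicit.lower() in available:
        return explicit.lower()
    for tag in parse_accept_language(accept_header):
        if tag in available:
            return tag
        primary = tag.split("-")[0]  # fall back from "de-at" to "de"
        if primary in available:
            return primary
    return default


if __name__ == "__main__":
    print(choose_language({"en", "de", "fr"}, "de-AT,de;q=0.9,en;q=0.5"))
```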
Comment from murrayc: This is an overly simplistic view of this requirement. We need a system that allows translation, not just serving of translated pages. So, translators need to be able to see when the original text has changed, and need to be able to easily update just that small part of the text, because the original _will_ change. This is how we achieved translated release notes. Readers would probably like to be able to see when a translated page is out of date, or to see the English for a fragment of text instead.
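One way to detect that the original of a translated fragment has changed is to store, with each translation, a fingerprint of the original text it was based on. The sketch below assumes pages are split into fragments with stable identifiers; that assumption, the data layout, and the function names are illustrative only, and this is not a description of how the release notes were actually translated.

```python
import hashlib
from typing import Dict, List


def fingerprint(text: str) -> str:
    """Return a stable fingerprint of a fragment of original text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def outdated_fragments(source: Dict[str, str],
                       translated_from: Dict[str, str]) -> List[str]:
    """List fragment ids whose translation is missing or out of date.

    source maps fragment id -> current original text.
    translated_from maps fragment id -> fingerprint of the original text
    that the existing translation was made from.
    """
    return sorted(
        fragment_id
        for fragment_id, text in source.items()
        if translated_from.get(fragment_id) != fingerprint(text)
    )


if __name__ == "__main__":
    original = {"intro": "Welcome", "install": "Run make install"}
    record = {"intro": fingerprint("Welcome"), "install": fingerprint("Run make")}
    print(outdated_fragments(original, record))
```

The same list could drive both sides of the comment: translators see which fragments need work, and readers can be shown the original text for those fragments.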
See also http://www.w3.org/International/
1.2.4.2. Version management
The CMS can keep a history of the edited resource, and provide ways to display older versions or differences.
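Displaying the differences between two stored versions can be sketched with Python's standard `difflib` module. How revisions are stored is left open here; the `revision_diff` helper and its `name` parameter are hypothetical.

```python
import difflib


def revision_diff(old: str, new: str, name: str = "page") -> str:
    """Return a unified diff between two revisions of the same resource."""
    lines = difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile=f"{name} (old)",
        tofile=f"{name} (new)",
        lineterm="",  # inputs carry no line endings, so add none to headers
    )
    return "\n".join(lines)


if __name__ == "__main__":
    print(revision_diff("Gnome 2.14\nReleased\n", "Gnome 2.16\nReleased\n"))
```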
1.2.4.3. Accessibility and Usability
Accessibility is a general term used to describe the degree to which a system is usable by as many people as possible without modification. It is not to be confused with usability which is used to describe how easily a thing can be used by any type of user. [4]
Our design (theme) and markup (HTML code) should keep our web pages accessible. This means accessible to people with disabilities, but also to different kinds of devices (text browser, PDA). There are several standards describing this, and we should adhere to them.
Our web pages should also be usable, meaning easy navigation, clear communication (no cryptic text), readable fonts, etc.
1.2.4.4. Web Standards
Web 2.0 is all about web standards that increase the interoperability of individual web sites. We should adopt them not only because this is the new cool thing[tm] (Gnome is all about innovation), but because it is actually useful.
Gnome's web presence is not homogeneous: we have many independent sites, so increasing interoperability is definitely a plus.
Standards to consider here:
- XHTML, XML
- Dublin Core
- FOAF, DOAP
- Etc.
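As an illustration of how one of these standards could be applied, page metadata can be embedded in (X)HTML using the common convention of `DC.`-prefixed `meta` elements for Dublin Core. This is a sketch only: the `dublin_core_meta` helper is hypothetical and the field values in the usage example are made up.

```python
from html import escape
from typing import Dict


def dublin_core_meta(fields: Dict[str, str]) -> str:
    """Render Dublin Core elements (title, creator, date, ...) as XHTML tags."""
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />']
    for name, value in fields.items():
        # escape() also escapes quotes, so values are safe inside attributes.
        lines.append(f'<meta name="DC.{escape(name)}" content="{escape(value)}" />')
    return "\n".join(lines)


if __name__ == "__main__":
    print(dublin_core_meta({"title": "Example page", "language": "en"}))
```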
1.3. 3. Authoring level
1.3.1. Communication
The web allows interactivity, meaning we can really communicate with our audiences: not just "send" them information, but also receive feedback.
1.3.2. Content partitioning
To be able to communicate with our audiences, we have to know them. We have to identify our target groups and serve their different needs. Different parts of our web presence may be dedicated to different target groups or different topics, perhaps served by different CMS systems.
Partitioning means to divide our web presence into organizational units. This division can be based on target groups (such as a developer section), functionality (such as a mailing list server), or affiliation (to integrate external entities, such as gnomefiles.org).
Partitioning can be hierarchical. This can be supported by our URL schemes, by assigning DNS domains or subdirectories.
1.3.3. Responsibles
Each partition should have clearly identified responsibles: a team or an individual. There may be separate responsibles for editorial and technical (server admin) issues. Contact information for these responsibles must be easy to find.
Their duties include coordinating with higher level partitions, setting local policies, providing content, and delegating duties to others (e.g. give editing rights to registered users).
1.3.4. Licensing
We have to set copyright policies for the content we provide. This also includes disclaimer policies, e.g. for user forums. The copyright of third-party content, whether referenced or included, must be respected. Different partitions may have different policies.
1.3.5. Validity and Actuality
We must try to ensure that the content we provide is valid and up to date.
- author identifiable, where appropriate
- use timestamps to indicate when the content was created
- indicate validity time frame for content that expires
- remove or archive expired content
- avoid broken references (links)
- handle multiple versions for content that changes in time
- multiple URLs for the same content but with different actuality (e.g. current guadec vs. guadec2006)
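The timestamp and validity-time-frame points above could be supported by recording a creation date and an optional expiry date per page, then listing what is due for removal or archiving. The `Page` record, its field names, and the example URIs are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass
class Page:
    uri: str
    created: date
    valid_until: Optional[date] = None  # None: the content does not expire


def expired_pages(pages: Iterable[Page], today: date) -> List[str]:
    """Return the URIs of pages whose validity time frame has passed."""
    return [
        page.uri
        for page in pages
        if page.valid_until is not None and page.valid_until < today
    ]


if __name__ == "__main__":
    pages = [
        Page("/events/call-for-papers", date(2006, 1, 10), date(2006, 3, 31)),
        Page("/about", date(2005, 6, 1)),
    ]
    print(expired_pages(pages, date(2006, 7, 1)))
```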