Semantics
21 October 2008
Relevance
Before the benefits of a semantically oriented language can be understood, it is first important to understand what semantics is and why it is relevant. Semantics is the process of describing data so well that the description becomes relevant or functional enough to be considered data itself, where data refers to any intended communication. In programming and networking the term data is widely known and understood, but in human linguistics this is not always so.
Human communication is data so long as it is intended for transmission from one human to a computer, a system of computers, another human, or any combination thereof. A significant majority of human communication anticipates feedback or a response, although communication need not receive, or even be intended to receive, feedback or a response. Some communication is created intentionally expecting no feedback.
Data communicated by humans is either static or dynamic. Dynamic communication refers to communication that is spontaneous, live, and often interactive, such as a spoken argument or an action. Static communication refers to any communication that exists as a storable artifact.
The job of a content description language, such as HTML or MML, is to format static human communication so that the formatting itself describes that communication. The difference between HTML and MML is that HTML binds visual associations to its metadata descriptions, so that the <strong> tag has a semantic description and also bolds text by default. Other default visual associations applied with HTML tags are italics for <em> tags, underlines and blue color for <a> tags, and top margins for <p> tags. In MML these visual associations do not uniformly exist by default. The MML specification, paragraph 4.2.1, states that no default presentation is specified for any tag and that software vendors are encouraged to define default presentation independently.
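The separation between description and presentation can be sketched in a few lines of Python. This is only an illustration, not part of either specification: the names TagInfo, HTML_DEFAULTS, describe, and without_presentation are invented for this example, and the wording of each meaning is an assumption. The table pairs each HTML tag named above with its meaning and its default visual association, and the last function derives a vocabulary that keeps the meanings but specifies no default presentation, in the spirit of what is said above about MML.

```python
from typing import Dict, NamedTuple, Optional


class TagInfo(NamedTuple):
    meaning: str                  # what the tag says about its content
    default_style: Optional[str]  # what is shown when no stylesheet applies


# Tags and default styles are the ones named in the text; the meanings
# are paraphrased for this sketch.
HTML_DEFAULTS: Dict[str, TagInfo] = {
    "strong": TagInfo("strong importance", "bold text"),
    "em": TagInfo("emphasis", "italic text"),
    "a": TagInfo("hyperlink", "underlined blue text"),
    "p": TagInfo("paragraph", "top margin"),
}


def describe(tag: str, table: Dict[str, TagInfo]) -> str:
    """Report the meaning of a tag and its default presentation, if any."""
    info = table[tag]
    style = info.default_style or "no default presentation"
    return f"<{tag}>: {info.meaning}; {style}"


def without_presentation(table: Dict[str, TagInfo]) -> Dict[str, TagInfo]:
    """Keep every meaning but drop every default visual association."""
    return {tag: TagInfo(info.meaning, None) for tag, info in table.items()}


print(describe("strong", HTML_DEFAULTS))
print(describe("strong", without_presentation(HTML_DEFAULTS)))
```

In the second call the meaning survives unchanged while the presentation is left for the rendering software to decide.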
Description
The primary component of semantics is the description of data, also known as metadata. The use of metadata in human communication, and separately in data systems, is easily understood, but combine the two and the use, or even the definition, of metadata becomes significantly more ambiguous. In English, as an example of human communication, metadata is conveyed by adjectives and adverbs. In data systems, metadata typically refers to database column headings, table names, filenames, command or protocol documentation, and so forth.
The goal of a semantic computer system is to supply human metadata in a form that conforms to an unambiguous syntax a computer can understand. For this to be accomplished, the source code formatting the human data must be well-formed, well structured, and accurately used. For the formatting of the data to be accurate in a markup language, tags must be used for their intended meaning and not for presentation, programmability, or other unintended use. MML provides the role attribute to help document authors achieve a more specific level of description than the provided tags alone would allow.
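Of the three requirements above, well-formedness is the one a machine can check mechanically. The following Python sketch uses the standard library XML parser to do so. The function name is_well_formed and the two sample fragments are assumptions made for this example. Note that the check says nothing about whether tags are used for their intended meaning, which still depends on the author.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        # Raised for problems such as mismatched or unclosed tags.
        return False
    return True


# Sample fragments invented for this sketch.
nested_correctly = "<p>A <strong>well-formed</strong> paragraph.</p>"
nested_badly = "<p>A <strong>badly nested</p></strong>"

print(is_well_formed(nested_correctly))
print(is_well_formed(nested_badly))
```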
Context
Context is the one-to-one or one-to-many association of relationships from data to data, data to metadata, or a combination of those relationships. This association may exist between data on the same level of a document structure, which is a peer relationship. An example of a one-to-one peer relationship in English is: red hat, where the adjective red describes the noun hat. An example of a one-to-many peer relationship in English is: grape leaves, peach leaves, and tomato leaves are colored green, where the color green is equally associated with more than one item. The association may also exist between data on different levels of a document structure, which is a structured relationship. An example of a structured relationship in English is the association between headings and paragraphs on this page. In HTML and XML, structure is defined by the Document Object Model.
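The three kinds of relationship described above can be sketched with plain Python dictionaries. This is a minimal illustration under stated assumptions: the variable and function names are invented for this example, and the paragraph labels are placeholders rather than real content from this page.

```python
from typing import Dict, List, Optional

# Peer relationships: a description mapped to the items it describes,
# all on the same level.
peer: Dict[str, List[str]] = {
    "red": ["hat"],                                              # one to one
    "green": ["grape leaves", "peach leaves", "tomato leaves"],  # one to many
}

# Structured relationship: each heading sits one level above its paragraphs.
# The paragraph labels are placeholders invented for this sketch.
page: Dict[str, List[str]] = {
    "Description": ["paragraph 1", "paragraph 2"],
    "Context": ["paragraph 3", "paragraph 4"],
}


def items_described_by(description: str) -> List[str]:
    """Return every item that shares the given description."""
    return peer.get(description, [])


def heading_of(paragraph: str) -> Optional[str]:
    """Return the heading that a paragraph belongs to, if any."""
    for heading, paragraphs in page.items():
        if paragraph in paragraphs:
            return heading
    return None


print(items_described_by("green"))
print(heading_of("paragraph 3"))
```

The peer lookup never leaves its own level, while the heading lookup has to cross from one level of the structure to another, which is the distinction the text draws.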
The current prevailing technology for defining context is RDF, the Resource Description Framework. RDF creates a context entity that is formed from a data triple. The triple allows definition of a subject, a predicate, and an object. The subject is the data to be described. The predicate names the property or relationship that describes the subject. The object supplies the value of that property. URIs provide unique identification for the resources named in a triple. The beauty of RDF entities is their simplicity: the resource that serves as the object of one triple can be the subject of another, so simple entities can be combined into larger ones to establish structured relationships.
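A rough sketch of this idea in Python follows. It uses the standard RDF vocabulary, in which a triple consists of a subject, a predicate, and an object, and URIs serve as identifiers. It does not use a real RDF library: the Triple class, the reachable function, and the example.org URIs are all placeholders invented for this illustration. Only the rdfs:label URI is a real, published identifier.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str    # the resource being described
    predicate: str  # the property or relationship
    obj: str        # the value, or another resource


# Placeholder URIs for this sketch; only LABEL is a real RDFS identifier.
HAT = "http://example.org/item/hat"
COLOR = "http://example.org/property/color"
RED = "http://example.org/color/red"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

graph: List[Triple] = [
    Triple(HAT, COLOR, RED),    # the hat has the color red
    Triple(RED, LABEL, "red"),  # the color resource is itself described
]


def reachable(start: str, triples: List[Triple]) -> List[Triple]:
    """Collect triples found by following objects that are also subjects."""
    found: List[Triple] = []
    pending = [start]
    seen = set()
    while pending:
        node = pending.pop()
        if node in seen:
            continue
        seen.add(node)
        for triple in triples:
            if triple.subject == node:
                found.append(triple)
                pending.append(triple.obj)
    return found


for triple in reachable(HAT, graph):
    print(triple)
```

Starting from the hat, the walk reaches the description of the color as well, because the object of the first triple is the subject of the second. That chaining is how simple triples build up structured relationships.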
Symbolism
Simple context defined by RDF is important for representing human language. Human language is composed of a hierarchy of symbols and symbolic representation. Letters/glyphs are symbols themselves and have different uses and meanings under different circumstances. Words are also symbols, composed of letters/glyphs, that can have multiple meanings and variant spellings. This hierarchy of symbolic usage can be recreated using RDF, but it cannot be defined or represented by RDF itself.
Ontology Systems
The RDF Schema language, RDFS, exists to allow expanded description of, and constraints on, RDF triplets. RDFS is ultimately intended to serve as a linguistic interface between the simplicity of RDF and the complexity of ontology languages. Ontology languages exist to create formal structures of linguistic domains so that rules may exist for computers to understand human language in a capacity that may be interpreted by a human. At this point it is no longer enough to simply read descriptions of data. The goal of ontology languages is to allow computers to create their own descriptions, decisions, and structures for human-oriented communication as output in response to standard human input. This can best be summarized as weak, or incomplete, artificial intelligence.
Additional Benefits
If a document formatted in a markup language is written with semantics in mind, it is likely to be extremely accessible. This website, for instance, achieves the maximum certification for accessibility with very little effort once proper code structure was implemented. Semantic documents are also search-friendly documents. This is obviously important for pages on the internet that wish to be noticed by search engines. It is perhaps more important for email, where many users receive more mail than they can possibly read. The internationalization effort could also be significantly assisted if semantic conditions were more widely utilized.
Conclusion
If content described by markup languages were always well-formed and semantically described and structured, much of this technology could be easily implemented with minor knowledge of communication sciences. This technology would also be more widespread, offering document authors many application opportunities that are entirely implausible under the current internet. Imagine shopping agents that provide suggestions or assistance that accurately reflect your natural language. This can easily happen, but it will never happen on its own. Support a semantically focused technology, such as MML for email or XHTML for webpages, to drive conformance to strict rules and logical structures.
