Technical considerations


Conversion by program: As described in more detail in the Content overview, the documents on this Kairos site derive from a set of articles held in PHP format on the related Laetus-in-Praesens website. These typically longer "articles" have been split into the shorter "documents" imported into the free and open-source content management system Drupal.

The approach taken to the conversion of PHP articles into Drupal records reflects the biases of a programmer exploring solutions that avoid manual copying of portions of those articles into the content management system (CMS). The challenge was whether the conversion could be carried out largely by program, notably in order to extract other information and build that into other records to be imported into the CMS. The intention was therefore to build many of the CMS records prior to importing, rather than endeavour to generate additional content types within the CMS facility. The approach was framed in this way because of extensive expertise in manipulation of text with a DOS-based application -- and little expertise in the PHP-related programming required for the CMS.

The approach was successively enabled using a suite of programs developed with the Advanced Revelation (AREV) application through a DOS box in Windows XP. The resulting pre-formatted flatfiles (CSVs) were then imported into Drupal 6 and later Drupal 7. This required some ingenuity to circumvent a 64k constraint on record length within AREV for documents of greater size. Cessation of support for Windows XP encouraged a shift to OpenInsight, a later variant of AREV. The upgrade to the Kairos site in December 2018 was enabled using OpenInsight 9.4. Consideration is being given to upgrading that facility to OpenInsight 10.04.

Article formatting: Advantage has been taken of the fact that the PHP articles were in a format which had remained standard and relatively stable over decades, both from the earliest (in the 1960s), and since first placed onto a website (in HTML format) in the early 1990s. Articles from earlier periods were adapted to that format as they were digitized. The key factor enabling conversion was the presence in those files of HTML title delimiters defining the sub-titles of what could then be split out as Drupal records. The conversion challenge was defined such as to avoid any additional mark-up -- otherwise required to facilitate the process, using programming "tricks" to circumvent anomalies. This could well be described as a less than efficient process (if not stupid!), but it did offer some nice programming challenges for someone anxious to avoid manual manipulation (at all costs!).
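The splitting described above can be illustrated with a minimal Python sketch. It is not the AREV/OpenInsight program actually used. It assumes that sub-titles are marked with h2 or h3 tags (the text does not say which title delimiters the articles use), and the function name split_on_headings is hypothetical.

```python
import re

# Assumption: sub-titles are delimited by <h2> or <h3> tags; the tags actually
# used in the Laetus articles are not stated in the text.
HEADING = re.compile(r"<h([23])[^>]*>(.*?)</h\1>", re.IGNORECASE | re.DOTALL)


def split_on_headings(html):
    """Return (subtitle, body) pairs, one per heading-delimited segment.

    Any text before the first heading is ignored in this sketch.
    """
    matches = list(HEADING.finditer(html))
    segments = []
    for i, match in enumerate(matches):
        # A segment runs from the end of its heading to the start of the next.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(html)
        title = re.sub(r"<[^>]+>", "", match.group(2)).strip()  # drop inline tags
        body = html[match.end():end].strip()
        segments.append((title, body))
    return segments


sample = (
    "<h2>Introduction</h2><p>First part.</p>"
    "<h2>Method <i>used</i></h2><p>Second part.</p>"
)
segments = split_on_headings(sample)
```

Each pair could then become one CMS record, with the sub-title as the record title. A regular expression is used here only to keep the sketch short; anomalies in real pages would probably call for the kind of special-case handling the text refers to as programming "tricks".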

Retaining relationship to original version: The conversion challenge was also seen as a means of preserving a degree of complementarity between the PHP articles in the Laetus facility and the variant on the Kairos facility. The intention was not to switch to writing articles within the Drupal CMS, or updating them there, since it has been far more convenient to continue the process of writing/editing of the PHP variants within the Laetus facility using Dreamweaver. This was one reason for using record (node) identifiers within the CMS based on the original PHP file/folder name -- rather than switch to a numeric node identifier as is most commonly the case for a CMS.
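As a rough illustration of identifiers derived from file/folder names, the following Python sketch builds an alphanumeric identifier from a path. The naming scheme (folder name, file name, optional part number joined by underscores), the example path, and the function name node_id are all assumptions made for illustration; the scheme actually used on Kairos is not described here.

```python
from pathlib import PurePosixPath


def node_id(php_path, part=None):
    """Derive a stable alphanumeric identifier from a PHP file path.

    Hypothetical scheme: "<folder>_<file stem>", with "_<part>" appended
    for the individual documents split out of a longer article.
    """
    path = PurePosixPath(php_path)
    stem = path.stem  # file name without the ".php" extension
    folder = path.parent.name  # empty string when there is no folder
    base = f"{folder}_{stem}" if folder else stem
    return base if part is None else f"{base}_{part}"


main_id = node_id("docs/00s/climate.php")      # identifier for the whole article
part_id = node_id("docs/00s/climate.php", 3)   # identifier for its third segment
```

Because the identifier depends only on the path and the segment number, regenerating the records from the same PHP article yields the same identifiers each time.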

Note that it is the Laetus version which is considered to be the master copy. Only minor editing is done exceptionally on documents in the Drupal context -- most notably in the event of conversion issues which have not been resolved by program. Errors emerging in the Drupal version may well be used as a means of detecting and correcting errors in the Laetus version or in the conversion scripts.

Constructing records for import: Building the various CMS record types prior to import, rather than depending on (absent) Drupal skills to manipulate the basic imported documents, has meant that new record types can be created and populated as required in order to enhance the CMS facility. Of particular interest are those relating to the pattern of links.
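A minimal Python sketch of preparing such records as a CSV flatfile before import is given below. The field names (node_id, title, body) and the sample record are hypothetical, and the sketch stands in for the AREV/OpenInsight programs rather than reproducing them.

```python
import csv
import io


def records_to_csv(records, fieldnames):
    """Serialize pre-built records as CSV text, quoting every field.

    Quoting all fields keeps commas, quotation marks and line breaks
    inside HTML bodies from being read as field or record separators.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, quoting=csv.QUOTE_ALL, lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical record: the identifier, title and body are illustrative only.
rows = [
    {
        "node_id": "00s_climate_1",
        "title": 'On "framing"',
        "body": "<p>Line one,\nline two.</p>",
    }
]
csv_text = records_to_csv(rows, ["node_id", "title", "body"])
```

A new record type would then only require a different list of field names and a different set of rows, with no change on the CMS side beyond defining the matching content type.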

Drupal node import: Of interest in the strategy adopted is the constraint imposed by the state of development of the Drupal "node import facility". Basically the options available for updating any node of a particular content type by the import process are to delete such nodes individually, or in a batch process (VBO) -- and then to import the corrected set of nodes into the IDs thereby made "free". The provisions for "overwriting" a node, without prior deletion, have been progressively developed within the Drupal community -- but primarily for numeric nodes. In practice this means that it is easy to batch delete all the bibliographic reference records, or the associated author records, and then to re-import a set generated from PHP articles after corrections (in the light of errors that became apparent from Drupal sorts). The advantage of the alphanumeric node naming system is that the links to the other documents are not affected by this process since the pointers from those documents remain valid.

Upgrading and adaptation:

  • Drupal content management system:

    • Drupal 6: The Kairos facility has functioned successfully within the Drupal 6 content management system for 7 years.

    • Drupal 7: A much delayed necessary upgrade to Drupal 7 was implemented in January 2017. The process of doing so left a number of relatively minor issues unresolved. These include breaks in some documents due to formatting issues during the import into Drupal 7. In particular, totalling of numeric fields in search results is no longer working. It was considered more appropriate to avoid delay in the upgrade to Drupal 7 and to repair such issues at a later date (with others which may become apparent). With respect to errors related to formatting, users are always able to revert to the original document on the Laetus-in-Praesens site. A further major upgrade was implemented in December 2018.

    • Drupal 8/9?: Consideration has been given to switching to Drupal 8, and experimental upgrades have been undertaken. The issue faced by many is whether the switch from the proven viability of Drupal 7 is appropriate, and when -- especially in the light of the anticipated subsequent upgrade to Drupal 9. The improvements within Drupal 7 in 2018 have been seen as usefully enabling any such later transition.

  • HTTP to HTTPS: Although seemingly of marginal significance, the switch from HTTP to the security certification of HTTPS contributed to other transitional difficulties, especially in the related Laetus site with its add-on domains.

  • HTML 4 to HTML5: Web pages of any age tend to be characterized by use of features which are increasingly deprecated in the light of the more stringent requirements of HTML5. Multiple errors and warnings are signalled by use of any code validation applications. It is very fortunate that browsers tend to be tolerant of many deprecated features. The issue is how to navigate the transition to HTML standards, for what specific features, and when. This is a continuing concern.

  • PHP 6 to 7.1: This server-side scripting language has been in progressive development since 2000. The difficulty in the development of websites has arisen from failure to update modules with features deprecated in the upgraded versions of PHP -- and a failure to exploit their enhanced efficiency. This failure notably engendered errors on the Laetus site with respect to the operation of menus. Concerns remain as to whether use can be made of PHP 7.2 -- a matter to which development of Drupal has been sensitive.

  • Windows: from 7 to 10: Many users of the Windows platform are aware of the issues engendered by the unpredictable consequences of "invasive" upgrades. These have been noteworthy for the manner in which they rendered inoperable various animations and visualizations operating on the Laetus site (especially early demonstrations of virtual reality possibilities). In the case of the Laetus/Kairos processes, this required use of an outdated version of Windows long after it was no longer maintained -- in order to benefit from the DOS box facility required to operate the DOS-based AREV facility (as noted above). This constraint has been circumvented by use of the OpenInsight variant. The unpredictable consequence of upgrades is necessarily a continuing concern.

  • Migration/Import from Laetus to Drupal: The ability to update the Kairos site has been considerably delayed because of constraints on the Drupal migration facility resulting from the continuing use of alphanumeric nodes. The justification for this use was noted above. A further factor is that search engines have long provided access to Kairos documents using their URLs -- making it inappropriate to switch to any other modality. The Drupal implementation for Kairos is considered to have broken some fundamental design rules, being ambiguous by design -- most notably with respect to its use of record types. This has ruled out more conventional solutions to the problems encountered.

  • Server-related issues: A successful switch in server over the past year also introduced unforeseen delays because of the manner of attribution of security certificates for HTTPS.

Character encoding issues: Working with articles published as early as the 1960s, making extensive use of accented characters, has required a degree of flexibility in adapting the conversion to handle characters which pre-date the currently favoured UTF-8 standard. Some of the articles are in French and other languages. Many cite authors and articles requiring such characters. Although the conversion process enabled some of these anomalies to be "corrected", these issues have not been completely resolved and are reflected in the contrasting ways in which Drupal itself handles such encoding in different contexts.
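One common way of handling mixed encodings is to try UTF-8 first and fall back to a legacy single-byte encoding. The Python sketch below assumes that the legacy files are in Windows-1252; this is only a guess, since the encodings actually encountered are not identified here, and the function name decode_legacy is hypothetical.

```python
def decode_legacy(raw, fallback="cp1252"):
    """Decode bytes as UTF-8, falling back to a legacy single-byte encoding.

    Assumption: older files use Windows-1252. Bytes that are undefined in
    the fallback encoding are replaced rather than raising an error.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback, errors="replace")


modern = decode_legacy("café".encode("utf-8"))  # already valid UTF-8
legacy = decode_legacy(b"caf\xe9")              # 0xE9 is "é" in Windows-1252
```

The order matters: text containing accented characters in a single-byte encoding is rarely valid UTF-8, so a failed UTF-8 decode is a reasonable signal to use the fallback. Files that mix both encodings within one document would not be repaired by this approach.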

Augmenting access possibilities: A major motivation for exploring a CMS variant was to segment the longer PHP files (some over 150k) into more "readable" forms as CMS records. This was seen as particularly valuable in that the sub-titles attributed to the HTML title-delimited segments were interesting to extract in order to benefit from the Drupal Views facility, in addition to enabling more specific access via search engines.

Benefitting from extensive hyperlinking: A significant characteristic of the PHP articles is the degree of hyperlinking between them. The conversion was designed to derive further information from this pattern of links, notably by generating "checklists" of citations "from" and "to" the CMS records. Unfortunately, as noted above, the links have been enabled to the "main" document introducing a set, and not to the individual documents of the set. This could be improved in the future.
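The idea of deriving citation "checklists" from the pattern of links can be sketched in Python as follows. The sketch makes a simplifying assumption that each href value is itself the identifier of the target document; the identifiers and the function name link_checklists are hypothetical, and this is not the program used for Kairos.

```python
import re
from collections import defaultdict

# Capture the href target up to any "#fragment", so that links to a
# position within a document count as links to that document.
HREF = re.compile(r'href="([^"#]+)', re.IGNORECASE)


def link_checklists(docs):
    """Return (cites, cited_by) mappings for a dict of {doc_id: html}.

    Assumption: href values are document identifiers. Links to unknown
    targets and self-links are ignored.
    """
    cites = {}
    cited_by = defaultdict(set)
    for doc_id, html in docs.items():
        targets = {t for t in HREF.findall(html) if t in docs and t != doc_id}
        cites[doc_id] = sorted(targets)
        for target in targets:
            cited_by[target].add(doc_id)
    return cites, {doc_id: sorted(cited_by[doc_id]) for doc_id in docs}


docs = {
    "a": '<a href="b">B</a> and <a href="c#x">C</a>',
    "b": '<a href="a">A</a>',
    "c": "<p>No links.</p>",
}
cites, cited_by = link_checklists(docs)
```

The "from" checklist for a document is its entry in cites and the "to" checklist is its entry in cited_by. Pointing links at individual documents of a set, rather than at the "main" document, would amount to keying docs by segment identifier instead of by article.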