Digital Scriptorium  

How To: Marking up your transcription

Abstract

Preparation

Beginning a new file for transcription

Marking up a sample transcription

      Second line

      Third line and end of chapter

Abstract

This document presents an introduction to marking up a manuscript transcription using ds3.dtd, the Digital Scriptorium transcription DTD (Document Type Definition), and NoteTab Light, a freeware editing program. It uses MS UCB 152, f. 6r, to walk through the steps of marking up a transcription using the Text Encoding Initiative (TEI) Guidelines. Information on setting up the software is given in the Introduction, the Tag Set briefly discusses TEI-compliant markup, and the Alphabetical List of Element Types offers a description of the tags that comprise ds3.dtd; we suggest that you peruse those documents before continuing with this one. All of the documentation files are included in the ds3.exe package, available here, and are linked by a table of contents.


Preparation

Whether tagging an extant transcription (longhand or typed) or creating an electronic transcription from scratch, the first step is to plan the depth and breadth of information that your transcription will cover, just as for a more traditional endeavor. If you work from a document that results from complex prior decisions, preserving what has already been recorded is fairly straightforward once you become comfortable with what the TEI tags offer and restrict; your working document already includes metainformation, such as line break locations and notes on changes of hand, that you can translate to TEI's structure. If you tag while you transcribe, the markup you apply can be more flexible, but decisions about what to record (and how) rely upon a somewhat keener knowledge of what the TEI tags can help you accomplish. In either case, familiarizing yourself with the markup tags listed in the Alphabetical List of Element Types as part of the planning stage can help ensure that your file tags information consistently throughout. Tagging everything you can is unnecessary, but some experimentation may be helpful in deciding how much you do want to record.


To begin a new file for transcription:

Follow the steps below exactly. When you have finished this procedure, you will be ready to begin marking up your transcription and to enter information about your transcription practices (metadata).

  1. Follow the setup steps in the Introduction under the heading Setting Up the Software.
  2. Launch NoteTab Light, then select the ds3 library from the button row at the bottom of NoteTab's window.
    The left-hand pane displays the ds3 tools and tags. Always have this library selected when you work with the Digital Scriptorium transcription DTD.
  3. If necessary, choose File > New from the menu, so that the right-hand pane is blank.
  4. In the left-hand pane, under the red XML outer wrapper and DTD declaration heading, double-click XSL stylesheet declaration to insert it.
    Two lines appear in the right-hand pane: the first signals to a parser that the file's contents are XML, and the second declares the style sheet's filename and type.
  5. Under the same heading, double-click ds3 dtd declaration and TEI.2 outer wrapper.
    The first line that appears (beginning !DOCTYPE) points to the DTD, or set of rules that govern tag usage; the several subsequent lines define the .jpg file type and call other files containing entity definitions; and the final pair of lines open and close the top-level TEI.2 tag.
  6. Position the cursor on the line just after <TEI.2>.
    Note that 15:1 displays in the lower left corner; 15 is the line number, 1 the position (character number from left to right).
  7. In the left-hand pane, under the Metadata Section heading, double-click teiHeader, basic to insert a bare-bones teiHeader in your file.
    A screenful of tags appears in the right-hand pane; the first and last open and close <teiHeader>. These tags allow you to record information about the text you are transcribing, your transcription methods, and so on. When you are ready to enter metadata, see Identification (TEI Header), Alphabetical List of Element Types, and the sample texts for suggestions.
  8. Position the cursor after the closing </teiHeader> tag and press the Enter key twice; then, in the left-hand pane, find and insert text, body, and div1.
    Open- and close-tags for these three appear on either side of your cursor.
    Inserting div1 causes a dialog box to appear. If you plan to track division type and number throughout your file, enter values in the type and n fields; for example, to define this first-level division as a chapter numbered one, you can enter "chapter" and "1" in the type and n fields (respectively). Otherwise, choose OK without entering a value.
  9. In the left-hand pane, find and insert paragraph under the Page Layout and Segmentation heading. (You may need to scroll down to see it.)
    The <p> tag appears on either side of your cursor.
  10. Save your work; make sure your new filename ends with the extension .xml.
    You have now completed the minimum steps necessary to create a TEI-compliant file in XML.
  11. Verify that this new file parses correctly: at the top of the left-hand pane, double-click XML Parse.
    A new file named error.txt opens; if your file parses correctly, error.txt contains only the name and full path of your file (for example, c:\ds3\document\demo.xml). To bring your own file back into view, select the tab labeled with your filename. See also the To parse an XML document procedure in the Introduction.
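
Taken together, steps 4 through 9 yield a file shaped roughly like the sketch below. It is an illustration only, not the exact output of the NoteTab clips. The style sheet filename (ds3.xsl), the one-line DOCTYPE (the real declaration also carries the .jpg definition and the calls to entity files), and the contents of the teiHeader are all assumptions, filled in with the minimum that TEI P4 requires.

```xml
<?xml version="1.0"?>
<!-- Step 4: style sheet declaration; the href value is a hypothetical filename -->
<?xml-stylesheet type="text/xsl" href="ds3.xsl"?>
<!-- Step 5: DTD declaration, simplified here to a single line -->
<!DOCTYPE TEI.2 SYSTEM "ds3.dtd">
<TEI.2>
  <!-- Step 7: metadata section; placeholder content, assumed rather than copied from the clip -->
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Title of your transcription</title></titleStmt>
      <publicationStmt><p>Publication details</p></publicationStmt>
      <sourceDesc><p>Description of the manuscript</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <!-- Step 8: text, body, and a first-level division typed as chapter one -->
  <text>
    <body>
      <div1 type="chapter" n="1">
        <!-- Step 9: the paragraph that will hold the transcription -->
        <p></p>
      </div1>
    </body>
  </text>
</TEI.2>
```

If the parser reports an error in a file of this shape, check first that every open-tag has its close-tag and that the elements nest in the order shown.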

To apply markup to a sample manuscript transcription:

This example uses the Digital Scriptorium's digitized image of MS UCB 152, f. 6r. Open the DS Test Database manuscript search form (http://sunsite.berkeley.edu/scriptorium/form_msimage.html) in a Web browser, enter "ucb 152" in the Shelfmark field, and choose Run. F. 6r is the third thumbnail on the ensuing page. The medium-sized image is convenient for transcribing from the screen or checking most details, but the small image is suitable for printing on 8.5x11" paper, which may be more practical than repeatedly using Alt-Tab to toggle between NoteTab and your browser. Using a color printer enhances legibility dramatically.

For a summary description of the different levels of transcription supported by the DS transcription DTD, see the Tag Set's Overview. Here we offer a philological transcription according to the Tag Set's terms, which "records the words of the text in enough detail to support text-critical, orthographic, or lexical investigations." We do not record details concerning gaps or erasures, therefore, which would be recorded in a paleographic or codicological transcription. Moreover, in order to get you started, we have not indicated many alternative solutions, though there is frequently more than one way to handle a given problem or situation. The procedure below yields a close line-by-line transcription that can easily be checked against the original. For additional information, see especially Section 35, "Elements," of the TEI Guidelines, P4 release (2002).

  1. Retrieve the image of MS UCB 152, f. 6r, in a format and size that suits you.
    Notice in particular that f. 6r begins with the final words of a rubricated chapter heading, that the capitals beginning some words are touched with yellow (e.g., the "E" of "Erles," line 3), and that a red grease pencil has marked twice near the right margin (lines 8 and 24).
  2. In NoteTab Light, complete the steps in the procedure above, To begin a new file for transcription.
  3. The first step is to identify the leaf you are transcribing. Accordingly, position the cursor to the left of <head>, at the beginning of the line, then find and insert page break (under Page Layout).
    A dialog box appears and prompts you to enter a number. Since this procedure focuses on f. 6r, type a logical value (such as 6 or 6r) and choose OK.
  4. Because f. 6r's first line is part of a chapter heading, you can specify values for its type and n attributes to clarify what sort of division encloses this piece of text. If you plan to track this information, return to the div1 tag; to maintain consistency within your transcription, we suggest that you use attribute values that are evocative and easy to type, such as type="cap" (for capitulum) and n="36" (since this is cap. 36).
    The sixth folio of MS UCB 152 happens to follow two now-missing leaves, so it would be equally accurate to supply the value mid as an indication that the chapter's text on this leaf starts partway through rather than at the chapter's beginning. Attribute values for div (at any level, 0-7) are up to the user, not predefined by TEI.
  5. To record the color of the rubric, you can define head's rend attribute to specify the heading's rendering: <head rend="red">.
    Note: The included style sheet renders headings as HTML second-level heads (<h2>) but ignores any color you specify. Specification here thus describes the manuscript's rendering without necessarily corresponding to the output of your transcription file.
  6. Position the cursor between the open- and close-tags of head, then either insert line break (under Page Layout) or simply type <lb/>.
    (If transcribing a text in verse, use the line group (<lg>) container, under Poetry Elements, and insert <l> to enclose each discrete line. Similarly, if your text features more than one column, insert column break and supply its number: e.g., <cb n="1/2"/> for column one of two.)
  7. Begin transcribing the heading's content.
    The first word, "london," offers several choices. The TEI Guidelines make no recommendations about how to record the results of editorial choice; here are a few possibilities.
    • Ascenders of l and d: these would normally not be marked at all in a philological transcription. In the teiHeader (normalization section), you can briefly describe instances where first-line ascenders are decorated.
        If you prefer to note the ascenders where they occur, supply a note immediately after the word: e.g., londone<note>l and d ascenders extend above ruled line</note>.
        Another option is to add a reference pointer, ref, with a target attribute that essentially adds a footnote placed elsewhere in the file. E.g.,
         <ref target="asc1">londone</ref>. ...
         ...
         <note id="asc1">l and d feature calligraphic ascenders that extend above the first ruled line.</note>
    • Mark above n: in MS UCB 152, this subpuncted suspension mark generally signifies final -e, final nasal, or final -ne/-me.
        Most simply, if you decide the mark signals an abbreviation, you can insert abbreviation under the Transcriptions heading: <abbr>london</abbr>. If you feel more editorially assertive, use expansion to record your interpretation: london<expan>e</expan>. Note that <expan> should enclose only the expanded letters.
        Alternatively, you can use expansion with type (<expan type="">) to record not only the manuscript's letters but what sort of mark has triggered your editorial expansion; see also the normalization and segmentation sections in the UCB 152 sample under editorialDecl. Since the attribute values for type are up to the user, choose something you find memorable as well as simple to type. Here, e.g., one might use the NoteTab library's sample value susp, for "suspension"--london<expan type="susp">e</expan>--or, bearing in mind that both plain and subpuncted marks appear in this manuscript, london<expan type="suspdot">e</expan>.
        If your transcription silently expands all abbreviations and views them as such, no tag is necessary to mark your added e.
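        If you view the n-plus-mark as a single glyph, you can define a character entity, such as &nsdot; for n-suspension-dot. Because XML requires entities to be declared in the DTD declaration at the top of the file (alongside the calls to the other entity files), declare it there and describe it in the teiHeader; for more information, see Characters and Abbreviations in the Tag Set and consult a thorough guide to XML.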
  8. Complete the first line. (Use expan to expand the capitulum.)
    At the line's end, the chapter number offers another choice: the TEI Guidelines suggest that all roman numerals be supplied with arabic equivalents. Use num and its value attribute to satisfy this. You can also use <hi> to highlight the supralinear "m".
    If you use sup to indicate "supralinear" and want to distinguish what appears in the manuscript (the supralinear "m") from what you supply on the strength of that superscript (the intervening letters "pitulu"), here is one way to end the line: Ca<expan type="ss">pitulu</expan><hi rend="sup">m</hi> <num value="36">xxxvj<hi rend="sup">m</hi></num>.
  9. Since the heading ends on this line, make sure that </head> is placed appropriately after the full stop.
  10. Save your work.
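
Steps 3 through 9 can be pulled together as in the sketch below, which shows one possible state of the first division once the heading is complete. It assumes the attribute values suggested above (cap, 36, red, susp, ss, sup) and an expansion of the suspension mark as final -e. The spacing and punctuation between the two parts of the heading are guesses for illustration, so check them against the image.

```xml
<div1 type="cap" n="36">
<!-- Step 3: page break identifying the leaf, placed to the left of head -->
<pb n="6r"/>
<!-- Step 5: rend records the rubric's red ink -->
<head rend="red">
<!-- Steps 6 to 8: line break, expanded suspension, expanded capitulum, numeral with arabic value -->
<lb/>london<expan type="susp">e</expan> Ca<expan type="ss">pitulu</expan><hi rend="sup">m</hi> <num value="36">xxxvj<hi rend="sup">m</hi></num>.</head>
<p></p>
</div1>
```

The empty <p> is the paragraph inserted when the file was created; transcription of the text proper begins inside it.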


  11. Second line:
  12. Notice that the <p> you inserted previously opens on the next line. This step involves how you conceive of the manuscript's textual structure. Elsewhere, MS UCB 152 includes more than one piece of text set off by an initial under a single heading; also, in longer chapters (e.g., on f. 1r), blue and red paraphs break up the text that follows an initial. To record these levels in your transcription's structure, optionally insert div2 so that it encloses <p>.

    If you omit div2, this structure results:
       <div1 type="cap">
           <head>heading</head>
           <p>paragraph text</p>
           <p>another paragraph</p>
       </div1>

    If you add div2, this hypothetical structure becomes possible:
       <div1 type="cap">
           <head>heading</head>
           <div2 type="init">
               <p>paragraph text</p>
               <p>another paragraph</p>
           </div2>
           <div2 type="init">
               <p>paragraph text</p>
           </div2>
       </div1>
  13. Position the cursor immediately to the right of <p>, then insert <lb/>.
  14. The blue initial offers another choice of how much detail to record. For such initials, the UCB 152 sample uses the following:
       <div2 type="init"><p>
       <lb/><hi rend="initb">A</hi>nd
    Here, init indicates that the division begins with an initial; the rendering initb records the capital's blue color. These attribute values are user-defined, and one could as easily mark the letter 2b-filr to specify a two-line letter height and red filigree; in this manuscript blue initials are always accompanied by red filigree and gold initials by blue, so the -filr part may be considered redundant in a transcription focused more closely on the text's content than its presentation.
  15. After resolving how to record the initial A, continue transcribing line two.
    Additional points in this line that call for editorial choice include the following, none of which needs to be tagged in a philological transcription: double crossed l ("bifell," "gentill"), the ligatured double p in "vppon," the lack of space between "a" and "daie" (which modern readers recognize as separate words), and the calligraphic ascender and ligatured -es in the line's final word ("kinges").
    It is advisable to mark editorial separation between "a" and "daie." Use either corr to emphasize editorial intervention or sic to emphasize the manuscript text's priority:
       a<sic corr=" "></sic>daie   or
       a<corr sic=""> </corr>daie
    Note: The TEI Guidelines define the w tag to mark individual words, and thus where word breaks occur. Phrase-level elements like <add> and <corr> cannot be used in conjunction with <w>, however, which makes it difficult to apply consistently.
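
As a rough sketch of where the second line leaves the file, the fragment below combines the optional div2, the blue initial, and the editorial word separation. It assumes you chose div2 and the corr form; the words between the ones quoted in the steps above are elided with "..." rather than transcribed.

```xml
<div1 type="cap" n="36">
<head rend="red">...</head>
<!-- Step 12: optional second-level division marking text set off by an initial -->
<div2 type="init"><p>
<!-- Steps 13 and 14: line break, then the blue initial -->
<lb/><hi rend="initb">A</hi>nd ... a<corr sic=""> </corr>daie ... kinges
<!-- lines three onward are added here before the paragraph closes -->
</p></div2>
</div1>
```

If you prefer to emphasize the manuscript's reading, substitute the sic form shown above for the corr element; the surrounding structure stays the same.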


  16. Third line and end of chapter:
  17. Upon completing line two, press Enter, then insert <lb/>.
  18. In the left margin, a later hand has made a brief addition. Here is one way to record it, using unk (unknown) as the hand ID and &longs; for the long s in "housholde":
       <lb/><add place="left" hand="unk">e.ii.<hi rend="sl">b</hi></add>hou&longs;holde
    Note: Hands and their IDs are defined in the teiHeader; for an example, see handList in the UCB 152 sample under profileDesc.
  19. The first word's fifth letter appears to have been b originally and is now h. Regardless of whether you view the final stroke as the main scribe's or as a later addition, optionally add a note to record the change.
    If the b were visibly marked for deletion, another option would be to use the add and del tags, hypothetically thus:
       hou&longs;<del>b</del><add>h</add>olde
    The disadvantage to using add and del is that the result displays b and h sequentially, not superimposed.
  20. Continue transcribing the third line and the rest of the section. When you reach line 19, which begins, "ýere of," transcribe to the end of the brown-ink portion.
  21. Position the paragraph close-tag (</p>) and div close-tag (</div2> if you chose to use it, else </div1>) immediately after the punctus.
  22. Insert a div of the same level you used for chapter 36 and specify its type if appropriate. (See above, step 4.)
  23. Position the cursor between the open- and close-tags of head, as you did previously, and transcribe the heading.
    As with paragraph text, insert <lb/> where the heading breaks across a line.
  24. Record the chapter number, which appears in the right margin in the main scribe's hand, slightly above the line following the heading. For example, if you consider the chapter number an intrinsic part of the heading that happens to occur on a separate line:
       <head>...
       <lb/>... kyng.<add place="right"><lb/>ca...</add></head>
       <div2 type="init"><p>
       <lb/><hi rend="initg">A</hi>fter ...
  25. Save your work.
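
The transition from chapter 36 to the next chapter might then look like the sketch below. It is a fragment of the body, not a complete file, and it assumes the optional div2 was used. Elided text is left as "..." and the new division's n attribute is omitted, since the chapter number is not spelled out here. The &longs; entity must be defined in one of the entity files called from the DTD declaration.

```xml
<div1 type="cap" n="36">
<head rend="red">...</head>
<div2 type="init"><p>
<lb/><hi rend="initb">A</hi>nd ...
<!-- Step 18: marginal addition by a later hand, then the line's text -->
<lb/><add place="left" hand="unk">e.ii.<hi rend="sl">b</hi></add>hou&longs;holde ...
<!-- Step 21: close-tags follow the punctus that ends the brown-ink text -->
<lb/>... .</p></div2>
</div1>
<!-- Step 22: a new division of the same level for the next chapter -->
<div1 type="cap">
<!-- Steps 23 and 24: heading, with the marginal chapter number kept inside head -->
<head>...
<lb/>... kyng.<add place="right"><lb/>ca...</add></head>
<div2 type="init"><p>
<lb/><hi rend="initg">A</hi>fter ...</p></div2>
</div1>
```

Parsing the file again at this point is a quick way to confirm that each division, paragraph, and heading has been closed in the right order.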


Last published: 2007-12-10
© Columbia University Libraries