For the last two weeks I have been creating MODS and TEI XML files for issues 1.1 and 1.2 of The Dilettante. Although I am still somewhat unsure what the MODS files actually do, after these two weeks I can navigate my way through them pretty confidently. These XML files, which I’m creating with the help of oXygen’s XML editor, will allow users of the MJP website to conduct metadata and full-text searches of the contents of The Dilettante once they’re added to the MJP’s database.
The process of creating the MODS files involved opening the PDFs created in ABBYY FineReader and translating the bookmarked sections into an XML file. Mark provided me with stylesheets, templates, and examples of documents with proper MODS notation. Most of my learning came from deduction and emulation. I quickly picked up on the structure of XML, which involves a series of opening and closing tags. Below is an example of a tagged piece of fiction along with an illustration that accompanies the text in the magazine.

    <mods:relatedItem type="constituent">
      <mods:titleInfo>
        <mods:title>How It Feels to Be Rich</mods:title>
      </mods:titleInfo>
      <mods:genre authority="aat">fiction</mods:genre>
      <mods:part>
        <mods:extent unit="pages">
          <mods:start>1</mods:start>
          <mods:end>9</mods:end>
        </mods:extent>
      </mods:part>
      <mods:relatedItem type="constituent">
        <mods:titleInfo>
          <mods:title>Men in Pursuit of Money</mods:title>
        </mods:titleInfo>
        <mods:note>Title created by MJP cataloguer</mods:note>
        <mods:genre authority="aat">images</mods:genre>
        <mods:part>
          <mods:extent unit="pages">
            <mods:start>9</mods:start>
            <mods:end>9</mods:end>
          </mods:extent>
        </mods:part>
      </mods:relatedItem>
    </mods:relatedItem>

I began labeling sections of the literary magazine based on genre or function. The template Mark gave me proved extremely helpful and allowed me to work mostly with pre-created genre labels. However, various works in The Dilettante did not fit these pre-made templates. Some adaptations were as simple as removing the author sequence when no author was listed in the magazine. For others, I created titles and subtitles or added notes to clarify specific attributes. For the few images that appear in The Dilettante, I had to create titles and notes. In most cases the title was just the caption that appeared below the image in the magazine, but in other cases I came up with names that described the image.
I then had to leave a MODS note informing the reader that the title was created by me.

Once the bibliographic information was recorded in the MODS XML file, I moved on to create another XML file following TEI (Text Encoding Initiative) conventions. In this file, I simply pasted the text from the .txt file I created through ABBYY and tagged the elements it contained. The labels corresponded to both the MODS XML file and the bookmarked PDF in FineReader; the only difference was the XML vocabulary I used to tag the pieces of The Dilettante. Doing this allows parts of the transcript text to be identified by genre.

This experience has been the most rewarding so far because I was not simply following a list of instructions. I was given an end goal (examples of completed MODS and TEI XML files) and asked to recreate it. I learned a lot from being thrown into the process, but I was also able to have questions answered by email whenever they arose.
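As a rough illustration of how the nested constituents in the MODS example above could be read back out, here is a minimal Python sketch. It is not the MJP's own tooling. The excerpt in this post does not show a namespace declaration, so the `mods:mods` wrapper element and the `list_constituents` helper are assumptions added for illustration; the namespace URI is the standard one for MODS version 3.

```python
import xml.etree.ElementTree as ET

# Standard MODS v3 namespace. The wrapper element below is an assumption,
# added only so the excerpt parses as a standalone document.
NS = {"mods": "http://www.loc.gov/mods/v3"}

SAMPLE = """
<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:relatedItem type="constituent">
    <mods:titleInfo>
      <mods:title>How It Feels to Be Rich</mods:title>
    </mods:titleInfo>
    <mods:genre authority="aat">fiction</mods:genre>
    <mods:part>
      <mods:extent unit="pages">
        <mods:start>1</mods:start>
        <mods:end>9</mods:end>
      </mods:extent>
    </mods:part>
    <mods:relatedItem type="constituent">
      <mods:titleInfo>
        <mods:title>Men in Pursuit of Money</mods:title>
      </mods:titleInfo>
      <mods:note>Title created by MJP cataloguer</mods:note>
      <mods:genre authority="aat">images</mods:genre>
      <mods:part>
        <mods:extent unit="pages">
          <mods:start>9</mods:start>
          <mods:end>9</mods:end>
        </mods:extent>
      </mods:part>
    </mods:relatedItem>
  </mods:relatedItem>
</mods:mods>
"""


def list_constituents(element, depth=0):
    """Hypothetical helper: collect (depth, title, genre, start, end)
    for each constituent, descending into nested constituents."""
    rows = []
    # No leading "//", so only direct children are matched at each level.
    for item in element.findall("mods:relatedItem[@type='constituent']", NS):
        title = item.findtext("mods:titleInfo/mods:title", default="", namespaces=NS)
        genre = item.findtext("mods:genre", default="", namespaces=NS)
        start = item.findtext("mods:part/mods:extent/mods:start", default="", namespaces=NS)
        end = item.findtext("mods:part/mods:extent/mods:end", default="", namespaces=NS)
        rows.append((depth, title, genre, start, end))
        rows.extend(list_constituents(item, depth + 1))
    return rows


rows = list_constituents(ET.fromstring(SAMPLE))
for depth, title, genre, start, end in rows:
    # Indent nested items so the illustration sits under its story.
    print(f"{'  ' * depth}{title} [{genre}] pp. {start}-{end}")
```

Walking only the direct children at each level keeps the illustration attached to the story that contains it, which mirrors how the image record is nested inside the fiction record in the example.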
Like any other project, our work with the MJP is vulnerable to difficulties. The past two weeks have been consumed by troubleshooting our OCR software, ABBYY FineReader Pro. At the end of my last post, my progress was gaining steam. I had successfully converted the image scans of The Dilettante issues 1.1 and 1.2 into edited, compiled PDFs ready for OCRing.

OCR stands for optical character recognition. It is the process that ultimately allows the reader to search through a document for key words and phrases. It is widely used today because it makes documents easy to navigate. The process requires OCR software such as ABBYY FineReader Pro. Two weeks ago, I began by loading my TIFF files into ABBYY. I had the program zone the areas of text I wanted it to read, and then I let it analyze and read the text for me. The expected output from ABBYY was a PDF with an editable transcript right next to it. This is where I ran into trouble. ABBYY could create the PDF, but there was no way to edit the transcript it produced. Any mistakes it made while OCRing the TIFF images could not be corrected. Mark, my professor, only knows the Windows version of ABBYY, so I turned to other resources in an attempt to figure it out. Two weeks later, I was basically in the same rut I had been in at the beginning. I had learned that ABBYY FineReader for Mac does not have all of the capabilities of the Windows version, the most important being the ability to edit the text transcript after reading.

Mark and I made the decision to continue our work and not spend any longer on this issue. ABBYY FineReader Pro's error rate is very low, and a majority of pages did not have any errors. Our compromise to work around the inability to edit the PDF text was to create two files through ABBYY (which is normal procedure), but only edit the ".txt" file, leaving the PDF and its unfixable errors alone.
The process of copy-editing the OCR'd text (in the .txt file) was more interesting than the copy-editing I am used to. I was generally looking for words that were copied incorrectly or misinterpreted by the program. Although many errors were simple ones where a letter was added or omitted, some cases called for major changes. Three full words in a row would be jumbled, with strange punctuation marks attempting to form letters. A common error in the program was the combination of the letter "i" and an apostrophe to create the letter "r." While editing the OCR'd text, I also noticed some mistakes in the literary magazine itself: spelling mistakes that also appear in the scanned PDF. Of course, it was much more difficult to copy-edit when Microsoft Word wasn't around. I also noticed some intentional spelling choices that have changed between then and now. Words like perseverance and harass are spelled "perserverence" and "harrass" consistently throughout the magazine. I obviously did not change spellings like these, because they are found within the actual text and may have something to say about the era and location in which this magazine was created. After this two-week lull, I can now begin moving forward to 1.3 and 1.4.

My goal for this blog is to document my experience as I delve into the world of fadazines and ephemeral bibelots through The Dilettante.
The Dilettante is a little magazine that was published in Spokane, Washington, from April 1898 until 1901. Very little is known about this magazine. It was published on a fairly irregular basis and lasted only a few years. My primary job is to rescue this little magazine by digitizing and uploading scanned versions of The Dilettante to the Modernist Journals Project. My secondary job is to learn as much as I can from this mysterious collection of literary history.

I started on my primary objective this week by converting the scanned PDFs of The Dilettante to readable pages. I tilted, cropped, edited, and split the files for the first two volumes. Keeping in mind the specific book stitching and various design elements of the magazine, I had to be careful about what I edited. Does including the red page binding in the center of the page really matter to the reader’s understanding of the magazine? I was able to explore this idea further as I read the Modernist Journals Project’s guide on how to read magazines. Ultimately, there doesn’t seem to be one obvious, clear way to read magazines. Although there are many ways you can read a magazine, I think the beauty of magazines comes from the “no instructions required” aspect.

I also had the opportunity this week to read some articles about other little magazines that made a name for themselves years down the road. Published out of San Francisco, Le Petit Journal des Refusées carved out a new style for print magazines. Although Le Petit only had one issue, no two copies of that issue had the same design. The covers were printed on used wallpaper, creating unique and intricate designs on each copy. The pages were trapezoidal rather than rectangular, and the content was bizarre; it contained many inside jokes shared with its sister magazine, The Lark.
Reading through these articles, I realized not only how important this era was to the magazine publishing world, but also how important magazines were to this era in history. The literary magazines published in the 1890s may have fueled the Modernist movement in an interesting way. In Le Petit’s case, it wasn’t the literary content that pushed modernism so much as the blatant disregard for traditional publishing style. The literary works did not fall into the “critically acceptable traditions” of modern art, yet the magazine still pushed through new barriers in art. The page layout and design style were almost criticizing modernism itself.

As I continue my work with The Dilettante, I hope to get a better understanding of the world in which it was made. I think these little magazines reflect local culture better than any historical documentation can. Their work is created, chosen, and distributed by the members of their communities for members of their communities. I expect to know more about The Dilettante by the end of this year than anyone alive today. As I continue my work this week, I excitedly approach the responsibility of cataloguing this forgotten magazine.