TLCMap Recogito Getting Started Guide
Pelagios provides a 10 Minute Tutorial for Recogito, but here are a few quick basics, plus information about the new TLCMap features for handling corpora (in folders), adding places not already in the gazetteer, time attributes, and more export options for use in other systems.
TLCMap adds functionality and enhancements to Recogito that we see as essential. These run on our own Recogito instance, separate from the main one. We anticipate that the two will be merged back together in time, without losing your data. In the meantime, use the TLCMap instance if you want to:
- Use the Gazetteer of Historical Australian Placenames when automatically recognising places (NER).
- Identify and add places that aren't already in a gazetteer.
- Handle a corpus of texts.
- Upload metadata from a spreadsheet for a large corpus of texts.
- View and let others view the map and texts side by side.
- Export the places you add, and/or the place and time information about the texts in the corpus, as a KML file for further analysis and comparison in other systems, with a link back to your side-by-side text visualisation.
Terms and file formats used in this guide:
- Place: each placename that occurs in the text. When exporting by place, if a place is mentioned three times, there will be only one entry in the output for that place.
- Annotation: each mention, instance or 'annotation' of the placename. When exporting by annotation, if a place is mentioned three times, there will be three entries for it.
- Start Date and End Date: These are optional, and their meaning depends on the context and what is relevant to your project, so we do not attempt to define or constrain it. For example, for a corpus or file they could be the historical period it relates to, or its publication dates. For a place they might be when the place was founded, the span of time a name was in use before it changed, or the length of time it was occupied by a historical figure. By default, annotations take the start and end dates of the file they occur in, but these can be edited case by case if desired. This means that when exporting annotations for a large corpus, changes in mentions of place over time can be handled without doing data entry for every single annotation.
- KML - A common, open standard, XML-based mapping format. It originated with Google Earth and works in Google mapping products, but has been widely adopted in other systems.
- CSV - A 'Comma Separated Values' file, which stores tabular data and can easily be opened in, or saved from, Excel (choose 'Save As' and the 'CSV' file format option). If you opened one in a text editor, you would see each line of your data separated by commas, something like this: 'Newcastle,-32.925178,151.782942,A town in Australia'
- GeoJSON - Another common open standard format, not tied to any specific vendor, and very useful for web applications.
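For anyone working with exported data directly, the difference between exporting by place and by annotation can be sketched in a few lines of Python. This is a minimal illustration, not TLCMap's own export code: it assumes annotation rows in the simple name, latitude, longitude, description CSV layout shown above (the rows themselves are illustrative), collapses repeated mentions into one entry per place, and writes the result as GeoJSON. The `mentions` count is our own addition for the example. Note that GeoJSON lists coordinates longitude first, then latitude.

```python
import csv
import io
import json

# Illustrative annotation-level rows in the CSV layout shown above:
# name, latitude, longitude, description (no header row).
# Newcastle is mentioned three times, Sydney once.
ANNOTATIONS_CSV = """\
Newcastle,-32.925178,151.782942,A town in Australia
Newcastle,-32.925178,151.782942,A town in Australia
Sydney,-33.86882,151.20929,A city in Australia
Newcastle,-32.925178,151.782942,A town in Australia
"""


def read_annotations(text):
    """Parse one annotation (mention) per CSV row."""
    return [
        {"name": name, "lat": float(lat), "lon": float(lon), "description": desc}
        for name, lat, lon, desc in csv.reader(io.StringIO(text))
    ]


def to_places(annotations):
    """Collapse annotations into one entry per place, counting mentions."""
    places = {}
    for a in annotations:
        key = (a["name"], a["lat"], a["lon"])
        if key not in places:
            places[key] = dict(a, mentions=0)
        places[key]["mentions"] += 1
    return list(places.values())


def to_geojson(records):
    """Wrap records as GeoJSON Point features.

    GeoJSON orders coordinates as [longitude, latitude].
    """
    features = [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [r["lon"], r["lat"]]},
            "properties": {k: v for k, v in r.items() if k not in ("lat", "lon")},
        }
        for r in records
    ]
    return {"type": "FeatureCollection", "features": features}


annotations = read_annotations(ANNOTATIONS_CSV)
places = to_places(annotations)
print(len(annotations), len(places))  # 4 2
print(json.dumps(to_geojson(places), indent=2))
```

Here `annotations` corresponds to an export by annotation (four entries) and `places` to an export by place (two entries, with Newcastle's three mentions counted once).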
This step-by-step guide describes a basic, common task from beginning to end: identifying places in a text using Recogito.
Recogito is open source software produced by Pelagios. The TLCMap development team is working on a separate version to add new features. These will all be incorporated back into the main version when the project concludes. Your work will be retained.
Create Plain Text File
The input to Recogito is 'plain text'. This means only the text is saved, with no formatting, tables, images or other information. A plain text file has a '.txt' file extension. There are a few easy ways to get a plain text document:
- Copy and paste text from anywhere (Word, a webpage, a PDF) into Notepad or another plain text editor (such as Komodo or Notepad++) and save it as a .txt file.
- In an MS Word document, choose 'Save As' and for the file type choose 'plain text (*.txt)'. Word will then give you a few options. The best encoding to use is Unicode (UTF-8), because it is compatible across platforms and handles almost every language and special character. If your text is in English it probably doesn't matter, and you can just accept the defaults.
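If you already have a .txt file that was not saved as UTF-8 (for example, one produced by older Windows software), a minimal Python sketch like the one below can re-save it as UTF-8 before you upload it. It assumes the original encoding is cp1252, the usual Windows default for older text files; that is a guess you may need to change. If the guess is wrong, accented characters will come out garbled, so check the result.

```python
from pathlib import Path


def convert_to_utf8(src, dst, source_encoding="cp1252"):
    """Re-save the plain text file `src` as UTF-8 in `dst`.

    ASSUMPTION: the original file uses cp1252, the usual Windows
    'ANSI' encoding; change source_encoding if yours differs.
    Returns the text that was read, so you can inspect it.
    """
    text = Path(src).read_text(encoding=source_encoding)
    Path(dst).write_text(text, encoding="utf-8")
    return text


# Example (hypothetical file names):
# convert_to_utf8("chapter01_ansi.txt", "chapter01.txt")
```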
Add File To Recogito
- Log in to Recogito.
- Click the large blue 'New' button at top right.
- Choose 'File Upload' and find your .txt file.
- Once added, it will appear in your list of documents.
Auto Detect Place Names
To automatically detect places, which you can then correct, run Named Entity Recognition (NER) as follows:
- Click the file name so it is highlighted in blue. (Don't double-click to open the file yet. If you do, you can always come back to the document list.)
- Click the blue 'options' button at top right and choose 'Named Entity Recognition'.
- You can choose a recognition engine for a few languages. You can also choose to use all the gazetteers (big lists of place names), or just one or more of them, when finding places in the text.
- Click 'Start NER'. For a small file (a few thousand words) it should be almost instant. For a book-length file it can take a few minutes.
- When it is done, close the small NER box.
Correct Placenames in Text
You can now correct the places in the text. Automatic placename detection always makes quite a few errors, but it still saves a lot of time compared with doing it all manually.
- Double-click the file to open it.
- Detected names of places and people are highlighted in grey. Select one and choose 'Place', then search for and confirm the correct place. You will then be given the option to apply this to all instances of that place in the text, or just this one.
- Correcting places in a large text can be time consuming. If you click the spanner and screwdriver icon for Document Settings, you can share the document with other people. Many hands make light work.
As you go, and when you are finished, you can view the document and the map together in two ways:
- Click the 'Map View' icon at the top to see a map with clickable dots showing the places identified.
- Click the icon next to it, with two rectangles, to see the text and the map side by side. Clicking a place in the text goes to that place on the map, and vice versa.
Once you have identified places in the text to your satisfaction, you can, if you wish, make the map public in Document Settings:
- Go to Document Settings.
- Go to Metadata at the left and choose a licence.
- You can then copy the link from the address bar and share it with others, or link to it from web pages and social media.
Backup And Archive Work
- Go to 'Document Settings' and use the 'backup' option.
- Importantly, you can click the 'Download' icon at the top and choose from a few options. For example, you can download KML to open in other TLCMap systems and in other mapping systems, such as Google Earth. If you identified many new placenames, we'd appreciate it if you contributed them to the Gazetteer. You can also add the data to Temporal Earth to compare it with other data.
- Deposit data, such as the KML file, in an official repository, so it can be found and used by others.
New TLCMap Features - Corpora, Additional Places, Export Options
To Create A Corpus
Files in a folder can be processed as a corpus. This is useful where you have a large number of short texts, such as 100 newspaper articles, that you want to process all at once and treat as a whole.
- Log in
- Click the blue '+New' button to create a folder. (Eg: I have each chapter of Watkin Tench as a separate text file, so that I can attach a date to each chapter and see patterns of change in the places mentioned, chapter by chapter, over time. I want to treat them all as a corpus, so I call my new folder 'WatkinTench'.)
- Make sure you are in the folder by checking the breadcrumbs at the top of the page (eg: 'My Documents > WatkinTench'), then drag and drop your text files onto the page.
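If your texts start out as one long file, such as a whole book, you could split it into one file per chapter before dragging the files into the folder. A minimal Python sketch follows. It assumes each chapter begins on its own line starting with the word 'CHAPTER'; the output names (chapter01.txt, chapter02.txt, ...) are only illustrative, and any front matter before the first heading becomes the first file. Check the result against the original, since real books rarely mark chapters so consistently.

```python
import re
from pathlib import Path

# ASSUMPTION: each chapter starts on its own line beginning with "CHAPTER".
CHAPTER_HEADING = re.compile(r"^CHAPTER\b", re.MULTILINE)


def split_chapters(text):
    """Split a book into parts, each starting at a chapter heading.

    Any text before the first heading (title page, preface) becomes its own part.
    """
    starts = [m.start() for m in CHAPTER_HEADING.finditer(text)]
    if not starts:
        return [text.strip()]
    bounds = ([0] if starts[0] > 0 else []) + starts + [len(text)]
    parts = [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    return [p for p in parts if p]  # drop empty parts, e.g. blank front matter


def write_chapters(text, out_dir, prefix="chapter"):
    """Save each part as a numbered UTF-8 .txt file ready for upload."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, part in enumerate(split_chapters(text), start=1):
        path = out / f"{prefix}{i:02d}.txt"
        path.write_text(part + "\n", encoding="utf-8")
        paths.append(path)
    return paths


# Example (hypothetical file and folder names):
# book = Path("tench.txt").read_text(encoding="utf-8")
# write_chapters(book, "WatkinTench")
```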
Add Metadata To Files
Document metadata will be exported in your results. This can be useful for further analysis or display in other systems.
- To edit the metadata for a file, double click the file if it is not already open, so that you can see the text.
- Click the spanner and screwdriver icon at the top to see the 'Document Settings', edit and save.
- If you have a large corpus of texts you can upload a CSV spreadsheet of metadata. This can be handy if you have many texts, such as 100 newspaper articles with dates and publication information, etc. You might have these in Excel already, or find it faster to do the data entry in Excel (in Excel, simply 'save as' and choose CSV as the file type).
- Go to the folder that holds your corpus and click 'Download CSV metadata'. You now have a spreadsheet in the right format, which you can open in Excel and fill out.
- Fill in the data, perhaps copying and pasting from another source.
- If you are not sure of valid values for some columns, such as 'Licence', set the value manually in Document Settings for one file, download the CSV metadata, and copy that value.
- StartDate and EndDate can be in either of two formats: yyyy-mm-dd (eg: 1867-03-27) or dd/mm/yyyy (eg: 27/03/1867). Both are allowed because Excel sometimes auto-'corrects' the format. TIP: always double-check that Excel is not converting your dates to US format, which might turn a date such as 4 January into 1 April.
- Click the button to upload the CSV.
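Before uploading, it can help to check the date columns. The Python sketch below accepts the two date formats listed above and reports rows that match neither, or where EndDate falls before StartDate (that second check is our own addition, not a Recogito rule). It assumes the columns are headed exactly StartDate and EndDate, and that the file is saved as UTF-8; the 'utf-8-sig' encoding also copes with the byte-order mark Excel adds when saving as 'CSV UTF-8'. It cannot catch a day and month that Excel has silently swapped if the result is still a valid date, so a manual check is still worthwhile.

```python
import csv
from datetime import datetime

# The two accepted formats: 1867-03-27 and 27/03/1867.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(value):
    """Parse a date in either accepted format, or raise ValueError."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {value!r}")


def check_metadata(path):
    """Return (row, column, problem) tuples for the StartDate/EndDate columns.

    ASSUMPTION: the columns are headed exactly 'StartDate' and 'EndDate'.
    Row numbers match the spreadsheet, assuming one header row and no
    line breaks inside cells.
    """
    problems = []
    # 'utf-8-sig' also copes with the byte-order mark Excel adds to CSV UTF-8 files.
    with open(path, newline="", encoding="utf-8-sig") as f:
        for row_no, row in enumerate(csv.DictReader(f), start=2):
            dates = {}
            for col in ("StartDate", "EndDate"):
                value = (row.get(col) or "").strip()
                if not value:
                    continue  # dates are optional
                try:
                    dates[col] = parse_date(value)
                except ValueError as err:
                    problems.append((row_no, col, str(err)))
            # Our own extra sanity check, not a Recogito rule.
            if len(dates) == 2 and dates["EndDate"] < dates["StartDate"]:
                problems.append((row_no, "EndDate", "EndDate is before StartDate"))
    return problems


# Example (hypothetical file name):
# for problem in check_metadata("WatkinTench_metadata.csv"):
#     print(problem)
```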
Auto Detect Place Names In A Corpus
Note: NER will be performed on every file you have highlighted. Keep in mind that this may overwrite any manual changes you have made. In the first instance you probably want to process the whole corpus, but if you add another file later, you probably want to select only that file.
- To select all, select the first file, hold shift and select the last file.
- Under the blue options menu, choose 'Named Entity Recognition'.
- You can choose from a few languages the text is in, and the gazetteers that will be used to resolve placenames. You can simply use them all, but you may want to be specific. Eg: if you are not working on ancient Europe, you might improve the accuracy of placename matching by excluding gazetteers of ancient places (ie: 'Roma' has a better chance of matching the town in Queensland if you choose the Australian gazetteer and exclude gazetteers of ancient Europe). Note that you can include the 'User Contributions' gazetteer. These are places added to Recogito by users because they were not already in a gazetteer. They are not from an official or vetted gazetteer, so whether to use them is your choice.
- Click 'Start NER'. For a small text file of a few paragraphs this will be immediate. For large files, such as a whole book, it could take a few minutes.
- When complete, the titles of files that have had NER done are highlighted in orange.
Check and Correct
- Double click the file to open it.
- Places identified automatically by NER are highlighted in grey.
- You can click on them to confirm or change the identified place.
- Click 'Change' to choose from possible alternatives found in the gazetteers. You will be asked whether to apply this to every instance of that placename in the text, or just this one.
- If the place is not in any gazetteer, click 'Create place' at the top of the window. This will ask you for some details about the place. The minimum is the name and coordinates.
- Once confirmed, a place is highlighted in green.
- If a place has not been identified select the word or phrase and click 'Place'. The options to confirm, change, or create a place are as above.
- You can view the map and annotations in two ways.
- The second icon at the top of the page shows the map. Click a place to see its annotations with some surrounding text, and click an annotation to go to that point in the text.
- The third icon at the top is the TLCMap addition allowing you to see the map and text side by side, for the whole corpus.
- Click a place in the text to go to that place on the map.
- Click the annotation in the map pop-up to go to that location in the text.
Export Data
Data can be exported in standard formats for further analysis, visualisation or archiving in other systems.
- Open a file. (If you want to export a corpus, just open any file within that folder.)
- Click the 'Download Options' icon at the top of the page.
TLCMap has added options so data can be exported with some of the following variations:
- in the format KML, CSV or GeoJSON
- at the level of file or corpus
- listing by 'place' or by 'annotation' (in some cases with each annotation listed under its place).
If you need to convert the file from one format to another, there are GIS file converters on the web.
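For the common case of turning a KML export into a CSV for Excel, a short script can stand in for a separate converter. This Python sketch uses only the standard library. It assumes the file uses the standard KML 2.2 namespace and stores each place as a Placemark with a Point, and it reads only the name, description and coordinates, ignoring any other fields the export may contain. KML writes a point's coordinates longitude first.

```python
import csv
import xml.etree.ElementTree as ET

# ASSUMPTION: the export uses the standard KML 2.2 namespace.
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}


def kml_points_to_rows(kml_path):
    """Yield (name, latitude, longitude, description) for each Point placemark."""
    root = ET.parse(kml_path).getroot()
    for pm in root.iterfind(".//kml:Placemark", KML_NS):
        coords = pm.findtext("kml:Point/kml:coordinates", default="", namespaces=KML_NS).strip()
        if not coords:
            continue  # skip placemarks without a Point geometry
        # KML writes a point as "longitude,latitude[,altitude]".
        lon, lat = coords.split(",")[:2]
        yield (
            pm.findtext("kml:name", default="", namespaces=KML_NS).strip(),
            float(lat),
            float(lon),
            pm.findtext("kml:description", default="", namespaces=KML_NS).strip(),
        )


def kml_to_csv(kml_path, csv_path):
    """Write the placemarks to a CSV file with a header row, for Excel."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "latitude", "longitude", "description"])
        writer.writerows(kml_points_to_rows(kml_path))


# Example (hypothetical file names):
# kml_to_csv("WatkinTench.kml", "WatkinTench_places.csv")
```

Files produced by other tools may use a different structure, in which case a dedicated GIS converter is the safer choice.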
Import the file into another system, such as:
- Another GIS for further work.
- Excel for further analysis or working with other data.
- Ordinal Time, to visualise how places change by order of occurrence in the text.
- Quick Coordinates to transform the data into a journey.
- Temporal Earth to view the data over time, or journeys (journey format exported from Quick Coordinates).
- STMetrics to compare basic statistical information, and identify clusters in space and time.
- GHAP, to add any placenames you identified, or at least these attestations of them, so that others can find them.
- Heurist to build a database around it.
- Archive it with your research with ROCrate and Describo.