Searching physical archives
Visually searching documents (and especially microfilms) is tiring on the eyes, so give your eyes a rest occasionally. It is also easy to miss things, and there is a trade-off between scanning rapidly to cover more ground and looking more carefully at every heading and/or through every paragraph. It helps if you know what you are looking for. For example, if you are looking for the record of an event in a journal and you know when it happened, then if you miss it on the first scan you can go back and try again. You would not do that if you were just looking for ‘any mention of someone or something’ without knowing when, or whether, anything was reported. Even if you don’t know for sure that a report or article appeared in a publication, you may be able to reduce the search task by getting to know whereabouts in each edition such things normally appear, and focusing on that part.
Searching scanned documents
Increasingly, historic documents are being scanned to produce an image that can be viewed digitally, and in many cases (but not all) the image has also been analysed using OCR (Optical Character Recognition) to generate a digital text file that can be searched electronically for words or phrases. You don’t normally see the converted text because it is hidden behind the scanned image of the page. But when you appear to be selecting and copying what you can see, you are actually selecting and copying the corresponding text that is hidden behind the image.
When you search the document digitally you get a match if (and only if) the text you are seeking is in the hidden converted text. If the OCR is perfect, then the hidden text matches the words that you see in the image. For the vast majority of scanned words it does, but OCR isn’t perfect and occasional characters can be misinterpreted. If a misinterpreted character is in a word you are looking for then the search won’t find it because it won’t match.
Some real examples are:
ful! (full), o f (of), cau.se (cause), N e w o a s tle (Newcastle), WoonPToCK RO’D (Woodstock Road),
Sp.ce (Spice), MINEHIAD, 80MERSET (MINEHEAD SOMERSET), practical!y (practically).
Words or phrases that span a line break appear as separate parts that might not be recognised together, especially in a multi-column document if the OCR text isn’t structured in corresponding blocks.
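If you can copy the hidden text out of a document, you may be able to repair some of these splits before searching it. The following Python sketch is one possible approach, not a feature of any particular viewer. The helper name normalise is hypothetical. It assumes that a split word is marked with a hyphen at the end of the line, so it will also wrongly join words that are genuinely hyphenated, and it does nothing to untangle text from a multi-column page.

```python
import re

def normalise(text):
    """Tidy copied OCR text so words and phrases split across lines can be found."""
    # Join a word hyphenated across a line break: "New-\ncastle" -> "Newcastle".
    # Assumption: a hyphen at the end of a line always marks a split word.
    joined = re.sub(r"-\s*\n\s*", "", text)
    # Turn the remaining line breaks (and runs of spaces) into single spaces,
    # so that a phrase such as "Woodstock\nRoad" becomes "Woodstock Road".
    return re.sub(r"\s+", " ", joined).strip()

# Made-up sample text, for illustration only.
sample = "rang at New-\ncastle and in Woodstock\nRoad"
tidy = normalise(sample)
```

After this, a search of tidy for ‘Newcastle’ or ‘Woodstock Road’ would succeed, whereas neither would be found in the raw sample.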
The OCR result can be proofread and corrected, but that is expensive so it is not usually done. OCR software has improved over the years, so more recently scanned documents should have far fewer (but not zero) errors.
OCR errors undermine the reliability of digital searches. Finding something means it is there, but not finding it doesn’t mean it’s not there. It could be there but with one or more corrupted characters.
To improve the chances of a hit you can try using a partial search term and then visually scanning the results to eliminate any that aren’t relevant. For example, if you are looking for ‘Williamson’ you could try searching for ‘Willia’ or ‘iamso’. The first will also find ‘William’, ‘Williams’ and ‘McWilliam’, but the second will probably only find ‘Williamson’.
A (real) example of using this technique was searching for Minehead in one pre-war year of The Ringing World. The search term ‘minehead’ gave 28 hits but the search term ‘mineh’ gave 47 hits (including several ‘MINEHIAD’, ‘MINEHIAO’, ‘MINEHtAO’ and ‘MINEHiA’).
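The effect of a shorter search term can be illustrated with a few lines of Python. This is only a sketch. The word lists are made up from the variants quoted above (they are not the real Ringing World text), and find_hits is a hypothetical helper that does a simple case-insensitive substring match, which is an assumption about how a document viewer's search behaves.

```python
def find_hits(words, term):
    """Return the words that contain the search term, ignoring case."""
    needle = term.lower()
    return [w for w in words if needle in w.lower()]

# Illustrative OCR output: two clean spellings, four corrupted ones, one unrelated word.
ocr_words = ["MINEHEAD", "MINEHIAD", "MINEHIAO", "MINEHtAO", "MINEHiA", "Minehead", "Taunton"]

full_hits = find_hits(ocr_words, "minehead")   # only the correctly recognised words
partial_hits = find_hits(ocr_words, "mineh")   # also catches the corrupted variants

# The Williamson example: a prefix is broad, a distinctive middle section is narrow.
names = ["William", "Williams", "McWilliam", "Williamson"]
broad = find_hits(names, "willia")
narrow = find_hits(names, "iamso")
```

The partial term finds everything the full term finds plus the corrupted spellings, at the cost of possibly finding other words that you then have to eliminate by eye.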
Modern documents that have been electronically produced don’t rely on OCR because the text is normally embedded in the original document. That makes searching more reliable, so (give or take any spelling mistakes) you should be able to find every instance of what you are looking for.
Searching the Web
There is a huge amount of information on the web – the problem is finding what you want. There are three basic ways to find things:
- If you know the website you need – go straight to it. For example if you are looking for ringers killed in war, go to: rolls.cccbr.org.uk/.
- If you know the kind of place to look but don’t know the website – use a directory page. For example, there are links to ringing society websites at: cccbr.org.uk/about/societies/ (and, for CC affiliated societies, details of representatives who may be able to help you). There are links to many other ringing related websites at: ringing.info/. There are links to family history societies at: ffhs.org.uk/.
- If you don’t know where to look – use a search engine, which gives access to a very big index developed by software that crawls round the web following links between pages and websites.
There are many search engines. Google search is most widely used (often pre-installed on computers) but there are many others. None of them has a complete index of all that exists, and they use different rules to sort the results, so the same search may produce different lists of possible hits. They also differ slightly in the tools for you to filter results, and in how much information they record about you and what you are doing.
There is a list of search engines at: en.wikipedia.org/wiki/List_of_
There is a comparison of many of them at: en.wikipedia.org/wiki/
Whichever search engine you use, how you use it will have a big influence on how quickly you find what you want, or whether you find it at all. Advice on basic web search techniques is available from several places on the Web, for example: techrepublic.com/blog/10-
One thing to bear in mind is that while the web was originally developed for information sharing between researchers it has since been swamped by commercial interests and mass market usage. Search engines have adapted to this, so what appears high in your search results will be heavily biased towards what other people are interested in and things people are trying to sell. To compensate for that bias you may need a bit of cunning – unless you fancy looking through hundreds or thousands of hits rather than a few dozen.
Success when looking for a person’s name depends on how common the name is – some names give millions of hits. Unusual ones give fewer, but even then you may still need a bit of effort to find the right person.
Example 1 – Suppose you knew nothing about Hezekiah Briggs other than his name.
Searching for [ hezekiah briggs ] finds around 160,000 results, which you can cut down by adding quotes to get the exact phrase [ “hezekiah briggs” ] but that still gives over 200 results – too many to examine. Limiting the search to the UK brings it down to around 20, which is few enough to scan by eye and go to any that look relevant. In this example several references do in fact relate to the early 19th century Bingley ringer.
If you already knew the Bingley connection then you could have included that in the search. [ “hezekiah briggs” bingley ] gives around 130 hits, and limiting it to UK gives about 10, all of which relate to him. If you didn’t already know the Bingley connection then finding it gives you another useful search term.
Example 2 – Suppose you were looking for John Smith. That would be a lot harder since [ “john smith” ] gives over 20 million hits (about 4 million in UK). You could reduce that by including some ringing related words in the search. For example a UK only search for [ “john smith” bell ringer ] gives around 250,000 and [ “john smith” bell tower ] gives around 50,000.
You might spot references to John Smith clockmakers of Derby, and assuming you don’t want them you could eliminate them by using the ‘-’ prefix. [ “john smith” bell tower -clock ] gives around 30,000 hits and [ “john smith” bell ringer tower -clock ] gives around 8,000 hits – still too many to look at more than a few. It may be worth scanning the first few pages of hits, but unless you are lucky you will need more information.
Going a bit further, [ “john smith” “bell ringer” tower -clock ] brings it down to 300, which is just about manageable, but it would also eliminate a lot of ringing pages that don’t include the exact phrase “bell ringer”.
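The searches in these examples are all built from the same three ingredients: exact phrases in quotes, ordinary words, and exclusions with a ‘-’ prefix. The Python sketch below shows how they combine into the text you would type into the search box. The function build_query and its parameter names are hypothetical, invented for illustration. It uses plain straight double quotes, and it does not contact any search engine or predict how many hits you will get.

```python
def build_query(phrases=(), words=(), exclude=()):
    """Assemble a search string from exact phrases, plain words and exclusions."""
    parts = [f'"{p}"' for p in phrases]      # exact phrases go in double quotes
    parts += list(words)                     # ordinary words are added as they are
    parts += [f"-{w}" for w in exclude]      # excluded words get a '-' prefix
    return " ".join(parts)

# The John Smith searches from the examples above.
q1 = build_query(phrases=["john smith"], words=["bell", "tower"])
q2 = build_query(phrases=["john smith"], words=["bell", "tower"], exclude=["clock"])
q3 = build_query(phrases=["john smith", "bell ringer"], words=["tower"], exclude=["clock"])
```

Each step adds one more constraint, which is why the number of hits falls each time, and also why some wanted pages drop out along the way.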
Using exclusions in a search can be particularly useful if someone famous, like a footballer, singer or politician, has the same name as the person you are interested in. You could try excluding words like ‘football’ or ‘singer’, but you will probably need to exclude several words to get a useful reduction. It’s best to find these by trial and error, keeping the ones that cut the number of hits by a significant amount. Look for a word that appears in several of the unwanted hits. For a singer it might be the name of the group that she sings with. Search again with that word excluded and if she still appears lots of times see if there is another frequent word, maybe the name of a song, that you could also exclude. Repeat the process as long as you can get a useful improvement at each step. When you have suppressed one person you may find another dominates, say a businesswoman, so repeat the process excluding words that appear a lot with her, like the name of her company or its products.
There is no guarantee that using exclusions will get rid of enough unwanted references, but it should remove a lot of them and increase the chance of finding something useful. But remember that exclusions may remove wanted pages as well. In the example above, excluding ‘clock’ removes three quarters of the hits, but some of those will not refer to the clock-maker; they might be ringing pages that happen to mention a clock.
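The trial and error step of looking for a word that appears in several of the unwanted hits can be sketched in Python. This assumes you have copied the titles or summaries of some unwanted hits into a list by hand. The snippets below are invented for illustration, and candidate_exclusions and its parameters are hypothetical names. It only suggests words to try; you still need to judge whether excluding each one would throw away pages you want.

```python
import re
from collections import Counter

def candidate_exclusions(unwanted_snippets, keep_words, top=3):
    """Suggest words that occur in more than one unwanted hit."""
    counts = Counter()
    for snippet in unwanted_snippets:
        # Count each word once per snippet so one long snippet doesn't dominate.
        words = set(re.findall(r"[a-z]+", snippet.lower()))
        counts.update(words - set(keep_words))
    # Most frequent first; ties are broken alphabetically so the result is stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, n in ranked[:top] if n > 1]

# Invented snippets standing in for unwanted hits about the Derby clockmakers.
unwanted = [
    "John Smith clockmakers of Derby",
    "Derby clock by John Smith",
    "John Smith turret clock restored",
]
# Words from your own search, and common words, that must not be excluded.
keep = {"john", "smith", "of", "by"}

suggestions = candidate_exclusions(unwanted, keep)
```

You would then add one suggestion at a time to the search with a ‘-’ prefix, keep it if the number of hits drops usefully, and repeat with a fresh batch of unwanted hits.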
You can use place names as filters. That has pros and cons. It increases the chance of finding references to what the subject did while there but makes it less likely to find anything about what he or she did elsewhere. Some ringers spend parts of their lives in different places, and unless you are sure you know them all it can be useful to do some searches without place names, especially after you have found out something else, like a middle name, a spouse’s name or some other activity. If that produces any references to a place that you didn’t already know about then you can do a more targeted search using that place name.
Bear in mind too that compound place names are often abbreviated, for example Kirkby in Ashfield is often just called Kirkby (but so are several other Kirkbys that are nowhere near it). Upper and Lower Clapton (which are near each other) are commonly just called Clapton. In this case, a search for [ clapton ] will be swamped by references to Eric Clapton. A search for [ “Lower Clapton” ] would avoid those but would also miss many genuine references to Clapton. So using an exclusion [ clapton -eric ] is better in this case.
Bear in mind also that using ringing terms to narrow your searches may help to find ringing related information but will also reduce the chances of finding out about the subject’s professional life or other leisure activities, since pages about them are unlikely to contain ringing terms.
Overall, the thing to remember about Internet searching is that it is an art. The tools are simple but getting the best results out of them requires quite a bit of skill – something you learn by experience. Each case is different. You might be lucky but more usually you will need patience and the willingness to attack the problem from several angles.