Library of Lost Government Content: the latest ALISS Showcase of Key Resources for Social Scientists, 24th June 2021


  • Jason Webber, Web Archiving Engagement and Liaison Manager, British Library.
    This presentation aimed to give an overview of what web archives are with a focus on the UK Web Archive.
    Interesting facts: extensive collecting dates from 2013 onwards. Earlier collecting, from 2005 to 2013, required the owner's permission. Everything in the UK domain is collected once per year as part of an ‘Annual Domain Crawl’, which can take months to complete.
    Selected ‘targets’ (including news) are gathered daily, weekly, monthly, quarterly or six-monthly. The archive includes UK domain websites, but sites hosted on WordPress by UK authors are excluded unless the authors notify the archive. There is a cap on what is gathered per website, which means some content may be missed.
    Useful data sites
    This tool was developed as part of the Big UK Data Arts and Humanities project funded by the AHRC.
    The data was acquired by JISC from the Internet Archive (IA) and includes all .uk websites in the IA web collection crawled between about 1996 and April 2013: over 3.5 billion items (URLs, images and other documents). It has been full-text indexed by the UK Web Archive, so every word of every website in the collection can be searched for and analysed.
    In 2014 the project awarded bursaries to 10 researchers to carry out research in their subject area using the UK Web Archive (particularly the dataset derived from the UK web domain crawl 1996-2013). The case studies that they produced showcase the richness of web archives as a source for humanities and other researchers, and are available as open-access publications.
    Open data – JISC UK Web Domain Dataset (1996-2013) contains all of the resources from the Internet Archive that were hosted on domains ending in ‘.uk’, or that are required in order to render those UK pages.
    Use ‘trends’ to analyse how many pages in the collection a word or phrase appears on over a given period (within 1996-2013). Comparisons can be drawn by entering several words or phrases separated by commas, e.g. cat, dog, goldfish.

  • Jennie Grimshaw, Government and Official Publications Service and Content Lead, British Library
    Title: The jewels in the crown: curated themed collections on the UK Web Archive
    This presentation discussed the aims and scope of topical and themed collections of archived web sites. It looked in detail at two collections which Jennie developed: the 2015 general election and the subsequent EU referendum. She explored the aims and scope, how they identified and evaluated sites for inclusion, and how they work to assure the quality of the gathers.
    Useful facts: Jennie emphasised the value of careful selection. The team seeks to improve quality by inspecting the returns and identifying areas of concern. She explained how they select resources they know will gather well, and described the problems with collecting items hosted in the cloud or on Facebook or YouTube. She also introduced the pandemic collection, which includes lockdown sceptics' resources and will have great value for future researchers.

  • Norma Menabney, Subject Librarian, Queen’s University Belfast
    Title: The Northern Ireland Official Publications Archive: extending the reach of official materials
    The presentation explained how the Library at QUB has established a fully searchable database by harvesting the websites of over 150 official bodies and creating records which are made available to the British Library and other legal deposit libraries. In so doing, the work allows all parties to meet their legal obligations while Queen’s continues to expand its archival holdings and extend access to the public and global research communities.
    Norma emphasised how the archive could support teaching and learning by: explaining the governing structure of Northern Ireland; helping answer ‘which departments are responsible for what subject area?’; supporting an understanding of the political and historical landscape; clarifying Northern Ireland Assembly publishing categories; making output easy to identify (independent inquiries are highlighted in a separate category); and including all versions of publications.