LM  0:00  
Lise Jaillant, thank you so much for joining me today on the podcast. I want to start with your project, After the Digital Revolution. Its aim is to bring together all the people who use digital-born records: professors and library professionals. I'm at a big research university, and this seems fairly unusual to me. There's often a pretty bright line separating faculty from archivists, curators, and catalogers. So I know that your project has lots of aims related to the archival materials, the digital-born materials, and we'll turn to that focus soon, but just staying with the participants for a minute, I wonder, how did this collaboration come about? And is it as unusual in universities in the UK and Canada as it is in the US?

Lise Jaillant  0:52  
Yeah, well, thank you so much for the invitation, first. And I think that's an interesting question about collaboration. So let me tell you a little bit about me. I did my PhD in North America, in Vancouver. And then I went back to the UK, and for a time I worked at the John Rylands Research Institute in Manchester. The John Rylands Research Institute is actually part of the John Rylands Library, so it's a big research library. And at that time, I started becoming increasingly interested in digital archives. So I worked a lot with archivists, and I discovered the Carcanet Press email preservation project. Carcanet Press is a poetry press based in Manchester, quite a large poetry publisher. And this preservation project is really about preserving email archives. It ran from 2012 until 2014, and it resulted in the rescue and preservation of a lot of emails, a total of 215,000 emails and 65,000 attachments. So I was really fascinated by this project. And at that time, I started working very closely with archivists. I'm a literary scholar by training, but of course, you know, I've been doing quite a lot of archival work in paper archives for many, many years, including at Columbia University, by the way. And yes, I started working very closely with archivists, paying attention to these new materials, like email archives, you know, new born-digital archives. And that was really the start of my After the Digital Revolution project, which started in 2017. So I've been working with archivists for many years now. And I'm also working with computer scientists at the moment on two new projects.
So, getting back to your question about collaboration, I think it's extremely important for humanities scholars like myself to build these collaborations, and really to be able to work with a lot of different people, you know, from different backgrounds, in order to be able to sort out these big issues. So I mentioned the issue of email archives, and I'm sure we are going to talk about this during this interview.

LM  3:21  
So when you mentioned the Carcanet project, you said that part of the mission is about rescuing the emails. And I think that vocabulary seems central to digital projects right now. Underlying that vocabulary is the fear that digital documents are in danger of being lost, and that literary scholars who rely on the traces left by writers, including drafts and correspondence, are bound to be disappointed in the future. Humans in general aren't very good at thinking about future generations, but one could argue that archivists are in the business of doing just that. So can you describe how digital documents might disappear?

Lise Jaillant  4:07  
Yeah, absolutely. So if we go back a little bit in time, we see that very few people cared about the preservation of digital documents before the 1990s. I mean, paper remained the most widespread medium of communication, and archivists were mostly concerned about preserving these paper records. And of course, once you've preserved the records, you also need to catalog them and to make them more accessible. So the most pressing concern was not digital documents, but the huge amount of paper records that were uncatalogued, difficult to find, difficult to access. That really changed from the 1990s, because the archival community started seeing digital documents as endangered records, you know, as records that could disappear. In August 1993, there was actually an important legal decision in the United States: federal agencies, you know, like the White House and Congress, now had to keep all official email documents. Everything that was issued by these federal agencies had to be kept on computer systems. So it was not enough to print those messages out to paper. The hard copies often lacked important information about the context: for example, the sender, the person to whom the email was sent, and also the time of transmission. So electronic versions had to be retained to satisfy these record-keeping requirements. So that was really a turning point in 1993, and for the archival community, this was a big, big challenge. At the turn of the 21st century, around the 2000s, it became clear that it was not enough to preserve those digital records; it was also important to make them accessible to researchers. So the question of preservation started moving to the question of access. It's not enough to preserve these documents. It's also important for researchers to access those documents.
And of course, it's a huge task, which is why, you know, we talked about collaborations. It's really important to create these collaborations across various institutions, across people with various backgrounds, not only at the national level, but also at the international level. And I'm sure you're familiar with the Digital Preservation Coalition, the DPC, which was actually created in 2002. Initially, it was a partnership between several agencies in the UK and Ireland. And since its creation in 2002, it has become an international organization, you know, with members in the States, in Australia, in continental Europe and elsewhere. It's really an organization for digital archivists and for the people who are interested in digital preservation. So nearly 20 years after the creation of the Digital Preservation Coalition, the issue is not so much about preservation; it's very much about access. And one important thing as well is that many documents do not end up in the archive. At the time when everyone was sending letters, it was common practice to preserve the letters that you received. And I think we are doing the same with emails, when we keep everything in our email boxes. The thing is, we often rely on private companies, and those private companies may disappear at any time. So, a very simple example: AltaVista, some years ago, had a free email service, but it shut down in March 2002. And at the time when AltaVista shut it down, there were 200,000 active email accounts on the platform. And of course, people lost a lot of data here. The same happened with MySpace, really. And there are so many examples. So last year, Richard Ovenden, director of the Bodleian Library, published an important book called Burning the Books: A History of Knowledge Under Attack. And he wrote, and I have the quote in front of me here:
"As more and more of the world's memory is placed online, we are effectively outsourcing that memory to the major technology companies that now control the internet." So he's basically saying that the problem is that these companies do not really care about preservation. This is not, you know, the core of their business. Digital documents that are preserved on these digital platforms are at risk of disappearing. And I think this is a major, major issue. So just to summarize, what I'm trying to say here is that you can leave old letters unattended for many, many years, and there is a good chance that those old letters will still exist for the next generations. The same is not true of digital correspondence. And of course, files that are saved on your computer can disappear if you lose your computer, or if, you know, you have a major crash on your system. And of course, external storage can also become obsolete over time. So I think we have a few issues here, but preservation and access, these are, you know, the two main issues.

LM  9:57  
So what you're describing, Lise, are two layers of loss: one at the level of the file, losing it or it being corrupted, and then also one at the level of the institution or the company. The technology around writing paper letters didn't change, but we already have obsolete digital ways of communicating that are no longer supported. However, it's not actually loss that's usually focused on in acquisition decisions. It's the surplus of files and how to deal with so many proliferating copies. There's a clear understanding that email will be interesting for literary scholars and for researchers. And scholarship that investigates the sociology of literature, or the historical production of literary institutions, would seem to find documents fascinating that archives are often tempted not to preserve but are beginning to preserve: shipping notices to different bookstores, or the methods for distribution, or advertising agreements. When you're talking to archivists and facilitating conversations between archivists and literary scholars, what are the kinds of digital-born documents that you advocate institutions should preserve?

Lise Jaillant  11:18  
Well, thank you so much for this question, because I think we have this conception that literary scholars are only interested in the text, you know, or in drafts of literary texts. And I think it's more complicated than that. I mean, obviously, people like myself are interested in the literary text. But we're also interested in the paratext; we are interested in everything around the text. So it can be, for example, the biography of the author, their personal relationships, their professional relationships, you know, everything that could be relevant to interpret the text. So scholars are interested not only in drafts, but also in publishing materials, in letters, and also, of course, in emails now. And as you know, in the mid 20th century, libraries in the US started buying manuscripts and correspondence from living writers, from many, many writers in the US and elsewhere. For many of these writers, it was an interesting opportunity, you know, not only to make money, but perhaps to reach a broader audience. And you had this emergence of a new marketplace, you know, for manuscripts, for drafts, etc. So many writers started preserving their letters and their drafts, in the hope that one day they would be able to sell these materials to libraries. And just to give you one example, I've been working a lot on the British writer Angus Wilson, who lived most of his life in Suffolk, in the UK, and he sold his manuscripts to the University of Iowa in the 1960s. And at the time, it was quite an unusual thing to do, you know, for a British writer. But I think today, there is a growing awareness among writers that digital documents are also valuable.
So not only paper documents, you know, everyone knows that these documents are valuable, that they can be sold to libraries, but also digital documents. And many writers preserve their digital drafts in Word format or in PDF format. But I don't think that many people know that their emails are also valuable. So these emails languish, you know, in an email box, which is, as we've just discussed, at risk of disappearing, you know, if the commercial company just shuts down. So the problem is that there is not a real market for emails, and most of the collections that libraries acquire today are actually hybrid collections. So you have paper records, and you also have digital materials as part of these collections. And there is no specific price placed on digital materials. So basically, for writers, this does not really provide an incentive to preserve their emails. I mean, why would they preserve their emails if, you know, libraries do not really put a price on them? I was talking about publishing materials that are also of interest to literary scholars, to people like myself. And as you know, not many publishers preserve their correspondence with writers. I know that Columbia has a wonderful collection of publishers' archives, but many publishers actually do not really care about their archives. I mean, the publishing industry, as a general rule, tends to move fast, and there is little time to reflect on the past and to think about the archive. So the danger today is really that emails, and also data preserved on online platforms, will simply disappear and be inaccessible to future researchers.

LM  15:09  
You brought up the question of price for digital files. And I've heard that some institutions are thinking that they won't have to pay anything for digital files, that one shouldn't have to pay for non-unique copies. I've also heard from major dealers in New York City that they simply don't want to deal with the new digital frontier. So I also wonder if some of the important intermediaries are not equipped to deal with digital materials, to inventory them and to try to sell and shop them around.

Lise Jaillant  15:49  
That might be the case. There was actually an interesting book by Amy Chen, published very recently, on this literary marketplace, you know, for paper records, that started appearing in the mid 20th century. And I think with the digital landscape, everything is changing very fast. It's also the case that the kind of collections that are acquired at the moment are mostly from older writers, obviously. So of course, these are mostly hybrid collections, paper plus digital, and it will probably change with the new generation, you know, people who, for all of their careers, have written on a laptop or desktop, and who have always, you know, exchanged emails instead of letters. So this will probably change, actually, with the new generation.

LM  16:39  
I think that's right. It'll force a reckoning. The major collections that Columbia has purchased have been hybrid, and there has been a focus on the paper part of the archive. I wanted to also draw out something that you mentioned, which is that part of the reason that artists sell their archives is to make money, but another is to reach more researchers, or to reach almost a second audience. And that's a step towards posterity. You also talked about international collaborations working together to create best practices. So I wondered, are there best practices around virtual reading rooms? It seems like when authors or artists sell their papers and hope that they're digitized, one of those goals is in fact to reach a wider audience. And yet, many digital archives are only available physically, on site at the library. So there seems to be a tension there with the major benefit of digitization, which is making things more widely available. What's the argument with that? Why keep digital items linked to a physical location when they don't technologically need to be?

Lise Jaillant  17:51  
Well, at the moment, as you know, it's often necessary to travel to archival collections to look at born-digital records. And of course, these born-digital records could easily be made accessible online, but most libraries cannot do that at the moment. Let's take a particular example: the archive of the British writer Ian McEwan, which is actually at the Harry Ransom Center in Austin, Texas. And it's an interesting story, actually. So when Ian McEwan sold his archive to the Harry Ransom Center a couple of years ago, he included 17 years of email, from 1997 to 2014. And when you look at the finding aid for the Ian McEwan collection, well, it tells us that the email correspondence has not been processed and is not available to researchers at this time. And I actually checked this morning, and you can still see this message. So I guess I was very lucky, because I was able to access the email archive when I went to Austin in 2017. And the reading room staff were very helpful. They prepared a selection of emails based on the keywords that I had given them. So they brought a laptop into the reading room, and I was able to read Ian McEwan's emails, which was, you know, a wonderful experience, really. But it was not always easy to understand the context. I mean, sometimes I was not sure if the emails I was reading on the screen were originally in the inbox, or perhaps in another folder, you know, in the sent folder. So a typical experience for me was to see Ian McEwan's response to a query, and then I clicked, you know, through dozens of other emails, and then I found the original question. So that was not really user-friendly.
But at the end of the day, I was able to find relevant data for my research. Just to give you an idea of the kind of data that I found in these emails: in the early 1970s, Ian McEwan was still a very young man, you know, looking for mentors, influential literary figures who could help him establish himself. And many people actually helped him, not only British people, but also US people. So for example, the editor of the New American Review was called Ted Solotaroff. He published several of Ian McEwan's short stories between 1972 and 1975. And three decades later, Solotaroff wrote an email to Ian McEwan to congratulate him on his recent novels. He said, okay, that's really wonderful, you know, your career has been a great success, etc., etc. And McEwan was very happy, of course, and he replied, and I have the email in front of me, actually. He said: "I coloured in the face of your praise, and the experience rather took me back to the early 1970s, when a letter (ah, those were the days) from you appeared to me, in my little flat in Norwich, to have been sent from an Olympian realm." So basically, he was saying, okay, you remember when we were both, you know, younger, in the 1970s, we used to exchange these letters. Now, of course, it's emails, right? So that was quite a reflection on the mediums that they used to correspond, to create these relationships, which are, of course, extremely important for writers, you know, to really get established. So I actually use this quotation in my monograph on the history of creative writing programs, and the monograph is going to be published very soon with Oxford University Press. So I actually used this data for my research. And it's really a pity that other researchers cannot access these important materials at the moment at the Harry Ransom Center, and I'm sure it will change in the future.
So one way of changing this and making these archives more accessible would be to imagine a system where people don't need to physically travel to archival collections to consult these born-digital records. We could imagine a system where researchers would provide an ID document, like a passport, or perhaps a letter of recommendation, and then they would be allowed to log in and access the born-digital records. So you mentioned virtual reading rooms: that's really a system that could be designed to make those archives more accessible. And I would really say that archives are not meant to be locked, you know. They are meant to be used not only by researchers, but also by a wide range of users, including people who cannot physically travel to the repositories. And of course, you know, with the COVID situation, physically traveling has become extremely difficult or even impossible, as so many institutions have been closed. So I think it's really time to think of trying to make those archives more accessible, perhaps, you know, using a system like a virtual reading room. But there are so many issues, not only with copyright, but also with data protection, and technical issues, so many issues to sort out, I guess.

LM  23:31  
The Ransom Center is an interesting example. It's so far from Ian McEwan's home. And I love that quotation that you read, the letter that reaches him from an Olympian realm and is sent to Norwich. And you had to travel from London to Austin to see the email. And travel is so difficult for people who can't do it for a number of reasons: cost, physical ability, care commitments at home, not to mention what you do when you actually get to the archive. I'm located in Manhattan, and the cost of staying while you use the archive is also prohibitive. So I want to turn to copyright. On the podcast, we've also spoken to Peter Hirtle, who is a digital copyright specialist and archivist. And we talked about the obstacles put up to users to access materials, sometimes even for items that are in the public domain. But copyright and data protection laws are different in the UK and the US. Can you describe for an American audience what the UK's General Data Protection Regulation is and how that approach to privacy affects archives?

Lise Jaillant  24:41  
Yes, absolutely. So the General Data Protection Regulation, we say GDPR, is a regulation that applies only to Europe, to European territory, and it's on data protection and privacy. So as I said, it applies only to the EU, and of course, as you know, Britain is no longer part of the EU. But the good news is that in 2018, Britain adopted its own version of the GDPR. It's called the Data Protection Act. So now that Britain has left the EU, the Data Protection Act creates a continuity between the UK and the European Union. So what is GDPR about? Well, Article 15 is really important. I mean, it gives people the right to access their personal data and information, and it also gives them some control over how this personal data is being processed. But it's really about access to personal data. So both the GDPR and the Data Protection Act in Britain put a lot of emphasis on the documentation that needs to be kept, and also on the right of individuals to access their data, to rectify their data, and also to erase their data. So as you can imagine, for archival institutions, this creates some issues, because, okay, on the one side, they need to respect the rights of individuals to be able to access, rectify and erase their data. But on the other side, they also have what we call the archiving in the public interest principle. So basically, it's recognized that archival collections need, of course, to do their job, which is to archive in the public interest. This means that, for example, if an individual wants to delete all their data in a specific archive, well, the library can refuse that, because the archive presents a public interest. But in practice, many archival institutions in the UK are quite risk-averse: they prefer to close down entire collections rather than give access to records that could be potentially problematic.
And I should say, because I've worked with both UK collections and US collections, that in the US, it's generally much easier to get access to archives. So what happens is that American archivists will often bring you a bunch of documents. And if you find anything sensitive, of course, you know, you're expected to ask for permission before publishing this information, or even to avoid publication, of course. But I should say that sometimes it's still not easy to get access to certain archives in the US. I mean, I mentioned the Ian McEwan archive: the email archive is closed to researchers, and it's in Austin, Texas. So even when an institution wants to share these digital files, it cannot put everything online. So I would say that few institutions have solved all the issues that are related to digital archives, including technical issues, and also how to design an interface which would make it easy for researchers to consult these documents, even researchers who are not very tech savvy, who are not very familiar with complex technical interfaces. I think it's important, again, to think about access, and access for a wide range of users. So not only researchers, but also, for example, people who are doing research on their family, or students, who don't have the kind of professional training that academic researchers have.

LM  28:34  
Absolutely. I want to turn back to the tech savvy qualifications in a minute. But earlier, you mentioned Carcanet, the publisher in Manchester. And as I understand it from reading your work, this is a case of both technical triumph and frustration, because the archive remains closed to users. Can you talk about what you know about the archive and how it's being preserved for future use? And why can't researchers use it now? Is that related to the GDPR?

Lise Jaillant  29:06  
Yeah, so I should say, first, your pronunciation of Carcanet is perfect. It's Carcanet. So it means a necklace, usually of gold or set with jewels. And perhaps to give a little bit of context to people who will listen to this podcast: the word Carcanet was chosen in 1962, when you had this group of students who set up a magazine between Oxford and Cambridge. And Michael Schmidt, who was a young student at Oxford, took over the magazine in 1967. So he was still an Oxford student. And then he decided to create his own company, in 1969, based on this literary magazine. And it became quite a major publishing company, and it has been publishing poetry for the past 50 years. So what's happening with the Carcanet archive? It was acquired by the John Rylands Library in the late 1970s, and they still receive some documents on an ongoing basis. I mean, every year, you know, they receive new materials. And of course, many of these new materials are digital now. So it's a fascinating archive, quite a large archive. But it's also a bit frustrating, because it's difficult to access these materials, especially the newer materials. And there are many issues. I mean, we mentioned copyright issues, we mentioned the GDPR, and technical issues are, you know, another problem, of course. So I would say that the John Rylands Library is not unique here. I mean, lots of UK libraries have exactly the same problem with access. And of course, you know, in the US, it's the same thing as well, even though you don't have the GDPR there. If we take the example of the British Library, which is of course a major institution, it's not easy for users to access born-digital collections at the British Library. And the main problem is that born-digital records are not always listed in the catalog and finding aids. So some people don't know about these collections. Let me take an example.
I mean, the example of the Will Self archive. Will Self is a contemporary writer, he's based in London, and the British Library bought his collection a couple of years ago. So if you look at the finding aid for the Will Self collection, you will see that it's called the Will Self personal and literary papers. And basically, when you download the finding aid, well, there is absolutely no mention of born-digital records. I mean, you wouldn't know that the British Library has any born-digital records. It's really presented as a paper collection, as a traditional paper collection, but in fact, it is not. And I discovered that when the British Library bought the collection a couple of years ago, they presented it as a hybrid collection. So it contains these paper records and born-digital records. And in particular, the collection includes Will Self's computer hard drives. So they have manuscript drafts, they have approximately 100,000 emails, so quite a large number of emails, and they also have a lot of files that have not been entirely identified or mined. So for example, they have downloads of his iTunes. So quite a lot of interesting materials here for researchers. So again, the problem is that users often do not even know that the British Library has these materials. So my suggestion as a literary scholar would be to be more transparent, and perhaps to include more details about these born-digital collections. Even though, you know, the issue of access might still be a big problem in the next few years, I think it's still important to catalog these collections, and at least to tell users that, okay, the library has these materials, and perhaps they are not available, you know, for such and such a reason, but they will be available in the future. So I think trying to inform users that these materials exist is really important. And also, it will make the library more accountable.
Because of course, you know, archives cannot be closed forever. I mean, the library would need to give an approximate date when the materials would be available. So for example, it could be in 10 years, in 20 years, but at least having some date would be useful, I think.

LM  33:56  
Absolutely. I want to put a pin in that and come back to the timeline of making materials accessible. But wow, not even knowing that the digital part of Will Self's archive exists, that's incredible. We call those hidden collections at Columbia, and there's been a move to simply make them all visible. It doesn't mean they're all accessible, but they show up now in the archives portal. You also highlight, I think, exciting opportunities with digital archives. The iTunes library, that's really interesting. I would love to know, you know, the Spotify lists that artists or authors listen to. So there are these new data forms that researchers have the opportunity of dealing with. And I love the example you give of the Enron email archive, which I will describe at a little bit of length because I think it's so interesting. So if you want to keyword search those emails for incriminating evidence, say about expunging records, looking for the words "delete" or "erase" wouldn't work, right, because they had used an internal language to describe and disguise their actions. And as you recount, in the Enron emails, the code words were Star Wars references, like Millennium Falcon, to describe getting rid of emails. So in that case, you would have to either read the emails in an analogue fashion all the way through, notice the strangeness of the language, and decode it, or you'd have to use some other tool altogether. And you describe how data visualization was used to see these strange words pop out. And this clearly requires some special data skills. But I guess my question is, do you think that this is true more generally of digital archives? Do you think using a digital-born archive demands new or different research skills than visiting a special collection in person and looking through the records that are traditionally kept in an institutional archive?
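
As a rough illustration of the point about code words, here is a minimal Python sketch of how terms that recur in a mailbox but never appear in a reference set of ordinary messages can be surfaced when a plain keyword search for "delete" finds nothing. The sample messages, the helper names `tokenize` and `unusual_terms`, and the `min_count` threshold are all invented for illustration; this is not the method that was used on the Enron archive.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a message and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def unusual_terms(emails, baseline, min_count=2):
    """Return (term, count) pairs for terms that recur in `emails`
    but never occur in the `baseline` messages (hypothetical helper)."""
    baseline_vocab = set()
    for doc in baseline:
        baseline_vocab.update(tokenize(doc))

    counts = Counter()
    for doc in emails:
        counts.update(tokenize(doc))

    flagged = [
        (term, n)
        for term, n in counts.items()
        if n >= min_count and term not in baseline_vocab
    ]
    # Most frequent first; ties broken alphabetically.
    return sorted(flagged, key=lambda pair: (-pair[1], pair[0]))


# Invented toy messages: the mailbox under study and an ordinary reference set.
emails = [
    "Please run the falcon routine on the quarterly files tonight",
    "Did the falcon routine finish on the quarterly files",
    "Meeting moved to Tuesday",
]
baseline = [
    "Please send the quarterly files tonight",
    "Did the meeting finish on Tuesday",
    "The report moved to the shared drive",
]

# A literal search for "delete" matches nothing in the toy mailbox...
keyword_hits = [e for e in emails if "delete" in tokenize(e)]
# ...while the comparison flags the recurring out-of-place words.
flagged = unusual_terms(emails, baseline)
print(keyword_hits, flagged)
```

A list like `flagged` is only a starting point: someone still has to read the flagged messages in context to decide whether a recurring word is a code word or just jargon.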

Lise Jaillant  36:16  
Well, data visualization is certainly a great way to find relationships that are not obvious at first sight, especially with huge archives like, you mentioned, the Enron archive. Another example would be the Carcanet Press archive we discussed, you know, at the John Rylands Library in Manchester. So when you have an archive which is normally closed to researchers, being able to create these data visualizations can be extremely useful to make sense of the archive and this huge amount of data. Just to tell you a little bit about the kind of research that I've been doing on the Carcanet Press archive: with my AHRC grant, I was able to employ an archivist who was actually based in the John Rylands Library. And what she did, she prepared a selection of 200 emails that were created by Carcanet during a single year, 2010. So of course, you know, 200 emails, that's not the same thing as the Enron collection, which is absolutely huge. But that still gives us quite a good overview of the kinds of activities of Carcanet in 2010, which was really a key year. Because what happened, of course, after the economic crisis of 2008, is that you had funding cuts in the UK, and many cultural organizations, including Arts Council England, really struggled, and Carcanet Press and other poetry publishers were severely impacted. So basically, for Carcanet Press, there were quite a lot of uncertainties in this context. But there were also quite a lot of uncertainties for other publishers, you know, like Bloodaxe, like Peepal Tree Press, other poetry publishers in the UK, who were wondering what would happen in this difficult context. So what I did, really, I looked at the selection of emails, and I found that there was frequent correspondence between Michael Schmidt, who was the founder and managing director of Carcanet,
and these independent presses, as I mentioned: Bloodaxe, and Peepal Tree Press is also one. So what I did, I used Gephi to create these network visualizations that had nodes. For each node, the colour represents gender, and the size is proportional to email exchanges. It's difficult to explain without showing what the visualization looks like. But really, what you can imagine is that Michael Schmidt is at the center of this visualization, and he's exchanging emails with a lot of older male publishers, like, for example, Jeremy Poynting, who was born in 1946 and was a founder of Peepal Tree Press, and also with Neil Astley, who was born in 1953, who founded Bloodaxe in 1982. So he's very much, you know, in touch with this network of older male publishers. So what I really show, you know, in my research, is that leadership and management responsibilities are often in the hands of these male publishers, while women often occupy less prominent roles in the publishing industry. But I think it would be a bit simplistic to present women as marginalized and men as literary insiders. I mean, it's a little bit more complicated than that, because, as I mentioned, Carcanet is based in Manchester, and Bloodaxe Books and Peepal Tree Press are based in Newcastle and Leeds respectively. So basically, they are far from London, far from the traditional centers. And what's important here is that the founders of these presses often have this discourse, you know, insisting on their marginalized geographical location, saying, okay, we are based in Manchester, we are based in Newcastle or Leeds, and that's not the same thing as London. And they present themselves as totally independent from those powerful figures in London. So what I did, you know, I tried to question this discourse of marginalization.
And I analyzed tweets that have the hashtag #Carcanet50. So #Carcanet50 was a hashtag used to celebrate the 50th anniversary of Carcanet in 2019. And what I showed, you know, looking at these tweets, is that you have close links between London-based literary figures and these literary figures based in Manchester, in Leeds, in Newcastle, etc. And I used the Twitter profiles, and also the publicly available data. And I also added the location associated with each account that used this hashtag, #Carcanet50. So what I did, really, I used Gephi to identify these geographical clusters. And I also used this Carcanet50 dataset to show that you don't really have a separate Manchester literary scene. It's rather, you know, an interconnection between London and Manchester. So I think it was really interesting to compare, you know, my selection of emails from 2010, which, as I said, was only 200 emails, and the Carcanet50 dataset from 2019. It shows the usefulness and potential of linked open data for research, and I think that archival emails could be enriched with various sources, you know, including Twitter accounts. Really, I mean, data visualization, you know, I mentioned Gephi, that's one approach, one methodology. I mean, the main issue is that we can't really review huge amounts of data. I mean, I mentioned the Carcanet Press archive, and you mentioned earlier on the Enron email archive. Another example that we could give is the WikiLeaks documents. And the WikiLeaks documents, obviously, contain a huge amount of data. And the files are mostly unstructured, there is no clear index, and it's very difficult to make sense of this data. So of course, in this context, data visualization can be used to identify patterns, you know, trying to make sense of this huge amount of data. Yeah, so identifying patterns, that's one thing. Data visualization can also be used to tell stories based on data.
I mean, obviously, you know, visualizations are of interest to researchers, you know, to people like myself. They are also of interest to private companies, and even to policymakers. I mean, I know that Harvard, for example, has a project called the Atlas of Economic Complexity. And it's an open-access and extremely useful tool that offers visualizations to understand growth opportunities in specific countries. Like, for example, you type France and you have the data, the data visualization on the growth opportunities for France, for this particular country. So again, I think it's a good example of an open-access tool which is quite easy to use, you know, easy to manipulate. And I think visualizations can be applied to a wide range of sectors. They can also be used to create links between several disciplines. And it's the same with artificial intelligence and machine learning. They are, you know, other advanced techniques that we could use, you know, to make sense of this huge amount of data. And I mean, like data visualizations, artificial intelligence can be used to create these links between various disciplines. So just to sum up, and to conclude on this, I mean, I was trained as a literary scholar, but increasingly, I work with computer scientists, you know, people who are real specialists of artificial intelligence and machine learning. I have two projects at the moment funded by the Arts and Humanities Research Council. So one project is called AURA. So it means Archives in the UK, Ireland and AI, artificial intelligence. And the other one is actually with US partners, and is funded by the AHRC on the UK side, and by the National Endowment for the Humanities on the US side, and it's called AI for Cultural Organizations. And there, we are going to look at artificial intelligence applied to this huge amount of data. The project has just started, actually, just last month.
And we are going to organize a series of workshops, a series of case studies. So a very, very exciting project, I think. I'm sure there are so many methodologies that we could explore to make sense of this huge amount of data, so that's a very exciting moment, I think.
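
The network described in this answer (one node per correspondent, node size proportional to the number of email exchanges) can be illustrated with a minimal Python sketch. It is not the method used in the research: the sample messages, the addresses, and the function name `count_exchanges` are hypothetical, and it reads only the From, To and Cc headers using the standard library. An attribute such as gender, used for node color, would have to be added by hand afterwards.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses

# Hypothetical sample messages standing in for a small email selection.
RAW_EMAILS = [
    "From: editor@press.example\nTo: a@pub.example, b@pub.example\n"
    "Subject: Funding\n\nBody text",
    "From: a@pub.example\nTo: editor@press.example\n"
    "Subject: Re: Funding\n\nBody text",
    "From: editor@press.example\nTo: b@pub.example\n"
    "Subject: Launch\n\nBody text",
]


def count_exchanges(raw_emails):
    """Return (nodes, edges) counters built from message headers only.

    nodes maps each address to the number of exchanges it takes part in
    (usable as a node size); edges maps each unordered pair of addresses
    to the number of messages between them (usable as an edge weight).
    """
    nodes = Counter()
    edges = Counter()
    for raw in raw_emails:
        msg = message_from_string(raw)
        senders = [addr.lower() for _, addr in
                   getaddresses(msg.get_all("From", []))]
        recipients = [addr.lower() for _, addr in
                      getaddresses(msg.get_all("To", []) +
                                   msg.get_all("Cc", []))]
        for sender in senders:
            for recipient in recipients:
                if sender and recipient and sender != recipient:
                    # Sort the pair so A->B and B->A count as one edge.
                    edges[tuple(sorted((sender, recipient)))] += 1
                    nodes[sender] += 1
                    nodes[recipient] += 1
    return nodes, edges


if __name__ == "__main__":
    nodes, edges = count_exchanges(RAW_EMAILS)
    for address, size in nodes.most_common():
        print(address, size)
```

Because the sketch touches only header fields, the same counts could in principle be produced from metadata alone, without reading any message body. The resulting tables could then be loaded into a network visualization tool.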

LM  45:23  
We were talking about how archives remain closed to researchers for privacy and copyright and logistical reasons. Since data visualizations can be used to, quote unquote, read materials by feeding them through an algorithm, could they be applied to closed collections? That is, could you imagine a research pathway being developed with archival institutions in which distant reading practices are applied to collections that are closed, say, for 25 years, or 50 years, or the duration of a researcher's lifetime?

Lise Jaillant  46:00  
I think it's a bit complicated, because of course, you know, I mean, with my selection of 200 emails, of course, I was able to read these emails, so to do close reading, and also to do this form of distant reading, you know, with data visualization. So I was able to combine close and distant reading, you know, to use the well-known phrase of Franco Moretti. But in the case of archives that are totally closed to researchers, access to data remains an issue, because of course, you cannot do data visualization if you don't have access to any kind of data. So I think, at the moment, you know, a lot of archival collections are a bit risk-averse, and they prefer to close the entire collection, even if sometimes they could give access to metadata, for example. And sometimes metadata, I mean, it may contain sensitive information, but lots of the time, you know, metadata is not sensitive. So giving access to some forms of data, even if it's not, you know, the entire text, would still be useful for researchers.

LM  47:06  
So, not to put you on the spot, but as somebody who thinks a lot about accessibility, and also uses archives yourself, do you think that an open date should be part of purchase, a timeline of access as part of the purchase? There are a lot of wealthy institutions that tend to edge out smaller institutions in purchasing important collections. But making accessibility a necessary part of purchasing collections that have importance to the broader public might be a way of evening the playing field. Do you think that having an open date should be part of purchase plans?

Lise Jaillant  47:49  
Yes, I think it's important to give an approximate date when libraries and archival collections buy or acquire archives, because at the end of the day, I mean, these collections have to be opened at some point. There's no point in buying a collection and keeping it locked, you know, for ages, and nobody knows when it's going to open. So at least, you know, I understand that sometimes there are, of course, sensitivity issues, and it can be extremely complicated with donors, etc. But at least having a specific date when the collection will be accessible would be, you know, of great help for researchers and other users as well.

LM  48:32  
And would further fulfill the public commitment to stewarding these archives for the public and for the future. Absolutely. Absolutely. So, Lise, this has been a fascinating conversation for me. And I want to ask you one question to close, which is, what do you foresee will be some of the most important digital archives that will be preserved in the coming decades? And perhaps what do you see as some of the most troublesome digital archives being captured?

Lise Jaillant  49:03  
So, I talked a lot about email archives in this interview, because they are tricky for many reasons, including at the preservation level. I mean, it's easy to turn an email into a PDF format, but it's much more complicated to preserve a thread of emails, you know, with all the metadata associated with the emails. So when we consider the huge amounts of emails that people send, I mean, the issue of preservation becomes particularly complicated. But of course, you know, this issue of preservation is complicated, but it's less complicated than the issue of access. Because as we've seen, there are so many issues with data protection, copyright, technical issues that explain why these email collections are not accessible most of the time. And this is a big problem, of course, as they often contain invaluable information. Emails are extremely precious, which is why they can be so controversial. I mean, of course, we have email scandals that come up in the news on a regular basis. In the US, you had the scandal over Hillary Clinton's emails in 2017. There are, of course, many other examples of email scandals, you know, in the past few years. So I think that we have methods today, you know, to identify sensitive documents, you know, problematic documents in email archives. I mean, it's called sensitivity review. And artificial intelligence can be used to identify sensitive information. And also, of course, you know, once you've identified sensitive information, you can put it aside and improve access to information which is not sensitive, which is not confidential. So, I mean, you asked me a question about troublesome archives, you know, archives that are a bit problematic. Well, even the most problematic archives could be made accessible using innovative methodologies, such as artificial intelligence. And there are, of course, you know, many other techniques and approaches that could be used.
So as I said, you know, it's an exciting time for literary scholars to do research and also to create those collaborations with archivists, with computer scientists, with other people from various backgrounds.

LM  51:23  
Thank you so much, Lise. This was a real pleasure.

Lise Jaillant  51:25  
Thank you for the interview. Thank you so much.

Transcribed by https://otter.ai