Massive, Unarchivable Datasets of Cancer, Covid, and Alzheimer's Research Could Be Lost Forever
www.404media.co
Almost two dozen repositories of research and public health data supported by the National Institutes of Health are marked for "review" under the Trump administration's direction, and researchers and archivists say the data is at risk of being lost forever if the repositories go down.
"The problem with archiving this data is that we can't," Lisa Chinn, Head of Research Data Services at the University of Chicago, told 404 Media. Unlike other government datasets or web pages, downloading or otherwise archiving NIH data often requires a Data Use Agreement between a researcher's institution and the agency, and those agreements are carefully administered through a disclosure risk review process.
A message appeared at the top of multiple NIH websites last week that says: "This repository is under review for potential modification in compliance with Administration directives."
Repositories with the message include archives of cancer imagery, Alzheimer's disease research, sleep studies, HIV databases, and COVID-19 vaccination and mortality data. A list identified by an archivist includes:
- DASH Data and Specimen Hub
- National COVID Cohort Collaborative
- The DANDI Archive
- The Brain Image Library
- The Cancer Imaging Archive
- BioData Catalyst
- National Sleep Research Resource
- National Alzheimer's Coordinating Center
- AgingResearchBiobank
- Seattle Alzheimer's Disease Brain Cell Atlas
- ApoE Pathobiology in Aging & Alzheimer's Disease
- Child Language Data Exchange System
- LDbase
- Cellular Senescence Network (SenNet)
- The National Center for Advancing Translational Sciences OpenData Portal
- Catalog of the NINDS Human Cell and Data Repository
- The Brain Research Through Advancing Innovative Neurotechnologies Initiative
- HIV databases
- The Neuroscience Multi-Omic Archive
- The Human Health Exposure Analysis Resource Data Center
- Mouse Models of Human Cancer Database
Based on archived versions of the websites, the message was added to most of the sites last week, around March 26 or 27.
On March 28, Health and Human Services Secretary Robert F. Kennedy Jr. announced that HHS and agencies it oversees, including NIH, would lay off 10,000 full-time employees as part of a reduction in force plan. On Tuesday, at least five directors of NIH's 27 institutes and centers were told they were put on leave. Kennedy's plan outlines 1,200 layoffs at NIH alone.
Yesterday, Kennedy said some of the cuts to programs will be reinstated. "Personnel that should not have been cut were cut. We're reinstating them," he said. "Part of the DOGE, we talked about this from the beginning, is we're going to do 80 percent cuts, but 20 percent of those are going to have to be reinstalled, because we'll make mistakes." Earlier this week, researchers filed a lawsuit challenging the cancellation of research grants totaling more than $2.4 billion over the past month by NIH.
Under the Trump administration's purge of public government websites and health resources, archivists have been diligently saving what they can. But there are limits to what can be archived by volunteers, and many of these databases marked for potential "modification" can't be saved.
Even if someone does have access through a DUA, they might not have long-term access, or the data might only be accessible through secure devices that aren't connected to external networks, so data can't be downloaded or backed up.
And much of the data contains personally identifying information or health information that's protected under HIPAA, which complicates volunteers' efforts to store it.
Henrik Schönemann, a historian who started the Safeguarding Research & Culture archivist project, told 404 Media that as part of the project, they rely on institutions to help contribute storage; if they can't guarantee all of the data is legal to download and store, they can't save it in partnership with an institution if the opportunity arises.
"In general it's very important for us to be able to say to institutions, 'yes we got public data, we did not break paywalls, we did not break any agreements, it's fine for you to contribute with hosting,'" Schönemann said. The group is using BitTorrent to store and seed archived pages for now. But the NIH datasets under threat contain potentially multiple petabytes of data to be saved, and archivists need hosts to help with storage. "All of this is only possible for the publicly funded institutions if they can be sure they don't host any infringing material," he said.
"So far, it seems like what is happening is less that these data sets are actively being deleted or clawed back and more that they are laying off the workers whose job is to maintain them, update them and maintain the infrastructure that supports them," a librarian affiliated with the Data Rescue Project told 404 Media. "In time, this will have the same effect, but it's really hard to predict. People don't usually appreciate, much less our current administration, how much labor goes into maintaining a large research dataset."
"The impacts that I've personally seen are that researchers lose five years of research because they once had access and now their DUA is up, and there's no one in office, because they've been fired, to renew their DUA," Chinn said. This means researchers can't publish (de-identified) papers based on data analysis they've already completed.
She gave an example of research from the Department of Education, which has decades of studies that some researchers use to compare student performance and learning outcomes, and that teach us how wealth and location impact education. "In a scenario where that data is lost, we will not have access to that data to compare year over year shifts in performance," she said. "We will also not be able to compare, on a national scale, where we stand in comparison to other nations."
"Right now, the best I can do is advise the researchers that they need to get copies of the data that they are researching with that's restricted," the librarian-archivist said.