
Inside the AI Prompts DOGE Used to Munch Contracts Related to Veterans Health
by Brandon Roberts and Vernal Coleman

ProPublica is a nonprofit newsroom that investigates abuses of power. Sign up to receive our biggest stories as soon as they're published.

When an AI script written by a Department of Government Efficiency employee came across a contract for internet service, it flagged it as cancelable. Not because it was waste, fraud or abuse (the Department of Veterans Affairs needs internet connectivity, after all) but because the model was given unclear and conflicting instructions.

Sahil Lavingia, who wrote the code, told it to cancel, or in his words "munch," anything that wasn't directly supporting patient care. Unfortunately, neither Lavingia nor the model had the knowledge required to make such determinations.

Sahil Lavingia at his office in Brooklyn (Ben Sklar for ProPublica)

"I think that mistakes were made," said Lavingia, who worked at DOGE for nearly two months, in an interview with ProPublica. "I'm sure mistakes were made. Mistakes are always made."

It turns out, a lot of mistakes were made as DOGE and the VA rushed to implement President Donald Trump's February executive order mandating all of the VA's contracts be reviewed within 30 days.

ProPublica obtained the code and prompts (the instructions given to the AI model) used to review the contracts and interviewed Lavingia and experts in both AI and government procurement. We are publishing an analysis of those prompts to help the public understand how this technology is being deployed in the federal government.

The experts found numerous and troubling flaws: the code relied on older, general-purpose models not suited for the task; the model hallucinated contract amounts, deciding around 1,100 of the agreements were each worth $34 million when they were sometimes worth thousands; and the AI did not analyze the entire text of contracts. Most experts said that, in addition to the technical issues, using off-the-shelf AI models for the task with little context on how the VA works should have been a nonstarter.

Lavingia, a software engineer enlisted by DOGE, acknowledged there were flaws in what he created and blamed, in part, a lack of time and proper tools. He also stressed that he knew his list of what he called "MUNCHABLE" contracts would be vetted by others before a final decision was made.

Portions of the prompt are pasted below along with commentary from experts we interviewed. Lavingia published a complete version of it on his personal GitHub account.

Problems with how the model was constructed can be detected from the very opening lines of code, where the DOGE employee instructs the model how to behave:

    You are an AI assistant that analyzes government contracts. Always provide comprehensive few-sentence descriptions that explain WHO the contract is with, WHAT specific services/products are provided, and WHO benefits from these services. Remember that contracts for EMR systems and healthcare IT infrastructure directly supporting patient care should be classified as NOT munchable. Contracts related to diversity, equity, and inclusion (DEI) initiatives or services that could be easily handled by in-house W2 employees should be classified as MUNCHABLE. Consider 'soft services' like healthcare technology management, data management, administrative consulting, portfolio management, case management, and product catalog management as MUNCHABLE. For contract modifications, mark the munchable status as 'N/A'.
    For IDIQ contracts, be more aggressive about termination unless they are for core medical services or benefits processing.

This part of the prompt, known as a system prompt, is intended to shape the overall behavior of the large language model, or LLM, the technology behind AI bots like ChatGPT. In this case, it was used before both steps of the process: first, before Lavingia used it to obtain information like contract amounts; then, before determining if a contract should be canceled.

Including information not related to the task at hand can confuse AI. At this point, it's only being asked to gather information from the text of the contract. Everything related to munchable status, soft services or DEI is irrelevant. Experts told ProPublica that trying to fix issues by adding more instructions can actually have the opposite effect, especially if they're irrelevant.

    Analyze the following contract text and extract the basic information below. If you can't find specific information, write "Not found".

    CONTRACT TEXT:
    {text[:10000]}  # Using first 10000 chars to stay within token limits

The models were only shown the first 10,000 characters from each document, or approximately 2,500 words. Experts were confused by this, noting that OpenAI models support inputs over 50 times that size. Lavingia said that he had to use an older AI model that the VA had already signed a contract for.

    Please extract the following information:
    1. Contract Number/PIID
    2. Parent Contract Number (if this is a child contract)
    3. Contract Description - IMPORTANT: Provide a DETAILED 1-2 sentence description that clearly explains what the contract is for. Include WHO the vendor is, WHAT specific products or services they provide, and WHO the end recipients or beneficiaries are. For example, instead of "Custom powered wheelchair", write "Contract with XYZ Medical Equipment Provider to supply custom-powered wheelchairs and related maintenance services to veteran patients at VA medical centers."
    4. Vendor Name
    5. Total Contract Value (in USD)
    6. FY 25 Value (in USD)
    7. Remaining Obligations (in USD)
    8. Contracting Officer Name
    9. Is this an IDIQ contract? (true/false)
    10. Is this a modification? (true/false)

This portion of the prompt instructs the AI to extract the contract number and other key details of a contract, such as the total contract value.

This was error-prone and not necessary, as accurate contract information can already be found in publicly available databases like USASpending. In some cases, this led to the AI system being given an outdated version of a contract, which led to it reporting a misleadingly large contract amount. In other cases, the model mistakenly pulled an irrelevant number from the page instead of the contract value.

"They are looking for information where it's easy to get, rather than where it's correct," said Waldo Jaquith, a former Obama appointee who oversaw IT contracting at the Treasury Department. "This is the lazy approach to gathering the information that they want. It's faster, but it's less accurate."

Lavingia acknowledged that this approach led to errors but said that those errors were later corrected by VA staff.

Once the program extracted this information, it ran a second pass to determine if the contract was munchable.

    Based on the following contract information, determine if this contract is "munchable" based on these criteria:

    CONTRACT INFORMATION:
    {text[:10000]}  # Using first 10000 chars to stay within token limits

Again, only the first 10,000 characters were shown to the model. As a result, the munchable determination was based purely on the first few pages of the contract document.
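For readers who want to picture the mechanics, here is a minimal sketch of what a two-pass pipeline like the one described might look like in Python, using OpenAI's chat completions API. The function names, abridged prompts and model parameter are our own assumptions for illustration; this is not the code DOGE ran, which Lavingia published in full on his GitHub account.

```python
# Illustrative sketch only -- not the DOGE script. Assumes the OpenAI Python
# client (openai>=1.0) and that the contract text has already been pulled out
# of the document as a plain string.
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = "You are an AI assistant that analyzes government contracts..."  # abridged

def ask(model: str, user_prompt: str) -> str:
    """Send one prompt to the model with the shared system prompt prepended."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_prompt},
        ],
    )
    return response.choices[0].message.content

def review_contract(text: str, model: str) -> tuple[str, str]:
    # Both passes see only the first 10,000 characters, roughly 2,500 words,
    # so anything past the opening pages of the contract is invisible to the model.
    snippet = text[:10000]

    # Pass 1: pull out basic fields (contract number, vendor, dollar values, ...).
    info = ask(model, "Analyze the following contract text and extract the basic information below. "
                      "If you can't find specific information, write \"Not found\".\n\n"
                      f"CONTRACT TEXT:\n{snippet}")

    # Pass 2: decide whether the contract is "munchable" based on the criteria.
    verdict = ask(model, "Based on the following contract information, determine if this contract is "
                         "\"munchable\" based on these criteria:\n\n"
                         f"CONTRACT INFORMATION:\n{snippet}")
    return info, verdict
```

Because the same system prompt rides along on both calls, the extraction pass is also told about munchable status, soft services and DEI, which is exactly the irrelevant-context problem experts flagged above.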
    Then, evaluate if this contract is "munchable" based on these criteria:
    - If this is a contract modification, mark it as "N/A" for munchable status
    - If this is an IDIQ contract:
      * For medical devices/equipment: NOT MUNCHABLE
      * For recruiting/staffing: MUNCHABLE
      * For other services: Consider termination if not core medical/benefits
    - Level 0: Direct patient care (e.g., bedside nurse) - NOT MUNCHABLE
    - Level 1: Necessary consultants that can't be insourced - NOT MUNCHABLE

The above prompt section is the first set of instructions telling the AI how to flag contracts. The prompt provides little explanation of what it's looking for, failing to define what qualifies as "core medical/benefits" and lacking information about what a "necessary consultant" is.

For the types of models the DOGE analysis used, including all the necessary information to make an accurate determination is critical. Cary Coglianese, a University of Pennsylvania professor who studies the governmental use of artificial intelligence, said that knowing which jobs could be done in-house calls for a very sophisticated understanding of medical care, of institutional management, of availability of human resources that the model does not have.

    - Contracts related to "diversity, equity, and inclusion" (DEI) initiatives - MUNCHABLE

The prompt above tries to implement a fundamental policy of the Trump administration: killing all DEI programs. But the prompt fails to include a definition of what DEI is, leaving the model to decide.

Despite the instruction to cancel DEI-related contracts, very few were flagged for this reason. Procurement experts noted that it's very unlikely for information like this to be found in the first few pages of a contract.

    - Level 2+: Multiple layers removed from veterans care - MUNCHABLE
    - Services that could easily be replaced by in-house W2 employees - MUNCHABLE

These two lines, which experts say were poorly defined, carried the most weight in the DOGE analysis. The response from the AI frequently cited these reasons as the justification for munchability. Nearly every justification included a form of the phrase "direct patient care," and in a third of cases the model flagged contracts because it stated the services could be handled in-house.

The poorly defined requirements led to several contracts for VA office internet services being flagged for cancellation. In one justification, the model had this to say:

    The contract provides data services for internet connectivity, which is an IT infrastructure service that is multiple layers removed from direct clinical patient care and could likely be performed in-house, making it classified as munchable.
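Patterns like these, such as how often the justifications lean on "direct patient care" or on in-house replaceability, can be measured with a simple tally over the model's outputs. The sketch below shows one way such a count might be done; the CSV filename and the "reason" column are hypothetical stand-ins for however the script's output was actually stored.

```python
# Hypothetical sketch of tallying the model's stated justifications.
# Assumes a CSV export with one row per contract and a "reason" column
# holding the munchable justification (the file and column names are assumptions).
import csv
from collections import Counter

phrases = ["direct patient care", "performed in-house", "soft service"]
counts = Counter()
total = 0

with open("munchable_output.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        total += 1
        reason = row.get("reason", "").lower()
        for phrase in phrases:
            if phrase in reason:
                counts[phrase] += 1

if total:
    for phrase in phrases:
        n = counts[phrase]
        print(f"{phrase}: {n}/{total} justifications ({n / total:.0%})")
```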
    IMPORTANT EXCEPTIONS - These are NOT MUNCHABLE:
    - Third-party financial audits and compliance reviews
    - Medical equipment audits and certifications (e.g., MRI, CT scan, nuclear medicine equipment)
    - Nuclear physics and radiation safety audits for medical equipment
    - Medical device safety and compliance audits
    - Healthcare facility accreditation reviews
    - Clinical trial audits and monitoring
    - Medical billing and coding compliance audits
    - Healthcare fraud and abuse investigations
    - Medical records privacy and security audits
    - Healthcare quality assurance reviews
    - Community Living Center (CLC) surveys and inspections
    - State Veterans Home surveys and inspections
    - Long-term care facility quality surveys
    - Nursing home resident safety and care quality reviews
    - Assisted living facility compliance surveys
    - Veteran housing quality and safety inspections
    - Residential care facility accreditation reviews

Despite these instructions, the AI flagged many audit- and compliance-related contracts as munchable, labeling them as soft services. In one case, the model even acknowledged the importance of compliance while flagging a contract for cancellation, stating: "Although essential to ensuring accurate medical records and billing, these services are an administrative support function (a soft service) rather than direct patient care."

    Key considerations:
    - Direct patient care involves: physical examinations, medical procedures, medication administration
    - Distinguish between medical/clinical and psychosocial support

Shobita Parthasarathy, professor of public policy and director of the Science, Technology, and Public Policy Program at the University of Michigan, told ProPublica that this piece of the prompt was notable in that it instructs the model to distinguish between the two types of services without instructing the model what to save and what to kill.

The emphasis on direct patient care is reflected in how often the AI cited it in its recommendations, even when the model did not have any information about a contract. In one instance where it labeled every field "not found," it still decided the contract was munchable. It gave this reason: "Without evidence that it involves essential medical procedures or direct clinical support, and assuming the contract is for administrative or related support services, it meets the criteria for being classified as munchable."

In reality, this contract was for the preventative maintenance of important safety devices known as ceiling lifts at VA medical centers, including three sites in Maryland. The contract itself stated: "Ceiling Lifts are used by employees to reposition patients during their care. They are critical safety devices for employees and patients, and must be maintained and inspected appropriately."
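The ceiling-lift case shows the script rendering a verdict even when its own extraction step returned nothing. A basic guard of the kind a reviewer might add, shown here as a hypothetical sketch rather than anything in the published code, would route such contracts to human review instead of classifying them:

```python
# Illustrative guard, not part of the DOGE script: skip the munchable
# determination when the extraction pass found nothing to go on.
# The field names below are assumptions for the example.
REQUIRED_FIELDS = ["contract_number", "vendor_name", "description", "total_value"]

def has_enough_context(extracted: dict) -> bool:
    """Return False if every required field came back empty or 'Not found'."""
    found = [
        v for k, v in extracted.items()
        if k in REQUIRED_FIELDS and v and v.strip().lower() != "not found"
    ]
    return len(found) > 0

extracted = {"contract_number": "Not found", "vendor_name": "Not found",
             "description": "Not found", "total_value": "Not found"}

if not has_enough_context(extracted):
    print("Insufficient information extracted; flag for human review instead of classifying.")
```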
    Specific services that should be classified as MUNCHABLE (these are "soft services" or consulting-type services):
    - Healthcare technology management (HTM) services
    - Data Commons Software as a Service (SaaS)
    - Administrative management and consulting services
    - Data management and analytics services
    - Product catalog or listing management
    - Planning and transition support services
    - Portfolio management services
    - Operational management review
    - Technology guides and alerts services
    - Case management administrative services
    - Case abstracts, casefinding, follow-up services
    - Enterprise-level portfolio management
    - Support for specific initiatives (like PACT Act)
    - Administrative updates to product information
    - Research data management platforms or repositories
    - Drug/pharmaceutical lifecycle management and pricing analysis
    - Backup Contracting Officer's Representatives (CORs) or administrative oversight roles
    - Modernization and renovation extensions not directly tied to patient care
    - DEI (Diversity, Equity, Inclusion) initiatives
    - Climate & Sustainability programs
    - Consulting & Research Services
    - Non-Performing/Non-Essential Contracts
    - Recruitment Services

This portion of the prompt attempts to define soft services. It uses many highly specific examples but also throws in vague categories without definitions, like "non-performing/non-essential contracts."

Experts said that in order for a model to properly determine this, it would need to be given information about the essential activities and what's required to support them.

    Important clarifications based on past analysis errors:
    2. Lifecycle management of drugs/pharmaceuticals IS MUNCHABLE (different from direct supply)
    3. Backup administrative roles (like alternate CORs) ARE MUNCHABLE as they create duplicative work
    4. Contract extensions for renovations/modernization ARE MUNCHABLE unless directly tied to patient care

This section of the prompt was the result of analysis by Lavingia and other DOGE staff, Lavingia explained. "This is probably from a session where I ran a prior version of the script that most likely a DOGE person was like, 'It's not being aggressive enough.' I don't know why it starts with a 2. I guess I disagreed with one of them, and so we only put 2, 3 and 4 here."

Notably, our review found that the only clarifications addressing past errors concerned scenarios where the model wasn't flagging enough contracts for cancellation.

    Direct patient care that is NOT MUNCHABLE includes:
    - Conducting physical examinations
    - Administering medications and treatments
    - Performing medical procedures and interventions
    - Monitoring and assessing patient responses
    - Supply of actual medical products (pharmaceuticals, medical equipment)
    - Maintenance of critical medical equipment
    - Custom medical devices (wheelchairs, prosthetics)
    - Essential therapeutic services with proven efficacy

    For maintenance contracts, consider whether pricing appears reasonable. If maintenance costs seem excessive, flag them as potentially over-priced despite being necessary.

This section of the prompt provides the most detail about what constitutes direct patient care. While it does cover many aspects of care, it still leaves a lot of ambiguity and forces the model to make its own judgments about what constitutes "proven efficacy" and "critical medical equipment."

In addition to the limited information given on what constitutes direct patient care, there is no information about how to determine if a price is reasonable, especially since the LLM only sees the first few pages of the document.
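Both problems, the extracted dollar figures and the price-reasonableness judgment, trace back to asking the model for numbers it cannot reliably see. As noted earlier, accurate contract figures already exist in public databases like USASpending, which also offers bulk data downloads. The hypothetical sketch below shows one way model-extracted values could be cross-checked against such an export; the filename and column names are assumptions, not a documented schema.

```python
# Hypothetical cross-check of model-extracted contract values against a
# USASpending bulk-download CSV. The filename and column names are assumptions;
# adjust them to match the actual export. Not ProPublica's or DOGE's code.
import csv

def load_public_values(path: str) -> dict[str, float]:
    """Map PIID -> total obligated amount from a USASpending export (columns assumed)."""
    values = {}
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            try:
                piid = row["award_id_piid"].strip()
                values[piid] = float(row["total_obligated_amount"])
            except (KeyError, ValueError):
                continue
    return values

public = load_public_values("usaspending_va_contracts.csv")
extracted = {"EXAMPLE-PIID-0001": 34_000_000.0}  # PIID -> value the model reported (placeholder)

for piid, model_value in extracted.items():
    actual = public.get(piid)
    if actual and model_value > 10 * actual:
        print(f"{piid}: model said ${model_value:,.0f}, public record says ${actual:,.0f}")
```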
The models lack knowledge about what's normal for government contracts.

"I just do not understand how it would be possible. This is hard for a human to figure out," Jaquith said about whether AI could accurately determine if a contract was reasonably priced. "I don't see any way that an LLM could know this without a lot of really specialized training."

    Services that can be easily insourced (MUNCHABLE):
    - Video production and multimedia services
    - Customer support/call centers
    - PowerPoint/presentation creation
    - Recruiting and outreach services
    - Public affairs and communications
    - Administrative support
    - Basic IT support (non-specialized)
    - Content creation and writing
    - Training services (non-specialized)
    - Event planning and coordination

This section explicitly lists which tasks could be easily insourced by VA staff, and more than 500 different contracts were flagged as munchable for this reason.

"A larger issue with all of this is there seems to be an assumption here that contracts are almost inherently wasteful," Coglianese said when shown this section of the prompt. "Other services, like the kinds that are here, are cheaper to contract for. In fact, these are exactly the sorts of things that we would not want to treat as munchable." He went on to explain that insourcing some of these tasks could also siphon human resources away from direct primary patient care.

In an interview, Lavingia acknowledged some of these jobs might be better handled externally. "We don't want to cut the ones that would make the VA less efficient or cause us to hire a bunch of people in-house," Lavingia explained. "Which currently they can't do because there's a hiring freeze."

The VA is standing behind its use of AI to examine contracts, calling it a commonsense precedent. And documents obtained by ProPublica suggest the VA is looking at additional ways AI can be deployed. A March email from a top VA official to DOGE stated:

    Today, VA receives over 2 million disability claims per year, and the average time for a decision is 130 days. We believe that key technical improvements (including AI and other automation), combined with Veteran-first process/culture changes pushed from our Secretary's office could dramatically improve this. A small existing pilot in this space has resulted in 3% of recent claims being processed in less than 30 days. Our mission is to figure out how to grow from 3% to 30% and then upwards such that only the most complex claims take more than a few days.

If you have any information about the misuse or abuse of AI within government agencies, reach out to us via our Signal or SecureDrop channels. If you'd like to talk to someone specific, Brandon Roberts is an investigative journalist on the news applications team and has a wealth of experience using and dissecting artificial intelligence. He can be reached on Signal @brandonrobertz.01 or by email brandon.roberts@propublica.org.