
THE WORKAROUND THAT BECAME THE SYSTEM

A Failure Hackers Story – A “temporary fix” quietly hardens into an operating model. Years later, the workaround is no longer a patch – it’s the platform. And when it breaks, it takes reality down with it.


1) “Just for This Release”

The first workaround was introduced on a Thursday at 18:47.

It started the way most dangerous things start: with good intentions.

ParkLight Finance was a UK fintech in its awkward adolescence – too big to improvise, too small to fully govern itself. Two hundred and forty people, three product lines, a growing set of enterprise clients, and a brand promise built around “real-time reconciliation.” They were competing against banks with budgets large enough to swallow a street.

Their platform moved money. Not in the sexy, cryptocurrency sense. In the boring, regulated, contract-bound sense: direct debits, payouts, refunds, chargebacks, settlement. The kind of money movement where an error can cause a regulatory incident, a client breach, and a five-week storm of escalations.

That Thursday, the team was preparing a release to support a new enterprise feature: partial refunds across multiple settlement windows.

It was complicated. It touched everything.

And on the final integration test, something didn’t add up.

The ledger totals – the numbers that had to align perfectly down to the penny – were off by £12.33.

Not a lot in absolute terms. But in reconciliation terms it was… existential.

Rina Patel, the Delivery Manager, stared at the report.

“Where’s the discrepancy coming from?”

Theo, one of the engineers, rubbed his eyes. “It only happens when a refund is split across settlement windows and the client’s billing schedule crosses midnight UTC.”

“Of course it does,” muttered someone from QA, with a laugh that was more despair than humour.

The release window was tomorrow morning. An enterprise client had been promised the capability by end of week. Sales had already celebrated.

A few desks away, the CTO, Colin, looked at the clock and said the sentence that changed the next three years:

“We can’t slip. Create a manual adjustment step. Just for this release.”

It landed softly. Nobody gasped. Nobody argued. It sounded pragmatic.

Theo nodded. “So we’ll patch the ledger with a correction entry after processing?”

Colin waved his hand. “Yes. We’ll do a daily reconciliation sweep and apply a balancing transaction. We’ll fix the root cause next sprint.”

Next sprint. That phrase was the lullaby of deferred reality.

Rina asked the question she didn’t want to ask:

“Who will run the daily reconciliation sweep?”

Colin paused, then gestured toward Operations. “Ops can do it. It’s just a small check.”

Ops. The team who already carried the burden of every “temporary fix.”

In the corner, Nadia – the Ops Lead – was still in her coat. She’d been about to leave.

She heard the word “Ops” and slowly turned around.

“How small?” she asked.

Colin smiled. “Ten minutes. A simple spreadsheet.”

Nadia held his gaze. She had learned something in fintech: when an engineer says “simple spreadsheet,” it means “an invisible new system.”

But it was late. The client deadline was tomorrow. And everyone was tired.

So Nadia nodded.

“Fine,” she said. “Just for this release.”

No one noticed how quickly “just for this release” became “just for now.”


2) The Spreadsheet With a Name

The spreadsheet arrived in Nadia’s inbox at 09:02 the next morning.

Subject line: “Temporary Recon Fix ✅”

Attachment: Recon_Adjustment_v1.xlsx

Inside were a few tabs:

  • “Input” – copied totals from a database query
  • “Diff” – calculated discrepancy
  • “Journal Entry” – instructions for posting a balancing line
  • “Notes” – a single sentence: “Delete when fix is deployed.”

Nadia laughed once, sharply. Not because it was funny. Because it was familiar.

In the weeks that followed, the spreadsheet gained gravity.

It got renamed:

Recon_Adjustment_v2.xlsx
Recon_Adjustment_FINAL.xlsx
Recon_Adjustment_FINAL_v3_REAL.xlsx

It got a macro. It got conditional formatting. It got a “do not edit” warning and a password no one remembered. It got a pinned message in Ops Slack.

And then, as all workarounds do, it started to expand.

Because once you have a mechanism to correct one mismatch, you’ll notice others.

A settlement rounding issue appeared. The spreadsheet added a tab.
A delayed webhook created a timing drift. Another tab.
A client-specific rule created a mismatch. Another tab.

Soon the daily “ten-minute check” became a 45-minute ritual.

Nadia and her team would run the query, paste results into cells, check numbers, generate a correction entry, post it into the ledger, and then – because auditors existed – attach screenshots.

Then they would send a message to Finance:

“Recon complete ✅”

And Finance would breathe out.

The dashboards looked green again.

No one in leadership questioned why.
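
The logic behind the ritual was, for the record, trivial. That was part of the problem: something this simple never looked like a system worth governing. Below is a minimal sketch of what the sweep amounted to, with an assumed adjustment account and totals – the real thing lived in spreadsheet cells and copy-paste, not in a script.

    # A minimal sketch of the manual sweep's logic – illustrative only;
    # the account name and inputs are assumptions, not ParkLight's setup.
    from decimal import Decimal
    from typing import Optional

    def daily_recon_sweep(ledger_total: Decimal,
                          settlement_total: Decimal) -> Optional[dict]:
        """Compare the two totals and, if they differ, draft a balancing entry."""
        diff = settlement_total - ledger_total
        if diff == 0:
            return None  # nothing to post today
        # The "journal entry" Ops posted by hand, then screenshotted for audit.
        return {
            "account": "SUSPENSE_RECON_ADJ",  # hypothetical adjustment account
            "amount": diff,
            "memo": "Daily manual reconciliation adjustment",
        }

The danger was never the arithmetic. It was that the arithmetic ran outside any owned, versioned, tested system.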


3) When a Workaround Feels Like Safety

Six months later, ParkLight closed another major contract.

The company grew. Teams split. Priorities shifted.

The “next sprint” fix for the refund logic never happened. It wasn’t that anyone forgot. It was that it didn’t compete with visible work.

Everyone agreed the workaround was “temporary,” which meant no one gave it a permanent home.

It didn’t belong to Engineering, because it wasn’t “product code.”
It didn’t belong to Ops, because it wasn’t “operations.”
It didn’t belong to Finance, because it wasn’t “accounting.”

So it belonged to… Nadia.

Workarounds always belong to the people who keep the lights on.

One morning, she noticed something subtle: the workaround was no longer a patch; it was being treated as a control.

Finance began asking, “Has the recon spreadsheet been run?” before approving payouts.

Sales began telling clients, “We reconcile daily.”

Compliance started referencing “daily manual reconciliation verification” in a risk register.

It had become part of the organisation’s identity.

And like any identity, it became defended.

When Nadia raised concerns about the increasing complexity, a senior leader replied:

“But it’s working, isn’t it?”

Yes. It was working.

That was the trap.


4) Symptoms That Look Like Normal

The failure didn’t arrive with alarms. It arrived with noise that sounded like ordinary life.

A missed check one day.
A slight delay in posting an adjustment.
An engineer who didn’t know why certain ledger entries existed.
A client asking why their settlement report included “manual balancing line items.”

Rina, the Delivery Manager, had moved teams but still remembered how this started. One afternoon she bumped into Nadia in the kitchen.

“How’s recon these days?”

Nadia smiled, tiredly. “It’s… a system.”

Rina frowned. “We were supposed to fix that.”

Nadia didn’t reply. She didn’t need to. The silence carried the truth: you can’t fix something if it isn’t visible as a problem.

Rina later found herself on Failure Hackers, reading about symptoms and workarounds – partly because it soothed her to see the pattern written down somewhere else.

She landed on a page called Symptoms of Project Failure.

The phrasing struck her:

Symptoms aren’t always dramatic. They’re often subtle signals that multiply.

Rina recognised ParkLight immediately.

Then she clicked something else:

Project Failure Workarounds 

The metaphor was painfully accurate:

A workaround is a bucket catching drips from a leak – useful, but finite.

Rina whispered, “We’ve built a plumbing system out of buckets.”


5) The Day the Bucket Overflowed

The breaking point came on a Monday at 08:06.

A junior Ops analyst named Jamie – new, conscientious, eager – followed the documented recon steps. The process had been “operationalised” now: a runbook, a checklist, and three pinned Slack messages.

Jamie ran the query. He pasted the results into the spreadsheet. The totals were off.

This wasn’t unusual. The spreadsheet existed because totals were off.

So he generated the balancing entry – a manual correction line – and posted it into the ledger system.

Then he sent the familiar message:

“Recon complete ✅”

By 09:20, Finance noticed something strange.

The ledger now balanced perfectly – but client settlement reports were wrong.

A big enterprise client’s report showed their payout total as £0.00, with a correction entry wiping it out.

At 09:45, Support escalated: the client had called, furious.

At 10:10, Compliance escalated: “Potential misstatement of settlement reporting.”

At 10:30, the CEO was pulled into a call.

At 10:43, Nadia walked into the Ops room and saw Jamie’s face.

He was white.

“I followed the runbook,” he said quietly.

Nadia looked at the spreadsheet and felt her stomach drop.

The spreadsheet had been updated the previous week by someone in Finance to accommodate a new client rule – and the macro now mapped the wrong account code for certain settlement types.

A single cell reference shift.
A single hidden assumption.
A single quiet change.

And the entire balancing system had just posted a correction entry that nullified a client payout.

It wasn’t fraud. It wasn’t incompetence. It was the natural outcome of building a core control mechanism out of something never designed to be one.

The workaround had become the system.

And the system had failed.


6) The First Response: Blame

By lunchtime, the crisis room was full.

CEO. CTO. Finance Director. Compliance. Ops. Support. Delivery.

The first instinct was predictable.

“Who changed the spreadsheet?”

Finance pointed at Ops. Ops pointed at Finance. Engineering pointed at “process.” Compliance pointed at everyone.

Jamie sat silently, crushed.

Nadia felt anger rising – not at Jamie, but at the fragility of the whole arrangement.

Rina, watching the blame begin to crystallise, interrupted.

“We’re not doing this,” she said.

The room paused.

She took a breath and said something that surprised even her:

“We need a blameless incident review.”

Colin, the CTO, scoffed. “This isn’t an incident. This is…”

“This is exactly an incident,” Rina replied. “A system failure. And we’re about to punish the wrong people.”

She pulled up a page on the screen:

How to Conduct a Blameless Incident Review 

Then she turned to the CEO.

“If we don’t learn properly, we’ll repeat this. In fintech, repeating is lethal.”

The CEO nodded slowly.

“Fine,” she said. “No blame. Find the truth.”

Jamie exhaled – the first breath he’d taken in an hour.


7) The Timeline That Changed the Conversation

They started with facts, not judgement.

Rina ran the review like a facilitator. She asked for a timeline:

  • What happened?
  • When did we first notice?
  • What signals were present earlier?
  • What assumptions shaped our choices?

As the timeline formed, a pattern emerged:

The spreadsheet workaround had become “normal operations.”
No one had formal ownership.
Changes were made quietly by whoever needed them.
Testing was minimal.
Controls were informal.
Training was tribal.

At one point, Compliance asked Nadia:

“Is this spreadsheet listed as a key financial control?”

Nadia hesitated.

“It’s… it’s not listed as anything,” she said.

The room went silent again – but this silence was different. It wasn’t avoidance. It was recognition.

They had discovered something terrifying:

A core control mechanism was invisible.


8) Naming the Workaround as a Workaround

After the incident review, Nadia sent a message to the wider leadership team:

“We need to talk about workarounds.”

Not “the spreadsheet.” Not “the recon process.” Not “the macro.”

Workarounds – as a category.

She linked the Failure Hackers page directly:

Project Failure Workarounds

Then she wrote:

“This was supposed to be temporary. It became permanent. That isn’t a people problem. It’s a system problem.”

She included a second link:

Project Failure Root Causes

Because she suspected the spreadsheet wasn’t the root cause. It was the symptom-management mechanism – the bucket – and the root cause lived elsewhere.

The CEO replied within minutes:

“Agreed. Let’s find the root cause.”


9) What Was the Root Cause, Really?

In the next workshop, Rina asked the group to avoid the easy answer:

The easy answer: “The spreadsheet is bad.”
The real question: “Why did we need it?”

They started peeling layers:

  • Why did the ledger mismatch happen originally?
  • Why did we ship with a known discrepancy?
  • Why did we treat daily manual balancing as acceptable?
  • Why did the temporary fix never get removed?
  • Why did it keep expanding?

At first, it sounded like “technical debt.”

But as they dug, a deeper root cause appeared.

The root cause was not a bug.
It was not a spreadsheet.
It was not Jamie.

It was the decision structure and the incentive structure.

ParkLight rewarded:

  • shipping on time
  • satisfying sales promises
  • keeping dashboards green
  • avoiding delays

ParkLight did not reward:

  • slowing down to fix systemic integrity issues
  • surfacing hidden risks
  • investing in non-visible control work
  • saying “No” to unrealistic commitments

In other words:

The system rewarded buckets.
Not plumbing repairs.


10) The Workaround Inventory

Rina suggested something bold:

“Let’s list every workaround we run.”

Everyone laughed nervously.

“How many could there be?” someone asked.

Nadia replied, flatly: “More than you think.”

They created a shared doc called “Workaround Register” and, for two weeks, asked every team:

What are you doing manually because the system doesn’t support it?

The list grew fast:

  • daily recon spreadsheet
  • manual settlement correction entries
  • “special handling” for one client’s chargebacks
  • weekly data cleanup script run by Support
  • manual toggling of feature flags for specific tenants
  • copy-paste compliance reporting from logs
  • manual approval of refunds above a threshold
  • operational “black book” of exceptions

By the end, they had 43 workarounds.

Some were small. Some were enormous.

But every single one was a signal.

Failure Hackers had described this precisely:

If you end up with too many workarounds, you risk them failing when you least need it. 

Nadia stared at the list and felt something she hadn’t expected: relief.

Because once you name it, you can see it.
And once you can see it, you can change it.


11) Rebuilding the Real System

They decided to treat the workaround register like an engineering backlog – but one not owned by engineering alone.

For each workaround they asked:

  • What symptom does it address?
  • What cause creates that symptom?
  • What might be the root cause?
  • What’s the risk of workaround failure?
  • Who owns the decision to remove it?
  • What would “removal” even mean?

They used the incident as a forcing function to rebuild properly.

Three actions emerged:

A) Promote Critical Workarounds Into Formal Controls

For high-risk workarounds, they either:

  • formalised them as proper controls with ownership, testing, and audit evidence; or
  • replaced them quickly with code/system changes.

No more invisible control mechanisms.

B) Remove Workarounds at Source

For the recon mismatch, engineering finally addressed the original settlement logic fault and introduced automated reconciliation with immutable audit logs.

It took six weeks of painful refactoring. But when it shipped, the daily spreadsheet ritual ended.
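
What “automated reconciliation with immutable audit logs” looks like in its simplest form is sketched below – hedged, with hypothetical names and an append-only file standing in for proper immutable storage. The important design choice is the opposite of the spreadsheet’s: every check is recorded, and a mismatch is flagged for a human decision rather than quietly balanced away.

    # A hedged sketch of the replacement's shape – names, storage, and the
    # append-only log are illustrative assumptions, not the real build.
    import json
    from datetime import datetime, timezone
    from decimal import Decimal

    def reconcile(client_id: str,
                  ledger_total: Decimal,
                  settlement_total: Decimal,
                  audit_log_path: str = "recon_audit.jsonl") -> bool:
        """Record every check in an append-only log; flag mismatches, never 'fix' them."""
        diff = settlement_total - ledger_total
        record = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "client_id": client_id,
            "ledger_total": str(ledger_total),
            "settlement_total": str(settlement_total),
            "diff": str(diff),
            "status": "OK" if diff == 0 else "DISCREPANCY",
        }
        # Append-only: the log is only ever added to, never edited in place.
        with open(audit_log_path, "a") as log:
            log.write(json.dumps(record) + "\n")
        return diff == 0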

Nadia printed the old spreadsheet and pinned it on the wall like a trophy and a warning.

C) Create “Workaround Exit Criteria”

Every time a new workaround was proposed, it required:

  • a named owner
  • an expiry date
  • a measurable exit condition
  • a risk rating
  • an escalation path if it persisted

If the workaround couldn’t meet those conditions, it wasn’t allowed.
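
Captured as structured data rather than free text, an entry meeting those criteria might look roughly like the sketch below – the field names are illustrative, not a template ParkLight actually used.

    # A rough sketch of a register entry carrying the exit criteria
    # (field names are illustrative assumptions).
    from dataclasses import dataclass
    from datetime import date

    @dataclass
    class WorkaroundEntry:
        name: str
        owner: str               # a named owner
        expiry: date             # an expiry date
        exit_condition: str      # a measurable exit condition
        risk_rating: str         # e.g. "low" / "medium" / "high"
        escalation_path: str     # who is told if it persists past expiry

        def is_overdue(self, today: date) -> bool:
            """Past expiry, the workaround is escalated – not quietly renewed."""
            return today > self.expiry

An overdue entry then has an unambiguous next step: the escalation path, not another extension.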

At first, engineers complained.

Then they noticed something: fewer emergencies, fewer late nights, fewer surprise client issues.

The company became… calmer.


12) The Moment of Quiet Pride

Six months later, Jamie – the junior Ops analyst who had triggered the incident – walked into Nadia’s office.

“I wanted to say sorry,” he said.

Nadia looked at him, surprised.

“Sorry for what?”

“For… you know. The spreadsheet thing.”

Nadia shook her head.

“That wasn’t your fault.”

Jamie looked unconvinced.

Nadia leaned forward.

“That incident did something important,” she said. “It revealed the truth. And once we saw the truth, we fixed the system.”

Jamie blinked, processing.

“So… I helped?”

Nadia smiled.

“Yes,” she said. “You did. You were the signal.”


Reflection: How Workarounds Become Root Causes

Workarounds are not bad by default. In fact, they often prevent immediate harm.

But when a workaround is allowed to persist without ownership, expiry, and redesign, it becomes:

  • a hidden dependency
  • an invisible system
  • a fragile control mechanism
  • a source of new failure

Failure Hackers frames workarounds as interim fixes that should only exist until a permanent resolution is implemented. 

When a business accumulates too many workarounds, it increases the risk of them failing at the worst possible moment. 

The practical takeaways from this story:

  1. Treat workarounds as signals, not solutions.
  2. Maintain a workaround register as a living risk map.
  3. Require exit criteria for any workaround introduced.
  4. Make invisible controls visible or replace them.
  5. Go hunting for root causes, not just symptom relief: Project Failure Root Causes
  6. Use blameless learning practices after failure: How to Conduct a Blameless Incident Review

Author’s Note

This story is built around a pattern that appears in every sector, but is especially dangerous in regulated environments: the normalisation of temporary fixes.

Workarounds feel safe because they create immediate stability. But stability achieved through invisible manual effort is not resilience – it’s deferred risk.

If you recognise your organisation in this story, the goal isn’t to eliminate every workaround overnight. It’s to make them visible, reduce them deliberately, and stop them breeding in silence.
