Maintenance, repair & spares economy

capability Semi-native manufacturing

TRL Mars

—

Energy intensity

—

Required by

Requires

The capability of keeping the settlement's machines running across 26-month resupply gaps with no instant supply chain — through reliability engineering, repair, remanufacturing, and provisioned spares. It turns every component's failure rate into a survival calculation: carry a spare, make one locally, repair the failed unit, or lose the function. The governing trade is the spares-mass problem — reliability alone cannot guarantee a long Mars mission, so in-situ repair and manufacturing must cover the gap.

Last reviewed: 2026-06-14

Governing equations

A = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}

Availability is mean time between failures (MTBF) divided by the sum of MTBF and mean time to repair (MTTR). For life-critical systems A must approach 1, which on Mars means either very high MTBF (heavy and expensive) or fast local repair (low MTTR), and usually both. ^[1]
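As a minimal sketch of the availability relation above, the following Python computes A from MTBF and MTTR and inverts the formula to find the longest repair time that still meets a target. The function names and the 10,000 h MTBF are illustrative assumptions, not values from the source; the 1 h and 72 h repair times are the ends of the local mean-time-to-repair range listed in the operating envelope.

```python
def availability(mtbf_h: float, mttr_h: float) -> float:
    """Steady-state availability A = MTBF / (MTBF + MTTR), both in hours."""
    if mtbf_h <= 0 or mttr_h < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_h / (mtbf_h + mttr_h)


def max_mttr_for_target(mtbf_h: float, target_a: float) -> float:
    """Longest MTTR (hours) that still meets target_a.

    Rearranged from A = MTBF / (MTBF + MTTR): MTTR = MTBF * (1 - A) / A.
    """
    if not 0 < target_a <= 1:
        raise ValueError("target availability must be in (0, 1]")
    return mtbf_h * (1 - target_a) / target_a


if __name__ == "__main__":
    mtbf = 10_000.0  # assumed MTBF in hours, for illustration only
    for mttr in (1.0, 72.0):
        print(f"MTTR {mttr:5.1f} h -> A = {availability(mtbf, mttr):.5f}")
    print(f"MTTR limit for A = 0.999: {max_mttr_for_target(mtbf, 0.999):.1f} h")
```

With these assumed numbers, a 72 h repair leaves A near 0.993, below the 0.999 floor, while a 1 h repair reaches about 0.9999. This is the sense in which fast local repair substitutes for ever-higher MTBF.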

P(k \text{ failures in } t) = \frac{(\lambda t)^{k} e^{-\lambda t}}{k!}

Poisson statistics size the spares: given a failure rate λ over the resupply interval t, they give the number of spares needed for an acceptable probability of not running out. The 26-month t makes the required inventory large. ^[2]
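The sizing rule can be sketched as a short search: find the smallest stock n for which the Poisson probability of at most n failures reaches a chosen confidence. The function names, the 730 h average month, the failure rate, and the confidence levels below are all assumptions for illustration; the sketch also assumes a constant failure rate and no repair of failed units.

```python
import math

HOURS_PER_MONTH = 730.0  # assumed average month length (365 * 24 / 12)


def poisson_cdf(n: int, expected_failures: float) -> float:
    """P(K <= n) for a Poisson count with the given mean (lambda * t)."""
    term = math.exp(-expected_failures)  # P(K = 0)
    total = term
    for k in range(1, n + 1):
        term *= expected_failures / k    # P(K = k) from P(K = k - 1)
        total += term
    return total


def spares_needed(failure_rate_per_h: float, interval_h: float, confidence: float) -> int:
    """Smallest spare count whose no-stock-out probability meets the confidence."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    expected = failure_rate_per_h * interval_h
    if expected > 700:
        raise ValueError("sketch only handles modest expected failure counts")
    n = 0
    while poisson_cdf(n, expected) < confidence:
        n += 1
    return n


if __name__ == "__main__":
    gap_h = 26 * HOURS_PER_MONTH  # resupply interval in hours
    rate = 1e-4                   # assumed failure rate per hour
    for conf in (0.90, 0.99, 0.999):
        print(f"confidence {conf}: {spares_needed(rate, gap_h, conf)} spares")
```

Raising the confidence level or lengthening the interval both push the required stock up, which is how the 26-month gap inflates the inventory.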

m_{\text{spares}} \propto \lambda \, t_{\text{resupply}} \, m_{\text{part}} \;\Rightarrow\; \text{repair beats stockpiling}

Spare mass scales with failure rate × resupply gap × part mass — for a long mission this grows beyond what reliability alone can carry, which is the quantitative case for in-situ repair and manufacturing over pure stockpiling. ^[2]
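To make the proportionality concrete, here is a rough sketch that sums λ · t · m over a parts list and then discounts the share that could be remade locally. It is an expected-value estimate only, with no confidence margin. It assumes local remanufacture removes a uniform share of the spares mass and ignores feedstock and tooling mass. The function name, the parts tuple layout, and all numbers are hypothetical.

```python
from typing import Iterable, Tuple

# (failure rate per hour, part mass in kg, installed count); layout is an assumption
Part = Tuple[float, float, int]


def expected_spares_mass_kg(
    parts: Iterable[Part],
    interval_h: float,
    local_fraction: float = 0.0,
) -> float:
    """Expected spares mass over one resupply interval.

    Sums lambda * t * m over all installed units, then removes the fraction
    assumed to be repaired or remade locally instead of shipped as spares.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be in [0, 1]")
    stockpile_only = sum(
        rate * interval_h * mass * count for rate, mass, count in parts
    )
    return stockpile_only * (1.0 - local_fraction)


if __name__ == "__main__":
    pumps = [(1e-4, 2.0, 10)]  # hypothetical: ten 2 kg pumps
    gap_h = 20_000.0           # assumed interval in hours
    for f in (0.0, 0.4, 0.9):
        mass = expected_spares_mass_kg(pumps, gap_h, f)
        print(f"local fraction {f:.1f}: {mass:.1f} kg of spares")
```

The 0.4 and 0.9 cases mirror the ends of the locally-remanufacturable range listed under key constants, so the sketch shows how much imported mass local repair could displace under these simplifications.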

\text{commonality}: \quad \text{part-number count} \downarrow \;\Rightarrow\; \text{spares mass} \downarrow

Standardizing on fewer part types lets one spare cover many uses (pooling), cutting total inventory — design commonality is a first-order spares-mass lever. ^[2]
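The pooling effect can be illustrated with the same Poisson sizing: two distinct part types each need their own safety stock, while one common part serving both uses needs less in total. This is a sketch under assumed numbers (2 expected failures per type over the gap, 99 % confidence) with a hypothetical helper name. Note that stocking each type separately at 99 % gives a slightly lower joint confidence than the pooled case, so the comparison is indicative rather than exact.

```python
import math


def spares_for_confidence(expected_failures: float, confidence: float) -> int:
    """Smallest n with Poisson P(K <= n) >= confidence for the given mean."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    if not 0 <= expected_failures <= 700:
        raise ValueError("sketch only handles modest expected failure counts")
    term = math.exp(-expected_failures)  # P(K = 0)
    cumulative = term
    n = 0
    while cumulative < confidence:
        n += 1
        term *= expected_failures / n    # P(K = n) from P(K = n - 1)
        cumulative += term
    return n


if __name__ == "__main__":
    per_type = 2.0     # assumed expected failures per part type over the gap
    confidence = 0.99  # assumed no-stock-out target

    # Two unique part numbers: each carries its own safety stock.
    separate = 2 * spares_for_confidence(per_type, confidence)
    # One common part number: demand pools into a single Poisson stream.
    pooled = spares_for_confidence(2 * per_type, confidence)

    print(f"separate part numbers: {separate} spares")
    print(f"one common part:       {pooled} spares")
```

Pooling helps because the safety margin above the mean grows more slowly than the mean itself, so merged demand needs proportionally less buffer.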

Key constants & quantities

Quantity	Value	Units	Conditions	Description
Resupply interval	26	months	launch-window-locked	The gap a colony must survive without resupply: the t that drives the entire spares calculation, fixed by orbital mechanics.^[2]
Life-critical availability	0.999–0.99999	A (dimensionless)	target	Availability required of life-support and other life-critical functions; extreme enough to demand redundancy plus fast repair.^[1]
Spares mass fraction	10–40	% of system mass	over mission life, stockpile-only strategy	Order-of-magnitude spares mass a long Mars mission must carry; large enough that repair and in-situ manufacturing (ISM) are worth heavy investment.^[2]
Repair vs replace crossover	1	n/a	favors repair as λ·t·m grows	The point where repairing/remanufacturing beats stocking spares, reached quickly on Mars because t (26 months) is large.^[2]
Locally-remanufacturable fraction	40–90	% of failed parts	with machine tools + AM	Share of failures a colony with machining and additive manufacturing can repair or remake; the rest (chips, membranes, catalysts) need spares.^[3]

Operating envelope

Parameter	Range	Units	Source
Resupply interval	26 (fixed)	months	^[2]
Life-critical availability	0.999 – 0.99999	A	^[1]
Spares mass fraction	10 – 40	% mission mass	^[2]
Locally-remanufacturable	40 – 90	% of failures	^[3]
Mean-time-to-repair (local)	1 – 72	h (with local capability)	^[1]

Mass balance

Basis: sustaining the operating base across one 26-month resupply gap (capability)

Inputs

Provisioned spares (imported + local)	1	sized to λ·t	^[2]
Repair/remanufacturing capability	1	machine tools + AM + diagnostics	^[3]
Failed components (feedstock)	1	recycled/repaired	^[1]

Provisioned spares (imported + local): Critical irreplaceables (chips, membranes, catalysts, seals) stocked deep; commodity parts made locally.
Repair/remanufacturing capability: Machining, metal/polymer printing, electronics rework, welding — the local fix-it base.
Failed components (feedstock): Broken parts are repaired, remanufactured, or recycled to material — not waste.

Outputs

Sustained system availability	1	A → target	^[1]

Sustained system availability: Functions kept running through the gap; life-critical systems above their availability targets.

TRL · Earth

9 / 9

TRL · Mars

5 / 9

Reliability engineering and spares provisioning are mature, and the ISS proves long-duration maintenance with limited resupply (crews repair and replace constantly). In-space manufacturing has flown (3D printers on the ISS). The Mars-specific gap is full 26-month autonomy with deep local repair/remanufacturing at settlement scale, which has been demonstrated in pieces but not as a whole.^[2]

Energy budget

0 kWh_e attributed to the capability itself; its energy use is counted in the machine tools, AM, and recycling it relies on ^[1]

Maintenance itself is not energy-heavy, but it is the multiplier on every other node's value: a reactor or chemistry plant that can't be kept running is worthless. Availability, not nameplate capacity, is what actually delivers over a mission.

Variants & trade-offs

Reliability + provisioned spares (baseline)^[2]

Design for high MTBF and carry Poisson-sized spares for the resupply gap — the conventional approach, extended to Mars timescales.

Materials: High-reliability components · Spare inventory · Inventory management

Predictable; proven; simple for irreplaceable parts (chips, membranes, catalysts)
No local manufacturing needed for the stocked items

Spares mass grows large over a long mission; can't anticipate every failure
Obsolescence and inventory management overhead

When preferred: Irreplaceable/critical parts and early missions before local repair matures.

In-situ repair & remanufacturing^[3]

Fix failed units and remake commodity parts locally with machining, additive manufacturing, welding, and electronics rework — repairing rather than stockpiling.

Materials: Machine tools + metal/polymer AM · Welding + electronics rework · Diagnostics + metrology

Slashes spares mass for the large fraction of parts that are locally makeable
Adapts to unanticipated failures; turns broken parts into feedstock

Can't remake chips, membranes, catalysts, or seeds; needs the manufacturing base and skills

When preferred: Commodity mechanical/structural parts; the core of a self-sufficient maintenance economy.

Redundancy + graceful degradation^[1]

Run parallel/redundant units and design systems to degrade gracefully, so a single failure doesn't stop a function while repair proceeds.

Materials: Redundant units · Fault-tolerant architecture · Health monitoring

Maintains availability through failures; buys time for repair
Essential for life-critical functions

Extra mass/complexity; redundant units can share common-mode failures

When preferred: Life-critical and single-point-of-failure systems (power, life support, key rotating equipment).
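A rough way to see both the benefit of redundancy and the limit set by common-mode failures is a beta-factor style approximation. It treats a fraction β of each unit's unavailability as shared by all units and the rest as independent. This model is an assumption brought in for illustration, not taken from the source, and the function name and numbers are hypothetical.

```python
def parallel_availability(unit_availability: float, n_units: int, beta: float = 0.0) -> float:
    """Approximate availability of n parallel units where any one suffices.

    Unit unavailability U = 1 - A is split into a common-mode share beta * U,
    which takes out every unit at once, and an independent share (1 - beta) * U,
    which must hit all n units together to stop the function.
    """
    if not 0.0 <= unit_availability <= 1.0:
        raise ValueError("unit_availability must be in [0, 1]")
    if n_units < 1:
        raise ValueError("n_units must be at least 1")
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must be in [0, 1]")
    u = 1.0 - unit_availability
    independent = ((1.0 - beta) * u) ** n_units
    common = beta * u
    return 1.0 - min(1.0, independent + common)


if __name__ == "__main__":
    a_unit = 0.99  # assumed availability of a single unit
    for n in (1, 2, 3):
        ideal = parallel_availability(a_unit, n)
        shared = parallel_availability(a_unit, n, beta=0.1)  # assumed 10 % common-mode share
        print(f"{n} unit(s): independent {ideal:.6f}, with common mode {shared:.6f}")
```

Under this approximation, adding units quickly stops helping once the common-mode term dominates, which is the quantitative argument for diverse rather than identical redundancy.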

Condition-based / predictive maintenance^[1]

Monitor equipment health (vibration, temperature, performance) to fix things before they fail and to avoid unnecessary teardowns.

Materials: Sensors + instrumentation · Analytics/prognostics

Catches failures early; optimizes spare use and downtime
Reduces both surprise failures and wasteful preventive swaps

Depends on the very electronics/sensors that are themselves import-dependent

When preferred: High-value rotating equipment (compressors, pumps, reactor systems) across the plant.

Failure modes

Mode	Cause	Detection	Mitigation
Spare stock-out of an irreplaceable part (safety-critical)^[2]	A critical un-makeable component (chip, membrane, catalyst, seal) fails more often than stocked, with no local substitute and 26 months to resupply.	Inventory vs failure-rate tracking; reorder-point alarms against the resupply gap.	Deep spares on irreplaceables, redundancy, design commonality/pooling, and develop local substitutes where possible.
Common-mode failure defeats redundancy^[1]	Redundant units fail together from a shared cause (same dust exposure, same bad batch, same software bug) — redundancy gives false confidence.	Failure correlation analysis; diverse-redundancy review.	Diverse (not identical) redundancy where feasible, separate failure causes, stagger maintenance and part lots.
Cascading failure during a maintenance backlog^[2]	Multiple concurrent failures (e.g. after a dust storm) overwhelm repair capacity; backlog grows and availability collapses.	Repair-queue and availability trending.	Maintenance capacity margin, prioritization by criticality, redundancy to buy time, surge repair procedures.
Skill / knowledge gap^[3]	The crew lacks the expertise to diagnose or repair a failure — the information-closure problem made concrete.	Skills-coverage audit; repair success rate.	Broad cross-training (the academy), repair documentation/automation, remote Earth expert support (across light-lag), retained local knowledge.
Recycling-loop contamination^[1]	Remanufacturing from failed parts reintroduces contaminants or degraded material, propagating defects into "repaired" components.	Material/quality verification of remanufactured parts.	Material testing, quality control on remanufacture, segregate recycle streams, witness testing.

Mars adjustments

Reliability alone is not enough^[2]

Impact: The central finding of Mars-logistics analysis: for 26-month-gap missions, no achievable component reliability avoids large spares mass — so in-situ repair and manufacturing are not optional extras but a core requirement.

Mitigation: Pair high reliability with deep local repair/remanufacturing and redundancy; budget spares with Poisson math against the gap.

Repair beats stockpiling for makeable parts^[3]

Impact: Because spare mass scales with the long resupply gap, the large fraction of parts that machining + AM can remake is far cheaper to repair locally than to ship and store as spares.

Mitigation: Invest in the machine-tools/AM/recycling base; reserve imported spares for the un-makeable (chips, membranes, catalysts, seeds).

Dust accelerates wear everywhere^[4]

Impact: Pervasive abrasive dust raises failure rates on seals, bearings, and mechanisms across the colony — λ is higher on Mars, inflating both spares and repair demand.

Mitigation: Dust-tolerant design, sealed mechanisms, and maintenance intervals set by dust exposure (the recurring lesson across nodes).

Availability, not capacity, is what survives^[1]

Impact: A plant's nameplate capacity is meaningless if it's down; on Mars, sustained availability of life-critical functions is the real figure of merit, and maintenance is what delivers it.

Mitigation: Design and operate to availability targets; redundancy + fast local repair for life-critical systems.

Maintenance is the multiplier on the whole tree^[2]

Impact: Every other node's value is conditional on being kept running. A reactor, chemistry plant, or life-support system that can't be maintained for 26 months is not an asset — maintenance is what turns capability into sustained capability.

Mitigation: Treat the maintenance/repair/spares economy as foundational infrastructure, co-equal with the production nodes it sustains.

Alternatives & substitutes

Frequent resupply (Earth as the warehouse)^[2]

No local repair needed; just ship replacements

26-month gap makes this impossible for anything that fails between windows; massive cargo cost

When preferred: Never sufficient alone on Mars; supplements, not replaces, local capability.

Extreme reliability / overdesign^[2]

Fewer failures to fix; longer MTBF reduces spares

Reliability alone is provably insufficient for Mars-class durations; overdesign adds mass; surprises still happen

When preferred: Combined with repair and spares — not as a sole strategy (Owens' central finding).

Design for disposability + bulk spares^[1]

Simple modules swapped wholesale rather than repaired

High spares mass; wasteful of irreplaceable content

When preferred: Cheap, locally-makeable modules; not for high-value imported assemblies.

References

[1] O'Connor, P. D. T., & Kleyner, A. (2012). Practical Reliability Engineering, 5th Edition. Wiley. ISBN 978-0-470-97981-5. Reliability engineering fundamentals: failure rates, MTBF, availability, maintainability, and spares provisioning.
[2] Owens, A. C., & de Weck, O. L. (2015). Limitations of reliability for long-endurance human spaceflight. AIAA SPACE 2015 Conference, AIAA 2015-4611. doi:10.2514/6.2015-4611. Quantifies the spares-mass problem for Mars-class missions: the 26-month resupply gap drives large spare inventories or in-situ repair/manufacturing.
[3] Freitas, R. A., & Merkle, R. C. (2004). Kinematic Self-Replicating Machines. Landes Bioscience. ISBN 978-1-57059-690-2. The definitive survey of self-replication theory and engineering: replication closure, the closure-fraction metric, and feedstock/parts/information closure.
[4] Gaier, J. R., Ellis, S., & Hanks, N. C. (2002). Aeolian removal of dust types from photovoltaic surfaces on Mars. NASA Glenn Research Center, NASA/TM-2002-211837. Mars dust deposition and removal mechanisms on optical and radiator surfaces; α_s and ε degradation rates.
