In this article, we propose an online fault detection and avoidance framework for distributed multi-agent systems. The main premise of our work follows recent notions in autonomic and recovery-oriented computing: not all faults can be identified and removed during system design and testing. Some level of intelligence must therefore be embedded into each agent's controller to ensure a higher degree of system dependability. We assume that faults will eventually translate into a time-out condition in one or more agents. The proposed paradigm is illustrated for the case of unknown deadlock conditions in manufacturing applications.
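
The time-out assumption can be illustrated with a minimal sketch. It is not the framework proposed in this article; it only shows one way an agent's controller might flag a pending operation that has exceeded a time-out. The class `TimeoutMonitor`, its methods, the agent names, and the 5-second threshold are all hypothetical, and timestamps are passed in explicitly so that the example is deterministic.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TimeoutMonitor:
    """Hypothetical helper: flags agents whose pending operation exceeds a time-out."""

    timeout: float  # assumed time-out threshold, in seconds
    started: Dict[str, float] = field(default_factory=dict)

    def begin(self, agent: str, now: float) -> None:
        # Record the time at which the agent started waiting on an operation.
        self.started[agent] = now

    def complete(self, agent: str) -> None:
        # The operation finished, so the agent is no longer pending.
        self.started.pop(agent, None)

    def timed_out(self, now: float) -> List[str]:
        # Agents still pending for longer than the threshold are fault suspects.
        return sorted(
            agent for agent, t0 in self.started.items() if now - t0 > self.timeout
        )


# Illustrative agent names; not taken from the article.
monitor = TimeoutMonitor(timeout=5.0)
monitor.begin("robot_1", now=0.0)
monitor.begin("conveyor_2", now=1.0)
monitor.complete("conveyor_2")
suspects = monitor.timed_out(now=6.0)
```

In this sketch a time-out only marks an agent as a suspect. Deciding whether the underlying cause is a deadlock, and how to avoid it in the future, would be left to the detection and avoidance logic of the framework.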