CUCM DRF Failures

Posted: 16th January 2012 by Rob in Cisco, CUCM

Last week a RAID controller failed in one of our Cisco C200 UCS servers.  It happens.  At first it seemed like a mere annoyance: it took down two of our CUCM subscribers and our TFTP server.  Phones failed over to another subscriber, and our DR TFTP server started serving TFTP requests.  After a quick call to TAC, Cisco had a replacement RAID controller to us in about two hours.  We powered the server down, replaced the controller, and we were on our merry way.  Or so we thought…

The next morning we started getting alerts from RTMT that our backups were failing for two of the CUCM servers affected by the outage.  I tried the usual process of restarting the DRF services on those two subscribers and kicking off a manual backup.  Failure.  After about 15 minutes of working with TAC, they had it pegged: a corrupt file system on these servers.  The solution?  The CUCM recovery CD.  A quick fsck with the recovery CD and all was right with the world again.

Here is the error we were getting:

Reason : DRF was unable to backup component PLATFORM.Error : Unknown PLATFORM Error
AppID : Cisco DRF Master
ClusterID :
NodeID : ahpucm1020
TimeStamp : Fri Jan 13 09:13:53 CST 2012.
The alarm is generated on Fri Jan 13 09:13:53 CST 2012.
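
If you want to pick this alert out of saved alert text automatically, here is a rough Python sketch. It is only an illustration under assumptions: the helper names parse_alert and is_drf_platform_failure are made up for this example, and it assumes the alert arrives as plain "Key : value" lines exactly like the ones above.

```python
# Sample alert text copied from the error above.
ALERT = """\
Reason : DRF was unable to backup component PLATFORM.Error : Unknown PLATFORM Error
AppID : Cisco DRF Master
ClusterID :
NodeID : ahpucm1020
TimeStamp : Fri Jan 13 09:13:53 CST 2012.
The alarm is generated on Fri Jan 13 09:13:53 CST 2012.
"""


def parse_alert(text):
    """Split 'Key : value' lines into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        # Split on the first ' :' only, so the value may itself contain ' :'.
        key, sep, value = line.partition(" :")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def is_drf_platform_failure(fields):
    """True when the DRF Master reports a failed PLATFORM component backup."""
    reason = fields.get("Reason", "")
    return (
        fields.get("AppID") == "Cisco DRF Master"
        and "component PLATFORM" in reason
    )


if __name__ == "__main__":
    fields = parse_alert(ALERT)
    if is_drf_platform_failure(fields):
        print("DRF PLATFORM backup failure on node", fields.get("NodeID"))
```

The trailing "The alarm is generated on" line has no " :" separator, so the parser simply skips it.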

Hope this helps someone who runs across this error in the future!