
You are currently browsing the PolyMicro Systems weblog archives for the 'Personal' category.

Macworld | How the iPhone is killing the ‘Net

Will the Indies soon be out of a job?

According to this article, they very well could be if the endpoints of the Net become closed systems.


macosxhints.com - 10.5: Revert Help Viewer to 10.4-like behavior

Yet another article on getting Help Viewer back to its normal behavior.


Joel on Software

Something to consider

Joel on Software

To reach this sweet spot, we borrowed an idea from Sakichi Toyoda, the founder of Toyota. He calls it Five Whys. When something goes wrong, you ask why, again and again, until you ferret out the root cause. Then you fix the root cause, not the symptoms.

Since this fit well with our idea of fixing everything two ways, we decided to start using five whys ourselves. Here’s what Michael came up with:

  • Our link to Peer1 NY went down
  • Why? – Our switch appears to have put the port in a failed state
  • Why? – After some discussion with the Peer1 NOC, we speculate that it was quite possibly caused by an Ethernet speed / duplex mismatch
  • Why? – The switch interface was set to auto-negotiate instead of being manually configured
  • Why? – We were fully aware of problems like this, and have been for many years.  But - we do not have a written standard and verification process for production switch configurations.
  • Why? – Documentation is often thought of as an aid for when the sysadmin isn’t around or for other members of the operations team, whereas, it should really be thought of as a checklist.

“Had we produced a written standard prior to deploying the switch and subsequently reviewed our work to match the standard, this outage would not have occurred,” Michael wrote. “Or, it would occur once, and the standard would get updated as appropriate.”

After some internal discussion we all agreed that rather than imposing a statistically meaningless measurement and hoping that the mere measurement of something meaningless would cause it to get better, what we really needed was a process of continuous improvement. Instead of setting up a SLA for our customers, we set up a blog where we would document every outage in real time, provide complete post-mortems, ask the five whys, get to the root cause, and tell our customers what we’re doing to prevent that problem in the future. In this case, the change is that our internal documentation will include detailed checklists for all operational procedures in the live environment.
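The fix Michael describes is a written standard plus a verification step, used as a checklist rather than as reference documentation. The sketch below shows what such a checklist might look like if it were automated. Everything in it is a hypothetical illustration, not Fog Creek's actual process: the `STANDARD` values (1000 Mb/s, full duplex), the `PortConfig` record, the port names, and the `check_port`/`audit` helpers are all assumptions made for the example. The only idea taken from the post is that speed and duplex are pinned explicitly instead of being left to auto-negotiation.

```python
from dataclasses import dataclass

# Hypothetical written standard for production uplink ports (assumed values):
# speed and duplex must be set manually, never left on "auto".
STANDARD = {"speed": "1000", "duplex": "full"}


@dataclass
class PortConfig:
    """Hypothetical record of one switch port's configured settings."""
    name: str
    speed: str   # e.g. "1000" or "auto"
    duplex: str  # e.g. "full", "half" or "auto"


def check_port(port: PortConfig, standard: dict[str, str]) -> list[str]:
    """Return one checklist failure message per setting that deviates from the standard."""
    failures = []
    for setting, required in standard.items():
        actual = getattr(port, setting)
        if actual != required:
            failures.append(
                f"{port.name}: {setting} is {actual!r}, standard requires {required!r}"
            )
    return failures


def audit(ports: list[PortConfig], standard: dict[str, str]) -> list[str]:
    """Run the checklist over every port and collect all failures."""
    return [msg for port in ports for msg in check_port(port, standard)]


if __name__ == "__main__":
    # Hypothetical inventory: one port left on auto-negotiation, one configured to standard.
    ports = [
        PortConfig("uplink-peer1", speed="auto", duplex="auto"),
        PortConfig("uplink-backup", speed="1000", duplex="full"),
    ]
    for failure in audit(ports, STANDARD):
        print(failure)
```

Running the audit before a switch goes into production would flag the auto-negotiating port, which is the review step the post says was missing. When a new failure mode turns up, the standard dictionary is the single place that gets updated, matching Michael's point that the standard "would get updated as appropriate."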