The importance of data security highlights Daewoo's commitment to improving storage efficiency

Story One: Lessons from a "Bloody Case" Triggered by Data Loss

Time: Summer 2006; Venue: Suzhou; Cause: A city-level monitoring project used a hybrid analog-digital approach: front-end analog cameras were connected to encoders, and the encoded video was uploaded through video platform software to a monitoring center for centralized management, control, and storage. The city adopted a multi-level network, with each administrative district acting as a sub-center and a unified city-level monitoring center managing all of the sub-centers. Each sub-center recorded video 24/7, using IP SAN storage. The sub-center involved in this incident was the High-Tech Zone Center.

One day, at the High-Tech Zone Center, the system integrator's on-site staff found during a routine inspection that video for a certain time period could not be found at all, and that the iSCSI volume mounted on the server had also disappeared. Realizing that the situation was serious, they immediately contacted their superiors and the manufacturer to request on-site support.

After everyone had arrived and checked each device one by one, the problem was traced to the disk array. Within the overall system design and equipment selection, the safety and reliability of the disk array configuration may not have been the very best, but it was certainly among the better choices. Yet the problem lay in precisely this equipment, and one can imagine how much pressure rested on the equipment manufacturer and the system integrator at the time.

Investigation showed that one hard disk in the iSCSI disk array had failed. The hot spare took over and the array was rebuilt, but the failed disk was not replaced in time. As a result, when further hard disks failed in succession, there was no spare left to rebuild onto; two disks ended up offline and the RAID 5 array was damaged. At this point the facts were clear, but the deeper causes still needed to be explored.
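The failure sequence can be sketched as a toy state machine. This is a minimal illustration, not a model of the actual device: the class `Raid5Array`, its methods, and the assumption that every rebuild finishes before the next failure are all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Raid5Array:
    """Toy RAID 5 group with a pool of hot spares (illustrative only)."""
    hot_spares: int
    state: str = "optimal"  # "optimal", "degraded", or "failed"

    def fail_disk(self) -> str:
        """Record one member-disk failure and return the resulting state."""
        if self.state == "failed":
            return self.state
        if self.state == "degraded":
            # RAID 5 tolerates only one missing member: a second loss is fatal.
            self.state = "failed"
        elif self.hot_spares > 0:
            # Spare takes over; the rebuild is assumed to complete instantly.
            self.hot_spares -= 1
        else:
            # No spare left: the array keeps running without redundancy.
            self.state = "degraded"
        return self.state

    def replace_bad_disk(self) -> str:
        """Insert a fresh disk in place of a failed one."""
        if self.state == "degraded":
            self.state = "optimal"  # new disk is rebuilt into the array
        elif self.state == "optimal":
            self.hot_spares += 1    # new disk becomes the next hot spare
        return self.state


def simulate(failures: int, replace_promptly: bool) -> List[str]:
    """Apply successive disk failures to an array that starts with one spare."""
    array = Raid5Array(hot_spares=1)
    history = []
    for _ in range(failures):
        history.append(array.fail_disk())
        if replace_promptly:
            array.replace_bad_disk()
    return history


print(simulate(3, replace_promptly=False))  # ['optimal', 'degraded', 'failed']
print(simulate(3, replace_promptly=True))   # ['optimal', 'optimal', 'optimal']
```

With prompt replacement the same three failures never leave the array unprotected, which is the point the incident makes.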

First, the apparent cause of the accident was hard disk failure. However, hard disks are "consumables" in large-capacity data storage. Disk failure is inevitable; what matters is replacing failed disks in time. What makes a replacement "timely"? That depends on the device's alarm mechanism. Equipment maintenance usually combines active and passive approaches. The active approach relies on the equipment itself, which needs a comprehensive warning mechanism that reports faults, problems, and events. The passive approach relies on manual intervention: a complete inspection routine that finds hidden troubles and points of failure in time and fixes them. Combining the two keeps the system running stably. This accident shows that the sub-center had problems with equipment maintenance. The device's alarm mechanism was inadequate, so the disk failure did not raise a timely alarm and did not get the administrators' attention. The consumed hot spare was not replenished, and the device did not report this hidden risk to the administrator, which left a period with no hot spare available and sent the whole incident toward the abyss of RAID damage.
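As a rough sketch of what an active warning rule might check, the snippet below flags not only failed disks but also the "no hot spare" condition that went unreported here. The `ArrayStatus` fields and the alert wording are assumptions made for illustration, not the interface of any real array.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ArrayStatus:
    """Hypothetical snapshot of a disk array as a monitoring agent might see it."""
    failed_disks: int          # failed disks not yet physically replaced
    hot_spares_available: int  # idle spares ready to take over
    raid_state: str            # "optimal", "degraded", or "failed"


def collect_alerts(status: ArrayStatus) -> List[str]:
    """Return the alerts an active warning mechanism should raise."""
    alerts = []
    if status.failed_disks > 0:
        alerts.append(
            f"WARNING: {status.failed_disks} failed disk(s) awaiting replacement"
        )
    if status.hot_spares_available == 0:
        # The hidden risk in this incident: the array looks healthy,
        # but the next failure cannot be absorbed by a spare.
        alerts.append("WARNING: no hot spare available")
    if status.raid_state == "degraded":
        alerts.append("CRITICAL: RAID is running without redundancy")
    elif status.raid_state == "failed":
        alerts.append("CRITICAL: RAID has failed")
    return alerts


# State right after the hot spare took over: RAID is optimal, yet two warnings fire.
for alert in collect_alerts(ArrayStatus(1, 0, "optimal")):
    print(alert)
```

In practice these alerts would be pushed to administrators over a channel such as email or syslog, and the same checks would be repeated by hand during routine inspections.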

Second, the center had problems with its choice of equipment. In general, to ensure stability and reliability, centralized storage should use professional products with a dedicated controller architecture and a modular design, so that there is no single point of failure. Such devices manage hard disks and RAID through dedicated hardware chips and can upload active alarms in various ways (such as SNMP, Email, Syslog, Windows Messenger, etc.), so that events and device status are presented in the system promptly and give maintenance work the timeliest help. By contrast, the sub-center's equipment offered only a single alarm channel for hard disks and RAID, which was very thin and hindered system maintenance.

Finally, the center's equipment was a typical entry-level product in China. It adopted a PC server architecture and did not use a modular, cable-free internal design. Its management of hard disks and RAID had a single point of failure, which increased the risk of disks dropping offline. As a result, overall stability was unsatisfactory, and this was a major hidden danger behind the accident.

After the accident, the client (Party A) was deeply upset by the loss of the video, and its trust in the system integrator and the equipment supplier fell to rock bottom. As a result, the brand's products were no longer considered for subsequent expansion projects, and were no longer recommended for similar projects across the region. For equipment providers, this was a bloody lesson.
