Chapter V. System redundancy

At the design stage of a power supply system (SES), ensuring the required reliability often makes it necessary to at least duplicate individual elements, and even entire systems, i.e. to use redundancy.

Redundancy is characterized by the fact that it allows increasing the reliability of the system compared to the reliability of its constituent elements. Increasing the reliability of individual elements requires large material costs. Under these conditions, redundancy, for example, through the introduction of additional elements, is an effective means of ensuring the required reliability of systems.

If, when connecting elements in series, the overall reliability of the system (i.e., the probability of failure-free operation) is lower than the reliability of the most unreliable element, then with redundancy, the overall reliability of the system can be higher than the reliability of the most reliable element.
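The contrast between series and redundant (parallel) connection can be checked numerically. The following is a minimal Python sketch, assuming independent element failures; the element reliabilities are hypothetical values chosen only for illustration.

```python
from math import prod


def series_reliability(probabilities):
    """Probability that all series-connected elements survive."""
    return prod(probabilities)


def parallel_reliability(probabilities):
    """Probability that at least one of the redundant elements survives."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical element reliabilities (not taken from the text).
elements = [0.90, 0.95, 0.99]

p_series = series_reliability(elements)
p_parallel = parallel_reliability(elements)

print(f"series:   {p_series:.5f} (weakest element:   {min(elements)})")
print(f"parallel: {p_parallel:.5f} (strongest element: {max(elements)})")
```

With these values the series result falls below the weakest element and the parallel result rises above the strongest one.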

Redundancy is achieved by introducing surplus resources. Depending on the nature of that surplus, redundancy is:

Structural (hardware);

Informational;

Temporal (time).

Structural redundancy means that additional elements or devices are introduced into the minimum required version of a system built from the main elements, or that several identical systems are used instead of one.

Information redundancy involves the use of surplus information. Its simplest example is the repeated transmission of the same message over a communication channel. Another example is the codes used in control computers to detect and correct errors caused by hardware malfunctions and failures.

Time redundancy involves the use of surplus time. A system whose operation has been interrupted by a failure resumes operation after it is restored, provided a certain time margin is available.

There are two methods for increasing system reliability through structural redundancy:

1) general redundancy, in which the system as a whole is redundant;

2) separate (element-by-element) redundancy, in which individual parts (elements) of the system are reserved.

Schemes of general and separate structural redundancy are shown in Figs. 5.3 and 5.4, respectively, where n is the number of series-connected elements in a circuit and m is the number of backup circuits (with general redundancy) or of reserve elements for each main element (with separate redundancy).

With m = 1 the system is duplicated, and with m = 2 it is triplicated. Separate redundancy is preferred wherever possible, since it often achieves the same gain in reliability at significantly lower cost than general redundancy.

Depending on how the reserve elements are brought into operation, a distinction is made between permanent redundancy, redundancy by replacement and sliding redundancy.

Permanent redundancy is redundancy in which the backup elements take part in the operation of the facility together with the main ones. If a main element fails, no special devices are needed to bring the backup element into operation, since it has been operating alongside the main one from the start.

Redundancy by replacement is redundancy in which the functions of the main element are transferred to the backup element only after the main one fails. It requires monitoring and switching devices to detect the failure of the main element and to switch over to the backup.

Sliding redundancy is a form of redundancy by replacement in which the main elements of an object are backed up by reserve elements, each of which can replace any failed main element.

Both types of redundancy (permanent and by replacement) have advantages and disadvantages.

The advantage of permanent redundancy is its simplicity: it needs no monitoring and switching devices, which would reduce the reliability of the system as a whole, and, most importantly, there is no interruption in operation. Its disadvantage is that the operating mode of the backup elements is disturbed when the main elements fail.

Redundancy by replacement has the following advantages: it does not disturb the operating mode of the reserve elements, it preserves their reliability to a greater extent, and it allows one reserve element to serve several working elements (sliding redundancy).

Depending on the operating mode of the reserve elements, a distinction is made between loaded (hot), lightly loaded (warm) and unloaded (cold) reserve.

A loaded (hot) reserve is also called a spinning or switched-on reserve in the power industry. The reserve element operates in the same mode as the main one. The resource of the reserve elements begins to be consumed from the moment the whole system is put into operation, and their probability of failure-free operation does not depend on the moment at which they take over.

A lightly loaded (warm) reserve is one in which the reserve element operates in a less loaded mode than the main one. Although the resource of the reserve elements also begins to be consumed from the moment the whole system is switched on, it is consumed much more slowly than under working conditions until the reserve elements take over from failed ones. This type of reserve is usually provided by units running at idle, whose resource is consumed more slowly than when they carry load. The probability of failure-free operation of the reserve elements depends both on the moment at which they take over and on how much the distribution laws of their failure-free operation differ between working and standby conditions.

With an unloaded (cold) reserve, the reserve elements begin to consume their resource only from the moment they are put into operation in place of the main ones. In the power industry, this type of reserve is usually provided by disconnected units.

Reliability calculations for systems with parallel-connected elements depend on the redundancy method.

SYSTEM RELIABILITY WITH CONSTANT GENERAL REDUNDANCY

We will assume that the main and backup elements are equally reliable, i.e. $P_i(t) = P(t)$ and $Q_i(t) = Q(t)$. For convenience, the probabilities of failure-free operation and of failure of individual elements are denoted by capital letters in this and the following sections.

Taking into account the equivalent circuit (Figure 5.5) and formula (5.18), the probability of failure of a system with m backup circuits can be calculated as follows:

$Q_s(t) = Q_0(t) \prod_{i=1}^{m} Q_i(t)$, (5.22)

where $Q_0(t)$ is the probability of failure of the main circuit and $Q_i(t)$ is the probability of failure of the i-th backup circuit.

Accordingly, the probability of failure-free operation of the system is

$P_s(t) = 1 - Q_0(t) \prod_{i=1}^{m} Q_i(t)$. (5.23)

In accordance with formula (5.8) we have

(5.24)

With equal probabilities of failure of the main and backup circuits, $Q_0(t) = Q_i(t) = Q(t)$, formulas (5.22) and (5.23) take the form:

$Q_s(t) = Q^{m+1}(t)$, (5.25)

$P_s(t) = 1 - Q^{m+1}(t)$. (5.26)

Average system uptime with general redundancy

(5.27)

Where – system failure rate,
, – failure rate of any of the (m+1) circuits, – failure rate of the i-th element

For a system of two parallel circuits (m=1), formula (5.27) takes the form:

(5.28)
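For identical circuits with a constant failure rate λ, integrating the probability of failure-free operation of m + 1 parallel circuits gives a well-known closed form for the mean time to failure, (1/λ)(1 + 1/2 + ... + 1/(m + 1)). The Python sketch below evaluates it. It assumes exponentially distributed, independent failures and no restoration; the failure rate is a hypothetical value, not one taken from the text.

```python
def mean_time_to_failure(failure_rate, m):
    """Mean time to failure of m + 1 identical parallel circuits (no repair)."""
    if failure_rate <= 0 or m < 0:
        raise ValueError("failure_rate must be positive and m non-negative")
    return sum(1.0 / k for k in range(1, m + 2)) / failure_rate


FAILURE_RATE = 0.4  # hypothetical circuit failure rate, 1/year

for m in range(4):
    t_mean = mean_time_to_failure(FAILURE_RATE, m)
    print(f"m = {m}: mean time to failure = {t_mean:.3f} years")
```

For m = 1 the factor is 1.5, and every further backup circuit adds less than the previous one.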

The average system recovery time in the general case is determined by the formula

(5.29)

Where – average recovery time of the i-th chain.

For the special case m=1, formula (5.29) takes the form:

Example 5.2.

Calculate the probability of failure-free operation for 3 months, the failure rate, the average time between failures of a single-circuit overhead line with a length of l = 35 km together with a 110/10 kV step-down transformer and switching equipment (Figure 5.6).

The reliability equivalent circuit of the power supply system (SES) under consideration is a series structure (Figure 5.7).

Failure rates of elements are taken from Table 3.2:

;

According to formula (5.7), we determine the failure rate of the power supply circuit

This calculation shows that the dominant influence on circuit failure is the damage of the overhead line. Mean time between failures of a power supply circuit

Probability of failure-free operation of the circuit for t=0.25 years
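The calculation pattern of this example (sum the failure rates of the series elements, invert the sum to get the mean time between failures, then apply the exponential law) can be sketched in Python. All numerical rates below are hypothetical placeholders, since the values of Table 3.2 are not reproduced here; the sketch also assumes that the line failure rate is given per kilometre and scales with length.

```python
from math import exp

# Hypothetical failure rates, 1/year (NOT the values of Table 3.2).
LINE_RATE_PER_KM = 0.01   # overhead line, per km (assumed to scale with length)
TRANSFORMER_RATE = 0.02   # 110/10 kV step-down transformer
SWITCHGEAR_RATE = 0.03    # switching equipment

LENGTH_KM = 35.0
T_YEARS = 0.25            # 3 months

# Series structure: the failure rates add up.
line_rate = LINE_RATE_PER_KM * LENGTH_KM
circuit_rate = line_rate + TRANSFORMER_RATE + SWITCHGEAR_RATE

mtbf_years = 1.0 / circuit_rate
p_quarter = exp(-circuit_rate * T_YEARS)

print(f"circuit failure rate: {circuit_rate:.3f} 1/year")
print(f"line share of failures: {line_rate / circuit_rate:.1%}")
print(f"mean time between failures: {mtbf_years:.2f} years")
print(f"P(no failure in {T_YEARS} years): {p_quarter:.4f}")
```

With these placeholder values the overhead line accounts for most of the circuit failure rate, which is the qualitative conclusion drawn in the example.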

Example 5.3.

Determine how much higher the reliability indicators of a 110/10 kV step-down transformer substation are with constant joint operation of both transformers for 6 months compared to a single-transformer substation. We neglect switching device failures and intentional shutdowns.

Initial data taken from table. 3.2 are as follows:

;

Probability of failure-free operation for one transformer for 6 months

Average time between failures of one transformer

Probability of failure-free operation of a two-transformer substation, calculated using formula (5.20):

Average time between failures of a two-transformer substation, calculated according to formula (5.28):

years

Failure rate of a two-transformer substation

Average recovery time of a two-transformer substation (see formula (5.30))

Analysis of the results shows that the reliability of a two-transformer substation is much higher than the reliability of a single-transformer substation.
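The comparison in this example can be sketched in Python as follows. The transformer failure rate is a hypothetical placeholder (the Table 3.2 value is not reproduced here), failures are assumed independent and exponentially distributed, and the factor 1.5 for the mean time to failure of two parallel units is the standard result for identical units without restoration.

```python
from math import exp

TRANSFORMER_RATE = 0.02   # hypothetical failure rate, 1/year (not from Table 3.2)
T_YEARS = 0.5             # 6 months

# Single transformer.
p_single = exp(-TRANSFORMER_RATE * T_YEARS)
mttf_single = 1.0 / TRANSFORMER_RATE

# Two transformers in permanent joint operation: the substation fails
# only if both transformers fail.
p_double = 1.0 - (1.0 - p_single) ** 2
mttf_double = 1.5 * mttf_single

print(f"one transformer:  P = {p_single:.6f}, mean time = {mttf_single:.1f} years")
print(f"two transformers: P = {p_double:.6f}, mean time = {mttf_double:.1f} years")
print(f"failure probability reduced by a factor of {(1 - p_single) / (1 - p_double):.1f}")
```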

Example 5.4.

Consider a 6 kV switchgear section that supplies 18 outgoing lines (Fig. 5.8). The rate of switch failures accompanied by short circuits is estimated at 0.003, and the rate of short-circuit failures of the busbars per connection is taken from Table 3.2. Determine the rate of short-term outages of the switchgear section, assuming that the automatic transfer switch (ATS) and the switch Q2 that provides backup power to the section are absolutely reliable.

Classification of redundancy methods. One of the main means of ensuring the required level of reliability, and above all the failure-free operation, of an object or electrical system built from insufficiently reliable elements is redundancy.

Redundancy is the use of additional means and capabilities to keep the electrical system operational when one or more of its elements fail. It is an effective way to create electrical systems whose reliability is higher than that of the elements they contain.

In a redundant system, a distinction is made between the main elements of the structure, which the system needs to perform its required functions when no element has failed, and reserve elements, which are intended to perform the functions of the main elements if those fail.

The ratio of the number of reserve elements $n_r$ of a system to the number of main elements $n_o$ that they back up, expressed as an unreduced fraction, is called the redundancy ratio:

$m_r = n_r / n_o$.

Redundancy with a ratio of one to one, $m_r = 1/1$, is called duplication.

The additional means and capabilities used in redundancy include elements added to the system structure as backups, functional and information means and capabilities, surplus time, and reserves of load capacity. Accordingly, by the type of additional means, a distinction is made between structural redundancy, which uses reserve elements of the object's structure; functional redundancy, which uses functional reserves; information redundancy, which uses information reserves; time redundancy, which uses time reserves; and load redundancy, which uses load reserves (Fig. 3.28).

In electrical systems (ES), structural redundancy is used most often, but other types are also used. With functional redundancy, for example, multifunctional elements of automation equipment are sometimes used which, when a failure occurs, can be used in the system for another purpose; functional redundancy is also achieved by using different methods of operation, for example by transmitting information in different ways depending on which elements of the system remain operational. Information redundancy is used in systems where a failure leads to the loss or distortion of part of the processed or transmitted information. Time redundancy can be provided by raising the throughput of the object, by the inertia of its elements, and by repeating individual operations with a time shift. Load redundancy consists in providing optimal margins in the ability of elements to withstand the loads acting on them, or in adding protective or unloading elements to the system to shield some of its main elements from those loads.

By the way the reserve is brought into operation, a distinction is made between permanent and dynamic redundancy. Permanent redundancy works without rearranging the system structure when an element fails, whereas dynamic redundancy rearranges the system structure when an element fails.

In the simplest case, with constant redundancy, a parallel or serial connection of elements is performed without switching devices, and with dynamic redundancy, switching devices are required that respond to element failures.

Dynamic redundancy is often redundancy by replacement, in which the functions of the main element are transferred to the backup element only after the main element fails.

A common type of redundancy by replacement is sliding redundancy, in which a group of main system elements is backed up by one or more reserve elements, each of which can replace any failed main element in the group.

By the operating mode of the reserve elements before the main element fails, a distinction is made between loaded reserve (one or more reserve elements operate in the same mode as the main element), light reserve (one or more reserve elements operate in a less loaded mode than the main element) and unloaded reserve (one or more reserve elements are unloaded until they begin to perform the functions of the main element).

The concepts of loaded, light and unloaded reserve are used to distinguish reserve elements by their level of reliability. Elements of a loaded reserve have the same level of reliability (failure-free operation, durability and storability) as the main elements they back up, since their resource is consumed in the same way as that of the main elements. Elements of a light reserve have a higher level of reliability, since their resource is consumed much more slowly than that of the main elements until they are switched on in place of failed ones. With an unloaded reserve, the resource of the reserve elements begins to be consumed practically only from the moment they are switched on in place of failed elements.

Fig. 3.28. Classification of redundancy types

By the way an object (or an element of an object) is backed up, a distinction is made between general and separate redundancy. With general redundancy, the object as a whole is backed up: instead of one object, two or more objects of the same type, or with similar functions, operate simultaneously. The method is simple and widely used in practice for backing up the most critical systems. With separate redundancy, individual elements of an object or groups of them are backed up, and the reserve is usually built into the object; both individual elements of the system and fairly large parts of it (blocks) can be backed up separately.

Dynamic redundancy can be separate or general and allows reserve elements to be used not only in loaded but also in light and unloaded reserve, which saves the resource of the reserve elements, increases the reliability of the electrical system as a whole and reduces energy consumption.

With redundancy by replacement, sliding redundancy can be used to provide the required system reliability at low cost and with only a slight increase in weight and dimensions.

The disadvantages of dynamic redundancy by replacement are the need for switching devices, the interruptions in operation while switching to the reserve elements, and the need for a system that locates the failed element or block, all of which reduces the reliability of the redundant system as a whole. Redundancy by replacement is best suited to fairly large functional units and blocks of complex electrical systems.

Permanent redundancy, in which the reserve elements are permanently connected together with the main ones, is simple and needs no switching devices. If a main element fails, the system continues to operate normally, without interruption and without switching. Its disadvantages are the increased consumption of the resource of the reserve elements and the change in the parameters of the redundant unit when elements fail.

Permanent redundancy is used in critical systems for which even a short interruption in operation is unacceptable, and for backing up relatively small items: units, blocks and components of ESA electronic equipment (resistors, capacitors, diodes, etc.).

Redundancy of the electrical and radio components of ESA whose failure can lead to particularly dangerous consequences takes into account the possibility of both short circuits and open circuits. Against open circuits, elements are backed up by connecting them in parallel; against short circuits, by connecting them in series, on the assumption that the failed element does not break the electrical circuit of the other elements connected in series with it. For example, permanent separate redundancy of a diode with a loaded reserve against failure by short circuit (SC), by open circuit, or by both is provided by connecting the backup diodes in series, in parallel and in series-parallel with the main one, respectively (Fig. 3.29, a, b, c).

General permanent redundancy of a rectifier UD with a loaded reserve is provided by connecting the reserve in parallel, with diodes used to prevent the current of the backup rectifier from flowing through the output circuit of the failed one (Fig. 3.29, g). General redundancy of the rectifier with an unloaded reserve uses a switching device A, which receives the failure signal CO and sends the control signal DC to the switch QW to disconnect the failed rectifier and connect the backup one (Fig. 3.29, d).

Permanent redundancy. Such redundancy can be provided by connecting one or more backup elements (systems), which perform the same functions as the main element (system), to it in parallel or in series. It is used, for example, in the parallel operation of generators, computers, ESA units, resistors, etc., and in the series connection of diodes, break contacts, capacitors, etc.

Electrical systems with permanent redundancy are built so that failed elements do not affect the operation of the system as a whole. In the limiting cases, the failure of elements under permanent redundancy results in a short circuit or an open circuit of one or more elements, which must be taken into account when designing the system. To this end, limiting resistors and isolating transformers are introduced, the tolerances of individual system parameters are widened, etc.

Fig. 3.29. Typical structural redundancy schemes:

a, b, c - diode VD against failure by short circuit, by open circuit, and by both short and open circuit, respectively;

g, d - rectifier UD with loaded and unloaded reserve, respectively

Permanent redundancy uses a loaded reserve and can be general or separate; in the block diagram used to calculate reliability, the main and reserve elements are connected in parallel (Fig. 3.30).

Fig. 3.30. Schemes of general (a) and separate (b) permanent redundancy

An electrical system with general redundancy (Fig. 3.30, a) functions normally as long as at least one of its m + 1 parallel circuits of series-connected elements remains operational. Taking (3.68) into account, the probability of failure-free operation of the i-th circuit of n series-connected elements over time t (to simplify the notation, time is omitted from here on) is

$P_i = \prod_{j=1}^{n} P_{ij}$, (3.95)

where $P_{ij}$ is the probability of failure-free operation of the j-th element of the i-th circuit. The probability of failure-free operation of a system with general redundancy consisting of m + 1 parallel circuits follows from (3.72) and (3.95):

$P_{s.o} = 1 - \prod_{i=0}^{m} \left(1 - \prod_{j=1}^{n} P_{ij}\right)$. (3.96)

With all elements equally reliable, $P_{ij} = P_e$, formula (3.96) takes the form

$P_{s.o} = 1 - (1 - P_e^{n})^{m+1}$. (3.97)

For a given probability of failure-free operation of the electrical system $P_{s.o}^{g}$, the required value of m for which the condition $P_{s.o}^{g} = P_{s.o}$ is satisfied can be determined from (3.97):

$m_o = \frac{\ln(1 - P_{s.o}^{g})}{\ln(1 - P_e^{n})} - 1$.
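Solving (3.97), P = 1 - (1 - P_e^n)^(m+1), for m generally gives a non-integer value, so in practice it has to be rounded up to a whole number of backup circuits. A minimal Python sketch of that step follows; the function names and the numerical inputs are illustrative assumptions, not part of the source.

```python
from math import ceil, log


def reliability_general(p_element, n, m):
    """Formula (3.97): m + 1 parallel circuits of n equally reliable elements."""
    return 1.0 - (1.0 - p_element ** n) ** (m + 1)


def required_backup_circuits(p_required, p_element, n):
    """Smallest whole number m for which (3.97) reaches p_required."""
    if not (0.0 < p_required < 1.0 and 0.0 < p_element < 1.0):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    m_real = log(1.0 - p_required) / log(1.0 - p_element ** n) - 1.0
    return max(0, ceil(m_real))


# Hypothetical inputs: 5 elements per circuit, each with reliability 0.95.
m_needed = required_backup_circuits(p_required=0.999, p_element=0.95, n=5)
print(f"backup circuits needed: {m_needed}")
print(f"achieved reliability:   {reliability_general(0.95, 5, m_needed):.5f}")
```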

With an exponential distribution law for the elements of the system, $P_e = \exp(-\lambda_e t)$, the probability of failure-free operation (3.97) and the mean time to failure of the system are given by

$P_{s.o}(t) = 1 - [1 - \exp(-\lambda t)]^{m+1}$; $T_{av.s.o} = T_{av} \sum_{i=0}^{m} \frac{1}{i+1}$, (3.98)

where $\lambda = n\lambda_e$ is the failure rate of a circuit of n elements and $T_{av} = 1/\lambda$ is the mean time to failure of one circuit.

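In an ES with separate redundancy, the reserve elements are permanently connected in individual sections of the system (Fig. 3.30, b).

The probability of failure-free operation of one separately backed-up element of the system is

$P_j = 1 - \prod_{i=0}^{m} (1 - P_{ij})$,

and that of the entire system with separate redundancy is

$P_{s.r} = \prod_{j=1}^{n} \left[1 - \prod_{i=0}^{m} (1 - P_{ij})\right]$. (3.99)

If all elements are equally reliable, (3.99) takes the form

$P_{s.r} = [1 - (1 - P_e)^{m+1}]^{n}$, (3.100)

from which, for a given probability of failure-free operation of the system $P_{s.r}^{g}$, the corresponding value of m is determined:

$m = \frac{\ln[1 - (P_{s.r}^{g})^{1/n}]}{\ln(1 - P_e)} - 1$.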

With an exponential distribution law for equally reliable elements, $P_e = \exp(-\lambda_e t)$, the probability of failure-free operation is

$P_{s.r}(t) = \{1 - [1 - \exp(-\lambda t)]^{m+1}\}^{n}$ (3.101)

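and the mean time to failure of the system is

$T_{av.s.r} = \frac{(n-1)!}{\lambda (m+1)} \sum_{i=0}^{m} \frac{1}{v_i (v_i + 1) \cdots (v_i + n - 1)}$, (3.102)

where $v_i = (i + 1)/(m + 1)$; $\lambda = \lambda_e$.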
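The mean time to failure with separate redundancy is the integral of the probability of failure-free operation {1 - [1 - exp(-λt)]^(m+1)}^n over time. The Python sketch below compares a closed form that uses the quantities v_i = (i + 1)/(m + 1), namely (n - 1)!/(λ(m + 1)) times the sum over i of 1/(v_i(v_i + 1)...(v_i + n - 1)), with a direct numerical integration. The closed form is stated here as an assumption to be checked, the integration is a plain trapezoidal rule, and the failure rate is a hypothetical value.

```python
from math import exp, factorial, prod


def reliability(t, failure_rate, n, m):
    """n groups in series, each of m + 1 identical parallel elements."""
    return (1.0 - (1.0 - exp(-failure_rate * t)) ** (m + 1)) ** n


def mttf_closed_form(failure_rate, n, m):
    """Closed form built from v_i = (i + 1) / (m + 1)."""
    total = 0.0
    for i in range(m + 1):
        v = (i + 1) / (m + 1)
        total += 1.0 / prod(v + k for k in range(n))
    return factorial(n - 1) * total / (failure_rate * (m + 1))


def mttf_numeric(failure_rate, n, m, steps=100_000, horizon=60.0):
    """Trapezoidal integral of reliability(t) from 0 to horizon / failure_rate."""
    h = horizon / failure_rate / steps
    total = 0.0
    for i in range(steps + 1):
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * reliability(i * h, failure_rate, n, m)
    return total * h


RATE = 1.0  # hypothetical element failure rate, 1/year
closed = mttf_closed_form(RATE, n=2, m=1)
numeric = mttf_numeric(RATE, n=2, m=1)
print(f"closed form: {closed:.6f} years, numerical: {numeric:.6f} years")
```

For n = 1 the closed form reduces to the harmonic sum of the general-redundancy case, and for m = 0 it reduces to 1/(nλ), the mean time to failure of a plain series chain.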

The increase in ES reliability due to redundancy can be assessed by the ratio of the failure probability $Q_o$ of the main non-redundant system to the failure probability $Q_r = Q_o \prod_{i=1}^{m} Q_i$ of the redundant system:

$\gamma_{res} = Q_o / Q_r = 1 / \prod_{i=1}^{m} Q_i$.

With equal reliability of the main and backup systems,

$\gamma_{res} = 1/Q_i^{m} = 1/Q_o^{m}$.

An important conclusion follows from this ratio: the greater the probability of system failure (the lower its reliability), the smaller the effect of redundancy. From this conclusion, sometimes called the paradox of redundancy, it follows that:

the possibility of redundancy does not remove the task of increasing the reliability of the elements and systems being backed up;

general redundancy is, other things being equal, less advantageous than separate redundancy, since the probability of failure of a part of the system is less than the probability of failure of the entire system.

Under the exponential distribution law of time to failure, the probability of failure of a redundant system is

$Q_r(t) = Q_o^{m+1}(t) = [1 - \exp(-\lambda_o t)]^{m+1}$,

where $\lambda_o$ = const is the failure rate of one system being backed up.

In practice, usually $\lambda_o t < 0.1$; then

$Q_o(t) \approx \lambda_o t = t/T_{av}$ and

$Q_r(t) \approx (\lambda_o t)^{m+1} = (t/T_{av})^{m+1}$,

where $T_{av} = 1/\lambda_o$ is the mean time to failure of the system being backed up.

Taking these relations into account, the gain from redundancy can be written as

$\gamma_{res} \approx (T_{av}/t)^{m}$.

It follows that the gain from redundancy decreases as the required operating time t of the system increases.

The reliability of a redundant ES is strongly affected by restoring the main or backup systems (circuits) immediately after they fail. In steady-state operation, the probability that a circuit with mean recovery time $T_{r.av}$ and mean time between failures $T_o$ is operable at an arbitrary moment (outside planned periods in which its use is not intended) is the availability factor of the circuit:

$K_g = \frac{T_o}{T_o + T_{r.av}} \approx 1 - \frac{T_{r.av}}{T_o}$,

since in most practical problems $T_{r.av}/T_o \ll 1$.

Accordingly, the probability of circuit failure can be defined as the probability of inoperability:

$Q_o(t) = 1 - K_g \approx T_{r.av}/T_o$.

Then the gain in reliability of a redundant ES with restoration immediately after failure of the main or backup systems is

$\gamma_{res} = 1/Q_o^{m} \approx (T_o/T_{r.av})^{m} \approx$ const.

As can be seen, the qualitative difference between redundancy with restoration and redundancy without restoration is that with restoration $\gamma_{res}$, to a first approximation, does not depend on the operating time t. Consequently, the advantage of redundancy with restoration over redundancy without restoration grows as the required operating time t increases. At the same time, restoration immediately after a failure is possible only with continuous monitoring, and the monitoring equipment must have a failure probability significantly lower than that of the monitored system.
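The difference between the two cases can be illustrated numerically. In the Python sketch below, the gain without restoration is 1/Q^m with Q = 1 - exp(-t/T), which falls as t grows, while the gain with restoration is 1/Q^m with Q = 1 - K, where K is the availability factor T_o/(T_o + T_r), and does not depend on t. The time values are hypothetical and the function names are illustrative.

```python
from math import exp


def gain_without_restoration(t, mean_time_to_failure, m):
    """Gain 1/Q^m with Q = 1 - exp(-t / T); depends on the operating time t."""
    q = 1.0 - exp(-t / mean_time_to_failure)
    return q ** (-m)


def gain_with_restoration(mean_time_between_failures, mean_recovery_time, m):
    """Gain 1/Q^m with Q = 1 - K, K = T_o / (T_o + T_r); independent of t."""
    availability = mean_time_between_failures / (
        mean_time_between_failures + mean_recovery_time
    )
    return (1.0 - availability) ** (-m)


T_O = 10_000.0  # hypothetical mean time to (between) failures, hours
T_R = 10.0      # hypothetical mean recovery time, hours
M = 1           # one backup system

restored = gain_with_restoration(T_O, T_R, M)
for t in (100.0, 500.0, 1000.0):
    unrestored = gain_without_restoration(t, T_O, M)
    print(f"t = {t:6.0f} h: gain without restoration {unrestored:7.1f}, "
          f"with restoration {restored:7.1f}")
```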

Separate redundancy is more effective at increasing the reliability of the ES, especially for large n (Fig. 3.31). The reason is that a system with general redundancy fails as soon as one element in each circuit has failed, whereas a system with separate redundancy fails only when all the elements of some one group have failed.

Of practical interest is the choice of a rational way to increase the reliability of an electrical system: by redundancy or by selecting highly reliable elements. If both ways are equivalent in weight, dimensions and cost, the deciding factor is the required duration t of continuous system operation.

The effect of time t on the probability of failure-free operation $P_{s.r}(t)$ of an ES made of two identical blocks, one working and one backup with a loaded reserve, can be determined from formulas (3.98) with m = 1 and n = 1:

$P_{s.r}(t) = 2\exp(-t/T_{av.b}) - \exp(-2t/T_{av.b})$;

$T_{av} = 1.5\,T_{av.b}$, (3.103)

where $T_{av.b} = 1/\lambda_b$ is the mean time to failure of one block and $\lambda_b$ is the failure rate of one block of the redundant system.

Fig. 3.31. Probability of failure-free operation of electrical systems with general (1) and separate (2) redundancy as a function of the number of reserve elements, for different numbers of series-connected elements

Fig. 3.32. Probability of failure-free operation of the system as a function of time with a loaded reserve (1) and with a block of increased reliability (2)

For a non-redundant electrical system consisting of one block of increased reliability with the same mean time to failure $T_{av}$ as the redundant system (3.103), the probability of failure-free operation is

$P_{sn}(t) = \exp[-t/(1.5\,T_{av.b})]$. (3.104)

Dependences (3.103) and (3.104) show that redundancy is more effective than directly increasing the reliability of the block in the initial period of system operation, $t < 2T_{av.b}$; for $t \gg 2T_{av.b}$, on the contrary, increasing the reliability of the block is more effective (Fig. 3.32).
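The point at which the two curves intersect can be located numerically. The Python sketch below works in the dimensionless time x = t/T_av.b and uses simple bisection; the bracketing interval is an assumption chosen so that it contains the single sign change.

```python
from math import exp


def p_redundant(x):
    """Two identical blocks with a loaded reserve; x = t / T_av.b."""
    return 2.0 * exp(-x) - exp(-2.0 * x)


def p_improved(x):
    """One block whose mean time to failure is 1.5 * T_av.b."""
    return exp(-x / 1.5)


def crossover(lo=0.1, hi=5.0, tol=1e-10):
    """Bisection for the x > 0 at which both probabilities are equal."""
    def difference(x):
        return p_redundant(x) - p_improved(x)

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if difference(lo) * difference(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


x_cross = crossover()
print(f"curves intersect at t = {x_cross:.3f} * T_av.b")
for x in (0.5, 1.0, 3.0):
    print(f"x = {x}: redundant {p_redundant(x):.4f}, improved block {p_improved(x):.4f}")
```

Under these two formulas the intersection lies a little below 2 T_av.b, which agrees with the approximate boundary quoted in the text.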

Permanent series-parallel connection of mutually redundant elements is used where failures of both the short-circuit and the open-circuit type are possible. For example, a capacitor may fail by losing capacitance because of an open circuit, or by breakdown, which causes a short circuit; relay contacts may fail by oxidation (open circuit) or by “welding” or “sticking” (short circuit), etc. (see Table 3.7).

To account for failures of both the open-circuit and the short-circuit type, a permanent series-parallel connection of four mutually redundant elements is used in many cases (Fig. 3.33). When short-circuit failures of the elements predominate,

$Q_{sc}(t) > Q_{op}(t)$,

where $Q_{sc}(t)$ and $Q_{op}(t)$ are the probabilities of element failure by short circuit and by open circuit, respectively, series-parallel circuits without a jumper are used (Fig. 3.33, a); when open-circuit failures predominate,

$Q_{sc}(t) < Q_{op}(t)$,

series-parallel circuits with a jumper are used (Fig. 3.33, b).

Fig. 3.33. Permanent series-parallel connection of mutually redundant elements for failures mainly of the short-circuit type (a) and the open-circuit type (b)

The probabilities of failure of the redundant circuit by open circuit, $Q_{r.op}(t)$, and by short circuit, $Q_{r.sc}(t)$, over the required operating time t are functions of the element failure probabilities $Q_{sc}(t)$ and $Q_{op}(t)$ and depend on the redundancy scheme used and on the type of failure (Table 3.13).

From the relations given in Table 3.13 it follows that the effectiveness $\gamma_{res}$ of series-parallel redundancy decreases as the probability of failure of a circuit element increases. At a certain critical value of $Q_{sc}(t)$ or $Q_{op}(t)$, the probability of failure of the redundant circuit becomes greater than the probability of failure of one element, and series-parallel redundancy is no longer worthwhile. Given the reliability and accuracy of a priori information about element reliability, series-parallel redundancy is usually recommended when the element failure probabilities satisfy $Q_{sc}(t) \le 0.1$ and $Q_{op}(t) \le 0.1$.
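The behaviour described above can be reproduced with a simplified model. The Python sketch below is derived from the circuit topology under the assumption of independent, identical elements, and is not copied from Table 3.13: "without jumper" is taken as two parallel branches of two series elements, "with jumper" as two series groups of two parallel elements, and the open-circuit and short-circuit modes are treated separately.

```python
def q_open_without_jumper(q_open):
    """A branch opens if either element opens; the circuit opens if both branches do."""
    return (1.0 - (1.0 - q_open) ** 2) ** 2


def q_short_without_jumper(q_short):
    """A branch shorts if both elements short; the circuit shorts if either branch does."""
    return 1.0 - (1.0 - q_short ** 2) ** 2


def q_open_with_jumper(q_open):
    """A group opens if both elements open; the circuit opens if either group does."""
    return 1.0 - (1.0 - q_open ** 2) ** 2


def q_short_with_jumper(q_short):
    """A group shorts if either element shorts; the circuit shorts if both groups do."""
    return (1.0 - (1.0 - q_short) ** 2) ** 2


def critical_probability(lo=0.1, hi=0.9, tol=1e-12):
    """Element failure probability q at which (1 - (1 - q)^2)^2 equals q."""
    def excess(q):
        return (1.0 - (1.0 - q) ** 2) ** 2 - q

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(lo) * excess(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


Q = 0.1  # hypothetical element failure probability
print(f"without jumper: open {q_open_without_jumper(Q):.4f}, short {q_short_without_jumper(Q):.4f}")
print(f"with jumper:    open {q_open_with_jumper(Q):.4f}, short {q_short_with_jumper(Q):.4f}")
print(f"critical element failure probability: {critical_probability():.4f}")
```

In this model the circuit without a jumper has the lower short-circuit failure probability and the circuit with a jumper has the lower open-circuit failure probability, matching the selection rule in the text; in the less favourable mode the redundant circuit stops being better than a single element at an element failure probability of roughly 0.38.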

Table 3.13. Design relations for the series-parallel connection of four elements

Fig. 3.34. General (a) and separate (b) dynamic redundancy schemes with switching devices

Dynamic redundancy. This kind of redundancy makes it possible to use a light or unloaded reserve, provided that the interruptions in ES operation needed to bring in the reserve are acceptable; it also requires additional elements, switching devices, to connect the reserve. Reserve elements can be switched on manually or automatically; switching devices can be separate for each of the parallel-connected elements or circuits (blocks) of the electrical system, or common to them (Fig. 3.34).

If the influence of the switching devices is neglected and they are considered absolutely reliable, then with a loaded reserve the reliability of an ES with dynamic redundancy equals that of a system with a permanently connected reserve. With light and unloaded reserves, dynamic redundancy increases system reliability.

The effect of switching-device reliability on the reliability of a redundant system is fairly simple to take into account for systems with a loaded reserve.

In an ES with general redundancy and a loaded reserve, all switches are closed in normal mode, and both the main and the backup circuits of n elements are under load. If the main circuit fails, switch K disconnects it; if the first backup circuit fails, it is disconnected by switch K1, and so on.

The system fails when the main circuit and all backup circuits, each consisting of n elements and a switch K, have failed. Assuming that switches and system elements fail independently, the probability of failure-free operation of one circuit of n elements is

$P_i = P_{ki} \prod_{j=1}^{n} P_{ij}$,

and the probability of failure-free operation of the entire system of m + 1 such parallel circuits is

$P_{s.o} = 1 - \prod_{i=0}^{m} \left(1 - P_{ki} \prod_{j=1}^{n} P_{ij}\right)$, (3.105)

where $P_{ki}$ is the probability of failure-free operation of the switch of the i-th circuit.

With all n elements equally reliable, $P_e$, and all switches equally reliable, $P_k$, formula (3.105) takes the form

$P_{s.o} = 1 - (1 - P_k P_e^{n})^{m+1}$. (3.106)

From (3.106), for a given value $P_{s.o} = P_{s.o}^{g}$, the required number of backup circuits is found:

$m = \frac{\ln(1 - P_{s.o}^{g})}{\ln(1 - P_k P_e^{n})} - 1$.

With an exponential distribution law for the elements, $P_e = \exp(-\lambda_e t)$, and the switches, $P_k = \exp(-\lambda_k t)$, the mean time to failure and the probability of failure-free operation of the system are given by formulas (3.98), in which the failure rate of the circuit is now calculated as

$\lambda = n\lambda_e + \lambda_k$.

In an ES with separate redundancy and a loaded reserve, all switches K are closed in the initial period of system operation; if any main or reserve element fails, the corresponding switch disconnects it. The system fails when any main element j (or its switch K) and all of its reserve elements i (or all of their switches $K_i$) have failed.

The probability of failure-free operation of the entire system with separate redundancy, taking into account the probability of failure-free operation of the switches, is

$P_{s.r} = \prod_{j=1}^{n} \left[1 - \prod_{i=0}^{m} (1 - P_{kij} P_{ij})\right]$. (3.107)

For a system with equally reliable elements and switches, expression (3.107) takes the form

$P_{s.r} = [1 - (1 - P_k P_e)^{m+1}]^{n}$. (3.108)

With an exponential distribution law for elements ($\lambda_e$ = const) and switches ($\lambda_k$ = const), the values of $T_{av.r}$ and $P_{s.r}$ are calculated using formulas (3.101) and (3.102), in which one sets

$\lambda = \lambda_e + \lambda_k$.

The formulas obtained show that in dynamic redundancy with a loaded reserve the presence of the switching devices K makes the system reliability indicators lower than with permanent redundancy. Dynamic redundancy with a loaded reserve is advisable where interruptions in system operation are unacceptable and the failed element (system) must be disconnected so that the operating mode of the redundant system does not change abruptly.

Calculations using formulas (3.106) and (3.108), which determine the probability of failure-free operation of the systems presented in Fig. 3.34, show that with the same reliability of the elements, the same fairly high reliability of the switches and the same values of n and m, the probability of failure-free operation of an ES with separate redundancy and a switch at each element is higher than that of an ES with general redundancy and a switch in each circuit.
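The comparison can be reproduced with a few lines of code. This is a minimal sketch of formulas (3.106) and (3.108) for equally reliable elements and switches; the function names and the numerical values are assumptions chosen for illustration.

```python
def general_redundancy(p_e: float, p_k: float, n: int, m: int) -> float:
    """Formula (3.106): m + 1 parallel circuits, each of n elements and one switch."""
    return 1.0 - (1.0 - p_k * p_e ** n) ** (m + 1)


def separate_redundancy(p_e: float, p_k: float, n: int, m: int) -> float:
    """Formula (3.108): each of n elements has m reserve elements, one switch per element."""
    return (1.0 - (1.0 - p_k * p_e) ** (m + 1)) ** n


# Illustrative values (assumed): n = 3 elements, m = 1 reserve, reliable switches.
p_gen = general_redundancy(p_e=0.9, p_k=0.99, n=3, m=1)
p_sep = separate_redundancy(p_e=0.9, p_k=0.99, n=3, m=1)
print(f"general:  {p_gen:.4f}")
print(f"separate: {p_sep:.4f}")
```

The qualifier "fairly high reliability of the switches" matters: separate redundancy uses n(m + 1) switches instead of m + 1, so with unreliable switches (for example $P_k$ = 0.5 with ideal elements) the same formulas give the advantage to general redundancy.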

Thus, separate redundancy is more efficient than general redundancy in the case of dynamic redundancy as well.

The effectiveness of dynamic redundancy increases when it is implemented as redundancy by replacement with an unloaded or lightly loaded reserve. Redundancy by replacement with an unloaded reserve is considered below; the reliability indicators with a lightly loaded reserve will evidently lie between those for a loaded and an unloaded reserve.

In a redundant system with general redundancy and an unloaded reserve, the main circuit with switch K operates first (Fig. 3.34, a); if it fails, switch $K_i$ turns on one of the backup circuits in its place. There can be no more than m such substitutions; the (m+1)-th failure leads to failure of the system as a whole.

To simplify the analysis, we consider a system with an exponential distribution law for elements, $P_{ij}(t) = \exp(-\lambda_j t)$, and switches, $P_{ki}(t) = \exp(-\lambda_{ki} t)$. Then the probability of failure-free operation of one circuit of n elements with a switch is

$P_i(t) = \exp(-\lambda_i t)$, (3.109)

where $\lambda_i = n\lambda_j + \lambda_k$ is the failure rate of the i-th circuit of the redundant system.

The average time to failure of the i-th circuit, taking into account (3.109), is

$T_{av.i} = \int_0^{\infty} P_i(t)\,dt = 1/\lambda_i$.

In each interval $t_i$ only one circuit operates and can fail, so the average time to failure of the entire system is

$T_{av.o} = (m+1)\,T_{av.i}$. (3.110)

The probability of failure-free operation of a redundant ES with an unloaded reserve over time t can be determined on the assumption that, when one circuit fails, the system switches instantaneously to one of the backup circuits, and that system failure occurs after the failure of the main circuit and all m backup circuits. Then the probability that a circuit of n elements and a switch K, with failure rate $\lambda_i$, fails z times over time t (taking into account its replacement by backup circuits) can be determined by Poisson's law

$P_z(t) = \dfrac{(\lambda_i t)^z}{z!}\exp(-\lambda_i t)$, (3.111)

where $\lambda_i t$ is the average number of circuit failures over time t.

The entire redundant system will operate without failure over time t if one of the following mutually exclusive events occurs during this time: $C_0$, all circuits of the system operated without failure; $C_1$, one circuit failed; ...; $C_z$, z circuits out of (m+1) failed; ...; $C_m$, m circuits out of (m+1) failed.

Thus, the probability of failure-free operation of the entire redundant system is determined by the theorem of addition of probabilities for the mutually exclusive events $C_z$, taking into account (3.111):

$P_{s.o}(t) = \sum_{z=0}^{m} P_z(t) = \exp(-\lambda_i t)\sum_{z=0}^{m}\dfrac{(\lambda_i t)^z}{z!}$. (3.112)

From a comparison of the obtained formulas (3.110) and (3.112) with the corresponding formulas for a loaded reserve, it follows that with an unloaded reserve, the probability of failure-free operation and the average time to failure increase.
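The gain from an unloaded reserve can be illustrated numerically. The sketch below evaluates (3.112) and (3.110) and, for comparison, the loaded-reserve formula (3.106) written with $P_k P_e^{n} = \exp(-\lambda_i t)$; the function names and the numbers are assumptions made for the example.

```python
import math


def unloaded_reserve_reliability(lam: float, t: float, m: int) -> float:
    """Formula (3.112): probability of at most m circuit failures in time t (Poisson)."""
    x = lam * t
    return math.exp(-x) * sum(x ** z / math.factorial(z) for z in range(m + 1))


def loaded_reserve_reliability(lam: float, t: float, m: int) -> float:
    """Formula (3.106) with the circuit reliability written as exp(-lam * t)."""
    return 1.0 - (1.0 - math.exp(-lam * t)) ** (m + 1)


def unloaded_reserve_mttf(lam: float, m: int) -> float:
    """Formula (3.110): the circuits operate one after another."""
    return (m + 1) / lam


# Illustrative values (assumed): circuit failure rate 1e-3 per hour, 1000 h, m = 2.
lam, t, m = 1e-3, 1000.0, 2
print(f"unloaded reserve: {unloaded_reserve_reliability(lam, t, m):.4f}")
print(f"loaded reserve:   {loaded_reserve_reliability(lam, t, m):.4f}")
print(f"mean time to failure, unloaded reserve: {unloaded_reserve_mttf(lam, m):.0f} h")
```

For m = 0 the two probability formulas coincide, as they must for a system without any reserve.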

At the same time, it is practically impossible to increase the mean time to failure by more than an order of magnitude with such redundancy, because of the switching devices and auxiliary equipment. As the number of redundant elements (units, systems) grows, the weight, dimensions and cost of the auxiliary equipment significantly limit the achievable level of reliability, so that in practice redundancy with m ≤ 2...3 is used.

If the ES consists of groups of identical elements, it is advisable to use sliding redundancy by replacement, in which one or several (m) reserve elements (blocks) of the system can replace any of the failed main elements (blocks) of the system (Fig. 3.35).

Fig. 3.35. Sliding redundancy scheme

If sliding redundancy uses an unloaded reserve, element failures are independent and exponentially distributed, and the device that finds the failed element and switches in a backup element in its place (the switch) is absolutely reliable, then the probability of failure-free operation of the system during time t, i.e. the probability that no more than m elements fail during this time, is determined by Poisson's law similarly to (3.112):

$P_{s.s}(t) = \exp(-n\lambda_e t)\sum_{z=0}^{m}\dfrac{(n\lambda_e t)^z}{z!}$, (3.113)

where $\lambda_e$ is the element failure rate and n is the number of main elements.

The average time to failure of the system, i.e. the mathematical expectation of the time of occurrence of the (m+1)-th failure, is determined in the usual way:

$T_{av.s} = \dfrac{1}{n\lambda_e} + \dfrac{m}{n\lambda_e} = \dfrac{m+1}{n\lambda_e}$. (3.114)

The efficiency of sliding redundancy of an ES can be assessed by comparing dependencies (3.113) and (3.114) for the system with sliding redundancy with the corresponding dependencies $P_s(t) = \exp(-n\lambda_e t)$ and $T_{av} = 1/(n\lambda_e)$ for the non-redundant system:

$P_{s.s}(t)/P_s(t) = 1 + n\lambda_e t + (n\lambda_e t)^2/2! + \ldots + (n\lambda_e t)^m/m!$;

$T_{av.s}/T_{av} = m + 1$. (3.115)

From (3.115) it follows that from the point of view of increasing the probability of failure-free operation and the mean time to failure of an ES, the efficiency of sliding redundancy compared to the corresponding non-redundant system increases with the increase in the number of reserve elements, the increase in the operating time of the system and the number of reserved main elements (blocks) of the system.
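The dependence of the gain on m, t and n can be seen directly from a small calculation. This is a sketch of (3.113) and (3.115) under the assumptions stated for them (unloaded reserve, exponential failures, ideal switch); the function names and numbers are illustrative only.

```python
import math


def sliding_reliability(n: int, lam_e: float, t: float, m: int) -> float:
    """Formula (3.113): at most m failures among n main elements in time t."""
    x = n * lam_e * t
    return math.exp(-x) * sum(x ** z / math.factorial(z) for z in range(m + 1))


def reliability_gain(n: int, lam_e: float, t: float, m: int) -> float:
    """First ratio of (3.115): P_s.s(t) / P_s(t) for the non-redundant system."""
    x = n * lam_e * t
    return sum(x ** z / math.factorial(z) for z in range(m + 1))


def mttf_gain(m: int) -> int:
    """Second ratio of (3.115): T_av.s / T_av."""
    return m + 1


# Illustrative values (assumed): 10 main elements, 1e-4 failures per hour, 1000 h.
for m in (0, 1, 2, 3):
    print(m, round(sliding_reliability(10, 1e-4, 1000.0, m), 4),
          round(reliability_gain(10, 1e-4, 1000.0, m), 4), mttf_gain(m))
```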

Sliding redundancy can be more economical, since it is implemented with fewer reserve elements than main ones.

Optimal redundancy. In the practical implementation of ES redundancy, the problem of optimal redundancy arises, i.e., ensuring the required system reliability at the lowest cost.

The number and nomenclature of reserve elements (blocks) of the ES can be determined based on the following two formulations of the optimal reservation problem:

1) a given probability of failure-free operation of the system must be ensured at the minimum cost of reserve elements, i.e. at $C_{\min}$;

2) for given costs of reserve elements, the maximum possible probability of failure-free operation of the system must be ensured, i.e. $P_{s.\max}$.

To solve both problems, first determine the number of system redundancy elements (sections), calculate the probabilities of failure-free operation of each section and the system as a whole, and determine the cost of each section.

Then, to solve the first problem, the minimum of the function $C = \sum_i (C_{0i} + C_i m_i)$ must be found, provided that $P_s = \prod_i P_i(m_i)$ is not less than the given value, where C is the cost of the redundant system; $C_i$ is the cost of one reserve element of the i-th section of the system; $C_{0i}$ is the initial cost of the i-th section; $m_i$ is the number of reserve elements in the i-th section; $P_i(m_i)$ is the probability of failure-free operation of the i-th section if it has $m_i$ reserve elements.

The solution of the second problem of optimal redundancy reduces to finding the maximum of the function $P_s = \prod_i P_i(m_i)$ under the condition that $C = \sum_i (C_{0i} + C_i m_i)$ does not exceed the given cost.

Calculation of the optimally redundant ES is a multi-step process. At the first step, the section is found for which adding one reserve element gives the greatest increase in the probability of failure-free operation of the system per unit cost. At the second step, the next such section is determined (the previously reserved section is included in the search), and so on. Calculations are performed in tabular form; the calculation stops at step M at which, for the first problem, the given probability of failure-free operation is first reached, i.e. it lies between $P_s(M-1)$ and $P_s(M)$, and for the second problem, the corresponding condition on the cost C(M) is met.
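The step-by-step procedure for the first problem can be sketched as follows. This is a minimal illustration, not the source's tabular method: it assumes that each section uses a loaded reserve of identical elements, so that $P_i(m_i) = 1 - (1 - p_i)^{m_i + 1}$, and all names and numbers are hypothetical. Being a greedy procedure, it does not guarantee the global cost minimum in every case.

```python
from math import prod


def section_reliability(p: float, m: int) -> float:
    """Assumed model: section of one main and m loaded reserve elements of reliability p."""
    return 1.0 - (1.0 - p) ** (m + 1)


def system_reliability(ps, ms) -> float:
    """Sections are connected in series, so their reliabilities multiply."""
    return prod(section_reliability(p, m) for p, m in zip(ps, ms))


def greedy_redundancy(ps, costs, p_target, max_steps=1000):
    """Add reserve elements one at a time where the gain in P_s per unit cost is largest."""
    ms = [0] * len(ps)
    for _ in range(max_steps):
        current = system_reliability(ps, ms)
        if current >= p_target:
            return ms, current
        best_i, best_gain = None, 0.0
        for i in range(len(ps)):
            trial = ms.copy()
            trial[i] += 1
            gain = (system_reliability(ps, trial) - current) / costs[i]
            if gain > best_gain:
                best_i, best_gain = i, gain
        if best_i is None:  # no section improves the system any further
            break
        ms[best_i] += 1
    raise ValueError("target reliability not reached")


# Illustrative data (assumed): three sections, cost of one reserve element each.
ms, p_s = greedy_redundancy(ps=[0.9, 0.8, 0.95], costs=[1.0, 3.0, 1.5], p_target=0.94)
print(ms, round(p_s, 6))
```

The second problem can be handled by the same loop if the stopping test is replaced by a check that the next addition would exceed the given cost.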

The classification of existing reservation methods is presented in Fig.

Reservation

Above we described the essence of types of redundancy. Note that currently structural redundancy is most widespread in technical systems.

The essence of structural redundancy is that one or more additional (backup) elements are attached to the main element (i.e., the minimum required to perform the specified functions); they are designed to ensure the operability of the object in the event of a failure of the main element.

Based on the volume of reservation, the following types are distinguished:

- general, providing for reservation of the entire object
- separate, in which individual elements or their groups are reserved
- mixed, combining different types of reservation.

A reserve, like technical systems themselves, can be restorable or non-restorable. The former is used in maintained systems, and its restoration strategy is arranged so that the security of the system does not fall below a given level. In unmaintained systems (non-returnable spacecraft, automatic weather stations, etc.), the reserve is, as a rule, used up completely and cannot be restored.

Redundant elements can be in different modes:

Loaded, lightly loaded and unloaded.

In loaded mode, the redundant elements are in the same state as the main element, that is, all elements operate simultaneously under the same conditions.

Light reserve mode means that the load of the reserve elements is less than that of the main element.

An unloaded reserve boils down to a situation in which the redundant elements have no load until the main element fails.

According to the way the reserve is connected, the following are distinguished:

- permanent redundancy, in which backup elements participate in the functioning of the object along with the main ones;
- redundancy by replacement, in which the function of the main element is transferred to a backup one only after the main one fails;
- sliding redundancy, in which any failed element can be replaced by a backup one.

USSR STATE COMMITTEE ON STANDARDS
(Gosstandart of the USSR)

ALL-UNION RESEARCH INSTITUTE
ON NORMALIZATION IN MECHANICAL ENGINEERING
(VNIINMASH)

Approved

By order of VNIINMASH

No. 260 of September 22, 1988

Reliability in technology

Selection of redundancy methods and techniques

R 50-54-82-88

These recommendations (R) apply to technical devices (products) manufactured by various industries and having increased reliability requirements, which cannot be ensured only by choosing highly reliable elements.

The recommendations establish general principles and a unified methodology for selecting redundancy methods and techniques, with the exception of issues of the formation and use of spare parts and accessories. They are intended for use in the design of technical devices and in the development of industry normative and technical documents, and are addressed to employees of enterprise reliability services and development engineers who know the basics of reliability theory.

1. BASIC PROVISIONS

1.1. Redundancy is a method of ensuring reliability, consisting in the use of additional means and capabilities in order to maintain the operability of an object in the event of failure of one or more of its elements or disruption of connections between them. Most often, redundancy is used in cases where other methods (reducing the failure rate of elements, improving maintainability) are insufficient or cannot be fully used due to limitations arising during the design and operation of systems.

1.2. The basis of redundancy is the introduction of excess resources: additional elements, time, information, product reserves, productivity reserves, algorithmic flexibility, etc. Accordingly, by source and physical nature, the following types of excess can be distinguished: structural, time, functional, informational, load, algorithmic, software, and mode. Introducing excess resources does not by itself create a reserve and does not necessarily increase reliability. For it to result in redundancy, a number of additional conditions and technical measures must be met:

Monitoring the operability and technical condition of hardware and equipment;

Installation of transfer switches that meet specified requirements for response time and reliability;

Dynamic redistribution of the functional load among elements when the structure of the system changes, making it possible to parallelize work in systems with a parallel structure;

Inclusion in the system of reconfiguration algorithms and tools (for restructuring the system) that make it possible to organize the operable resources efficiently to complete the task.

1.3. Redundancy in all systems is associated with an increase in the total flow of failures. While raising the standardized reliability indicator, it increases not only the cost of the product, its weight and size characteristics, energy consumption and some other characteristics, but also operating costs, the consumption of spare parts, and the number of maintenance and repair personnel. Therefore, redundancy should be regarded as a necessary means of increasing reliability when other possibilities have already been exhausted and do not provide the required level of reliability.

In systems where, according to the conditions of application, reliability requirements may change during the period of operation depending on the type of tasks being solved, it is recommended to use an operating mode with a variable depth of redundancy. This allows for more efficient use of excess resources and improves the technical and economic performance of the system.

1.4. For each type of equipment, the possibilities of redundancy as a means of increasing reliability are determined to a large extent by the technical feasibility of redundancy methods. Therefore, when designing, only such redundancy methods should be used, the technical feasibility of which is ensured by known circuitry and technological solutions or can be confirmed by development work within an acceptable time frame.

1.5. Failure of a redundant system is an event consisting of a violation of at least one of the established requirements for the output characteristics of the system (performance, accuracy, reliability, material intensity, energy intensity, etc.). Under certain conditions, when it is possible to identify the minimum values of various resources necessary for the system to perform a specified task, a failure of a redundant system can be defined as an event consisting of a violation of the requirements for the value and state of all necessary resources. The occurrence of a failure is recorded using criteria, which are deterministic rules for deciding whether the system state belongs to the class of operable or inoperable states.

1.6. The main criterion for the failure of a redundant system is the functional criterion, by means of which the boundary of a region in the space of the system's output characteristics is defined; crossing this boundary is regarded as a system failure.

1.7. In complex systems that have several operating modes and a number of functions performed, it is possible to form several functional failure criteria - failure when performing each function. By grouping failure criteria for each function, functional failure criteria are formed for any set of functions. In a complex system, several levels of functioning can be distinguished, each of which corresponds to a functional criterion.

1.8. Based on the functional criterion, a structural failure criterion is formed, which determines which state of the set of technical means the system failure corresponds to. If such a criterion can be formed, then the set of operable and inoperable states can be described in the form of a structural-reliability diagram or a logical function of the operability (inoperability) of the system.

1.9. For systems with several types of redundancy, it is not always possible to formulate a structural criterion that is adequate to the functional criterion, since the state of the system’s operability is determined not only by the totality of the states of its elements. In this case, it is necessary to develop a technical failure criterion, which, in addition to the state of the elements, includes the values of product reserves and productivity reserves, the permissible time spent in a partially operational state, and the state of the maintenance system.

1.10. The following reliability indicators are used for redundant systems:

Established failure-free operating time $t_y$;

Probability of failure-free operation P(t) during a given operating time;

System availability factor $K_g$;

Technical utilization coefficient $K_{ti}$;

Operational readiness coefficient $K_{og}(t)$;

Efficiency retention coefficient $K_e$.

In a redundant system, there are many operational states, of which one is fully operational. It occurs when all elements are operational and all additional resources allocated for redundancy are at the level of standard values, characterized by vector parameter A. Other operational states arise when some elements fail or resources decrease below standard values.

An operational state in which the current parameter values are at such a level that the failure of one element can lead to system failure is called a pre-failure state. In the sequence of states of a redundant system, between a fully operational state and a pre-failure state, there is usually one or more intermediate states. The number of element failures that brings the system from a fully operational state to a pre-failure state is an important characteristic of the degree of redundancy in the system. In general, this number varies depending on the sequence of element failures and on what part of the system they occur. The minimum number of failures corresponding to the most unfortunate combination of element failures can be used not only as a characteristic of the redundancy level, but also as a deterministic indicator of reliability, called d - reliability:

$d = \min_i d_i$,

where $d_i$ is the number of failed elements during the transition from a fully operational to a pre-failure state along the i-th path.

The level of redundancy is also characterized by the maximum number of element failures at which a system failure does not yet occur. This number can be used as a deterministic indicator of reliability, called m - reliability:

$m = \max_i m_i$,

where $m_i$ is the number of element failures upon transition to a pre-failure state along the i-th path. Note that the path here may contain several pre-failure states.

Comparison of m and d allows us to evaluate the flexibility properties of resources used to improve reliability. If there is a large difference between these numbers, the maneuverability of resources is low, and if the difference is small, it is high. When m = d, maneuverability is absolute.
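For small systems the two deterministic indicators can be found by direct enumeration. The sketch below is one possible reading of the definitions above: d is taken as the smallest number of failed elements in any pre-failure state, and m as the largest number of failed elements in any operable state. The structure functions, element names and function names are hypothetical.

```python
from itertools import combinations


def d_and_m(elements, works):
    """Return (d, m) for a system described by works(failed_set) -> bool."""
    d, m = None, 0
    for k in range(len(elements) + 1):  # k = number of failed elements, ascending
        for failed in combinations(elements, k):
            failed = frozenset(failed)
            if not works(failed):
                continue
            m = max(m, k)
            # Pre-failure state: one more element failure can bring the system down.
            pre_failure = any(
                not works(failed | {e}) for e in elements if e not in failed
            )
            if pre_failure and d is None:
                d = k  # first hit is the minimum because k grows
    return d, m


def duplicated_pairs(failed):
    """Two duplicated units in series: fails when both copies of either unit fail."""
    return not {"a1", "a2"} <= failed and not {"b1", "b2"} <= failed


def two_out_of_four(failed):
    """Any two of four interchangeable elements are sufficient."""
    return len(failed) <= 2


elements = ["a1", "a2", "b1", "b2"]
print(d_and_m(elements, duplicated_pairs))  # rigidly assigned reserve
print(d_and_m(elements, two_out_of_four))   # freely reassignable reserve
```

In this example the duplicated pairs give d = 1 and m = 2, while the two-out-of-four structure gives d = m = 2, which corresponds to the case of absolute maneuverability of resources.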

1.11. The reliability indicator used for non-redundant systems, the mean time to failure $T_{av}$, can also be calculated for a redundant system. However, it poorly reflects the basic properties of such a system, since it characterizes the behavior of the system over the entire interval in which the probability of failure-free operation differs from zero. For highly reliable systems, which redundant systems usually are, this interval is very long and significantly exceeds the specified operating time. This means that $T_{av}$ is also determined by the interval in which the system is no longer operated and in which, because of the gradual exhaustion of the reserve and degradation of the system, reliability decreases and may fall below that of a non-redundant system. Therefore the efficiency of redundancy assessed by the increase in mean operating time turns out, as a rule, to be significantly lower than when assessed by the reduction in the probability of failure. For this reason, mean time to failure is not recommended as a reliability indicator for a redundant system. Instead, the conditional mean time to failure is used, i.e. the mean time to failure given that the operating time does not exceed the operating interval.

1.12. The efficiency retention coefficient $K_e$ expresses the relative decrease in a certain efficiency indicator (productivity, throughput, power, quantity of manufactured products) due to failures of system elements. A feature of $K_e$ as a reliability indicator is that calculating it does not require introducing the concept and criteria of system failure. Therefore $K_e$ is used when assessing the reliability of complex systems in which the states cannot all be divided into two classes (operable and inoperable) and which have several levels of efficiency. However, it can also be used in systems where the concept and criteria of failure are formulated, if the operable states differ in the value of the efficiency indicator. If the values are the same, the efficiency retention coefficient coincides numerically with the technical utilization coefficient.
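One simple way to estimate such a coefficient is to average the efficiency over the system states. The sketch below assumes independent elements with known stationary probabilities of being operable and takes the coefficient as the expected efficiency divided by the efficiency with all elements operable; this calculation model, the function names and the numbers are assumptions for illustration and are not prescribed by these recommendations.

```python
from itertools import product


def efficiency_retention(availabilities, efficiency):
    """Expected efficiency over all element states divided by the nominal efficiency."""
    n = len(availabilities)
    nominal = efficiency((True,) * n)
    expected = 0.0
    for state in product((True, False), repeat=n):
        prob = 1.0
        for up, a in zip(state, availabilities):
            prob *= a if up else 1.0 - a
        expected += prob * efficiency(state)
    return expected / nominal


def additive(state):
    """Several efficiency levels: output proportional to the number of working units."""
    return sum(state)


def binary(state):
    """Two classes of states only: full output if at least one unit works."""
    return 1.0 if any(state) else 0.0


# Illustrative values (assumed): two units, each operable with probability 0.9.
print(efficiency_retention([0.9, 0.9], additive))
print(efficiency_retention([0.9, 0.9], binary))
```

With the binary efficiency function all operable states have the same efficiency, and the result reduces to the probability of finding the system operable.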

1.13. When calculating the established failure-free operating time $t_y$, the probability of its provision is determined as the probability of failure-free operation during $t_y$.

2. CLASSIFICATION OF REDUNDANCY TYPES

2.1. Regardless of the purpose and field of technology, five types of excess resources should be distinguished: structural, time, functional, informational, and load. Correspondingly, five types of redundancy are distinguished. To these should be added algorithmic and semantic excess, which can be regarded as varieties of functional and informational excess, respectively, but which have certain specific features and can be considered separately.

2.2. Structural redundancy is carried out by introducing into the structure of technical means additional (backup) elements capable of performing the functions of the main elements in the event of their failure. Removing these elements from the system while the main ones are in working condition does not impair the system’s ability to perform the required functions in the given modes and conditions of use.

2.3. Functional redundancy takes place in multifunctional systems in which individual elements or groups of elements have the ability to take over the functions of other failed elements for the duration of their restoration without significantly reducing the technical and economic performance of the system. With functional redundancy, unlike structural redundancy, there are no reserve elements, i.e. such elements that can be permanently removed without violating the requirements for the technical characteristics of the system.

Functional redundancy is provided by:

Establishing additional connections between elements;

Flexibility and efficiency of reconfiguring multifunctional elements to perform a given function;

Changing the operating mode.

2.4. Time redundancy consists in creating a reserve of time for individual elements, groups of elements or the system as a whole, which can be used to restore technical characteristics without violating the requirements for the output parameters of the system.

Time redundancy is provided by:

Creating a performance reserve by increasing the speed (throughput) of elements;

Creating a productivity reserve by parallel inclusion of devices of the same purpose;

Creation of product inventories in intermediate or output storage;

Reducing the rate of development of adverse consequences of failures and the rate of deterioration of system output parameters.

2.5. Information redundancy consists in forming several semantically equivalent sources of information or copies of information arrays, and in introducing additional information intended to restore the main information if it is distorted.

Information redundancy is provided by:

Noise-resistant coding of information;

Data duplication on different devices;

Correlation of physical field measurement data;

Using data that satisfies invariant relations;

Using redundancy in algorithmic or natural language.

2.6. Load redundancy consists in providing margins of operability under the various loads (electrical, mechanical, thermal, etc.) that act during operation. Load redundancy is provided by:

Creating a safety margin in order to protect against increased shock and vibration loads;

Use of elements with increased permissible electrical power dissipation;

Use of heat-resistant materials;

Reducing the utilization rate of the product with useful work.

2.7. The main characteristics of the types of reservation, which determine the size of the input resources and the rules for their use, are:

Redundancy ratio (multiplicity);

Area of use of reserve resources;

Reservation discipline;

Discipline of resource recovery;

The number of levels of the reservation hierarchy.

2.8. The redundancy ratio is defined as the ratio of the number of reserve resources to the number of main resources. The multiplicity of structural redundancy is presented as an irreducible fraction in which the numerator is the number of reserve elements, and the denominator is the number of main elements. The multiplicity of functional redundancy is determined by the number of different ways in which a given function can be performed. The time reservation ratio is defined as the ratio of the reserve time to the main task execution time. The multiplicity of information redundancy during noise-resistant coding coincides with the relative redundancy of the code; when encoding arrays, it coincides with the number of backup copies, and in the general case, the multiplicity is defined as the ratio of the number of units of backup and main information. The load redundancy factor is defined as the ratio of the performance reserve for a given type of load to the rated load value, measured in the same units.

2.9. According to the area of use of reserve resources, there are general, group and element-by-element reservations. The general reserve is capable of fending off failures in any of the system elements. Group reserve prevents failures only in elements of a given group and cannot be used in case of failures of elements outside this group. Element-by-element reserve is intended to prevent failures of only elements of a given type. Each of these reservation methods can be characterized by the reservation ratio.

2.10. The redundancy discipline establishes the procedure for using redundant resources that are introduced into the system to implement various redundancy methods, and depends on what types and methods of redundancy are implemented in the system and in what mode the system operates at the time the failure occurs. With structural redundancy, element-by-element reserves are usually used first, then group reserves, and lastly, general reserves. With structural and time reservation in some modes, the structural reserve is first used, and then the time reserve. In other operating modes, the order of using the reserve may be reversed; the functional reserve is usually used after the structural reserve has been exhausted, since the transition to another method of performing a function is often associated with a slight decrease in the quality of functioning. Since the achieved reliability of a redundant system depends on the redundancy discipline, it is necessary to search for the optimal redundancy discipline.

2.11. The discipline of resource recovery determines the order of maintenance, the discipline of technical and information recovery, replenishment of product stocks, working capacity reserves and time reserves. The recovery discipline should define:

The moment of the beginning of recovery;

Changing the operating mode of the system during recovery;

Source of replenishment of resources;

The order of work to restore resources;

The procedure for returning hardware, software and information to the system after their restoration is completed;

Standard values of resources, upon reaching which the restoration process stops or the mode of operation of the main system and maintenance system changes;

Maintenance and restoration strategy.

2.12. The hierarchy of redundancy means is created in accordance with the hierarchy of technical means. In this regard, several levels of the reservation hierarchy can be distinguished:

Elemental level (I);

Level of modules and nodes (II);

Device level (III);

Subsystem level (IV);

System level (V);

Based on the functional principle, the following levels of the reservation hierarchy can be distinguished:

Microoperation level (I);

Operation parts level (II);

Operations level (III);

Subtask level (IV);

Task level (V);

Function level (VI);

Level of multifunctional tasks (VII).

According to the method of implementing redundancy, there are three levels of hierarchy:

Technological (I);

Constructive (II);

Functional (III).

The number of hierarchy levels is a classification and technical characteristic of backup tools.

3. SELECTING A REDUNDANCY TYPE

3.1. The choice of reservation type is determined by:

Conditions for using the system;

Limitations on the total costs of means of increasing reliability;

Limitations due to requirements for other technical characteristics (dimensions, weight, energy consumption, operating costs, maintenance subsystems);

Acceptable deterioration in the quality of functioning and a reduction in the scope of functions performed during system degradation;

Technical feasibility of redundancy methods;

Level of development of monitoring and diagnostic tools;

Maintainability characteristics;

The degree of equipment unification;

The level of production technology and its characteristics (stability, flexibility, accuracy).

3.2. Structural redundancy has an advantage in systems whose conditions of use are characterized by the following features:

Short permissible interruption time;

High cost of failure (severe consequences of failure);

Inadmissibility of a reduction in the quality of operation due to system degradation;

A developed system of hardware monitoring and diagnostics that does not allow significant delays in detecting failures;

Organization of maintenance, in which it is possible to disconnect a failed device, restore it and put it back into operation without interrupting the functioning of the rest of the system.

Structural redundancy methods can be divided into three main groups:

Built-in redundancy with permanent backup;

Built-in redundancy by replacement with automatic or automated switching on of the reserve;

Unloaded redundancy by replacing inoperative elements with functional ones from spare parts.

In the latter case, the multiplicity and method of reservation are determined by the nomenclature and number of spare elements, the structure of spare parts (single, group).

3.3. Functional redundancy is used in cases where structural redundancy is unacceptable due to the large number of equipment or other reasons. It is, as a rule, more economical than structural redundancy, but efficiency is achieved at the expense of some reduction in the quality of function execution, for example, due to deterioration in accuracy, increased execution time of functions, decreased productivity, decreased readability of output results, etc.

Another form of functional redundancy is the complete restoration of the main functions by stopping the execution of secondary functions and transferring the freed resources to perform the main ones.

Features of functional redundancy:

Higher system reliability when using a backup method of performing functions using simplified algorithms;

A developed resource management system and their high mobility, which means that resources can be connected quickly enough and in a variety of configurations to perform basic functions;

A developed performance monitoring system that allows you to reliably assess the technical condition of all resources and promptly supply the resource management system with the necessary information;

The ability to quickly return to the main option for performing functions after restoring the functionality of failed devices;

Absence of depreciating failures;

The fundamental absence of replication of design errors in the implementation of algorithms for the functioning of devices that back up each other.

3.4. Temporary redundancy as a method of increasing reliability becomes effective and gains an advantage over other types of redundancy in systems with the following features:

The system allows interruptions in operation for a time exceeding the time required to eliminate the failure and its consequences;

The quality of system operation is assessed by integral characteristics over a fairly large period of time (shift, day, week, month, quarter, year);

The system has a finite and relatively low rate of transition from an operational state to an inoperable state in the event of failures of its individual elements;

A system that transmits or processes material, energy or information flows has the ability to accumulate the required quantities of product in intermediate and output storage devices to counter failures and their consequences;

It is not possible to completely eliminate depreciating failures in the system, and therefore part of the operating time requires repetition;

Periods of latent failure occur in the system, requiring repetition of some work after the failure is detected;

The system allows a short-term decrease in performance, compensated by the performance margin;

The system has a cumulative effect that makes it possible, given additional time, to improve the output characteristics (accuracy, reliability, strength, robustness, stability) that determine its performance.

3.5. Information redundancy is a specific type of redundancy used in communication, control, measuring, information, computing systems and other systems for collecting and processing information.

It is used in cases where the consequences of loss and distortion of information are severe, and therefore such violations are either unacceptable or should be unlikely. The main conditions and prerequisites for using information backup are:

Insufficient reliability of storage media;

The impossibility of prompt restoration by algorithmic means of information distortions during processing;

Impossibility of renewing information using primary sources;

The system provides the necessary hardware and time resources to implement information backup, and the operating algorithms provide for the use of redundant information.

Information redundancy is usually used in combination with structural, functional and temporary redundancy, since storing copies of information arrays and additional information during error-resistant coding requires additional storage capacity and additional equipment for processing information, and additional time is required to read copies and operate information recovery tools. A common method of information redundancy is the installation of additional sensors in the measurement field, which allows the simultaneous use of functional redundancy (first form).

3.6. Load redundancy is used in cases where the product is maintenance-free or when eliminating a failure requires a lot of time and high operating costs. At the same time, the use of structural redundancy is difficult or impossible for technical or economic reasons. Load redundancy can also be used when structural redundancy is not effective and to increase its efficiency it is necessary to reduce the failure rate of the product or its redundant part. The main conditions for the successful use of this type of reservation:

Availability of suitable elements that have the required performance margin in various parameters relative to the nominal operating mode of the product;

Acceptability of the degree of increase in other technical and economic characteristics (dimensions, energy consumption, cost, etc.) in relation to the prototype, due to the creation of a performance reserve;

The ability to simultaneously unload all or most elements in order to create an “equally strong” system.

Load redundancy methods include:

The use of elements with increased permissible power dissipation;

Reducing the packing density of elements to create a favorable thermal regime;

Reducing the speed of movement of mechanical elements to reduce mechanical loads;

Reducing the intensity of input information flows in information systems in order to prevent malfunctions and failures;

Facilitation of technological regimes in technological systems in order to expand the range of operability in case of deviations of technological parameters from nominal values.

Load redundancy is often used in combination with other types of redundancy. The possibility of short-term additional loading allows the use of functional redundancy. By reducing the information load, idle periods can be used as a reserve of time. When unloading power, short-term boosting of the mode is used in order to partially or fully compensate for downtime or deterioration of system output parameters due to failures.

4. SELECTION OF WAYS AND METHODS OF STRUCTURAL RESERVATION

4.1. Ways and methods of structural redundancy

Depending on the method of connecting the reserve, its state and its multiplicity, structural redundancy can be: general and separate, with a constantly switched-on reserve and by the replacement method, with whole and fractional multiplicity. This classification of the ways and methods of structural redundancy is given in the table.

Reliability-functional diagrams (RFD) of structural redundancy of multiplicity m c are shown in Fig. 1.

In addition to the main types shown in the table and in Fig. 1, structural redundancy can be mixed, sliding, or of a special type in which the RFD is not reduced to a series-parallel structure.

Mixed redundancy is formed when, to increase the reliability of a complex system, various types and methods of structural redundancy of its individual devices are used.

Sliding redundancy is redundancy in which one or more reserve devices can replace any of the failed devices of the main system.

Fig. 1. Reliability-functional diagrams of structural redundancy of multiplicity m c

In the practical implementation of structural redundancy, it is often impossible to implement the RFD shown in Fig. 1. This is explained by the fact that in a redundant system with a large number of elements, the failure of one of them can change the basic parameters of other elements, which degrades the performance of the entire system. In such cases, the failure of several elements in different places of the system can change the output characteristics so much that the system ceases to perform its functions with the given efficiency.

Here, the functioning of the system in terms of its reliability is not reduced to a series-parallel structure.

This most often occurs when backing up electrical and electronic circuits, logic elements, communication systems, and computer networks.

4.2. Methods for increasing the efficiency of redundancy.

One of the main criteria for the effectiveness of redundancy is the gain in reliability. The reliability gain is the ratio of the reliability indicator of a redundant system to the same reliability indicator of a non-redundant system.

Knowing the properties of the various ways and methods of structural redundancy, one can qualitatively assess their effectiveness and make a well-founded choice of the type of redundancy.

Structural redundancy has a number of properties, the main ones:

With an increase in the redundancy ratio with a constantly switched on reserve, the weight, dimensions and cost of the system grow more quickly than the reliability grows;

Structurally redundant technical devices behave as aging devices: their failure rate λ(t) increases over time;

The gain in reliability at λ(t) = const decreases over time;

The reliability gain with structural redundancy significantly depends on the type of distribution law of the time until failure of the main and backup devices: the faster the failure rate λ(t) increases, the smaller the reliability gain;

The failure rate of the redundant system at t = 0 is also zero and over time tends to the failure rate of the non-redundant system;

The redundancy efficiency of a recoverable system is always higher than that of a non-recoverable system if recovery of failed elements is possible during system operation;

The shorter the recovery time, the higher the redundancy efficiency, all other things being equal;

The higher the multiplicity of the same type of redundancy, the higher the cost, weight, dimensions of the system, the greater the required volume of spare parts, the cost of operation, as well as the cost of one system failure.
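
Several of the properties listed above can be checked numerically. The sketch below is only an illustration under stated assumptions: general redundancy with an always-on reserve of multiplicity m, identical elements, independent failures and a constant element failure rate (exponential law). The function names are hypothetical, and the gain is measured here on the failure probability, Q(t) / Q c (t), which is one possible choice of indicator.

```python
import math

def p_single(lam, t):
    """Failure-free probability of one non-redundant device (exponential law assumed)."""
    return math.exp(-lam * t)

def p_redundant(lam, t, m):
    """Assumed model: m + 1 identical devices in parallel, all switched on."""
    return 1.0 - (1.0 - p_single(lam, t)) ** (m + 1)

def failure_rate_redundant(lam, t, m):
    """Failure rate of the redundant system: density of time to failure divided by P_c(t)."""
    q = 1.0 - p_single(lam, t)
    density = (m + 1) * lam * p_single(lam, t) * q ** m  # -dP_c/dt
    return density / p_redundant(lam, t, m)

def gain_in_failure_probability(lam, t, m):
    """Q(t) / Q_c(t) for t > 0: how many times redundancy reduces the failure probability."""
    q = 1.0 - p_single(lam, t)
    return q / q ** (m + 1)
```

With lam = 1e-3 per hour and m = 1, the gain falls from about 100 at t = 10 h to about 10.5 at t = 100 h, while the failure rate of the redundant system starts near zero and approaches lam as t grows.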

These properties limit the use of redundancy to improve the reliability of complex systems with a long operating time. You can increase the efficiency of redundancy in the following ways.

1. Application of sliding reservation, with a changing structure, with automatic control of the reserve status.

2. Introduction of redundancy with fractional multiplicity to increase the reliability of discrete equipment in the presence of failures.

3. The use of special redundant circuits that allow repair of failed backup devices without shutting down the system.

4. Construction of circuits in which the failure of the main or backup elements (devices) either does not change the main output characteristics of the system or changes them only within acceptable limits.

5. Application of systems for continuous and reliable monitoring of the reliability of the system and its devices in order to detect failure and reduce its recovery time.

6. Increasing the maintainability of the system in order to reduce the recovery time of the redundant system.

Sliding redundancy, under certain conditions, can significantly increase the reliability of a complex system with only a slight increase in weight, dimensions, and cost. For example, general redundancy of multiplicity m c implemented by the replacement method is equivalent in terms of reliability to sliding redundancy with the number of reserve elements equal to the number of reserve systems; such a significant gain can only be obtained if the main system consists of identical, interchangeable elements.

Reservation with a fractional multiplicity, for example, according to a two out of three scheme, allows you to compare two or three simultaneously obtained measurement or calculation results without a significant loss of time. This makes it possible to significantly increase the reliability of measuring systems and computers in the event of failures in them. Such redundancy can lead to a decrease in reliability from sudden failures such as breakdowns, breaks and short circuits in electrical circuits.
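
The effect of a voting scheme can be illustrated with a short calculation. The sketch below assumes n independent, identical channels, each giving a correct result with probability p, and an ideal (failure-free) voting element; the function name is hypothetical.

```python
from math import comb

def p_k_of_n(p, k, n):
    """Probability that at least k of n independent identical channels work."""
    return sum(comb(n, j) * p ** j * (1.0 - p) ** (n - j) for j in range(k, n + 1))
```

For the two out of three scheme, p = 0.9 gives 0.972, which is higher than a single channel. For p = 0.4 the result is 0.352, lower than a single channel, so under these assumptions majority redundancy only helps when the channels themselves are reasonably reliable. Failure modes such as breaks and short circuits are not modeled here.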

The reliability of structurally redundant systems can be most significantly improved when the system design allows repair of failed devices without shutting down the system. If the repair time is short compared to the average time between failures, then redundancy with recovery allows you to increase the time between failures by hundreds and thousands of times compared to a non-redundant system, even with a redundancy factor of m c = 1, that is, with duplication.
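
The order of magnitude quoted above can be reproduced with a simple Markov model. The sketch below assumes duplication with an always-on reserve, identical devices with constant failure rate lam, a single repair crew with constant recovery intensity mu, and repair that starts immediately after a device fails; the function names are hypothetical, and the quantity computed is the mean time to the first failure of the whole system.

```python
def mean_time_to_system_failure(lam, mu):
    """Mean time to first system failure of a duplicated system with repair.

    States: 2 devices up -> (rate 2*lam) -> 1 device up -> (rate lam) -> system failure,
    with repair at rate mu returning the system from 1 device up to 2 devices up.
    Solving the two linear equations for the mean times gives the closed form below.
    """
    return (3.0 * lam + mu) / (2.0 * lam ** 2)

def gain_over_non_redundant(lam, mu):
    """Ratio of the duplicated system's mean time to 1/lam of a single device."""
    return mean_time_to_system_failure(lam, mu) * lam
```

With mu = 0 (no repair) the gain is 1.5. With a mean repair time 1000 times shorter than the mean time between failures of one device (mu / lam = 1000) the gain is about 500, consistent with the "hundreds and thousands of times" stated above.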

4.3. System reliability models for structural redundancy

Models of reliability of technical systems with structural redundancy are determined mainly by the type of redundancy and the discipline of maintenance.

4.3.1. Reliability models of non-restorable technical systems.

In the absence of repair of failed elements of structurally redundant systems, in a large number of cases the following assumptions will be valid:

There is no aftereffect of element failures;

All elements work simultaneously;

Element failures are independent events.

Under these assumptions, for all ways and methods of structural redundancy shown in Fig. 1, a model of parallel-series circuits should be used for reliability calculations. Such a model makes it possible to estimate the probability of failure-free operation of a structurally redundant system using well-known theorems of probability theory (addition, multiplication) and the total probability formula.

Through the probability of failure-free operation P(t), you can obtain other reliability indicators using the following formulas:

Mean time to first failure

$T_1 = \int_0^{\infty} P(t)\,dt$, (1)

Probability of failure

$Q(t) = 1 - P(t)$, (2)

Failure frequency (distribution density of time to failure)

$F(t) = Q'(t)$, (3)

Failure rate

$\lambda(t) = F(t) / P(t)$. (4)

This model can also be applied to the case of structurally redundant non-recoverable systems, the functioning of which is not reduced to series-parallel circuits.
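
The relations between the indicators can be evaluated numerically for any given P(t). The sketch below is an illustration only: it uses Q(t) = 1 - P(t), takes the failure frequency as the derivative of Q(t), the failure rate as the failure frequency divided by P(t), and the mean time to first failure as the integral of P(t) over time, truncated at a finite upper limit t_max. The function names, step sizes and truncation limit are assumptions.

```python
def indicators(p, t, dt=1e-3):
    """Return (Q, F, failure rate) at time t for a callable P(t).

    The derivative is approximated by a central difference with step dt.
    """
    q = 1.0 - p(t)
    f = -(p(t + dt) - p(t - dt)) / (2.0 * dt)  # F(t) = Q'(t) = -P'(t)
    return q, f, f / p(t)

def mean_time_to_first_failure(p, t_max, steps=100_000):
    """Trapezoidal approximation of the integral of P(t) from 0 to t_max."""
    h = t_max / steps
    total = 0.5 * (p(0.0) + p(t_max))
    for i in range(1, steps):
        total += p(i * h)
    return total * h
```

For an exponential law P(t) = exp(-0.01 t), the computed failure rate is constant and close to 0.01, and the mean time to first failure with t_max = 2000 is close to 100.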

4.3.2. Model of reliability of non-repairable technical systems of complex structure.

If the functioning of a structurally redundant system is not reduced to a series-parallel structure, then to assess its reliability it is necessary to compile a matrix of favorable hypotheses and calculate the sum of their probabilities. Computational procedures are simplified if the functioning of the system is described by the functions of logical algebra. The use of logical-probabilistic models makes it possible to formalize computational procedures and significantly simplify them.

The probability of failure-free operation of a system with a complex structure is calculated using the formula

$P(t) = \sum_{i=1}^{n} P_i(t)$, (5)

where P i (t) is the probability of the i-th favorable hypothesis, n is the number of favorable hypotheses.

Other reliability indicators are calculated using formulas (1)-(4).
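
The summation over favorable hypotheses can be sketched for a small structure that does not reduce to a series-parallel form. The example below assumes a five-element bridge circuit with independent elements; the element order (a, b, c, d, e, with e as the bridging element) and the function names are illustrative assumptions. Each combination of working and failed elements is one hypothesis, and the probabilities of the hypotheses in which the system works are added.

```python
from itertools import product

def system_reliability(p, works):
    """Sum the probabilities of all element-state combinations in which the system works.

    p     -- list of failure-free probabilities of the independent elements
    works -- function mapping a tuple of booleans (element states) to True/False
    """
    total = 0.0
    for state in product((True, False), repeat=len(p)):
        if works(state):
            prob = 1.0
            for ok, p_i in zip(state, p):
                prob *= p_i if ok else 1.0 - p_i
            total += prob
    return total

def bridge_works(state):
    """Bridge circuit: paths a-b, c-d, a-e-d and c-e-b connect input to output."""
    a, b, c, d, e = state
    return (a and b) or (c and d) or (a and e and d) or (c and e and b)
```

With all five elements at 0.9, system_reliability([0.9] * 5, bridge_works) gives 0.97848. The enumeration grows as 2 to the power of the number of elements, which is why logical-probabilistic methods are preferred for larger systems.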

4.3.3. Reliability models of recoverable structurally redundant systems.

The most common model is the queuing type. In this case, the flow of service requests is formed by systems that fail at random moments in time, and the service facility is a repair shop or the maintenance personnel.

In this model, various service disciplines are possible: with direct, reverse and assigned priority. With direct priority, failed devices are serviced in the order they are received for repair; with reverse priority, the device that failed last is serviced first. With assigned priority, the sequence of repair of failed devices is assigned in advance.

The queuing-type model makes it possible to analyze structurally redundant systems with different numbers of service facilities. The functioning of the system can easily be described by queuing-type equations for any way and method of redundancy if the failure and recovery flows are the simplest (Markov model). If the failure flows are not the simplest (semi-Markov model), then reliability analysis is practically possible only for relatively simple cases of redundancy, for example, general redundancy with an integer multiplicity.

When analyzing the reliability of complex highly reliable systems, the mean time between failures usually significantly exceeds the mean time to recovery, that is, the failure rate λ is much lower than the recovery intensity μ (λ ≪ μ). Under this condition, maintenance discipline has a minor impact on system reliability.

4.4. Calculation of the reliability of systems with structural redundancy.

4.4.1. Reliability indicators.

Reliability indicators for non-recoverable redundant systems can be:

P(t) - probability of failure-free operation during time t;

T 1 - mean time to first failure;

F(t) - failure frequency (distribution density of time until the first failure);

λ(t) - failure rate.

Reliability indicators for recoverable redundant systems are:

K g (t) - readiness function (the probability that the system is operable at time t);

K g - availability factor;

T - mean time between failures;

ω(t) - failure flow parameter.

There are clear dependencies between the indicated reliability indicators of both non-recoverable and recoverable systems, although they can be difficult to establish for certain types of redundancy. Therefore, in practice there is no need to calculate the reliability of the system according to all indicators. One or two indicators are enough.

It is most appropriate to assess the reliability of redundant, non-recoverable systems using the probability P(t). This indicator gives the most complete assessment of reliability, is quite clear, and is relatively easy to calculate for the main ways and methods of redundancy shown in Fig. 1.

The mean time to first failure T 1 should not be used to assess the reliability of redundant systems for the following reasons:

The law of distribution of time until the first failure of a redundant system is multi-parameter; in this case, the mathematical expectation T 1 of a random variable (the time until the first failure) does not fully characterize the random variable itself;

T 1 is an integral indicator and is calculated by the formula

$T_1 = \int_0^{\infty} P(t)\,dt$,

from which it can be seen that the probability of failure-free operation is integrated along the entire time axis. If the system is designed for a short operating time t, then formula (1) does not take this into account.

The failure frequency F(t) and the failure rate λ(t) are not sufficiently clear and do not enter into other, more general system indicators such as efficiency and quality; therefore they are used as auxiliary indicators in reliability calculations.

It is advisable to evaluate the reliability of redundant recoverable systems by the readiness function K g (t) or the availability factor K g. The first is used to assess the reliability of redundant systems with a short operating time, the second for systems with a long operating time. To analyze the reliability of redundant, recoverable systems of long-term use, the mean time between failures can also be used.

Complex systems usually operate in different modes. In one mode they may not allow repair, in another they may be repairable. When performing some functions, the system may not be redundant; when performing other functions, it may be structurally redundant. For example, the control system of an aircraft in flight is practically not repairable, but after landing it is completely recoverable. In such cases, reliability analysis should be performed using multiple criteria: in the case of an aircraft control system, for example, by the probability of failure-free operation during the flight time and by the availability factor. Since all reliability indicators are unambiguously related to one another, among the criteria there is one whose satisfaction ensures that the requirements for all reliability indicators are met.

In multicriteria systems, it is advisable to use generalized criteria. In the case of an aircraft control system, a general reliability criterion can be the probability that the control system is ready for operation at any arbitrary time t and will not fail during the flight time.
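
A generalized criterion of this kind can be sketched as a product of two factors. The example below is an assumption-laden illustration, not a formula from the source: it takes the stationary availability factor as the probability of being ready at an arbitrary time, assumes an exponential law during the flight, and assumes that readiness before the flight and failure-free operation in flight are independent. The function name and parameters are hypothetical.

```python
import math

def operational_readiness(k_g, lam_flight, t_flight):
    """Probability that the system is ready at an arbitrary time and then
    works without failure for t_flight (non-repairable in flight)."""
    return k_g * math.exp(-lam_flight * t_flight)
```

For example, k_g = 0.99, lam_flight = 1e-4 per hour and a 5-hour flight give a value slightly below 0.99, because the in-flight failure probability is small compared with the unavailability on the ground.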

4.4.2. Calculation of the reliability of redundant non-recoverable systems.

The reliability of the redundant systems shown in Fig. 1 is calculated using the following formulas.

a) General redundancy with always-on reserve:

(7)

(8)

where T 0 is the time until the first failure of one non-redundant system;

P(t) is the probability of failure-free operation during time t of one non-redundant system; m - reservation ratio.
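
The bodies of formulas (7) and (8) are not legible in this copy. A common textbook form, assuming m + 1 identical systems working in parallel with independent failures and, for the mean time, an exponential law, is P c (t) = 1 - [1 - P(t)]^(m+1) and T c = T 0 (1 + 1/2 + ... + 1/(m+1)). The sketch below uses these assumed expressions; the function names are hypothetical.

```python
def p_general_always_on(p, m):
    """Failure-free probability of m + 1 identical systems in parallel (assumed model)."""
    return 1.0 - (1.0 - p) ** (m + 1)

def t_general_always_on(t0, m):
    """Mean time to first failure for exponentially distributed element lifetimes."""
    return t0 * sum(1.0 / (i + 1) for i in range(m + 1))
```

For P(t) = 0.9 and m = 1 (duplication) the system probability is 0.99, while the mean time to first failure grows only by a factor of 1.5.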

b) General reservation by replacement:

(9)

(10)

where λ is the failure rate of one non-redundant device.
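
The bodies of formulas (9) and (10) are likewise missing here. Under the usual assumptions (exponential law, a reserve that does not fail while waiting, and an ideal switching device), the textbook expressions are P c (t) = exp(-λt) (1 + λt + ... + (λt)^m / m!) and T c = (m + 1) / λ. The sketch below implements these assumed expressions with hypothetical function names.

```python
import math

def p_general_replacement(lam, t, m):
    """Failure-free probability with m unloaded reserve systems and ideal switching."""
    x = lam * t
    return math.exp(-x) * sum(x ** i / math.factorial(i) for i in range(m + 1))

def t_general_replacement(lam, m):
    """Mean time to first failure: the m + 1 lifetimes are used one after another."""
    return (m + 1) / lam
```

For λt = 0.1 and m = 1 this gives about 0.9953, compared with about 0.9909 for duplication with an always-on reserve of the same devices, which illustrates why replacement is more effective when switching is reliable.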

c) Separate redundancy with always-on reserve:

(11)

where P i (t) is the probability of failure-free operation during time t of one element of the i-th redundant node; m - number of reserved nodes.
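
Formula (11) is also missing in this copy. A usual textbook form, assuming the system is a series chain of nodes, each node being backed up by the same number of always-on reserve elements with independent failures, multiplies the node probabilities 1 - [1 - P i (t)]^(r+1). The notation in the sketch is an assumption and differs from the source: the nodes are given as a list and r denotes the redundancy multiplicity of each node.

```python
def p_separate_always_on(node_probs, r):
    """Series chain of nodes, each node made of r + 1 identical elements in parallel."""
    result = 1.0
    for p_i in node_probs:
        result *= 1.0 - (1.0 - p_i) ** (r + 1)
    return result

def p_general_for_same_chain(node_probs, r):
    """General redundancy of the same chain: r + 1 whole chains in parallel."""
    p_chain = 1.0
    for p_i in node_probs:
        p_chain *= p_i
    return 1.0 - (1.0 - p_chain) ** (r + 1)
```

For two nodes with P i (t) = 0.9 and duplication, separate redundancy gives 0.9801 and general redundancy gives 0.9639, so with the same amount of equipment separate redundancy is more reliable under these assumptions.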

d) Separate reservation with replacement:

(12)

The reliability of redundant non-recoverable devices with mixed types of redundancy is also analyzed using formulas. When the structural diagram of operation reduces to a series-parallel structure, these formulas can be obtained using the well-known theorems of addition and multiplication of probabilities and the total probability formula.

If the operation of the system is not reduced to a series-parallel circuit, then the probability of failure-free operation should be calculated using the formula

$P(t) = \sum_{i=1}^{N} P_i(t)$,

where P i (t) is the probability of the i-th favorable hypothesis;

N is the number of favorable hypotheses.

To describe the functioning of the system in this case and calculate P(t), it is advisable to use logical-probabilistic methods.

4.4.3. Calculation of the reliability of recoverable redundant systems.

Calculation formulas for obtaining indicators K g (t), K g and T can be obtained only for simple cases of redundancy with a limited multiplicity m c. In general, a queuing model is used. The calculation method is as follows.

1. A structural diagram of reliability calculation is drawn up. The failure rates and recovery rates of each device are indicated.

2. A graph of system states is constructed taking into account the specified maintenance discipline.

3. A system of differential equations of the queuing type is compiled.

4. The system of equations is solved on a computer using standard programs.
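
The four steps above can be sketched for the simplest case. The example assumes duplication with an always-on reserve, identical devices with failure rate lam, one repair crew with recovery intensity mu, and the simplest (Markov) failure and recovery flows. The state graph has three states (0, 1 or 2 failed devices), and the differential equations are integrated with a plain explicit Euler scheme rather than a standard solver; all names and the choice of integration method are assumptions.

```python
def readiness_function(lam, mu, t_end, steps=20_000):
    """Integrate the state equations and return K_g(t_end) = P0 + P1.

    State 0: both devices up; state 1: one failed (under repair);
    state 2: both failed (system down, one device under repair).
    """
    p0, p1, p2 = 1.0, 0.0, 0.0          # the system starts fully operable
    h = t_end / steps
    for _ in range(steps):
        d0 = -2.0 * lam * p0 + mu * p1
        d1 = 2.0 * lam * p0 - (lam + mu) * p1 + mu * p2
        d2 = lam * p1 - mu * p2
        p0, p1, p2 = p0 + h * d0, p1 + h * d1, p2 + h * d2
    return p0 + p1

def availability_factor(lam, mu):
    """Stationary solution of the same state graph (birth-death balance equations)."""
    rho = lam / mu
    return (1.0 + 2.0 * rho) / (1.0 + 2.0 * rho + 2.0 * rho ** 2)
```

For lam = 0.01 and mu = 1 the readiness function settles within a few mean repair times at the stationary value of about 0.9998. The Euler step must stay small compared with 1 / mu; for stiff or large systems a library ODE solver would be the more practical choice.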

In the case when the number of system states is very large (several hundred or more), the presented method does not allow one to find reliability indicators with the required accuracy. In such cases, you can use one of the following techniques:

a) unification (enlargement) of system states;

b) combining paths of the state graph;

c) shortening the state graph.

These techniques make it possible to assess the reliability of a complex system from above and below.

The following technique may also be effective.

1. A structural diagram of reliability calculation is drawn up.

2. The scheme is divided into separate independent restoration sections.

3. State graphs are constructed for all independent sections.

4. A system of differential equations of the queuing type is compiled for each of the sections.

5. A system of equations is solved on a computer, and the reliability indicators K g (t), K g and T are found for individual independent sections.

6. System reliability indicators are calculated using known reliability indicators of sections using the formulas

(14)

where K g i is the availability factor of the i-th independent section;

T i - time between failures of the i-th independent section;

K is the number of independent sections.
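
The body of formula (14) is not legible in this copy. One common way to combine section indicators, assuming the sections are connected in series in the reliability sense, fail and are restored independently, and have approximately exponential times between failures, is to multiply the availability factors and add the failure flows. The sketch below shows this assumed combination with hypothetical function names.

```python
def system_availability(section_availabilities):
    """Product of the availability factors of independent sections in series."""
    k_g = 1.0
    for k_i in section_availabilities:
        k_g *= k_i
    return k_g

def system_time_between_failures(section_times):
    """Failure flows of independent sections add, so 1/T is the sum of 1/T_i."""
    return 1.0 / sum(1.0 / t_i for t_i in section_times)
```

For two sections with availability factors 0.99 and 0.98 the system value is 0.9702; with times between failures of 1000 h and 500 h the system value is about 333 h.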

If the system's mean time between failures significantly exceeds the mean recovery time, we can assume that the priority of maintenance has virtually no effect on the reliability of a complex system. It is then reasonable to assume that system maintenance is carried out with reverse priority.

With this service discipline, the functioning of a complex system is described by a tree-type graph, and the solution can be obtained in the form of analytical expressions. For a large number of states, the solution can be obtained by numerical methods using a computer.

4.5. Selection of system structure for given reliability requirements.

When choosing a system structure that satisfies reliability requirements, one should use the principle of equal strength of the system in the sense of its reliability. Based on this principle, equally complex parts of the system should be equally reliable. It follows from this that if a system does not meet reliability requirements, then it is necessary to first improve the reliability of the least reliable parts of the system. In this case, it is necessary to take into account restrictions on the physical feasibility of types and methods of reservation.

The structure of the system obtained in this way will not be optimal in terms of weight, cost, and dimensions. To obtain an optimal structure, it is necessary to formulate and solve an optimization problem. This problem reduces to an optimal reservation problem with restrictions on physical feasibility.

When selecting a redundant system structure during the design process, it is helpful to consider the following guidelines.

1. For any type of structural redundancy, the gain in system reliability is higher the more reliable the devices being made redundant. From this main contradiction of structural redundancy it follows that its use is advisable when all other methods of increasing the reliability of the elements and devices of a complex system have been exhausted.

2. The greatest gain in reliability comes from sliding redundancy, then separate with replacement, separate with always-on reserve, and finally general, separate and general with replacement. This statement is true without taking into account the physical feasibility of structural redundancy, which requires additional technical devices. For example, sliding redundancy requires identical elements in the main system and continuous monitoring of their status, which can only be ensured with the help of control and communication machines. Such machines can be quite complex and unreliable, in which case sliding redundancy can be less effective than other types.

With separate structural redundancy with always-on reserve, it can be difficult to ensure stability of the system output characteristics. The output characteristics in the event of a failure of the main or reserve elements can change so much that a system failure occurs. All this must be taken into account when choosing the type of reservation. A reasonable choice of the type of structural redundancy can only be made as a result of a comparative analysis of possible options.

3. To increase the reliability of complex non-recoverable systems designed to operate for a short time (several hours), the most effective method is redundancy with an always-on reserve. Moreover, in a large number of cases it is enough to protect the system from only one failure, i.e. to apply separate duplication. Systems of this type include aircraft control systems, protection systems, etc.

4. In complex systems with a high failure rate, as well as in various measurement systems, it is useful to use fractional redundancy, usually implemented using matching (“voting”) circuits, to increase reliability.

5. To increase the reliability of recoverable systems, it is most advisable to use redundancy that allows failed devices to be restored without shutting down the system. At the same time, the most effective way to increase reliability is to reduce the recovery time of failed elements.

5. SELECTION OF WAYS AND METHODS OF TEMPORARY RESERVATION

5.1. Basic ways and methods of temporary reservation.

5.1.1. Increased system operating time.

The system has a target date T for completing a specific task. The time interval between the moment of receipt of the task t o and the target date T of its completion is the operational time of the system t = T - t o. The excess of the operational time t over the minimum required time t z constitutes a non-replenishable time reserve t r = t - t z. If the amount of work in a task can be calculated in advance, then the time reserve is a known quantity. If the amount of work is unknown and is a random variable, then the time reserve is also a random variable. Increasing operational time improves the probabilistic characteristics of task completion, but reduces actual productivity. The associated losses are the costs of ensuring reliability, which should be compared with the costs of backup equipment for structural redundancy.

5.1.2. Increase in productivity.

If the performance of the system C is such that the planned amount of work is completed exactly within the allocated operational time t, then there is no time reserve. If productivity is increased by the amount ΔC = C o - C, then the same volume of work can be completed in time t z = tC / C o, and the remaining time t r = t - t z = tΔC / C o forms a time reserve. Redundancy costs are associated with incomplete use of nominal capacity and a possible increase in the total flow of element failures.
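
The arithmetic of this method can be sketched as follows. The function and parameter names are hypothetical; c is the productivity that would just complete the task in the operational time t, and c_o is the increased productivity.

```python
def time_reserve_from_productivity(t, c, c_o):
    """Return (t_z, t_r): time needed at the increased productivity and the time reserve.

    t_z = t * c / c_o and t_r = t - t_z = t * (c_o - c) / c_o.
    """
    if c_o < c:
        raise ValueError("increased productivity must not be lower than the base value")
    t_z = t * c / c_o
    return t_z, t - t_z
```

For an 8-hour operational time, raising productivity from 100 to 125 units per hour reduces the required time to 6.4 hours and leaves a reserve of 1.6 hours.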

5.1.3. Multi-channel connection of elements.

Several structural elements of the system, each of which has a capacity C, can be connected in parallel to perform a common task. There are two types of parallel connection. With a backup connection, some of the elements, called the main ones, are included in useful work and provide the system with some C o performance. The other part of the elements, called reserve ones, is intended to maintain the operation of the system and stabilize the nominal performance at the C o level. Reserve elements are included in useful work after the failure of the main elements. Another type of parallel connection is a multi-channel connection, in which all functional elements perform useful work, increasing system performance. The productivity reserve creates a reserve of time. Multi-channel systems include multi-thread automatic lines in mechanical engineering, multi-thread pipeline transport systems in the energy sector, multi-processor computing systems, multi-channel communication systems, multi-channel measuring systems.

The performance of a system with m operational channels is determined by the formula C o = K m mC, where K m is the parallelism coefficient, which takes into account the inherent productivity losses for organizing parallel work and the adaptability of the task to parallelization (1 / m ≤ K m ≤ 1). The maximum amount of work that can be completed in operational time t is equal to V m = C o t. If the volume of work V < V m, then a time reserve t r = t (1 - V / V m) is formed.

If V / V m > (m - 1) K m-1 / (m K m), where K m-1 is the parallelism coefficient for m - 1 channels, then none of the parallel working elements can be removed from the system for the entire duration of the task, even if the others work flawlessly. This is one of the characteristic features that distinguishes a multi-channel connection from a backup one.
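
The relations for a multi-channel connection can be sketched in a few lines. The function and parameter names are hypothetical; k_m and k_m_minus_1 denote the parallelism coefficients for m and m - 1 channels, and the numerical values in the usage note are invented for illustration.

```python
def multichannel_time_reserve(m, c, k_m, t, v):
    """Time reserve t_r = t * (1 - V / V_m) for m channels of capacity c each."""
    c_o = k_m * m * c          # system performance with m channels
    v_m = c_o * t              # maximum volume of work in operational time t
    if v > v_m:
        raise ValueError("the volume of work exceeds the capacity of the system")
    return t * (1.0 - v / v_m)

def channel_can_be_removed(m, k_m, k_m_minus_1, v_ratio):
    """True if m - 1 channels alone could complete the task, i.e. V <= V_(m-1)."""
    return v_ratio <= (m - 1) * k_m_minus_1 / (m * k_m)
```

With m = 4, c = 10, K m = 0.9, t = 8 and V = 216, the maximum volume is 288 and the time reserve is 2. With K m-1 = 0.95 the threshold ratio is about 0.79, so at V / V m = 0.75 one channel could be taken out for the whole task, while at 0.8 it could not.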

5.1.4. Creation of product inventories in storage units.

In systems whose main performance criterion is the arrival of finished products at the output at a given rhythm, intermediate or output product storage devices can be used to increase reliability. Failures of any devices located between the system input and a storage device containing a supply of products do not lead to a system failure until the reserves in all storage devices between the failed device and the system output are exhausted. If only one storage device is installed, at the system output, then it creates a time reserve equal to the time interval during which the shortage of output products due to a failure can be compensated for by the reserves in the storage device. Such a reserve, in terms of the degree of influence on the reliability of the system, is equivalent to the general non-replenishable time reserve discussed in paragraph 5.1.1. The stock of products in an intermediate storage device creates a reserve for the group of devices located between that storage device and the system input. That is why it is called a group reserve.

The subsystem located between the input of a sequential system and the nearest storage device, between adjacent storage devices, or between the last intermediate storage device and the output of the system is called a phase or section of the system. A system containing at least one intermediate storage device is called multiphase (multi-section). Each phase can be single-channel or multi-channel. Variants of structures of multiphase systems are shown in Fig. 2.

Replenishment of the stock occurs in one of three ways:

Due to periodic external deliveries of products;

Due to the input phase performance reserve;

In case of failures of the output phase due to the receipt of products from the input phase.

Reservation costs are associated with the installation of drives, storage of reserves in them, maintenance of drives, and temporary exclusion of products forming the reserve from the production cycle.

Fig. 2. Variants of structures of multiphase systems.

Fig. 3. Recursive construction of structures of multi-channel multiphase systems.

5.1.5. Creation of functional inertia of systems.

Functional inertia means that when individual elements fail, the state of the system, defined by the set of its output parameters and represented by a point in the multidimensional space of permissible parameter values, does not jump to a new steady state instantly but moves toward it at a finite speed. If the final state is inoperable, the trajectory crosses the boundary of the operability region on the way, and this crossing is interpreted as a system failure. The time interval from the moment an element fails until the system fails forms the time reserve. Its size can be controlled by technical means, in particular by suppressing external disturbances that change the output parameters, using noise-resistant operating algorithms, changing (easing) the operating mode, filtering high-frequency components of the system's motion, and applying corrective actions that reduce the rate of change of the parameters or lengthen the trajectory inside the operability region. It follows that creating a time reserve requires certain hardware costs and more flexible control of system operation. These costs can be compared with the costs of other types and methods of redundancy. The method is most effective in control systems for continuous technological processes, heat supply, thermal stabilization and life support systems, mechanical systems with gradual parametric failures, and the like. The time reserve can be used to eliminate the element failure. If the failure is eliminated before the time reserve expires, it does not become a system failure, which thins the failure flow and increases reliability.
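To make the idea of an inertial time reserve concrete, the sketch below assumes that a single output parameter relaxes exponentially toward its new steady state after an element failure. The first-order model, the function name and all numbers are assumptions chosen for illustration; the source does not prescribe a particular dynamic model.

```python
import math

def inertia_time_reserve(x0, x_final, x_limit, tau):
    """Time until x(t) = x_final + (x0 - x_final) * exp(-t / tau) reaches x_limit.

    Assumes x0 lies inside the operability region. If x_limit is not
    strictly between x0 and x_final, the boundary is never reached.
    """
    lo, hi = sorted((x0, x_final))
    if not (lo < x_limit < hi):
        return math.inf
    return tau * math.log((x0 - x_final) / (x_limit - x_final))

# Hypothetical example: temperature 20 drifting toward 0 after a heater
# failure, operability limit 10, thermal time constant 30 minutes
print(round(inertia_time_reserve(20.0, 0.0, 10.0, 30.0), 2))  # 20.79
```

In this model a larger time constant (slower parameter drift) or a more distant boundary both lengthen the reserve, which matches the qualitative measures listed above.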

5.1.6. Using idle periods of the system and its individual devices to restore technical characteristics. Partial loading of the system is a form of load redundancy. Idle periods serve as a time reserve for restoring operability, carrying out checks, and restoring stocks to their standard levels. For a given pattern of incoming work requests, the size of the time reserve also depends on the performance of the system. The time reserve can be increased further by connecting elements in parallel.

5.1.7. Mixed methods. All of the previously listed time reservation methods can be used in various combinations. The most commonly used methods are:

Increasing productivity and creating functional inertia;

Multi-channel connection and creation of product inventories;

Increasing productivity and using idle periods.

In the first and second methods, a combined time reserve with non-replenishable and replenished components is created in the systems. With the third method, a combined time reserve is also created, but the replenished part is used only in pauses between task execution intervals.

5.2. Methods for increasing the efficiency of time reserve.

5.2.1. Improved maintainability. Reducing the recovery time increases the average number of recoveries that can be completed within the time reserve and the probability that a recovery is completed within it. Accordingly, the number of averted (parried) failures grows and all reliability indicators improve.

5.2.2. Reducing the share of depreciating failures and the value of depreciating operating time.

Operating time is depreciated when a failure has irreversible adverse consequences that cause the object being processed to lose some quality (defects in machined parts, violation of the metal smelting process, undetected distortion of information). Depreciation makes it necessary to repeat all or part of the work. The time required for this is subtracted from the time reserve, reducing its effectiveness. To reduce the share of depreciating failures, various means of protection are used: error traps in information processing, which prevent the consequences of information distortion from spreading uncontrollably; blocking devices that prevent mechanical damage to the units being processed; mode-switching devices in automated process control systems, which transfer the technological process to an acceptable non-emergency state when the control equipment fails. To reduce the amount of depreciated operating time, control points (checkpoints) are set up in information and computing systems, from which the current stage of work can be repeated. When a task is divided into a large number of stages, the depreciated operating time can decrease several times.
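The effect of control points on depreciated operating time can be sketched as follows. The sketch assumes checkpoints at the boundaries of equal stages and that only the work done since the last checkpoint has to be repeated; the function names and the uniform-failure-moment assumption are illustrative.

```python
def depreciated_time(failure_moment, task_duration, n_stages):
    """Operating time lost when a depreciating failure occurs.

    Assumes checkpoints at the boundaries of n_stages equal stages and
    that the work done since the last checkpoint must be repeated.
    """
    stage = task_duration / n_stages
    return failure_moment % stage

def mean_depreciated_time(task_duration, n_stages):
    """Average loss if the failure moment is uniform over the task (assumption)."""
    return task_duration / (2 * n_stages)

print(depreciated_time(7.0, 10.0, 1))  # 7.0 (no intermediate checkpoints)
print(depreciated_time(7.0, 10.0, 5))  # 1.0
print(mean_depreciated_time(10.0, 5))  # 1.0
```

Under these assumptions the average loss is inversely proportional to the number of stages, which is the "several times" reduction mentioned above. The time spent on forming the checkpoints themselves is not modeled here.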

5.2.3. Organization of mutual assistance and interchangeability of channels in a multichannel system.

The simplest way to organize the work of a multichannel system is to divide the overall task into several autonomous parts performed by separate channels. However, when a channel fails, the execution of its part of the job can be delayed significantly and hold up the completion of the entire job. The time required to complete the task can be reduced by organizing interaction and mutual assistance between channels. Channels that have completed their part of the task earlier than others can then take part in completing another part. With such interchangeability of channels, none of the working channels is idle until the entire task is completed.

5.2.4. Creation of effective algorithms for parallelizing tasks.

A job executed sequentially on a single-channel system can be executed on a multi-channel system with time redundancy if an efficient parallelization algorithm is found. In the most favorable case, the task execution time is inversely proportional to the number of channels, whatever that number is. Such a task is called infinitely divisible. In the least favorable case, when no parallel algorithm can be constructed, the execution time is the same as on a single-channel system (an indivisible task). All other cases fall between these extremes. The efficiency of parallelization is assessed by the coefficient K, equal to the ratio of the difference between the execution times in the least favorable case and with the given algorithm to the difference between the execution times in the least and most favorable cases. This coefficient varies from 0 to 1. The time reserve is maximal at K = 1.
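The coefficient K defined above can be computed directly. In the sketch below the least favorable time is the single-channel time and the most favorable time is that time divided by the number of channels m; the function and parameter names are illustrative.

```python
def parallelization_efficiency(t_single, t_algorithm, m):
    """K = (T_single - T_algorithm) / (T_single - T_single / m).

    t_single    - execution time on one channel (indivisible task)
    t_algorithm - execution time on m channels with the given algorithm
    m           - number of channels (at least 2)
    """
    if m < 2:
        raise ValueError("at least two channels are needed")
    t_best = t_single / m  # infinitely divisible task
    if not (t_best <= t_algorithm <= t_single):
        raise ValueError("t_algorithm must lie between the two extremes")
    return (t_single - t_algorithm) / (t_single - t_best)

# A 100-hour sequential job that takes 40 hours on 4 channels
print(parallelization_efficiency(100.0, 40.0, 4))  # 0.8
```

K = 0 corresponds to an indivisible task and K = 1 to an infinitely divisible one.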

5.2.5. Organization of effective monitoring and diagnostics.

When control (monitoring) is incomplete, periods of hidden failure arise, because failures not detected by the control facilities show up only through secondary symptoms and with some delay. These periods reduce the time reserve. In addition, operating with failed equipment can depreciate accumulated operating time and reduce the reserve further. Unreliable control triggers diagnostic procedures falsely and consumes additional reserve time. On the other hand, greater completeness and trustworthiness of control are achieved with additional equipment, which itself lowers reliability; the total recovery time and the consumption of reserve time then increase on average. To make time redundancy more effective, the parameters of the control and diagnostic facilities must therefore be optimized so that the total consumption of reserve time is minimal.

5.2.6. Improved maintainability.

Recovery time accounts for the bulk of the reserve time consumed in many systems. Improving maintainability is therefore equivalent to increasing the time reserve. The effectiveness of time redundancy is determined not by the absolute value of the time reserve but by its ratio to the average recovery time.
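The role of the ratio of time reserve to average recovery time can be illustrated for one simple case. The sketch assumes an exponentially distributed recovery time, which is an assumption made here for illustration; the function name is hypothetical.

```python
import math

def recovery_within_reserve(time_reserve, mean_recovery_time):
    """P(recovery time <= time reserve) for exponential recovery (assumption)."""
    return 1.0 - math.exp(-time_reserve / mean_recovery_time)

print(round(recovery_within_reserve(2.0, 1.0), 4))    # 0.8647
print(round(recovery_within_reserve(20.0, 10.0), 4))  # 0.8647
print(round(recovery_within_reserve(2.0, 0.5), 4))    # 0.9817
```

The first two calls have the same ratio and give the same probability, while halving the mean recovery time in the third call has the same effect as doubling the reserve.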

5.2.7. Using mixed redundancy. The basic property of redundancy also holds in systems with a time reserve: the more reliable the original system, the greater the gain in reliability from introducing a time reserve. Structural redundancy can therefore be used to increase the efficiency of time redundancy. The total gain in reliability exceeds the product of the gains obtained from each type of redundancy separately.

5.3. Classification of systems with time reserve and reliability calculation schemes.

5.3.1. The main classification scheme contains two groups of classification characteristics: the type of distribution law and structural parameters.

5.3.2. The type of distribution law is indicated by two positions, which give the distribution laws of operating time and recovery time respectively. The following notation is used for standard distributions: M - exponential, E - Erlang, N - normal, W - Weibull-Gnedenko, D - degenerate, ?M - hyperexponential, G - arbitrary distribution.

5.3.3. The number of channels m and the number of phases n are used as structural parameters. One of these parameters is written in parentheses, which indicates which connection (parallel or series) is the outer one. In the notation m(n) the parallel connection is the outer one (Fig. 3, a), and in the notation (m)n the series connection is the outer one (Fig. 3, b).

5.3.4. Each channel of each phase can in turn have a series-parallel structure of the type $m_1(n_1)$ or $(m_1)n_1$. Indicating the system structure and the channel structure together leads to the notations $m(n(m_1(n_1)))$, $m(n((m_1)n_1))$, $(m(m_1(n_1)))n$, $(m((m_1)n_1))n$ (Fig. 3, c - e).

Fig. 3 shows that the structure $m(n((m_1)n_1))$ is equivalent to $m(nn_1(m_1))$, and the structure $(m(m_1(n_1)))n$ is equivalent to $(mm_1(n_1))n$. If necessary, the structure can be made more complex recursively.

5.3.5. In each class, five auxiliary characteristics can additionally be indicated, forming subclasses $X_1 X_2 X_3 X_4 X_5$. Position $X_1$ gives the type of time reserve (0 - non-replenishable, 1 - replenishable, 2 - combined, 3 - with complex restrictions); position $X_2$ gives the type of failure by its consequences (0 - non-depreciating, 1 - depreciating, 2 - both types). Position $X_3$ records the presence of other types of redundancy (0 - no other types of redundancy, 1 - structural redundancy, 2 - other). Position $X_4$ indicates the type of performance monitoring used (0 - continuous, 1 - periodic, 2 - mixed). Position $X_5$ reflects the type of system load (0 - continuous, 1 - variable or random). When subclasses are merged, the don't-care symbol X is placed in the corresponding positions.

Consider two examples of the notation. MM1(1)(00000) is a single-channel single-phase system with exponential distributions of operating time and recovery time, a non-replenishable time reserve, non-depreciating failures, no other types of redundancy, continuous monitoring and continuous loading;

WEm(1)(22111) is a multi-channel single-phase system with a Weibull distribution of operating time, an Erlang distribution of recovery time, a combined time reserve, both types of failures, structural redundancy, periodic monitoring and random loading.

If necessary, indices i and j can additionally be used in position $X_5$ to record the number of task stages (or the stage duration) and the type of distribution of the task or stage volume.

5.4. Initial data for calculating reliability and parameters for selection during system synthesis.

The following information is required as input data for calculating reliability.

5.4.1. Classification characteristics, compiled in accordance with the rules of Section 5.3 and having the form GGmn($X_1 X_2 X_3 X_4 X_5$).
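The five-position subclass code of Section 5.3.5 can be decoded mechanically. The sketch below is one possible lookup-table implementation; the dictionary keys, the function name and the use of "any" for the don't-care symbol X are choices made here for illustration, while the digit meanings follow the list in Section 5.3.5.

```python
# Meaning of each position X1..X5, as listed in Section 5.3.5
SUBCLASS_POSITIONS = [
    ("time reserve", {"0": "non-replenishable", "1": "replenishable",
                      "2": "combined", "3": "with complex restrictions"}),
    ("failure type", {"0": "non-depreciating", "1": "depreciating",
                      "2": "both types"}),
    ("other redundancy", {"0": "none", "1": "structural", "2": "other"}),
    ("monitoring", {"0": "continuous", "1": "periodic", "2": "mixed"}),
    ("load", {"0": "continuous", "1": "variable or random"}),
]

def decode_subclass(code):
    """Translate a code such as '22111' into a dict of readable attributes."""
    if len(code) != len(SUBCLASS_POSITIONS):
        raise ValueError("expected five positions X1..X5")
    result = {}
    for ch, (name, table) in zip(code, SUBCLASS_POSITIONS):
        if ch == "X":
            result[name] = "any"  # don't-care position
        elif ch in table:
            result[name] = table[ch]
        else:
            raise ValueError(f"invalid value {ch!r} for {name}")
    return result

print(decode_subclass("22111"))
```

The distribution letters and the structural part of the notation (for example WEm(1)) are not parsed here; only the subclass digits are handled.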

5.4.2. Reliability and maintainability characteristics. For a system of type MMmn, the vectors $\Lambda$ and $M$ of failure and recovery rates are specified; for a system of type EEmn, two sets of Erlang distribution parameters, $(m_i, \lambda_i)$ and $(k_i, \mu_i)$. Parameters for other distributions are specified similarly.

5.4.3. Channel performance $C_{ij}$, where i is the phase number and j is the channel number. If each channel has a complex structure, the number of indices and the number of elements in the set increase.

5.4.4. Storage capacity where i is the phase number, j is the channel number. In particular, there can be a storage device of practically unlimited capacity. Then introduce the notation

5.4.5. Initial filling of the storage devices. It is obvious that

5.4.6. The permissible lower limit of the performance of each phase, C in. If the phase is multi-channel, the permissible number of channels $m_i^*$ is specified such that the phase completely loses its operability when fewer than $m_i^*$ channels are operational, regardless of the time reserve.

5.4.7. Maximum permissible intensity (rate) of replenishment and consumption of stocks in the storage devices. The failure of a device at the input of a storage device cannot be fully compensated from the stock if $C_{ij} > q_{ij}$. Similarly, if the performance $C_{ij}$ of a device at the output of the storage device that has failed exceeds the maximum replenishment rate, the stock is replenished at that maximum rate rather than at $C_{ij}$. If these rates are not limited, the storage device is said to have unlimited throughput capacity.

5.4.8. The value of the instantly replenishable time reserve $t_{qij}$. In the general case it is set separately for each channel of each phase, because it reflects the conditions under which the technology is disrupted if restoration of operability is delayed. The instantly replenishable reserve is thus element-by-element. In principle, however, all the values may be the same: $t_{qij} = t_q$.

5.4.9. The task volume $V_3$, defined as the amount of output product that the system must produce. In more complex cases, instead of $V_3$ the task volume is specified for each stage, $V_{3i}$, or for each busy period and each stage, $V_{3ij}$. From the given task volume and performance values, one can determine the minimum time the system needs to complete the task in a fully operational state. This time is called the task duration. If the task volume is a random variable, its distribution function $D_v(V) = P(V_3 < V)$ is specified.

5.4.10. The general non-replenishable time reserve $t_p$ or the operational time $t$. If the task volume is fixed and equal to $V_3$, the relation $t = t_3 + t_p$ holds. If the task volume is a random variable, one of the values $t_p$ or $t$ is specified, and the other is then also a random variable.
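The relations of Sections 5.4.9 and 5.4.10 reduce to simple arithmetic when the task volume is fixed. The sketch below assumes a single constant nominal performance for the whole system; the function names and the numbers are illustrative.

```python
def task_duration(task_volume, nominal_performance):
    """Minimum completion time t_3 in the fully operational state."""
    return task_volume / nominal_performance

def time_reserve(operational_time, task_volume, nominal_performance):
    """Non-replenishable reserve t_p from the relation t = t_3 + t_p."""
    t_p = operational_time - task_duration(task_volume, nominal_performance)
    if t_p < 0:
        raise ValueError("operational time is shorter than the task duration")
    return t_p

# 400 units of product at 50 units/h, with 10 h of operational time
print(task_duration(400.0, 50.0))       # 8.0
print(time_reserve(10.0, 400.0, 50.0))  # 2.0
```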

5.4.11. Share of depreciating failures. In the general case this value is set for each element of the system and is interpreted as the probability that a failure that has occurred is depreciating.

5.4.12. The degree of interchangeability of the channels in phase i. With complete interchangeability it equals 1, and in the absence of interchangeability it equals 0; in general it lies between 0 and 1. When it is less than 1, the corresponding fraction of the remaining task duration $t_{3i}$ can be performed by other operational channels, while the rest must be performed by the failed channel itself after its operability has been restored.

5.4.13. Control and recovery options:

Completeness of hardware control in the j-th channel of the i-th phase (K = 1 if the failures are non-depreciating, K = 2 if they are depreciating);

Completeness of test software control;

t ij - period between control points (return points);

t kij - time spent on the formation of a control point;

Ij is the duration of the stage controlled by repeated counting;

t rij - testing time during software control.

5.5. Engineering methods for analyzing the reliability of systems with time reserve.

5.5.1. Indicators of reliability of systems with time reserve are probabilistic characteristics of the following random variables:

$T_o(A)$ - operating time before system failure;

$T(A)$ - system operating time to failure;

$T_p(A)$ - useful time before system failure;

$T_{in}(t_3)$ - completion time of a task of duration $t_3$;

$T_\Sigma(t)$ - total operating time in a given time interval (0, t);

$T_1(t_3)$ - total downtime before the task is completed.

Here A is a vector of system parameters that determine the value of the time reserve and the conditions for its use and replenishment. The specific content of these parameters is given in Section 5.4. In systems with non-depreciating failures $T(A) = T_p(A)$, and with a constant task duration $T_{in}(t_3) = T_1(t_3) + t_3$.

The main indicator of reliability is the probability of failure-free operation

Equivalent to (16) are the definitions:

Considering ( 18 ), the probability of failure-free operation is also called the probability of task completion.

Other reliability indicators:

probability of system failure (probability of task failure, probability of failure of operation)

failure rate

(20)

mean time to failure

(21)

average task completion time

availability factor

where e is the initial inoperative state;

(24)

- acceptable recovery time; $T_{in}(e)$ - recovery time from the initial state e; $E_1$ - the set of inoperative states;

operational readiness factor

where P e (t 3 , t p , A) is the probability of completing the task at the initial state e; E is the set of all states of the system.

5.5.2. Method for calculating the reliability of a multi-channel system with non-depreciating failures. All states of the system are divided into subsets $E_i$, each of which is characterized by performance $C_i$ and relative performance $C_i / C_0$, where $C_0$ is the performance in a fully operational state. The functioning process is reduced to a semi-Markov process specified by the set of functions $P_{ij}(t)$, the probabilities of transition from state i to state j. The system performs a task of duration $t_3$ with a non-replenishable time reserve $t_p$. Introducing the index of the initial state into these probabilities, we compose a system of integral equations

(26)

The average time to failure is determined from the system of equations

The average task completion time is the sum of the task duration and the average total idle time before the task is completed, which is determined from a system of equations. For m = 1, equations (25) - (27) reduce to the equations for single-channel systems.

5.5.3. Method for calculating the reliability of multi-channel systems with depreciating failures.

The average time to failure and the average downtime before completing a task are found using the formulas:

5.5.4. Method for calculating the reliability of a multi-channel system with a combined reserve time.

The system consists of m channels, each including N elements connected in series. To complete a task of duration $t_3$, the system has a non-replenishable time reserve $t_p = t - t_3$. In addition, each element of a channel has a failure rate $\lambda_i$, an average recovery time, and its own instantly replenishable time reserve $t_{qi}$. The reliability calculation is carried out in two stages. At the first stage, the probability of failure-free operation of one channel is calculated using the formula

(30)

At the second stage, equations (25) - (27) are used, where $E_k = e_k$, i.e. every set consists of a single state, with the transition probabilities

Here z is the number of repair teams; $P_{kj}(t) = 0$ for $j \ne k - 1, k + 1$. For m = 1, the methods outlined in Sections 5.5.2 - 5.5.4 can be used to calculate the reliability of single-channel systems.

5.5.5. Method for calculating the reliability of two-phase single-channel systems with internal reserves.

The system has a network structure and a built-in storage device for holding product stocks. The stock is replenished if the quantity produced exceeds the quantity issued, and consumed if there is a shortage of the product produced. The intensity of replenishment and consumption depends both on the nominal performance of the system and the consumption schedule, and on the operability state of the system elements. To calculate reliability, a model is built whose states take into account the operability of the elements and the level of stock in the storage device. We introduce the probabilities $P_i(t)$ that at moment t the system is in the i-th state and the storage device is empty or full, and the probability densities $P_i(t, z)$ that at moment t the system is in the i-th state and the storage device is filled to level z, $0 < z < z_0$, where $z_0$ is the storage capacity. These functions are found from a system of partial differential equations written in vector form:

(32)

boundary conditions: $p(t, z_0) = c_1 P(t)$, $p(t, 0) = c_2 P(t)$;

initial conditions: $P(0) = P_0$, $P(0, z) = P_0(z)$.

The elements of $A$, $B$, $B_1$, $c_1$, $c_2$ are constants that characterize the reliability of the system elements and the system performance in various states.

When finding the probability of failure-free operation, the system of equations (32) is written only for operational states. The solution of (32) gives the desired probability:

(33)

where $E_{01}$ and $E_{02}$ are the subsets of operational states with boundary and intermediate filling of the storage device.

When calculating the availability factor, equations (32) are written for all system states, including inoperative ones. At large t the time derivatives vanish, and the probabilities on the right-hand sides of the boundary conditions become stationary (limiting) probabilities. The resulting system of ordinary differential equations must be solved for the probabilities $P_i$ and the probability densities $P_i(z)$. The availability factor is

(34)

5.6. Calculation ratios for basic redundancy schemes.

5.6.1. Single-channel system with non-depreciating failures and instantly replenished time reserve.

A single-channel system with a series connection of elements that have failure rates $\lambda_i$ and recovery time distribution $F_{in}(t)$ has an instantly replenishable time reserve $T_d$ with a given distribution $D(t)$. Then the probability of failure-free operation is

(35)

Mean time between failures

(36)

Average recovery time

(37)

Availability factor

5.6.2. Single-channel system with non-depreciating failures and non-replenishable time reserve.

A system with a series connection of elements that have characteristics $\lambda_i$ and $\mu$ has a non-replenishable time reserve $t_p$ for completing a task of duration $t_3$. Then the probability of completing the task is determined by the approximate formulas:

The formulas guarantee an accuracy of $10^{-4}$.
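As a cross-check that does not rely on the approximate formulas, the probability of task completion can be computed for one idealized model. The sketch assumes a single total failure rate `lam` (the sum of the element failure rates), exponential recovery with rate `mu`, non-depreciating failures, and failures that occur only while the system is operating. Under these assumptions the number of failures during the task is Poisson with mean `lam * t3`, the total repair time of n failures has an Erlang distribution, and the task is completed if the total repair time does not exceed `tp`. The function and parameter names are illustrative, and this is not claimed to be the formula used in the source.

```python
import math

def completion_probability(lam, mu, t3, tp, tol=1e-12):
    """P(total repair time accumulated during the task <= tp).

    Assumed model: Poisson(lam * t3) failures over operating time t3,
    independent Exp(mu) repairs, non-depreciating failures.
    Intended for moderate values of lam * t3.
    """
    a, x = lam * t3, mu * tp
    poisson = math.exp(-a)   # P(N = 0); the task then always completes
    total = poisson
    term = math.exp(-x)      # e^{-x} x^k / k! for k = 0
    tail = 0.0               # P(sum of n repairs > tp), built up below
    n = 0
    while True:
        n += 1
        poisson *= a / n     # P(N = n)
        tail += term         # adds the term k = n - 1
        term *= x / n        # prepares the term k = n
        total += poisson * (1.0 - tail)
        if poisson < tol and n > a:
            break
    return total

# lam * t3 = 0.2, mu / lam = 200, reserve equal to 8% of the task duration
print(round(completion_probability(1.0, 200.0, 0.2, 0.016), 3))  # 0.99
# no reserve: the task succeeds only if no failure occurs
print(round(completion_probability(1.0, 200.0, 0.2, 0.0), 4))    # 0.8187
```

The first result agrees with the example quoted in Section 5.7.1, provided the ratio of average operating time to average recovery time given there is read as `mu / lam`.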

Mean time to failure

(40)

Average task completion time

Availability factor

If the elements have different values, then the probability of completing the task is estimated using upper and lower estimates:

1 - solution to the equation x = a (1 - exp (- x)), P 0 (?, ?) is determined by the formula ( 39 ).

5.6.3. Multichannel system with interchangeable channels, non-depreciating failures and non-replenishable time reserve.

A non-recoverable system has m channels, each with failure rate $\lambda$, and performs an infinitely divisible task of duration $t_3$ with a non-replenishable time reserve $t_p$. Under the operating conditions, the system performance is allowed to drop to zero as long as the time reserve has not been used up. Then the probability of completing the task can be found using the formula:

The first form of (44) is convenient for large values of the time reserve, and the second, conversely, for small values. The mean time to failure is

It follows from the formula that as the time reserve increases, the average operating time grows from $1/(m\lambda)$ to $1/\lambda$.

5.6.4. Two-phase system with storage of unlimited capacity and performance reserve.

The system consists of two subsystems with failure rates $\lambda_1$, $\lambda_2$ and given average recovery times. A storage device of unlimited capacity, with a given operating time distribution function, is installed between the subsystems. The performance of subsystems 1 and 2 is such that the first subsystem has a certain performance reserve, intended for building up stock. Then the probability of failure-free operation in a time interval (0, t) is calculated as:

Mean time to failure

As the stock increases, the average operating time grows from $1/\lambda$ to $1/\lambda_{2n}$, i.e. the storage device averts failures of the input subsystem.

With limited storage capacity $V_0 = z_0 \min(c_1, c_2)$; $a = 1$; $\lambda_n = 0$; $\lambda''_i = \lambda_i$, where $\lambda''_i$ is the failure rate of the i-th subsystem while it is idle in an operational state, the availability factor is calculated using the formula

As the storage capacity increases, the availability factor grows from $K_{g1} K_{g2}$ at $z_0 = 0$ to $\min(K_{g1}, K_{g2})$ at infinite storage capacity.

5.7. Properties of temporary reservation.

5.7.1. Time redundancy is a universal method of increasing reliability. This follows from the graphs of the probability of task completion for a single-channel system as a function of the reduced time reserve $\mu t_p$. By increasing the time reserve, any required probability value can be provided (Fig. 4, a). The time reserve needed to achieve a given probability $\gamma$ can be found from the graphs in Fig. 4, b. For $\lambda t_3 \le 0.6$ and probability values $\gamma \le 0.995$, the time reserve amounts to several average recovery times. For quickly recoverable systems it is only a few percent of the main time. For example, the probability $\gamma = 0.99$ is achieved for task durations $\lambda t_3 = 0.2$ and $0.5$ if the reserve time is 8 and 4.4% of the main time when $T_{av} / T_{in} = 200$, and 1.6 and 0.88% when $T_{av} / T_{in} = 1000$.

5.7.2. With a constant multiplicity of time redundancy $m_t = t_p / t_3$, the time reserve grows together with the task duration. The dependence of the probability of task completion on the task duration therefore changes qualitatively. For small multiplicities the probability still decreases monotonically, approaching zero. For larger multiplicities it falls at first but then, having reached a minimum, begins to grow, approaching unity. Thus, keeping the redundancy multiplicity constant provides a guaranteed value of the probability of task completion (Fig. 4, c):

This value is reached at a particular task duration; for other durations the probability $P > P_0$. Starting from some task duration, the redundancy multiplicity required to ensure the probability $1 - Q$ (Fig. 4, d) depends only weakly on the duration. If the time reserve is created by a productivity reserve, the required productivity reserve can be established from these graphs using the redundancy multiplicity.

Fig. 4. Reliability indicators of a single-channel system with time reserve (SVR) with exponential distributions of operating time and recovery time

5.7.3. The basic property of redundancy, first discovered in systems with a structural reserve, is also observed in systems with a time reserve. The greatest gain in reliability from the introduction of a time reserve, $G_Q$, is achieved in highly reliable systems (Fig. 4, d), i.e. for large $\mu$ and small $\lambda$. Moreover, for a fixed redundancy multiplicity the gain can increase as the task grows (Fig. 4, e); in other cases it is maximal in the vicinity of a particular point.

5.7.4. The time reserve equivalent to a structural reserve is the value $t_{pe}$ at which, for the same values of $\lambda$ and $t_3$, both types of redundancy provide the same probability of task completion. Calculations show that the reduced time reserve $\mu t_{pe}$ equivalent to total loaded duplication is not large: for $b = \mu / \lambda = 100$ and $\lambda t_3 \le 5$ it does not exceed 10 (Fig. 5, a, c), and the faster the system recovers (the larger b), the greater it is. This does not mean at all that the absolute value of the equivalent reserve grows as maintainability improves. On the contrary, the larger b, the smaller it is (Fig. 5, b, d). The multiplicity of the time reserve $m_{te}$ equivalent to a structural reserve of multiplicity $m_c = 1$ turns out to be less than one over a considerable range of $\lambda t_3$ and b. At equal multiplicity, namely $m_t = m_c = 1$, time redundancy is more effective than structural redundancy if the task duration $\lambda t_3$ exceeds a threshold that depends on b: approximately 0.7 for b = 50, 0.33 for b = 100, and 0.14 for b = 300 (Fig. 5, d, f).

Fig. 5. Time reserve and time reserve multiplicity equivalent to loaded duplication

5.7.5. One of the important properties of redundancy is the degree to which non-exponential distribution laws of operating time and recovery time affect the reliability indicators of the redundant system. Knowing this degree of influence makes it possible to establish: whether the distribution law must be determined when collecting statistical data or whether estimating the average value of the random variable is enough; whether the calculation formulas can be replaced by simpler equivalent ones obtained for exponential distributions; how the reliability indicators tend to change on passing from the running-in period to the period of normal operation, and from the latter to the aging period. When the empirical distribution of operating time is approximated by the Weibull distribution $F(t) = 1 - \exp(-(\lambda t)^m)$, the dependence of the probability of task completion on the shape parameter m is small for small tasks and can be ignored. For large tasks ($\lambda t_3 > 0.4$) the differences are more noticeable, but for $m < 1$ the formulas for the exponential distribution can still be used in this case to obtain a lower bound, since the error is on the safe side (Fig. 6, a, b, c, d; Fig. 7, a). The transition to the exponential distribution is based on the equality of the probabilities of task completion in the absence of a time reserve: $P(t_3, 0, m) = P(t_3, 0, 1)$ for the Weibull distribution and $P(t_3, 0, K_1) = P(t_3, 0, 1)$ for the gamma distribution of operating time. Hence the equivalent parameter $\lambda_e = -\ln P(t_3, 0, m) / t_3$. With this method of calculating $\lambda_e$, replacing a non-exponential distribution with an exponential one does not remove the need to estimate the shape parameter m or $K_1$. If the type of distribution law is not known, the parameter $\lambda_e$ is determined from the equality of average operating times, and then $\lambda_e = 1 / T_{avg}$.

To evaluate the influence of the shape parameter with such a replacement, $T_{avg}$ must be expressed explicitly through the parameters of the non-exponential distribution. In particular, for the Weibull distribution $\lambda_e = \lambda / \Gamma(1 + 1/m)$. Calculations show that when the equality of average operating times is used, the dependence on the shape parameter is significant and cannot be neglected even at small $t_3$ (Fig. 6, d). Introducing a time reserve into systems with identical probabilities $P(t_3, 0)$ creates a tendency for the distribution of time to failure to "age" (Fig. 6, e), and the smaller the shape parameter, the stronger this tendency.
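The two ways of choosing the equivalent exponential parameter for a Weibull operating time can be compared numerically. The sketch below uses $F(t) = 1 - \exp(-(\lambda t)^m)$, so that $P(t_3, 0, m) = \exp(-(\lambda t_3)^m)$ and the mean operating time is $\Gamma(1 + 1/m) / \lambda$; the function names and the sample values are illustrative.

```python
import math

def lambda_e_equal_probability(lam, m, t3):
    """lambda_e = -ln P(t3, 0, m) / t3 with P = exp(-(lam * t3) ** m)."""
    return (lam * t3) ** m / t3

def lambda_e_equal_mean(lam, m):
    """lambda_e = 1 / T_avg = lam / Gamma(1 + 1/m)."""
    return lam / math.gamma(1.0 + 1.0 / m)

# lam = 1, task duration t3 = 0.5, three shape parameters
for m in (0.5, 1.0, 2.0):
    by_probability = lambda_e_equal_probability(1.0, m, 0.5)
    by_mean = lambda_e_equal_mean(1.0, m)
    print(m, round(by_probability, 4), round(by_mean, 4))
```

For m = 1 both methods return the original rate, while for m different from 1 they diverge noticeably, which illustrates why the choice of the equivalence criterion matters.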

5.7.6. The dependence of the probability of task completion on the type of recovery law $F_{in}(t)$ is weak if the equivalent parameters are calculated from the equality of the probabilities of recovery within the time reserve:

With such a transition to an exponential distribution, the calculation error leads to some overestimation of reliability, at least for small tasks (Fig. 7, b). If the law $F_{in}(t)$ is not known and the equivalent parameter is calculated from the equality of average recovery times, the influence of the recovery law becomes significant (Fig. 7, c). The error in the probability of task failure can reach 100% or more.

Fig. 6. Reliability characteristics of a system with time reserve (SVR) with Weibull operating time distribution

Fig. 7. Reliability characteristics of a system with time reserve (SVR) under non-exponential distribution laws of operating time and recovery time:

a, b, c, d, f - gamma, d - Weibull

5.7.7. The average total downtime before completing a task, $T_{10}$, and therefore the average task completion time, depend on the shape parameter of the operating time distribution (m for the Weibull distribution and $K_1$ for the gamma distribution) (Fig. 7). Calculating $\lambda_e$ from the equality of average operating times gives an increasing error in determining $T_{10}$. The dependence of the average time to failure of a system with a time reserve, $T_{av}(t_p)$, on the type of recovery law is insignificant and can be neglected entirely (Fig. 7, d).

5.7.8. Structural redundancy stabilizes the actual performance of the system and significantly increases the technical utilization coefficient K ti (γ) guaranteed with a given probability γ. The value K ti (γ) = t 3 / t is found by solving the equation

P (K ti t, (1 - K ti) t) = γ,

where t = t 3 + t p, and the expression for P is taken from formula (39). According to the graphs in Fig. 8, a, for b = μ / λ = 20 and λt = 1, with probability γ = 0.9, K ti ≈ 0.87 in the absence of structural redundancy and K ti ≈ 0.985 with general duplication (m c = 1). If λt = 5, then under the same conditions (b = 20, m c = 1, γ = 0.9) K ti ≈ 0.993. With the introduction of a time reserve, the efficiency of the structural reserve, measured by the gain in reliability, increases sharply, just as the efficiency of the time reserve increases with the introduction of a structural reserve. For example, with b = 20 and λt = 1, loaded duplication gives a gain in reliability in terms of the probability of task failure G Q1 = 7.7 if there is no time reserve (Fig. 8, b at t 3 / t = 1); without a structural reserve, creating a 5% productivity reserve (t 3 / t = 0.95) gives a gain G Q2 = 1.9. In the presence of both reserves, the gain is G Q3 = 25. This is significantly greater than the product G Q1 · G Q2 = 14.6.
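
The comparison of the combined gain with the product of the separate gains is simple arithmetic, sketched here with the three values quoted in item 5.7.8. The variable names are illustrative.

```python
# Gains in reliability quoted in item 5.7.8 (ratios of task-failure probabilities).
gain_structural = 7.7   # loaded duplication, no time reserve
gain_time = 1.9         # 5% productivity reserve, no structural reserve
gain_combined = 25.0    # both reserves together

product_of_gains = gain_structural * gain_time
# A ratio above 1 means the two reserves reinforce each other.
synergy = gain_combined / product_of_gains

print(f"product of separate gains: {product_of_gains:.2f}")
print(f"combined gain exceeds it by a factor of {synergy:.2f}")
```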

Fig. 8. Reliability characteristics of a system with structural and time redundancy (solid line: exact P; dashed line: approximate P pr according to formula (39)).

5.7.9. A multi-channel system with interchangeable channels is, at small values of the planned load factor K z = t 3 / t, an almost ideally reliable system, since the probability of failing to complete the task is Q (t 3, t p) < 0.1. The coefficient K z can be interpreted as the gamma-percentage technical utilization coefficient satisfying the relation P (K ti t, (1 - K ti) t, m) = γ. The larger the number of channels m, the wider the range of K z values for which the inequality Q < 0.01 holds. For λt = 1 and b = 10 it holds for K z ≤ 0.6 at m = 2 and for K z ≤ 0.75 at m = 6 (Fig. 9, a). In the region of large loads, even a small increase in the task leads to a sharp increase in the probability of its failure. Thus, in a two-channel system, an increase in K z from 0.82 to 0.90 raises Q from 0.1 to 0.3 (Fig. 9, b). If the operating time of systems with different numbers of channels is the same, then at small K z systems with a larger number of channels are more reliable, even though they perform a larger amount of work. At large K z (close to unity), on the contrary, a single-channel system provides a greater probability of completing the task.

5.7.10. The actual quantile performance of the m-channel system at probability level γ is calculated by a formula in which C 0 (m) is the nominal performance of the system during failure-free operation. If the parallelism coefficient K p = 1, then C 0 = mc, and the reduced actual performance increases almost linearly with the number of channels (Fig. 9, c). The gamma-percentage technical utilization coefficient K ti (γ), equal in this case to the gamma-percentage relative productivity, increases monotonically with m, gradually stabilizing at a level close to the availability factor of one channel K g = b / (1 + b), where b = μ / λ, and the larger b is, the faster it stabilizes (Fig. 9, d).

Fig. 9. Reliability characteristics of a multi-channel system with non-devaluing failures

5.7.11. A comparison of m-channel and structurally redundant systems with the same number of devices shows that, for a task of the same size, a multichannel system achieves the probability of task completion provided by a structurally redundant system at a time-reservation multiplicity m t = t p / t 3 that is significantly less than m c = (m - k) / k, where k is the number of main devices and m - k is the number of backup devices.

In particular, the two-channel system achieves the task completion probability provided by the duplicated system (m c = 1) at m t = 0.26 for λt 3 = 0.1 and b = 50, and at m t = 0.08 for λt 3 = 0.5 and b = 50 (Fig. 9, e, f).
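
The two multiplicities compared in item 5.7.11 follow directly from their definitions. A minimal sketch, with hypothetical function names:

```python
from fractions import Fraction


def structural_multiplicity(total_devices: int, main_devices: int) -> Fraction:
    """m_c = (m - k) / k: number of backup devices per main device."""
    if not 0 < main_devices <= total_devices:
        raise ValueError("need 0 < main_devices <= total_devices")
    return Fraction(total_devices - main_devices, main_devices)


def time_multiplicity(t_reserve: float, t_task: float) -> float:
    """m_t = t_p / t_3: reserve time per unit of task time."""
    if t_task <= 0:
        raise ValueError("t_task must be positive")
    return t_reserve / t_task


if __name__ == "__main__":
    # Duplicated system: one main device and one backup device.
    print(structural_multiplicity(2, 1))
    # Two-channel system with a reserve time equal to 26% of the task time.
    print(time_multiplicity(0.26, 1.0))
```

In the example from the text, both systems use two devices, but the duplicated system has m c = 1 while the two-channel system needs a time reserve of only a fraction of the task time.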

5.7.12. At low integration costs (the parallelism coefficient K p is close to unity), a multi-channel system with a flexible structure and interchangeable channels always provides higher reliability indicators than a system with structural and time reserves that performs the same task in the same operational time interval, regardless of whether the reserve is loaded (LR) or unloaded (NLR) (Fig. 10, a). At the same time, a multichannel connection alone is not sufficient to give an advantage over structurally redundant systems. If a multi-channel system has no interchangeability of channels and all channels perform individual tasks (IT), it becomes less reliable than a redundant system.

5.7.13. A multichannel system with a rigid structure (RS) is significantly inferior to a system with a flexible structure (GS). This can be seen by comparing the graphs (Fig. 10, b) calculated for three-channel systems without a structural reserve and for two-channel systems with one device in a loaded reserve.

Fig. 10. Reliability characteristics of multi-channel systems for various methods of organizing the structure (a, b - recoverable; c, d, e - non-recoverable)

5.7.14. A non-exponential distribution of channel time to failure significantly affects the probability that a multi-channel system completes the task. This can be seen from the graphs (Fig. 11, a, b) calculated for two- and ten-channel systems with a gamma distribution of operating time I (k, λt) and with a transition to an equivalent exponential distribution based on equality of the average time to failure without a time reserve. The relative gamma-percentage productivity values also differ significantly for different distributions of operating time (Fig. 11, c). If the compared systems have the same probabilities of failure in the absence of a time reserve, then, when a time reserve is introduced, the trend of the task failure probability with increasing shape parameter K remains the same (the larger K, the smaller Q), but the differences between the probability values are much smaller (Fig. 11, d). Therefore, one can switch to an equivalent exponential model, keeping in mind that for K > 1 such a replacement gives a lower bound on the probability of completing the task.

5.7.15. The influence of the type of the channel recovery time distribution on the probability of task failure is small over a wide range of parameter values and decreases significantly as the number of channels increases (Fig. 11, e, f). Therefore, reliability assessments may well use the hypothesis of exponential distributions even if the actual distribution is non-exponential.

5.7.16. With a fixed reserve time, an increase in the number of channels reduces the average time to failure of the system (Fig. 11, g). This means that the average total operating time of all channels grows more slowly than the number of channels, because a queue for restoration forms and the total failure rate of the channels increases. When the reserve time changes, the mean time to failure is determined mainly by its value and depends only weakly on the parameter of the exponential distribution of recovery time (Fig. 11, h).

Fig. 11. Reliability characteristics of multi-channel systems with non-exponential distributions of operating time and recovery time

5.7.17. The reliability of a multi-channel system depends on the method of channel grouping. From a total of N identical devices, K identical groups can be organized, each with m parallel channels and n backup devices, so that N = K (m + n). With group reservation, the structural reserve can only be used within its own group. Groups work either without mutual assistance, in which case each group completes 1/K of the task, or with mutual assistance, in which case the groups interact with each other in the same way as channels within a group. Such systems are characterized by three structural parameters: m, n, K. In particular, for four devices the following groupings can be proposed (Fig. 12):

four-channel system with mutual assistance of channels, (4, 0) c, and without mutual assistance, (4, 0) b;

four-channel system of two groups of two channels, with mutual assistance of channels within a group and without mutual assistance between groups, (2, 0, 2) c;

three-channel system with mutual assistance of channels and one device in a common reserve, (3, 1) c;

two-channel systems with mutual assistance and a common reserve, (2, 2) c, or a separate reserve, (2, 2) cr, and without mutual assistance with a separate reserve, (2, 2) br;

single-channel system with a common reserve, (1, 3) o.

A comparison of these options in the absence of recovery (Fig. 13, a, b, c) shows that option (4, 0) b is the worst over the entire range of task volumes V = c t 3. With a loaded reserve, the best option is (3, 1) for small task volumes and (2, 2) for large ones. With an unloaded reserve, the best option over the entire range of task volumes is (1, 3). However, it must be borne in mind that a single-channel system spends the longest time on executing the task. If all compared systems are allocated the same operational time, then the best system is (4, 0) c.

In addition, it must be taken into account that in a multi-channel system part of the productivity is spent on organizing the interaction of channels, which reduces the efficiency of the multi-channel connection. If these costs are high, it may be advisable to organize several groups without mutual assistance or to transfer several channels to the structural reserve. With the introduction of restoration, the general patterns found when comparing system designs are preserved, but the reliability indicators improve significantly. Thus, for eight devices, increasing the number of groups in a multi-channel system worsens its reliability (Fig. 13).
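
The relation N = K (m + n) makes it easy to list every structural triple available for a given number of devices. The sketch below is only an enumeration aid with a hypothetical function name; it does not distinguish mutual assistance or the type of reserve, which the labels in Fig. 12 encode separately.

```python
def groupings(n_devices: int) -> list[tuple[int, int, int]]:
    """List all triples (m, n, K) with K * (m + n) == n_devices.

    m is the number of parallel channels per group (at least 1),
    n the number of backup devices per group, K the number of groups.
    """
    result = []
    for k in range(1, n_devices + 1):
        if n_devices % k:
            continue  # K must divide the total number of devices
        group_size = n_devices // k
        for m in range(1, group_size + 1):
            result.append((m, group_size - m, k))
    return result


if __name__ == "__main__":
    for m, n, k in groupings(4):
        print(f"m={m}, n={n}, K={k}")
```

For four devices the list contains, among others, the structures (4, 0), (3, 1), (2, 2) and (1, 3) with one group and (2, 0) with two groups, which correspond to the options compared above.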

5.7.18. The introduction of product reserves in two-phase systems with an intermediate storage device reduces the system downtime coefficient K pr = 1 - K g by reducing technologically related downtime. For equally reliable phases, the reduction is at most twofold, since technologically related downtime does not exceed the output phase’s own downtime. With equal phase capacities, the influence of the storage device depends significantly on the parameters b i = μ i / λ i and ? = b 2 / ? 1 (Fig. 14). The marginal gain in reliability from installing a storage device is estimated by the value G k, the ratio of the downtime coefficients for systems without a storage device and with a storage device of unlimited capacity. The gain G k is maximum at ? = 1 and increases as b decreases.

Fig. 14. Two-phase system with equal phase capacities

5.7.19. A productivity reserve at the input phase improves the use of product stocks and reduces technologically related downtime and, with it, the system downtime coefficient (Fig. 15, a). The less reliable the input phase, the more significant the reduction (Fig. 15, b). It follows from the graphs that creating a productivity reserve is always advisable.

5.7.20. If the creation of a performance reserve in the input phase of a two-phase system is accompanied by a decrease in reliability, then it becomes advisable only if the storage capacity is sufficient. For example, with a linear dependence for λ 1, it is advisable to create a productivity reserve of 10% only when μ 2 z 0 > 1.7, i.e. when the stock in a full storage device keeps the output phase operating for a time 1.7 times longer than the average recovery time (Fig. 16). There is a range of values of the parameters a and z 0 in which a two-phase system with a storage device has an availability factor lower than that of a system without a storage device and without a performance reserve. In Fig. 16 these values correspond to the sections of the curves above the dotted line.

5.7.21. For small tasks, the probability of failure-free operation of a two-phase system with unlimited storage capacity and a failure-free output phase is close to that of a non-redundant system (Fig. 17). As the task increases, the stock of products accumulated in the storage device begins to have an effect, and the probability decreases more slowly than in a non-redundant system. In this case the probability of failure-free operation never falls below p 0, so p 0 is a guaranteed probability regardless of the duration of the task, and it is quite high. For example, with a 10% performance margin and μ / λ = 100, the probability p 0 = 0.9. With an increase in parameter a, which can be interpreted as the reduced multiplicity of time reservation (item 5.7.2

Fig. 17. Probability of failure-free operation of a two-phase system with unequal performance

5.7.22. The probability of failure-free operation of a two-phase system, taking into account the output phase, at sufficiently large z 0, a, and t can be found approximately by a formula in which P 2 (t) is the probability of failure-free operation of the output phase. It follows that under these conditions the installation of a storage device makes it possible to counteract failures of the first phase almost completely.

6. SELECTION OF WAYS AND METHODS OF FUNCTIONAL REDUNDANCY

6.1. General provisions.

Functional redundancy is redundancy using functional reserves. With functional redundancy, it is typical to have multifunctional elements in an object such that a partial failure of each of them precludes its use for its main purpose with the performance of the main function, but allows it to be used for another purpose. Another typical case occurs when, if one element fails, its functions are taken over by another, multifunctional element.

When analyzing the possibilities for the manifestation of the effect of functional redundancy, it is necessary to distinguish between two situations.

1. In the event of failures of individual elements, functional redundancy ensures that the functionality of the object remains unchanged.

2. When elements fail, functional redundancy does not fully restore the properties of the object and its functionality becomes narrowed.

In technical systems, the second situation is more common. Functional redundancy can refer to an element, in which case it is a consequence of the multifunctionality of that element, but it can also refer to an object that includes such elements. In the second case, functional redundancy is usually combined with other types of redundancy and becomes combined, for example structural-functional, load-functional, etc.

There are several standard redundancy schemes. In one of them, the elements of the system have the following properties: they are interchangeable despite performing different functions in their specific places, and any connections that seem appropriate or necessary can be established between them at will. If one of the elements fails, the remaining ones are reconnected so that all the requirements for the system can still be satisfied. This order of interaction and rearrangement of elements can be considered a formal model that corresponds to the real behavior of systems. Such models are suitable for describing the reliability properties of biological objects or of teams of workers with many specialties.

A similar model is built for technical systems consisting of blocks made up of many elements. If individual elements fail, the remaining elements can be exchanged between blocks to keep the system functioning. The number of blocks is then either maintained or reduced. In the latter case, failed blocks are removed from the system and disassembled into elements, which are transferred to other blocks. When implementing such systems, a number of related problems must be solved: diagnosing states, changing the connections between elements, moving elements in space, and installing and securing them in new places.

6.2. Engineering methods of functional redundancy.

In most technical systems with functional redundancy, failures of elements cause a narrowing of functionality. Failure of an element puts an object or system into a faulty state, in which operation is allowed for a limited time, since the remaining elements operate with overload, which worsens their reliability and other indicators. Loss of functionality caused by transition to a faulty state is usually not regulated.

Another approach is that in the initial state, in the absence of failed elements, the system implements expanded functionality, which may not be regulated, and in the event of failures, well-defined capabilities are guaranteed, corresponding to regulatory and technical documentation, within a specified time.

Narrowing of functionality in the event of element failures can occur according to the following groups of indicators.

1. According to destination indicators. Failures of elements in a multifunctional (multipurpose) object lead to the impossibility of performing some functions.

2. According to quality indicators. When elements fail, accuracy, speed, and productivity may decrease.

3. According to the range of changes in input parameters: geometric areas, electrical parameters, etc.

4. According to the range of changes in influencing factors: ambient temperature, level of electromagnetic interference, fluctuations in supply voltage.

5. By level of automation. When elements fail, the load on operating and maintenance personnel can significantly increase.

Keeping in mind these directions for changing functionality, we can highlight the following most common options for functional redundancy.

1. Functional redundancy in machines, systems and complexes built on an aggregate-modular or block-modular principle. Technological equipment is built according to this principle, for example, modular machines, auxiliary equipment for production systems; industrial robots, in which modules can be assembled in various combinations, so that the resulting modifications differ in the geometric characteristics of the working area and the number of degrees of mobility; vehicles, in particular cars with various trailers; agricultural machines (tractors with mounted implements or units); computers with several memory blocks and various input/output devices; measuring and computing complexes with a set of measuring transducers, etc. The failure of one module or assembly means that some equipment modifications cannot be assembled, thereby reducing functionality, but the machine, system or complex can still be used for its main purpose.

2. Machines, systems or complexes that, in addition to the main components ensuring the performance of basic functions, have various auxiliary subsystems or devices that facilitate setup and adjustment, selection of operating modes, diagnostics of conditions, and replacement or repair of failed elements. These include automation subsystems, built-in systems for automatic troubleshooting, control of device operating modes, mode optimizers, search subsystems, etc. In a new development, it may happen that the prototype of a machine, system or complex has no such subsystems but on the whole serves its purpose. The added complexity serves to relieve the operator or to let him service more equipment. Failure of such a subsystem then reduces the new system to the prototype in terms of functionality, depriving it of the advantages of the new development.

3. High-level production units (for example, workshops) with well-organized production possess functional reserves, and functional redundancy is implemented in them.

This is expressed in the fact that there is technological equipment that is used only periodically and can take on additional load. It is often older and less capable, such as conventional universal machines compared with CNC machines, or primitive means of transport such as carts as opposed to conveyors or transport robots. A worker may also take the place of a failed robot at a machine. In all of the above examples, normal functioning in the event of equipment failures is ensured by the versatility of a person who takes on management, maintenance or direct production functions.

4. Computer systems can show great flexibility in the event of failures of elements of the central part or of external equipment. Thus, if a plotter fails, graphical information is output on an alphanumeric printing device using selected characters with a large discrete step. These images replace graphs only in the roughest approximation, but often provide the required clarity. If the plotter fails, information can also be output in numerical form, but with a significant loss of quality. In the computing process, functional redundancy is realized through algorithmic redundancy: additional branches of algorithms and additional connections between them, correction of certain types of errors, and algorithmic methods for restoring lost information.

The given examples of the use of functional redundancy in specific classes of technical systems allow us to trace some general trends. The possibilities of functional redundancy are usually higher in high-level systems and complexes of great complexity. For example, in production systems, functional redundancy is more often used at the shop level than at the line or site level. The second feature is that functional redundancy is easier in those systems in which physical movements of elements are not required during failures, and changes in the structure are carried out solely through switching at the signal level. The most typical cases of functional redundancy are associated with the presence of a person in the system - the most flexible functional element of any technical system.

6.3. Problems of a formalized description of systems with functional redundancy.

A typical consequence of element failures is a reduction in the functionality of the system. Taking this factor into account quantitatively is what distinguishes mathematical models of the reliability of systems with functional redundancy. Two largely independent problems arise here. The first is to describe the set of system states probabilistically. To solve it, the following states are introduced: S 0 - the fully operational state, in which no element has failed; S i - the state in which the i-th element has failed; S lj - the states in which the l-th and j-th elements have failed. The purpose of solving the first problem is to determine the probabilities of these states: P 0 (t), P i (t), P lj (t). The second problem is to determine in which of these states the object remains operational thanks to the functional reserve. This information is given by the initial non-formalized description of the functional redundancy capabilities, or is obtained by solving the corresponding functional equations, which make it possible to find the values of the output parameters of the system and, with their help, to determine its level of performance. Models of the functioning process after the failure of elements are, as a rule, deterministic and contain no probabilistic characteristics.
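
The split into two problems can be sketched as follows: the first function enumerates the states and their probabilities, and a separate deterministic rule says in which states the object remains operational. The sketch assumes independent element failures, which the text does not state, and the rule for three elements is purely hypothetical.

```python
from itertools import product


def state_probabilities(p_work):
    """Map each failure pattern to its probability.

    p_work[i] is the probability that element i is working at the time
    of interest. A pattern is a tuple of booleans, True meaning 'failed'.
    Assumption: elements fail independently of each other.
    """
    probabilities = {}
    for failed in product((False, True), repeat=len(p_work)):
        prob = 1.0
        for p_i, is_failed in zip(p_work, failed):
            prob *= (1.0 - p_i) if is_failed else p_i
        probabilities[failed] = prob
    return probabilities


def operability_probability(p_work, is_operational):
    """Sum the probabilities of the states in which the rule holds."""
    return sum(prob for state, prob in state_probabilities(p_work).items()
               if is_operational(state))


def example_rule(failed):
    # Hypothetical functional reserve: elements 1 and 2 together can
    # take over the function of element 0.
    return (not failed[0]) or (not failed[1] and not failed[2])


if __name__ == "__main__":
    print(operability_probability([0.9, 0.8, 0.8], example_rule))
```

The probabilistic part and the deterministic rule are kept apart on purpose, so the same state probabilities can be reused with different descriptions of the functional reserve.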

The probabilities of states can be determined by any known method: enumeration of hypotheses, solving the equations of queuing theory, approximation of empirical data, etc. For recoverable systems, the stationary distribution of state probabilities p i is of particular interest. These probabilities can be calculated by any of the methods used in the analysis of structurally redundant systems. We consider the set of probabilities as independent characteristics, which can later be used to calculate performance indicators.
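
As one example of a queuing-theory calculation, the stationary probabilities can be written in closed form for a simple birth-death model. The model below is an assumption chosen for illustration: n identical elements, each failing at rate λ while working, and a single repair crew restoring one element at a time at rate μ.

```python
def stationary_distribution(n: int, lam: float, mu: float) -> list[float]:
    """Stationary probabilities of having k = 0..n failed elements.

    Assumed model: birth-death process with failure rate (n - k) * lam
    in state k and repair rate mu in every state with k >= 1.
    """
    weights = [1.0]
    for k in range(n):
        # Balance between neighbouring states k and k + 1.
        weights.append(weights[-1] * (n - k) * lam / mu)
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    for k, p in enumerate(stationary_distribution(3, 1.0, 20.0)):
        print(f"p_{k} = {p:.5f}")
```

For a single element the first probability reduces to μ / (λ + μ), the usual availability of one recoverable element.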

LITERATURE

1. Reliability of technical systems: Handbook / Ed. I.A. Ushakov. - M.: Radio and Communications, 1985. - 608 p.

2. Reliability and efficiency in technology: Handbook. Vol. 5. Design reliability analysis / Ed. V.I. Patrushev. - M.: Mechanical Engineering, 1988. - 316 p.

3. Reliability and efficiency in technology: Handbook. Vol. 1. Methodology, organization, terminology / Ed. A.I. Rembeza. - M.: Mechanical Engineering, 1986. - 224 p.

4. Questions of mathematical theory of reliability / Ed. B.V. Gnedenko. - M.: Radio and Communications, 1983. - 376 p.

INFORMATION DATA

DEVELOPED by LPI named after M.I. Kalinin and VNIINMASH.

PERFORMERS: G.N. Cherkesov, A.M. Polovko, I.B. Chelpanov, A.I. Kubarev, V.L. Arshakuni, Yu.D. Litvinenko.

TOPIC: “Classification of reservation methods”

PLAN:

1. Reservation and redundancy

2. Classification of reservation methods

In accordance with GOST 27.002-89, reservation is the use of additional means and (or) capabilities in order to maintain the operable state of an object in the event of failure of one or more of its elements. Thus, reservation is a method of increasing the reliability of an object by introducing redundancy.

In turn, redundancy is additional means and (or) capabilities beyond the minimum required for an object to perform specified functions. The purpose of introducing redundancy is to ensure the normal functioning of an object after a failure occurs in its elements.

There are various reservation methods. It is advisable to separate them according to the following criteria (Fig. 1): type of redundancy, method of connecting elements, multiplicity of redundancy, method of switching on the reserve, mode of operation of the reserve, restorability of the reserve.

The definition of the main element is not related to the concept of minimality of the main structure of the object, since an element that is the main one in some operating modes can serve as a backup in other conditions.

Reserved element - a main element for which a backup element is provided in the object in case of its failure.

Time reservation is associated with the use of time reserves. In this case, it is assumed that the time allocated for the object to perform the necessary work is obviously greater than the minimum required. Time reserves can be created by increasing the productivity of an object, the inertia of its elements, etc.

Information redundancy is redundancy that uses redundant information. Examples are multiple transmission of the same message over a communication channel; the use, when transmitting information over communication channels, of various codes that detect and correct errors caused by equipment failures and interference; and the introduction of redundant information symbols when processing, transmitting and displaying information. Redundant information makes it possible to compensate, to one degree or another, for distortions in the transmitted information or to eliminate them.
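
Repeated transmission can be sketched as a repetition code. Decoding by majority vote and the assumption of independent errors in each copy are choices made for this illustration; the text only mentions multiple transmission of the same message.

```python
from math import comb


def encode(bits, repeats=3):
    """Repeat every bit `repeats` times (repeated transmission)."""
    return [bit for bit in bits for _ in range(repeats)]


def decode(received, repeats=3):
    """Recover each bit by majority vote over its `repeats` copies."""
    decoded = []
    for start in range(0, len(received), repeats):
        copies = received[start:start + repeats]
        decoded.append(1 if 2 * sum(copies) > len(copies) else 0)
    return decoded


def bit_success_probability(p_error, repeats=3):
    """Probability that at most (repeats - 1) // 2 copies are corrupted,
    assuming independent errors with probability p_error per copy."""
    max_errors = (repeats - 1) // 2
    return sum(comb(repeats, k) * p_error ** k * (1 - p_error) ** (repeats - k)
               for k in range(max_errors + 1))


if __name__ == "__main__":
    sent = encode([1, 0, 1])
    sent[1] ^= 1  # one copy of the first bit is distorted in the channel
    print(decode(sent))
    print(bit_success_probability(0.1))
```

With an odd number of copies, a single distorted copy per bit is corrected, at the cost of transmitting the message several times.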

Functional redundancy is redundancy in which a given function can be performed by various methods and technical means. For example, the function of quickly shutting down a water-cooled power reactor can be achieved by inserting safety rods into the core or by injecting a boron solution. Likewise, the function of transmitting information to an automated control system can be performed using radio channels, telegraph, telephone and other means of communication. Therefore, the usual average reliability indicators (mean time between failures, probability of failure-free operation, etc.) become uninformative and insufficiently suitable for use in this case. The most suitable indicators for assessing functional reliability are the probability of performing a given function, the average time to complete a function, and the availability factor for performing a given function.

Load redundancy is redundancy using load reserves. Load redundancy consists, first of all, in ensuring optimal reserves of the ability of elements to withstand the loads acting on them. Other methods of load redundancy may involve the introduction of additional protective or unloading elements.

According to the method of switching on the reserve elements, a distinction is made between permanent, dynamic, replacement, sliding and majority redundancy. Permanent redundancy is redundancy without rearrangement of the structure of an object when one of its elements fails. It is essential for permanent redundancy that, when the main element fails, no special devices are required to activate the backup element and there is no interruption in operation (Fig. 5.2 and 5.3).

Permanent redundancy in the simplest case is a parallel connection of elements without switching devices.
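
For a parallel connection without switching devices, the probability of failure-free operation can be sketched under the assumption of independent element failures, with a series connection shown for comparison. The function names are illustrative.

```python
import math


def series_reliability(probabilities):
    """All elements must work: the product of their reliabilities."""
    return math.prod(probabilities)


def parallel_reliability(probabilities):
    """At least one element must work (assumed independent failures)."""
    return 1.0 - math.prod(1.0 - p for p in probabilities)


if __name__ == "__main__":
    elements = [0.9, 0.9]
    print("series:  ", series_reliability(elements))
    print("parallel:", parallel_reliability(elements))
```

The series value is lower than the reliability of either element, while the parallel value is higher than the reliability of the best element, in line with the general remark made at the start of the chapter.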

Dynamic redundancy is redundancy with rearrangement of the structure of an object when one of its elements fails. Dynamic redundancy has a number of varieties.