618: boundless passion, Mobye auto sales rising steadily
First, preliminary preparation
1. Map out upstream and downstream dependencies. If an upstream interface is unavailable, confirm whether there is a corresponding contingency plan for it. Likewise, confirm whether downstream has a corresponding plan for when our interface is unavailable.
2. Review the downgrade plans for redis, databases, etc. The main questions are which redis clusters are in use, and which redis clusters and databases are core. In the two months before this Double 11 event, various problems occurred frequently. We worked with the redis side to expand the clusters and to build independent, separately partitioned clusters, and redis is now very stable for Double 11.
3. Offline pressure test, mainly to measure the traffic a single machine can carry. In theory, multiplying that figure by the number of containers gives the traffic the service can carry. In practice, external conditions such as the machine-room network and the CPU, memory, and network of the physical machines hosting the containers change all the time. Single-machine pressure testing is therefore necessary, but it still needs to be followed by a combined pressure test with downstream.
4. Performance optimization. Performance problems found during pressure testing need to be fixed; otherwise they may be amplified after capacity expansion and affect system stability.
5. Traffic estimation. Estimate traffic based on the 618 kickoff meeting, communication with upstream and downstream, and current traffic.
6. Online pressure test with upstream and downstream, and downgrade plan drill. At this stage the pressure test is driven by the estimated traffic and is carried out in multiple rounds. Generally, the first round is a limit test that finds the maximum value, reached when CPU, memory, etc. hit 90% or tp99 hits 500 or more. In the second round, we apply our downgrades and test again to find the limit value. In the third round, after the pressure test, we run the downstream downgrade and recovery drill. Record the corresponding measurements after each round of testing.
7. Capacity expansion. Expand capacity based on the traffic estimate and the combined pressure test results. Each service needs to be pressure-tested according to its own traffic conditions.
8. Combined pressure test. After the expansion, run the pressure test again at the estimated traffic, and have each business adjust according to the results.
9. Review of redis stability, service logic, and online releases before the event begins. Redis, dependent services, and any large releases going online close to the event need a detailed review beforehand, to make sure a release does not introduce problems. Changes carry a high risk of introducing them.
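
The arithmetic behind items 3, 6, and 7 can be sketched in a few lines of Python. This is only an illustration, not the tooling actually used: the function names, the 0.7 derating factor (a stand-in for the network and host variance mentioned in item 3), and the reading of "tp99 500" as milliseconds are all assumptions.

```python
import math


def usable_capacity(single_machine_qps, containers, derating=0.7):
    """Traffic the service can carry, discounted from the theoretical value.

    The theoretical value is single_machine_qps * containers (item 3).
    derating is an ASSUMED safety factor, not a figure from the text.
    """
    return single_machine_qps * containers * derating


def containers_needed(estimated_peak_qps, single_machine_qps, derating=0.7):
    """Containers required to carry the estimated peak traffic (item 7)."""
    return math.ceil(estimated_peak_qps / (single_machine_qps * derating))


def tp99(latencies):
    """99th-percentile latency using the nearest-rank method."""
    if not latencies:
        raise ValueError("no latency samples")
    ordered = sorted(latencies)
    rank = (99 * len(ordered) + 99) // 100  # integer ceil(0.99 * n)
    return ordered[rank - 1]


def limit_reached(cpu_pct, mem_pct, latencies_ms,
                  resource_limit=90.0, tp99_limit_ms=500.0):
    """Stop condition for the first (limit) round in item 6.

    True once CPU or memory reaches 90%, or tp99 reaches 500.
    Treating 500 as milliseconds is an ASSUMPTION.
    """
    return (cpu_pct >= resource_limit
            or mem_pct >= resource_limit
            or tp99(latencies_ms) >= tp99_limit_ms)


if __name__ == "__main__":
    # Hypothetical numbers: 800 QPS per machine, 20000 QPS estimated peak.
    n = containers_needed(20000, 800)
    print("containers needed:", n)
    print("usable capacity:", usable_capacity(800, n))
    print("limit reached:", limit_reached(72.0, 65.0, [120] * 98 + [640] * 2))
```

The derating factor is the part most worth tuning per service: the combined pressure test in item 8 is what shows how far real capacity falls below the single-machine figure multiplied by the container count.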
Second, the event begins
1. Starting in the early morning of June 1, follow traffic and the performance, stability, and availability of online services. Record indicators such as traffic.
2. On June 17 the whole group was on duty and, starting at 9:00, tracked hourly traffic along with system performance and stability indicators. Based on the traffic between 8:00 and 10:00 that evening, adjust the downgrade plan and keep in touch with the relevant people.
3. After 10:00 pm, depending on business characteristics, the corresponding services can be restarted and log output turned off, so that they meet 0:00 in their best state.
4. Normally the downgrade is carried out at 23:55, and at 0:05 the notice to resume normal online service is sent. For this 618, no downgrade was carried out.
Third, the activity summary
1. Summarize the problems found during preparation and during the event, and schedule fixes by priority.
2. Summarize the preparation process as a reference for the next preparation.
3. Record the corresponding data as a reference for follow-up big promotions, and share it with everyone.
Fourth, a list of items has been created for the problems found during this 618. Some items have been discussed, some have been developed, and some are being followed up. See attachments.
Fifth, what matters most is the early preparation: the various pressure tests, the code reviews, and the other preliminary work, because people will be nervous once the promotion is under way.