During the epidemic, most companies began trying remote work, and schools turned to live online teaching. In February, Tencent announced that, for the duration of the epidemic, Tencent Meeting would be offered free of charge with support for meetings of up to 300 participants, to meet user demand.
As the business grew exponentially, the cloud database products behind Tencent Meeting had to cope with rapidly rising storage and performance requirements. They had to scale out quickly online, without data loss and without users noticing, while continuing to provide stable and reliable service. Operations staff, in turn, had to keep a constant eye on system health and respond quickly to problems, both to run the business in a fine-grained way and to keep up with the operations needs of a large fleet of databases.
This demanding task was accomplished jointly by Tencent Cloud Redis and TDSQL, Tencent's self-developed, domestically produced financial-grade distributed database.
To serve its huge user base, Tencent Meeting chose Tencent Cloud Redis, in its cluster architecture, as the caching service. A single cluster alone can provide up to 4 TB of storage and concurrent access performance ranging from 100,000 to 10 million, with a guaranteed response latency of 1 ms for 99.99% of requests. Faced with massive request volumes, the Redis clusters completed an expansion to tens of times their original scale within half an hour: background processing for the expansion of a single cluster took no more than 30 minutes, and the system stayed 100% available throughout. Tencent Cloud Redis is the only Redis database product in China that offers lossless capacity expansion. TDSQL, for its part, safeguards Tencent Meeting through an automated operations system, automatic and smooth failover, and elastic horizontal scaling. This article explains in detail the technical support that TDSQL provides behind Tencent Meeting.
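The article does not say how Tencent Cloud Redis partitions data. As a point of reference only, open-source Redis Cluster maps every key to one of 16,384 hash slots using CRC16 (XMODEM variant) modulo 16384, and scaling out moves whole slots between nodes rather than rehashing every key. The sketch below computes a key's slot under that open-source scheme, including its `{hash tag}` rule; whether Tencent Cloud Redis uses exactly this scheme is an assumption.

```python
# Reference sketch of open-source Redis Cluster key-to-slot mapping.
# ASSUMPTION: Tencent Cloud Redis is not confirmed by the article to use this scheme.
NUM_SLOTS = 16384


def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM (poly 0x1021, init 0x0000), the variant Redis Cluster uses."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Map a key to a hash slot, honouring the {hash tag} rule."""
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        # Only a non-empty tag counts; "{}" means the whole key is hashed.
        if end != -1 and end != start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % NUM_SLOTS
```

With hypothetical keys such as `{meeting:42}.members` and `{meeting:42}.chat`, both hash only the tag `meeting:42`, so they fall into the same slot and migrate together when slots are handed to new nodes during an expansion.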
TDSQL is a financial-grade distributed database product developed in-house by Tencent. It offers strong consistency, high availability, a globally deployable architecture, distributed horizontal scaling, high performance, and enterprise-grade security. It also provides "Chitu", an automated operations management platform, and "Bianque", an intelligent DBA diagnostic system, to help users operate and maintain their databases.
In just one week after work resumed following the holiday, Tencent Meeting expanded by more than 100,000 cloud hosts, totaling over one million cores. Resources were added almost every day, at an average of nearly 15,000 cloud hosts per day. As one of the back-end database solutions for Tencent Meeting, TDSQL also faced an explosive surge of business reads and writes and had to scale out. Scaling out means absorbing a large volume of service requests, which in turn requires the database to deliver high performance.
A SQL statement with performance problems may cause no trouble early on, but as business requests grow, such slow SQL snowballs and gradually eats into the database's performance. At the same time, as the business keeps growing, finding these problematic statements among hundreds of millions of SQL statements is like finding a needle in a haystack.
TDSQL's intelligent DBA diagnostic system "Bianque" addresses this problem. It automatically captures SQL statements with performance problems and analyzes them to produce index optimization suggestions, so that database performance problems can be nipped in the bud. After optimization, 99% of the SQL statements were free of performance bottlenecks.
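The article does not describe how Bianque finds slow SQL. A common generic technique, shown here only as a minimal sketch, is to normalize each statement into a "fingerprint" (literals replaced by `?`) so that millions of executions collapse into a few query shapes, then rank the shapes whose average latency exceeds a threshold. The threshold, function names, and log format are hypothetical, and index suggestions are not attempted.

```python
import re
from collections import defaultdict

# Hypothetical threshold; a real system would tune this per workload.
SLOW_THRESHOLD_MS = 100.0


def fingerprint(sql: str) -> str:
    """Normalize a statement so queries that differ only in literals group together."""
    s = sql.strip().lower()
    s = re.sub(r"'(?:[^'\\]|\\.)*'", "?", s)  # string literals -> ?
    s = re.sub(r"\b\d+(?:\.\d+)?\b", "?", s)   # numeric literals -> ?
    return re.sub(r"\s+", " ", s)               # collapse whitespace


def find_slow_queries(log, threshold_ms=SLOW_THRESHOLD_MS, top_n=10):
    """log: iterable of (sql_text, elapsed_ms).

    Returns (fingerprint, count, avg_ms, max_ms, total_ms) for fingerprints whose
    average latency reaches threshold_ms, ordered by total time consumed.
    """
    stats = defaultdict(lambda: {"count": 0, "total_ms": 0.0, "max_ms": 0.0})
    for sql, elapsed_ms in log:
        s = stats[fingerprint(sql)]
        s["count"] += 1
        s["total_ms"] += elapsed_ms
        s["max_ms"] = max(s["max_ms"], elapsed_ms)

    slow = []
    for fp, s in stats.items():
        avg_ms = s["total_ms"] / s["count"]
        if avg_ms >= threshold_ms:
            slow.append((fp, s["count"], avg_ms, s["max_ms"], s["total_ms"]))
    # Rank by total time: a moderately slow but very frequent query can cost
    # the database more than a rare, very slow one.
    slow.sort(key=lambda row: row[4], reverse=True)
    return slow[:top_n]
```

Ranking by total time rather than peak latency reflects the "snowball" effect described above: a query shape executed constantly at moderate latency is often the first one worth indexing.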
The "Bianque" system is a suite of intelligent tools provided by TDSQL, covering data collection, real-time detection, automatic handling, performance detection and health assessment, SQL performance analysis, business diagnosis, and more. Its modular plug-in design lets it connect seamlessly to a variety of databases. With Bianque, DBAs are freed from much of the tedious day-to-day work of database operations. The "Chitu" platform, designed from the administrator's perspective, provides all of TDSQL's operations functions and displays hundreds of database status monitoring metrics. Database administrators can perform more than 90% of their daily operations through its interface, and locating and troubleshooting problems becomes much easier.
Together, "Chitu" and "Bianque" support the fine-grained operation of high-priority businesses, handle a large volume of routine database operations needs with ease, and help users cut operations costs.
As Tencent Meeting's user base grew exponentially, its TDSQL clusters grew ever larger, and the business became increasingly sensitive to node failures, putting TDSQL's disaster-recovery capabilities to an ever more demanding test. TDSQL's consistent switchover ensures that failover completes smoothly even when several cluster nodes fail, minimizing the impact on the business.
Most business systems rely on a high-availability solution to keep running without interruption. The database sits at the bottom of the software stack and provides data persistence and access services; if the database itself is sufficiently highly available, the high-availability design of the business layer can be made lighter and simpler.
At the data layer, TDSQL's high-availability solution is built on automatic detection logic and Tencent's self-developed Raft-based strong synchronous replication. Combined with automatic resource scheduling, this enables automatic disaster-recovery monitoring and switchover within seconds, keeping the system in service 24/7 with zero data loss and strong data consistency.
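The article gives no details of TDSQL's replication protocol. To illustrate the general idea behind Raft-style strong synchronous replication, here is a minimal sketch of the majority rule: a write is acknowledged to the client only after it is durably stored on a majority of replicas, so any surviving majority still holds every acknowledged write. The function names are hypothetical, and the sketch omits Raft's additional requirement that the leader only counts replicas this way for entries from its current term.

```python
from typing import Sequence


def majority_commit_index(match_index: Sequence[int]) -> int:
    """Highest log index stored on a majority of replicas (leader included).

    match_index[i] is the last log index replica i has durably persisted.
    Sorted descending, the value at position majority-1 is held by at least
    `majority` replicas, so everything up to it is safe to acknowledge.
    """
    if not match_index:
        raise ValueError("need at least one replica")
    majority = len(match_index) // 2 + 1
    return sorted(match_index, reverse=True)[majority - 1]


def is_committed(write_index: int, match_index: Sequence[int]) -> bool:
    """A client write is acknowledged only once it reaches a majority."""
    return write_index <= majority_commit_index(match_index)
```

With three replicas at positions `[10, 9, 7]`, entry 9 is on two of three nodes and can be acknowledged, while entry 10 must wait. If any single node then fails, the remaining two still hold everything up to 9, which is what allows a switchover without losing acknowledged data.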
Each TDSQL shard supports a high-availability scheme based on strong synchronization and strong consistency, and the database and its underlying physical devices are monitored continuously, 24/7. When a failure occurs, TDSQL automatically restarts the database and related processes. If a node crashes and cannot be recovered, TDSQL automatically rebuilds it from backup files.
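The escalation path described above (restart first, and only if the node cannot be recovered, fail over and rebuild it from backup) can be sketched as a small planning function. This is a minimal illustration under stated assumptions, not TDSQL's scheduler: the `Node` fields, action names, and the rule of promoting the healthy replica with the highest persisted log position are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Node:
    name: str
    healthy: bool
    log_index: int  # hypothetical: last log position this node has persisted


def plan_recovery(
    primary: Node, replicas: Sequence[Node], restart_ok: bool
) -> List[Tuple[str, str]]:
    """Return ordered recovery steps as (action, node_name) pairs."""
    if primary.healthy:
        return []
    # Step 1: try restarting the database process in place.
    steps = [("restart", primary.name)]
    if restart_ok:
        return steps
    candidates = [r for r in replicas if r.healthy]
    if not candidates:
        # No safe promotion target: escalate to a human operator.
        steps.append(("alert", primary.name))
        return steps
    # Step 2: promote the most up-to-date healthy replica.
    new_primary = max(candidates, key=lambda r: r.log_index)
    steps.append(("promote", new_primary.name))
    # Step 3: rebuild the failed node from backup so it can rejoin as a replica.
    steps.append(("rebuild_from_backup", primary.name))
    return steps
```

Choosing the replica with the highest persisted log position matters under strong synchronous replication: that replica is guaranteed to hold every acknowledged write, so promoting it keeps the switchover lossless.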
As a key foundation of Tencent Meeting, TDSQL carried out a round of rapid horizontal expansion of database machines after optimization as traffic kept soaring. Because TDSQL is built on a distributed architecture with a multi-tenant design, it has inherently good elastic horizontal scalability: the concurrency, processing power, and storage capacity of a database instance can grow linearly.
In scaling out Tencent Meeting, TDSQL's read-write separation technology, with its rich set of strategies, let the database layer respond quickly to ever-growing capacity and performance demands.
To offload as many read requests as possible and further reduce the load on the master node, TDSQL isolates purely read-only traffic through measures such as separate read and write accounts and disaster-recovery read-only instances, which further relieves the master and increases overall throughput. In the end, 25% of complex queries were routed to read-only instances under the read-write separation strategy, quickly lowering the load on the master node.
Multi-strategy read-write separation is one of the elastic scaling features TDSQL has developed in-house. TDSQL supports read-write separation by default: every slave in the architecture can serve reads, and if multiple slaves are configured, the SQL Engine cluster automatically distributes read requests to lightly loaded slaves to carry the read traffic of large applications. The advantage of TDSQL's read-write separation is that it offers several strategies, and users need not track whether every slave is alive, because the system schedules traffic automatically according to the chosen strategy.
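The article names the behavior (reads go to lightly loaded slaves, and dead slaves are skipped automatically) but not the strategies themselves. The sketch below shows one plausible strategy under stated assumptions: writes and locking reads go to the master, plain reads go to the least-loaded live replica whose replication delay is within a cutoff, and the master serves reads when no replica qualifies. The load metric, lag field, and 5-second cutoff are hypothetical, and classifying by SQL prefix is a simplification; the text above notes TDSQL can also separate traffic by read and write accounts.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Replica:
    name: str
    alive: bool
    load: float   # hypothetical load metric, e.g. active connections
    lag_s: float  # hypothetical replication delay in seconds


def is_read_only(sql: str) -> bool:
    """Naive classifier: plain SELECTs are reads; locking reads are not."""
    s = " ".join(sql.upper().split())  # normalize case and whitespace
    return (
        s.startswith("SELECT")
        and "FOR UPDATE" not in s
        and "LOCK IN SHARE MODE" not in s
    )


def route(
    sql: str, master: str, replicas: Sequence[Replica], max_lag_s: float = 5.0
) -> str:
    """Writes go to the master; reads go to the least-loaded usable replica."""
    if not is_read_only(sql):
        return master
    usable = [r for r in replicas if r.alive and r.lag_s <= max_lag_s]
    if not usable:
        # No live, fresh replica: fall back to the master instead of failing.
        return master
    return min(usable, key=lambda r: r.load).name
```

Because the router filters on liveness at every call, the application never has to know which slaves are currently up, which is the property the paragraph above highlights.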
In addition to the read-write separation technology, TDSQL's flexible horizontal expansion includes multiple features to cope with different scenarios.
Finally, TDSQL provides robust distributed transaction support that has been continuously optimized for performance. SQL Engine, which acts as the coordinating node, is stateless and can be scaled horizontally almost without limit. Each database SET serves as a data node: it stores ordinary business data and also, via hash routing, stores the global transaction logs. Since every module can scale horizontally, the system as a whole can meet the business layer's almost unlimited storage capacity requirements.
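The article says SETs store both business data and, by hash routing, the global transaction logs, but it does not specify the hash function or routing table. A minimal sketch, assuming plain hash-modulo routing: business rows are routed by a shard key, a transaction's global log record is routed by its transaction ID, and the SETs a transaction touches are the participants the coordinator must commit across. All function names are hypothetical.

```python
import zlib
from typing import Iterable, List


def route_to_set(key: str, num_sets: int) -> int:
    """Map a key to a SET index with a stable hash.

    zlib.crc32 is used instead of Python's built-in hash(), which is salted per
    process for str and would route the same key differently across restarts.
    """
    if num_sets <= 0:
        raise ValueError("num_sets must be positive")
    return zlib.crc32(key.encode("utf-8")) % num_sets


def participant_sets(shard_keys: Iterable[str], num_sets: int) -> List[int]:
    """SETs a distributed transaction touches, given the shard keys it writes."""
    return sorted({route_to_set(k, num_sets) for k in shard_keys})


def txn_log_set(txn_id: str, num_sets: int) -> int:
    """SET holding the global transaction log record, routed by transaction ID."""
    return route_to_set(txn_id, num_sets)
```

Spreading transaction log records across all SETs by hash avoids a single log node becoming the bottleneck, which fits the claim that every module scales horizontally. Note that plain modulo routing remaps most keys when the SET count changes; systems commonly hash into a fixed, larger set of buckets and move whole buckets between nodes when expanding.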
In addition, to tackle the difficult lock problems that arise in distributed transactions, TDSQL offers a real-time lock diagnostic view as well as global deadlock detection, eliminating a range of lock issues at the business layer.
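Global deadlock detection is needed because a distributed deadlock can be invisible to every individual shard. The article does not describe TDSQL's detector; a standard technique, sketched here under that caveat, is to merge each SET's local lock-wait edges into one global wait-for graph and search it for a cycle with depth-first search. Function names are hypothetical, and choosing which transaction to roll back once a cycle is found is not shown.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple


def merge_wait_for(
    per_set_edges: Iterable[Iterable[Tuple[str, str]]],
) -> Dict[str, Set[str]]:
    """Merge each SET's local (waiter, holder) lock-wait edges into one global graph."""
    graph: Dict[str, Set[str]] = {}
    for edges in per_set_edges:
        for waiter, holder in edges:
            graph.setdefault(waiter, set()).add(holder)
    return graph


def find_deadlock(wait_for: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return transactions forming a wait-for cycle, or None (depth-first search)."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / on current path / finished
    color: Dict[str, int] = {}
    path: List[str] = []

    def dfs(txn: str) -> Optional[List[str]]:
        color[txn] = GRAY
        path.append(txn)
        for nxt in wait_for.get(txn, ()):
            state = color.get(nxt, WHITE)
            if state == GRAY:  # back edge: nxt is already on the path -> cycle
                return path[path.index(nxt):]
            if state == WHITE:
                cycle = dfs(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[txn] = BLACK
        return None

    for txn in list(wait_for):
        if color.get(txn, WHITE) == WHITE:
            cycle = dfs(txn)
            if cycle:
                return cycle
    return None
```

For example, if T1 waits for T2 on one SET and T2 waits for T1 on another, neither SET sees a cycle locally, but the merged graph `{"T1": {"T2"}, "T2": {"T1"}}` does, and `find_deadlock` reports both transactions.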
From intelligent operations and automatic failover to distributed elastic horizontal scaling, TDSQL enables Tencent Meeting to handle continuously growing service requests with ease, keep delivering clear and smooth meeting and live-streaming services to users, and operate its business systems in a fine-grained way.