12/31/2023

Prometheus statsd exporter

Building a microservice platform that is scalable and can handle hundreds of thousands of requests per second is a difficult task, even on modern cloud platforms. There are many challenges in building modern distributed systems, and monitoring these systems can be particularly tricky. As our engineers at Webjet build out our distributed services over a global landscape, our recent goal has been to scale these systems linearly so we are able to achieve 30,000 requests per second with latency as low as 2 seconds for each request. In this post, I'd like to do a deep dive into how we designed a metrics platform that is able to scale, aggregate and feed into a monitoring solution to support reporting on anything in the system that a developer or operator wants to track.

There are several challenges with designing this type of platform. We have had to tackle many of them so far, including:

- Handling a high volume of incoming requests and, more importantly, handling socket connections at the platform's edge.
- Fanning out connections from the edge to different microservices within the platform.
- Fanning out connections from internal services to the external providers from whom we get hotel content.

More importantly, and the subject of this post: how do we track each individual request and report on it without adding massive contention to the system?

StatsD is a powerful stats aggregation service that got our attention because it's very simple to deploy and operate. There are many client libraries, so it works across multiple platforms. StatsD is able to track counters, timers and gauges. Counters are important for us to monitor the throughput of each microservice. Timers are important to track how much time incoming, internal and external requests take to complete. To operate at very large transaction volumes, we need a scalable metrics system.

The first problem we faced with our StatsD server was that it became overloaded: it is a single-threaded NodeJS server, aggregation on it is CPU bound, and metrics were dropped. Our first intention was to scale the StatsD server; however, when you run multiple instances of StatsD, aggregation is split among the instances and your metrics become skewed. Folks at Anomaly wrote a great post about three ways to scale StatsD. To summarise that post, the StatsD community has built a cluster proxy to overcome the scaling issues, using clever hashrings to ensure metrics go to the same StatsD backend and are aggregated correctly. This proxy, however, becomes the new bottleneck, so to overcome that you can run several of these proxies, one on each host, as described in the Anomaly blog post. A service would make a call to the StatsD proxy on its own host, which would pass the metric on to an external host running a StatsD server.

[diagram: per-host StatsD proxies forwarding metrics to a central StatsD server]

A microservice would send its statistics using a StatsD client to a single endpoint which is load balanced. The stat would hit any StatsD-exporter and would be made available for scraping. Prometheus will scrape the StatsD-exporters and make the metrics available in its time series database for reporting in Grafana. Running StatsD-exporters as Kubernetes pods allows you to scale up easily. Prometheus has Kubernetes service discovery built in, so if configured correctly Prometheus can use the Kubernetes API to find your StatsD-exporter pods and start scraping them almost as soon as they become available.

Load testing this solution, we are able to track 30,000 requests per second with a simple F4s series VM in Azure running StatsD-exporter and Prometheus. From our Azure cloud load testing dashboard we can see our 30,000 requests per second, along with request counters and timings from the client side.
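To make the counters and timers concrete: StatsD metrics are tiny plain-text datagrams sent over UDP, which is why instrumenting a service adds almost no contention. The sketch below is a minimal illustration of that wire format, not Webjet's actual client; the metric names, host and port are assumptions.

```python
# Minimal sketch of a StatsD client (illustrative; names/host/port assumed).
import socket

def format_counter(name: str, value: int = 1) -> str:
    """StatsD counter line, e.g. 'app.requests:1|c'."""
    return f"{name}:{value}|c"

def format_timer(name: str, millis: int) -> str:
    """StatsD timer line in milliseconds, e.g. 'app.request_time:320|ms'."""
    return f"{name}:{millis}|ms"

class StatsdClient:
    """Fire-and-forget UDP sender: the caller never blocks on the metrics path."""

    def __init__(self, host: str = "127.0.0.1", port: int = 8125):
        self.addr = (host, port)
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def incr(self, name: str, value: int = 1) -> None:
        self.sock.sendto(format_counter(name, value).encode(), self.addr)

    def timing(self, name: str, millis: int) -> None:
        self.sock.sendto(format_timer(name, millis).encode(), self.addr)

client = StatsdClient()
client.incr("app.requests")             # count one handled request
client.timing("app.request_time", 320)  # request took 320 ms
```

Because the transport is UDP, a dropped datagram costs a data point rather than a stalled request, which is the trade-off that keeps the metrics path cheap.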
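StatsD-exporter's job is to translate dotted StatsD metric names into labelled Prometheus metrics. A hedged sketch of its mapping configuration is below; the metric name, wildcard and label are illustrative assumptions, not the configuration from this post.

```yaml
# statsd_exporter mapping config (illustrative): turn a dotted StatsD
# timer such as "checkout.request_time" into one labelled Prometheus
# metric, with the service name captured as a label.
mappings:
  - match: "*.request_time"
    name: "request_time"
    labels:
      service: "$1"   # $1 = value matched by the first wildcard
```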
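The Kubernetes service discovery mentioned above can be wired up in Prometheus roughly as follows. This is a sketch under stated assumptions: the job name, the `app=statsd-exporter` pod label and the discovery role are illustrative, not taken from the post.

```yaml
# Prometheus scrape config (illustrative) using built-in Kubernetes
# service discovery to find StatsD-exporter pods as they appear.
scrape_configs:
  - job_name: "statsd-exporter"
    kubernetes_sd_configs:
      - role: pod   # discover pods via the Kubernetes API
    relabel_configs:
      # keep only pods labelled app=statsd-exporter (label name assumed)
      - source_labels: [__meta_kubernetes_pod_label_app]
        regex: statsd-exporter
        action: keep
```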