Setting up Validator Monitoring for Aptos

This is a detailed tutorial on how to set up validator monitoring for Aptos with Prometheus and Grafana on Ubuntu Linux.

This tutorial is for people who want to quickly set up basic monitoring for their validators. The finer points of monitoring are not covered in this guide. The onus is on you to dive deeper into the subject and research advanced methods.

The first thing you need to do is set up a Prometheus server. This will act as the central nervous system for your monitoring setup. Prometheus is a time-series database with robust scraping capabilities, which lets you slurp data off of your nodes in near-real time and then archive it.

Our preference is to provision a dedicated server to run Prometheus, but you can certainly run it alongside other programs. It is not recommended that you run Prometheus on the same server as a sentry or validator, because you may drop blocks due to competition for system resources.

Once you have your server, update it and install fail2ban to get some basic security. There are much better ways to improve server security than just fail2ban, but this is not a Linux security tutorial. Trust us, it’s better than nothing.
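The commands for this step are not reproduced here, so the following is a minimal sketch, assuming a stock Ubuntu server with apt and sudo access. The function name update_and_secure is ours, purely for illustration. It wraps the commands so you can read them before anything runs.

```shell
# Sketch only: refresh the package index, apply upgrades, install fail2ban.
# update_and_secure is a hypothetical helper name, not part of any package.
update_and_secure() {
  sudo apt-get update || return 1
  sudo apt-get upgrade -y || return 1
  sudo apt-get install -y fail2ban || return 1
  # Make sure fail2ban is running now and starts again after a reboot.
  sudo systemctl enable --now fail2ban
}
```

Once the function is defined, run update_and_secure to execute the steps.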

As these packages install and update you will occasionally see a purple screen. Just hit the ENTER button.

Once Linux has updated, create a prometheus user which will be used to run Prometheus.
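A sketch of how that user could be created, assuming you want a system-style account with no home directory and no login shell. The helper name create_prometheus_user is hypothetical.

```shell
# Sketch only: create an unprivileged account that exists just to run the daemon.
create_prometheus_user() {
  # No home directory and a non-login shell, so nobody can log in as this user.
  sudo useradd --no-create-home --shell /bin/false prometheus
}
```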

Now do some file system housekeeping and then download and install Prometheus.
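One way this step might look is sketched here. The version number in PROM_VERSION is an assumption on our part, so check the Prometheus releases page and substitute the one you want. The directory layout (/etc/prometheus for config, /var/lib/prometheus for data) and both function names are also assumptions for illustration.

```shell
# PROM_VERSION is an assumed value; override it with the release you actually want.
PROM_VERSION="${PROM_VERSION:-2.45.0}"

# Build the download URL for the 64-bit Linux tarball on GitHub.
prometheus_tarball_url() {
  echo "https://github.com/prometheus/prometheus/releases/download/v${PROM_VERSION}/prometheus-${PROM_VERSION}.linux-amd64.tar.gz"
}

# The body runs in a subshell (parentheses) so the cd calls do not
# change the directory of the shell you call it from.
install_prometheus() (
  # Config and data directories, owned by the prometheus user.
  sudo mkdir -p /etc/prometheus /var/lib/prometheus || exit 1
  sudo chown prometheus:prometheus /etc/prometheus /var/lib/prometheus || exit 1
  cd /tmp || exit 1
  curl -fsSLO "$(prometheus_tarball_url)" || exit 1
  tar -xzf "prometheus-${PROM_VERSION}.linux-amd64.tar.gz" || exit 1
  cd "prometheus-${PROM_VERSION}.linux-amd64" || exit 1
  # Put both binaries on the PATH.
  sudo cp prometheus promtool /usr/local/bin/ || exit 1
  sudo chown prometheus:prometheus /usr/local/bin/prometheus /usr/local/bin/promtool
)
```

Run install_prometheus after defining both functions. The unpacked files stay in /tmp under the versioned directory name.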

Once the download is complete and Prometheus is unpacked, check to make sure that both Prometheus and Promtool are operational. You will see version numbers for both if you successfully completed the previous steps.
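A small sketch of that check, assuming both binaries were copied somewhere on your PATH. The helper name check_prometheus_binaries is hypothetical.

```shell
# Sketch only: print the version of each binary, or complain if one is missing.
check_prometheus_binaries() {
  for bin in prometheus promtool; do
    if command -v "$bin" >/dev/null 2>&1; then
      "$bin" --version || return 1
    else
      echo "$bin not found on PATH" >&2
      return 1
    fi
  done
}
```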

One more bit of housekeeping: we are moving some files around.
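We cannot be sure exactly which files the original step moved, so treat this as a sketch under assumptions: it copies the sample prometheus.yml from the unpacked download into /etc/prometheus, plus the consoles and console_libraries directories if your release ships them. The helper name and its argument are hypothetical.

```shell
# Sketch only. Pass the unpacked download directory as the first argument
# (defaults to the current directory).
move_prometheus_files() {
  src_dir="${1:-.}"
  sudo cp "$src_dir/prometheus.yml" /etc/prometheus/prometheus.yml || return 1
  # Not every release tarball includes these directories, so copy only what exists.
  for dir in consoles console_libraries; do
    if [ -d "$src_dir/$dir" ]; then
      sudo cp -r "$src_dir/$dir" /etc/prometheus/ || return 1
    fi
  done
  # Hand everything under /etc/prometheus to the prometheus user.
  sudo chown -R prometheus:prometheus /etc/prometheus
}
```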

Finally, let’s set up Prometheus as a service so that it runs all of the time!
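A minimal sketch of what the service definition could look like, assuming the binary lives in /usr/local/bin, the config in /etc/prometheus, and the data in /var/lib/prometheus. Those paths and the helper name render_prometheus_unit are our assumptions, so adjust them to match your layout.

```shell
# Sketch only: print a systemd unit for Prometheus to standard output.
render_prometheus_unit() {
  cat <<'EOF'
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}
```

To install it, pipe the output into place with render_prometheus_unit | sudo tee /etc/systemd/system/prometheus.service. Printing first and writing second lets you eyeball the unit before it touches the system.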

Some more housekeeping:

Now let's tell systemctl that we added the Prometheus service and then launch it.
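Sketched as a helper, assuming the unit file is named prometheus.service. The function name start_prometheus_service is hypothetical.

```shell
# Sketch only: register the new unit, enable it at boot, start it, show its status.
start_prometheus_service() {
  # Make systemd re-read its unit files so it notices the new service.
  sudo systemctl daemon-reload || return 1
  sudo systemctl enable prometheus || return 1
  sudo systemctl start prometheus || return 1
  sudo systemctl status prometheus
}
```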

If Prometheus is running successfully, you should see a status screen that reports the service as active (running). Press CTRL+C to exit the systemctl status screen.

Update your firewall. This assumes you are using port 22 for SSH. If you are not using port 22, substitute your SSH port in the firewall rules or you will lock yourself out of your server.
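A sketch of the firewall rules, assuming you use ufw (Ubuntu's default firewall front end). SSH_PORT and the helper name open_prometheus_firewall are ours. Port 9090 is the Prometheus default.

```shell
# Change SSH_PORT if your SSH daemon does not listen on 22.
SSH_PORT="${SSH_PORT:-22}"

# Sketch only: allow SSH first, then Prometheus, and only then switch the firewall on.
open_prometheus_firewall() {
  sudo ufw allow "${SSH_PORT}/tcp" || return 1
  sudo ufw allow 9090/tcp || return 1
  sudo ufw enable
}
```

The SSH rule is added before ufw is enabled on purpose, so the firewall never comes up without a way back in.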

Congratulations! You now have a Prometheus server running. We will come back to this for additional configuration. For now, enjoy this moment.

The next step is to fire up a Grafana instance which will enable you to visualize the data within Prometheus from your computer and, more importantly, your smartphone! Contrary to popular belief, validator operators do have lives. Well, at least now you have a chance to have a life. Just make sure that you don’t obsess over your validator's performance from your phone all of the time. Talk to other people sometimes; it can be interesting.

We recommend that you launch a dedicated server for Grafana. Ultimately Grafana will expose a webserver to the public internet, and you may not want to commingle it with your Prometheus server. If you can, we recommend that you set up your Prometheus server and Grafana server on the same intranet. That way you can connect them privately without any need to use the public internet.

Anyways, once your server is ready for Grafana, go ahead and install fail2ban. Again, this server is open to the internet, so consider additional security measures like MFA and disabling root login.

Now install some dependencies for Grafana.

Now update your package repos.

Install Grafana and upgrade your packages.
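The three steps above can be sketched as one helper. This assumes you install from Grafana's own APT repository at apt.grafana.com and that gpg is available on the server. The keyring path and the function name install_grafana are our choices for illustration.

```shell
# Sketch only: add Grafana's APT repository, then install the grafana package.
install_grafana() {
  # Dependencies for adding a third-party APT repository.
  sudo apt-get install -y apt-transport-https software-properties-common wget || return 1
  # Store Grafana's signing key where the signed-by option below can find it.
  sudo mkdir -p /etc/apt/keyrings || return 1
  wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor |
    sudo tee /etc/apt/keyrings/grafana.gpg >/dev/null || return 1
  echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" |
    sudo tee /etc/apt/sources.list.d/grafana.list >/dev/null || return 1
  # Refresh the package index so APT sees the new repository.
  sudo apt-get update || return 1
  sudo apt-get install -y grafana || return 1
  sudo apt-get upgrade -y
}
```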

Now go ahead and run Grafana as a service. This is much easier than Prometheus:
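A sketch, assuming Grafana was installed from the APT package, which registers a systemd service called grafana-server. The helper name start_grafana_service is hypothetical.

```shell
# Sketch only: enable Grafana at boot, start it, and show its status.
start_grafana_service() {
  sudo systemctl daemon-reload || return 1
  sudo systemctl enable grafana-server || return 1
  sudo systemctl start grafana-server || return 1
  sudo systemctl status grafana-server
}
```

There is no unit file to write this time because the package ships one, which is why this is easier than the Prometheus setup.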

A successful install will report the service as active (running). Press CTRL+C to exit the systemctl status screen.

Update your firewall. This assumes you are using port 22 for SSH. If you are not using port 22, substitute your SSH port in the firewall rules or you will lock yourself out of your server.
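A sketch for the Grafana server, again assuming ufw. Port 3000 is Grafana's default web port. SSH_PORT and the helper name open_grafana_firewall are ours.

```shell
# Change SSH_PORT if your SSH daemon does not listen on 22.
SSH_PORT="${SSH_PORT:-22}"

# Sketch only: allow SSH and the Grafana web UI, then switch the firewall on.
open_grafana_firewall() {
  sudo ufw allow "${SSH_PORT}/tcp" || return 1
  sudo ufw allow 3000/tcp || return 1
  sudo ufw enable
}
```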

Now open a web browser and navigate to http://your.grafana.ip.address:3000 and you should see the Grafana logo start to bounce as the page loads. The default username is admin and the default password is admin. If you don't see the login screen, then your firewall is probably not open on port 3000 or you made a mistake somewhere in the previous steps.

Once you log in, click on the little avatar icon on the bottom left of the screen and then change your password. Please do this. Please.

After you change your password, open a new browser window and navigate here to download a most excellent Grafana dashboard from Rhino Stake. You will want to download the JSON file. Leave a review while you are at it and say thank you!

Go back to the Grafana page and then click on Configuration and then Data Sources.

Now click on Add data source then select the Prometheus data source.

Now enter the URL of your Prometheus server, including port 9090. If you decided to run Prometheus and Grafana on the same server, that’s fine. Just remember that we told you not to. Go ahead with http://localhost:9090

If you went the path of having two servers, then enter the Prometheus IP address. For example http://100.200.300.400:9090

Scroll to the bottom and click the Save & test button.

If you entered the correct IP and your Prometheus firewall is open on port 9090 then you will see a connection success indicator.

Ok, that was fun. Now click on Dashboards and then Manage.

Now click on Import and then Upload JSON File.

Upload the JSON file that you downloaded earlier, select the Prometheus data source that you just set up, and then click on the Import button.

Congratulations! Now you have a pretty dashboard with absolutely no data!

We have one last bit of configuration to do before this starts to populate with data.

Jump back into your Prometheus server and edit the prometheus.yml file. Sorry if you are a vim lover, but we use nano here at Artifact. 😜

Add a scrape job for your validator node under scrape_configs.

The IP address should be for the validator node that you are scraping data from.

Once you have pasted your job in correctly, press CTRL+X, then the Y key, then the ENTER key.

Restart the Prometheus service and it will start scraping data.
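A sketch of the restart with a quick sanity check, assuming Prometheus listens on its default port 9090 on the same machine and that curl is installed. The helper name restart_prometheus and the five-second pause are our choices.

```shell
# Sketch only: restart the service, then ask Prometheus whether it is healthy.
restart_prometheus() {
  sudo systemctl restart prometheus || return 1
  # Give the server a moment to come up before probing it.
  sleep 5
  # /-/healthy is Prometheus's built-in health endpoint; -f makes curl fail on HTTP errors.
  curl -fsS http://localhost:9090/-/healthy
}
```

If the health check fails, sudo journalctl -u prometheus usually shows why, most often a typo in prometheus.yml.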

Log back in to your Aptos node and install the Prometheus Node Exporter package.
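A sketch, assuming you install Node Exporter from Ubuntu's own package repository, where the package is called prometheus-node-exporter. The helper name install_node_exporter is hypothetical.

```shell
# Sketch only: install Node Exporter from the Ubuntu repositories and start it.
install_node_exporter() {
  sudo apt-get update || return 1
  sudo apt-get install -y prometheus-node-exporter || return 1
  # The package registers a systemd service of the same name; it listens on port 9100 by default.
  sudo systemctl enable --now prometheus-node-exporter
}
```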

Now, poke a hole in your node’s firewall so that your Prometheus server can scrape the ports.
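A sketch of a narrow hole rather than a wide one, assuming ufw on the node and the same two ports as before: 9100 for Node Exporter and 9101 as the assumed Aptos metrics port. The helper name allow_prometheus_scrape is hypothetical.

```shell
# Sketch only: let a single Prometheus server reach the metrics ports.
# Usage: allow_prometheus_scrape <prometheus-ip>
allow_prometheus_scrape() {
  prometheus_ip="$1"
  if [ -z "$prometheus_ip" ]; then
    echo "usage: allow_prometheus_scrape <prometheus-ip>" >&2
    return 1
  fi
  for port in 9100 9101; do
    # Only traffic from the Prometheus server is allowed through on each port.
    sudo ufw allow from "$prometheus_ip" to any port "$port" proto tcp || return 1
  done
}
```

Restricting the rule to your Prometheus server's address keeps the metrics endpoints off the public internet, which matters on a validator.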

Go back to your Grafana dashboard and the data will begin to populate within a few minutes.

Congratulations! You now have real-time monitoring on your node!


Artifact Staking is a cutting edge, forward leaning blockchain infrastructure provider.
