As a bridge between developers and operations, it is very easy to end up as a support hotline for your colleagues, either because you created the infrastructure or because you know how to find your way around a complex network. This is an important skill today, when systems have a very high number of nodes from day one. But it is not something I want to do.

Ten years ago, small companies used to have a few servers in a basement and that was it. At my current company, in less than two years we have moved across three cloud providers and a few regions. We have had VPC peering in place since day one, and so on. The scenario is very different.

It is ok for developers who want to code to feel intimidated by all of that complexity, or to not care about it. Furthermore, it is a skill that needs to be picked up and selected according to your goals. I worked with friends who spent their lives learning about complex databases and data structures. They do not want to mess around with various cloud providers and their consoles. They certainly have the skills to pick it up, but it is probably not the best use of their time and, even more, they just don’t care, as I don’t care about NextJS (sorry). The same goes for people like me who spent many years trying to figure out how to properly run any kind of codebase.

The best we can do is to build tooling and figure out how to make developers independent. I worked for an early stage startup where the infrastructure repo was not even visible: the team managing infrastructure locked all the developers out. Did it work? Probably only because we didn’t have much going on in the SaaS area of the product. That was a bit too much.

In another company our engineers relied a lot on profiling data. Locally they used profiling to figure out what was going on. Back then I contributed to a tool called profefe, which is not well maintained anymore but is still out there, to do continuous profiling. Now there are better alternatives and SaaS products for this purpose, but back then they were not available.

I am not speaking about dashboards, logs and so on here. I think we can take those for granted at this point, but logs can be unfamiliar to people who don’t watch them for an hour every day. Google Cloud with Stackdriver has an easy bookmarking system that can be used to save useful queries. A solution like that is simple but effective. You can do a lot more: for example, you can integrate logs split by user into an administrative panel if you have one.
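To make the "logs split by user" idea a bit more concrete, here is a minimal sketch in Python. It assumes your application writes one JSON object per log line and that each entry carries a `user_id` field. Both are assumptions for illustration, and the function names are hypothetical, not part of any specific logging product.

```python
import json
from collections import defaultdict


def group_logs_by_user(lines, user_field="user_id"):
    """Group JSON log lines by the value of `user_field`.

    Assumption: one JSON object per line. Lines that are not JSON objects,
    or that have no user field, are collected under the key None.
    """
    grouped = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            entry = None
        if not isinstance(entry, dict):
            # Keep unparseable lines instead of silently dropping them.
            grouped[None].append({"message": line})
            continue
        grouped[entry.get(user_field)].append(entry)
    return dict(grouped)


def recent_for_user(grouped, user, limit=20):
    """Return the last `limit` entries for one user, oldest first."""
    if limit <= 0:
        return []
    return grouped.get(user, [])[-limit:]
```

An admin panel view for a single user could then call something like `recent_for_user(group_logs_by_user(open("app.log")), "alice")`, where `app.log` is a placeholder path. In a real setup you would more likely query your log backend with a filter on the user field, but the shape of the result shown to the developer is the same.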

Do you have any fun stories about utilities and troubleshooting techniques you had to make available in production?