Available protocols are better than your own protocols until they are not

Published August 22, 2024

Many software-as-a-service products that deal with taxes or paychecks have their own way to send and receive files. Usually you upload them via some sort of form, and then you download your paychecks from some web page.

Everything is cool and sweet when you don’t have many files to download. Today I was downloading two years of pay slips, because I hadn’t backed them up to the folder where I have always kept them, and the process was not that nice. I had to remember which ones I had already downloaded from the list and move between years, and the generated file names were not that helpful.

I am sure they got to this point through a pile of poor decisions, but why not expose such information as a remote file system? I mean an FTP server, Samba, whatever I can hook up to my file manager. With today’s technology it is not that complicated. I am for sure not a typical user, and I am not saying that it should be the only option, but I am sure international hiring platforms like remote.com and deel.com are full of customers who would enjoy a solution like that.
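To give an idea of how little a customer would need if pay slips were exposed over a plain file protocol, here is a minimal sketch using Python’s standard `ftplib`. The function name `sync_missing` and the flat directory layout are assumptions made for illustration, not something any of these services offer today.

```python
from pathlib import Path


def sync_missing(ftp, local_dir):
    """Download every remote file that is not already in local_dir.

    `ftp` is an `ftplib.FTP` connection (or anything exposing the same
    `nlst` and `retrbinary` methods), already logged in and positioned
    in the remote directory. Returns the names that were downloaded.
    """
    local_dir = Path(local_dir)
    local_dir.mkdir(parents=True, exist_ok=True)
    downloaded = []
    for name in sorted(ftp.nlst()):
        # Assumption: the remote directory is flat, so keep only the file name.
        target = local_dir / Path(name).name
        if target.exists():
            continue  # already backed up, nothing to remember by hand
        with open(target, "wb") as fh:
            ftp.retrbinary(f"RETR {name}", fh.write)
        downloaded.append(name)
    return downloaded
```

With a real server this would be called on a connection opened with `ftplib.FTP` (or `ftplib.FTP_TLS`, which is probably what you would want for payroll data) after `login` and `cwd`. Running it again downloads only what is new.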

Recently I had to develop a streaming pipeline to feed an AI system. Instead of building yet another WebSocket endpoint or something on top of gRPC/HTTP2, I adopted Apache Arrow Flight because it gave me the framework I needed: a way to list the available streams, a way to delete them and apply actions, a way to put and get data, an input and output format, a strong set of middlewares and limits to safeguard the underlying system, and I could keep going. In the end I just had to tell the consumers that the streaming pipeline was Flight compatible. Some wrote their own client because they didn’t like the Python one, others pulled in the Rust one and started consuming.

I could come up with examples that are not related to Apache Arrow, but a few months ago we had to figure out a quick and dirty way to expose small datasets, and since the system already used Apache Arrow I just had to hook in DataFusion. Now we have a lot more than a way to download a small dataset: we have an entire SQL ecosystem built on top. Obviously it was easy because we are talking about a few GBs of strongly partitioned data, but DataFusion provides the hooks we need to deal with a lot more data if needed.

How many times would an email be a lot more effective for data collection than any form? Hook an email address to each user or entity and build around that, almost like GitHub does when it comes to interacting with issues or PRs. Yep, it is limited, but I use it a lot more than their UI.
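As a rough sketch of what “an email address per entity” can look like on the receiving side, the snippet below parses an inbound message with Python’s standard `email` package and works out which entity it belongs to. The `reply+<entity_id>@...` address scheme and the `route` function are assumptions for illustration only.

```python
from email import message_from_string, policy
from email.utils import parseaddr


def route(raw_message):
    """Return (entity_id, text) for a raw RFC 5322 message.

    Assumes a hypothetical address scheme: reply+<entity_id>@<domain>.
    """
    msg = message_from_string(raw_message, policy=policy.default)
    _, address = parseaddr(str(msg.get("To", "")))
    local_part = address.split("@", 1)[0]
    prefix, sep, entity_id = local_part.partition("+")
    if prefix != "reply" or not sep or not entity_id:
        raise ValueError(f"not an entity address: {address!r}")
    # Prefer the plain-text part; fall back to an empty body if there is none.
    body = msg.get_body(preferencelist=("plain",))
    text = body.get_content() if body is not None else ""
    return entity_id, text.strip()
```

Everything else (threading, attachments, authentication of the sender) is left out on purpose. The point is that the client side is any mail program the user already knows.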

I really like to start from well-known protocols, because that way I don’t need to spend time teaching consumers how to interact with the system. Sometimes there is nothing available, or the system grows so much that such protocols crack, but very often those long-standing and well-tested solutions help you notice when you are going off track and building something too complex or not well scoped.
