FIFTY SHADES OF CLOUD COMPUTING.
What they don't tell you when you build services.
In this article, I present my observations about some “common practices” that cloud computing platforms sponsor, back and keep “à la mode”, and about how harmful those practices can turn out to be for the platforms' clients when it comes to scalability.
Clouds are not evil by design; we are responsible for the spider webs we weave and then, out of ignorance, fall into.
Software as a Service (SaaS) is the holy grail of the I.T. industry, for everyone from start-ups to full-scale corporations. Its numerous advantages over traditional software delivery methods transformed enterprises, created new ones and made cloud computing the dominant market of the decade.
Although many cloud management solutions exist, they are increasingly hidden behind smoke and mirrors. Each cloud platform tends to build a “lingo” and a “workflow” of its own, giving different hyped names [1], and sometimes weird procedural flows, to the same or very similar functionality.
So, to use a cloud platform, the client either “learns” its flows and jargon or hires an “expert” in that cloud.
This is an intentional trap for platform clients: it turns the choice of a cloud platform from a moderate service investment into a “marriage”.
Companies that can invest in devops personnel dodge this through internal training or by hiring more experts. A newer trend is for bigger companies to go “multi-cloud” for more freedom and cost effectiveness [2, 3].
SaaS startups, on the other hand, generally do not have that luxury. Because their resources are limited, every single choice along the way can turn into a “startup killer”.
How does this happen? Let’s have a look.
Assume that, at the end of a cloud project:
The application is production-ready, not a minimum viable product (MVP).
The code has no internal critical points of failure.
The application handles service replication properly.
Even then, of all the bad things that can happen, the worst crisis comes at the “beautiful moment” when the SaaS application gains popularity. It was a rat race, I know, but the product is finally on the cloud and popular, and then, bam!
The bliss of 1,000 hits per second can suddenly become the nightmare of 100,000+ hits per second, and the only way to absorb such a client surge is service replication. The cloud platform, if configured that way, kindly replicates the SaaS services under heavy load and, of course, bills you for it (emphasized very, very intentionally).
Then comes the point where you pay the cloud platform too much because of all those replicas, nearly wiping out the profit, and possibly the job too.
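To see why the bill grows so fast, here is a back-of-the-envelope sketch (in Python, for brevity). It assumes, hypothetically, that one replica sustains a fixed, measured number of hits per second at the utilization you are willing to run it at, and that replicas are billed at a flat hourly price; real platforms price and autoscale in their own ways, and the figures below are made up.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8760 / 12)


def replicas_needed(hits_per_sec: int, capacity_per_replica: int) -> int:
    """Smallest number of replicas whose combined capacity covers the load.

    capacity_per_replica is an assumed, measured figure: hits per second one
    replica sustains at the utilization you are willing to run it at.
    """
    # Integer ceiling division avoids floating-point rounding surprises
    return -(-hits_per_sec // capacity_per_replica)


def monthly_cost(replicas: int, price_per_replica_hour: float) -> float:
    """Flat per-hour billing, assumed for illustration only."""
    return replicas * price_per_replica_hour * HOURS_PER_MONTH


for load in (1_000, 100_000):
    n = replicas_needed(load, capacity_per_replica=400)
    print(f"{load:>7} hits/s -> {n:>3} replicas, ~${monthly_cost(n, 0.10):,.0f}/month")
```

With these invented numbers, the jump from 1,000 to 100,000 hits per second takes the service from 3 replicas to 250. The cost is linear in the replica count, and the replica count is the load divided by per-replica capacity, so halving the capacity of a replica (a slower service) doubles the bill.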
So how to avoid this?
1] A SaaS company, even a startup, must have a cloud expert.
A devops engineer who speaks the “cloudese” of the targeted cloud platform will make a difference.
Watching a simple but beastly requirement in your operation turn into a carousel of cloud manuals, Google searches, Stack Overflow pages, Kubernetes tutorials, disappointment, coffee, more coffee and, one month later, a resolution nobody really understands is not productive. I mean it: it is not productive. Trust me on this; I have no conflict of interest, as I am definitely not a cloud devops engineer.
That expert will fulfill the same requirement in anywhere from fifteen minutes to one day. Love them. Really.
2] The four horsemen of the application should concur:
Engineering Manager,
Project Manager,
Senior developer and
Cloud expert.
On tight budgets, mainly at startups, some poor soul may carry more than one of the roles above. That is possible, though being a demigod demands, and drains, the power of a demigod.
Before a single line of code is written, the company must share the vision of the SaaS product with these people.
It is critical.
What the product will do, the expected number of clients, and what is doable and what is impossible should be clearly stated.
For the purposes of this article, this meeting will be followed by smaller meetings that shape the SaaS application’s cloud architecture.
These four people should decide and agree on the cloud architecture before proceeding. The information flow among them should be fully transparent and flawless.
The choices made at this stage will determine the fate of the application and the success of the company. For instance, if the maximum expected number of clients per second is high and critical, it affects the choice of programming language, and therefore recruitment, the code, the infrastructure interfaces, and so on.
3] Serverless is not so…
As of this writing (2021), serverless is not a mature technology. Funnily enough, the name is a misnomer: a server is used at every invocation. The platform boots the service for incoming requests, then kills it after all the requests are handled. Each boot consumes time, so serverless is not for fast, latency-critical things.
It is considered cost effective for rarely used functions coded as microservices. So going serverless for some functionality is fine, but the choice must be made wisely.
“Zero boot time” is and will be a white lie.
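A toy model makes the trade-off visible. The sketch below (Python, with invented numbers) assumes a simplified, hypothetical keep-alive policy: the platform keeps one instance warm for a fixed time after the last request arrives, and any later request pays the boot first. Real platforms use their own, often undocumented, keep-alive and scaling policies.

```python
def request_latencies(arrivals_ms, keep_alive_ms, cold_start_ms, warm_ms):
    """Latency of each request to one serverless function instance.

    Hypothetical model: the instance stays warm for keep_alive_ms after the
    last request arrived; a request arriving later pays the cold start (boot)
    before being handled.
    """
    latencies = []
    last_arrival = None
    for t in sorted(arrivals_ms):
        is_cold = last_arrival is None or t - last_arrival > keep_alive_ms
        latencies.append(warm_ms + (cold_start_ms if is_cold else 0))
        last_arrival = t
    return latencies


# A burst of frequent calls, then one call after a long idle period
print(request_latencies([0, 100, 200, 10_000], keep_alive_ms=1_000,
                        cold_start_ms=500, warm_ms=10))
# -> [510, 10, 10, 510]
```

A rarely called function pays the boot on nearly every call. That is the case where serverless is cost effective, provided the caller can tolerate the wait; it is also why it is a poor fit for fast, latency-critical paths.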
If, in the future, it advances to:
a] predicting the need for a downstream serverless request and booting the function preemptively by monitoring upper-level (e.g. edge) requests, or
b] pulling a cached memory image of a previously booted service from storage into a server's RAM and jumping straight into its message loop,
things may change.
4] Scripting a high hit service is like engraving a tombstone.
As long as you do not OWN your cloud:
Never write a high-hit service as a script just to avoid development costs.
(The rest of this entry assumes that the service code written in the compiled language is sound. Extreme sloppiness in code is another kind of, erm … success?)
I admit there are niche cases where a script can be more effective in terms of runtime and cost, like a short, even one-line, Python service that delegates the heavy work to a compiled C or C++ library. Apart from these, only low-hit services, such as administrative ones, are candidates for scripting languages.
Internally, after booting, the script source is compiled into pseudo-code (p-code), and any external libraries it needs are loaded and bound. These steps happen once and are not the main culprit (except on serverless systems, which repeat them at every service boot).
The problem is the speed of the runtime engine (RE) that interprets the p-code. An RE cannot be faster than the machine it runs on.
For each p-code instruction, an RE typically fetches the instruction, gathers the data involved and processes it by calling or jumping into the required handler code, which costs many, many machine instructions.
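To make the fetch, collect and process cycle concrete, here is a deliberately tiny stack-machine interpreter in Python. The instruction set is made up, and real engines such as CPython or V8 use far more optimized dispatch, so treat this only as an illustration: an arithmetic step that native code performs with roughly one machine instruction costs here a fetch, a tuple unpack, a chain of string comparisons and two list pops, each of which is itself many machine instructions.

```python
def run(program):
    """Interpret a list of (opcode, operand) pairs on a tiny stack machine.

    Returns the final stack and the number of instructions dispatched.
    """
    stack = []
    pc = 0          # program counter
    dispatched = 0
    while pc < len(program):
        opcode, operand = program[pc]   # 1. fetch the p-code instruction
        pc += 1
        dispatched += 1
        # 2. decode: find the handler for this opcode (a comparison chain here)
        if opcode == "PUSH":
            stack.append(operand)
        elif opcode in ("ADD", "MUL"):
            # 3. collect the data involved from the stack
            right = stack.pop()
            left = stack.pop()
            # 4. process it; natively this would be a single add or mul instruction
            stack.append(left + right if opcode == "ADD" else left * right)
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    return stack, dispatched


# (2 + 3) * 4 compiled to p-code for this toy machine
print(run([("PUSH", 2), ("PUSH", 3), ("ADD", None), ("PUSH", 4), ("MUL", None)]))
# -> ([20], 5)
```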
A CPU, by contrast, can be considered a machine code interpreter on steroids, built into electronic hardware and highly optimized with internal mechanisms like instruction pipelining and branch prediction.
Even though the script is compiled to p-code, it will still be slower than machine code, because machine code benefits directly from the CPU's runtime optimizations and carries less overhead.
When hit rates climb, scripted services cannot handle requests as fast as compiled ones, which causes more replication and higher service costs.
This applies to the whole backend, not only to the edge services.
Using scripting languages, or even no-code tools, for prototyping is in fact a good idea. But at the end of the sprint, or whatever is whipping the team toward fast development, stop there: take the prototype and convert it into a compiled language.
I see many companies, including big corporations, choosing scripting languages like JavaScript (Node.js), Python and PHP for heavy-duty, high-hit services.
My bet is:
They will choke: slow response times, more replication, more cost. It is not green either; it consumes too much power.
They will rewrite their high-hit service codebases in a compiled language.
If… they find the opportunity. Cloud costs weighed against refactoring costs, plus the fame-and-shame cycle of slow services, will force them to. I am already observing this trend in some companies.
5] Take advantage of the cloud.
Oh! Did I mention that SaaS services, replicated or not, should be coded with subtle but critical differences? They should take advantage of the cloud platform, or bring their own internal mechanisms, for message, event and job queuing, storage, and RAM caching when required.
As I said before, clouds are not evil by design, and they offer many services that support a SaaS application’s needs. Instead of rewriting services the cloud already provides, developers should write layered interfaces over them. The layering is a safeguard that lets the code adapt quickly if a cloud platform upgrade or migration becomes inevitable.
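A minimal sketch of such a layer, in Python, assuming the application defines its own queue interface (the names MessageQueue, InvoiceService and the "invoices" topic are hypothetical). Business code depends only on the interface; an in-memory implementation serves local runs and tests, and a cloud-backed adapter wrapping the provider's SDK would be one more subclass, the only class that changes on a migration.

```python
from abc import ABC, abstractmethod
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class MessageQueue(ABC):
    """The application's own queue interface; services depend only on this."""

    @abstractmethod
    def publish(self, topic: str, body: bytes) -> None: ...

    @abstractmethod
    def receive(self, topic: str) -> Optional[bytes]: ...


class InMemoryQueue(MessageQueue):
    """Local/test implementation. A cloud adapter would be another subclass."""

    def __init__(self) -> None:
        self._topics: Dict[str, Deque[bytes]] = defaultdict(deque)

    def publish(self, topic: str, body: bytes) -> None:
        self._topics[topic].append(body)

    def receive(self, topic: str) -> Optional[bytes]:
        queue = self._topics[topic]
        return queue.popleft() if queue else None


class InvoiceService:
    """Business code: knows nothing about which cloud sits underneath."""

    def __init__(self, queue: MessageQueue) -> None:
        self._queue = queue

    def request_invoice(self, order_id: str) -> None:
        self._queue.publish("invoices", order_id.encode("utf-8"))


queue = InMemoryQueue()
InvoiceService(queue).request_invoice("order-42")
print(queue.receive("invoices"))
```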
6] Architecture? What do you mean?
MVC, for instance, is not exactly an “architecture”; it is a framework, or sometimes a library, providing a deterministic infrastructure.
Treating it as an architecture will fail in many ways. Sticking with this example, using MVC as infrastructure and building a solution-specific architecture on top of it is a far better choice.
For a SaaS application, an architecture that fits the solution’s requirements, tests included, should be devised. As one example component of that architecture, all backend services must share a common communication standard; otherwise translations are required, consuming critical resources like time and CPU power and easily causing mayhem in the code.
While we are talking about standards:
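A common communication standard can be as simple as one envelope that every service sends and accepts. The Python sketch below assumes JSON as the wire format; the field names (message_type, payload, version, correlation_id) are hypothetical choices, not a standard.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class Envelope:
    """One shared message shape for every backend service (hypothetical fields)."""

    message_type: str                  # e.g. "order.created"
    payload: Dict[str, Any]            # the service-specific body
    version: int = 1                   # lets the schema evolve without breaking readers
    correlation_id: str = field(default_factory=lambda: str(uuid.uuid4()))  # traces a request across services

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text: str) -> "Envelope":
        return cls(**json.loads(text))


sent = Envelope("order.created", {"order_id": "order-42", "quantity": 3})
received = Envelope.from_json(sent.to_json())
print(received == sent)
```

Because every service parses the same envelope, no service needs a translator for another's format; only the payload varies with message_type.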
7] Convention over configuration:
I recommend it as a lifesaving principle. Any SaaS application of reasonable size involves distributed development, and conventions accepted by all developers increase productivity.
To keep everybody on the same page:
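A small Python example of the idea: instead of listing every queue name in a configuration file, services derive names from a shared naming rule, so a publisher and a consumer that follow the rule always agree. The rule below (strip a "Handler" suffix, convert CamelCase to kebab-case) is a hypothetical convention chosen for illustration.

```python
import re


def queue_name_for(handler_cls: type) -> str:
    """Derive a queue name from a handler class name, with no config entry.

    Hypothetical convention: strip the "Handler" suffix and convert
    CamelCase to kebab-case, so SendInvoiceEmailHandler -> send-invoice-email.
    """
    name = handler_cls.__name__
    if name.endswith("Handler"):
        name = name[: -len("Handler")]
    # Insert a hyphen before every capital letter except the first, then lower-case
    return re.sub(r"(?<!^)(?=[A-Z])", "-", name).lower()


class SendInvoiceEmailHandler:
    pass


print(queue_name_for(SendInvoiceEmailHandler))
# -> send-invoice-email
```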
8] Document first. A li’l bit of bureaucracy:
Communication is essential, UML wizardry is not.
Before any service or frontend is written, four types of documentation will keep a SaaS project coherent:
a] A use case diagram, featuring our dear stick people, that captures the flow of information and what has to be done with it for customers or other systems. This helps everyone understand what they are doing. If the systems analysis is sound, it rarely changes after creation.
b] An application services schema, showing the backend services and their connections to each other and to clients. This is the navigation map for development. It often changes radically after the first or second coding iteration and then stabilizes.
c] The database schema and all other public object structures. These will see many updates.
d] The endpoints: the messages they receive and how they respond, including exception cases, must be documented carefully. This documentation mainly serves backend developers.
The edge service documents also help keep frontend developers in sync. Written in enough detail, they really boost development integrity and speed.
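One way to keep endpoint documentation (item d) from drifting away from the code is to write the contract as data that both people and the service read. The Python sketch below is hypothetical: the endpoint, fields and error codes are invented. In practice, the OpenAPI Specification is the widely used standard for this kind of endpoint documentation.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ErrorCase:
    status: int   # HTTP status code returned
    code: str     # machine-readable error code the frontend can switch on
    when: str     # the condition, in plain words


@dataclass(frozen=True)
class EndpointContract:
    method: str
    path: str
    request_fields: Dict[str, type]   # required field -> expected Python type
    success_status: int
    errors: Tuple[ErrorCase, ...]

    def check_request(self, body: dict) -> Optional[ErrorCase]:
        """Return the documented error a request body triggers, or None if valid."""
        for name, expected_type in self.request_fields.items():
            if not isinstance(body.get(name), expected_type):
                return self.error("INVALID_FIELD")
        return None

    def error(self, code: str) -> ErrorCase:
        return next(e for e in self.errors if e.code == code)


CREATE_ORDER = EndpointContract(
    method="POST",
    path="/orders",
    request_fields={"customer_id": str, "quantity": int},
    success_status=202,  # accepted: the heavy work continues asynchronously
    errors=(
        ErrorCase(400, "INVALID_FIELD", "a required field is missing or has the wrong type"),
        ErrorCase(409, "DUPLICATE_ORDER", "an identical order was already accepted"),
    ),
)

print(CREATE_ORDER.check_request({"customer_id": "c-1"}))
```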
9] Developers should be educated about the cloud:
Being aware of some basic rules of coding for the cloud will help a lot. For instance:
10] Always use async/await for I/O tasks in service code.
This does not make a single request much faster. But while the I/O is pending, the request thread is given back to the thread pool instead of sitting blocked, so each service instance can handle more requests, improving throughput.
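The effect is easy to demonstrate. The sketch below uses Python's asyncio, where a single-threaded event loop plays the role the thread pool plays in runtimes like .NET; the principle is the same: a request that is waiting on I/O does not hold up the others. The 100 ms sleep is a stand-in for a real database or network call.

```python
import asyncio
import time
from typing import List


async def handle_request(request_id: int) -> str:
    # Stand-in for a database query or HTTP call taking ~100 ms.
    # While awaiting, the event loop is free to run other requests.
    await asyncio.sleep(0.1)
    return f"response {request_id}"


async def serve(count: int) -> List[str]:
    # Start all requests concurrently and wait until every one has finished
    return list(await asyncio.gather(*(handle_request(i) for i in range(count))))


start = time.perf_counter()
responses = asyncio.run(serve(100))
elapsed = time.perf_counter() - start
print(len(responses), f"{elapsed:.2f}s")
```

The 100 requests, each waiting about 100 ms, finish in roughly 0.1 s in total rather than roughly 10 s one after another. No individual request got faster; the service simply stopped idling while it waited.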
11] Think & organize backend microservices as abstraction layers.
This is a fine guide for dividing and conquering a SaaS application.
Layered architecture also brings the advantages of IoC and the SOLID principles: high code reusability, less coding and lower cost when refactorings of any scale (comedy, drama, tragedy) hit the services.
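A minimal Python sketch of three such layers, with hypothetical names: an edge layer that only shapes responses, a domain layer that only applies business rules, and a data layer hidden behind a protocol. Each layer receives the one below it through its constructor (inversion of control), so the bottom layer can be swapped, for a test fake or a different database, without touching the layers above.

```python
from typing import Dict, Protocol


class PriceRepository(Protocol):
    """Data layer contract: where prices live is hidden behind this."""

    def base_price_cents(self, sku: str) -> int: ...


class PricingService:
    """Domain layer: business rules only, no storage or HTTP details."""

    def __init__(self, repository: PriceRepository, vat_percent: int) -> None:
        self._repository = repository  # injected, not constructed here (IoC)
        self._vat_percent = vat_percent

    def price_with_vat_cents(self, sku: str) -> int:
        base = self._repository.base_price_cents(sku)
        return base * (100 + self._vat_percent) // 100  # integer cents, rounded down


class PriceEndpoint:
    """Edge layer: shapes the response, delegates everything else downward."""

    def __init__(self, service: PricingService) -> None:
        self._service = service

    def get(self, sku: str) -> dict:
        return {"sku": sku, "price_cents": self._service.price_with_vat_cents(sku)}


class InMemoryPriceRepository:
    """A fake for tests; a database-backed class would replace it in production."""

    def __init__(self, prices: Dict[str, int]) -> None:
        self._prices = prices

    def base_price_cents(self, sku: str) -> int:
        return self._prices[sku]


endpoint = PriceEndpoint(PricingService(InMemoryPriceRepository({"A1": 1000}), vat_percent=20))
print(endpoint.get("A1"))
```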
12] Use asynchronous backend calls whenever possible.
Message queue (MQ) services like RabbitMQ are built for this. Imagine a request that only requires an “Ok I understand” response to the client but a lot of cumbersome operations in the backend, including database writes, notifications, etc. Build secondary backend service layers for such jobs and wire them to the message queue. Let’s see what happens this way:
Edge starts processing the request in a thread.
Edge publishes the request to MQ for secondary service.
MQ queues the request and responds immediately.
Edge responds “Ok I understand” to client.
Edge thread released for another request.
(So Edge can handle more clients per second.)
Meanwhile:
MQ delivers the request to the subscribed secondary service.
The secondary service performs the time-consuming processing.
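The flow above can be sketched in-process with Python's asyncio.Queue standing in for the broker. This is only an analogy: a real MQ such as RabbitMQ runs out of process, can persist messages and delivers them over the network, and its client libraries have their own APIs, which this sketch does not attempt to reproduce.

```python
import asyncio
import contextlib
from typing import List, Tuple


async def edge_handler(request: dict, mq: asyncio.Queue) -> str:
    await mq.put(request)        # publish to the MQ; returns once the message is queued
    return "Ok I understand"     # answer the client without waiting for the heavy work


async def secondary_service(mq: asyncio.Queue, done: List[int]) -> None:
    while True:
        request = await mq.get()     # the MQ delivers the next message to this subscriber
        await asyncio.sleep(0.05)    # stand-in for database writes, notifications, etc.
        done.append(request["id"])
        mq.task_done()               # acknowledge: this message is fully processed


async def main() -> Tuple[List[str], List[int]]:
    mq: asyncio.Queue = asyncio.Queue()
    done: List[int] = []
    worker = asyncio.create_task(secondary_service(mq, done))
    # The edge answers each client immediately, before any processing has finished
    replies = [await edge_handler({"id": i}, mq) for i in range(3)]
    await mq.join()                  # only for the demo: wait until the backlog is drained
    worker.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await worker
    return replies, done


print(asyncio.run(main()))
```

The edge's reply time no longer depends on how long the secondary work takes; adding more secondary workers (subscribers) is how the backlog would be drained faster.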
Happy coding.
Volkan Töre
References:
[1] https://www.cloudhealthtech.com/blog/cloud-comparison-guide-glossary-aws-azure-gcp
[2] https://en.wikipedia.org/wiki/Multicloud
[3] https://www.hashicorp.com/state-of-the-cloud