Completed guidelines with additional information

This commit is contained in:
Juan Manuel Pérez
2024-04-25 12:51:30 +02:00
parent dcfca7b568
commit 1ea1f3c90e
8 changed files with 262 additions and 14 deletions

Binary files added (not shown): five PNG images (249 KiB, 209 KiB, 17 KiB, 17 KiB and 18 KiB), including assets/sr_full_compat.png.

View File

@@ -4,16 +4,173 @@
### Event-driven architectures
An Event-Driven Architecture (EDA) uses events to trigger and communicate between services and is common in modern applications built with microservices. An event is a change in state, or an update, like adding an item to a shopping cart on an e-commerce website.
#### What is an event-driven architecture
Event-Driven Architectures (EDAs) are a paradigm that promotes the production of, consumption of, and reaction to events.
This architectural pattern can be applied in the design and implementation of applications and systems that transmit events among loosely coupled software components and services.
An event-driven system typically consists of event emitters (or agents), event consumers (or sinks), and event channels.
- Producers (or publishers) are responsible for detecting, gathering and transferring events
  - They are not aware of the consumers
  - They are not aware of how the events are consumed
- Consumers (or subscribers) react to the events as soon as they are produced
  - The reaction can be self-contained or it can be a composition of processes or components
- Event channels are conduits through which events are transmitted from emitters to consumers
**Note** The producer and consumer roles are not exclusive: the same client or application can be a producer and a consumer at the same time.
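To make the decoupling concrete, here is a minimal, broker-less sketch (plain Python, all names illustrative): the event channel forwards every published event to whoever subscribed, and the producer never knows who, if anyone, is listening.

```python
from typing import Callable, Dict, List

class EventChannel:
    """Toy in-memory event channel: producers publish, consumers subscribe."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[Dict], None]] = []

    def subscribe(self, handler: Callable[[Dict], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, event: Dict) -> None:
        # Fire and forget: the producer gets no response from the consumers.
        for handler in self._subscribers:
            handler(event)

channel = EventChannel()

# Two independent consumers react to the same event.
channel.subscribe(lambda event: print("billing saw", event))
channel.subscribe(lambda event: print("analytics saw", event))

# Producer: emits a fact about something that already happened.
channel.publish({"type": "ItemAddedToCart", "item_id": "sku-123", "quantity": 1})
```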
In most cases, EDAs are broker-centric, as seen in the diagram below.
![EDA overview](../../assets/eda_overview.png)
There are some new concepts in that diagram, so let's go through them now.
#### Problem statement
In Event-Driven Architecture (EDA), an application must be a producer, a consumer, or both. Applications must also use the protocols the server supports if they wish to connect and exchange messages.
Typically, the architectural landscape of a big company grows in complexity, and as a result it is easy to end up with a tangle of direct connections between a myriad of different components or modules.
![EDA overview](../../assets/eda_problem_statement_1.png)
By using streaming patterns, it is possible to get a much cleaner architecture:
![EDA overview](../../assets/eda_problem_statement_2.png)
It is important to take into account that EDAs are not a silver bullet, and there are situations in which this kind of architecture might not fit very well.
One example is systems that heavily rely on transactional operations... of course it might be possible to use an EDA, but most probably the complexity of the resulting architecture would be too high.
Also, it is important to note that it is possible to mix request-driven and event-driven protocols in the same system. For example,
- Online services that interact directly with a user fit better with synchronous communication, but they can also generate events into Kafka.
- On the other hand, offline services (billing, fulfillment, etc.) are typically built purely with events.
#### Kafka as the heart of EDAs
There are several technologies to implement event-driven architectures, but this section is going to focus on the predominant technology on this subject: Apache Kafka.
**Apache Kafka** can be considered a Streaming Platform which relies on several concepts:
- Super high-performance, scalable, highly-available cluster of brokers
  - Availability
    - Replication of partitions across different brokers
  - Scalability
    - Partitions
    - Ability to rebalance partitions across consumers automatically when adding/removing them
  - Performance
    - Partitioned, replayable log (collection of messages appended sequentially to a file)
    - Data copied directly from disk buffer to network buffer (zero copy) without even being imported to the JVM
    - Extreme throughput by using the concept of consumer groups
  - Security
    - Secure encrypted connections using TLS client certificates
    - Multi-tenant management through quotas/ACLs
- Client APIs in different programming languages: Go, Scala, Python, REST, Java, ... (see the minimal producer sketch after this list)
- Stream processing APIs (currently Kafka Streams and ksqlDB)
- Ecosystem of connectors to pull/push data from/to Kafka
- Clean-up processes for storage optimization
  - Retention periods
  - Compacted topics
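As a minimal illustration of the client APIs listed above, the sketch below produces a single event with the confluent-kafka Python client; the broker address, topic name and payload are placeholders rather than FDP-specific values.

```python
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})  # assumed local broker

def on_delivery(err, msg):
    # Invoked asynchronously once the broker acknowledges (or rejects) the message.
    if err is not None:
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to {msg.topic()} [partition {msg.partition()}]")

producer.produce(
    "shop.cart.events",  # illustrative topic name
    key="cart-42",
    value=b'{"type": "ItemAddedToCart", "item_id": "sku-123"}',
    on_delivery=on_delivery,
)
producer.flush()  # wait for outstanding deliveries before exiting
```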
### Basic terminology
#### Events
An event is both a fact and a notification, something that already happened in the real world.
- No expectation of any future action
- Includes information about a status change that just happened
- Travels in one direction and never expects a response (fire and forget)
- Very useful when...
  - Loose coupling is important
  - The same piece of information is used by several services
  - Data needs to be replicated across applications
A message in general is any interaction between an emitter and a receiver to exchange information. This implies that any event can be considered a message, but not the other way around.
#### Commands
A command is a special type of message which represents just an action, something that will change the state of a given system.
- Typically synchronous
- There is a clear expectation about a state change that needs to take place in the future
- When a response is returned, it indicates completion
- Optionally, a result can be included in the response
- Very common to see them in orchestration components
#### Query
It is a special type of message which represents a request to look something up.
- They are always free of side effects (they leave the system unchanged)
- They always require a response (with the requested data)
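As a hedged sketch, the three kinds of messages could be modeled as payloads like the ones below; the type and field names are illustrative, not a prescribed schema.

```python
from dataclasses import dataclass

@dataclass
class OrderShippedEvent:
    """Event: a fact that already happened; no response is expected."""
    order_id: str
    shipped_at: str  # ISO-8601 timestamp

@dataclass
class ShipOrderCommand:
    """Command: asks the receiver to change state; completion is expected."""
    order_id: str
    carrier: str

@dataclass
class GetOrderStatusQuery:
    """Query: asks for data, has no side effects, always expects a response."""
    order_id: str
```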
#### Coupling
The term coupling can be understood as the impact that a change in one component will have on other components. In the end, it is related to the amount of things that a given component shares with others: the more that is shared, the tighter the coupling.
**Note** Tighter coupling is not necessarily a bad thing; it depends on the situation. It will be necessary to assess the tradeoff between providing as much information as possible and avoiding having to change several components as a result of something changing in another component.
The coupling of a single component is actually a function of these factors:
- Information exposed (Interface surface area)
- Number of users
- Operational stability and performance
- Frequency of change
Messaging helps build loosely coupled services because it moves pure data from a highly coupled location (the source) and puts it into a loosely coupled location (the subscriber).
Any operations that need to be performed on the data are done in each subscriber and never at the source. This way, messaging technologies (like Kafka) take most of the operational issues off the table.
All business systems in larger organizations need a base level of essential data coupling. In other words, functional couplings are optional, but core data couplings are essential.
#### Bounded context
A bounded context is a small group of services that share the same domain model, are usually deployed together and collaborate closely.
It is possible to draw an analogy here with a hierarchical organization inside a company:
- Different departments are loosely coupled
- Inside departments there will be a lot more interactions across services and the coupling will be tighter
One of the big ideas of Domain-Driven Design (DDD) was to create boundaries around areas of a business domain and model them separately. So within the same bounded context the domain model is shared and everything is available for everyone there.
However, different bounded contexts don't share the same model and if they need to interact they will do it through more restricted interfaces.
#### Stream processing
It can be understood as the capability of processing data directly as it is produced or received (hence, in real time or near real time).
A message carries information from one application to another, while an event is a message that provides details of something that has already occurred. One important aspect to note is that depending on the type of information a message contains, it can fall under an event, query, or command.
Overall, events are messages but not all messages are events.
### Using events in an EDA
There are several ways to use events in an EDA:
- Events as notifications
- Events to replicate data
#### Events as notifications
When a system uses events as notifications, it becomes a pluggable system. The producers have no knowledge about the consumers and do not really care about them; instead, every consumer can decide whether it is interested in the information included in the event.
This way, the number of consumers can be increased (or reduced) without changing anything on the producer side.
This pluggability becomes increasingly important as systems get more complex.
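A sketch of this pluggability with the confluent-kafka Python client (broker address, group ids and topic name are illustrative): every consumer group receives its own copy of each event, so a new consumer can be plugged in without touching the producer.

```python
from confluent_kafka import Consumer

def make_consumer(group_id: str) -> Consumer:
    # Each group id gets the full stream, independently of the other groups.
    return Consumer({
        "bootstrap.servers": "localhost:9092",  # assumed local broker
        "group.id": group_id,
        "auto.offset.reset": "earliest",
    })

billing = make_consumer("billing-service")
analytics = make_consumer("analytics-service")  # added later, producer unchanged

for consumer in (billing, analytics):
    consumer.subscribe(["shop.cart.events"])    # illustrative topic name

msg = billing.poll(timeout=5.0)
if msg is not None and msg.error() is None:
    print("billing consumed:", msg.value())
```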
#### Events to replicate data
When events are used to replicate data across services, they include all the necessary information for the target system to keep it locally, so that it can be queried with no external interactions.
This is usually called event-carried state transfer, which in the end is a form of data integration.
The benefits are similar to the ones implied by the usage of a cache system (see the sketch after this list):
- Better isolation and autonomy, as the data stays under the service's control
- Faster data access, as the data is local (particularly important when combining data from different services in different geographies)
- Offline data availability
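A minimal sketch of event-carried state transfer, assuming a local broker, an illustrative topic and JSON-encoded events: the consumer replays the stream and keeps a local copy of the data, so later lookups need no remote call.

```python
import json
from confluent_kafka import Consumer

local_customers = {}  # local replica of customer data, keyed by customer id

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",  # assumed local broker
    "group.id": "shipping-service",         # illustrative consumer group
    "auto.offset.reset": "earliest",        # replay the full history to rebuild state
})
consumer.subscribe(["customers.state"])     # illustrative topic name

while True:
    msg = consumer.poll(timeout=1.0)
    if msg is None or msg.error() is not None:
        continue
    event = json.loads(msg.value())
    # The event carries the full state the service needs locally, so queries
    # such as "what is this customer's address?" never leave the service.
    local_customers[event["customer_id"]] = event
```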

View File

@@ -2,11 +2,13 @@
## Asynchronous API guidelines
This document is biased towards Kafka, which is the technology used in adidas for building Event Driven Architectures.
### Contract
Approved API Design, represented by its API Description or schema, **MUST** represent the contract between API stakeholders, implementers, producers and consumers.
The definition of an asynchronous API **MUST** represent a contract between API owners and the stakeholders.
That contract **MUST** contain enough information to use the API (servers, URIs, credentials, contact information, etc.) and to identify which kind of information is being exchanged there.
### API First
@@ -36,11 +38,25 @@ The API types **MUST** adhere to the formats defined below:
| Country Code | [ISO 3166-1 alpha-2](https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2) | DE <-> Germany |
| Currency | [ISO 4217](https://en.wikipedia.org/wiki/ISO_4217) | EUR <-> Euro |
### Automatic schema registration
Applications **MUST NOT** enable automatic registration of schemas, because FDP's operational model for the Schema Registry relies on GitOps (every operation is done through Git PRs + automated pipelines).
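As a hedged sketch of what this means on the client side (registry URL, schema and names are placeholders), the Avro serializer of the confluent-kafka Python client can be configured so it only looks up schemas that already exist and never creates new versions:

```python
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer

schema_registry = SchemaRegistryClient({"url": "http://schema-registry:8081"})  # placeholder URL

value_schema = """
{
  "type": "record",
  "name": "CartEvent",
  "fields": [{"name": "item_id", "type": "string"}]
}
"""

# auto.register.schemas=False: the serializer only resolves the already-registered
# schema; new versions are created through the GitOps pipeline, not by the client.
avro_serializer = AvroSerializer(
    schema_registry,
    value_schema,
    conf={"auto.register.schemas": False},
)
```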
### Schemas and data evolution
All asynchronous APIs **SHOULD** leverage Schema Registry to ensure consistency across consumers/producers with regard to message structure and to ensure compatibility across different versions.
The default compatibility mode in Schema Registry is FULL_TRANSITIVE. This is the most restrictive compatibility mode, but others are also available. More on this in the subsection below.
#### Compatibility modes
Once a given schema is defined, it is unavoidable that the schema evolves over time. Every time this happens, downstream consumers need to be able to handle data with both old and new schemas seamlessly.
Each new schema version is validated according to the configuration before being created as a new version. Namely, it is checked against the configured compatibility types (see below).
**Important** The mere fact of enabling Schema Registry is not enough to ensure that there are no compatibility issues in a given integration. The right compatibility mode also needs to be selected and enforced.
As a summary, the available compatibility types are listed below:
| Mode | Description |
|------|-------------|
@@ -52,20 +68,93 @@ The default compatibility mode in Schema Registry is FULL_TRANSITIVE. This is th
|FULL_TRANSITIVE|both backward and forward compatibility with all schema versions|
|NONE|schema compatibility checks are disabled|
**Note** To help visualize these concepts, consider the flow of compatibility from the perspective of the consumer.
#### Backward compatibility
Please refer to [Kafka_Schema_Registry-Default_Requirements](https://confluence.tools.3stripes.net/display/FDP/Kafka_Schema_Registry-Default_Requirements) for more information about Schema Registry.
There are two variants here:
- BACKWARD - Consumers using a new version (X) of a schema can read data produced by the previous version (X - 1)
- BACKWARD_TRANSITIVE - Consumers using a new version (X) of a schema can read data produced by any previous version (X - 1, X - 2, ....)
The operations that preserve backward compatibility are:
- Delete fields
  - Consumers with the newer version will just ignore the non-existing fields
- Add optional fields (with default values)
  - Consumers will set the default value for the missing fields in their schema version
![sr_backward](../../assets/sr_backward_compatibility.png)
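As a hedged illustration, the sketch below asks the Schema Registry REST API whether a candidate schema (the previous record plus one optional field with a default value) is still compatible with the latest registered version; the registry URL, subject name and schema are placeholders.

```python
import json
import requests

SCHEMA_REGISTRY = "http://schema-registry:8081"  # placeholder URL
SUBJECT = "shop.cart.events-value"               # placeholder subject name

# New schema version: adds an optional field with a default value, an operation
# that preserves backward (and forward) compatibility.
candidate_schema = {
    "type": "record",
    "name": "CartEvent",
    "fields": [
        {"name": "item_id", "type": "string"},
        {"name": "quantity", "type": "int", "default": 1},
    ],
}

response = requests.post(
    f"{SCHEMA_REGISTRY}/compatibility/subjects/{SUBJECT}/versions/latest",
    headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    data=json.dumps({"schema": json.dumps(candidate_schema)}),
)
print(response.json())  # e.g. {"is_compatible": True}
```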
#### Forward compatibility
There are also two variants here:
- FORWARD - Consumers with the previous version of the schema (X - 1) can read data produced by Producers with a new schema version (X)
- FORWARD_TRANSITIVE - Consumers with any previous version of the schema (X - 1, X - 2, ...) can read data produced by Producers with a new schema version (X)
The operations that preserve forward compatibility are:
- Adding a new field
  - Consumers will ignore the fields that are not defined in their schema version
- Deleting optional fields (with default values)
  - Consumers will use the default value for the missing fields defined in their schema version
![sr_forward](../../assets/sr_forward_compat.png)
#### Full compatibility
This is a combination of both compatibility types (backward and forward). It also has 2 variants:
- FULL - Backward and forward compatible between schemas X and X - 1.
- FULL_TRANSITIVE - Backward and forward compatible between schemas X and all previous ones (X - 1, X - 2, ...)
**Important** FULL_TRANSITIVE is the default compatibility mode in FDP; it is set at cluster level and all new schemas will inherit it.
This mode is preserved only when using the following operations:
- Adding optional fields (with default values)
- Delete optional fields (with default values)
#### Upgrading process of clients based on compatibility
The process of upgrading producers/consumers will be different depending on the compatibility mode enabled.
- NONE
  - As there are no compatibility checks, no order will grant a smooth transition
  - In most cases this leads to having to create a new topic for this evolution
- BACKWARD / BACKWARD_TRANSITIVE
  - Consumers **MUST** be upgraded first, before producing new data
  - No forward compatibility, meaning that there's no guarantee that consumers with older schemas are going to be able to read data produced with a new version
- FORWARD / FORWARD_TRANSITIVE
  - Producers **MUST** be upgraded first and then, after ensuring that no older data is present, upgrade the consumers
  - No backward compatibility, meaning that there's no guarantee that consumers with newer schemas are going to be able to read data produced with an older version
- FULL / FULL_TRANSITIVE
  - No restrictions on the order, anything will work
#### How to deal with breaking changes
If for any reason you need to use a less strict compatibility mode in a topic, or you can't avoid breaking changes in a given situation, the compatibility mode **SHOULD NOT** be modified on the same topic.
Instead, a new topic **SHOULD** be used to avoid unexpected behaviors or broken integrations. This allows a smooth transitioning from clients to the definitive topic, and once all clients are migrated the original one can be decommissioned.
Alternatively, instead of modifying existing fields, it **MAY** be considered (as a suboptimal approach) to add the changes in new fields and have both coexist. Take into account that this pollutes your topic and can cause some confusion.
### Key/Value message format
Kafka messages **MAY** include a key, which needs to be properly designed to have a good balance of data across partitions.
The message key and the payload (often called value) can be serialized independently and can have different formats. For example, the value of the message can be sent in AVRO format, while the message key can be a primitive type (string).
Message keys **SHOULD** be kept as simple as possible and use a primitive type when possible.
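A short sketch of why key design matters for partition balance (broker, topic and keys are illustrative): messages with the same key are always routed to the same partition, which also preserves per-key ordering.

```python
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})  # assumed local broker

def report(err, msg):
    if err is None:
        # Messages sharing a key always land on the same partition.
        print(f"key={msg.key().decode()} -> partition {msg.partition()}")

for key in ("cart-1", "cart-2", "cart-1"):
    producer.produce("shop.cart.events", key=key, value=b"{}", on_delivery=report)

producer.flush()
```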
### Message headers
In addition to the key and value, a Kafka message **MAY** include ***headers***, which allow the information sent to be extended with metadata as needed (for example, the source of the data, routing or tracing information, or any relevant information that could be useful without having to parse the message).
Headers are just an ordered collection of key/value pairs, where the key is a String and the value is a serialized Object, the same as the message value itself.
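A hedged sketch of attaching and reading headers with the confluent-kafka Python client (broker, topic, header names and values are illustrative):

```python
from confluent_kafka import Consumer, Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})  # assumed local broker

# Headers carry metadata (source, tracing id, ...) without touching the payload.
producer.produce(
    "shop.cart.events",  # illustrative topic name
    key="cart-42",
    value=b'{"item_id": "sku-123"}',
    headers=[("source", b"web-shop"), ("trace-id", b"abc-123")],
)
producer.flush()

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "audit",                  # illustrative consumer group
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["shop.cart.events"])
msg = consumer.poll(timeout=5.0)
if msg is not None and msg.error() is None:
    print(dict(msg.headers() or []))      # headers arrive as (name, bytes) tuples
```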
### Naming conventions
As a general rule, asynchronous APIs **MUST** adhere to the following naming conventions:

View File

@@ -44,7 +44,9 @@ All AsyncAPI specs **SHOULD** include as much information as needed in order to
AsyncAPI specs **MUST** include at least one main contact under the info.contact section.
The spec only allows including one contact there, but it **MAY** also include additional contacts using extension fields. In case this is done, it **MUST** use the extension field *x-additional-responsibles*.
For example:
```yaml
...