Apache Kafka crashes, failures, partition reassignment. Transparency is a requirement for distributed systems

“…your partitions will be reassigned when your app crashes, so you won’t even be able to detect the failure later unless you carefully track it …”

This was said about the Kafka simple consumer, in this post: http://engineering.onlive.com/2013/12/12/didnt-use-kafka/
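To make "carefully track it" concrete, here is a minimal sketch of what such tracking could look like. It is not Kafka code: AssignmentTracker and its update method are hypothetical names, and the sketch assumes the application has some way to learn its current partition set (for example from a rebalance callback of whatever client library is in use) and passes that set to update each time.

```python
from dataclasses import dataclass, field
import time


@dataclass
class AssignmentTracker:
    """Hypothetical helper: records every change in the partitions a consumer owns."""
    current: frozenset = frozenset()
    history: list = field(default_factory=list)

    def update(self, assigned, now=None):
        """Compare the new partition set with the previous one and log the difference."""
        new = frozenset(assigned)
        lost = self.current - new
        gained = new - self.current
        if lost or gained:
            self.history.append({
                "at": time.time() if now is None else now,
                "lost": sorted(lost),
                "gained": sorted(gained),
            })
        self.current = new
        return lost, gained


# Usage with made-up (topic, partition) pairs.
tracker = AssignmentTracker()
tracker.update([("orders", 0), ("orders", 1)], now=1.0)
lost, gained = tracker.update([("orders", 1), ("orders", 2)], now=2.0)
```

The history list is what would have to be persisted or shipped to a monitoring system so that a reassignment caused by a crash can still be seen afterwards.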

The official documentation also says: “… For manually assigning replicas to…”.

We are looking for comments and discussion about transparency in Kafka.

They can be about the following issues or about others:

Performance transparency: Client programs should continue to perform satisfactorily while the load on the service varies within a specified range.
A major task of many middleware platforms is to make remote invocations look like local ones.
Is the programmer protected from the details of data representation and marshalling?
A common requirement when data are replicated is for replication transparency. That is, clients should not normally have to be aware that multiple physical copies of data exist. As far as clients are concerned, data are organized as individual logical objects and they identify only one item in each case when they request an operation to be performed. Furthermore, clients expect operations to return only one set of values.

Is this provided by Kafka?

Message loss and Apache Kafka

There exists only one way to avoid message loss: use the tools that banks use.
They lose money and feel the pain if a message dies, so they are serious about this. If you want Kafka with no message loss, you need to:

  • implement your own protection layer around Kafka
  • monitor everything and everywhere.
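One possible shape for such a protection layer, as a sketch only: the sender stamps every message with a per-source sequence number and the receiver checks for gaps and repeats. SequenceStamper and GapDetector are hypothetical names, nothing here calls Kafka, and the sketch assumes that messages from one source normally arrive in order (for example because they all go to the same partition).

```python
from collections import defaultdict


class SequenceStamper:
    """Sender side: attaches an increasing sequence number per source."""

    def __init__(self):
        self._next = defaultdict(int)

    def stamp(self, source, payload):
        seq = self._next[source]
        self._next[source] += 1
        return {"source": source, "seq": seq, "payload": payload}


class GapDetector:
    """Receiver side: reports missing and duplicated sequence numbers."""

    def __init__(self):
        self._expected = defaultdict(int)
        self.missing = set()
        self.duplicates = []

    def observe(self, msg):
        """Return True if the message is new, False if it is a duplicate."""
        source, seq = msg["source"], msg["seq"]
        expected = self._expected[source]
        if seq < expected:
            if (source, seq) in self.missing:
                # A message reported as missing arrived late after all.
                self.missing.discard((source, seq))
                return True
            self.duplicates.append((source, seq))
            return False
        # Everything between the expected number and this one was skipped.
        self.missing.update((source, s) for s in range(expected, seq))
        self._expected[source] = seq + 1
        return True


# Usage: message 2 is lost in transit and message 3 is delivered twice.
stamper = SequenceStamper()
sent = [stamper.stamp("a", f"event-{i}") for i in range(4)]
detector = GapDetector()
for msg in (sent[0], sent[1], sent[3], sent[3]):
    detector.observe(msg)
```

A detector like this only reveals the loss. Recovering the missing message still needs a second copy somewhere, which is why the layer has to wrap the producer as well as the consumer.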

Monitoring everything is a beast when operations have to be fast. Meeting the SLAs is then out of reach.

Kafka documentation.
We find that there is a lot of inconsistency. The same concept is presented more than once, in different ways, in different places. Some descriptions are just impossible to understand. The LinkedIn people are good at creating a huge cloud of blogs, benchmarks and assertions that no one can check. They also assert that Kafka satisfies all three properties of the CAP theorem, and they are completely confused where they explain this. Kafka tries to be many things at the same time. No university department has done any research to classify it and determine its nature, precisely because it is a bag containing many things. Instead there are plenty of performance benchmarks, run in friendly environments where all the nodes, consumers, ZooKeeper, producers and sinks stay healthy and never fail. The benchmarks are run by LinkedIn people or are based on their assertions.