In our experience messaging uses are often comparatively low-throughput, but may require low end-to-end latency and often depend on the strong durability guarantees Kafka provides.

In this domain Kafka is comparable to traditional messaging systems such as ActiveMQ or RabbitMQ.

Website Activity Tracking

The original use case for Kafka was to be able to rebuild a user activity tracking pipeline as a set of real-time publish-subscribe feeds. This means site activity (page views, searches, or other actions users may take) is published to central topics with one topic per activity type. These feeds are available for subscription for a range of use cases including real-time processing, real-time monitoring, and loading into Hadoop or offline data warehousing systems for offline processing and reporting.

Activity tracking is often very high volume as many activity messages are generated for each user page view.

Metrics

Kafka is often used for operational monitoring data. This involves aggregating statistics from distributed applications to produce centralized feeds of operational data.

Log Aggregation

Many people use Kafka as a replacement for a log aggregation solution. Log aggregation typically collects physical log files off servers and puts them in a central place (a file server or HDFS perhaps) for processing. Kafka abstracts away the details of files and gives a cleaner abstraction of log or event data as a stream of messages. This allows for lower-latency processing and easier support for multiple data sources and distributed data consumption. In comparison to log-centric systems like Scribe or Flume, Kafka offers equally good performance, stronger durability guarantees due to replication, and much lower end-to-end latency.

Stream Processing

Many users of Kafka process data in processing pipelines consisting of multiple stages, where raw input data is consumed from Kafka topics and then aggregated, enriched, or otherwise transformed into new topics for further consumption or follow-up processing. For example, a processing pipeline for recommending news articles might crawl article content from RSS feeds and publish it to an "articles" topic; further processing might normalize or deduplicate this content and publish the cleansed article content to a new topic; a final processing stage might attempt to recommend this content to users. Such processing pipelines create graphs of real-time data flows based on the individual topics. Starting in 0.10.0.0, a light-weight but powerful stream processing library called Kafka Streams is available in Apache Kafka to perform such data processing as described above. Apart from Kafka Streams, alternative open source stream processing tools include Apache Storm and Apache Samza.

Event Sourcing

Event sourcing is a style of application design where state changes are logged as a time-ordered sequence of records. Kafka's support for very large stored log data makes it an excellent backend for an application built in this style.

Commit Log

Kafka can serve as a kind of external commit-log for a distributed system. The log helps replicate data between nodes and acts as a re-syncing mechanism for failed nodes to restore their data. The log compaction feature in Kafka helps support this usage. In this usage Kafka is similar to the Apache BookKeeper project.

1.3 Quick Start

This tutorial assumes you are starting fresh and have no existing Kafka or ZooKeeper data. Since Kafka console scripts are different for Unix-based and Windows platforms, on Windows platforms use bin\windows\ instead of bin/, and change the script extension to .bat.
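
The Windows path rule above can be sketched as a small helper. This is only an illustration: the function name windows_script_path is hypothetical and not part of Kafka, and kafka-topics.sh is used merely as an example script name.

```python
def windows_script_path(unix_path: str) -> str:
    # Hypothetical helper (not shipped with Kafka): turn a Unix-style
    # script path such as bin/kafka-topics.sh into its Windows form by
    # swapping the bin/ prefix and the .sh extension.
    prefix, suffix = "bin/", ".sh"
    if not (unix_path.startswith(prefix) and unix_path.endswith(suffix)):
        raise ValueError(f"expected a path like bin/<script>.sh, got {unix_path!r}")
    script = unix_path[len(prefix):-len(suffix)]
    return "bin\\windows\\" + script + ".bat"


print(windows_script_path("bin/kafka-topics.sh"))
```
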
Note: If you are willing to accept downtime, you can simply take all the brokers down, update the code and start all of them. They will start with the new protocol by default.

Note: Bumping the protocol version and restarting can be done any time after the brokers were upgraded. It does not have to be immediately after.

Potential breaking changes in 0.9.0.0

Java 1.6 is no longer supported.
Scala 2.9 is no longer supported.
Broker IDs above 1000 are now reserved by default to automatically assigned broker IDs. If your cluster has existing broker IDs above that threshold make sure to increase the reserved.broker.max.id broker configuration property accordingly.
Configuration parameter replica.lag.max.messages was removed. Partition leaders will no longer consider the number of lagging messages when deciding which replicas are in sync.
Configuration parameter replica.lag.time.max.ms now refers not just to the time passed since last fetch request from replica, but also to time since the replica last caught up. Replicas that are still fetching messages from leaders but did not catch up to the latest messages in replica.lag.time.max.ms will be considered out of sync.
Compacted topics no longer accept messages without key and an exception is thrown by the producer if this is attempted. In 0.8.x, a message without key would cause the log compaction thread to subsequently complain and quit (and stop compacting all compacted topics).
MirrorMaker no longer supports multiple target clusters. As a result it will only accept a single --consumer.config parameter. To mirror multiple source clusters, you will need at least one MirrorMaker instance per source cluster, each with its own consumer configuration.
Tools packaged under org.apache.kafka.clients.tools.* have been moved to org.apache.kafka.tools.*. All included scripts will still function as usual, only custom code directly importing these classes will be affected.
The default Kafka JVM performance options (KAFKA_JVM_PERFORMANCE_OPTS) have been changed in kafka-run-class.sh.
The kafka-topics.sh script (kafka.admin.TopicCommand) now exits with non-zero exit code on failure.
The kafka-topics.sh script (kafka.admin.TopicCommand) will now print a warning when topic names risk metric collisions due to the use of a '.' or '_' in the topic name, and error in the case of an actual collision.
The kafka-console-producer.sh script (kafka.tools.ConsoleProducer) will use the Java producer instead of the old Scala producer by default, and users have to specify 'old-producer' to use the old producer.
By default, all command line tools will print all logging messages to stderr instead of stdout.
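
The change to replica.lag.time.max.ms described in the list above can be illustrated with a simplified model. This is a sketch under stated assumptions, not the broker's actual code: the ReplicaState class, its fields, and both function names are hypothetical, and the exact comparison the broker uses at the boundary is not specified here.

```python
from dataclasses import dataclass


@dataclass
class ReplicaState:
    # Hypothetical per-follower bookkeeping, for illustration only.
    last_fetch_ms: int      # time of the follower's most recent fetch request
    last_caught_up_ms: int  # last time the follower had read up to the leader's latest message


def in_sync_by_last_fetch(replica, now_ms, lag_time_max_ms):
    # Earlier reading of the setting: only the time since the last fetch counts.
    return now_ms - replica.last_fetch_ms <= lag_time_max_ms


def in_sync_by_last_caught_up(replica, now_ms, lag_time_max_ms):
    # 0.9.0.0 reading: the follower must also have caught up recently,
    # so a replica that keeps fetching but stays behind falls out of sync.
    return now_ms - replica.last_caught_up_ms <= lag_time_max_ms


# A follower that fetched 1 second ago but last caught up 10 seconds ago.
slow = ReplicaState(last_fetch_ms=9_000, last_caught_up_ms=0)
print(in_sync_by_last_fetch(slow, now_ms=10_000, lag_time_max_ms=5_000))
print(in_sync_by_last_caught_up(slow, now_ms=10_000, lag_time_max_ms=5_000))
```

In this model the slow follower passes the fetch-time check but fails the caught-up check, which is the case the 0.9.0.0 note calls out.
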
Notable changes in 0.9.0.1

The new broker id generation feature can be disabled by setting broker.id.generation.enable to false.
Configuration parameter log.cleaner.enable is now true by default. This means topics with a cleanup.policy=compact will now be compacted by default, and 128 MB of heap will be allocated to the cleaner process via log.cleaner.dedupe.buffer.size. You may want to review log.cleaner.dedupe.buffer.size and the other log.cleaner configuration values based on your usage of compacted topics.
The default value of configuration parameter fetch.min.bytes for the new consumer is now 1.
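
To make the effect of cleanup.policy=compact more concrete, the following is a minimal sketch of what compaction retains: the most recent record for each key. It is a toy model, not the log cleaner's implementation. The compact function and the (offset, key, value) tuple layout are assumptions made for illustration, and the key-to-offset dictionary is only loosely analogous to the map the cleaner keeps in its dedupe buffer.

```python
def compact(log):
    # `log` is a list of (offset, key, value) tuples in offset order.
    # Hypothetical model: keep only the latest record per key.
    latest_offset = {}
    for offset, key, _value in log:
        if key is None:
            # Simplification: the sketch rejects keyless records up front,
            # echoing the rule that compacted topics require a key.
            raise ValueError("compacted topics require a key")
        latest_offset[key] = offset
    # Surviving records keep their original offsets and relative order.
    return [record for record in log if latest_offset[record[1]] == record[0]]


log = [
    (0, "user-1", "a@example.com"),
    (1, "user-2", "b@example.com"),
    (2, "user-1", "c@example.com"),
]
print(compact(log))
```

Here the record at offset 0 is dropped because a newer value for the same key exists at offset 2.
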
Deprecations in 0.9.0.0

Altering topic configuration from the kafka-topics.sh script (kafka.admin.TopicCommand) has been deprecated. Going forward, please use the kafka-configs.sh script (kafka.admin.ConfigCommand) for this functionality.
The kafka-consumer-offset-checker.sh (kafka.tools.ConsumerOffsetChecker) has been deprecated. Going forward, please use kafka-consumer-groups.sh (kafka.admin.ConsumerGroupCommand) for this functionality.
The kafka.tools.ProducerPerformance class has been deprecated. Going forward, please use org.apache.kafka.tools.ProducerPerformance for this functionality (kafka-producer-perf-test.sh will also be changed to use the new class).
The producer config block.on.buffer.full has been deprecated and will be removed in a future release. Currently its default value has been changed to false. The KafkaProducer will no longer throw BufferExhaustedException but instead will use the max.block.ms value to block, after which it will throw a TimeoutException. If the block.on.buffer.full property is set to true explicitly, it will set max.block.ms to Long.MAX_VALUE and metadata.fetch.timeout.ms will not be honoured.
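
The block-then-time-out behaviour described for max.block.ms can be sketched with a bounded queue. This is a toy stand-in rather than the KafkaProducer's buffer: the BoundedBuffer class and its append method are hypothetical, and Python's built-in TimeoutError stands in for the producer's TimeoutException.

```python
import queue


class BoundedBuffer:
    # Hypothetical model of a producer-side record buffer with a blocking limit.

    def __init__(self, capacity, max_block_ms):
        self._records = queue.Queue(maxsize=capacity)
        self._max_block_s = max_block_ms / 1000.0

    def append(self, record):
        try:
            # Wait for free space for at most max_block_ms instead of failing immediately.
            self._records.put(record, block=True, timeout=self._max_block_s)
        except queue.Full:
            raise TimeoutError(
                f"buffer still full after {self._max_block_s * 1000:.0f} ms"
            ) from None


buffer = BoundedBuffer(capacity=1, max_block_ms=50)
buffer.append("first record")  # fits immediately
try:
    buffer.append("second record")  # nothing drains the buffer, so this times out
except TimeoutError as error:
    print("append failed:", error)
```

A very large max_block_ms in this model approximates the effect the text attributes to setting block.on.buffer.full to true: the caller waits effectively indefinitely.
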
Upgrading from 0.8.1 to 0.8.2

0.8.2 is fully compatible with 0.8.1. The upgrade can be done one broker at a time by simply bringing it down, updating the code, and restarting it.

Upgrading from 0.8.0 to 0.8.1

0.8.1 is fully compatible with 0.8. The upgrade can be done one broker at a time by simply bringing it down, updating the code, and restarting it.

Upgrading from 0.7

Release 0.7 is incompatible with newer releases. Major changes were made to the API, ZooKeeper data structures, protocol, and configuration in order to add replication (which was missing in 0.7). The upgrade from 0.7 to later versions requires a special tool for migration. This migration can be done without downtime.

2. APIs

Kafka includes five core APIs:
The Producer API allows applications to send streams of data to topics in the Kafka cluster.
The Consumer API allows applications to read streams of data from topics in the Kafka cluster.
The Streams API allows transforming streams of data from input topics to output topics.
The Connect API allows implementing connectors that continually pull from some source system or application into Kafka or push from Kafka into some sink system or application.
The AdminClient API allows managing and inspecting topics, brokers, and other Kafka objects.

Kafka exposes all its functionality over a language independent protocol which has clients available in many programming languages. However, only the Java clients are maintained as part of the main Kafka project; the others are available as independent open source projects.
A list of non-Java clients is available here.

2.1 Producer API

The Producer API allows applications to send streams of data to topics in the Kafka cluster.

Examples showing how to use the producer are given in the javadocs.

To use the producer, you can use the following maven dependency:

    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-clients</artifactId>
        <version>1.1.0</version>
    </dependency>

2.2 Consumer API

The Consumer API allows applications to read streams of data from topics in the Kafka cluster.

Examples showing how to use the consumer are given in the javadocs.

To use the consumer, you can use the following maven dependency:

    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-clients</artifactId>
        <version>1.1.0</version>
    </dependency>

2.3 Streams API

The Streams API allows transforming streams of data from input topics to output topics.

Examples showing how to use this library are given in the javadocs.

Additional documentation on using the Streams API is available here.

To use Kafka Streams you can use the following maven dependency:

    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-streams</artifactId>
        <version>1.1.0</version>
    </dependency>

2.4 Connect API

The Connect API allows implementing connectors that continually pull from some source data system into Kafka or push from Kafka into some sink data system.

Many users of Connect won't need to use this API directly, though; they can use pre-built connectors without needing to write any code. Additional information on using Connect is available here.

Those who want to implement custom connectors can see the javadoc.

2.5 AdminClient API

The AdminClient API supports managing and inspecting topics, brokers, ACLs, and other Kafka objects.

To use the AdminClient API, add the following Maven dependency:

    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-clients</artifactId>
        <version>1.1.0</version>
    </dependency>

For more information about the AdminClient APIs, see the javadoc.

2.6 Legacy APIs

A more limited legacy producer and consumer API is also included in Kafka. These old Scala APIs are deprecated and are still available only for compatibility purposes. Information on them can be found here.