Consuming data streams using Apache Kafka

15 minutes

Prerequisites:

Learn how to deploy the Apache Kafka platform to implement real-time data streaming between systems

What you’ll learn

You will learn about Kafka, a platform for handling real-time data streams that allows for more elegant and efficient communication between systems. You will then implement a Kafka solution. First, you will create a producer that generates records and pushes them to a Kafka server. Then you will create a consumer that pulls those records from the server and processes them.

What is Kafka

Apache Kafka is a stream-processing platform that manages communication in distributed systems. Communication is message-oriented, and follows the publish-subscribe model. Kafka allows for real-time stream processing and distributed, replicated storage of streams and messages.

A stream of records is called a topic, which is stored as a partitioned log. Producers append records, each with a timestamp, to the end of a partition. Consumers then read records from these logs in order; where a new consumer starts reading (for example, from the beginning of each partition) is configurable. Reading a record does not remove it. Kafka keeps records for a configured retention period (seven days by default), whether or not they have been consumed, and deletes the oldest records first once that period has passed. Simply put, a topic is a collection of ordered logs of records.

Multiple consumers can read from the same partition, and each consumer has its own offset value to keep track of where it is reading in the partition. With several consumers active on the same partition, you may wonder whether this system is error prone. It is not: each consumer's progress is tracked as a committed offset, so a consumer can resume where it left off and avoid processing the same records again. Kafka has traditionally used Apache ZooKeeper to store cluster metadata. Older clients also stored consumer offsets in ZooKeeper, while newer clients store them in an internal Kafka topic.
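
To make the relationship between a partition, its records, and per-consumer offsets concrete, here is a minimal in-memory sketch. It is a toy model and not the Kafka API: the class name PartitionLogSketch, its methods, and the consumer ids are all hypothetical, and it ignores retention, replication, and consumer groups.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory model of ONE topic partition. Hypothetical, not the Kafka API.
class PartitionLogSketch {
  // Append-only list: the index of a record is its offset.
  private final List<String> log = new ArrayList<>();
  // Each consumer (identified by a hypothetical id) tracks its own next offset.
  private final Map<String, Integer> offsets = new HashMap<>();

  // Producer side: append to the end of the log and return the record's offset.
  long append(String record) {
    log.add(record);
    return log.size() - 1;
  }

  // Consumer side: return every record from this consumer's offset onward,
  // then advance only that consumer's offset. Nothing is removed from the log.
  List<String> poll(String consumerId) {
    int from = offsets.getOrDefault(consumerId, 0);
    List<String> batch = new ArrayList<>(log.subList(from, log.size()));
    offsets.put(consumerId, log.size());
    return batch;
  }

  int size() {
    return log.size();
  }

  public static void main(String[] args) {
    PartitionLogSketch partition = new PartitionLogSketch();
    partition.append("a");
    partition.append("b");
    System.out.println(partition.poll("consumer-1")); // [a, b]
    partition.append("c");
    System.out.println(partition.poll("consumer-1")); // [c]
    System.out.println(partition.poll("consumer-2")); // [a, b, c]
  }
}
```

The second consumer still sees all three records because reading only moves that consumer's offset; the log itself is unchanged.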

Topics run on servers that are called Kafka brokers, and groups of these brokers form a cluster. These brokers then act as the contact points for the producers and the consumers. The producers will, in batches, push messages to the broker and similarly, the consumers will pull records from the broker. In your application, you will have to create a producer and consumer that will interact with each other through the Kafka server.

Application

The application you will be working with is a job manager that maintains an inventory of available systems. It consists of four microservices: gateway, job, system, and inventory. The job microservice allows you to dispatch jobs that are run by the system microservice. A job is a sleep operation that represents a slow task lasting 5 to 10 seconds. When it completes, the system microservice reports the sleep time as the result of the job. In addition to running jobs, the system microservice registers itself on startup with the inventory microservice, which keeps track of all instances of the system microservice. Finally, the gateway microservice is a backend-for-frontend service. It communicates with the backend job and inventory microservices on the caller’s behalf.

The two microservices you will modify are the system and inventory services. The inventory service monitors the status of instances of the system service.

The implementations of the application and its services are provided for you in the start/src directory.

Getting started

The fastest way to work through this guide is to clone the Git repository and use the projects that are provided inside:

git clone https://github.com/openliberty/guide-kafka-intro.git
cd guide-kafka-intro

The start directory contains the starting project that you will build upon.

The finish directory contains the finished project that you will build.

Before you begin, make sure you have all the necessary prerequisites.

Configuring the producer

To begin, we have to set up our producer so that we can generate data for our system. The producer has to generate records and push them to the Kafka broker.

Replace the SystemProducer class.
system/src/main/java/io/openliberty/guides/system/SystemProducer.java

SystemProducer.java

// tag::copyright[]
/*******************************************************************************
 * Copyright (c) 2019 IBM Corporation and others.
 * All rights reserved. This program and the accompanying materials
 * are made available under the terms of the Eclipse Public License v1.0
 * which accompanies this distribution, and is available at
 * http://www.eclipse.org/legal/epl-v10.html
 *
 * Contributors:
 *     IBM Corporation - Initial implementation
 *******************************************************************************/
// end::copyright[]
package io.openliberty.guides.system;

import java.util.Properties;

import javax.enterprise.context.ApplicationScoped;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

@ApplicationScoped
public class SystemProducer {

  private Producer<String, String> producer;

  public SystemProducer() {
    String kafkaServer = System.getenv("KAFKA_SERVER");

    Properties properties = new Properties();
    // tag::serverConfig[]
    properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaServer);
    // end::serverConfig[]
    // tag::keySerialConfig[]
    properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
    // end::keySerialConfig[]
    // tag::valueSerialConfig[]
    properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
    // end::valueSerialConfig[]

    // tag::producerCreate[]
    this.producer = new KafkaProducer<>(properties);
    // end::producerCreate[]
  }

  // tag::sendMessage[]
  public void sendMessage(String topic, String message) {
    producer.send(new ProducerRecord<>(topic, message));
  }
  // end::sendMessage[]
}

First, we have to configure the properties of the producer, of which at least three are required. The first is the bootstrap server property, set with ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, which tells the producer which broker addresses to use to establish its initial connection to the cluster. The next two are ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG and ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG. These are required so that the producer knows which serializers to use to convert the keys and values of your records into bytes. In this case, we use StringSerializer because the keys and values of our produced records are of type String. These properties are then passed to the KafkaProducer constructor to create the actual producer.

Finally, we create the sendMessage() method, which creates a ProducerRecord that is serialized and sent to the broker. Notice that the actual sending is performed by the built-in send() method and not by our own sendMessage() method. The send() method is asynchronous and returns a Future<RecordMetadata>. Once the send completes, the RecordMetadata result contains the partition the record was sent to, its offset within the partition, and its timestamp.

Producers can batch records before they are actually sent. When send() is called, the record is placed in a buffer and the method returns immediately, so more records can be queued without waiting for the network. This means that multiple records can be grouped together and sent at once instead of one at a time. This allows Kafka to achieve better overall throughput while only paying a small latency penalty.
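
The batching idea can be illustrated with a small stand-alone sketch. This is a toy model and not how KafkaProducer is implemented: the class BatchingSketch and its count-based batchSize are hypothetical, whereas the real producer sizes batches in bytes and also sends them after a time limit.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of producer-side batching. Hypothetical, not the KafkaProducer internals.
class BatchingSketch {
  private final int batchSize; // hypothetical: number of records per batch
  private final List<String> buffer = new ArrayList<>();
  private final List<List<String>> sentBatches = new ArrayList<>();

  BatchingSketch(int batchSize) {
    this.batchSize = batchSize;
  }

  // Like send(): queue the record and return at once.
  // A "network request" happens only when a full batch has accumulated.
  void send(String record) {
    buffer.add(record);
    if (buffer.size() >= batchSize) {
      flush();
    }
  }

  // Transmit whatever is buffered as a single request (modelled as one list entry).
  void flush() {
    if (!buffer.isEmpty()) {
      sentBatches.add(new ArrayList<>(buffer));
      buffer.clear();
    }
  }

  List<List<String>> sentBatches() {
    return sentBatches;
  }

  public static void main(String[] args) {
    BatchingSketch producer = new BatchingSketch(3);
    for (int i = 0; i < 7; i++) {
      producer.send("record-" + i);
    }
    producer.flush(); // push out the final, partially filled batch
    System.out.println(producer.sentBatches().size()); // 3
  }
}
```

Seven records result in three requests instead of seven, which is the throughput gain described above.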

Producers support much more functionality than this example shows. We can set the acks configuration so that a send is considered complete only once the broker has acknowledged it, allowing us to trade speed for durability. We can also configure how records are buffered and batched, which reduces the total number of requests to the broker and increases efficiency. You can read about these configurations and more in the KafkaProducer documentation.
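
As a rough illustration of such tuning, the sketch below builds a Properties object with a few extra producer settings. The keys acks, batch.size, and linger.ms are standard Kafka producer configuration names, written here as plain strings so that the fragment compiles without the Kafka client library. The class name, method name, and chosen values are assumptions for illustration only, not recommendations.

```java
import java.util.Properties;

// Hypothetical helper: builds producer settings using Kafka's string config keys.
class ProducerTuningSketch {

  static Properties tunedProducerProperties(String bootstrapServers) {
    Properties props = new Properties();
    // The three required settings from the SystemProducer example.
    props.put("bootstrap.servers", bootstrapServers);
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    // Wait for acknowledgement from all in-sync replicas: slower, but more durable.
    props.put("acks", "all");
    // Illustrative values: allow larger batches and wait briefly for them to fill.
    props.put("batch.size", "32768"); // bytes per partition batch
    props.put("linger.ms", "20");     // milliseconds to wait for more records
    return props;
  }

  public static void main(String[] args) {
    // "localhost:9092" is an assumed broker address.
    Properties props = tunedProducerProperties("localhost:9092");
    System.out.println(props.getProperty("acks")); // all
  }
}
```

The resulting Properties object could be passed to the KafkaProducer constructor in the same way as in SystemProducer.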

Configuring the consumer

Similarly to the producer, we must first set some basic configurations before actually creating an instance of the KafkaConsumer class.

Replace the SystemConsumer class.
inventory/src/main/java/io/openliberty/guides/inventory/SystemConsumer.java

SystemConsumer.java

// tag::copyright[]
/*******************************************************************************
 * Copyright (c) 2019 IBM Corporation and others.
 * All rights reserved. This program and the accompanying materials
 * are made available under the terms of the Eclipse Public License v1.0
 * which accompanies this distribution, and is available at
 * http://www.eclipse.org/legal/epl-v10.html
 *
 * Contributors:
 *     IBM Corporation - Initial implementation
 *******************************************************************************/
// end::copyright[]
package io.openliberty.guides.inventory;

import java.time.Duration;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;
import java.util.UUID;
import java.util.stream.Collectors;

import javax.json.bind.Jsonb;
import javax.json.bind.JsonbBuilder;

import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class SystemConsumer implements Runnable {
  private Consumer<String, String> consumer;
  private InventoryManager manager;

  private static final String OFFSET_RESET_CONFIG = "earliest";

  public SystemConsumer(InventoryManager manager, String kafkaServer, String groupIdPrefix) {
    Properties props = new Properties();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaServer);
    // A random suffix gives every consumer instance its own consumer group.
    props.put(ConsumerConfig.GROUP_ID_CONFIG, String.format("%s-%s", groupIdPrefix, UUID.randomUUID().toString()));
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, OFFSET_RESET_CONFIG);

    this.consumer = new KafkaConsumer<>(props);
    this.consumer.subscribe(Arrays.asList("system-topic"));
    this.manager = manager;
  }

  public List<String> consumeMessages() {
    List<String> result = new ArrayList<String>();
    // Block for up to 3 seconds waiting for new records.
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(3));

    for (ConsumerRecord<String, String> record : records) {
      result.add(record.value());
    }

    // Commit the offsets returned by the last poll without blocking.
    consumer.commitAsync();
    return result;
  }

  @Override
  public void run() {
    // A Jsonb instance is reusable, so create it once outside the polling loop.
    Jsonb jsonb = JsonbBuilder.create();
    while (true) {
      List<Properties> propertiesList = consumeMessages()
          .stream()
          .map(m -> jsonb.fromJson(m, Properties.class))
          .collect(Collectors.toList());

      for (Properties p : propertiesList) {
        String hostname = p.getProperty("hostname");
        p.remove("hostname");

        // Update a known system, or register a new one.
        if (manager.containsHostname(hostname)) {
          manager.updateSystem(hostname, p.getProperty("system.busy"));
        } else {
          manager.add(hostname, p);
        }
      }
    }
  }
}

We have the three equivalent minimal required configurations: ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, and ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG. On top of these three, we need some additional configurations. One required configuration is ConsumerConfig.GROUP_ID_CONFIG, which identifies the consumer group that this consumer belongs to, so that the consumers in a group can share the partitions of a topic among themselves. The final configuration in this consumer is ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, which defines the behaviour of the consumer when its offset (its position in a partition) is not defined or not valid. This is the case when the consumer group is new and has no committed offset yet, or when the committed offset is out of range. In this case, we reset the offset to the earliest valid offset. You can find additional configurations to suit your needs in the KafkaConsumer documentation.
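
The effect of ConsumerConfig.AUTO_OFFSET_RESET_CONFIG can be sketched as a small decision function. This is a simplified model of the behaviour, not code from the Kafka client: the class OffsetResetSketch, the method startingOffset, and its parameters are hypothetical, while earliest, latest, and none are the values that the auto.offset.reset setting accepts.

```java
// Toy model of how a consumer picks its starting offset. Hypothetical, not the Kafka client.
class OffsetResetSketch {

  // committed: the group's committed offset for the partition, or null if none exists.
  // logStart / logEnd: the oldest available offset and the offset of the next record to be written.
  static long startingOffset(Long committed, long logStart, long logEnd, String policy) {
    // A committed offset that is still within range is always used as-is.
    if (committed != null && committed >= logStart && committed <= logEnd) {
      return committed;
    }
    // Otherwise fall back to the configured reset policy.
    switch (policy) {
      case "earliest":
        return logStart; // replay everything that is still retained
      case "latest":
        return logEnd;   // only see records produced from now on
      default:
        // Models "none": the consumer reports an error instead of guessing.
        throw new IllegalStateException("No valid offset and auto.offset.reset=" + policy);
    }
  }

  public static void main(String[] args) {
    System.out.println(startingOffset(null, 5, 20, "earliest")); // 5
    System.out.println(startingOffset(null, 5, 20, "latest"));   // 20
    System.out.println(startingOffset(12L, 5, 20, "earliest"));  // 12
  }
}
```

With earliest, as used in SystemConsumer, a consumer in a brand-new group starts from the oldest retained record.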

After the properties have been set and the consumer created, the consumer subscribes to a topic (here, system-topic) via the subscribe() method, so that it can retrieve the records that producers publish to that topic. Subscribers are not explicitly informed of updates; instead, they must poll the Kafka server for new records. This is done using the poll() method of the org.apache.kafka.clients.consumer.Consumer interface. In this case, poll() is called in the consumeMessages() method with an argument of Duration.ofSeconds(3), which is the maximum time the call blocks waiting for records before it returns an empty result. Records are returned as a ConsumerRecords<K, V> object and can then be processed however we want. The consumer then commits its offsets to the broker to record how far it has read in each partition. This is done using the commitAsync() method. Kafka also supports automatic commits with the enable.auto.commit=true property, as well as synchronous manual commits with the commitSync() method. If offsets are never committed, the group's stored position in each partition is never updated, and a restarted consumer cannot resume where it left off.
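
The difference between reading records and committing offsets can be shown with a small stand-alone model. This is a sketch under simplifying assumptions (a single partition held in memory) and not the Kafka client: the class CommitSketch and its restart() method are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of poll/commit semantics for ONE partition. Hypothetical, not the Kafka client.
class CommitSketch {
  private final List<String> log;
  private int position;  // next offset poll() will read; lives only in memory
  private int committed; // offset stored durably on behalf of the consumer group

  CommitSketch(List<String> log) {
    this.log = log;
  }

  // Return all records from the current position and advance the position.
  List<String> poll() {
    List<String> batch = new ArrayList<>(log.subList(position, log.size()));
    position = log.size();
    return batch;
  }

  // Store the current position durably (the role of commitAsync()/commitSync()).
  void commit() {
    committed = position;
  }

  // Simulate a crash and restart: the in-memory position is lost,
  // so reading resumes from the last committed offset.
  void restart() {
    position = committed;
  }

  public static void main(String[] args) {
    CommitSketch consumer = new CommitSketch(Arrays.asList("a", "b", "c"));
    System.out.println(consumer.poll()); // [a, b, c]
    consumer.restart();                  // nothing was committed
    System.out.println(consumer.poll()); // [a, b, c] again
    consumer.commit();
    consumer.restart();
    System.out.println(consumer.poll()); // []
  }
}
```

Records read but not committed are delivered again after a restart, which is why the commit step matters.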

The retrieved records are then parsed from JSON into Properties objects. For each record, the inventory either updates the status of a system it already knows or adds the system as a new entry, depending on the hostname in the record.
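
The update-or-add step can be sketched in isolation. The class below is a hypothetical stand-in for InventoryManager backed by a plain map, and it starts from Properties objects that have already been parsed, so it does not show the JSON-B parsing itself. The property names hostname and system.busy are the ones used in SystemConsumer.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical stand-in for InventoryManager: maps a hostname to its properties.
class InventoryUpdateSketch {
  private final Map<String, Properties> systems = new HashMap<>();

  // Apply one consumed record: update a known system or register a new one.
  void apply(Properties record) {
    String hostname = record.getProperty("hostname");
    record.remove("hostname"); // the hostname becomes the key, not a stored property

    Properties known = systems.get(hostname);
    if (known == null) {
      systems.put(hostname, record);
      return;
    }
    String busy = record.getProperty("system.busy");
    if (busy != null) {
      known.setProperty("system.busy", busy);
    }
  }

  Properties get(String hostname) {
    return systems.get(hostname);
  }

  int size() {
    return systems.size();
  }

  public static void main(String[] args) {
    InventoryUpdateSketch inventory = new InventoryUpdateSketch();

    Properties first = new Properties();
    first.setProperty("hostname", "host-1"); // assumed example hostname
    first.setProperty("system.busy", "false");
    inventory.apply(first);

    Properties second = new Properties();
    second.setProperty("hostname", "host-1");
    second.setProperty("system.busy", "true");
    inventory.apply(second);

    System.out.println(inventory.size());                                  // 1
    System.out.println(inventory.get("host-1").getProperty("system.busy")); // true
  }
}
```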

Big Picture

What you have made is a simple connection that moves a service status from one service to another. The overall application contains more Kafka connections that we have not explicitly shown you. There are two more sets of producers and consumers between the Job and System services. The Job service has a producer (JobProducer) that creates new jobs and publishes them to a Kafka broker, and SystemRunnable contains the System service’s consumer, which takes those jobs and processes them. SystemProducer (the same producer as before) then pushes the results back to the Kafka broker. Finally, the SystemConsumer class and the JobConsumer class consume their respective data. In total, there are three Kafka connections in this application, of which you created one.

Running the application

Testing the application

A test has been provided for you to test the basic functionality of the consumer. If the test fails, then you may have introduced a bug into the code.

Create the InventoryEndpointIT class.
inventory/src/test/java/it/io/openliberty/guides/inventory/InventoryEndpointIT.java

InventoryEndpointIT.java

// tag::copyright[]
/*******************************************************************************
 * Copyright (c) 2019 IBM Corporation and others.
 * All rights reserved. This program and the accompanying materials
 * are made available under the terms of the Eclipse Public License v1.0
 * which accompanies this distribution, and is available at
 * http://www.eclipse.org/legal/epl-v10.html
 *
 * Contributors:
 *     IBM Corporation - Initial implementation
 *******************************************************************************/
// end::copyright[]
package it.io.openliberty.guides.inventory;

import static org.junit.Assert.assertEquals;

import java.io.IOException;

import javax.json.JsonArray;
import javax.json.JsonObject;
import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.SSLSession;
import javax.ws.rs.client.Client;
import javax.ws.rs.client.ClientBuilder;
import javax.ws.rs.core.Response;

import org.apache.cxf.jaxrs.provider.jsrjsonp.JsrJsonpProvider;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

public class InventoryEndpointIT {

    private static final String port = System.getProperty("test.http.port");
    private static final String BASE_URL = "http://localhost:" + port + "/inventory/systems";

    private Client client;

    @Before
    public void setup() throws InterruptedException {
        client = ClientBuilder.newBuilder()
                    .hostnameVerifier(new HostnameVerifier() {
                        public boolean verify(String hostname, SSLSession session) {
                            return true;
                        }
                    })
                    .register(JsrJsonpProvider.class)
                    .build();
    }

    @After
    public void teardown() {
        client.close();
    }

    @Test
    public void testConsumeSystem() throws InterruptedException, IOException {

        Response response = client
            .target(BASE_URL)
            .request()
            .get();
        assertEquals(200, response.getStatus());

        JsonObject obj = response.readEntity(JsonObject.class);
        int initialTotal = obj.getInt("total");
        assertEquals(0, initialTotal);
        JsonArray systems = obj.getJsonArray("systems");
        assertEquals(0, systems.size());
    }

}