
Spark Stream - 'utf8' Codec Can't Decode Bytes

I'm fairly new to stream programming. We have a Kafka stream that uses Avro, and I want to connect it to a Spark Stream. I used the code below: kvs = KafkaUtils.createDirectStream
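For context on why this error appears at all: messages produced with Confluent's Avro serializer are framed with a magic byte (0x00) and a 4-byte schema ID before the Avro payload, so the raw bytes are almost never valid UTF-8. Spark's default valueDecoder simply tries a utf8 decode, which is what raises the error. A minimal sketch of the failure (the payload bytes below are made up for illustration):

```python
# Confluent Avro wire format: magic byte (0x00) + 4-byte schema ID + Avro data.
# These bytes are an illustrative stand-in for a real Kafka message value.
payload = b"\x00\x00\x00\x00\x01\xc6\x01"

try:
    payload.decode("utf-8")  # what Spark's default valueDecoder does
except UnicodeDecodeError as e:
    print("default decoder fails:", e)
```

This is why the fix in both answers below is to swap in an Avro-aware deserializer rather than the default string decoder.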

Solution 1:

Right, the problem is with deserialization of the stream. You can use the confluent-kafka-python library and pass its Avro message decoder as the valueDecoder:

from confluent_kafka.avro.cached_schema_registry_client import CachedSchemaRegistryClient
from confluent_kafka.avro.serializer.message_serializer import MessageSerializer

# Point this at your own Schema Registry (URL is a placeholder)
client = CachedSchemaRegistryClient(url='http://schema-registry:8081')
serializer = MessageSerializer(client)

kvs = KafkaUtils.createDirectStream(ssc, [topic], {"metadata.broker.list": brokers},
                                    valueDecoder=serializer.decode_message)

Credits for the solution to https://stackoverflow.com/a/49179186/6336337
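If you want to understand what decode_message is doing (or cannot pull in the confluent-kafka dependency), the wire-format header can be split off with the standard library alone. The sketch below only parses the framing; parse_confluent_header is a hypothetical helper, and a real decoder would still need to fetch the schema for the returned ID from the registry and deserialize the payload with an Avro library:

```python
import struct

def parse_confluent_header(raw_bytes):
    """Split a Confluent wire-format message into (schema_id, avro_payload).

    Format: 1 magic byte (0x00) + 4-byte big-endian schema ID + Avro payload.
    This only unwraps the framing; it does NOT decode the Avro payload.
    """
    magic, schema_id = struct.unpack(">bI", raw_bytes[:5])
    if magic != 0:
        raise ValueError("not a Confluent Avro message")
    return schema_id, raw_bytes[5:]

# Example: schema ID 42 followed by a dummy payload
schema_id, payload = parse_confluent_header(b"\x00\x00\x00\x00\x2a\x10hi")
print(schema_id)  # 42
```

The library's MessageSerializer does this header parsing for you and caches the registry lookups, which is why the one-liner above is usually the better choice.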

Solution 2:

Yes, you should specify the deserializers explicitly.

With Java:

Creation of the stream:

final JavaInputDStream<ConsumerRecord<String, avroType>> stream = KafkaUtils.createDirectStream(
        jssc,
        LocationStrategies.PreferConsistent(),
        ConsumerStrategies.<String, avroType>Subscribe(topics, kafkaParams));

In the Kafka consumer config:

kafkaParams.put("key.deserializer", org.apache.kafka.common.serialization.StringDeserializer.class);
kafkaParams.put("value.deserializer", SpecificAvroDeserializer.class);
