Apache kafka How to run multiple Storm topologies at the same time?

I am learning Storm. I have a question about the number of topologies we can run at a time on Apache Storm. I have submitted two topologies to the Storm cluster, but only one topology runs at a time. I need to KILL or DEACTIVATE the existing topology to run any new topology. I am using Storm 0.9.4, ZooKeeper 3.4.6, Kafka 2.10- I am running one instance each of Storm nimbus, supervisor and UI. Do I need to run multiple instances of each? What do I need to do to run multiple topologies at the

Apache kafka Kafka consumer fetching metadata for topics failed

I am attempting to write a Java client for a third party's Kafka and ZooKeeper servers. I am able to list and describe topics, but when I attempt to read from any of them, a ClosedChannelException is raised. I can reproduce this with the command-line client: $ bin/kafka-console-consumer.sh --zookeeper --topic eventbustopic [2015-06-02 16:23:04,375] WARN Fetching topic metadata with correlation id 0 for topics [Set(eventbustopic)] from broker [id:1,host:SOME_HOST,port:9092] failed (kafk

Apache kafka In Storm 1.0.2, kafka-spout consumes the same data repeatedly every time the topology restarts

I'm currently upgrading a Storm project from 0.9.6 to 1.0.2. My spout does not start reading from the latest offset even though I use the same spout id in the SpoutConfig constructor. Note that I did not delete the ZooKeeper data, only storm-data. I changed my project configuration and source as follows: 1. Changed the storm-core and storm-kafka versions from 0.9.6 to 1.0.2, and kafka_2.10, in pom.xml. 2. Changed package paths: backtype -> org.apache, storm.kafka -> org.apache.storm.kafka. 3. Change serializing

Apache kafka Why does a Kafka consumer take a long time to start consuming?

We start a Kafka consumer listening on a topic that may not yet have been created (topic auto-creation is enabled, though). Not long after, a producer starts publishing messages on that topic. However, it takes some time for the consumer to notice this: 5 minutes, to be exact. At this point the consumer revokes its partitions and rejoins the consumer group, and Kafka re-stabilizes the group. Looking at the timestamps of the consumer vs. Kafka logs, this process is initiated on the consumer side. I su

Apache kafka Gracefully shut down Flink Kafka Consumer at run time

I am using FlinkKafkaConsumer010 with Flink 1.2.0, and my question is: is there a way to shut down the entire pipeline programmatically if a certain scenario is seen? One possible solution is to shut down the Kafka consumer source by calling the close() method defined in FlinkKafkaConsumer010, so that the pipeline shuts down as well. For this approach, I create a list that contains references to all FlinkKafkaConsumer010 instances that I created at the begi

Apache kafka Logstash is not reading from Kafka

I am testing a simple pipeline: Filebeat > Kafka > Logstash > File. Logstash is not reading from Kafka, but I can see that Kafka has messages when I use this command: bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic MyTopic --from-beginning My Filebeat configuration: filebeat.prospectors: - input_type: log paths: - /root/LogData/input.log output.kafka: hosts: [""] topic: MyTopic partition.round_robin: reachable_only: false require

Apache kafka 2 step windowed aggregation with Kafka Streams DSL

Suppose I have a stream "stream-1" consisting of 1 datapoint every second, and I'd like to calculate a derived stream "stream-5" that contains the sum over a hopping window of 5 seconds, and another stream "stream-10", based on "stream-5", that contains the sum over a hopping window of 10 seconds. The aggregation needs to be done for each key separately, and I'd like to be able to run each step in a different process. It is not a problem in itself if stream-5 and stream-10 contain updates f
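A minimal sketch of the two steps, under an assumption the question does not state: both windows advance by their own size (tumbling, non-overlapping), so each 10-second sum can be built from two finished 5-second sums. The function names and data are hypothetical, and this is plain Python rather than the Kafka Streams DSL.

```python
from collections import defaultdict


def window_start(ts_ms, size_ms):
    # Start of the non-overlapping window of length size_ms that contains ts_ms
    return ts_ms - (ts_ms % size_ms)


def aggregate(records, size_ms):
    """Step 1: sum (key, ts_ms, value) records per key and per window."""
    sums = defaultdict(int)
    for key, ts_ms, value in records:
        sums[(key, window_start(ts_ms, size_ms))] += value
    return dict(sums)


def reaggregate(window_sums, size_ms):
    """Step 2: roll finished smaller-window sums up into larger windows.

    Only valid when size_ms is a multiple of the smaller window size and the
    smaller windows do not overlap (the tumbling assumption above).
    """
    rolled = defaultdict(int)
    for (key, start), total in window_sums.items():
        rolled[(key, window_start(start, size_ms))] += total
    return dict(rolled)


# stream-1: one datapoint per second for key "a" (made-up values)
stream_1 = [("a", i * 1000, 1) for i in range(20)]
stream_5 = aggregate(stream_1, 5_000)
stream_10 = reaggregate(stream_5, 10_000)
```

With true hopping windows (advance smaller than size), neighbouring 5-second windows overlap, so summing them would double-count. Also, because a windowed aggregation in Kafka Streams can emit several updates for the same window, a second step reading stream-5 would need to treat each update as a replacement of that window's previous value rather than as a new addition.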

Apache kafka readinessProbe (k8s) for kafka statefulset causes bad deployment

I have a Kubernetes cluster (v1.9.0) in which I deployed 3 ZooKeeper pods (working correctly), and I want to have 3 Kafka replicas. The following StatefulSet works only if I comment out the readinessProbe section. apiVersion: apps/v1beta1 kind: StatefulSet metadata: name: kafka spec: selector: matchLabels: app: kafka serviceName: kafka replicas: 3 template: metadata: labels: app: kafka spec: affinity: podAntiAffinity: requiredDuring

Apache kafka Is it OK to use the ZooKeeper bundled with Kafka in production?

Kafka ships with a ZooKeeper. Is it OK to use it in production (bin/zookeeper-server-start.sh)? I want to use SASL with Kafka. However, I can't find a way to achieve it with the official ZooKeeper, but I did make it work with Kafka's ZooKeeper. Therefore I want to know if it's OK to use the ZooKeeper that ships with Kafka in a production environment.

Apache kafka Kafka Real-Time guarantees

Can Kafka guarantee that a consumer sees a message x ms after it has been (successfully) produced? Background: I have a system where service A accepts requests. Service B needs to be able to answer how many requests had come in by a certain time, and it needs to be precise. My plan is: Service A accepts a request, produces a message, and waits for the ack of at least one replica. Once it gets the ack, it tells the user that its request is "in the system". As Service B is asked, I

Apache kafka Kafka maximum number of connections

We are planning to implement Kafka to collect logs from all kinds of devices. We expect to have around 10k devices. Can we connect all these devices directly to a Kafka cluster, or should we funnel the logs through log servers to limit the number of connections to Kafka? We plan to have one topic per kind of device (Linux, AIX, Windows 2003, 2008, and so on). Thanks

Apache kafka Testing Kafka Processor API that uses SpecificAvroSerde

I'm trying to write unit tests for a custom stream processor and got stuck serializing the message I need to send for the test. I followed this example by Kafka: https://kafka.apache.org/11/documentation/streams/developer-guide/testing.html . I use SpecificAvroSerde for a custom class (an auto-generated Avro class) in my stream, but I can't configure it in the test with the MockSchemaRegistryClient(); I can only point it to the URL of SR. Serde<MyCustomObject> valueSerde = new SpecificA

Apache kafka Kafka Connect: Automatically terminating after processing all data

I want to back up and restore a huge amount of data in a Kafka topic to various destinations (file, another topic, S3, ...) using Kafka Connect. However, it runs in streaming mode and hence never terminates. In my scenario it should exit automatically after processing all data that is currently in the topic (it is ensured in my context that all producers are shut down before the backup starts). Is there any option/parameter so that a Kafka Connect connector automatically terminates after

Apache kafka Integrating WSO2 Siddhi CEP and Kafka

I'm currently in the process of integrating WSO2's Siddhi CEP and Kafka. I want to produce a Siddhi stream by receiving events from Kafka. The Kafka data being received is in JSON format, where each event looks something like this: { "event":{ "orderID":"1532538588320", "timestamps":[ 15325, 153 ], "earliestTime":1532538 } } The SiddhiApp that I'm trying to run in the WSO2 stream processor looks like this: @App:name('KafkaSiddhi') @App:de

Apache kafka Update ksql stream with new topic schema

I write Avro messages into a Kafka topic using Schema Registry. Then I created a stream based on the topic. The stream was created with the current schema. Then I added a new field to the schema. The schema registry was updated, which is OK, but the stream keeps its original structure. Can I update the stream with the new schema? It's problematic for me to drop and recreate the stream because I have a lot of other streams/tables that depend on it, and KSQL doesn't allow dropping a stream that has dependencies.

Apache kafka Kafka reset partition re-consume or not

If I consume from my topic and manage the offset myself, some records I process succeed and I move the offset onwards, but occasionally processing a record throws an exception. I still need to move the offset onwards. But at a later point I will need to reset the offset and re-process the failed records. Is it possible, when advancing the offset, to set a flag saying whether to ignore or consume that event if I consume over it again?
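The skip-or-reprocess idea can be sketched with the "flag" kept outside Kafka, here as an in-memory set of failed offsets. Everything below is hypothetical: a Python list stands in for one partition, the function names are made up, and a real consumer would need to persist the set somewhere durable so that it survives restarts.

```python
def process_partition(log, start_offset, handler, failed):
    """Process records from start_offset onwards, always advancing the offset.

    log: list of records indexed by offset (a stand-in for one partition).
    handler: callable that raises on failure.
    failed: set that collects the offsets whose processing raised.
    Returns the next offset to commit.
    """
    offset = start_offset
    while offset < len(log):
        try:
            handler(log[offset])
        except Exception:
            failed.add(offset)  # remember it instead of blocking the partition
        offset += 1
    return offset


def replay_failed(log, start_offset, handler, failed):
    """After resetting to start_offset, re-process only offsets marked as failed."""
    for offset in range(start_offset, len(log)):
        if offset not in failed:
            continue  # succeeded the first time: ignore it on replay
        try:
            handler(log[offset])
            failed.discard(offset)
        except Exception:
            pass  # still failing: keep it in the set for a later replay
```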

Apache kafka How to configure the Kstream state folder

By default, KStream uses the /tmp location to maintain its state (a kind of metadata), with the app name as the folder name (definition of state directory). I faced the error below: Caused by: org.rocksdb.RocksDBException: While open a file for appending: /tmp/kafka-streams******** :Disk quota exceeded

Apache kafka Use environment variables to configure the Kafka host on SeedStack

I'm trying to use environment variables to configure Kafka on SeedStack. The syntax works with the MongoDB configuration but not with the Kafka configuration. Here's my Mongo conf: env: MONGO_URL: "localhost:27017" MONGO_CREDENTIAL: "" mongoDb: clients: mongoClient: databases: mongoDB uri: mongodb://${env.MONGO_CREDENTIAL}${env.MONGO_URL} Here's my Kafka conf: env: MONGO_URL: "localhost:27017" MONGO_CREDENTIAL: "" kafka: consumers: consumer1: topics: [topic1] pro

Apache kafka Kafka can't delete a topic, and restarting fails after deleting it

Whenever I create a topic in Kafka and then delete it with .\bin\windows\kafka-topics.bat --zookeeper localhost:2181 --delete --topic linlin6 it always says "Topic linlin6 is marked for deletion." I set delete.topic.enable=true in server.properties and it still shows the same message, and the next time I start ZooKeeper and Kafka, I get an error like this: java.nio.file.AccessDeniedException: D:\WEBSOCKET\kafka_2.12-2.1.0\logs\linlin6-0 -> D:\WEBSOCKET\kafka_2.12-2.1.0\logs\linlin6-0.bd274

Apache kafka Kafka Consumers get many replays when new consumers connect

I'm playing with Kafka, trying to get to grips with it. One of the things we need to be able to do is run load-balanced sets of servers - for redundancy/high availability/etc. - and have them get rebooted independently of each other. Should be simple. What I've found, though, is slightly strange. If I'm running a single Kafka consumer that is processing a set of messages, and then I add a second consumer to the same consumer group whilst the messages are being processed, I get the entire set of me

Apache kafka Does Zookeeper require SSD disks for Apache Kafka Clusters?

We want to install a Kafka cluster and 3 ZooKeeper servers; Kafka should use the ZooKeeper servers to store its metadata. ZK data and log files should be on disks that have the least contention from other I/O activities. Ideally the ZK data and ZK transaction log files should be on different disks, so that they don't contend for the I/O resource. Note that it isn't enough to just have separate partitions; they have to be different disks to ensure performance. So does z

Apache kafka Kafka HeartbeatThread BLOCKED

We are using Spring Kafka version 2.1.5.RELEASE. After our performance testing, while analysing a thread dump, we saw the stack trace below, which indicates that the HeartbeatThread is being blocked by one of the consumer threads. LOG: Consumer Thread Dump org.springframework.kafka.KafkaListenerEndpointContainer#4-0-C-1 - priority:5 - threadId:0x00007fca72cf4800 - nativeId:0x5d - nativeId (decimal):93 - state:RUNNABLE stackTrace: java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Nati

Apache kafka Kafka JUnit Test failing due to FileSystemException

I am trying to do some testing using the EmbeddedKafkaCluster from Kafka's integration test utils library, something similar to the Integration Test examples here. But I am hitting java.nio.file.FileSystemException: java.lang.RuntimeException: java.nio.file.FileSystemException: C:\Users\AppData\Local\Temp\junit3544952207288614104\junit965058698809817752\inputTopic-0\00000000000000000000.timeindex: The process cannot access the file because it is being used by another

Apache kafka Kafka Consumer keeps getting data that was produced and consumed 2 days ago, every 5 minutes

I am working on a Kafka consumer, and I am noticing it is consuming messages that should have been consumed 2 days ago. It repeats these messages roughly every 5 minutes, even though the producer stopped producing those messages 2 days ago. I have new data that should have been consumed, and the producer's logs show the new data is being produced and sent to Kafka. But it is not being consumed on the other side; the consumer just repeats the same data over and over again. The Kafka application is b

Apache kafka Difference between KafkaProducer.close() and KafkaProducer.flush()

Looking at the documentation, I'm not sure I understand the difference between using close() and flush(). This is the doc for flush(): "Invoking this method makes all buffered records immediately available to send (even if linger.ms is greater than 0) and blocks on the completion of the requests associated with these records. The post-condition of flush() is that any previously sent record will have completed (e.g. Future

Apache kafka Can not consume messages from Kafka cluster

I have defined a Kafka cluster of two nodes with a replication factor of 2. When I try to consume messages using the console consumer it doesn't do anything, it just waits. Producer ./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic adi Consumer ./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic adi --from-beginning Cluster Description Running ./bin/kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic adi renders: Topic:adi Part

Elasticsearch Distributed Kafka Connect with multiple Connectors and one Topic

What's the behavior of offset management in a Kafka Connect cluster in distributed mode that is running multiple connectors listening to the same set of topics (or one topic)? In distributed mode, Kafka Connect stores offset information in Kafka, and this offset is read and committed by the workers in the cluster. What happens if I have multiple connectors running in that Kafka Connect cluster listening to the same topic? Is the offset of a partition the same for all connectors, or e

Apache kafka Creating topic in kafka_2.12-2.2.0 causes “Timed out waiting for a node assignment” error

Does anyone know how to fix this error when creating a new topic in Kafka? ➜ kafka_2.12-2.2.0 bin/kafka-topics.sh --create --bootstrap-server localhost:9093 --replication-factor 2 --partitions 2 --topic user-tracking Error while executing topic command : org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. [2019-07-14 13:01:35,094] ERROR java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node as

Apache kafka Kafka connect transformation isn't applied

I'm working on a Kerberized server with distributed Kafka Connect. The connector works well; just the transformation part is totally ignored. I get no warning, error, or any info in the logs about this problem. My connector without transformation works well: { "name": "hdfs-avro-sink-X", "config": { "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector", "tasks.max": "1", "topics": "Y", "hdfs.url": "HA_name",

Apache kafka When to use ExponentialBackOffPolicy vs FixedBackOffPolicy when setting retry policy for a kafka consumer in a Spring boot app?

When should I use ExponentialBackOffPolicy vs FixedBackOffPolicy when setting the retry policy for a Kafka consumer in a Spring Boot app? I see FixedBackOffPolicy as an implementation of BackOffPolicy that pauses for a fixed period of time before continuing, and ExponentialBackOffPolicy as an implementation of BackOffPolicy that increases the back-off period for each retry attempt in a given set. Apart from this, FixedBackOffPolicy extends StatelessBackOffPolicy whereas ExponentialBackOffPolicy doesn't. In
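To make the difference concrete, the sketch below lists the pause before each retry under each policy. The function names are hypothetical, and the parameters only mirror the idea behind Spring Retry's `backOffPeriod`, `initialInterval`, `multiplier` and `maxInterval` settings; this is not Spring's implementation.

```python
def fixed_backoff(period_ms, attempts):
    """Pause before each retry under a fixed policy: the same value every time."""
    return [period_ms] * attempts


def exponential_backoff(initial_ms, multiplier, max_ms, attempts):
    """Pause before each retry under an exponential policy, capped at max_ms."""
    pauses, current = [], initial_ms
    for _ in range(attempts):
        pauses.append(min(current, max_ms))
        current *= multiplier  # grow the next pause
    return pauses


# Hypothetical settings, chosen only to make the shapes easy to compare.
fixed = fixed_backoff(1000, 6)
exponential = exponential_backoff(100, 2, 1000, 6)
```

Roughly, a fixed pause suits failures that are expected to clear quickly, while a growing pause backs off from a downstream system that is struggling. Because a blocking retry holds the consumer thread, the total of the pauses is worth comparing against the consumer's `max.poll.interval.ms`.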

Apache kafka Kafka Should Number of Consumer Threads equal number of Topic Partitions

Pretend you determined that you wanted to use exactly 8 consumer threads for your application. Would there be any difference in processing if a Kafka topic was set up as having 8 partitions vs 16 partitions? In the first case, each thread is assigned to a single partition with twice the data, and in the second case each thread is assigned to two partitions with half the data each. It seems to me that there is no difference between these two setups.
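A small model of range-style assignment shows the two layouts side by side. It hands out contiguous blocks, and the first consumers take one extra partition when the split is uneven, which is how Kafka's RangeAssignor treats a single topic. This is a simplified illustration, not the client's implementation, and the thread names are made up.

```python
def range_assign(num_partitions, consumers):
    """Assign partitions 0..num_partitions-1 of one topic in contiguous blocks."""
    per_consumer, extra = divmod(num_partitions, len(consumers))
    assignment, start = {}, 0
    for i, consumer in enumerate(sorted(consumers)):
        count = per_consumer + (1 if i < extra else 0)  # first `extra` get one more
        assignment[consumer] = list(range(start, start + count))
        start += count
    return assignment


threads = [f"thread-{i}" for i in range(8)]  # hypothetical member names
with_8 = range_assign(8, threads)
with_16 = range_assign(16, threads)
```

If data is spread evenly across partitions, every thread gets the same share in both layouts. One difference remains: 16 partitions would let the group grow to 16 threads later, while with 8 partitions any consumer beyond the eighth sits idle with an empty assignment.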

Apache kafka If we redeploy a stateful streams app multiple times, can the tasks assigned to instances change, and does it rebuild its state from earliest?

If there are no changes to the number of Kafka topic partitions or the number of streams app replicas, does the streams app rebuild its internal state from earliest when we redeploy it? Does the assignment of stream tasks to instances change? I see it change sometimes. For example, we are running 12 partitions on 4 instances.

Apache kafka How can I produce a Kafka Record with null value using the kafka tool set

I'm using the following command: bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test.topic --property parse.key=true --property key.separator=# This allows me to start typing key#value entries. However, no matter what I try, I'm not able to create a null entry. If I send [myKey#] and press Enter, on the feed I see an empty message for the key, but not null. I need to create a null value.
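A rough model of how a line such as `myKey#` gets split may make the empty-versus-null distinction concrete. This is a hypothetical re-implementation in Python, assuming the tool splits at the first occurrence of `key.separator` and encodes both halves as strings. It is not the console producer's actual source. Under that assumption the value is always some string, possibly empty, so no typed line can produce a null value (the kind of record that acts as a tombstone on a compacted topic).

```python
def split_line(line, separator="#"):
    """Model: split at the first separator; the remainder is the value."""
    idx = line.find(separator)
    if idx == -1:
        raise ValueError("line has no key separator")
    key = line[:idx].encode("utf-8")
    value = line[idx + len(separator):].encode("utf-8")  # may be b"", never None
    return key, value
```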

Apache kafka Kafka - Losing messages even if app is configured for exactly once and highest durability

There are cases (very rarely, but there are) when I receive duplicates, even if everything is configured for high durability and we use exactly once configuration. Please check below the application context and test scenario that causes this issue. Kafka Cluster Setup 3 x Kafka Brokers (1 on host1, 2 on host2 and 3 on host3) 3 x Zookeeper instances (1 on host1, 2 on host2 and 3 on host3) Kafka configuration broker.id=1,2,3 num.network.threads=2 num.io.threads=8 socket.s

Apache kafka How to minimize Kafka consumer's latency?

I am trying to develop a real-time data streaming application with Kafka. Producers send messages to a broker that resides on a different PC in the same LAN. The broker then sends messages to a consumer that also resides on a different PC in the same LAN. If the producer, consumer and brokers run on the same PC, no problem arises. But in my scenario, the consumer's lag increases dramatically. On the other hand, when a new consumer runs on the broker machine, its lag value changes between 0 and 10. But the remote consumer's lag va

Apache kafka Java Apache Kafka Producer Metadata Updater & Retry Logic

I am using Spring for Apache Kafka and have created a service that uses a Kafka Producer (org.apache.kafka.clients.producer) via Spring's KafkaTemplate to send messages to a topic. On the target Kafka cluster I have disabled auto topic creation. Using a combination of producer configurations listed here https://kafka.apache.org/documentation/#producerconfigs I am successfully controlling how many times a request is retried, time between retries, etc. If I provide a topic that does not exist

Apache kafka What is the use of Header in Kafka Processor API?

I am learning the Kafka Processor API and found the method headers() in ProcessorContext. What is the use of this method? The docs only say: "Returns the headers of the current input record; could be null if it is not available". Can I perform some operation on it, like add?

Apache kafka Zookeeper resiliency

We have a cluster of 17 brokers and 5 ZooKeeper nodes. I wanted to test the resiliency of ZooKeeper, so I took down 3 of them, as my understanding is that a 5-node ensemble can withstand the failure of at most 2 nodes (using the 2n+1 rule). But to my surprise, I was still able to produce and consume data. And even with all 5 ZooKeeper nodes down, I was able to produce data. Can someone explain the reason behind these two behaviors?
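The 2n+1 rule mentioned above can be written out as majority arithmetic. This sketch only models the ZooKeeper ensemble itself (quorum = strict majority of voting servers). It says nothing about how Kafka brokers behave once the ensemble loses quorum, which is the part of the question left open. The function names are illustrative.

```python
def quorum_size(ensemble):
    """Smallest strict majority of `ensemble` voting servers."""
    return ensemble // 2 + 1


def tolerated_failures(ensemble):
    """How many servers can be down while a majority is still up."""
    return ensemble - quorum_size(ensemble)


def has_quorum(ensemble, down):
    """True if the servers still running form a majority."""
    return ensemble - down >= quorum_size(ensemble)
```

With 5 servers, taking 3 down leaves 2, below the quorum of 3, so the expectation in the question holds for ZooKeeper itself. An even-sized ensemble of 4 tolerates only 1 failure, the same as 3.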

Apache kafka I am trying to set up Kafka on my local Mac

I have unzipped the kafka_2.12-2.5.0 version and started ZooKeeper, and when I try to start Kafka using the command "bin/zookeeper-server-start.sh config/zookeeper.properties" I get the following error: /Users/manig/Desktop/kafka_2.12-2.5.0/bin/kafka-run-class.sh: line 317: /Library/Java/JavaVirtualMachines/jdk1.8.0_191.jdk/Contents/Home#/bin/java: No such file or directory /Users/manig/Desktop/kafka_2.12-2.5.0/bin/kafka-run-class.sh: line 317: exec: /Library/Java/JavaVirtualMachine

Apache kafka Kafka Mirror Maker - Lag is aggregated and then each 60 seconds it is zeroed and then all over again

I am using Kafka MirrorMaker based on the Kafka http://apache.cbox.biz/kafka/2.4.1/kafka_2.13-2.4.1.tgz image. My issue is that no matter what I try, MirrorMaker accumulates lag on 10 partitions of a topic for 60 seconds, then the lag is zeroed within a second, as there are obviously not that many messages, and then the lag grows again for 60 seconds. I would like messages and lag to be zeroed, say, every 10 seconds, but couldn't achieve it, although I played around a bit, mostly with the MirrorMaker producer config file. c

Apache kafka Kafka Connect - JDBC Avro connect: how to define a custom schema registry

I was following a tutorial on Kafka Connect, and I am wondering if there is a way to define a custom schema registry for a topic whose data came from a MySQL table. I can't find where to define it in my JSON/Connect config, and I don't want to create a new version of that schema after creating it. My MySQL table called stations has this schema:
Field          | Type
---------------+-------------
code           | varchar(4)
date_measuring | timestamp
attributes     | varchar(256)
w

Apache kafka In KStreams, how can I dynamically control when KTable/KTable joins yield results?

I have a KTable-to-KTable join. I create the KTables using .aggregate(). The join yields results to the next stream processor when either side receives a new message. I have a use case where I can receive another message on the left KTable, but the message is a "duplicate". It's not an actual duplicate in the technical sense, but it's a duplicate per my business logic (it contains X, Y and Z fields that have identical values to the previous message). How can I check the previous aggregate val
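The comparison itself can be sketched independently of Kafka Streams: keep the last value per key, and forward a new one only if at least one business field differs. This is a plain-Python illustration with hypothetical field and function names. The store here replaces rather than accumulates, which is an assumption because the question's `.aggregate()` logic is not shown. Wiring this into a topology, for example through a Processor API state store, is also not shown.

```python
# Hypothetical names for the business fields X, Y and Z from the question.
BUSINESS_FIELDS = ("x", "y", "z")


def is_business_duplicate(previous, incoming, fields=BUSINESS_FIELDS):
    """True when every business field matches the previously stored value."""
    if previous is None:
        return False
    return all(previous.get(f) == incoming.get(f) for f in fields)


def update_and_maybe_forward(store, key, incoming):
    """Keep the latest non-duplicate value per key.

    Returns the value to forward downstream, or None to suppress it.
    `store` is a plain dict standing in for a key-value state store.
    """
    if is_business_duplicate(store.get(key), incoming):
        return None  # leave the stored value untouched and emit nothing
    store[key] = incoming
    return incoming
```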

Apache kafka KafkaJS: ECONNREFUSED when trying to produce a message on a topic

I'm using KafkaJS to produce a message on a Kafka topic. To do so, I've put the Kafka server in Docker using the wurstmeister image. What I want to do: the Poll container produces a message to the Poll topic and consumes messages from the responsePoll topic. But I get an error when trying to produce the message: Error: poll | {"level":"ERROR","timestamp":"2020-10-24T15:21:27.113Z","logger":"kafkajs","message":"[Connection]

Apache kafka KSQL create table with multi-column aggregation

So essentially I'd like to group by two columns, like this: CREATE TABLE foo AS SELECT a, b, SUM(a) FROM whatever GROUP BY a, b whatever is a stream in Kafka format. When I issue the command, ksql returns: Key format does not support schema. format: KAFKA schema: Persistence{columns=[`a` STRING KEY, `b` STRING KEY], features=[]} reason: The 'KAFKA' format only supports a single field. Got: [`a` STRING KEY, `b` STRING KEY] Caused by: The 'KAFKA' format only supports a single field. Got: [`DEVICE

Apache kafka kafka Leader skew: After adding a new broker to the cluster and reassigning partitions, kafka brokers leaders are skewed

I have a Kafka cluster with 3 ZooKeeper nodes and 4 Kafka nodes. I added 2 new brokers to the cluster. The config auto.leader.rebalance.enable is set to true on all the brokers, and leader.imbalance.check.interval.seconds and leader.imbalance.per.broker.percentage have their default values. To distribute the partitions across all brokers, I generated a reassignment plan and reassigned the partitions. But the generated plan did not produce balanced leadership across all brokers. Two of the old brokers served as a leader for
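Kafka treats the first replica in each partition's replica list as that partition's preferred leader, and `auto.leader.rebalance.enable` only moves leadership back toward preferred leaders. One way to check whether the skew already exists in the generated reassignment plan is to count first replicas per broker. The helper and the four-broker plan below are hypothetical illustrations, not output from the cluster in the question.

```python
from collections import Counter


def preferred_leader_counts(plan):
    """Count how many partitions name each broker as the first (preferred) replica.

    `plan` maps (topic, partition) to an ordered replica list, like the
    "replicas" arrays in a reassignment JSON file.
    """
    return Counter(replicas[0] for replicas in plan.values())


# Hypothetical four-broker plan: brokers 3 and 4 hold replicas but are never first.
plan = {
    ("events", 0): [1, 3],
    ("events", 1): [2, 4],
    ("events", 2): [1, 4],
    ("events", 3): [2, 3],
}
counts = preferred_leader_counts(plan)
```

In a plan like this, automatic rebalancing would keep leadership on brokers 1 and 2, because they are the preferred leaders for every partition.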

Apache kafka Lenses MQTT Source Connector doesn't send PINGREQ to MQTT Broker when idle in its keep-alive time

PROBLEM - I have created an MQTT Source Connector using Lenses. The connector works fine and seamlessly as long as data is being published to my MQTT Mosquitto broker. Now, when I stop publishing data and no data is sent to the MQTT source connector, and after about 4-5 minutes I start publishing again, the data doesn't reach my source connector even though the connector is still in the running state. To resolve this I need to restart my connector every time, which is bad.
