Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Spark Streaming from Kafka has error numRecords must not be negative

Its kind of strange error because I still push data to kafka and consume message from kafka and Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: numRecords must not be negative is kind of strange too. I search and don't get any resource related to.

Let me explain my cluster. I have 1 server is master and agents run mesos, on that I set up 3 brokers of kafka like that. Then I run spark-job on that cluster. brokers Mesos I am using spark 1.5.2

brokers:
  id: 0
  active: true
  state: running
  resources: cpus:1.00, mem:1024, heap:512, port:31000
  failover: delay:1m, max-delay:10m
  stickiness: period:10m, hostname:test-master
  task:
    id: broker-0-c32082d0-a544-4260-b7c4-0239d99f0972
    state: running
    endpoint: test-master:31000
  metrics:
    collected: 2016-01-25 17:46:47+08
    under-replicated-partitions: 0
    offline-partitions-count: 0
    is-active-controller: 1

  id: 1
  active: true
  state: running
  resources: cpus:1.00, mem:1024, heap:512, port:31001
  failover: delay:1m, max-delay:10m
  stickiness: period:10m, hostname:test-master
  task:
    id: broker-1-7b30d6ad-6b19-4420-b743-c6f7f1adfb07
    state: running
    endpoint: test-master:31001
  metrics:
    collected: 2016-01-25 17:46:31+08
    under-replicated-partitions: 0
    offline-partitions-count: 0
    is-active-controller: 0

  id: 2
  active: true
  state: running
  resources: cpus:1.00, mem:1024, heap:512, port:31002
  failover: delay:1m, max-delay:10m
  stickiness: period:10m, hostname:test-master
  task:
    id: broker-2-8ef6437b-79b2-4183-8653-17cf2fe4591f
    state: running
    endpoint: test-master:31002
  metrics:
    collected: 2016-01-25 17:46:38+08
    under-replicated-partitions: 0
    offline-partitions-count: 0
    is-active-controller: 0

Then I run spark-streaming job get data from kafka then parsing.

I checked broker is working by using

kafkacat -b test-master:31001,test-master:31000,test-master:31002 -t bid_event

It got data but when I run spark-job I get error

6/01/25 17:44:52 INFO SparkContext: Running Spark version 1.5.2
16/01/25 17:44:52 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/01/25 17:44:52 INFO SecurityManager: Changing view acls to: ubuntu
16/01/25 17:44:52 INFO SecurityManager: Changing modify acls to: ubuntu
16/01/25 17:44:52 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ubuntu); users with modify permissions: Set(ubuntu)
16/01/25 17:44:53 INFO Slf4jLogger: Slf4jLogger started
16/01/25 17:44:53 INFO Remoting: Starting remoting
16/01/25 17:44:53 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:51816]
16/01/25 17:44:53 INFO Utils: Successfully started service 'sparkDriver' on port 51816.
16/01/25 17:44:53 INFO SparkEnv: Registering MapOutputTracker
16/01/25 17:44:53 INFO SparkEnv: Registering BlockManagerMaster
16/01/25 17:44:53 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-97e9787b-3a67-4d00-aff6-a5e02b271a74
16/01/25 17:44:53 INFO MemoryStore: MemoryStore started with capacity 441.9 MB
16/01/25 17:44:53 INFO HttpFileServer: HTTP File server directory is /tmp/spark-442ac339-2ac7-427f-9e6b-5b5cb18a54dd/httpd-4724a937-4d03-4bd0-99f1-7b9f1129291e
16/01/25 17:44:53 INFO HttpServer: Starting HTTP Server
16/01/25 17:44:53 INFO Utils: Successfully started service 'HTTP file server' on port 51817.
16/01/25 17:44:53 INFO SparkEnv: Registering OutputCommitCoordinator
16/01/25 17:44:53 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/01/25 17:44:53 INFO SparkUI: Started SparkUI at http://10.xxx.xxx.25:4040
16/01/25 17:44:54 INFO SparkContext: Added JAR file:/home/ubuntu/spark-jobs/./rtb_spark-assembly-1.0-deps.jar at http://10.xxx.xxx.25:51817/jars/rtb_spark-assembly-1.0-deps.jar with timestamp 1453715094219
16/01/25 17:44:54 INFO SparkContext: Added JAR file:/home/ubuntu/spark-jobs/./rtb-spark.jar at http://10.xxx.xxx.25:51817/jars/rtb-spark.jar with timestamp 1453715094222
16/01/25 17:44:54 INFO Utils: Copying /home/ubuntu/spark-jobs/./test.conf to /tmp/spark-442ac339-2ac7-427f-9e6b-5b5cb18a54dd/userFiles-cdef27e0-c357-4ebb-adcf-ccf963ff9d60/test.conf
16/01/25 17:44:54 INFO SparkContext: Added file file:/home/ubuntu/spark-jobs/./test.conf at http://10.xxx.xxx.25:51817/files/test.conf with timestamp 1453715094309
16/01/25 17:44:54 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
2016-01-25 17:44:54,444:5202(0x7f2d7604f700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5
2016-01-25 17:44:54,444:5202(0x7f2d7604f700):ZOO_INFO@log_env@716: Client environment:host.name=knx-rtb-server-google-test
2016-01-25 17:44:54,444:5202(0x7f2d7604f700):ZOO_INFO@log_env@723: Client environment:os.name=Linux
2016-01-25 17:44:54,444:5202(0x7f2d7604f700):ZOO_INFO@log_env@724: Client environment:os.arch=3.13.0-76-generic
2016-01-25 17:44:54,444:5202(0x7f2d7604f700):ZOO_INFO@log_env@725: Client environment:os.version=#120~precise1-Ubuntu SMP Tue Jan 19 11:09:43 UTC 2016
I0125 17:44:54.444169  5444 sched.cpp:166] Version: 0.26.0
2016-01-25 17:44:54,444:5202(0x7f2d7604f700):ZOO_INFO@log_env@733: Client environment:user.name=ubuntu
2016-01-25 17:44:54,444:5202(0x7f2d7604f700):ZOO_INFO@log_env@741: Client environment:user.home=/home/ubuntu
2016-01-25 17:44:54,444:5202(0x7f2d7604f700):ZOO_INFO@log_env@753: Client environment:user.dir=/home/ubuntu/spark-jobs
2016-01-25 17:44:54,444:5202(0x7f2d7604f700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=test-master:2181 sessionTimeout=10000 watcher=0x7f2ded821210 sessionId=0 sessionPasswd=<null> context=0x7f2d54001470 flags=0
2016-01-25 17:44:54,444:5202(0x7f2d6e6fb700):ZOO_INFO@check_events@1703: initiated connection to server [10.xxx.xxx.25:2181]
2016-01-25 17:44:54,446:5202(0x7f2d6e6fb700):ZOO_INFO@check_events@1750: session establishment complete on server [10.xxx.xxx.25:2181], sessionId=0x15278112832012c, negotiated timeout=10000
I0125 17:44:54.447082  5439 group.cpp:331] Group process (group(1)@10.xxx.xxx.25:28249) connected to ZooKeeper
I0125 17:44:54.447120  5439 group.cpp:805] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I0125 17:44:54.447140  5439 group.cpp:403] Trying to create path '/mesos' in ZooKeeper
I0125 17:44:54.448109  5439 detector.cpp:156] Detected a new leader: (id='28')
I0125 17:44:54.448246  5439 group.cpp:674] Trying to get '/mesos/json.info_0000000028' in ZooKeeper
I0125 17:44:54.448755  5440 detector.cpp:482] A new leading master ([email protected]:5050) is detected
I0125 17:44:54.448832  5440 sched.cpp:264] New master detected at [email protected]:5050
I0125 17:44:54.448977  5440 sched.cpp:274] No credentials provided. Attempting to register without authentication
I0125 17:44:54.449766  5440 sched.cpp:643] Framework registered with a636c17f-2b0d-46f7-9b15-5a3d6e9918a4-0003
16/01/25 17:44:54 INFO MesosSchedulerBackend: Registered as framework ID a636c17f-2b0d-46f7-9b15-5a3d6e9918a4-0003
16/01/25 17:44:54 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 51820.
16/01/25 17:44:54 INFO NettyBlockTransferService: Server created on 51820
16/01/25 17:44:54 INFO BlockManagerMaster: Trying to register BlockManager
16/01/25 17:44:54 INFO BlockManagerMasterEndpoint: Registering block manager 10.xxx.xxx.25:51820 with 441.9 MB RAM, BlockManagerId(driver, 10.xxx.xxx.25, 51820)
16/01/25 17:44:54 INFO BlockManagerMaster: Registered BlockManager
16/01/25 17:44:55 INFO ForEachDStream: metadataCleanupDelay = -1
16/01/25 17:44:55 INFO DirectKafkaInputDStream: metadataCleanupDelay = -1
16/01/25 17:44:55 INFO DirectKafkaInputDStream: Slide time = 30000 ms
16/01/25 17:44:55 INFO DirectKafkaInputDStream: Storage level = StorageLevel(false, false, false, false, 1)
16/01/25 17:44:55 INFO DirectKafkaInputDStream: Checkpoint interval = null
16/01/25 17:44:55 INFO DirectKafkaInputDStream: Remember duration = 30000 ms
16/01/25 17:44:55 INFO DirectKafkaInputDStream: Initialized and validated org.apache.spark.streaming.kafka.DirectKafkaInputDStream@270235cb
16/01/25 17:44:55 INFO ForEachDStream: Slide time = 30000 ms
16/01/25 17:44:55 INFO ForEachDStream: Storage level = StorageLevel(false, false, false, false, 1)
16/01/25 17:44:55 INFO ForEachDStream: Checkpoint interval = null
16/01/25 17:44:55 INFO ForEachDStream: Remember duration = 30000 ms
16/01/25 17:44:55 INFO ForEachDStream: Initialized and validated org.apache.spark.streaming.dstream.ForEachDStream@219b66f
16/01/25 17:44:55 INFO RecurringTimer: Started timer for JobGenerator at time 1453715100000
16/01/25 17:44:55 INFO JobGenerator: Started JobGenerator at 1453715100000 ms
16/01/25 17:44:55 INFO JobScheduler: Started JobScheduler
16/01/25 17:44:55 INFO StreamingContext: StreamingContext started
16/01/25 17:45:00 INFO VerifiableProperties: Verifying properties
16/01/25 17:45:00 INFO VerifiableProperties: Property auto.commit.interval.ms is overridden to 1000
16/01/25 17:45:00 INFO VerifiableProperties: Property auto.offset.reset is overridden to smallest
16/01/25 17:45:00 INFO VerifiableProperties: Property group.id is overridden to bid_event_consumer_group_zk
16/01/25 17:45:00 INFO VerifiableProperties: Property zookeeper.connect is overridden to test-master:2181
16/01/25 17:45:00 INFO VerifiableProperties: Property zookeeper.session.timeout.ms is overridden to 400
16/01/25 17:45:00 INFO VerifiableProperties: Property zookeeper.sync.time.ms is overridden to 200
16/01/25 17:45:00 ERROR JobScheduler: Error generating jobs for time 1453715100000 ms
java.lang.IllegalArgumentException: requirement failed: numRecords must not be negative
    at scala.Predef$.require(Predef.scala:233)
    at org.apache.spark.streaming.scheduler.StreamInputInfo.<init>(InputInfoTracker.scala:38)
    at org.apache.spark.streaming.kafka.DirectKafkaInputDStream.compute(DirectKafkaInputDStream.scala:165)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:350)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:350)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:349)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:349)
    at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:399)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:344)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:342)
    at scala.Option.orElse(Option.scala:257)
    at org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:339)
    at org.apache.spark.streaming.dstream.ForEachDStream.generateJob(ForEachDStream.scala:38)
    at org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:120)
    at org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:120)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
    at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
    at org.apache.spark.streaming.DStreamGraph.generateJobs(DStreamGraph.scala:120)
    at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$2.apply(JobGenerator.scala:247)
    at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$2.apply(JobGenerator.scala:245)
    at scala.util.Try$.apply(Try.scala:161)
    at org.apache.spark.streaming.scheduler.JobGenerator.generateJobs(JobGenerator.scala:245)
    at org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:181)
    at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:87)
    at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:86)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: numRecords must not be negative
    at scala.Predef$.require(Predef.scala:233)
    at org.apache.spark.streaming.scheduler.StreamInputInfo.<init>(InputInfoTracker.scala:38)
    at org.apache.spark.streaming.kafka.DirectKafkaInputDStream.compute(DirectKafkaInputDStream.scala:165)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:350)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:350)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:349)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:349)
    at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:399)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:344)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:342)
    at scala.Option.orElse(Option.scala:257)
    at org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:339)
    at org.apache.spark.streaming.dstream.ForEachDStream.generateJob(ForEachDStream.scala:38)
    at org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:120)
    at org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:120)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
    at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
    at org.apache.spark.streaming.DStreamGraph.generateJobs(DStreamGraph.scala:120)
    at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$2.apply(JobGenerator.scala:247)
    at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$2.apply(JobGenerator.scala:245)
    at scala.util.Try$.apply(Try.scala:161)
    at org.apache.spark.streaming.scheduler.JobGenerator.generateJobs(JobGenerator.scala:245)
    at org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:181)
    at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:87)
    at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:86)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
16/01/25 17:45:00 INFO StreamingContext: Invoking stop(stopGracefully=false) from shutdown hook
16/01/25 17:45:00 INFO JobGenerator: Stopping JobGenerator immediately
16/01/25 17:45:00 INFO RecurringTimer: Stopped timer for JobGenerator after time 1453715100000
16/01/25 17:45:00 INFO JobGenerator: Stopped JobGenerator
16/01/25 17:45:00 INFO JobScheduler: Stopped JobScheduler
16/01/25 17:45:00 INFO StreamingContext: StreamingContext stopped successfully
16/01/25 17:45:00 INFO SparkContext: Invoking stop() from shutdown hook
16/01/25 17:45:00 INFO SparkUI: Stopped Spark web UI at http://10.xxx.xxx.25:4040
16/01/25 17:45:00 INFO DAGScheduler: Stopping DAGScheduler
I0125 17:45:00.281819  5579 sched.cpp:1805] Asked to stop the driver
I0125 17:45:00.281951  5437 sched.cpp:1043] Stopping framework 'a636c17f-2b0d-46f7-9b15-5a3d6e9918a4-0003'
16/01/25 17:45:00 INFO MesosSchedulerBackend: driver.run() returned with code DRIVER_STOPPED
16/01/25 17:45:00 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/01/25 17:45:00 INFO MemoryStore: MemoryStore cleared
16/01/25 17:45:00 INFO BlockManager: BlockManager stopped
16/01/25 17:45:00 INFO BlockManagerMaster: BlockManagerMaster stopped
16/01/25 17:45:00 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/01/25 17:45:00 INFO SparkContext: Successfully stopped SparkContext
16/01/25 17:45:00 INFO ShutdownHookManager: Shutdown hook called
16/01/25 17:45:00 INFO ShutdownHookManager: Deleting directory /tmp/spark-442ac339-2ac7-427f-9e6b-5b5cb18a54dd
like image 475
giaosudau Avatar asked Jan 25 '16 09:01

giaosudau


2 Answers

I've experienced a problem like this very recently on a project with Kafka and Spark Streaming. What helped me in this situation is to delete checkpoint files of Spark Streaming manually and then start over again.

like image 79
user3791111 Avatar answered Oct 11 '22 12:10

user3791111


I just had the same problem getting data from a Kafka cluster for a specific topic, the problem is because I was consuming a topic using an offset higher than the biggest offset in the topic, this was because the offsets for some reason were reseted.

I have realized that when you try to consume data with a smaller offset compared with the ones in the topic, you will out of range error, something like:

org.apache.kafka.clients.consumer.OffsetOutOfRangeException: Offsets out of range with no configured reset policy for partitions.

The error you are having is because you are trying to consume data with a higher offset, that is because removing the checkpoints files or removing the offsets from Zookeeper will remediate the error mentioned below, essentially you will tell to Kafka: "ignore the checkpoints or offset, start from the beginning".

Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: numRecords must not be negative

The way go here is try to figure out why your offset consumer is higher than the biggest in Kafka topic and try to sync with them. If you simply want to start over, you can either (1) modify manually the offset from the folder where Kafka store them: set /consumers/offsets/groupid/topic/partition <new_offset_for_the_topic_partition> or (2) simply remove /consumers/offsets/groupid/topic/ and all consumers will start getting data from the earliest or oldest offset (will depends on your configuration).

like image 36
dbustosp Avatar answered Oct 11 '22 11:10

dbustosp