Hi All,
I am getting the exception below while trying to load data into ScyllaDB using a PySpark application (running on AWS EMR):
Traceback (most recent call last):
File "/mnt/tmp/aip-workflows/scylla-load/src/s3-to-scylla.py", line 215, in
source_json.write.format(cassandra_write_format).mode('append').options(
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 1461, in save
File "/usr/lib/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py", line 179, in deco
File "/usr/lib/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o115.save.
: org.apache.spark.SparkException: Job aborted due to stage failure: Authorized committer (attemptNumber=0, stage=2, partition=2133) failed; but task commit success, data duplication may happen. reason=ExceptionFailure(java.io.IOException,Failed to write statements to oeidp_artemis.json_content. The
latest exception was
Cassandra failure during write query at consistency ALL (2 responses were required but only 1 replica responded, 1 failed)
Please check the executor logs for more exceptions and information
,[Ljava.lang.StackTraceElement;@29602c51,java.io.IOException: Failed to write statements to oeidp_artemis.json_content. The
latest exception was
Cassandra failure during write query at consistency ALL (2 responses were required but only 1 replica responded, 1 failed)
Please check the executor logs for more exceptions and information
at com.datastax.spark.connector.writer.AsyncStatementWriter.$anonfun$close$2(TableWriter.scala:282)
at scala.Option.map(Option.scala:230)
at com.datastax.spark.connector.writer.AsyncStatementWriter.close(TableWriter.scala:277)
at com.datastax.spark.connector.datasource.CassandraDriverDataWriter.commit(CasssandraDriverDataWriterFactory.scala:46)
at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.$anonfun$run$1(WriteToDataSourceV2Exec.scala:459)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1409)
at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.run(WriteToDataSourceV2Exec.scala:486)
at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.run$(WriteToDataSourceV2Exec.scala:425)
at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:491)
at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.$anonfun$writeWithV2$2(WriteToDataSourceV2Exec.scala:388)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
at org.apache.spark.scheduler.Task.run(Task.scala:143)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:629)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:95)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:632)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)
Suppressed: java.io.IOException: Failed to write statements to oeidp_artemis.json_content. The
latest exception was
Cassandra failure during write query at consistency ALL (2 responses were required but only 1 replica responded, 1 failed)
Please check the executor logs for more exceptions and information
at com.datastax.spark.connector.writer.AsyncStatementWriter.$anonfun$close$2(TableWriter.scala:282)
at scala.Option.map(Option.scala:230)
at com.datastax.spark.connector.writer.AsyncStatementWriter.close(TableWriter.scala:277)
at com.datastax.spark.connector.datasource.CassandraDriverDataWriter.abort(CasssandraDriverDataWriterFactory.scala:51)
at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.$anonfun$run$6(WriteToDataSourceV2Exec.scala:482)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1420)
... 15 more
Suppressed: java.io.IOException: Failed to write statements to oeidp_artemis.json_content. The
latest exception was
Cassandra failure during write query at consistency ALL (2 responses were required but only 1 replica responded, 1 failed)
Please check the executor logs for more exceptions and information
at com.datastax.spark.connector.writer.AsyncStatementWriter.$anonfun$close$2(TableWriter.scala:282)
at scala.Option.map(Option.scala:230)
at com.datastax.spark.connector.writer.AsyncStatementWriter.close(TableWriter.scala:277)
at com.datastax.spark.connector.datasource.CassandraDriverDataWriter.close(CasssandraDriverDataWriterFactory.scala:56)
at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.$anonfun$run$9(WriteToDataSourceV2Exec.scala:486)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1431)
... 15 more
,Some(org.apache.spark.ThrowableSerializationWrapper@7c4eed1c),Vector(AccumulableInfo(83,None,Some(64134),None,false,true,None), AccumulableInfo(85,None,Some(0),None,false,true,None), AccumulableInfo(86,None,Some(10),None,false,true,None), AccumulableInfo(112,None,Some(150024615),None,false,true,None), AccumulableInfo(113,None,Some(19624),None,false,true,None)),Vector(LongAccumulator(id: 83, name: Some(internal.metrics.executorRunTime), value: 64134), LongAccumulator(id: 85, name: Some(internal.metrics.resultSize), value: 0), LongAccumulator(id: 86, name: Some(internal.metrics.jvmGCTime), value: 10), LongAccumulator(id: 112, name: Some(internal.metrics.input.bytesRead), value: 150024615), LongAccumulator(id: 113, name: Some(internal.metrics.input.recordsRead), value: 19624)),WrappedArray(3078441872, 162501632, 0, 0, 35523537, 0, 35523537, 0, 290824738, 0, 0, 0, 0, 0, 0, 0, 21, 1468, 8, 1076, 2544))
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:3067)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:3003)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:3002)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:3002)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleStageFailed$1(DAGScheduler.scala:1311)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleStageFailed$1$adapted(DAGScheduler.scala:1311)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.scheduler.DAGScheduler.handleStageFailed(DAGScheduler.scala:1311)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3268)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3205)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3194)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:1041)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2406)
at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2(WriteToDataSourceV2Exec.scala:385)
at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2$(WriteToDataSourceV2Exec.scala:359)
at org.apache.spark.sql.execution.datasources.v2.AppendDataExec.writeWithV2(WriteToDataSourceV2Exec.scala:225)
at org.apache.spark.sql.execution.datasources.v2.V2ExistingTableWriteExec.run(WriteToDataSourceV2Exec.scala:337)
at org.apache.spark.sql.execution.datasources.v2.V2ExistingTableWriteExec.run$(WriteToDataSourceV2Exec.scala:336)
at org.apache.spark.sql.execution.datasources.v2.AppendDataExec.run(WriteToDataSourceV2Exec.scala:225)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:113)
at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:108)
at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:255)
at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:129)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$9(SQLExecution.scala:165)
at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:108)
at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:255)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$8(SQLExecution.scala:165)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:276)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:164)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:70)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:101)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:503)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:76)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:503)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:33)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:33)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:33)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:479)
at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:101)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:88)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:86)
at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:151)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:859)
at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:312)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:248)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:840)
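For reference, the write at line 215 of s3-to-scylla.py looks roughly like the following (a simplified sketch; the session setup and S3 path are placeholders, only the keyspace/table names are taken from the error message):

```python
from pyspark.sql import SparkSession

# Placeholder session; the real job passes connection details via spark-submit.
spark = SparkSession.builder.appName("s3-to-scylla").getOrCreate()

# Format string used by the Spark Cassandra Connector.
cassandra_write_format = "org.apache.spark.sql.cassandra"

# Illustrative input path; the actual job reads JSON from S3.
source_json = spark.read.json("s3://<bucket>/<prefix>/")

source_json.write.format(cassandra_write_format).mode('append').options(
    keyspace="oeidp_artemis",  # keyspace/table as reported in the error
    table="json_content",
).save()
```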
Also, I am unable to find the error log path for ScyllaDB installed on the CentOS system. Can you please let me know a possible solution for this issue?
Other table loads are running fine; I am facing this issue for only one table.
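One thing I plan to double-check on my side is the connector's write consistency level, since the error mentions consistency ALL. As I understand it from the Spark Cassandra Connector docs, it is controlled by spark.cassandra.output.consistency.level, set roughly like this (illustrative values; the host is a placeholder and LOCAL_QUORUM is the connector's documented default):

```python
from pyspark.sql import SparkSession

# Illustrative only: where the connector's write consistency level is configured.
spark = (
    SparkSession.builder.appName("s3-to-scylla")
    .config("spark.cassandra.connection.host", "<scylla-node-ip>")
    .config("spark.cassandra.output.consistency.level", "LOCAL_QUORUM")
    .getOrCreate()
)
```

If that setting is not the right place to look, please point me in the right direction.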