Describe the bug

After restarting the meta node, the StreamingClient is unable to connect. The node restart happens at line 495 (see the tmp.log file below).

This node currently is a leader. Serving at 192.168.1.1:5690

The node is still serving on the same address as before. Retrieving a StreamClient results in a transport error (`Transport error: transport error`). The error originates from the code below in observer_manager.rs:

/// `re_subscribe` is used to re-subscribe to the meta's notification.
async fn re_subscribe(&mut self)
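For context, this is a minimal sketch of what a re-subscribe loop with capped exponential backoff typically looks like. It is not the actual observer_manager.rs implementation; the `try_subscribe` helper, retry limits, and delays here are hypothetical stand-ins:

```rust
use std::time::Duration;

/// Hypothetical stand-in for one subscription attempt against the meta node.
/// Simulates a transport error on the first two attempts, then success.
fn try_subscribe(attempt: u32) -> Result<(), String> {
    if attempt < 2 {
        Err("transport error".to_string())
    } else {
        Ok(())
    }
}

/// Re-subscribe with capped exponential backoff, as an observer manager
/// might do after the meta node restarts. Returns the succeeding attempt.
fn re_subscribe() -> Result<u32, String> {
    let mut backoff = Duration::from_millis(100);
    let max_backoff = Duration::from_secs(5);
    for attempt in 0..10 {
        match try_subscribe(attempt) {
            Ok(()) => return Ok(attempt),
            Err(e) => {
                eprintln!("re-subscribe attempt {attempt} failed: {e}");
                // In real async code this would be tokio::time::sleep.
                std::thread::sleep(backoff);
                backoff = (backoff * 2).min(max_backoff);
            }
        }
    }
    Err("gave up re-subscribing".to_string())
}

fn main() {
    let attempt = re_subscribe().expect("should eventually succeed");
    println!("succeeded on attempt {attempt}");
}
```

If the server restarts but the retry loop gives up too early (or the transport error is treated as fatal), the observer never reconnects even though the node is serving on the same address again, which matches the symptom above.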

The entire log is in tmp.log; I added a few extra log statements when producing it.

  backtrace of `MetaError`:
   0: std::backtrace_rs::backtrace::libunwind::trace
             at /rustc/b8c35ca26b191bb9a9ac669a4b3f4d3d52d97fb1/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   1: std::backtrace_rs::backtrace::trace_unsynchronized
             at /rustc/b8c35ca26b191bb9a9ac669a4b3f4d3d52d97fb1/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2: std::backtrace::Backtrace::create
             at /rustc/b8c35ca26b191bb9a9ac669a4b3f4d3d52d97fb1/library/std/src/backtrace.rs:333:13
   3: <risingwave_meta::error::MetaError as core::convert::From<risingwave_meta::error::MetaErrorInner>>::from
             at ./src/meta/src/error.rs:66:33
   4: <T as core::convert::Into<U>>::into
             at /rustc/b8c35ca26b191bb9a9ac669a4b3f4d3d52d97fb1/library/core/src/convert/mod.rs:726:9
   5: <risingwave_meta::error::MetaError as core::convert::From<risingwave_rpc_client::error::RpcError>>::from
             at ./src/meta/src/error.rs:135:9
   6: <core::result::Result<T,F> as core::ops::try_trait::FromResidual<core::result::Result<core::convert::Infallible,E>>>::from_residual
             at /rustc/b8c35ca26b191bb9a9ac669a4b3f4d3d52d97fb1/library/core/src/result.rs:2108:27
   7: risingwave_meta::barrier::recovery::<impl risingwave_meta::barrier::GlobalBarrierManager<S>>::reset_compute_nodes::{{closure}}
             at ./src/meta/src/barrier/recovery.rs:346:9
   8: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/b8c35ca26b191bb9a9ac669a4b3f4d3d52d97fb1/library/core/src/future/mod.rs:91:19
   9: risingwave_meta::barrier::recovery::<impl risingwave_meta::barrier::GlobalBarrierManager<S>>::recovery::{{closure}}::{{closure}}::{{closure}}
             at ./src/meta/src/barrier/recovery.rs:138:44
  10: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/b8c35ca26b191bb9a9ac669a4b3f4d3d52d97fb1/library/core/src/future/mod.rs:91:19
 ...
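The frames above show the RPC error being wrapped via `From` conversions (RpcError into MetaError) when the `?` operator propagates it out of `reset_compute_nodes` (the `from_residual` frame). A minimal sketch of such a conversion chain, with hypothetical stand-in types rather than the real risingwave_meta definitions:

```rust
use std::fmt;

/// Hypothetical stand-in for the RPC-layer error.
#[derive(Debug, PartialEq)]
struct RpcError(String);

/// Hypothetical stand-in for MetaError, wrapping lower-level errors.
#[derive(Debug, PartialEq)]
enum MetaError {
    Rpc(RpcError),
}

impl fmt::Display for MetaError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            MetaError::Rpc(e) => write!(f, "rpc error: {}", e.0),
        }
    }
}

// The `?` operator in a function returning Result<_, MetaError> invokes
// this conversion automatically; that is the `from_residual` frame in
// the backtrace above.
impl From<RpcError> for MetaError {
    fn from(e: RpcError) -> Self {
        MetaError::Rpc(e)
    }
}

fn reset_compute_nodes() -> Result<(), MetaError> {
    // A failed RPC propagates up via `?`, converting RpcError -> MetaError.
    let rpc_result: Result<(), RpcError> =
        Err(RpcError("transport error".to_string()));
    rpc_result?;
    Ok(())
}

fn main() {
    let err = reset_compute_nodes().unwrap_err();
    println!("{err}");
}
```

This is why the backtrace shows the failure surfacing in barrier recovery even though the root cause is the transport-level reconnect.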

To Reproduce

  • Checkout 7bdef7c53f3516b0d8d1c35721b9dabce821b369 on branch jm/multi-meta-etcd-grafana
  • Run:
MADSIM_CONFIG_HASH=C2F7CAF3EA64B636 MADSIM_TEST_SEED=167077265649007 RUST_BACKTRACE=1 RUST_LOG=info risedev sslt -- --kill-rate=0.1 --kill "e2e_test/**/**/*.slt" > tmp.log 2>&1 ; code tmp.log

Expected behavior

No response

Additional context

tmp.log


Please rebase onto the main branch; this might have been resolved in #6709.


Thank you very much for the hint. The error message has definitely changed, but we still run into a panic, so I will keep this bug open for now. I'm not sure whether it is the same issue, though.
