Testnet upgrading to V13 fail after git pull + run_node.sh #13

Open
3 tasks done
powerpsy opened this issue Feb 25, 2024 · 4 comments

@powerpsy

Did you read the documentation and guides?

  • I have inspected the documentation.

Is there an existing issue?

  • I have searched the existing issues.

Description of the problem

I upgraded the testnet node from V12 to V13 as recommended: stop docker, then launch run_node.sh. It worked, but there was no pruning option in the env/ files and no download of ParityDB. Checking the docs, I saw that it is recommended to git pull the aleph-zero-node/ directory.

After the git pull and setting the correct ports in the env/ files, I saw the pruning option. When I launched run_node.sh, it downloaded 58 GB of data. The installation finished with "Testnest sucessfully updated" and a message to remove db/ manually.

Nevertheless, the container did not start; it kept restarting (status: restarting (123)).
I stopped docker, updated it again, did a git pull, and restarted the update to get the pruning option and download the db.
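
A container stuck in a restart loop like the one described here can be spotted and inspected with standard Docker commands. The following is a minimal sketch, not part of run_node.sh: the helper names `restarting_containers` and `inspect_container` are made up for illustration, and the container name is whatever the node was started with (powerpsy in this report).

```shell
# List containers whose status starts with "Restarting".
# Reads "name<TAB>status" lines on stdin, for example from:
#   docker ps -a --format '{{.Names}}\t{{.Status}}'
# (hypothetical helper name)
restarting_containers() {
    awk -F '\t' 'tolower($2) ~ /^restarting/ { print $1 }'
}

# Print the state and the most recent log lines of one container.
# Meant to be run by hand (hypothetical helper name).
inspect_container() {
    name="$1"
    docker inspect \
        --format '{{.State.Status}} exit={{.State.ExitCode}} restarts={{.RestartCount}}' \
        "$name"
    docker logs --tail 50 "$name"
}
```

Typical use would be `docker ps -a --format '{{.Names}}\t{{.Status}}' | restarting_containers`, followed by `inspect_container` on each name printed, to see the panic or error that makes the container exit.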

When restarting, I got this error at the end of the run_node.sh install:
docker: Error response from daemon: driver failed programming external connectivity on endpoint powerpsy (b8ac417cf3f0e13e5a128d77e91348b6b6bcd21993da9d46f55663d5765c2d3d): Error starting userland proxy: listen tcp4 0.0.0.0:31343: bind: address already in use.

I closed docker and renamed the container so it could be rebuilt (I had a message that the container name was already in use).
run_node.sh then ran properly, but I got an error on binding ports: Error starting userland proxy: listen tcp4 0.0.0.0:31343: bind: address already in use.
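
An "address already in use" error means something on the host still holds the port, often an older container that was never stopped. As a rough sketch (the helper names `extract_bind_port` and `show_port_owner` are invented here and are not part of the node runner), the port can be pulled out of the Docker error text and then looked up:

```shell
# Extract the port number from a Docker "address already in use" error.
# (hypothetical helper name)
extract_bind_port() {
    printf '%s\n' "$1" |
        sed -n 's/.*listen tcp[46]* [^ ]*:\([0-9][0-9]*\): bind: address already in use.*/\1/p'
}

# Show what currently holds a TCP port. Meant to be run by hand
# (hypothetical helper name).
show_port_owner() {
    port="$1"
    # Listening TCP sockets on that port; process names are only
    # shown when run with sufficient privileges.
    ss -ltnp "sport = :$port"
    # Containers that Docker reports as publishing that port.
    docker ps -a --filter "publish=$port" --format '{{.Names}}\t{{.Status}}'
}
```

For the message in this report, `extract_bind_port` yields 31343, and `show_port_owner 31343` would then show whether a leftover container or another process is still listening there.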

After closing docker (docker stop $(docker ps -a -q)) and rebooting, I got:

2024-02-25 06:22:40.564  INFO main sc_cli::runner: Aleph Node
2024-02-25 06:22:40.564  INFO main sc_cli::runner: ✌️  version 0.13.0-ebd98ce0c88
2024-02-25 06:22:40.564  INFO main sc_cli::runner: ❤️  by Cardinal:Aleph Zero Foundation, 2021-2024
2024-02-25 06:22:40.564  INFO main sc_cli::runner: 📋 Chain specification: Aleph Zero Testnet
2024-02-25 06:22:40.564  INFO main sc_cli::runner: 🏷️  Node name: powerpsy
2024-02-25 06:22:40.564  INFO main sc_cli::runner: 👤 Role: AUTHORITY
2024-02-25 06:22:40.564  INFO main sc_cli::runner: 💾 Database: ParityDb at /data/chains/testnet/paritydb/full
./docker_entrypoint.sh: line 141: CUSTOM_ARGS: unbound variable
CLI parameter `--execution` has no effect anymore and will be removed in the future!

====================

Version: 0.13.0-ebd98ce0c88

   0: sp_panic_handler::set::{{closure}}
   1: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/alloc/src/boxed.rs:1987:9
      std::panicking::rust_panic_with_hook
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:695:13
   2: std::panicking::begin_panic_handler::{{closure}}
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:580:13
   3: std::sys_common::backtrace::rust_end_short_backtrace
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/sys_common/backtrace.rs:150:18
   4: rust_begin_unwind
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:578:5
   5: core::panicking::panic_fmt
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:67:14
   6: core::panicking::panic
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:117:5
   7: parity_db::column::HashColumn::get_in_index
   8: parity_db::column::HashColumn::get
   9: parity_db::db::Db::get
  10: <sc_client_db::parity_db::DbAdapter as sp_database::Database<H>>::get
  11: sc_state_db::StateDbSync<BlockHash,Key,D>::new
  12: sc_service::builder::new_db_backend
  13: aleph_node::service::new_partial
  14: aleph_node::service::new_authority
  15: aleph_node::main::{{closure}}::{{closure}}
  16: sc_cli::runner::Runner<C>::run_node_until_exit
  17: aleph_node::main
  18: std::sys_common::backtrace::rust_begin_short_backtrace
  19: main
  20: <unknown>
  21: __libc_start_main
  22: _start

Thread 'main' panicked at 'called Option::unwrap() on a None value', /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/parity-db-0.4.12/src/file.rs:130

This is a bug. Please report it at:

    docs.alephzero.org

Information on your setup.

Ubuntu 22.04 LTS desktop, freshly updated
Intel 11900, 64 GB DDR4, 2 TB NVMe SSD, B560M chipset

Running testnet 12.2 (node powerpsy), upgrading to 13
Latest docker, installed without root privileges; validator node

Steps to reproduce

  1. modify env/validator file (ports or anything)
  2. stop docker
  3. ./run_node.sh
  4. backup env/ files
  5. stop docker
  6. git pull
  7. modify env/validator_testnet (with working ports)
  8. ./run_node.sh

"docker ps -a" shows a restarting container

Did you attach relevant logs?

  • I have attached logs (if relevant).
@powerpsy
Author

powerpsy commented Feb 25, 2024

One day later, I managed to make it work after many trials: I stopped the docker service, did a fresh reboot, and launched
sudo ./run_node.sh with root privileges.

It printed a load of warnings at start while it was catching up with the downloaded database, then I was validating again (but as root!): my new additional validator on testnet is AZERO_STAKING_TESTNET.

I then stopped the container, launched ./run_node.sh again, and got the same error as before.

Remark: when it failed, I tried a new node installation from scratch on a fresh Ubuntu Server 22.04: creating a testnet address, using the faucet, installing the node under docker, modifying ports. Only ParityDB was loaded, and it works like a charm!
--> Installing from scratch is perfect, but upgrading is not.

Good luck! I hope you can find a global way to upgrade mainnet!

@piotrMocz

Thank you for reporting and sorry for the problems — I’ll do my best to investigate and fix this.

@fixxxedpoint

fixxxedpoint commented Feb 29, 2024

The first part looks like the container was stuck in some state like restarting, and the script was unable to stop it and release its allocated ports. This PR should improve that scenario: Cardinal-Cryptography/aleph-node-runner#63. The second one looks scarier: it looks like a corrupted database. What is your disk quota?
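
For anyone answering the disk question on their own machine, here is a small sketch that reports free space and directory size. It measures free space on the filesystem rather than per-user quotas, the helper names `avail_kib` and `dir_kib` are invented for illustration, and the path to pass in is wherever the node's data lives on the host, which this thread does not state.

```shell
# Free space, in KiB, on the filesystem that holds the given path.
# -P forces one output line per filesystem so column 4 is reliable.
# (hypothetical helper name)
avail_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Total size, in KiB, of a directory tree, for example the node's
# database directory. (hypothetical helper name)
dir_kib() {
    du -sk "$1" | awk '{ print $1 }'
}
```

Comparing `dir_kib` on the data directory against `avail_kib` on the same path gives a quick idea of whether a download could have been cut short by a full disk.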

@powerpsy
Author

I have a 2 TB disk, with only Ubuntu 22.04 and the Aleph node installed on it.
About 500 GB was used once everything had downloaded after the update, so there is still plenty of room.

Note that I didn't follow good practice afterwards: I tried to run the node with sudo and it worked. I stopped all containers and stopped docker as well (systemctl stop). When I relaunch the script without sudo, it no longer works; I have to stick with sudo now.
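
One possible explanation for the sudo-only behaviour, which nothing in this thread confirms, is that the run with sudo left files in the node's working directory owned by root, so the unprivileged run can no longer write to them. A quick way to test that guess is to list files not owned by the current user; `not_owned_by_me` is a made-up helper name and the directory to check is an assumption left to the reader.

```shell
# Print every file under a directory that is not owned by the
# user running the command. (hypothetical helper name)
not_owned_by_me() {
    find "$1" ! -user "$(id -u)" -print
}
```

If running it on the node's directory without sudo prints nothing, ownership is probably not the cause and the guess can be dropped.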


3 participants