Testnet upgrading to V13 fail after git pull + run_node.sh #13

powerpsy · 2024-02-25T16:54:06Z

Did you read the documentation and guides?

I have inspected the documentation.

Is there an existing issue?

I have searched the existing issues.

Description of the problem

I upgraded my testnet node from V12 to V13 as recommended: stop docker, then launch run_node.sh. It worked, but there was no pruning option in the env/ files and no download of the ParityDB database. Checking the docs, I saw that it is recommended to git pull the aleph-zero-node/ directory.

After the git pull and after setting the correct ports in the env/ files, I saw the pruning option. When I launched run_node.sh, it downloaded the 58 GB of data. The installation finished with "Testnet successfully updated" and a message to remove db/ manually.

However, the container did not stay up: it was continuously restarting (status: restarting (123)).
I stopped docker, updated it again, ran git pull, and restarted the update to get the pruning option and download the database.

When restarting, I got this error at the end of the run_node.sh install:
docker: Error response from daemon: driver failed programming external connectivity on endpoint powerpsy (b8ac417cf3f0e13e5a128d77e91348b6b6bcd21993da9d46f55663d5765c2d3d): Error starting userland proxy: listen tcp4 0.0.0.0:31343: bind: address already in use.

I then closed docker and renamed the container so that it could be rebuilt (I had a message that the container name was already in use).
run_node.sh then ran, but I again got an error on binding ports: Error starting userland proxy: listen tcp4 0.0.0.0:31343: bind: address already in use.
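
A minimal sketch for tracking down what is holding the port named in the bind error above. The function names port_from_bind_error and show_port_owner are hypothetical helpers, not part of run_node.sh; the sketch only assumes that ss (from iproute2) and the docker CLI are installed.

```shell
# Hypothetical helper: extract the host port from a Docker
# "address already in use" error message.
port_from_bind_error() {
    # Matches e.g. "listen tcp4 0.0.0.0:31343: bind: address already in use"
    printf '%s\n' "$1" |
        sed -n 's/.*listen tcp[46]* [^ ]*:\([0-9][0-9]*\): bind: address already in use.*/\1/p'
}

# Hypothetical helper: list whatever currently claims the given port.
show_port_owner() {
    # Host processes listening on the port (process names may require sudo).
    ss -ltnp "sport = :$1" || true
    # Containers, running or not, that publish the port.
    docker ps -a --filter "publish=$1" --format '{{.ID}} {{.Names}} {{.Status}}' || true
}

# Usage (not run automatically):
#   show_port_owner "$(port_from_bind_error "$error_message")"
```

If a container shows up in the second listing, the port is still reserved by an old or restarting container rather than by another program on the host.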

After closing docker (docker stop $(docker ps -a -q)) and rebooting, I got the following:

2024-02-25 06:22:40.564  INFO main sc_cli::runner: Aleph Node
2024-02-25 06:22:40.564  INFO main sc_cli::runner: ✌️  version 0.13.0-ebd98ce0c88
2024-02-25 06:22:40.564  INFO main sc_cli::runner: ❤️  by Cardinal:Aleph Zero Foundation, 2021-2024
2024-02-25 06:22:40.564  INFO main sc_cli::runner: 📋 Chain specification: Aleph Zero Testnet
2024-02-25 06:22:40.564  INFO main sc_cli::runner: 🏷️  Node name: powerpsy
2024-02-25 06:22:40.564  INFO main sc_cli::runner: 👤 Role: AUTHORITY
2024-02-25 06:22:40.564  INFO main sc_cli::runner: 💾 Database: ParityDb at /data/chains/testnet/paritydb/full
./docker_entrypoint.sh: line 141: CUSTOM_ARGS: unbound variable
CLI parameter `--execution` has no effect anymore and will be removed in the future!

====================

Version: 0.13.0-ebd98ce0c88

   0: sp_panic_handler::set::{{closure}}
   1: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/alloc/src/boxed.rs:1987:9
      std::panicking::rust_panic_with_hook
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:695:13
   2: std::panicking::begin_panic_handler::{{closure}}
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:580:13
   3: std::sys_common::backtrace::rust_end_short_backtrace
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/sys_common/backtrace.rs:150:18
   4: rust_begin_unwind
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:578:5
   5: core::panicking::panic_fmt
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:67:14
   6: core::panicking::panic
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:117:5
   7: parity_db::column::HashColumn::get_in_index
   8: parity_db::column::HashColumn::get
   9: parity_db::db::Db::get
  10: <sc_client_db::parity_db::DbAdapter as sp_database::Database<H>>::get
  11: sc_state_db::StateDbSync<BlockHash,Key,D>::new
  12: sc_service::builder::new_db_backend
  13: aleph_node::service::new_partial
  14: aleph_node::service::new_authority
  15: aleph_node::main::{{closure}}::{{closure}}
  16: sc_cli::runner::Runner<C>::run_node_until_exit
  17: aleph_node::main
  18: std::sys_common::backtrace::rust_begin_short_backtrace
  19: main
  20: <unknown>
  21: __libc_start_main
  22: _start

Thread 'main' panicked at 'called Option::unwrap() on a None value', /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/parity-db-0.4.12/src/file.rs:130

This is a bug. Please report it at:

    docs.alephzero.org

Information on your setup.

Ubuntu 22.04 LTS desktop, freshly updated
Intel 11900 + 64 GB DDR4, 2 TB NVMe SSD, B560M chipset

running testnet 12.2 (node powerpsy), upgrading to 13
latest docker install without root privilege, validator node

Steps to reproduce

modify env/validator file (ports or anything)
stop docker
./run_node.sh
backup env/ files
stop docker
git pull
modify env/validator_testnet (with working ports)
./run_node.sh

"docker ps -a" shows a restarting container
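
A minimal sketch for inspecting a container stuck in the restarting state reported above. The helper names exit_code_from_status and show_restarting are hypothetical; only standard docker CLI subcommands (ps, inspect, logs) are assumed.

```shell
# Hypothetical helper: pull the exit code out of a "docker ps" status
# string such as "Restarting (123) 4 seconds ago".
exit_code_from_status() {
    printf '%s\n' "$1" | sed -n 's/^[A-Za-z]* (\([0-9][0-9]*\)).*/\1/p'
}

# Hypothetical helper: for every restarting container, print its last
# exit code, its restart count, and the tail of its log.
show_restarting() {
    docker ps -a --filter "status=restarting" --format '{{.Names}}' |
    while read -r name; do
        echo "== ${name}: exit=$(docker inspect --format '{{.State.ExitCode}} restarts={{.RestartCount}}' "$name")"
        docker logs --tail 30 "$name" 2>&1
    done
}

# Usage (not run automatically):
#   show_restarting
```

The log tail is usually where the node's own panic message or startup error appears, which a bare "docker ps -a" does not show.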

Did you attach relevant logs?

I have attached logs (if relevant).

powerpsy · 2024-02-25T17:00:02Z

One day later, I managed to make it work after many attempts: I stopped the docker service, did a fresh reboot, and launched
sudo ./run_node.sh
with root privileges.

It printed a lot of warnings at startup while it was catching up with the downloaded database, then I was validating again (but as root!). My new additional validator on testnet is AZERO_STAKING_TESTNET.

I then stopped the container and launched ./run_node.sh again, and I got the same error as before.

Remark: when it failed, I tried a new node installation from scratch on a fresh Ubuntu Server 22.04: creating a testnet address, using the faucet, installing the node under docker, and modifying the ports. Only ParityDB was downloaded, and it works like a charm!
--> Installing from scratch works perfectly, but upgrading does not.

Good luck! I hope you can find a general way to upgrade mainnet!

piotrMocz · 2024-02-26T09:48:08Z

Thank you for reporting and sorry for the problems — I’ll do my best to investigate and fix this.

fixxxedpoint · 2024-02-29T15:45:26Z

The first part looks like the container was stuck in some state such as restarting, and the script was unable to stop it and release its allocated ports. This PR should improve that scenario: Cardinal-Cryptography/aleph-node-runner#63. The second one looks scarier: it looks like a corrupted database. What is your disk quota?
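
A rough sketch of clearing a stuck container by hand so that its ports are released before run_node.sh is launched again. This is not the change made in the PR mentioned above; stop_and_remove and the DOCKER variable are hypothetical names, and the container name in the usage line is taken from the node name in the report.

```shell
# Assumption: DOCKER can be overridden (e.g. "sudo docker"-style wrappers
# or a stub for testing); it defaults to the plain docker CLI.
DOCKER="${DOCKER:-docker}"

# Hypothetical helper: stop a container that keeps restarting and remove it.
stop_and_remove() {
    # Disable the restart policy first so Docker stops relaunching it.
    "$DOCKER" update --restart=no "$1" >/dev/null 2>&1 || true
    # Ask it to stop gracefully, waiting up to 30 seconds.
    "$DOCKER" stop -t 30 "$1" >/dev/null 2>&1 || true
    # Force removal, which also frees the name and the published ports.
    "$DOCKER" rm -f "$1" >/dev/null 2>&1 || true
    # Succeed only if no container with that exact name remains.
    [ -z "$("$DOCKER" ps -a -q --filter "name=^/$1\$")" ]
}

# Usage (not run automatically):
#   stop_and_remove powerpsy && ./run_node.sh
```

Removing the container does not touch the database directory on the host, so this addresses only the port and name conflicts, not the ParityDB panic.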

powerpsy · 2024-02-29T18:22:16Z

I have a 2 TB disk, with only Ubuntu 22.04 and the aleph node installed on it.
I had about 500 GB used once everything was downloaded after the update, so there is still plenty of room.

Note that I didn't follow good practice afterwards: I tried running the node with sudo and it worked. I stopped all containers and stopped docker as well (systemctl stop). When I relaunch the script without sudo, it no longer works, so I have to stick with sudo for now.

fixxxedpoint mentioned this issue Feb 26, 2024

improved stop-container procedure Cardinal-Cryptography/aleph-node-runner#63

Merged
