Program stalls/can't download entire blog #8
If you can, apply this patch, either by hand or with GNU patch (copy it to a text file, including the whitespace at the end, and run GNU patch on it). It assumes 10 seconds should be enough for everything to finish, but if you're more patient you could try changing the number in the `timeout = time.time() + 10` line.

```diff
diff --git a/tumblr_backup.py b/tumblr_backup.py
index d9fb4ea..292fbc7 100755
--- a/tumblr_backup.py
+++ b/tumblr_backup.py
@@ -1520,7 +1520,7 @@ class ThreadPool:
         self.queue = LockedQueue(threading.RLock(), max_queue)
         self.quit = threading.Event()
         self.abort = threading.Event()
-        self.threads = [threading.Thread(target=self.handler) for _ in range(thread_count)]
+        self.threads = [threading.Thread(target=self.handler, daemon=True) for _ in range(thread_count)]
         for t in self.threads:
             t.start()
@@ -1540,9 +1540,16 @@ class ThreadPool:
     def cancel(self):
         self.abort.set()
         no_internet.destroy()
+
+        import traceback
+        timeout = time.time() + 10
         for i, t in enumerate(self.threads, start=1):
             logger.status('Stopping threads {}{}\r'.format(' ' * i, '.' * (len(self.threads) - i)))
-            t.join()
+            t.join(max(1, timeout - time.time()))
+        for t in self.threads:
+            if t.is_alive():
+                print(t, 'is stuck')
+                traceback.print_stack(sys._current_frames()[t.ident])
         logger.info('Backup canceled.\n')
```
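To see what the patch's stall detection does in isolation, here is a minimal, runnable sketch of the same pattern: join each worker against a shared deadline, then dump a stack trace for anything still alive. The names (`stop_threads`, `workers`, `grace`) are illustrative, not from tumblr_backup.py.

```python
import sys
import threading
import time
import traceback

def stop_threads(threads, grace=10):
    # Give all threads one shared deadline instead of joining each one forever.
    deadline = time.time() + grace
    for t in threads:
        # Still wait at least 1 second per thread, even after the deadline passes.
        t.join(max(1, deadline - time.time()))
    for t in threads:
        if t.is_alive():
            print(t, 'is stuck')
            # sys._current_frames() maps thread ids to their current stack frames,
            # so we can see exactly where a stuck thread is blocked.
            traceback.print_stack(sys._current_frames()[t.ident])

if __name__ == '__main__':
    # Two daemon threads that block far longer than the grace period.
    workers = [threading.Thread(target=time.sleep, args=(60,), daemon=True)
               for _ in range(2)]
    for t in workers:
        t.start()
    stop_threads(workers, grace=2)  # both sleepers get reported as stuck
```

Marking the workers as daemon threads matters here, as in the patch: it lets the interpreter exit even if some of them never return.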
Thank you for your response! Unfortunately, I haven't been able to recreate the issue because a new one has arisen. The program now gets stuck with the message "DNS probe finished: No internet. Waiting...o finish", which is confusing because it says this even though my computer is connected to the internet and can load websites. (Sorry that it said I marked this as completed, I am apparently bad with websites too and marked it accidentally, haha)
Hm, that's weird. That would imply that your computer is somehow unable to reach Google DNS (8.8.8.8), which the script queries after a failed web request to check whether you have internet at all. Can you try pinging or otherwise connecting to 8.8.8.8 from the command line?
I didn't have any issues pinging or connecting to 8.8.8.8.
For now you can bypass the check by adding a line to `is_dns_working` in util.py, like this:

```diff
diff --git a/util.py b/util.py
index 3bbd5c3..dfef1dc 100644
--- a/util.py
+++ b/util.py
@@ -97,6 +97,7 @@ DNS_QUERY = b'\xf1\xe1\x01\x00\x00\x01\x00\x00\x00\x00\x00\x00\x06google\x03com\
 
 def is_dns_working(timeout=None):
+    return True
     try:
         with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
             if timeout is not None:
```

I haven't decided what to do about this yet. I suppose having a way to specify an alternate DNS server, or to disable the feature entirely, might be useful if Google DNS isn't available. I can't think of any reason why dig or nslookup would succeed but the check in the script would fail, unless your internet connection is so slow that it takes more than 5 seconds to get a reply - maybe an option to change the timeout would help?
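For context, here is a self-contained sketch of what a check along these lines does: send a hand-rolled DNS query for google.com over UDP to 8.8.8.8 and treat any reply as success. The query bytes past the truncation above and the exact error handling are reconstructed, so the real util.py may differ.

```python
import socket

DNS_SERVER = ('8.8.8.8', 53)
# A raw DNS query packet: id 0xf1e1, flags 0x0100 (recursion desired),
# one question: google.com, type A, class IN.
DNS_QUERY = (b'\xf1\xe1\x01\x00\x00\x01\x00\x00\x00\x00\x00\x00'
             b'\x06google\x03com\x00\x00\x01\x00\x01')

def is_dns_working(timeout=None):
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            if timeout is not None:
                sock.settimeout(timeout)
            sock.sendto(DNS_QUERY, DNS_SERVER)
            sock.recvfrom(1024)  # any reply at all counts as working DNS
    except OSError:
        return False
    return True

print(is_dns_working(timeout=5))
```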
I just pushed 91d872a, which provides a --skip-dns-check option you can use to work around that issue. Let me know if you run into anything else.
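(Going by the invocations shown elsewhere in this thread, using the new flag would look something like the following, with blog-name as a placeholder:)

```
tumblr_backup.py --skip-dns-check blog-name
```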
I might be able to add more context, as it seems to be specific posts that throw the DNS error for me. A specific .[post id].html.[string] file will refuse to download after the error is thrown; if I wait for all the other queued files to finish (so I can tell which one it is), get the post id, delete my reblog from Tumblr, and rerun, then it continues until it hits the next one. I'm unsure what the posts have in common, but this one threw the error twice, once in a 2022 reblog and once in a 2021 reblog: https://www.tumblr.com/bunjywunjy/669018562974957568/petermorwood-caitlynlynch-the1920sinpictures
This is known - the script only attempts to check for a working internet connection when some network request fails. I had assumed that basically everyone with a working internet connection would be able to send a DNS query to Google, but apparently this is not true - some people simply can't. I think the only reason this DNS request would (falsely) fail is if your internet connection is aggressively firewalled, e.g. because you are using a VPN client that tries to prevent DNS traffic from leaking onto the public internet. Does that apply to you? I suppose this should be changed to a simple HTTP request - perhaps a HEAD request to Tumblr's homepage.
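A sketch of what that suggested HTTP check might look like - purely an illustration of the idea, not code from the script, and the function name is made up:

```python
import http.client

def is_internet_working(timeout=5):
    # Any response from Tumblr's homepage - even an error status - proves
    # that the network path works, which is all this check needs to know.
    try:
        conn = http.client.HTTPSConnection('www.tumblr.com', timeout=timeout)
        try:
            conn.request('HEAD', '/')
            conn.getresponse()
        finally:
            conn.close()
    except (OSError, http.client.HTTPException):
        return False
    return True
```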
No, as far as I know my internet connection is completely VPN-free. |
I'm currently having the same problem as OP originally had when I try to back up my blog - it stalls at around 7700/51000. No DNS error messages on my end, though. I assume it must be getting stuck on a particular post. Any thoughts on how I could try to bypass it? Would the same fixes suggested earlier in the thread be worth trying?
Seconding this - I have the same problem as OP: my backup gets consistently stuck at 25200/33725, all four times I've tried to back up the blog! I also tried going by year and immediately hit the stall once I try 2012.
Also having the same problem - on two of my sub-1k post sideblogs, everything was fine, but when I moved on to the first of my more moderately sized sideblogs, it started consistently stalling at 2250 to 2299 (of 4449 expected).
I also have this issue when trying to back up a larger blog. I had first run the command to back up only the original posts, and that worked fine, but when trying to back up all of it, it became a lot slower and stalled.
This hasn't occurred for me yet, but I made a version with stall detection that you can run if you are seeing this. From an e-mail I sent to one user:
Downloaded this 3 days ago and have been trying since then, and I don't know what I'm doing wrong. I know pretty much nothing about Python or any other coding language, so this is all pretty new to me.
I've tried all of these variations of the command:
```
tumblr_backup.py -i --save-video --save-audio --tag-index blog-name
tumblr_backup.py --save-video --save-audio --tag-index blog-name
tumblr_backup.py -i --save-video --save-audio --tag-index -p year blog-name
tumblr_backup.py -i --save-video --save-audio --tag-index -p year-month blog-name
tumblr_backup.py --save-video --save-audio --tag-index -p year-month blog-name
```
The first two would work at first but eventually stall, with a message like "downloading 7000 to 7050" that never moved again. I saw people saying this would be fixed with the -p option, so I tried that. It worked for most of my blog (2016 to 2020), but I got the same stall once I tried 2021. So then I tried adding the month to the option. After some frustration with the program telling me "Stopping backup: Incremental backup complete, 0 posts backed up", I took out the -i option and that seemed to work. But now I am stuck again, this time on the message "Waiting for worker threads to finish."

I don't know what's causing these stalls or how to fix them. I had seen some people saying it could be caused by the fancy/colored text offered in more recent Tumblr updates, but the post that seemed to stall one of my "year-month" attempts didn't have any of that; it was just an image.