8comic download problem report #391

Open
rickchen16 opened this issue Nov 16, 2024 · 6 comments

Comments

@rickchen16

I downloaded the latest version of ComicCrawler today,
but downloading comics from 8comic still fails.

Comic URL:
https://www.8comic.com/html/13736.html

Error:
Traceback (most recent call last):
File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 393, in error_loop
process()
File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 342, in download
crawler.init()
File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 58, in init
self.init_images(self.ep.current_page - 1)
File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 65, in init_images
self.get_images()
File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 235, in get_images
images = self.mod.get_images(
^^^^^^^^^^^^^^^^^^^^
File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\mods\eight.py", line 76, in get_images
j_js = re.search(r'src="([^"]*/j\.js[^"]*)"', html).group(1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'group'
wait 10 seconds...

@eight04
Owner

eight04 commented Nov 19, 2024

Which episode is it?

@rickchen16
Author

https://www.8comic.com/html/13736.html
It looks like the download already fails at episode 0.
When I open the URL above and click episode 0, the address becomes:
https://articles.onemoreplace.tw/online/new-13736.html?ch=0

total 305 episode.
Downloading ep 00話
Traceback (most recent call last):
File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 393, in error_loop
process()
File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 342, in download
crawler.init()
File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 58, in init
self.init_images(self.ep.current_page - 1)
File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 65, in init_images
self.get_images()
File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 235, in get_images
images = self.mod.get_images(
^^^^^^^^^^^^^^^^^^^^
File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\mods\eight.py", line 76, in get_images
j_js = re.search(r'src="([^"]*/j\.js[^"]*)"', html).group(1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'group'
wait 10 seconds...
Traceback (most recent call last):
File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 393, in error_loop
process()
File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 342, in download
crawler.init()
File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 58, in init
self.init_images(self.ep.current_page - 1)
File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 65, in init_images
self.get_images()
File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 235, in get_images
images = self.mod.get_images(
^^^^^^^^^^^^^^^^^^^^
File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\mods\eight.py", line 76, in get_images
j_js = re.search(r'src="([^"]*/j\.js[^"]*)"', html).group(1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'group'
wait 20 seconds...
Traceback (most recent call last):
File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 393, in error_loop
process()
File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 342, in download
crawler.init()
File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 58, in init
self.init_images(self.ep.current_page - 1)
File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 65, in init_images
self.get_images()
File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 235, in get_images
images = self.mod.get_images(
^^^^^^^^^^^^^^^^^^^^
File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\mods\eight.py", line 76, in get_images
j_js = re.search(r'src="([^"]*/j\.js[^"]*)"', html).group(1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'group'
wait 40 seconds...
Traceback (most recent call last):
File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 393, in error_loop
process()
File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 342, in download
crawler.init()
File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 58, in init
self.init_images(self.ep.current_page - 1)
File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 65, in init_images
self.get_images()
File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 235, in get_images
images = self.mod.get_images(
^^^^^^^^^^^^^^^^^^^^
File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\mods\eight.py", line 76, in get_images
j_js = re.search(r'src="([^"]*/j\.js[^"]*)"', html).group(1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'group'
wait 80 seconds...
Traceback (most recent call last):
File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 393, in error_loop
process()
File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 342, in download
crawler.init()
File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 58, in init
self.init_images(self.ep.current_page - 1)
File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 65, in init_images
self.get_images()
File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 235, in get_images
images = self.mod.get_images(
^^^^^^^^^^^^^^^^^^^^
File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\mods\eight.py", line 76, in get_images
j_js = re.search(r'src="([^"]*/j\.js[^"]*)"', html).group(1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'group'
Something bad happened, skip the episode.
Downloading ep 01話
Traceback (most recent call last):
File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 393, in error_loop
process()
File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 342, in download
crawler.init()
File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 58, in init
self.init_images(self.ep.current_page - 1)
File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 65, in init_images
self.get_images()
File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\crawler.py", line 235, in get_images
images = self.mod.get_images(
^^^^^^^^^^^^^^^^^^^^
File "C:\Users\LIMIT\AppData\Local\Programs\Python\Python311\Lib\site-packages\comiccrawler\mods\eight.py", line 76, in get_images
j_js = re.search(r'src="([^"]*/j\.js[^"]*)"', html).group(1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'group'
wait 10 seconds...

@eight04
Owner

eight04 commented Dec 6, 2024

It works fine when I test it here. Check the page source of episode 0 and see whether it contains this snippet:
image

@rickchen16
Author

It works fine when I test it here. Check the page source of episode 0 and see whether it contains this snippet: image

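I tested this in a Chrome incognito window.
If I start from
https://www.8comic.com/html/13736.html
and click episode 0 to open
https://articles.onemoreplace.tw/online/new-13736.html?ch=0
episode 0 opens normally and shows the comic images.
image
Viewing the page source, that snippet is present.
image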

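But if I copy the URL
https://articles.onemoreplace.tw/online/new-13736.html?ch=0
open a new incognito tab directly,
and paste the URL there,
the page that opens is
image
rather than the normal comic page.
Viewing the page source at that point, the snippet is no longer there.

My guess is that the HTML comiccrawler receives is this second case, the non-comic page.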
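
One way to test this guess outside the browser is to request the episode URL twice, once without a `Referer` header and once with the series page as `Referer`, and check whether the `j.js` script tag that eight.py looks for shows up in each response. This is only a sketch built on an assumption about the cause: the site may depend on cookies or something else entirely. `has_j_js`, `build_request`, `fetch`, and `compare` are hypothetical helper names, not part of comiccrawler.

```python
import re
import urllib.request

# Same pattern eight.py uses to locate the j.js script tag.
J_JS_PATTERN = re.compile(r'src="([^"]*/j\.js[^"]*)"')


def has_j_js(html):
    """Return True if the HTML contains the j.js script tag."""
    return J_JS_PATTERN.search(html) is not None


def build_request(url, referer=None):
    """Build a GET request, optionally carrying a Referer header."""
    headers = {"User-Agent": "Mozilla/5.0"}
    if referer:
        headers["Referer"] = referer
    return urllib.request.Request(url, headers=headers)


def fetch(url, referer=None):
    """Download a page and decode it leniently (the real encoding is not verified here)."""
    with urllib.request.urlopen(build_request(url, referer), timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def compare(episode_url, series_url):
    """Map each case to (response length, whether the j.js tag was found)."""
    result = {}
    for label, referer in (("no referer", None), ("with referer", series_url)):
        html = fetch(episode_url, referer)
        result[label] = (len(html), has_j_js(html))
    return result
```

Calling `compare("https://articles.onemoreplace.tw/online/new-13736.html?ch=0", "https://www.8comic.com/html/13736.html")` would show whether only the request with a `Referer` contains the tag. If both responses lack it, the cause lies elsewhere.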

@eight04
Owner

eight04 commented Dec 9, 2024

Try enabling errorlog:

  1. In setting.ini, set errorlog = true
  2. Launch comiccrawler and start the download
  3. Once the error appears, close comiccrawler
  4. The results of the network requests are written to grabber.log, next to setting.ini
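
To confirm the option was picked up before reproducing the error, the file can be read back with Python's `configparser`. This is a sketch under assumptions: it assumes `errorlog` sits in the `[DEFAULT]` section and that the path passed in is the setting.ini comiccrawler actually loads. `errorlog_enabled` is a hypothetical helper, not part of comiccrawler.

```python
import configparser


def errorlog_enabled(ini_path):
    """Return True if errorlog is switched on in the given setting.ini."""
    # interpolation=None so '%' characters in other options are left alone.
    config = configparser.ConfigParser(interpolation=None)
    config.read(ini_path, encoding="utf-8")
    # Assumption: the option lives in [DEFAULT]; a missing file or option counts as off.
    return config.getboolean("DEFAULT", "errorlog", fallback=False)
```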

If you are able to edit the code, find eight.py and make the following change:

diff --git a/comiccrawler/mods/eight.py b/comiccrawler/mods/eight.py
index 815e10a..ffc57ef 100644
--- a/comiccrawler/mods/eight.py
+++ b/comiccrawler/mods/eight.py
@@ -71,6 +71,9 @@ j_js = ""
 lazy_js = ""
 	
 def get_images(html, url):
+	import pathlib
+	pathlib.Path("8comic.html").write_text(html, encoding="utf-8")
+
 	global j_js
 	if not j_js:
 		j_js = re.search(r'src="([^"]*/j\.js[^"]*)"', html).group(1)

With this change, the HTML source is written to 8comic.html when the error occurs.
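
A variant of the same idea writes the dump only when the `j.js` tag is missing, and replaces the bare `AttributeError` with a message that says how much HTML was received. This is a sketch, not the project's code: `find_j_js` and its `dump_path` parameter are hypothetical, and only the regular expression is taken from eight.py.

```python
import pathlib
import re

# Pattern copied from eight.py; everything else here is illustrative.
J_JS_PATTERN = re.compile(r'src="([^"]*/j\.js[^"]*)"')


def find_j_js(html, dump_path="8comic.html"):
    """Return the j.js URL, or save the HTML and raise a descriptive error."""
    match = J_JS_PATTERN.search(html)
    if match is None:
        # Keep the page that was actually received so it can be inspected later.
        pathlib.Path(dump_path).write_text(html, encoding="utf-8")
        raise ValueError(
            f"j.js script tag not found; {len(html)} characters of HTML "
            f"saved to {dump_path}"
        )
    return match.group(1)
```

With an empty response this reports 0 characters, which separates "the server returned nothing" from "the server returned a different page".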

@rickchen16
Author

rickchen16 commented Dec 14, 2024

pathlib.Path("8comic.html").write_text(html, encoding="utf-8")

grabber.log

8comic.html
is empty,
so I am attaching a screenshot instead of the file.
image

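I also printed what get_html and get_images in crawler.py receive.
After crawler.py calls self.downloader.html,
self.html is still empty,
so the html passed to get_images in eight.py is empty as well.
This differs from what I see when I paste https://articles.onemoreplace.tw/online/new-13736.html?ch=0 directly into a browser incognito window.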

Start downloading 炎炎之消防隊-無限-8comic
total 305 episode.
Downloading ep 00話
[crawler.py][get_html]self.ep.current_url https://articles.onemoreplace.tw/online/new-13736.html?ch=0
[crawler.py][get_html]self.mission.url https://8comic.com/html/13736.html
[crawler.py][get_html]self.html

[crawler.py][get_images]self.html

[eight.py][get_images]html
