
Cannot find embeddings #3

Open
econinomista opened this issue Jan 12, 2024 · 10 comments

econinomista commented Jan 12, 2024

Hi, thank you so much for providing this code! Unfortunately, I am having issues running SemScale. In the Anaconda Prompt shell I ran:

python scaler.py C:\Users\SemScale\embeddings\wiki.big-five.mapped.vec C:\Users\SemScale\datadir_test C:\Users\SemScale\output.txt

However, this always yields the error:

WARNING:tensorflow:From C:\Users\Documents\Python\envs\semscale\lib\site-packages\keras\src\losses.py:2976: The name tf.losses.sparse_softmax_cross_entropy is deprecated. Please use tf.compat.v1.losses.sparse_softmax_cross_entropy instead.
Error: File containing pre-trained word embeddings not found.

Is the embedding not working anymore? Thank you very much in advance for your help!!

fedenanni (Collaborator) commented Jan 12, 2024

Hi! It seems there's an issue with the path of the embedding file. Could you check two things:

  1. Whether the file is actually in that folder. You should download it from here.
  2. Whether you need to write the path differently, for instance like this:
    python scaler.py C:/Users/SemScale/embeddings/wiki.big-five.mapped.vec C:/Users/SemScale/datadir_test C:/Users/SemScale/output.txt
    I don't have a Windows PC with me, but it might simply be something to do with the filepath specification.
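Before re-running, one way to rule out a path problem is a quick existence check. This is a hypothetical helper, not part of SemScale, shown only as a sketch:

```python
from pathlib import Path

# Hypothetical helper (not part of SemScale): sanity-check that the
# embeddings file exists and is non-empty before invoking scaler.py.
def embeddings_file_ok(path_str):
    """Return True if the file at path_str exists and is non-empty."""
    p = Path(path_str)
    return p.is_file() and p.stat().st_size > 0

# On Windows, a raw string avoids backslash-escape surprises, e.g.:
# embeddings_file_ok(r"C:\Users\SemScale\embeddings\wiki.big-five.mapped.vec")
```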

econinomista (Author) commented:

Thank you very much for the quick response! Unfortunately, I could not resolve the issue. After restarting my device, the path and file were found; however, the program tells me that the embeddings file contains errors, and I am unsure how to deal with that.

File "C:\Users\noske\Documents\Nikola\SemScale\embeddings\wiki.big-five.mapped.vec", line 9
    en__' -0.17489 -0.13695 0.13345 [… several hundred embedding values …] 0.13567 0.20318 0.10497
                                                                                                  ^
SyntaxError: unterminated string literal (detected at line 9)

fedenanni (Collaborator) commented:

Can you re-download the embeddings file, making sure it downloads properly? (It seems the file is broken.) Note that the file size should be around 1.3 GB.
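A small script can help verify the download before re-running. This sketch assumes the usual word2vec-style `.vec` layout (an optional `vocab_size dim` header line, then one `token v1 v2 ...` line per word); `check_vec_file` is a hypothetical helper, not part of SemScale:

```python
import os

# Hypothetical helper: report size and apparent dimensionality of a
# word2vec-style .vec file, assuming an optional "vocab_size dim" header
# followed by "token v1 v2 ..." lines.
def check_vec_file(path):
    size_bytes = os.path.getsize(path)
    with open(path, encoding="utf-8", errors="replace") as f:
        first = f.readline().split()
    # A header line has exactly two integer fields; a vector line is a
    # token followed by the embedding values.
    if len(first) == 2 and all(tok.isdigit() for tok in first):
        dim = int(first[1])
    else:
        dim = len(first) - 1
    return size_bytes, dim
```

If the reported size is far below ~1.3 GB, the download was most likely truncated.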

econinomista (Author) commented Jan 16, 2024 via email

fedenanni (Collaborator) commented:

From here it is a bit hard to debug. I have just reinstalled everything, and it seems to be working for me using that input embedding file and the textual data from the online appendix.
[Screenshot 2024-01-16 at 12 17 55]

I'm tagging @irehbein because she might be working on Windows for this (I've just tested on Mac and Linux, and in both cases the embeddings loaded just fine). Sorry, but it has been a long time since we last worked on this!

fedenanni (Collaborator) commented Jan 16, 2024

Ah - check the order of the arguments! You should have:

  • input folder (where your documents sit)
  • embedding file
  • output file

Your example has the embeddings first and the input folder second:

python scaler.py C:\Users\SemScale\embeddings\wiki.big-five.mapped.vec C:\Users\SemScale\datadir_test C:\Users\SemScale\output.txt
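Following the order listed above (input folder, then embedding file, then output file), the corrected invocation would presumably be:

```shell
python scaler.py C:\Users\SemScale\datadir_test C:\Users\SemScale\embeddings\wiki.big-five.mapped.vec C:\Users\SemScale\output.txt
```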

fedenanni (Collaborator) commented:

I just noticed that this is wrong in the documentation. Above we state the correct order, but here it is inverted! Sorry about this; I'll fix it now:

[Screenshot 2024-01-16 at 12 22 48]

fedenanni (Collaborator) commented:

Fixed it - let me know if this works now:

[Screenshot 2024-01-16 at 12 25 24]

econinomista (Author) commented Jan 28, 2024 via email

fedenanni (Collaborator) commented:

I see; maybe you could group the tweets by author to reduce the number of files, so that there is one file per user. This way you'll be scaling users, not single tweets.
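That grouping step could be sketched roughly like this, assuming the tweets live in a CSV with `author` and `text` columns (the column names, file layout, and helper name are assumptions for illustration, not SemScale requirements):

```python
import csv
import os
from collections import defaultdict

# Hypothetical pre-processing step: merge all tweets by the same author into
# one text file per user, so SemScale scales users rather than single tweets.
# Assumes a CSV with "author" and "text" columns and filesystem-safe author names.
def group_tweets_by_author(csv_path, out_dir):
    texts = defaultdict(list)
    with open(csv_path, encoding="utf-8", newline="") as f:
        for row in csv.DictReader(f):
            texts[row["author"]].append(row["text"])
    os.makedirs(out_dir, exist_ok=True)
    for author, tweets in texts.items():
        with open(os.path.join(out_dir, f"{author}.txt"), "w", encoding="utf-8") as out:
            out.write("\n".join(tweets))
    return sorted(texts)
```

The resulting `out_dir` would then serve as the input folder argument to scaler.py.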
