
Cannot find embeddings #3

Open
econinomista opened this issue Jan 12, 2024 · 10 comments

econinomista commented Jan 12, 2024

Hi, thank you so much for providing this code! Unfortunately, I am having issues running SemScale. In the Anaconda Prompt shell I ran:

python scaler.py C:\Users\SemScale\embeddings\wiki.big-five.mapped.vec C:\Users\SemScale\datadir_test C:\Users\SemScale\output.txt

However, this always yields the error:

WARNING:tensorflow:From C:\Users\Documents\Python\envs\semscale\lib\site-packages\keras\src\losses.py:2976: The name tf.losses.sparse_softmax_cross_entropy is deprecated. Please use tf.compat.v1.losses.sparse_softmax_cross_entropy instead.
Error: File containing pre-trained word embeddings not found.

Is the embedding not working anymore? Thank you very much in advance for your help!!

fedenanni (Collaborator) commented Jan 12, 2024

Hi! It seems there's an issue with the path of the embedding file. Could you check two things:

  1. Whether the file is actually in that folder. You should download it from here.
  2. Whether you need to write the path differently, for instance like this:
    python scaler.py C:/Users/SemScale/embeddings/wiki.big-five.mapped.vec C:/Users/SemScale/datadir_test C:/Users/SemScale/output.txt
    I don't have a Windows PC with me, but it might simply be something to do with the filepath specification.
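Before re-running, one way to rule out a path problem is a quick existence check. This is a hypothetical helper, not part of SemScale, shown only as a sketch:

```python
from pathlib import Path

# Hypothetical helper (not part of SemScale): sanity-check that the
# embeddings file exists and is non-empty before invoking scaler.py.
def embeddings_file_ok(path_str):
    """Return True if the file at path_str exists and is non-empty."""
    p = Path(path_str)
    return p.is_file() and p.stat().st_size > 0

# On Windows, a raw string avoids backslash-escape surprises, e.g.:
# embeddings_file_ok(r"C:\Users\SemScale\embeddings\wiki.big-five.mapped.vec")
```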

econinomista (Author) commented:

Thank you very much for the quick response! Unfortunately, I could not resolve the issue. After restarting my device, the path and file were found; however, the program tells me that the embeddings file contains errors, and I am unsure how to deal with that.

File "C:\Users\noske\Documents\Nikola\SemScale\embeddings\wiki.big-five.mapped.vec", line 9
    en__' -0.17489 -0.13695 0.13345 [… several hundred embedding values …] 0.13567 0.20318 0.10497
                                                                                                  ^
SyntaxError: unterminated string literal (detected at line 9)

fedenanni (Collaborator) commented:

Can you re-download the embeddings file, making sure it downloads properly? (It seems the file is broken.) Note that the file size should be around 1.3 GB.
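A small script can help verify the download before re-running. This sketch assumes the usual word2vec-style `.vec` layout (an optional `vocab_size dim` header line, then one `token v1 v2 ...` line per word); `check_vec_file` is a hypothetical helper, not part of SemScale:

```python
import os

# Hypothetical helper: report size and apparent dimensionality of a
# word2vec-style .vec file, assuming an optional "vocab_size dim" header
# followed by "token v1 v2 ..." lines.
def check_vec_file(path):
    size_bytes = os.path.getsize(path)
    with open(path, encoding="utf-8", errors="replace") as f:
        first = f.readline().split()
    # A header line has exactly two integer fields; a vector line is a
    # token followed by the embedding values.
    if len(first) == 2 and all(tok.isdigit() for tok in first):
        dim = int(first[1])
    else:
        dim = len(first) - 1
    return size_bytes, dim
```

If the reported size is far below ~1.3 GB, the download was most likely truncated.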

econinomista (Author) commented Jan 16, 2024 via email

fedenanni (Collaborator) commented:

From here it is a bit hard to debug. I have just reinstalled everything, and it seems to be working for me using that input embedding file and the textual data from the online appendix.
[Screenshot 2024-01-16 at 12 17 55]

I'm tagging @irehbein because she might be working on Windows for this (I've just tested on Mac and Linux, and in both cases the embeddings loaded just fine). Sorry, but it has been a long time since we last worked on this!

fedenanni (Collaborator) commented Jan 16, 2024

Ah - check the order of the arguments! You should have:

  • input folder (where your documents sit)
  • embedding file
  • output file

Your example has the embeddings first and the input folder second:

python scaler.py C:\Users\SemScale\embeddings\wiki.big-five.mapped.vec C:\Users\SemScale\datadir_test C:\Users\SemScale\output.txt
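Following the order listed above (input folder, then embedding file, then output file), the corrected invocation would presumably be:

```shell
python scaler.py C:\Users\SemScale\datadir_test C:\Users\SemScale\embeddings\wiki.big-five.mapped.vec C:\Users\SemScale\output.txt
```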

fedenanni (Collaborator) commented:

I just noticed that this is wrong in the documentation. Above we state the correct order, but here it is inverted! Sorry about this; I'll fix it now:

[Screenshot 2024-01-16 at 12 22 48]

fedenanni (Collaborator) commented:

Fixed it - let me know if this works now:

[Screenshot 2024-01-16 at 12 25 24]

econinomista (Author) commented Jan 28, 2024 via email

fedenanni (Collaborator) commented:

I see; maybe you could group the tweets by author to reduce the number of files, so that there is one file per user. This way you'll be scaling users, not single tweets.
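That grouping step could be sketched roughly like this, assuming the tweets live in a CSV with `author` and `text` columns (the column names, file layout, and helper name are assumptions for illustration, not SemScale requirements):

```python
import csv
import os
from collections import defaultdict

# Hypothetical pre-processing step: merge all tweets by the same author into
# one text file per user, so SemScale scales users rather than single tweets.
# Assumes a CSV with "author" and "text" columns and filesystem-safe author names.
def group_tweets_by_author(csv_path, out_dir):
    texts = defaultdict(list)
    with open(csv_path, encoding="utf-8", newline="") as f:
        for row in csv.DictReader(f):
            texts[row["author"]].append(row["text"])
    os.makedirs(out_dir, exist_ok=True)
    for author, tweets in texts.items():
        with open(os.path.join(out_dir, f"{author}.txt"), "w", encoding="utf-8") as out:
            out.write("\n".join(tweets))
    return sorted(texts)
```

The resulting `out_dir` would then serve as the input folder argument to scaler.py.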
