Ensuring the Spark NLP models in Model Hub generate the same embeddings as the Hugging Face referenced one #14535
Pardon my ignorance, as I'm very new to all this, but I'm trying to verify that the gte-small embedding model referenced in https://sparknlp.org/2023/08/15/gte_small_en.html is directly based on the one listed at https://huggingface.co/thenlper/gte-small. I downloaded both models and compared the ONNX files with diff, but they are different. I also installed the onnx PyPI library to examine the metadata of the ONNX file from Spark NLP, but it gave me an error, unlike the file from Hugging Face. I'm guessing the Spark NLP model was built (is that the right term?) directly from the Hugging Face one and has the same weights, so both should give exactly the same embeddings for the same words or sentence. But how can I be sure?
Replies: 1 comment
Hi,
They are the same model. However, we export these models in a form that we can use internally, so the exported files are not intended for use outside Spark NLP. The weights are the same and give the same results, so if you need to use the model outside Spark NLP, you can use the original one.
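For anyone who wants to check the "same results" claim empirically, here is a minimal sketch of how two embedding vectors for the same sentence could be compared. It is not part of Spark NLP or Hugging Face: the helper names (`cosine_similarity`, `max_abs_diff`, `embeddings_match`) and the `1e-4` tolerance are assumptions made for illustration, and it assumes both vectors were produced with the same tokenization, pooling, and normalization. Small floating-point differences between runtimes are to be expected, so the comparison uses a tolerance rather than exact equality.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors of floats."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return max(abs(x - y) for x, y in zip(a, b))


def embeddings_match(a, b, atol=1e-4):
    """True if every component differs by at most atol.

    atol is a hypothetical tolerance chosen for illustration; pick a value
    that suits the numeric precision of the runtimes being compared.
    """
    return max_abs_diff(a, b) <= atol
```

To use it, embed the same sentence once with the Spark NLP model and once with the original Hugging Face model, convert both results to plain lists of floats, and pass them to `embeddings_match`. A cosine similarity very close to 1 together with a tiny maximum difference would suggest the two exports carry the same weights.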