Ensuring the Spark NLP models in Model Hub generate the same embeddings as the Hugging Face referenced one #14535

lsli8888 · 2025-03-17T06:16:10Z


Pardon my ignorance, as I'm very new to all this, but I'm trying to verify that the gte-small embedding model referenced in https://sparknlp.org/2023/08/15/gte_small_en.html is directly based on the one listed at https://huggingface.co/thenlper/gte-small.

I downloaded both models and tried comparing the ONNX files (using diff), but they are different. I also installed the onnx PyPI library to examine the metadata of the Spark NLP ONNX file, but it gave me an error, whereas the Hugging Face one loaded without problems.

Now, I'm guessing the Spark NLP model was built/constructed (is that the right term?) directly from the Hugging Face one and has the same weights, so both should give exactly the same embeddings for the same words/sentences. But how can I be sure?

Answered by maziyarpanahi

maziyarpanahi (Maintainer) · 2025-03-17T07:54:25Z

Hi,
They are the same model; however, we export these models in a way that lets us use them internally, so they are not intended for public use outside Spark NLP. But they have the same weights and give the same results, so if you need to use the model outside Spark NLP, you can use the original one.
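One practical way to answer "how can I be sure?" is to compare the models' outputs rather than their files. A byte-level diff of two ONNX exports can differ even when the weights match, because graph node names, opset versions, and metadata can all differ between exports. The sketch below is not part of Spark NLP or Hugging Face. `compare_embeddings` is a hypothetical helper that takes two arrays of sentence embeddings for the same sentences in the same order. For example, one array could come from `SentenceTransformer("thenlper/gte-small").encode(sentences)` and the other from the Spark NLP pipeline's output column. How you extract that column depends on the annotator shown on the model page, which is not covered here. The helper assumes small numeric differences between runtimes, so it checks closeness with a tolerance rather than exact equality.

```python
import numpy as np


def compare_embeddings(a, b, atol=1e-4):
    """Compare two batches of sentence embeddings row by row (hypothetical helper).

    a, b: array-likes of shape (n_sentences, dim), same sentences in the same order,
    e.g. one from the Hugging Face model and one from the Spark NLP pipeline.
    Returns (max absolute difference, minimum per-sentence cosine similarity,
    whether all values agree within atol).
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")

    # Largest element-wise gap between the two outputs.
    max_abs_diff = float(np.max(np.abs(a - b)))

    # Cosine similarity per sentence; 1.0 means the vectors point the same way.
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    cos = np.sum(a * b, axis=1) / norms
    min_cos = float(np.min(cos))

    return max_abs_diff, min_cos, bool(np.allclose(a, b, atol=atol))


# Stand-in data for illustration only: replace with real outputs from both models.
rng = np.random.default_rng(0)
hf = rng.normal(size=(3, 384))
spark = hf + rng.normal(scale=1e-6, size=hf.shape)  # simulated tiny runtime noise

max_diff, min_cos, close = compare_embeddings(hf, spark)
print(f"max abs diff={max_diff:.2e}  min cosine={min_cos:.8f}  allclose={close}")
```

Cosine similarity is reported alongside the element-wise check because it ignores scaling. If one side L2-normalizes its output and the other does not, `min_cos` should still be close to 1.0 while `allclose` fails. The 1e-4 tolerance is an arbitrary starting point, not a value documented by either project.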
