
fix: parallel peer requests with dht over bitswap #773


Open · wants to merge 3 commits into main

Conversation


@TheGreatAlgo TheGreatAlgo commented Apr 16, 2025

Background

We host a large amount of climate data on a set of our nodes that we know we will always connect to via bootstrap.
We have not yet published this data to the DHT; right now we are focused on uploading the datasets. Because the data is not announced on the DHT, we are unable to connect and download it the way Helia expects. Previously, Helia relied only on the DHT even though the nodes we dialed on initialization definitely have this data. This forced us to rely on the HTTP gateway rather than a direct node connection via WebSockets.

Using Helia with this fix, we can now fetch and visualize climate data directly from our nodes, with datasets that are terabytes in size, at speeds of around 8 MB/s.

[Screenshot 2025-04-16 at 10:49:42 AM]

Description

This fix works by requesting the block from directly connected peers in parallel with the DHT request. This ensures that even if the directly connected peers do not have the data, the DHT will still be used. It also has the added benefit that, for large amounts of data where DHT updates cause issues, we can still request the data if we know which nodes it is likely to be on.

This also mirrors Kubo's behaviour, which requests the data from directly connected peers instead of relying solely on the DHT.

Notes & open questions

Change checklist

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation if necessary (this includes comments as well)
  • I have added relevant tests

@TheGreatAlgo TheGreatAlgo requested a review from a team as a code owner April 16, 2025 14:53

Faolain commented Apr 16, 2025

For additional context: nodes other than our bootstrap nodes provide data as well, and in our platform we expect the same behaviour as Kubo. In other words, within the browser a user can dial their node's WebSockets multiaddr and the CID can be discovered and retrieved without them needing to publish to the DHT. This reduces reliance on the DHT and improves resilience in data retrieval.

Member

@achingbrain achingbrain left a comment


Thank you for opening this.

From what I can see the old version:

  1. Queries the routing for providers but does not wait for any results
  2. Adds the CID to the want list and awaits the returned promise
  3. All connected peers receive the wantlist, and as new providers are found they are connected to so they also receive the wantlist

The new version:

  1. Adds the block to the wantlist but does not wait for the result
  2. Starts the same routing query, again does not wait for the result
  3. Awaits the wantlist result

Unless I'm misreading it, it does the same thing as before, just in a different order. Indeed, the test that has been added passes against main, so the changes made to bitswap.ts may not be necessary?

@achingbrain
Member

This also mirrors Kubo's behaviour, which requests the data from directly connected peers instead of relying solely on the DHT.

It's worth noting this is Helia's default behaviour too. The difference is, in the example linked to from here @helia/verified-fetch is used, which defaults to sessions, which currently start by making the routing query instead of using connected peers.

An alternate solution here may be merging #777 and then internally starting each session with connected peers instead of going straight to the routing. This will negate the initial benefit of sessions (e.g. reduced chattiness) but will make the behaviour more consistent.


One thing about relying on pulling data from connected peers is that there's nothing to say you will remain connected to those peers. If your connection pauses or your node is otherwise disconnected, you may not re-connect to them, so it may not be completely reliable: without telling Helia that the peers have the data ahead of time (either by passing them as providers or having them be returned from a routing query), it's just coincidence that they can supply blocks from your want list.

@TheGreatAlgo
Author

So I'm not too familiar with the repo; I tried my best to understand it, so the test may also not fully capture the issue.

The change of order is important here: it sends a want request immediately to its connected peers while also traversing the DHT for the response. If the connected nodes have the data, they will respond immediately without relying on the DHT. Looking back, I do see what you mean: I await only the wantlist, so that appears to be an error on my side. It should be trying to fetch from both. Although my thinking was that, as new providers are found and connected, the wantlist will eventually return the CID as the DHT traversal finds providers.

If you think #777 is a better idea, that works for me. Our main goal is to provide a list of peers that we know have the CIDs we are querying (to help improve speed as well, given our HAMT structure), and to not rely fully on the DHT since we know which peers have the data. The one downside is that we have to pass around the provider list as well, instead of just the Helia context to query, but that should be fine.

The example in @helia/verified-fetch was just to show our issue quickly. In our actual implementation we use the Helia node directly. I tried to replicate it with the Helia node, but it seemed easier to just make a small modification to the bitswap in Helia.

@achingbrain
Member

The change of order is important here: it sends a want request immediately to its connected peers while also traversing the DHT for the response. If the connected nodes have the data, they will respond immediately without relying on the DHT. Looking back, I do see what you mean: I await only the wantlist, so that appears to be an error on my side.

The existing implementation does this already - it makes the routing query but does not wait for it before continuing. It only waits for the wantlist entry which involves sending the current wantlist to all connected peers.

The one downside is that we have to pass around the provider list as well, instead of just the Helia context to query, but that should be fine.

If the providers will not change over the lifetime of the node, you can add a custom routing implementation to return them as suggested, then you don't need to pass a list around.

You can add this to the existing routing implementations so it will still fall back to the Amino DHT if necessary. The routing implementations are all raced against each other so one will not hold up any of the others.

import { createHelia } from 'helia'
import { multiaddr } from '@multiformats/multiaddr'
import { peerIdFromString } from '@libp2p/peer-id'
import { httpGatewayRouting, libp2pRouting } from '@helia/routers'
import { createLibp2p } from 'libp2p'

const libp2p = await createLibp2p({
  // config
})

const helia = await createHelia({
  // other config
  libp2p,
  routers: [{
    async * findProviders () {
      // FLUORINE_WEBSOCKETS etc. are placeholders for the websocket
      // multiaddr strings of the nodes known to hold the data
      yield * [
        FLUORINE_WEBSOCKETS,
        BISMUTH_WEBSOCKETS,
        CERIUM_WEBSOCKETS
      ].map(ma => {
        const address = multiaddr(ma)
        const id = peerIdFromString(address.getPeerId() ?? '')

        return {
          id,
          multiaddrs: [
            address
          ]
        }
      })
    }
  }, httpGatewayRouting(), libp2pRouting(libp2p)]
})

If there are only certain CIDs you want to return providers for you can interrogate the first arg passed to findProviders - see https://ipfs.github.io/helia/interfaces/helia.Routing.html#findProviders
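For example, a minimal sketch of that per-CID filtering. The CID string, the `KNOWN_CIDS` set, and the helper name are assumptions for illustration, not part of the Helia API:

```typescript
// Hypothetical helper: only return the static provider list for CIDs we
// know those nodes hold. The CID string below is a placeholder, not a
// real dataset CID.
const KNOWN_CIDS = new Set([
  'bafyexampleclimatedatasetcid'
])

function shouldReturnStaticProviders (cid: { toString (): string }): boolean {
  return KNOWN_CIDS.has(cid.toString())
}

// Inside the custom router above, the yielded providers could then be
// gated on this check:
//
// async * findProviders (cid) {
//   if (!shouldReturnStaticProviders(cid)) {
//     return
//   }
//   yield * staticProviders
// }
```

Unknown CIDs then yield no static providers, so the raced `httpGatewayRouting()` and `libp2pRouting()` routers handle them as before.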

@TheGreatAlgo
Author

I can try this and report back, thanks!

@TheGreatAlgo
Author

So I tried it again. We accidentally published to the DHT in the course of doing this, so we turned publishing off and are hoping to try again once it becomes unpublished, to recreate the original situation.

I did try another dataset that we generated quickly and that was not published to the DHT, but I can't recreate the problem. It works completely if I am connected to the peer. With empty routers, or just a libp2p routing router, it still works fine. Previously it would break completely unless I was using the HTTP gateway with our own gateway, so it is odd that it now works even without your suggestion.

I will try again when it comes off the DHT.
