Can we shave 60% off of wallet sync times by compressing blocks from remote nodes?

tusker · 8 months ago

Can we shave 60% off of wallet sync times by compressing blocks from remote nodes?

mister_monster · 8 months ago

The number of operations per second needed to decompress depends on the compression algorithm and on how aggressively the data was compressed, so decompression can be fast or slow. More importantly, the relationship between the compute needed to decompress and the amount of compression is not linear: 10% more compression does not cost 10% more computation to decompress, it costs more than that. So at some point you spend more time decompressing than you saved downloading, given your bandwidth constraint. That point is different for every node (or more accurately, every pair of nodes, since the maximum bandwidth is that of the slower of the two communicating), and so the more compression you use, the more you favor low-bandwidth, high-power nodes.

I don’t know the median or mean processing power of nodes, and I don’t know the median or mean bandwidth. I’m sure some compression would benefit the network overall, but you are always benefiting some nodes at the expense of others, and there’s no scheme that is optimal for every node on the network. The optimum also keeps changing as people upgrade their hardware and connections.
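The break-even point described above can be sketched as a simple time model: compression pays off only when downloading the smaller payload plus decompressing it beats downloading the raw data over the pair’s shared link. The function names, the 100 MB payload, the 2x ratio and the 50 MB/s decompression throughput are all hypothetical illustration values, not measurements, and the model assumes download and decompression do not overlap.

```python
def link_bandwidth(local_bps: float, remote_bps: float) -> float:
    """Usable bandwidth of a peer pair is capped by the slower side (bits/s)."""
    return min(local_bps, remote_bps)


def plain_time(size_bytes: float, bandwidth_bps: float) -> float:
    """Seconds to download size_bytes uncompressed over a bandwidth_bps link."""
    return size_bytes * 8 / bandwidth_bps


def compressed_time(size_bytes: float, ratio: float, bandwidth_bps: float,
                    decompress_bytes_per_s: float) -> float:
    """Seconds to download the compressed payload, then decompress it.

    Assumption: download and decompression run one after the other (no overlap).
    """
    download = plain_time(size_bytes / ratio, bandwidth_bps)
    decompress = size_bytes / decompress_bytes_per_s
    return download + decompress


def compression_pays_off(size_bytes: float, ratio: float, bandwidth_bps: float,
                         decompress_bytes_per_s: float) -> bool:
    """True if the compressed path finishes sooner than the plain download."""
    return (compressed_time(size_bytes, ratio, bandwidth_bps, decompress_bytes_per_s)
            < plain_time(size_bytes, bandwidth_bps))


if __name__ == "__main__":
    size = 100e6    # hypothetical: 100 MB of block data
    ratio = 2.0     # hypothetical compression ratio
    decomp = 50e6   # hypothetical decompression throughput, bytes/s
    # A slow pair (1 Mbit/s on one side) versus a fast pair (1 Gbit/s on both).
    for local, remote in [(1e6, 100e6), (1e9, 1e9)]:
        bw = link_bandwidth(local, remote)
        print(f"{bw:.0e} bit/s: plain {plain_time(size, bw):.1f} s, "
              f"compressed {compressed_time(size, ratio, bw, decomp):.1f} s, "
              f"pays off: {compression_pays_off(size, ratio, bw, decomp)}")
```

With these made-up numbers the 1 Mbit/s pair goes from about 800 s to about 402 s, while the gigabit pair goes from about 0.8 s to about 2.4 s, which is the "favors low-bandwidth, high-power nodes" effect in miniature.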

It might make sense to let nodes request compressed blocks from each other over RPC, for example with a field in the request that says “send compressed blocks”, so that high-power, low-bandwidth nodes can ask for them. But compression also has a processing cost, and the node being asked might not want to pay it. It could cache compressed blocks, since blocks don’t change, but then it would have to decompress them every time it needs to access them, or store both a compressed and an uncompressed copy of each block if it needs frequent access but also wants to serve compressed blocks. It’s trade-offs all the way down. There are design choices that can address this, but is it worth it? I don’t know. Also consider that an extra request field can be used for fingerprinting: the more granular you make RPC requests, the more data points there are to fingerprint a node, which is a problem over Tor or I2P.
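The opt-in idea above, with the "store both copies" option, might look roughly like the sketch below. It is not Monero’s actual RPC: `GetBlocksRequest`, its `compressed` field, `BlockServer` and `fetch_blocks` are hypothetical names, and zlib at level 6 merely stands in for whatever algorithm a real design would pick.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class GetBlocksRequest:
    """Hypothetical request; `compressed` is the opt-in "send compressed blocks" field."""
    heights: list[int]
    compressed: bool = False


@dataclass
class BlockServer:
    """Hypothetical serving node that keeps an uncompressed and a compressed copy."""
    blocks: dict[int, bytes]  # uncompressed store, for the node's own frequent access
    zcache: dict[int, bytes] = field(default_factory=dict, init=False)  # compressed copies

    def compressed_block(self, height: int) -> bytes:
        # Blocks never change, so a cached compressed copy never goes stale:
        # the compression cost is paid once per block, at the price of extra storage.
        if height not in self.zcache:
            self.zcache[height] = zlib.compress(self.blocks[height], 6)
        return self.zcache[height]

    def handle(self, req: GetBlocksRequest) -> list[bytes]:
        # Serve compressed bytes only to peers that explicitly asked for them.
        if req.compressed:
            return [self.compressed_block(h) for h in req.heights]
        return [self.blocks[h] for h in req.heights]


def fetch_blocks(server: BlockServer, heights: list[int], want_compressed: bool) -> list[bytes]:
    """Client side: a node asks for compressed blocks only if it expects to benefit."""
    payloads = server.handle(GetBlocksRequest(heights, compressed=want_compressed))
    if want_compressed:
        return [zlib.decompress(p) for p in payloads]
    return payloads


if __name__ == "__main__":
    server = BlockServer({1: b"a" * 1000, 2: b"b" * 1000})
    print(len(fetch_blocks(server, [1, 2], want_compressed=True)), "blocks fetched")
    print(len(server.zcache), "compressed copies cached")
```

Note that `want_compressed` is exactly the kind of extra, peer-visible request bit that adds a fingerprinting data point.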

tusker · edit-2 8 months ago

There are established compression standards that should avoid all of the issues you mention. Obviously we would use lossless compression, and we would not compress to the point where decompressing takes longer than downloading over a 1 Mbit/s connection.
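The 1 Mbit/s rule above can be checked empirically per compression level. The sketch below uses Python’s standard zlib (DEFLATE) as one example of an established lossless standard, verifies the round trip, and compares decompression time with the download time the compression saves at 1 Mbit/s. The sample is synthetic; real block data, which contains many hashes and signatures, may compress far less, so the ratios printed here say nothing about actual blocks.

```python
import time
import zlib

LINK_BPS = 1_000_000  # 1 Mbit/s, the slow-link floor from the post


def check_level(data: bytes, level: int, link_bps: float = LINK_BPS) -> dict:
    """Compress at `level`, confirm the round trip is lossless, and compare
    decompression time with the download time saved on a `link_bps` link."""
    packed = zlib.compress(data, level)
    start = time.perf_counter()
    unpacked = zlib.decompress(packed)
    decompress_s = time.perf_counter() - start
    saved_s = (len(data) - len(packed)) * 8 / link_bps
    lossless = unpacked == data
    return {
        "level": level,
        "ratio": len(data) / len(packed),
        "lossless": lossless,
        "decompress_s": decompress_s,
        "saved_s": saved_s,
        "worth_it": lossless and decompress_s < saved_s,
    }


if __name__ == "__main__":
    # Synthetic, moderately compressible sample: 1 MB of little-endian counters.
    sample = b"".join(i.to_bytes(4, "little") for i in range(250_000))
    for level in range(1, 10):
        r = check_level(sample, level)
        print(f"level {r['level']}: ratio {r['ratio']:.2f}, "
              f"decompress {r['decompress_s'] * 1000:.1f} ms, "
              f"saves {r['saved_s']:.2f} s, worth it: {r['worth_it']}")
```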

Most software distributed over the internet is compressed despite all of these “unknowns”. Stream compression is likewise an established, beneficial technique when transferring large amounts of data to remote locations, such as backups.
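Stream compression of the kind mentioned above can be sketched with zlib’s incremental `compressobj`/`decompressobj` objects: chunks are compressed as they are produced and decompressed as they arrive, so neither side needs the whole payload in memory, and decompression can overlap with the download. Treating each chunk as one block, and the function names, are illustrative assumptions only.

```python
import zlib
from typing import Iterable, Iterator


def compress_stream(chunks: Iterable[bytes], level: int = 6) -> Iterator[bytes]:
    """Compress an iterable of chunks incrementally with a single DEFLATE stream."""
    comp = zlib.compressobj(level)
    for chunk in chunks:
        out = comp.compress(chunk)
        if out:  # zlib may buffer input and emit nothing for small chunks
            yield out
    yield comp.flush()  # emit any buffered data and finish the stream


def decompress_stream(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Decompress chunks as they arrive, e.g. straight off a socket."""
    decomp = zlib.decompressobj()
    for chunk in chunks:
        out = decomp.decompress(chunk)
        if out:
            yield out
    tail = decomp.flush()
    if tail:
        yield tail


if __name__ == "__main__":
    # Hypothetical stand-in for a sequence of blocks being synced.
    blocks = [bytes([i % 256]) * 4096 for i in range(100)]
    wire = list(compress_stream(blocks))
    restored = b"".join(decompress_stream(wire))
    print(sum(map(len, blocks)), "bytes ->", sum(map(len, wire)), "bytes on the wire")
    print("round trip ok:", restored == b"".join(blocks))
```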

Let us not get caught up in analysis paralysis and instead stick to practical solutions that will benefit the majority of users.