Through most of 2015, I wrote and operated a program named i2spy. The purpose of i2spy was to perform a census of the I2P network. Because of the way I2P is run, there is no centralized way to see even basic statistics about the network. Although I2P is an anonymity network, there is still value in having some basic diagnostic information about the whole network.
In order to see more of the I2P network, I ran nodes on VPSs in New York, the Netherlands, and Singapore. Two other nodes were run by friends in unknown locations. Each of these nodes reported in to a centralized node with various data points (which you’ll see below).
The code for this project can be found on GitHub.
What I Collected
Minimizing Data
I’d like to start off by noting that, as of this writing, hiding the fact that you are using I2P is explicitly not part of I2P’s threat model. So from an ethical point of view, this research project was probably within what an I2P user could expect.
While i2spy was running, I collected the following hourly from roughly five nodes distributed around the world:
- Full netdb
  - Public key
  - Signing key type
  - Country
  - I2P Version
  - IPv6 Enabled
  - Firewalled
  - Router capabilities
- Hourly averages
  - How many tunnels
  - Number of peers
  - Crypto times
  - Network speeds
Note that I specifically did not collect the following:
- IP addresses
- Ports
- RouterInfo timezones
- Any information if the node was from a country that had fewer than 25 nodes
I could have collected more, but I decided less was better. One concern of mine was that I could be subpoenaed for the data, so I tried to collect little enough that I could not cause harm to any I2P users.
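The minimization rules above can be sketched roughly as follows. This is not the actual i2spy code: the record layout and the field names (`ip`, `port`, `timezone`, `country`) are hypothetical, and only the threshold of 25 nodes comes from the list above.

```python
from collections import Counter

MIN_NODES_PER_COUNTRY = 25  # threshold stated in the post

# Hypothetical field names for the data that was deliberately not collected.
DROPPED_FIELDS = {"ip", "port", "timezone"}


def minimize(records):
    """Drop routers from sparsely populated countries and strip sensitive fields.

    `records` is a list of dicts, one per router, each with a "country" key.
    """
    counts = Counter(r["country"] for r in records)
    kept = []
    for r in records:
        if counts[r["country"]] < MIN_NODES_PER_COUNTRY:
            continue  # too few nodes: the country alone could narrow down a user
        kept.append({k: v for k, v in r.items() if k not in DROPPED_FIELDS})
    return kept
```

Filtering at collection time, rather than at analysis time, means the sensitive fields never reach disk at all, which is the point when the worry is a subpoena.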
Things I Wish I Collected
If I make another implementation (likely), these are the other attributes I would collect.
- LeaseSet information
- Number of downloads through geti2p.net
- Number of downloads through Debian
- Requests to the I2P news servers
- Requests to I2P reseed servers
Results
User Count
Based on the peak number of detected router infos over several days, it appears there are about 50k routers. Please note that one user can run multiple routers. I did not do a super serious mathematical analysis on this number. I figured an approximate number was good enough.
Also note that the I2P rekeying happened at this time. This number tries to take the rekey into account.
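A peak count like this can be computed in a few lines. The sketch below is an assumption about the method, not the actual i2spy code: it merges the router identifiers seen by every reporting node in the same hour and takes the largest hourly total. It does not correct for the rekey, where a single router can show up under two identities.

```python
from collections import defaultdict


def peak_router_count(observations):
    """Return the largest number of distinct routers seen in any single hour.

    `observations` is an iterable of (hour, router_ids) pairs, one pair per
    reporting node per hour. Routers seen by several nodes in the same hour
    are counted once.
    """
    per_hour = defaultdict(set)
    for hour, router_ids in observations:
        per_hour[hour] |= set(router_ids)
    return max((len(ids) for ids in per_hour.values()), default=0)
```

For example, two nodes reporting `{"a", "b"}` and `{"b", "c"}` in the same hour contribute three routers to that hour, not four.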
Fast Pushing of Updates
Luckily, during the observation period two updates happened. I was able to observe I2P’s update process live! Since the update system is based on BitTorrent, every I2P user helps every other I2P user upgrade.
Within two weeks, approximately 80% of I2P nodes had upgraded to the newest version! In my opinion, that’s a pretty good turnaround. MUCH better than certain mobile operating systems and some web browsers.
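An adoption figure like the one above is just the share of routers in a netdb snapshot that advertise the newest version. A minimal sketch, assuming each snapshot is reduced to a plain list of version strings (the function name and the sample versions are illustrative only):

```python
from collections import Counter


def version_share(versions, target):
    """Return the fraction of routers in one snapshot reporting `target`."""
    if not versions:
        return 0.0
    return Counter(versions)[target] / len(versions)


# Illustrative snapshot: 8 upgraded routers out of 10.
snapshot = ["0.9.22"] * 8 + ["0.9.21"] * 2
share = version_share(snapshot, "0.9.22")
```

Running this over each hourly snapshot and plotting the result against time gives the upgrade curve.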
Russians!
About half the network is in Russia. This has an added benefit that neither country likes to cooperate, so traffic analysis is, in theory, more difficult.
Most Activity around 17:00 UTC
Matches country stats (20:00 in Moscow).
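As a quick check of that conversion, here is a small sketch that picks the busiest UTC hour from hourly counts and converts it to Moscow time. It assumes Moscow is at a fixed UTC+3 offset, which has been the case since late 2014; the counts in the usage lines are made up.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Moscow time as a fixed UTC+3 offset with no daylight saving.
MSK = timezone(timedelta(hours=3), "MSK")


def busiest_hour_utc(counts_by_hour):
    """Return the UTC hour (0-23) with the highest count."""
    return max(counts_by_hour, key=counts_by_hour.get)


def utc_hour_to_msk(hour):
    """Convert an hour of the day in UTC to the hour of the day in Moscow."""
    moment = datetime(2015, 1, 1, hour, tzinfo=timezone.utc)
    return moment.astimezone(MSK).hour


# Made-up counts, for illustration only.
peak = busiest_hour_utc({16: 10, 17: 42, 18: 30})
moscow_hour = utc_hour_to_msk(peak)
```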
NetDB ReKey Working
In I2P 0.9.22 the default signing algorithm for router infos was switched from DSA_SHA1 to EdDSA_SHA256_Ed25519.
To keep network conditions stable, routers would, at reboot, randomly (with some percent probability) change their router info’s signing key.
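To see why a per-reboot coin flip spreads the migration out over time, here is a small sketch of the expected fraction of routers that have switched keys after a given number of reboots. The probability `p` is hypothetical (I don't have the actual value to hand), and the model assumes each reboot is an independent trial and that a router never switches back.

```python
def expected_migrated_fraction(p, reboots):
    """Expected share of routers that have rekeyed after `reboots` restarts.

    Each restart of a not-yet-migrated router rekeys with probability `p`,
    so the chance of still being on the old key after n restarts is (1 - p)**n.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if reboots < 0:
        raise ValueError("reboots must be non-negative")
    return 1.0 - (1.0 - p) ** reboots
```

With a hypothetical `p` of 0.5, half the routers are expected to rekey on their first restart and three quarters by their second, so the network never loses all of its known identities at once.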
After two or so releases, the I2P team changed the signing key to only be EdDSA_SHA256_Ed25519.
Using this data, str4d was able to prove that the network rekey worked! I was cited indirectly in a presentation at Real World Cryptography 2016!
Can an I2P User Opt Out?
Short answer: no. Long answer: yes.
With enough work I2P could make this kind of analysis more difficult. I don’t know exactly how, but I am sure it could happen 🙂.
Follow Ups
Overall this was fun to research, and I was able to learn a lot about I2P and data analysis. In the future I hope to make another implementation that is much more modular, to make up for a lot of poorly thought-out design decisions I made early on.
If you’re a researcher and would like the data, feel free to contact me.