IPFS, built by Protocol Labs, is the latest uncensorable tech to be used in the fight for freedom of information. Specifically, Matters.news, a Chinese news source, has been publishing articles that are stored on this immutable version of the web.
The Chinese government is censoring criticism of its handling of the coronavirus, particularly any mention of Dr. Li Wenliang, the doctor who warned of the disease and later died of it. Citizens are turning to decentralized protocols to share news and sentiment as a result.
“It’s been an outlet for community information, support, and advice”, said IPFS lead Molly Mackinlay.
IPFS, which stands for the InterPlanetary File System, is a radical redesign of how people navigate and use the internet.
Today’s web runs on HTTP, which sends each request for online content to the single server that stores it, meaning that if that content is changed or blocked there is no reliable way to access it again. IPFS, a peer-to-peer protocol, instead allows users to download webpages and content stored across multiple servers, and provides “historical versioning” that shows how documents have been manipulated.
While this may seem like a clunky solution to a problem that only affects a few, IPFS has spread the world over.
It’s used in Turkey to host a mirror version of Wikipedia, after the nation banned the online encyclopedia for including Turkey on a list of terrorist financiers.
“These public, read-only snapshots of English and Turkish Wikipedia provided distributed access to important facts and neutral commentary censored by Turkey in their nearly 3 year Wikipedia block”, Mackinlay said. The block ended last January, but the mirror site remains.
In the past year, the number of nodes running IPFS grew 30-fold, driven primarily by new community adoption from applications like Microsoft ION, CharityEngine, EthDNS, and Brave, Mackinlay said.
Now, Protocol Labs is looking to get to the next stage of growth. Dedicated to building the next version of cyberspace, the non-profit organization will commit over $100,000, plus developer support and guidance, to the IPFS DevGrants program, and over $1 million in wider ecosystem support projects over the next six months.
“IPFS is a free and open protocol and always will be”, Mackinlay said. “While there are absolutely ways for open-source development teams like ours to achieve profitability while building and improving free and open software through consulting and selling associated tools or services, that isn’t on our roadmap this year.”
We sat down with Mackinlay to get a read on IPFS and to understand more about how the system functions.
What’s so wrong with HTTP?
The web as we know it is pretty brittle. That’s because of the way we choose to store content. HTTP, the core protocol in use, is a way of addressing content located on a particular server, in a particular place. Such a centralized structure isn’t resilient.
If you ever move a piece of content, suddenly all of the references to it break. It’s like going to a library to find a particular book that someone has moved to a different location in the stacks: no one is able to find that book again. With IPFS, instead of “addressing” things by the location of the data – like the third shelf, fourth from the right in the New York Public Library at 42nd Street – you address something by the content itself. So if you want to read Tom Sawyer, you can go get a copy from whoever happens to have it. It could be in your backpack. Your neighbor could have it. Your local library could have it. Instead of having to travel all the way to the one central location that’s hosting that content, you’d be able to get it from anyone who’s able to loan it to you. And that’s why IPFS is more resilient.
This also helps resist censorship. Again, if someone prohibits you from accessing a library, or if there’s a natural disaster and you’re unable to get to that library, that’s a problem. Because content is distributed across IPFS, you wouldn’t have to travel to that particular location; you can find a different copy.
IPFS comes from this core primitive of changing the web from a location-based model, which relies on central parties to host and distribute content, to a content-based system. To some extent, this was how the web was initially designed. It was supposed to be decentralized and enable this kind of free sharing of ideas. But we’ve fallen into this centralized trap.
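The library analogy can be made concrete with a minimal Python sketch of content addressing. It is an illustration only, not how IPFS is implemented: it uses a plain SHA-256 hex digest as a stand-in for a real IPFS content identifier, and the `Peer` class and `fetch` function are hypothetical names invented for this example.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Derive an address from the content itself (a stand-in for an IPFS CID)."""
    return hashlib.sha256(data).hexdigest()


class Peer:
    """A hypothetical peer that holds copies of content keyed by address."""

    def __init__(self, name: str):
        self.name = name
        self.online = True
        self.blocks = {}

    def add(self, data: bytes) -> str:
        address = content_address(data)
        self.blocks[address] = data
        return address

    def get(self, address: str):
        # An offline or blocked peer cannot serve anything.
        if not self.online:
            return None
        return self.blocks.get(address)


def fetch(address: str, peers):
    """Ask each peer in turn and accept the first copy whose hash matches."""
    for peer in peers:
        data = peer.get(address)
        if data is not None and content_address(data) == address:
            return data, peer.name
    raise LookupError(f"no reachable peer holds {address}")


if __name__ == "__main__":
    library, neighbor = Peer("library"), Peer("neighbor")
    book = b"The Adventures of Tom Sawyer"
    address = library.add(book)
    neighbor.add(book)

    library.online = False  # the library is closed or blocked
    data, source = fetch(address, [library, neighbor])
    print(f"got {len(data)} bytes from {source}")
```

Because the address is derived from the bytes, any peer holding an identical copy can answer the request, and the requester can check that what it received is what it asked for.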
When did you realize that internet centralization could be an issue?
I was a product manager at Google for a number of years, working on education games for Google Classroom. If you look at schools all over the world, even in New Jersey where I was living, they have terrible, terrible Internet. We’re talking really minimal bandwidth, so that when students try to do their school work it takes them minutes to load the content. Even loading a Google Doc could put an unreasonable load on their infrastructure, but they can’t afford to upgrade. I saw this first hand in classroom visits in New Jersey, in Thailand, all over the world. Schools are having this challenge and they’re wasting a whole ton of instructional time.
And it’s a problem with the fabric of the Internet, which requires every individual child to load a video over and over and over again from some distant location. Teachers and students should be able to share digital information with each other directly, instead of having to go through some distant intermediary to share that content. This also helps if the internet goes out or a service provider goes offline: the classroom can keep running. So it’s a more resilient fabric for the internet, which could support a ton of applications so we don’t run into these kinds of centralized hang-ups.
Storing data locally would theoretically improve retrieval speeds. But reports show IPFS lagging, and it’s not exactly clear where that’s coming from. Has that been identified?
With any new technology, performance is definitely an issue. We know of millions of users who are using IPFS for a whole ton of use cases. Once you have a lot of people who are using it and excited about using it, it’s going to be a challenge to make it faster and scalable, so that all of those people can rely upon the service.
The biggest performance challenge we’ve been focused on is our content routing, which is how you go about finding the content that you care about in a large distributed network.
On a centralized web model – like Google’s – the provider is incentivized to make bigger content as available as possible, whereas in our distributed network you have a lot more complexity.
So that’s been a big focus for us for the past three months and will be going forward. Until the end of June, our big focus is making sure that IPFS is actually a distributed network. And ensuring only good nodes join the network.
How do you define a good node?
Nodes that have a lot of strong connections to other nodes are ideal participants in distributed networks. We saw 30x growth in the number of nodes last year, which was huge. So we need to upgrade our systems and algorithms to support that.
Considering that level of growth, how are poor performers kicked off, while maintaining the decentralized nature of the protocol?
We’ve created this concept of having two different types of nodes participating in the system: servers and clients. Servers help other nodes get to the content they care about. We want to make sure you only become a server if you’re going to be online consistently. You need to be dependable. People need to be able to connect to your machine.
We also want all kinds of people and devices to participate. That opens the doors to all sorts of unreliable devices, like mobile phones. But we don’t want them to be servers within the network, so these less dependable devices become clients. Actually, I should clarify that server here does not mean, like, a physical server used today. You could do this on a laptop or any other sort of machine. It could even be a phone if you really were reliable.
The aim here is to programmatically diagnose whether a node is going to be online and dependable. If we detect those characteristics, then you get marked as a DHT server and, if not, you become a client.
Is there any user information collected?
The node itself collects this information about itself and then makes the decision whether it joins as a client or a server. It also asks peers in the network to check on its behalf, by dialing back to see whether it’s accessible or not. This, again, puts the power inside the node itself.
So we’re not collecting some centralized database of this sort of stuff. That’s not how we work. We’re all about a decentralized model of things, and that gets baked into the network.
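The self-assessment Mackinlay describes, in which a node asks peers to dial it back and then decides locally whether to act as a server or a client, might be sketched roughly as follows. This is a toy model, not the actual IPFS or libp2p logic: the `DialBackReport` type, the `choose_mode` function and both thresholds are assumptions made up for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    SERVER = "server"
    CLIENT = "client"


@dataclass
class DialBackReport:
    peer_id: str     # hypothetical identifier of the peer that tried to dial us
    reachable: bool  # True if that peer managed to connect back


def choose_mode(reports, min_successes=3, min_success_ratio=0.75):
    """Decide locally, from peers' dial-back results, how to join the network.

    Both thresholds are illustrative assumptions, not values used by IPFS.
    """
    if not reports:
        # No evidence of reachability: stay a client.
        return Mode.CLIENT
    successes = sum(1 for report in reports if report.reachable)
    ratio = successes / len(reports)
    if successes >= min_successes and ratio >= min_success_ratio:
        return Mode.SERVER
    return Mode.CLIENT


if __name__ == "__main__":
    laptop = [DialBackReport(f"peer-{i}", True) for i in range(4)]
    phone = [DialBackReport(f"peer-{i}", False) for i in range(4)]
    print(choose_mode(laptop).value, choose_mode(phone).value)
```

The decision is made inside the node from reports it requested itself, so no central party needs to keep a record of which devices are dependable.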
What are the incentives of joining the network as a node?
Right now, a lot of people are building their businesses on IPFS. They’re building applications that they want people to have access to, so that highly incentivizes folks to run their own nodes and help serve the data that they care about.
We also have a feature that came out in December, which is an example of giving people the tools they need to help maintain the data they care about. It’s called Collaborative Clusters. It allows everyone who cares about a dataset to join a global network of peers who are all helping replicate and host that data.
There’s a huge collaborative ecosystem here, in addition to folks who are highly incentivized because their business is dependent – like us – to run their own servers or to pay others to make sure that it continues to exist.
The first two letters in IPFS stand for interplanetary. Is the plan to take IPFS to space?
We’re very inspired by the idea that not too far from now we’re going to have persistent human colonies on Mars or some other planet in the solar system.
When you have that sort of setup, where humans become an interplanetary species, we will need to maintain connections and connectivity between Earth and Mars. Imagine living on Mars and needing to load a Wikipedia page. If you were reliant on a centralized server/client system based on Earth, you’d have something like a 14-minute delay to load every single page that you want to access. That’s just crazy.
We can’t have an internet that relies on centralized linking back to Earth once we’re spread all across the galaxy. We’re gonna need a more resilient and content-aware network, one that allows content to cache and persist in local environments. Go fetch information from the server next to you, instead of going all the way back to Earth.
So it’s definitely motivational. It’s an exemplar use case. It demonstrates that what we’re working on already has benefits here. We don’t need to go all the way to Mars to show that being able to connect with the person next to you is gonna be faster than going all the way across the country.
But it also gives us some nice timelines: the last I heard, Elon Musk was planning to have humans on Mars by 2024. So we need to make sure that IPFS becomes the default Web platform by then.
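For a rough sense of the Earth-to-Mars delay Mackinlay mentions, the light-travel time can be worked out directly. The sketch below uses approximate, rounded Earth-Mars distances (assumed figures; the real distance varies continuously with the planets’ orbits) and ignores every delay other than the speed of light.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458

# Approximate Earth-Mars distances in kilometers (rounded, assumed figures).
DISTANCES_KM = {
    "closest": 54.6e6,
    "average": 225e6,
    "farthest": 401e6,
}


def one_way_delay_minutes(distance_km: float) -> float:
    """Minutes for a signal to cover the distance at the speed of light."""
    return distance_km / SPEED_OF_LIGHT_KM_S / 60


def round_trip_delay_minutes(distance_km: float) -> float:
    """A request plus its response has to cross the distance twice."""
    return 2 * one_way_delay_minutes(distance_km)


if __name__ == "__main__":
    for label, km in DISTANCES_KM.items():
        print(
            f"{label}: one way {one_way_delay_minutes(km):.1f} min, "
            f"round trip {round_trip_delay_minutes(km):.1f} min"
        )
```

Under these assumptions the one-way delay runs from roughly three minutes to over twenty, so a figure of about 14 minutes sits inside that range, and any page that has to be requested from Earth pays the delay in both directions.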