A link archive

1.0 Background                                                           line 14
1.1 The manually curated link collection                                 line 34
1.2 The demise of link collections                                       line 42
2.0 Using archive.org to stabilize a link collection                     line 58
3.0 Experiences                                                          line 70

I have wondered whether there is a way for a curated link archive, in the spirit
of the early days of the WWW, to be useful at the time of writing. This is an
experiment where I tried to use archive.org links to stabilize the link targets.


    1.0 Background

A functional information network provides three core services:

    1: Means of sharing.
    2: Means of discovering what is shared.
    3: Means of providing feedback.

The early days of the web provided (1), built (3) on top of the fact that
sharing an e-mail address was a lot less foolhardy, and therefore quite common,
back then, and used home pages, link collections, and so on to try to solve (2).

Currently, open systems provide both means of sharing and means of peer-to-peer
communication, but discoverability is firmly in the grasp of search giants and
walled gardens. An index is by its very nature a monolithic beast; dividing an
index into smaller parts both requires domain expertise when splitting it and
creates the greater problem that the user must now know which index to ask. In
other words, web search is extremely friendly to monopolies.


    1.1 The manually curated link collection

From home pages and early attempts at identifying different subsets of the web
grew the larger, curated link collections, most famously Yahoo. The growth of
Yahoo in that period should be ample demonstration that these collections were
considered useful resources.


    1.2 The demise of link collections

There might be no single factor which killed off the popularity of manually
maintained link collections. Some factors may be:

    - The explosive growth of the web itself making the collections outdated at
      an accelerating pace.
    - The lack of a way to make them scale without adding manpower.
    - The increasing quality of search engines.
    - Single page applications and other hindrances to deep linking, like walled
      gardens in general.
    - Link rot. Whenever a major site changed its content management system,
      large parts of a collection could, and often would, become dead links;
      there were other sources as well.


    2.0 Using archive.org to stabilize a link collection

Instead of linking directly to the original resource, this experiment always
uses an indirection through archive.org's Wayback Machine. Using links to
archive.org is an attempt to avoid long-term link rot. Also, if what is
available through archive.org is insufficient, the original resource may still
be live (and updated), or the data on archive.org may provide enough
breadcrumbs to locate the original or new resources on the same problem.

The problem of scaling with the web really does not interest me at all. :)


    3.0 Experiences

I chose a topic for storing a few links, which in my experience were hard to dig
up and could be quite hard to find again given what seems to be a continuous
degradation of the quality of web search. After more than a year, what I have is
a pretty nice, little bookmarks file, with short comments. The links work, the
information is retrievable, but I have become utterly disillusioned with this
approach.

First of all, archive.org has been sued. Trusting information integrity to the
litigious nature of US public life seems like a fundamentally bad idea.

Second, archive.org is still just another centralized service. The institution
of the library is one of civilization's greatest success stories, while the
demise of the library of Alexandria is remembered as a tragedy millennia after
it burned.

For bookmarks into a cache to be reliable and trusted, the cache must be
distributed and replicated. The bookmarks must be tended regularly, like a
garden, to avoid them becoming stale and divorced from possible live, maintained
copies. The goal was discoverability of useful resources, not pure archival
work.

In other words, my conclusion is that we ideally would have multiple public
digital libraries. I think that is a good idea in general, but perhaps something
could be done on a smaller scale? One alternative might be topical archives
maintained by smaller organizations.

I don't really know where to go from here; I just wanted to share my rather
mixed success and reality check.
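
As a footnote to section 2.0, the indirection itself can be sketched in a few
lines of Python. This is only an illustration, not the tooling used for the
experiment. It assumes the commonly seen Wayback Machine capture layout,
https://web.archive.org/web/<YYYYMMDDhhmmss>/<original URL>, and the names
wayback_link and split_wayback_link are hypothetical helpers made up for this
example. The first builds a bookmark target; the second recovers the original
URL from a bookmark, which is what tending the collection against live copies
would need.

```python
import re
from datetime import datetime, timezone

# Assumed layout of a Wayback Machine capture URL:
#   https://web.archive.org/web/<YYYYMMDDhhmmss>/<original URL>
WAYBACK_PREFIX = "https://web.archive.org/web/"

# Up to 14 timestamp digits, an optional two-letter modifier such as "id_",
# then the original URL verbatim.
_WAYBACK_RE = re.compile(
    r"^https?://web\.archive\.org/web/(\d{1,14})(?:[a-z]{2}_)?/(.+)$"
)


def wayback_link(original_url: str, captured: datetime) -> str:
    """Build a bookmark target for `original_url` stamped with `captured`."""
    if captured.tzinfo is None:
        raise ValueError("captured must be timezone-aware")
    # Capture timestamps are written in UTC without separators.
    stamp = captured.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}{stamp}/{original_url}"


def split_wayback_link(link: str):
    """Return (timestamp, original_url), or None if `link` is not a capture."""
    match = _WAYBACK_RE.match(link)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

Keeping the original URL recoverable from every bookmark means the file never
depends on the archive alone: the same entry can be checked against the live
site, or carried over to another archive that uses a different prefix.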

Edited 20220218T152539Z.

Steinar Knutsen, 20231001T183425Z, 8E0DDAAF