The Wild Wild Web

  • Keep up-to-date with the latest news of this category.
    | Search My Site
    Wilhelm von Humboldt on “the individual man, and the highest ends of his existence” (via Henrik Karlsson): The true end of Man, or that which is prescribed by the eternal and immutable dictates of reason, and not suggested by vague and transient desires, is the highest and most harmonious developme [...]
    | CORE
    This International Open Access Week, the global research community is asking a vital question: Who owns our knowledge? At CORE (COnnecting REpositories), our answer is clear and unapologetic: We all do. For over a decade, CORE has stood at the forefront of the open access movement, not as a passive [...]
    | CORE
    We’re pleased to share that Professor Petr Knoth, Founder and Head of CORE (core.ac.uk) and Professor of Data Science at The Open University’s Knowledge Media Institute, will be giving a Computer Science Talk at Yale University on 13 October 2025. In his talk, titled “COnnecting REpositories (CORE) [...]
    | CORE
    Every research article, thesis, and working paper, accessible to anyone, anywhere. That’s the reality CORE has been building for 15 years, making knowledge discoverable and usable for students, educators, researchers, and curious minds across the globe. “It’s extraordinary to witness how CORE (COnne [...]
    | Marginalia Search
    One of the big ambitions for the search engine this year has been to enable searching in more languages than English, and a pilot project for this has just been completed, allowing experimental support for German, French and Swedish. These changes are now live for testing, but with an extremely smal [...]
    | CORE
    On 25 September 2025, CORE was invited by the UK Council of Open Research and Repositories (UKCORR) to present a webinar for their members, titled “From Principles to Practice: Making Repository Content Discoverable with the CORE Data Provider’s Guide.” The session focused on one of the most pressi [...]
    | Mwmbl
    This article was originally posted on my personal blog on 2nd August 2025. Dear friends, I am constantly besieged by the feeling that I am not doing enough. A genocide is unfolding before our eyes. I feel the guilt with every mother holding a starving child, with every doctor killed, with every jour [...]
    | Kagi
    Our curation of privacy-first projects worth knowing and supporting. Nothing excites us more than bringing together people who believe the web should respect its us [...]
    | Marginalia Search
    The Marginalia Search index has been partially rewritten to perform much better, using new data structures designed to make better use of modern hardware. This post will cover the new design, and will also touch upon some of the unexpected and unintuitive performance characteristics of NVMe SSDs whe [...]
    | Mwmbl
    It’s been so long since we’ve had an update on the blog that people are often confused as to whether the project is still active. It definitely is! I’m just bad at updating the blog. Most of the updates have been going to the Matrix channel. So an update is long overdue. Most of the recent work has [...]
    | Scryfall
    Scryfall is proud to announce that we’ve entered into a new partnership with Cardmarket. In the coming weeks, you should see a lot more richness in our available data for European pricing.
    | Marginalia Search
    As some of the work planned for Marginalia Search this year has been progressing a bit faster than anticipated, there was time to implement an unplanned change. This post details the implementation of a system for detecting when servers are online, to avoid serving dead links and improve data qualit [...]
    | Kagi
    Three years ago, Kagi officially launched with a splash on popular technology forum Hacker News (to which we are eternally grateful for helping put Kagi on the map).
    | Marginalia Search
    The most recent change to the search engine is a system that profiles websites based on their rendered DOM. The goal is identifying advertisements, trackers, nuisance popovers, and similar elements. The search engine already tries to do this, but isn’t very good at it because it’s only looking at st [...]
    | Marginalia Search
    The search engine has recently gained the ability to index the PDF file format. The change will deploy over a few months. Extracting text information from PDFs is a significantly bigger challenge than it might seem. The crux of the problem is that the file format isn’t a text format at all, but a gr [...]
    | clew
    The web is mind-bogglingly huge; let's look at how personal websites can thrive and interact despite that.
    | Scryfall
    Scryfall now offers a search term for cards that use the default frame. That is, cards that aren't showcases, borderless, extended art, and so on.
    | clew
    While on a fourteen-hour international flight, I finally managed to come up with an architecture for Clew's web crawler that I'm happy with. Here's the run-down.
    | clew
    I believe I've reached a point in Clew's development where, armed with the knowledge I've acquired from months of crawling sites and using that data to search the index, it's time to wipe the index and start over.
    | Mwmbl
    By many measures, Mwmbl is doing great. We have indexed over half a billion pages, we have over 4,000 registered users, and over 30,000 curations from those users. Our volunteers are crawling around 5 million pages a day. But the score that I care about most right now is NDCG. This measures the qual [...]
    | Mwmbl
    It’s two years since we launched Mwmbl, the open source, non-profit search engine, on Boxing Day 2021. A good time to take stock of where we are and where we’re going. We’ve indexed over 100 million pages. Thanks to our volunteers, who crawl the web using the Firefox extension and command line script [...]
    | Mwmbl
    Mwmbl is the first search engine to allow users to change the search results: You can add results, delete them, and rerank them. The changes you make are saved instantly to the index and will be shown to other users who run the same query. But what is the point of users changing search results? Th [...]