The Wild Wild Web

The new X/Twitter algorithm is hard to predict, but I’ve had one go viral with over a million views now, a quote-tweet of a cool demo video of Apple’s website builder from 2009, with t…

Announcing the IndieWeb Hackathon | Anthony Ciccarello

2 days ago | Search My Site

Following the IndieWebCamp San Diego Open Source development session, I’m proposing the start of a new monthly event tentatively called the IndieWe...

ODonnellWeb – Weekend Update 34

2 days ago | Search My Site

The weekly update, plus bits and bobs that caught my attention this week.

The journey

3 days ago | Search My Site

Dear reader, Earlier today, as I was cleaning my windows, I was watching a video […]

Is It a Metric or an Obsession?

3 days ago | Search My Site

I’ve been thinking about why some people can track their progress on goals without going insane, while others turn into the guy who weighs himself four times a day and has a panic attack when his Fitbit dies… You’d think there’d be some obvious personality difference: anxious

Webinar Series Recap: [...]

6 days ago | CORE

On 3 December 2025, CORE (COnnecting REpositories) hosted the latest live training session in our webinar series, Become a Professional User of the CORE Dashboard. We extend our sincere thanks to the many participants who joined us from across Europe, Africa, the United States and beyond. The high l [...]

New Search Filtering in Web and API

8 days ago | Marginalia Search

The search engine recently exposed a fair number of new tools for custom filtering to the API consumers and users of the new UI. This was originally going to be an incredibly chaotic update, both announcing the new features and doing a technical walkthrough of the changes, but that ambition turned out [...]

PRESS RELEASE

19 days ago | CORE

CORE Announces Strategic Support for the Creation of the Ukrainian open repositories community. Milton Keynes, United Kingdom – 27 November 2025. CORE is pleased to announce its formal support for the creation of the Ukrainian open repositories community, a national initiative led by the Scientific an [...]

Orion 1.0 ✴︎ Browse Beyond

21 days ago | Kagi

After six years of relentless development, *Orion for MacOS 1.0 is here*.

Kagi Hub Belgrade: A home base for Kagi members worldwide

25 days ago | Kagi

We’re excited to announce that Kagi Hub Belgrade ( https://hub.kagi.com ) is now open! Our first office doubles as a free coworking space for all Kagi members.

Introducing Kagi Assistants

26 days ago | Kagi

*TL;DR* Today we’re releasing two research assistants: Quick Assistant and Research Assistant (previously named Ki during beta).

LLMs are bullshitters. But that doesn't mean they're not useful

27 days ago | Kagi

*Note:* This is a personal essay by Matt Ranger, Kagi’s head of ML. In 1986, Harry Frankfurt wrote On Bullshit ( https://en.wikipedia.org/wiki/On_Bullshit ).

Introducing SlopStop: Community-driven AI slop detection in Kagi Search

last month | Kagi

Your collective defense against AI-generated spam and content farms. We made it our mission to prevent the web from becoming useless and a harmful space.

SoFAIR study paper accepted to JCDL2025

last month | CORE

The Open University is the project coordinator for the 2-year CHIST-ERA funded SoFAIR project which aims to make research software a first-class, FAIR research object (Findable, Accessible, Interoperable, and Reusable). We are excited to share that our paper, “Identifying and Classifying Software [...]

We Are Open Access And We’re Reclaiming Knowledge Together

2 months ago | CORE

This International Open Access Week, the global research community is asking a vital question: Who owns our knowledge? At CORE (COnnecting REpositories), our answer is clear and unapologetic: We all do. For over a decade, CORE has stood at the forefront of the open access movement, not as a passive [...]

Co-Designing the Next 15 Years: Highlights from CORE’s Board of Supporters Meeting

2 months ago | CORE

Twice each year, CORE’s Board of Supporters (BoS) meeting brings together our members, partners, and collaborators to exchange ideas, share progress, and shape the priorities that guide our development. The October 2025 meeting marked yet another successful, well-attended session and the second of t [...]

Language Support for Marginalia Search

2 months ago | Marginalia Search

One of the big ambitions for the search engine this year has been to enable searching in more languages than English, and a pilot project for this has just been completed, allowing experimental support for German, French and Swedish. These changes are now live for testing, but with an extremely smal [...]

Silent No Longer

3 months ago | Mwmbl

This article was originally posted on my personal blog on 2nd August 2025. Dear friends, I am constantly besieged by the feeling that I am not doing enough. A genocide is unfolding before our eyes. I feel the guilt with every mother holding a starving child, with every doctor killed, with every jour [...]

Through the Omenpaths added, plus English printed text support

3 months ago | Scryfall

Find out how Scryfall is handling data entry for Through the Omenpaths.

Mojeek is Not an Answer Engine

3 months ago | Mojeek

Mojeek is about Search. AI is not the Answer.

Faster Index I/O with NVMe SSDs

4 months ago | Marginalia Search

The Marginalia Search index has been partially rewritten to perform much better, using new data structures designed to make better use of modern hardware. This post will cover the new design, and will also touch upon some of the unexpected and unintuitive performance characteristics of NVMe SSDs whe [...]

Update July 2025

5 months ago | Mwmbl

It’s been so long since we’ve had an update on the blog that people are often confused as to whether the project is still active. It definitely is! I’m just bad at updating the blog. Most of the updates have been going to the Matrix channel. So an update is long overdue. Most of the recent work has [...]

Scryfall + Cardmarket

5 months ago | Scryfall

Scryfall is proud to announce that we’ve entered into a new partnership with Cardmarket. In the coming weeks, you should see a lot more richness in our available data for European pricing.

Finding Dead Websites

6 months ago | Marginalia Search

As some of the work planned for Marginalia Search this year has been progressing a bit faster than anticipated, there was time to implement an unplanned change. This post details the implementation of a system for detecting when servers are online, to avoid serving dead links and improve data qualit [...]

Profiling Websites

7 months ago | Marginalia Search

The most recent change to the search engine is a system that profiles websites based on their rendered DOM. The goal is identifying advertisements, trackers, nuisance popovers, and similar elements. The search engine already tries to do this, but isn’t very good at it because it’s only looking at st [...]

A Secret Web

7 months ago | clew

The web is mind-bogglingly huge; let's look at how personal websites can thrive and interact despite that.

Searchception

8 months ago | Mojeek

The illusion created by the merging of browsers with search engines.

Introducing is:default

9 months ago | Scryfall

Scryfall now offers a search term for cards that use the default frame. That is, cards that aren't showcases, borderless, extended art, and so on.

Learning and Sharing about Alternatives to Big Tech

9 months ago | Mojeek

What can you do once you've decided to avoid Big Tech?

Errata Notice: Aetherdrift

10 months ago | Scryfall

In early February 2025, Gatherer released a sweep of nearly 12,000 card Oracle text updates.

Leaving Big Tech

10 months ago | Mojeek

A range of tools available to help you kick Big Tech companies out of your life...

The MTG Wiki is now at mtg.wiki, hosted by Scryfall

10 months ago | Scryfall

The Magic: The Gathering wiki is moving to mtg.wiki and will no longer be hosted on Fandom.

Topical Custom Search Engines

11 months ago | Mojeek

How the Mojeek API can be used to build topical search engines...

The New Ariadne Architecture

last year | clew

While on a fourteen-hour international flight, I finally managed to come up with an architecture for Clew's web crawler that I'm happy with. Here's the run-down.

Lumen Researcher Interview Series: Phineas Rueckert - Forbidden Stories

last year | Lumen

Redesigning the Index

last year | clew

I believe I've reached a point in Clew's development where, armed with the knowledge I've acquired from months of crawling sites and using that data to search the index, it's time to wipe the index and start over.

Re-ranking search results on the client side in Rust

last year | Mwmbl

By many measures, Mwmbl is doing great. We have indexed over half a billion pages, we have over 4,000 registered users, and over 30,000 curations from those users. Our volunteers are crawling around 5 million pages a day. But the score that I care about most right now is NDCG. This measures the qual [...]

More Local News Caught in Flood of Unrelated Copyright Takedown Requests

last year | Lumen

A set of over 60,00 notices in Lumen's database, ostensibly targeting Turkish escort sites, inadvertently sweeps up local news URLs.

What’s Behind The Unusual DMCA Notices From “Crowdstrike”?

last year | Lumen

In the wake of the disastrous Windows update of July, 2024, several mysterious DMCA notices sent to Google, apparently from Crowdstrike

Takedowns: Olympic Edition

last year | Lumen

Lumen has identified new signs of a coordinated and potentially automated fraudulent DMCA takedown campaign relating to articles about a Russian Olympian. Building on work done by past Lumen team members and documented in previous Lumen blog posts, the evidence presented here sheds light on previo [...]

Tell Them To Bring Out The Whole Ocean

last year | Lumen

A standard DMCA notice from an OnlyFans performer leads to the de-indexing of some unrelated science & environment webpages

I'm Losing Faith in BM25

last year | clew

The current way that result ranking works in Clew is very different from what I want.

Welcome to the Madness

last year | clew

In which we launch the insanity that is this development blog for Clew.

Indexing a billion pages

2 years ago | Mwmbl

It’s two years since we launched Mwmbl, the open source, non-profit search engine, on Boxing Day 2021. A good time to take stock of where we are and where we’re going. We’ve indexed over 100 million pages. Thanks to our volunteers, who crawl the web using the Firefox extension and command line script [...]

Why is curation of web search results important?

2 years ago | Mwmbl

Mwmbl is the first search engine to allow users to change the search results: You can add results, delete them, and rerank them. The changes you made are saved instantly to the index and will be shown to other users who run the same query. But what is the point of users changing search results? Th [...]