It's the end of the year, so I thought this would be a great time to look back and ask, "where did all those pages go?" Being a data-driven kind of guy, I want to take a look at some numbers about churn, freshness, and what they mean for the size of the web and web indexes over the last year, drawing on the hundreds of billions, indeed trillion-plus, URLs we've gotten our hands on. This index update has a lot going on, so I've broken things out section by section:

- Analysis of the Web's Churn (or why having ten trillion URLs isn't very useful)
- Canonicalization, De-Duping & Choosing Which Pages to Keep
- Statistics on our December Linkscape Update
- New Updates to the FREE SEOmoz API (and a 90% price drop on the paid API)

An Analysis of the Web's Churn Rate

Not too long ago, at SMX East, I heard Joachim Kupke (a senior software engineer on Google's indexing team) say that "a majority of the web is duplicate content".
I made great use of that point at a Jane and Robot meetup shortly after. Now, I'd like to add my own corollary to that statement: "most of the web is short-lived".

Churn on the Web

After just a single month, a full 25% of the URLs we've seen are what we call "unverifiable". By that I mean the content was either duplicate, included session parameters, or for some reason could not be retrieved (verified) again (404s, 500s, etc.).
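To make that definition concrete, here's a minimal sketch in Python of the kind of check a crawler might run. This is purely illustrative and not Linkscape's actual pipeline; the SESSION_PARAMS list and both helper functions are assumptions for the example.

```python
import hashlib
from urllib.error import URLError
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse
from urllib.request import urlopen

# Hypothetical list of query parameters that typically carry session state.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}

def strip_session_params(url: str) -> str:
    """Drop session-style query parameters so two visits to the
    'same' page canonicalize to a single URL."""
    parts = urlparse(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query)
            if k.lower() not in SESSION_PARAMS]
    return urlunparse(parts._replace(query=urlencode(kept)))

def verify(url: str, seen_hashes: set) -> bool:
    """Return True if the URL still resolves and its content is not
    an exact duplicate of a page we've already kept."""
    try:
        with urlopen(strip_session_params(url), timeout=10) as resp:
            if resp.status != 200:       # 404s, 500s, etc. fail verification
                return False
            digest = hashlib.sha256(resp.read()).hexdigest()
    except OSError:                      # covers URLError, HTTPError, timeouts
        return False                     # could not be retrieved again
    if digest in seen_hashes:            # duplicate content
        return False
    seen_hashes.add(digest)
    return True
```

A real index would use fuzzier duplicate detection than an exact content hash (boilerplate and timestamps change between fetches), but the three failure modes above are the ones that matter here.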
Six months later, 75% of the tens of billions of URLs we've seen are "unverifiable", and a year later, only 20% qualify for "verified" status. As Rand noted earlier this week, Google's doing a lot of verifying themselves. To visualize this dramatic churn, imagine the web six months ago...

[Figure: the web six months ago]

Using Joachim's point, plus what we've observed, that six-month-old content today looks something like this:

[Figure: what remains of the six-month-old web]

What this means for you as a marketer is that some of the links you build and content you share across the web are not permanent.
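One side note on those figures: they imply churn is not a constant monthly rate. If 25% of URLs became unverifiable every month, far fewer than 25% would survive to six months. A quick back-of-envelope check (assuming the figures above are cohort survival rates) shows that pages which make it past their first few months tend to stick around:

```python
# If the 25% one-month churn were a constant monthly rate, survival
# would decay geometrically. The observed figures are much higher,
# so churn evidently slows down for older content.
monthly_survival = 0.75                   # 25% unverifiable after one month

predicted_6mo = monthly_survival ** 6     # ~17.8% verified
predicted_12mo = monthly_survival ** 12   # ~3.2% verified

observed_6mo, observed_12mo = 0.25, 0.20  # figures from the post

print(f"constant-rate model: 6mo={predicted_6mo:.1%}, 12mo={predicted_12mo:.1%}")
print(f"observed:            6mo={observed_6mo:.0%}, 12mo={observed_12mo:.0%}")
```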