90% of the world’s data is unstructured. It is built by humans, for humans. That’s great for human consumption, but it is very hard to organize when we begin dealing with the massive amounts of data abundant in today’s information age.
Organization is complicated because unstructured text data is not intended to be understood by machines, and having humans process this abundance of data is wildly expensive and very slow.
Fortunately, there is light at the end of the tunnel. More and more of this unstructured text is becoming accessible and understood by machines. We can now search text based on meaning, identify the sentiment of text, extract entities, and much more.
Transformers are behind much of this. These transformers are (unfortunately) not Michael Bay’s Autobots and Decepticons and (fortunately) not buzzing electrical boxes. Our NLP transformers lie somewhere in the middle; they’re not sentient Autobots (yet), but they can understand language in a way that, until just a few years ago, existed only in sci-fi.
Machines with a human-like comprehension of language are pretty helpful for organizing masses of unstructured text data. In machine learning, we refer to this task as topic modeling, the automatic clustering of data into particular topics.
BERTopic takes advantage of the superior language capabilities of these (not yet sentient) transformer models and uses some other ML magic like UMAP and HDBSCAN (more on these later) to produce what is one of the most advanced techniques in language topic modeling today.
BERTopic at a Glance

We will dive into the details behind BERTopic, but before we do, let us see how we can use it and take a first glance at its components.
To begin, we need a dataset. We can download the dataset from HuggingFace datasets with:
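A minimal sketch using the datasets library; the dataset identifier here is an assumption (the exact ID is in the linked notebooks):

```python
from datasets import load_dataset  # pip install datasets

# Hypothetical dataset ID; see the linked notebooks for the exact one used
data = load_dataset("jamescalam/reddit-python", split="train")
```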
The dataset contains data extracted using the Reddit API from the /r/python subreddit. The code used for this (and all other examples) can be found here.
Reddit thread contents are found in the selftext feature. Some are empty or short, so we remove them with:
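A sketch of the filtering step; the 30-character minimum is an assumed threshold:

```python
# Drop threads with empty or very short selftext (threshold is assumed)
data = data.filter(
    lambda x: x["selftext"] is not None and len(x["selftext"]) > 30
)
```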
We perform topic modeling using the BERTopic library. The “basic” approach requires just a few lines of code.
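A minimal sketch of that approach; note the vectorizer_model argument, which handles the stop-word removal mentioned below:

```python
from bertopic import BERTopic
from sklearn.feature_extraction.text import CountVectorizer

text = data["selftext"]

# Remove English stop words during the topic-word extraction step
vectorizer_model = CountVectorizer(stop_words="english")

model = BERTopic(vectorizer_model=vectorizer_model)
topics, probs = model.fit_transform(text)
```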
model.fit_transform returns two lists:

- topics contains a one-to-one mapping of inputs to their modeled topic (or cluster).
- probs contains the probability that each input belongs to its assigned topic.

We can then view the topics using get_topic_info.
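Called on the fitted model, get_topic_info returns a dataframe summarizing each topic, its size, and its top words:

```python
model.get_topic_info().head()
```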
The topic labeled -1 is typically assumed to be irrelevant, and it usually contains stop words like “the”, “a”, and “and”. However, because we removed stop words via the vectorizer_model argument, it instead shows the “most generic” of topics, with words like “Python”, “code”, and “data”.
The library has several built-in visualization methods like visualize_topics, visualize_hierarchy, and visualize_barchart.
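Each is called on the fitted model and returns an interactive figure:

```python
model.visualize_topics()     # 2D intertopic distance map
model.visualize_hierarchy()  # hierarchical topic dendrogram
model.visualize_barchart()   # bar charts of top words per topic
```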
BERTopic’s visualize_hierarchy visualization allows us to view the “hierarchy” of topics.
These represent the surface level of the BERTopic library, which has excellent documentation, so we will not rehash that here. Instead, let’s try and understand how BERTopic works.
Overview

There are four key components used in BERTopic:

- A transformer embedding model
- UMAP dimensionality reduction
- HDBSCAN clustering
- Cluster tagging using c-TF-IDF

We already did all of this in those few lines of BERTopic code; everything is just abstracted away. However, we can optimize the process by understanding the essentials of each component. This section will work through each component without BERTopic so we can see how they work, before returning to BERTopic at the end.
Transformer Embedding

BERTopic supports several libraries for encoding our text to dense vector embeddings. If we build poor-quality embeddings, nothing we do in the other steps will be able to help us, so it is very important that we choose a suitable embedding model from one of the supported libraries, which include:
- Sentence Transformers
- Flair
- SpaCy
- Gensim
- USE (from TF Hub)

Of the above, the Sentence Transformers library provides the most extensive selection of high-performing sentence embedding models. They can be found on HuggingFace Hub by searching for “sentence-transformers”.
The first result of this search is sentence-transformers/all-MiniLM-L6-v2, a popular high-performing model that creates 384-dimensional sentence embeddings.
To initialize the model and encode our Reddit topics data, we first pip install sentence-transformers and then write:
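A sketch of the encoding loop:

```python
from sentence_transformers import SentenceTransformer
import numpy as np

embed_model = SentenceTransformer("all-MiniLM-L6-v2")

batch_size = 16
embeds = []
for i in range(0, len(text), batch_size):
    batch = text[i:i + batch_size]
    embeds.extend(embed_model.encode(batch))
embeds = np.array(embeds)  # shape (num_threads, 384)
```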
Here we have encoded our text in batches of 16. Each batch is added to the embeds array. Once we have all of the sentence embeddings in embeds we’re ready to move on to the next step.
Dimensionality Reduction

After building our embeddings, BERTopic compresses them into a lower-dimensional space. This means that our 384-dimensional vectors are transformed into two- or three-dimensional vectors.
We can do this because 384 dimensions are a lot, and it is unlikely that we really need that many dimensions to represent our text. Instead, we attempt to compress that information into two or three dimensions.

We do this so that the following HDBSCAN clustering step can be done more efficiently. Performing the clustering step with 384 dimensions would be desperately slow.
Another benefit is that we can visualize our data; this is incredibly helpful when assessing whether our data can be clustered. Visualization also helps when tuning the dimensionality reduction parameters.
To help us understand dimensionality reduction, we will start with a 3D representation of the world. You can find the code for this part here.
3D scatter plot of points from the jamescalam/world-cities-geo dataset.
We can apply many dimensionality reduction techniques to this data; two of the most popular choices are PCA and t-SNE.
Our 2D world reduced using PCA.
PCA works by preserving larger distances (using mean squared error). The result is that the global structure of the data is usually preserved. We can see that behavior above, as each continent is grouped with its neighboring continent(s). When a dataset has easily distinguishable clusters, this can be good, but PCA performs poorly for more nuanced data where local structures are important.
2D Earth reduced using t-SNE.
t-SNE is the opposite; it preserves local structures rather than global. This localized focus results from t-SNE building a graph, connecting all of the nearest points. These local structures can indirectly suggest the global structure, but they are not strongly captured.
PCA focuses on preserving dissimilarity whereas t-SNE focuses on preserving similarity.
Fortunately, we can capture the best of both using a lesser-known technique called Uniform Manifold Approximation and Projection (UMAP).
We can apply UMAP in Python using the UMAP library, installed with pip install umap-learn. To map to a 3D or 2D space using the default UMAP parameters, all we write is:
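```python
import umap  # pip install umap-learn

# Reduce the 384-d embeddings to 3-d with default parameters
fit = umap.UMAP(n_components=3)
u = fit.fit_transform(embeds)
```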
The UMAP algorithm can be fine-tuned using several parameters. Still, the simplest and most effective tuning can be achieved with just the n_neighbors parameter.
For each datapoint, UMAP searches through the other points and identifies its kth nearest neighbors, where k is controlled by the n_neighbors parameter.

k and n_neighbors are synonymous here. As we increase n_neighbors, the graph built by UMAP can consider more distant points and better represent the global structure.
Where we have many points (high-density regions), the distance between our point and its kth nearest neighbor is usually smaller. In low-density regions with fewer points, the distance will be much greater.
Density is measured indirectly using the distances between kth nearest neighbors in different regions.
The distance to the kth nearest point is what UMAP attempts to preserve when shifting to a lower dimension.
By increasing n_neighbors we can preserve more global structures, whereas a lower n_neighbors better preserves local structures.
Higher n_neighbors (k) means we preserve larger distances and thus maintain more of the global structure.
Compared to other dimensionality reduction techniques like PCA or t-SNE, finding a good n_neighbors value allows us to preserve both local and global structures relatively well.
Applying it to our 3D globe, we can see neighboring countries remain neighbors. At the same time, continents are placed correctly (though with north-south inverted), and islands are separated from continents. We even have what seems to be the Iberian Peninsula in “western Europe”.
The UMAP-reduced Earth.
UMAP maintains distinguishable features that are not preserved by PCA and a better global structure than t-SNE. This is a great overall example of where the benefit of UMAP lies.
UMAP can also be used as a supervised dimensionality reduction method by passing labels to the target argument if we have labeled data. It is possible to produce even more meaningful structures using this supervised approach.
With all that in mind, let us apply UMAP to our Reddit topics data. Using an n_neighbors of 3-5 seems to work best. We can add min_dist=0.05 to allow UMAP to place points closer together (the default value is 0.1); this helps us separate the three similar topics from r/Python, r/LanguageTechnology, and r/pytorch.
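A sketch using one value from that 3-5 range (n_neighbors=4 is an arbitrary pick):

```python
fit = umap.UMAP(n_components=3, n_neighbors=4, min_dist=0.05)
u = fit.fit_transform(embeds)
```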
Reddit topics data reduced to 3D space using UMAP.
With our data reduced to a lower-dimensional space and topics easily visually identifiable, we’re in an excellent spot to move on to clustering.
HDBSCAN Clustering

We have visualized the UMAP-reduced data using the existing sub feature to color our clusters. It looks pretty, but we don’t usually perform topic modeling to label already-labeled data. If we assume that we have no existing labels, our UMAP visual will look like this:
UMAP-reduced cities data: we can distinguish many clusters/continents, but it is much more difficult without label coloring.
Now let us look at how HDBSCAN is used to cluster the (now) low-dimensional vectors.
Clustering methods can be broken into flat or hierarchical and centroid-based or density-based techniques, each of which has its own benefits and drawbacks.

The flat-versus-hierarchical split simply concerns whether there is (or is not) a hierarchy in the clustering method. For example, we may (ideally) view our graph hierarchy as moving from continents to countries to cities. These methods allow us to view a given hierarchy and try to identify a logical “cut” along the tree.
Hierarchical techniques begin from one large cluster and split this cluster into smaller and smaller parts and try to find the ideal number of clusters in the hierarchy.
The other split is between centroid-based or density-based clustering. That is clustering based on proximity to a centroid or clustering based on the density of points. Centroid-based clustering is ideal for “spherical” clusters, whereas density-based clustering can handle more irregular shapes and identify outliers.
Centroid-based clustering (left) vs density-based clustering (right).
HDBSCAN is a hierarchical, density-based method, meaning we can benefit from the easier tuning and visualization of hierarchical data, handle irregular cluster shapes, and identify outliers.
When we first apply HDBSCAN clustering to our data, we will return many tiny clusters, identified by the red circles in the condensed tree plot below.
The condensed tree plot shows the drop-off of points into outliers and the splitting of clusters as the algorithm scans by increasing lambda values.
HDBSCAN chooses the final clusters based on their size and persistence over varying lambda values. The tree’s thickest, most persistent “branches” are viewed as the most stable and, therefore, best candidates for clusters.
These clusters are not very useful because the default minimum number of points needed to “create” a cluster is just 5. Given our dataset of ~3K points, from which we aim to produce roughly four subreddit clusters, this is far too small. Fortunately, we can increase this threshold using the min_cluster_size parameter.

Increasing min_cluster_size to 80 is better, but not quite there; we can try reducing it to 60 to pull in the three clusters below the green block.
Unfortunately, this still pulls in the green block and even allows too small clusters (as on the left). Another option is to keep min_cluster_size=80 but add min_samples=40, to allow for more sparse core points.
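A sketch of that final configuration:

```python
import hdbscan

clusterer = hdbscan.HDBSCAN(
    min_cluster_size=80,  # minimum points required to form a cluster
    min_samples=40,       # lower value allows for sparser core points
)
clusterer.fit(u)
```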
Now we have four clusters, and we can visualize them using the data in clusterer.labels_.
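A minimal 3D scatter sketch, coloring each reduced point by its cluster label (-1 marks outliers):

```python
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(u[:, 0], u[:, 1], u[:, 2], c=clusterer.labels_, s=2)
plt.show()
```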
HDBSCAN clustered Reddit topics, accurately identifying the different subreddit clusters. The sparse blue points are outliers and are not identified as belonging to any cluster.
A few outliers are marked in blue, some of which make sense (pinned daily discussion threads) and others that do not. However, overall, these clusters are very accurate. With that, we can try to identify the meaning of these clusters.
The final step in BERTopic is extracting topics for each of our clusters. To do this, BERTopic uses a modified version of TF-IDF called c-TF-IDF.
TF-IDF is a popular technique for identifying the most relevant “documents” given a term or set of terms. c-TF-IDF turns this on its head by finding the most relevant terms given all of the “documents” within a cluster.
c-TF-IDF looks at the most relevant terms from each class (cluster) to create topics.
In our Reddit topics dataset, we have been able to identify very distinct clusters. However, we still need to determine what these clusters talk about. We start by preprocessing the selftext to create tokens.
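A simple assumed tokenization scheme (lowercased alphanumeric tokens):

```python
import re

def tokenize(text):
    # Lowercase and keep alphanumeric tokens only
    return re.findall(r"[a-z0-9]+", text.lower())

data = data.map(lambda x: {"tokens": tokenize(x["selftext"])})
```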
Part of c-TF-IDF requires calculating the frequency of term t in class c. For that, we need to see which tokens belong in each class. We first add the cluster/class labels to data.
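A sketch of that step, assuming the dataset rows are still aligned with the order of the clustered embeddings:

```python
# Attach each thread's HDBSCAN cluster label as its class
data = data.add_column("class", clusterer.labels_.tolist())
```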
Now create class-specific lists of tokens.
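```python
# Merge the tokens of all threads belonging to each class/cluster
classes = {}
for row in data:
    classes.setdefault(row["class"], []).extend(row["tokens"])
```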
We can calculate Term Frequency (TF) per class.
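A readable (not efficient) sketch that builds the vocab and a per-class term-frequency matrix:

```python
from collections import Counter
import numpy as np

vocab = sorted({t for tokens in classes.values() for t in tokens})
vocab_idx = {t: i for i, t in enumerate(vocab)}
class_ids = sorted(classes.keys())

# tf[c, t] = frequency of term t within class c
tf = np.zeros((len(class_ids), len(vocab)))
for ci, c in enumerate(class_ids):
    for term, n in Counter(classes[c]).items():
        tf[ci, vocab_idx[term]] = n
```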
Note that this can take some time; our TF process prioritizes readability over any notion of efficiency. Once complete, we’re ready to calculate the Inverse Document Frequency (IDF), which tells us how rare a term is across classes. Rare terms signify greater relevance than common terms and will receive a greater IDF score.
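A sketch of the c-TF-IDF variant of IDF, log(1 + A / f_t), where A is the average number of tokens per class and f_t is the frequency of term t across all classes:

```python
A = tf.sum() / tf.shape[0]   # average number of tokens per class
f_t = tf.sum(axis=0)         # frequency of each term across all classes
idf = np.log(1 + (A / f_t)).reshape(1, -1)
```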
We now have TF and IDF scores for every term, and we can calculate the c-TF-IDF score by simply multiplying both.
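```python
tf_idf = tf * idf  # shape: (num_classes, vocab_size)
```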
In tf_idf, we have a vocab-sized list of c-TF-IDF scores for each class. We can use NumPy’s argpartition function to retrieve the index positions containing the greatest c-TF-IDF scores per class.
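```python
# Index positions of the five highest-scoring terms per class
top_idx = np.argpartition(tf_idf, -5)[:, -5:]
```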
Now we map those index positions back to the original words in the vocab.
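```python
vocab_arr = np.array(vocab)
for ci, c in enumerate(class_ids):
    print(c, list(vocab_arr[top_idx[ci]]))
```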
Here we have the top five most relevant words for each cluster, each identifying the most relevant topics in each subreddit.
Back to BERTopic

We’ve covered a considerable amount, but can we apply what we have learned to the BERTopic library?
Fortunately, all we need are a few lines of code. As before, we initialize our custom embedding, UMAP, and HDBSCAN components.
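A sketch of that initialization, reusing the parameters tuned above:

```python
from sentence_transformers import SentenceTransformer
import umap
import hdbscan

embedding_model = SentenceTransformer("all-MiniLM-L6-v2")
umap_model = umap.UMAP(n_components=3, n_neighbors=4, min_dist=0.05)
hdbscan_model = hdbscan.HDBSCAN(
    min_cluster_size=80,
    min_samples=40,
    prediction_data=True,    # required for integration with BERTopic
    gen_min_span_tree=True,  # extra step that can improve clusters
)
```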
You might notice that we have added prediction_data=True as a new parameter to HDBSCAN. We need this to avoid an AttributeError when integrating our custom HDBSCAN step with BERTopic. Adding gen_min_span_tree=True adds another step to HDBSCAN that can improve the resultant clusters.
We must also initialize a vectorizer_model to handle stopword removal during the c-TF-IDF step. We will use the list of stopwords from NLTK but add a few more tokens that seem to pollute the results.
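A sketch of that setup; the extra tokens appended here are illustrative, not the exact list used in the notebooks:

```python
from sklearn.feature_extraction.text import CountVectorizer
import nltk
from nltk.corpus import stopwords

nltk.download("stopwords")
stop_words = stopwords.words("english")
# Extra tokens that seemed to pollute the results (illustrative additions)
stop_words += ["http", "https", "amp", "com"]

vectorizer_model = CountVectorizer(stop_words=stop_words)
```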
We’re now ready to pass all of these to a BERTopic instance and process our Reddit topics data.
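```python
from bertopic import BERTopic

model = BERTopic(
    embedding_model=embedding_model,
    umap_model=umap_model,
    hdbscan_model=hdbscan_model,
    vectorizer_model=vectorizer_model,
)
topics, probs = model.fit_transform(text)
```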
We can visualize the new topics with model.visualize_barchart().
Our final topics produced using the BERTopic library with the tuned UMAP and HDBSCAN parameters.
We can see the topics align perfectly with r/investing, r/pytorch, r/LanguageTechnology, and r/Python.
Transformers, UMAP, HDBSCAN, and c-TF-IDF are clearly powerful components that have huge applications when working with unstructured text data. BERTopic has abstracted away much of the complexity of this stack, allowing us to apply these technologies with nothing more than a few lines of code.
Although BERTopic can be simple, you have seen that it is possible to dive quite deeply into the individual components. With a high-level understanding of those components, we can greatly improve our topic modeling performance.
We have covered the essentials here, but we genuinely are just scratching the surface of topic modeling in this article. There is much more to BERTopic and each component than we could ever hope to cover in a single article.
So go and apply what you have learned here, and remember that despite the incredible performance shown here, there is even more that BERTopic can do.
Resources

🔗 All Notebook Scripts

M. Grootendorst, BERTopic Repo, GitHub
M. Grootendorst, BERTopic: Neural Topic Modeling with a Class-based TF-IDF Procedure (2022)
L. McInnes, J. Healy, J. Melville, UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction (2018)
L. McInnes, Talk on UMAP for Dimension Reduction (2018), SciPy 2018
J. Healy, HDBSCAN, Fast Density Based Clustering, the How and the Why (2018), PyData NYC 2018
L. McInnes, A Bluffer’s Guide to Dimension Reduction (2018), PyData NYC 2018
UMAP Explained, AI Coffee Break with Letitia, YouTube
The Future Will Have to Wait
Published on Sunday, January 22, 02006
Written by Michael Chabon for Details
I was reading, in a recent issue of Discover, about the Clock of the Long Now. Have you heard of this thing? It is going to be a kind of gigantic mechanical computer, slow, simple and ingenious, marking the hour, the day, the year, the century, the millennium, and the precession of the equinoxes, with a huge orrery to keep track of the immense ticking of the six naked-eye planets on their great orbital mainspring. The Clock of the Long Now will stand sixty feet tall, cost tens of millions of dollars, and when completed its designers and supporters, among them visionary engineer Danny Hillis, a pioneer in the concept of massively parallel processing; Whole Earth mahatma Stewart Brand; and British composer Brian Eno (one of my household gods), plan to hide it in a cave in the Great Basin National Park in Nevada, a day’s hard walking from anywhere. Oh, and it’s going to run for ten thousand years.

That is about as long a span as separates us from the first makers of pottery, which is among the oldest technologies we have. Ten thousand years is twice as old as the pyramid of Cheops, twice as old as that mummified body found preserved in the Swiss Alps, which is one of the oldest mummies ever discovered. The Clock of the Long Now is being designed to thrive under regular human maintenance along the whole of that long span, though during periods when no one is around to tune it, the giant clock will contrive to adjust itself.

But even if the Clock of the Long Now fails to last ten thousand years, even if it breaks down after half or a quarter or a tenth that span, this mad contraption will already have long since fulfilled its purpose. Indeed the Clock may have accomplished its greatest task before it is ever finished, perhaps without ever being built at all. The point of the Clock of the Long Now is not to measure out the passage, into their unknown future, of the race of creatures that built it. The point of the Clock is to revive and restore the whole idea of the Future, to get us thinking about the Future again, to the degree, if not in quite the same way, that we used to do, and to reintroduce the notion that we don’t just bequeath the future—though we do, whether we think about it or not. We also, in the very broadest sense of the first person plural pronoun, inherit it.
The Sex Pistols, strictly speaking, were right: there is no future, for you or for me. The future, by definition, does not exist. “The Future,” whether you capitalize it or not, is always just an idea, a proposal, a scenario, a sketch for a mad contraption that may or may not work. “The Future” is a story we tell, a narrative of hope, dread or wonder. And it’s a story that, for a while now, we’ve been pretty much living without.
Ten thousand years from now: can you imagine that day? Okay, but do you? Do you believe “the Future” is going to happen? If the Clock works the way that it’s supposed to do—if it lasts—do you believe there will be a human being around to witness, let alone mourn its passing, to appreciate its accomplishment, its faithfulness, its immense antiquity? What about five thousand years from now, or even five hundred? Can you extend the horizon of your expectations for our world, for our complex of civilizations and cultures, beyond the lifetime of your own children, of the next two or three generations? Can you even imagine the survival of the world beyond the present presidential administration?
I was surprised, when I read about the Clock of the Long Now, at just how long it had been since I had given any thought to the state of the world ten thousand years hence. At one time I was a frequent visitor to that imaginary mental locale. And I don’t mean merely that I regularly encountered “the Future” in the pages of science fiction novels or comic books, or when watching a TV show like The Jetsons (1962) or a movie like Beneath the Planet of the Apes (1970). The story of the Future was told to me, when I was growing up, not just by popular art and media but by public and domestic architecture, industrial design, school textbooks, theme parks, and by public institutions from museums to government agencies. I heard the story of the Future when I looked at the space-ranger profile of the Studebaker Avanti, at Tomorrowland through the portholes of the Disneyland monorail, in the tumbling plastic counters of my father’s Seth Thomas Speed Read clock. I can remember writing a report in sixth grade on hydroponics; if you had tried to tell me then that by 2005 we would still be growing our vegetables in dirt, you would have broken my heart.
Even thirty years after its purest expression on the covers of pulp magazines like Amazing Stories and, supremely, at the New York World’s Fair of 1939, the collective cultural narrative of the Future remained largely an optimistic one of the impending blessings of technology and the benevolent, computer-assisted meritocracy of Donald Fagen’s “fellows with compassion and vision.” But by the early seventies—indeed from early in the history of the Future—it was not all farms under the sea and family vacations on Titan. Sometimes the Future could be a total downer. If nuclear holocaust didn’t wipe everything out, then humanity would be enslaved to computers, by the ineluctable syllogisms of “the Machine.” My childhood dished up a series of grim cinematic prognostications best exemplified by the Hestonian trilogy that began with the first Planet of the Apes (1968) and continued through The Omega Man (1971) and Soylent Green (1973). Images of future dystopia were rife in rock albums of the day, as on David Bowie’s Diamond Dogs (1974) and Rush’s 2112 (1976), and the futures presented by seventies writers of science fiction such as John Brunner tended to be unremittingly or wryly bleak.
In the aggregate, then, stories of the Future presented an enchanting ambiguity. The other side of the marvelous Jetsons future might be a story of worldwide corporate-authoritarian technotyranny, but the other side of a post-apocalyptic mutational nightmare landscape like that depicted in The Omega Man was a landscape of semi-barbaric splendor and unfettered (if dangerous) freedom to roam, such as I found in the pages of Jack Kirby’s classic adventure comic book Kamandi, The Last Boy on Earth (1972-76). That ambiguity and its enchantment, the shifting tension between the bright promise and the bleak menace of the Future, was in itself a kind of story about the ways, however freakish or tragic, in which humanity (and by implication American culture and its values) would, in spite of it all, continue. Eed plebnista, intoned the devolved Yankees, in the Star Trek episode “The Omega Glory,” who had somehow managed to hold on to and venerate as sacred gobbledygook the Preamble to the Constitution, norkon forden perfectunun. All they needed was a Captain Kirk to come and add a little interpretive water to the freeze-dried document, and the American way of life would flourish again.
I don’t know what happened to the Future. It’s as if we lost our ability, or our will, to envision anything beyond the next hundred years or so, as if we lacked the fundamental faith that there will in fact be any future at all beyond that not-too-distant date. Or maybe we stopped talking about the Future around the time that, with its microchips and its twenty-four-hour news cycles, it arrived. Some days when you pick up the newspaper it seems to have been co-written by J. G. Ballard, Isaac Asimov, and Philip K. Dick. Human sexual reproduction without male genetic material, digital viruses, identity theft, robot firefighters and minesweepers, weather control, pharmaceutical mood engineering, rapid species extinction, US Presidents controlled by little boxes mounted between their shoulder blades, air-conditioned empires in the Arabian desert, transnational corporatocracy, reality television—some days it feels as if the imagined future of the mid-twentieth century was a kind of checklist, one from which we have been too busy ticking off items to bother with extending it. Meanwhile, the dwindling number of items remaining on that list—interplanetary colonization, sentient computers, quasi-immortality of consciousness through brain-download or transplant, a global government (fascist or enlightened)—have been represented and re-represented so many hundreds of times in films, novels and on television that they have come to seem, paradoxically, already attained, already known, lived with, and left behind. Past, in other words.
This is the paradox that lies at the heart of our loss of belief or interest in the Future, which has in turn produced a collective cultural failure to imagine that future, any Future, beyond the rim of a couple of centuries. The Future was represented so often and for so long, in the terms and characteristic styles of so many historical periods from, say, Jules Verne forward, that at some point the idea of the Future—along with the cultural appetite for it—came itself to feel like something historical, outmoded, no longer viable or attainable.
If you ask my eight-year-old about the Future, he pretty much thinks the world is going to end, and that’s it. Most likely global warming, he says—floods, storms, desertification—but the possibility of viral pandemic, meteor impact, or some kind of nuclear exchange is not alien to his view of the days to come. Maybe not tomorrow, or a year from now. The kid is more than capable of generating a full head of optimistic steam about next week, next vacation, his tenth birthday. It’s only the world a hundred years on that leaves his hopes a blank. My son seems to take the end of everything, of all human endeavor and creation, for granted. He sees himself as living on the last page, if not in the last paragraph, of a long, strange and bewildering book. If you had told me, when I was eight, that a little kid of the future would feel that way—and that what’s more, he would see a certain justice in our eventual extinction, would think the world was better off without human beings in it—that would have been even worse than hearing that in 2006 there are no hydroponic megafarms, no human colonies on Mars, no personal jetpacks for everyone. That would truly have broken my heart.
When I told my son about the Clock of the Long Now, he listened very carefully, and we looked at the pictures on the Long Now Foundation’s website. “Will there really be people then, Dad?” he said. “Yes,” I told him without hesitation, “there will.” I don’t know if that’s true, any more than do Danny Hillis and his colleagues, with the beating clocks of their hopefulness and the orreries of their imaginations. But in having children—in engendering them, in loving them, in teaching them to love and care about the world—parents are betting, whether they know it or not, on the Clock of the Long Now. They are betting on their children, and their children after them, and theirs beyond them, all the way down the line from now to 12,006. If you don’t believe in the Future, unreservedly and dreamingly, if you aren’t willing to bet that somebody will be there to cry when the Clock finally, ten thousand years from now, runs down, then I don’t see how you can have children. If you have children, I don’t see how you can fail to do everything in your power to ensure that you win your bet, and that they, and their grandchildren, and their grandchildren’s grandchildren, will inherit a world whose perfection can never be accomplished by creatures whose imagination for perfecting it is limitless and free. And I don’t see how anybody can force me to pay up on my bet if I turn out, in the end, to be wrong.
Written by Michael Chabon for Details. Originally published in 02006.
Explorers into unknown territory face plenty of risks. One that doesn’t always get the attention it deserves is the possibility that they know less about the country ahead than they think. Inaccurate maps, jumbled records, travelers’ tales that got garbled in transmission or were made up in the first place: all these and more have laid their share of traps at the feet of adventurers on their way to new places and accounted for an abundance of disasters. As we make our way willy-nilly into that undiscovered country called the future, a similar rule applies.
Christopher Columbus, when he set sail on the first of his voyages across the Atlantic, brought with him a copy of The Travels of Sir John Mandeville, a fraudulent medieval travelogue that claimed to recount a journey east to the Earthly Paradise across a wholly imaginary Asia, packed full of places and peoples that never existed. Columbus’ eager attention to that volume seems to have played a significant role in keeping him hopelessly confused about the difference between the Asia of his dreams and the place where he’d actually arrived. It’s a story more than usually relevant today, because most people nowadays are equipped with comparable misinformation for their journey into the future, and are going to end up just as disoriented as Columbus.
I’ve written at some length already about some of the stage properties with which the Sir John Mandevilles of science fiction and the mass media have stocked the Earthly Paradises of their technofetishistic dreams: flying cars, space colonies, nuclear reactors that really, truly will churn out electricity too cheap to meter, and the rest of it. (It occurs to me that we could rework a term coined by the late Alvin Toffler and refer to the whole gaudy mess as Future Schlock.) Yet there’s another delusion, subtler but even more misleading, that pervades current notions about the future and promises an even more awkward collision with unwelcome realities.
That delusion? The notion that we can decide what future we’re going to get, and get it.
It’s hard to think of a belief more thoroughly hardwired into the collective imagination of our time. Politicians and pundits are always confidently predicting this or that future, while think tanks earnestly churn out reports on how to get to one future or how to avoid another. It’s not just Klaus Schwab and his well-paid flunkeys at the World Economic Forum, chattering away about their Orwellian plans for a Great Reset; with embarrassingly few exceptions, from the far left to the far right, everyone’s got a plan for the future, and acts as though all we have to do is adopt the plan and work hard, and everything will fall into place.
What’s missing in this picture is any willingness to compare that rhetoric to reality and see how well it performs. Over the last century or so we’ve had plenty of grand plans that set out to define the future, you know. We’ve had a War on Poverty, a War on Drugs, a War on Cancer, and a War on Terror, just for starters—how are those working out for you? War was outlawed by the Kellogg-Briand Pact in 1928, the United States committed itself to provide a good job for every American in the Full Employment and Balanced Growth Act of 1978, and of course we all know that Obamacare was going to lower health insurance prices and guarantee that you could keep your existing plan and physician. Here again, how did those work out for you?
This isn’t simply an exercise in sarcasm, though I freely admit that political antics of the kind just surveyed have earned their share of ridicule. The managerial aristocracy that came to power in the early twentieth century across the industrial world defined its historical mission as taking charge of humanity’s future through science and reason. Rational planning carried out by experts guided by the latest research, once it replaced the do-it-yourself version of social change that had applied before that point, was expected to usher in Utopia in short order. That was the premise, and the promise, under which the managerial class took power. With a century of hindsight, it’s increasingly clear that the premise was quite simply wrong and the promise was not kept.
Could it have been kept? Very few people seem to doubt that. The driving force behind the popularity of conspiracy culture these days is the conviction that we really could have the glossy high-tech Tomorrowland promised us by the media for all these years, if only some sinister cabal hadn’t gotten in the way. Exactly which sinister cabal might be frustrating the arrival of Utopia is of course a matter of ongoing dispute in the conspiracy scene; all the familiar contenders have their partisans, and new candidates get proposed all the time. Now that socialism is back in vogue in some corners of the internet, for that matter, the capitalist class has been dusted off and restored to its time-honored place in the rogues’ gallery.
There’s a fine irony in that last point, because socialist management was no more successful at bringing on the millennium than the capitalist version. Socialism, after all, is the extreme form of rule by the managerial aristocracy. It takes power claiming to place the means of production in the hands of the people, but in practice “the people” inevitably morphs into the government, and that amounts to cadres drawn from the managerial class, with their heads full of the latest fashionable ideology and not a single clue about how things work outside the rarefied realm of Hegelian dialectic. Out come the Five-Year Plans and all the other impedimenta of central planning…and the failures begin to mount up. Fast forward a lifetime or so and the Workers’ Paradise is coming apart at the seams.
A strong case can be made, in fact, that managerial socialism is one of the few systems of political economy that is even less successful at meeting human needs than the managerial corporatism currently staggering to its end here in the United States. (That’s why it folded first.) The differences between the two systems are admittedly not great: under managerial socialism, the people who control the political system also control the means of production, while under managerial corporatism, why, it’s the other way around. Thus I suggest it’s time to go deeper, and take a hard look at the core claim of both systems—the notion that some set of experts or other, whether the experts in question are corporate flacks or Party apparatchiks, can make society work better if only they have enough power and the rest of us shut up and do what we’re told.
That claim is more subtle and more problematic than it looks at first glance. To make sense of it, we’re going to have to talk about the kinds of knowledge we can have about the world.
The English language is unusually handicapped in understanding the point I want to make, because most languages have two distinct words for the kinds of knowledge we’ll be talking about, and English has only one word—“knowledge”—that has to do double duty for both of them. In French, for example, if you want to say that you know something, you have to ask yourself what kind of knowledge it is. Is it abstract knowledge based on an intellectual grasp of principles? Then the verb you use is connaître. Is it concrete knowledge based on experience? Then the verb you use is savoir. Colloquial English has tried to fill the gap by coining the phrases “book learning” and “know-how,” and we’ll use these for now.
The first point that needs to be made here is that these kinds of knowledge are anything but interchangeable. If you know about cooking, say, because you’ve read lots of books on the subject and can easily rattle off facts at the drop of a hat, you have book learning. If you know about cooking because you’ve done a lot of it and can whip up a tasty meal from just about anything, you have know-how. Those are both useful kinds of knowledge, but they’re useful in different contexts, and one doesn’t convert readily into the other. You can know lots of facts about cooking and still be unable to produce an edible meal, for example, and you can be good at cooking and still be unable to say a meaningful word about how you do it.
We can sum up the two kinds of knowledge we’re discussing in a simple way: book learning is abstract knowledge, and know-how is concrete knowledge.
Let’s take a moment to make sense of this. Each of us, in earliest infancy, encounters the world as a “buzzing, blooming confusion” of disconnected sensations, and our first and most demanding intellectual challenge is the process that Owen Barfield has termed “figuration”—the task of assembling those sensations into distinct, enduring objects. We take an oddly shaped spot of bright color, a smooth texture, a kinesthetic awareness of gripping and of a certain resistance to movement, a taste, and a sense of satisfaction, and assemble them into an object. It’s the object we will later call “bottle,” but we don’t have that connection between word and experience at first. That comes later, after we’ve mastered the challenge of figuration.
So the infant who can’t yet speak has already amassed a substantial body of know-how. It knows that this set of sensations corresponds to this object, which can be sucked on and will produce a stream of tasty milk; this other set corresponds to a different object, which can be shaken to produce an entertaining noise, and so on. When you see an infant looking out with that odd distracted look so many of them have, as though it’s thinking for all it’s worth, you’re not mistaken—that’s exactly what it’s doing. Only when it has mastered the art of figuration, and gotten a good basic body of know-how about its surroundings, can it get to work on the even more demanding task of learning how to handle abstractions.
That process inevitably starts from the top down, with very broad abstractions covering vast numbers of different things. That’s why, at a certain stage in a baby’s growth, all four-legged animals are “goggie” or something of the same sort; later on, the broad abstractions break up, first into big chunks and then into smaller ones, until finally you’ve got a child with a good general vocabulary of abstractions. The process of figuration continues; in fact, it goes on throughout life. Most of us are good enough at it by the time of our earliest memories that we don’t even notice how quickly we do it. Only in special cases do we catch ourselves at it—certain optical illusions, for example, can be figurated in two competing ways, and consciously flipping back and forth between them lets us see the process at work.
All this makes the relationship between figurations and abstractions far more complex than it seems. Since each abstraction is a loosely defined category marked by a word, there are always gray areas and borderline cases, like those plants that are right on the line between trees and shrubs. The situation gets much more challenging, however, because abstractions aren’t objective realities. We don’t get handed them by the universe. We invent them to make sense of the figurations we experience, and that means our habits, biases, and individual and collective self-interest inevitably flow into them. That would be problematic even if figurations and abstractions stayed safely distinct from one another, but they don’t.
Once a child learns to think in abstractions, the abstractions they learn begin to shape their figurations, so that the world they experience ends up defined by the categories they learn to think with. That’s one of the consequences of language—and it’s also one of the reasons why book learning, which consists entirely of abstractions, is at once so powerful and so dangerous: your ideology ends up imprinting itself on your experience of the world. There’s a further mental operation that can help you past that; it’s called reflection, and involves thinking about your thinking, but it’s hard work and very few people do much of it, and the only kind that’s popular in an abstraction-heavy society — the kind where you check your own abstractions against an approved set to make sure you don’t think any unapproved thoughts — just digs you in deeper. As a result, most people go through their lives never noticing that their worlds are being defined by an arbitrary set of categories with which they’ve been taught to think.
Here are some examples. Many languages have no word for “orange.” People who grow up speaking those languages see the lighter shades of what we call “orange” as shades of yellow, and the darker shades as shades of red. They don’t see the same world we do, since the abstractions they’ve learned to think with sort out their figurations in different ways. In some Native American languages, some colors are “wet” and others are “dry,” and people who grow up speaking those languages experience colors as being more or less humid; the rest of us don’t. Then there’s Chinook jargon, the old trade language of the Pacific Northwest, which was spoken by native peoples and immigrants alike until a century ago. In that language, there are only four colors: tkope, which means “white;” klale, which means “dark;” pil, which means red or orange or yellow or brightly colored; and spooh, which means “faded,” like sun-bleached wood or a pair of old blue jeans. Can you see a cherry and a lemon as being shades of the same color? If you’d grown up speaking Tsinuk wawa from earliest childhood, you would.
Those examples are harmless. Many other abstractions are not, because privilege and power are among the things that guide the shaping of abstract knowledge, and when education is controlled by a ruling class or a governmental bureaucracy, the abstractions people learn veer so far from experience that not even heroic efforts at figuration can bridge the gap. In the latter days of the Soviet Union, to return to an earlier example, the abstractions flew thick and fast, painting the glories of the Workers’ Paradise in gaudy colors, and insisting that any delays in the onward march of Soviet prosperity would soon be fixed by the skilled central planning of managerial cadres. Meanwhile, for the vast majority of Soviet citizens, life became a constant struggle with hopelessly dysfunctional bureaucratic systems, and waiting in long lines for basic necessities was an everyday fact of life.
None of that was accidental. The more tightly you focus your educational system on a set of approved abstractions, and the more inflexibly you assume that your ideology is more accurate than the facts, the more certain you can be that you will slam headfirst into one self-inflicted failure after another. The Soviet managerial aristocracy never grasped that, and so the burden of dealing with the gap between rhetoric and reality fell entirely on the rest of the population. That was why, when the final crisis came, the descendants of the people who stormed the Winter Palace in 1917, and rallied around the newborn Soviet state in the bitter civil war that followed, simply shrugged and let the whole thing come crashing down.
We’re arguably not far from similar scenes here in the United States, for the same reasons: the gap between rhetoric and reality gapes just as wide in Biden’s America as it did in Chernenko’s Soviet Union. When a ruling class puts more stress on using the right abstractions than on getting the right results, those who have to put up with the failures—i.e., the rest of us—withdraw their loyalty and their labor from the system, and sooner or later, down it comes.
In the meantime, as we all listen to the cracking noises coming up from the foundations of American society, I’d like to propose that we consider the possibility that the future cannot be managed, and that all those plans and programs and grand agendas are by definition on their way to the same dumpster as the Five-Year Plans of the Soviet Union and the various Wars on Abstract Nouns proclaimed by the United States. Coming up with a plan is easy; getting people to do anything about it is hard; getting future events to cooperate—well, you can do the math as well as I can. It’s already clear to anyone who’s paying attention that we’re not going to get the Tomorrowland future bandied about for so many years by the pundits and marketing flacks of the corporate state: the flying cars, spaceships, nuclear power plants, and the rest of it have all been tried and all turned out to be white elephants, hopelessly overpriced for the limited value they provide. Maybe it’s time to consider the possibility that no other grand plan will work any better.
Does that mean that we shouldn’t prepare for the future? Au contraire, it means that we can do a much better job of preparing for the futures. There’s not just one of them, you see. There never is. The same habit of bad science fiction writers that editors used to lampoon with the phrase “It was raining on Mongo that Monday”—really? The same weather, all over an entire planet?—pervades current notions of “the” future. Choose any year in the past and look at what happened over the next decade or two to different cities and countries and continents, and you’ll find that their futures were unevenly distributed: some got war and some got peace, growth in one place was matched by contraction in another, and the experience of any decade you care to name was radically different depending on where you experienced it. That’s one of the things that the managerial aristocracy, with its fixation on abstract knowledge, reliably misses.
We know some things about the range of possible futures ahead of us. We know that fossil fuels and other nonrenewable resources are going to be increasingly expensive and hard to get by historic standards; we know that the impact of decades of unrestricted atmospheric pollution will continue to destabilize the climate and drive unpredictable changes in rainfall and growing seasons; we know that the infrastructure of the industrial nations, which was built under the mistaken belief that there would always be plenty of cheap energy and resources, will keep on decaying into uselessness; we know that habits and lifestyles dependent on the extravagant energy and resource usage that was briefly possible in the late twentieth century are already past their pull date. These things are certain—but they don’t tell us that much. What technologies and social forms will replace the clanking anachronism of industrial society over the decades immediately ahead? We don’t know that, and indeed we can’t know it.
We can’t know it because the future is not an abstraction. It’s not something neat and manageable that can be plotted in advance by corporate functionaries and ordered for just-in-time delivery. It’s an unknown region, and our preconceptions about it are the most important obstacle in the way of seeing it for what it is. That is to say, if you’re setting out to explore unfamiliar territory, deciding in advance what you’re going to find there and marching off in a predetermined direction to find it is a great way to end up neck-deep in a swamp as the crocodiles close in.
If you want a less awkward end to your great adventure, try heading into the unknown with eyes and ears open wide, pay attention to what’s happening around you even (or especially) when it contradicts your beliefs and presuppositions, and choose your path based on what you find rather than what you think has to be there. Choose your tools and traveling gear so that they can cope with as many things as possible, and when you pick your companions, remember that know-how is much more useful than book learning. That way you can travel light, meet the unexpected with some degree of grace, and have a better chance of finding a destination that’s worth reaching.
E-Bike technology is amazing. How will it improve, and how will we use it? Thanks to Steve Anderson for his engineering tweets and for answering a tire question. Any mistakes are my own!
Why People Love E-Bikes
The best feeling on a bike is cruising down a hill, barely pedaling, and taking in the surroundings. E-bikes are like that all the time. “How do you know if someone has an e-bike? They’ll tell you!” goes the joke.
E-Bike Design for the Rest of Us
E-bikes range in price from $1000 to $10,000. The market is still lacking a Toyota Corolla equivalent.
Radical Design Simplification
Hybrid-electric cars are an engineering abomination. They have both gas and electric powertrains, ensuring they are more expensive and complicated than their competitors. Most e-bikes are the same, carrying both a mechanical and an electric powertrain.
What can we do to get a people’s e-bike that has 30 miles of range, requires no maintenance, travels 15-20 mph, and retails around $500?
If we have an electric drive, there is no need for a mechanical one. A German manufacturer named Schaeffler recently released its “Free Drive”. Pedaling turns a motor instead of a chain and powers the bike or charges the batteries (an electric motor run in reverse generates electricity). The system is about 5% less efficient than a chain drive, but e-bikes use battery power, so it’s no sweat for the rider. The motors powering the bike are simple in-wheel hub motors that are cheap and efficient.
Many current e-bikes have fancy carbon belts, pricey mid-drive motors attached to the belt/chain, and expensive torque sensors that sense how hard you are pedaling to modulate the speed. Instead, you could have a Free Drive system with an inexpensive motor that generates electricity proportional to how hard you pedal, providing a signal to the bike.
Brushed DC motors have fallen out of favor, replaced by “brushless DC” (BLDC) motors. BLDC motors have a longer life, better low-end torque, and are easy to control electronically. These are a good solution, but improvements in motor technology driven by automotive R+D should trickle down.
The trend will be lighter motors that use more common materials. They will hand more control off to software, allowing physical simplification.
Hub motors come in two varieties – geared and direct drive. Direct drive motors are usually twice the weight of a geared motor because they need more magnetic material to operate at low RPMs. Their performance on hills is inefficient.
Geared hub motors spin at much higher RPMs and use planetary gears to turn the wheel at rider-friendly RPMs. These motors are cheaper and perform better on hills, but several common design flaws remain.
The planetary gears are usually plastic to limit the sound and vibration problems that come with lower build quality, but plastic gears have short lifetimes. Better-quality gears, like all-metal helical-cut gears, solve this problem.
Use of “Freewheels”
Most geared hub motors also have a “freewheel” that helps the bike coast. This one-way clutch is a waste: more efficient electric motors have less drag, and the battery can provide the extra juice regardless. The freewheel also prevents regenerative braking.
Basic models might have one motor, while faster models have motors in both wheels.
Who needs brakes, am I right? E-bikes might not need mechanical brakes. The motor can do light regenerative braking. Because bikes are lightweight with terrible aerodynamics, regenerative braking does not add much extra range. But the 5%-10% gained still matters, and it provides low-temperature braking that doesn’t wear out.
For emergency braking with AC motors, there is a method called DC injection braking: direct current from the batteries is injected into the AC motor windings, causing them to lock up. DC motors can do something similar called plugging, or reverse-current braking. These methods might end up being safer since they can be controlled electronically, like anti-lock brakes in cars.
The concept of deleting brakes is pretty raw. Most DC injection braking and plugging applications are on large, fixed motors. Replacing mechanical brakes with motor braking would require a long history of field use before it would be prudent.
Bicycles are already absurdly efficient. E-bike battery packs are small as a result. Their small size means a less energy-dense battery might be better if it obviates the need for a more complex battery management system (BMS).
E-bikes should use whatever chemistry is available, cheap, and durable. Right now, that is lithium iron phosphate (LFP). LFP has more cycles and handles 100% charge better than lithium-cobalt-nickel formulations. It is also less prone to fires. The battery cells in a bike would weigh 2-4 pounds. The controller, pack, and BMS would probably have more mass.
Whether the battery should be removable for charging is a debate. The heavier the bike is, the more removable batteries matter. I lean towards making it removable even if it adds some complexity and cost.
In the future, designs might use cheaper, less energy-dense battery chemistries like sodium-ion. Others might maintain weight to increase range as battery energy density improves.
The faster the rider pedals, the more electricity they create. The motor controller translates the power the rider generates into a set speed. The rider can pedal backward to signal the bike to brake, again with the bike controllers getting signals from how hard the pedaling is. A Bluetooth-linked phone can display any bike metrics desired, eliminating a screen and extra wiring.
The bicycle design we all know today was known as a “safety bicycle.” The center of gravity is low and between the wheels, improving stability and braking efficiency. Larger wheels help reduce bumps. Bicycle frames on bikes without chain drives have more design flexibility, but physics will keep them from straying far.
New frame designs need to accommodate batteries and power electronics within the frame and allow easy installation of those parts. Aluminum is a good compromise between cost and weight. Carbon fiber and titanium are too expensive, while steel is too heavy. The bike needs to be reasonable to lift.
Maybe the worst part about hub motors is changing tires. Bike tires get less than 1/10 the miles that car tires do, so it makes sense to spend a little more money on more durable tires. Again, we can trade some rolling resistance for durability if needed. We won’t match car-tire mileage, but bike tires can last much longer than 2,000-3,000 miles.
Falling Component Prices
Motors, batteries, and power electronics make up a large portion of e-bike costs. The blitz in automotive scale and R+D is dropping the price of these components rapidly. Chains, derailleurs, and belts are not decreasing in cost as fast.
Deleting the mechanical drive train saves weight. The weight of the components should fall with improving technology, as well. Current e-bikes are very heavy unless you are paying $5000+ for a fancy, lightweight design. The weight penalty can shrink.
Design in Quality
Most bikes are sold as toys or as overcomplicated hobbyist contraptions. A Toyota Corolla analog needs fewer parts of better quality.
The current bicycle supply chain optimizes for producing poor-quality bikes in small lots (<500). Our bikes need mass-produced automotive-grade parts that will last a lifetime instead.
Van Moof is a prem