All the pretty dark horses: examining the value relevance of traditional accounting metrics in valuation when it comes to the data analytics ecosystem
From the introduction of my finance master's thesis on how markets value knowledge capital
From a business perspective, and with a nod to the omertà that classical valuation keeps on such matters, imagine the following rhetorical scenario: if a gun were held to your head and you were forced to give a number, to quantify something, how much would you say a relationship with the CIA is worth?
A billion? Five billion? Ten? Twenty? Palantir Technologies has one such relationship, and a rather strong one at that, given that the CIA’s venture arm In-Q-Tel was an original investor in the company. In the decades since, the relationship has only grown stronger through deployments that have opened door after door across the defense establishment, created barriers to entry no competitor can replicate, and generated revenue streams with near-zero bankruptcy risk. Yet on their balance sheet, this relationship has a recorded value of exactly zero dollars.
Or how do you quantify the fact that seven million developers worldwide know and use your query language daily? MongoDB spent fifteen years building not just a database engine but an entire ecosystem where developers think in MongoDB’s terms, build in MongoDB’s environment, and solve problems the MongoDB way. This collective knowledge, distributed across millions of professionals and their minds, creates switching costs measured in billions. The accounting value of this developer army? Also zero.
What price do you put on knowing what business as usual - that is, business continuity, our normal - actually looks like? Datadog watches thousands of companies’ infrastructure every second of every day.
They know when your servers are about to fail before you do. They have seen every type of crash, every pattern of degradation, every signal of impending disaster across every architecture that matters. This accumulated knowledge of what normal looks like, built from trillions of data points, makes their platform irreplaceable. Try finding that asset on their balance sheet; if it appears anywhere at all, it is carried at zero.
What about valuing the memory of every disaster? Verisk has fifty years of insurance loss data covering every hurricane, earthquake, flood, and catastrophe. No amount of money can buy you the ability to go back to 1971 and start collecting claims data, yet they have it. Verisk’s repository of data and the actuarial models built on it have seen patterns no competitor can replicate, because no competitor has a history or a dataset as comprehensive as theirs. This irreplaceable historical record - sheer data, pure knowledge, the very knowledge upon which the entire insurance industry depends - is yet again largely invisible in accounting terms. What is the worth of seeing every cyberattack before it happens?
CrowdStrike’s Falcon platform has observed millions of attacks, learned every hacker’s signature, catalogued every vulnerability. Their threat intelligence improves with each attempted breach, making tomorrow’s protection better than today’s. This compound learning from years of attacks across thousands of enterprises has no book value on financial statements. We are in both a civilizational and an industrial transition in which the most valuable things are increasingly invisible, tucked away in the vast swathes of data being constantly aggregated and analyzed in a self-reinforcing system. Indeed, the ability to turn data into actionable intelligence, the algorithms that constantly learn and adapt, the institutional knowledge embedded in software and reflected in its creation, the broader network effects of data integration, and the second- and third-order effects of better decision-making - all of these aspects of knowledge capital now drive economic value.
Questions about knowledge capital are endless because the work knowledge capital does is endless.
How do you capture the worth of preventing a terrorist attack on a balance sheet? How do you depreciate the value of optimizing a supply chain? How does one even begin to deconstruct data and its worth in numbers? Organized crime and its violence serve as a salient point of entry via analogy precisely because the stakes to both body and being are so high, much like the rising stakes that data now carries in both society and business worldwide. This fundamental measurement gap in business - between what can be quantified and what we know data to be worth - motivates the specific research question this thesis addresses: Does accounting information explain share prices of intangible-intensive companies in the data analytics sector?
Consider how the tension is structural. Under American accounting principles, most internally generated intangibles, including new code, model training, and curated datasets, are expensed as incurred. Only narrow categories of internal-use software may be capitalized, and even then the line between expense and asset turns on development stage and implementation mechanics. International standards strike a different tone: development costs can be capitalized when strict criteria are met, yet recent guidance makes clear that many cloud-related configuration and customization costs remain expenses. These are not minor technicalities or variations in style; they are the frameworks that determine whether data engineering spend is seen or unseen, visible on the balance sheet or disappearing into historical income statements, and ultimately how tightly book values and earnings can anchor price.
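To make that structural tension concrete, the small sketch below contrasts the two treatments for a purely hypothetical 100-unit development outlay amortized straight-line over five years; the numbers are assumptions chosen for illustration, not figures drawn from any firm discussed here.

```python
# Stylized illustration with assumed numbers (a 100-unit development outlay,
# straight-line amortization over 5 years); not drawn from any firm in the sample.
OUTLAY = 100.0
USEFUL_LIFE = 5

def expense_as_incurred():
    """All spend hits the income statement now; nothing reaches the balance sheet."""
    earnings_impact_year1 = -OUTLAY
    recognized_asset = 0.0
    return earnings_impact_year1, recognized_asset

def capitalize_and_amortize():
    """Spend becomes an asset and flows through earnings gradually via amortization."""
    annual_amortization = OUTLAY / USEFUL_LIFE
    earnings_impact_year1 = -annual_amortization
    recognized_asset_end_year1 = OUTLAY - annual_amortization
    return earnings_impact_year1, recognized_asset_end_year1

print(expense_as_incurred())       # (-100.0, 0.0)
print(capitalize_and_amortize())   # (-20.0, 80.0)
```

Under expensing, the entire outlay depresses current earnings and leaves no trace on the balance sheet; under capitalization, most of it survives as a recognized asset that earnings and book value can then anchor to.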
As such, in this thesis I operationalize value relevance as the cross-sectional association between prices and accounting variables, following standard capital markets research. The design uses price-level regressions underpinned by residual income valuation theory, with per-share scaling to reduce scale effects and enable comparability across firms. Prices are measured 90 days after quarter end so that the dependent variable reflects information that was public at the time of measurement. The accounting variables are earnings per share (EPS), book value per share (BVPS), research and development expense per share (RDPS) as a proxy for knowledge investment, and recognized intangible assets per share (IntangPS). The analysis separates observations by research and development (R&D) intensity, defined as R&D expense over sales, with a five percent threshold for intangible-intensive firm-quarters.
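For concreteness, the baseline price-level specification implied by these variables can be sketched as follows; the notation is my own shorthand, and the estimated model in the thesis may add further controls or fixed effects.

$$
P_{i,t} = \beta_0 + \beta_1\,\mathrm{EPS}_{i,t} + \beta_2\,\mathrm{BVPS}_{i,t} + \beta_3\,\mathrm{RDPS}_{i,t} + \beta_4\,\mathrm{IntangPS}_{i,t} + \varepsilon_{i,t}
$$

Here $P_{i,t}$ is firm $i$'s share price measured 90 days after the end of quarter $t$, so each coefficient captures the cross-sectional association between the corresponding per-share accounting variable and price.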
Using a sector sample of fifty-one publicly listed data analytics firms with quarterly observations from 2015 to 2025, I find that traditional accounting variables explain only 2.3% of price variation for intangible-intensive firms versus 11% for traditional firms. However, when older, established companies are excluded from the sample, the explanatory power for intangible-intensive firms jumps to 17.5%, an eight-fold increase. This pattern holds across alternative R&D intensity thresholds of three percent and ten percent. The results preview a sharp answer: in the pooled window, the link is weak for intangible-intensive observations, but once older, more mature firms are removed from the sample, accounting variables explain roughly a third of price variation for the same cohort of data analytics ecosystem companies.
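As a rough guide to the mechanics of that split, the following Python sketch shows how firm-quarters might be partitioned by R&D intensity and the price-level model estimated on each subsample; the column names and implementation details here are assumptions for illustration, not the thesis's actual code.

```python
# Illustrative sketch only: the column names (price, eps, bvps, rdps, intangps,
# rd_expense, sales) are hypothetical placeholders rather than the thesis's
# actual variable names, and the thesis's estimation details may differ.
import pandas as pd
import statsmodels.formula.api as smf

def value_relevance_by_group(df: pd.DataFrame, rd_threshold: float = 0.05) -> dict:
    """Split firm-quarters by R&D intensity and report R-squared for each group."""
    df = df.copy()
    # R&D intensity = R&D expense / sales for the firm-quarter
    df["rd_intensity"] = df["rd_expense"] / df["sales"]
    df["intensive"] = df["rd_intensity"] >= rd_threshold

    results = {}
    for label, group in df.groupby("intensive"):
        # Price measured 90 days after quarter end, regressed on per-share variables
        fit = smf.ols("price ~ eps + bvps + rdps + intangps", data=group).fit()
        results["intangible-intensive" if label else "traditional"] = fit.rsquared
    return results

# Re-running with rd_threshold = 0.03 or 0.10 mirrors the robustness checks above;
# excluding older, more mature firms would simply be an additional filter on df.
```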
As such, that pattern suggests the issue is not that markets cannot value data businesses; it is that the mapping from statements to value changes with firm age as unrecognized knowledge capital compounds. In other words, we are watching the market try to price ways of knowing, while traditional accounting is left trying to weigh ideas. The difference can read as subtle, but the epistemic infrastructure underneath is fundamentally different.
After all, we somehow know a business’s relationship with the CIA is indeed worth something (with an intuition that it is something rather significant), but exactly what that value is cannot be properly quantified by the metrics we use to measure plant, property, and equipment - the very buildings, for example, in which those meetings with the CIA take place.
Given all this, this thesis makes three contributions. One, it provides up-to-date, sector-specific evidence on value relevance in a setting where intangible investment is central to value creation. Two, it shows that measured value relevance depends critically on firm maturity and sample composition, which helps explain mixed findings in prior aggregate studies. Three, it connects specific design choices to well-known discussions in value relevance work, including scaling, the treatment of losses, and the role of recognized versus unrecognized intangibles. But beyond these technical contributions, this thesis is a curious exploration motivated by the desire to know how we measure value in the 21st century. Its philosophical and qualitative foundations aim to document what seem to the author to be the early stages of a transition from an economy of things to an economy of thoughts.
Somewhere along the way, traditional and classical approaches to understanding these economies both work and don’t. The remainder of this thesis proceeds as follows. Section 2 reviews the relevant theory, evidence, and measurement methods, explains why price-level models fit the research question, and links known challenges such as scale and losses to this setting. Section 3 presents the methodology, setting out how the hypotheses are constructed and tested. Section 4 describes the data and provides descriptive statistics. Section 5 reports the results, while Section 6 contains robustness checks. Finally, Section 7 concludes with the thesis’s limitations and its implications for practice and standard setting.


