Why Open Source Matters

"Hey Nick, can you write a quick blog post on why Open Source matters?"

Yes sure, I'll knock out a "quick post" on a phrase and a culture that's defined my entire career and dominated most aspects of my professional life.

Anyway, before we can discuss why it matters, we need to define what we mean when we say free software and more specifically, Open Source. The “four freedoms” as originally stated by Richard Stallman and the Free Software Foundation (FSF) in the mid-’80s set the scene and the expectations quite clearly:

  • The freedom to run the program as you wish, for any purpose (freedom 0).
  • The freedom to study how the program works, and change it so it does your computing as you wish (freedom 1). Access to the source code is a precondition for this.
  • The freedom to redistribute copies so you can help others (freedom 2).
  • The freedom to distribute copies of your modified versions to others (freedom 3). Doing this gives the whole community a chance to benefit from your changes. Access to the source code is a precondition for this.

A program can be considered “free software” if it provides users with all of these freedoms. The Open Source Initiative (OSI) also has a definition of Open Source; however, not everyone aligns themselves with the OSI (or the FSF), so while in theory that should be canon, in reality it quite often isn't. Some of the criteria in that definition are straight-up "don't be a dick", but others are a little more nuanced. Let's get my bias out of the way - I do align myself with the OSI and so I do believe this definition to be the standard by which a given open-source licence and its subsequent application to a project should be held. However, I appreciate we also must be pragmatic, so let's talk about Open Source's upsides. But first, a history lesson.

History

Early in my career, the professional world was largely dominated by proprietary software vendors, in particular the likes of Microsoft, IBM, Oracle and so on. Commercial off-the-shelf - or COTS - was standard, and outside of academia Open Source offerings hadn't caught on, beyond research projects. Found a bug? Raise a support ticket and wait. Want a new feature? Same, unless you had especially deep pockets. Security patches? You were placing the utmost trust in your vendor that the patches they provided fixed the problem and didn't introduce any new ones. Need to be able to debug exactly what's going on in your stack? You could only really get so far before the closed-source nature of the software meant you were guessing based on the outcome and the behaviour of the black box you were interacting with.

It was a very different time indeed. This was pre-Google, pre-GitHub, and before many, many other sites which we now take for granted and which have helped facilitate the notion of social coding and the proliferation of software and projects shared amongst like-minded individuals. Open source still existed as a platform for developers to collaborate globally, but it was limited to a much smaller, more technical community.

In 1998 the term "open source" was formalised by a group of folks involved in the free software movement as a reaction to Netscape's release of their source code. The OSI was founded shortly afterwards, and suddenly people involved in this movement and those on the periphery had a cause around which they could rally, and finally the beginnings of a definition to point to, as well as an organisation that would help extol the virtues of open source software (OSS) and explain why it mattered.

It's tempting to gaze back through rose-tinted spectacles at this simpler time in which largely welcoming technical cliques proliferated, but we had a fight on our hands when it came to arguing the strength of OSS in any commercial context i.e. the enterprise. Legitimate organisations like the OSI helped, and it's hard to ignore the fact that there was, at the time, a counter-culture aspect to a lot of this. As a young enthusiast who believed and who could demonstrate how much better OSS could be, the fight against the commercial leviathans and the desire to rail against everything they stood for was too tempting to ignore. It felt cool to be a part of this. However, practically speaking the arguments for OSS today remain just the same as they were back then - even if it’s no longer considered counter-culture.

Benefits of Open Source

Security and Transparency

As true in the 90s as it is today, the transparency of OSS offers an incredible leg-up for those wanting to learn more, and provides an opportunity for folks to improve the code. That transparency is also foundational to the security of OSS, since anyone can verify the code for security, privacy, and compliance, rather than relying on closed-source vendors and their fallacious End User Licensing Agreements (EULAs), which routinely absolve them of any responsibility for the reliability or basic functionality of their software.

Cost Effectiveness

The easy one to argue for back in the day, and quite often the point that got OSS's foot in the door at companies large and small. Free software is free ("only if your time is worthless!" as the joke goes), but it's a tempting one especially as a lot of enterprise software vendors continually squeeze their customers for more money, holding new features and fixes ransom unless you upgrade to - and pay for - the latest release. Not to mention the corporate malware which must be deployed to keep track of all that software, or the byzantine rules which are designed only to confuse.

Community and Ecosystem

I've touched on these points briefly, but today what started as something of a counterculture is now the de facto way of developing software. Every large organisation and every software vendor that matters actively contributes to OSS projects. Some might argue that OSS actually even saved the likes of Microsoft in the long run. There exist numerous independent organisations to help manage conflicts of interest and to keep the ecosystem alive and healthy, such as the Linux Foundation or the OpenInfra Foundation.  When a company embraces OSS and contributes back upstream it can have huge benefits that may not be immediately obvious, both for the individuals as well as the organisations themselves - from recruitment through to a demonstration of technical provenance.

Customisation and Flexibility

Open source is especially important here as it gives individuals and companies control over their own destinies. All too often we see closed-source COTS vendors be acquired, only for the new parent company to change the licensing model or do away entirely with products, leaving companies that previously depended on these for core, foundational aspects of their IT bereft and held to ransom. The Broadcom acquisition of VMware is a prime example of this. Adopting OSS guards against this, even if a project's licensing shifts, since there almost always exists the option to fork the project and take matters into your own hands.

Open Source and AI

How does all this relate to AI? Profoundly so, as it turns out, and largely for the same reasons as outlined above. Technologies such as large language models (LLMs) and their implementations in services such as ChatGPT have captured the world’s imagination and changed the face of technology for good.  Like it or not, these technologies are here to stay and it’s up to us to successfully push the boundaries even further and to leverage the results in a way that benefits us all.

OpenUK, a not-for-profit which exists to further the adoption of open technology in the UK, publishes an annual report on the “State of Open”, and the most recent edition speaks to government-level considerations of AI innovation and leadership.  The key points made argue in favour of openness, making it clear that research and endeavours in this area when performed under an open-source governance have the potential to benefit all - boosting the economy and helping to ensure ethical outcomes.  On the face of it the argument seems straightforward, however there are a number of areas which are challenging when it comes to adopting an Open Source model.

Challenges with Open Sourcing Large AI Models

Despite clear arguments in favour of open source, there still exists a lot of disagreement as to if and how it should be applied in the context of AI.  Open sourcing models democratises access to cutting-edge AI technologies, enabling researchers to be able to build on top and improve them in various contexts, and in an ideal world this would translate to faster innovation by way of community improvements.

Those opposed to open sourcing these models are typically concerned with misuse, especially with the potential to spread misinformation via deepfakes or through the creation of sophisticated malicious tools.

Data as a Key Component of AI Development

Typically, what we consider to be “AI” is made up of two key components - a dataset, and a model. This is a gross oversimplification (for example, this paper defines 14 individual dimensions that should be considered, including research and documentation), but for the purposes of this post two are sufficient. The former is used to ‘train’ the latter, and they should be considered distinct from a licensing perspective. ChatGPT is a perfect example - it’s trained on a large amount of public data which would fall under a variety of licences, but the model itself is entirely closed-source and proprietary. This in itself leads to many questions - if the model generates some example code, is there a licence that should be associated with it? Can this even be traced? While OpenAI themselves hold control over how the model is presented and subsequently what it’s capable of, by doing so they’re hindering the ability of other individuals and organisations to research and further contribute to the model’s capabilities and performance. And of course, GPT itself is built on open research from Google, published in its famous “Attention Is All You Need” paper.

Open Source AI Governance

Given how quickly this field is moving and how many questions remain unanswered, it’s no surprise that organisations like the OSI are scrambling to develop new frameworks to help define what “Open Source AI” means. For example, there is a new “Open Source AI Definition” currently in draft. A lot of work remains to finalise these terms and provide the context and framework necessary for enabling future open-source work in AI and related areas, much as the OSI did for software back in the late ’90s.

Some companies, notably Meta, are releasing their models ostensibly as “open source” but with restrictions in place; in Llama’s case, these apply both to the licence and to the associated datasets used to train the model in the first place. Whilst these models are championed as open source, they don’t necessarily meet the criteria for it, which can cause a great deal of confusion and undermine the intended spirit and nature of open source itself. Again, organisations like OpenUK - working with the UK Government - as well as bodies such as the OSI are doing their best to challenge and ultimately work with companies such as Meta to find a suitable path forward.

Despite the tone of this blog post, I’m no open source zealot - I believe in being pragmatic and trying to find a common ground, but I do think we have to be careful especially when companies are using “open source” as a marketing term when in actual fact their approach is compromised in ways that undermine what open source should be about.

Conclusion

Open Source Software as a concept is undeniably one of the greatest force multipliers in the sphere of human knowledge, enabling unrestricted collaboration, innovation, and the sharing of ideas in a way that transcends borders and disciplines.  By making aspects such as source code, the tools, and the ideas free and accessible, open source creates an ecosystem where anyone, regardless of background or their resources, can learn, build upon, and contribute back to the work of others.

At Nscale, we’re strong believers in these very same concepts. They help us commercially by providing a common ground on which to meet our customers in various ways, such as being able to demonstrate and contribute optimisations to common AI and ML frameworks and SDKs. They also enrich our own engineering endeavours, as we work with upstream communities on foundational projects such as Kubernetes and Slurm, which underpin both how we manage our infrastructure stack and the services we offer to our customers.

In many ways, open source might be experiencing a mid-life crisis as it’s challenged with meeting the demands of AI as well as companies struggling to find a successful business model (often resulting in unpopular changes to their software’s licences), but I do believe that it’s here to stay and anything other than its continued adoption and pragmatic refinement would ultimately represent a regression for our industry.

Nick Jones
Head of Engineering
Bio

Nick is an experienced engineering leader with a career spanning over two decades across a wide variety of industries and sectors. As an OpenUK and a CNCF Ambassador he's heavily active in the Cloud Native and Open Infrastructure communities. He's passionate about new technologies and methodologies, especially those in relation to Open Source, virtualisation, orchestration, automation, and all forms of cloud computing.
