[DRAFT] The echo chamber of vibe-coding

With the widespread use of LLMs, could we be in danger of creating an echo chamber of AI-generated code? Training all the state-of-the-art models (ChatGPT, Claude, Grok) required an extensive amount of human-made content and code, besides an insane amount of compute. A substantial amount of knowledge, preparation, and trial-and-error went into the codebases that provide utility to people and other developers. All this content was gathered from the internet, and sometimes from not-so-public sources. [1] Despite some initial skepticism, the AI space still enjoys record levels of investment, [2] and the quality of coding assistants is steadily increasing. This advancement seems likely to reach a plateau soon, at the same time as adoption is becoming more widespread.

[Draft note: argument about the plateau + why learning from generated content leads to a decrease in performance.]

Content slop and its spread

The question emerges about the medium- and long-term future of the coding scene. How long can the current momentum be sustained, and what implications do the current trends carry? The current generation of AI models are practically next-token generators: statistical predictors of the next words that follow in a sequence. Despite these limitations, the achievements of the eight-year-old transformer architecture are still impressive. However, these models are only capable of spewing back the content they were trained on. (Tool use seems to mitigate this limitation, e.g. by using search.)

The mind-blowing amount of data makes these models seem smart and innovative. They tackle simple coding tasks with ease: in a few seconds an agent creates a Pac-Man clone application using React. But what about a new version of React where a change in the API was introduced, or a new syntax was added? There was no training data available. Will the agent be able to generate the needed code? Apparently not. [5] (A short snippet at the end of this draft shows a real historical instance of this.)

Could the consequence be that the adoption of new APIs and frameworks slows down significantly, or even becomes impossible? Following this argument, will it limit the emergence of new programming languages as well? If there is no steady supply of content to teach an LLM a new language, could its adoption be blocked completely, or will it just start slowly until enough content to teach the model is available? We could run a small experiment about this, e.g. by taking the most popular languages and frameworks on StackOverflow or GitHub and checking at what point the LLM's output no longer reaches the bar of usable, working code. (A sketch of such a harness follows at the end of this draft.)

The next point is about human incentives. If the widely used AI is capable of generating usable applications with the current APIs but cannot adapt to changes, will businesses become reluctant to make changes? If the introduction of anything new blocks its adoption, will this create a gap in the market, a slowdown, or an outright block on its spread? Could this cement the commonly used frameworks and APIs in place, if the cost down the line is slower development?

Security and spread of bad code

Positive possibilities:

Questions:

If we assume that the use of coding assistants becomes more and more widespread, then as we rely on them more and generate more and more code, could the quality of the generated code degrade? With each new iteration of a cutting-edge model, more and more content from the internet is included. Can a tipping point be reached where the abilities of an LLM no longer improve but head in the opposite direction? ...
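To make the "new API" worry concrete, here is a real historical case rather than a hypothetical one: when React 16.8 introduced hooks in early 2019, the pattern did not exist in any earlier corpus, so a model trained only on pre-2019 code would keep producing class components. Both snippets below are standard React; the contrast only illustrates what "no training data for the new syntax" looks like.

```tsx
import React, { useState } from "react";

// Pre-16.8 style: the only stateful-component pattern in pre-2019 training data.
class Counter extends React.Component<{}, { count: number }> {
  state = { count: 0 };
  render() {
    return (
      <button onClick={() => this.setState({ count: this.state.count + 1 })}>
        Clicked {this.state.count} times
      </button>
    );
  }
}

// Hooks style, introduced in React 16.8 (2019): absent from any earlier corpus,
// so a model with a pre-2019 training cutoff could not have produced it.
function CounterWithHooks() {
  const [count, setCount] = useState(0);
  return (
    <button onClick={() => setCount(count + 1)}>Clicked {count} times</button>
  );
}
```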
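The experiment mentioned above could start as small as this. Below is a minimal sketch in TypeScript for Node 18+, not a finished benchmark: the GitHub search endpoint is real (repository count serves as a rough proxy for available training data), while `askLLM` is a deliberate stub to be wired to whatever model is under test, and the per-language check commands are my assumptions for a cheap "does it even parse?" bar.

```ts
import { execSync } from "node:child_process";
import { writeFileSync } from "node:fs";

// Rough proxy for training-data availability: repository count per language,
// from the public GitHub search API (unauthenticated rate limits apply).
async function repoCount(language: string): Promise<number> {
  const res = await fetch(
    `https://api.github.com/search/repositories?q=language:${encodeURIComponent(language)}&per_page=1`
  );
  const body = (await res.json()) as { total_count: number };
  return body.total_count;
}

// Stub: connect this to the model/provider being evaluated.
async function askLLM(prompt: string): Promise<string> {
  throw new Error("wire this to an actual model API");
}

// Crude quality bar: does the generated file pass a syntax/type check?
function passesCheck(code: string, checkCmd: string, file: string): boolean {
  writeFileSync(file, code);
  try {
    execSync(`${checkCmd} ${file}`, { stdio: "ignore" });
    return true;
  } catch {
    return false;
  }
}

async function main() {
  // Ordered roughly from data-rich to data-poor.
  const targets = [
    { lang: "python", checkCmd: "python -m py_compile", file: "out.py" },
    { lang: "rust", checkCmd: "rustc --emit=metadata", file: "out.rs" },
    { lang: "zig", checkCmd: "zig ast-check", file: "out.zig" },
  ];
  for (const t of targets) {
    const repos = await repoCount(t.lang);
    const code = await askLLM(
      `Write a minimal HTTP server in ${t.lang}. Reply with code only.`
    );
    console.log(
      `${t.lang}: ~${repos} repos, passes check: ${passesCheck(code, t.checkCmd, t.file)}`
    );
  }
}

main();
```

The interesting data point would not be any single row but the crossover: at what repository count does the pass rate start dropping below usable?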
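On the tipping-point question at the end: the degradation effect already has a name in the literature, model collapse (Shumailov et al., "The Curse of Recursion", 2023), where models trained on their predecessors' output progressively lose the tails of the original distribution. The toy simulation below shows only the bare statistical mechanism, under the obviously simplified assumption that a "model" is a Gaussian fitted to finitely many samples of the previous generation's output.

```ts
// Toy model collapse: generation N is a Gaussian fitted to finitely many
// samples drawn from generation N-1. Estimation error compounds, and the
// fitted standard deviation drifts toward zero: the tails vanish.
function gaussianSample(mean: number, std: number): number {
  // Box-Muller transform; 1 - Math.random() keeps u1 in (0, 1].
  const u1 = 1 - Math.random();
  const u2 = Math.random();
  return mean + std * Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

function fit(samples: number[]): { mean: number; std: number } {
  const n = samples.length;
  const mean = samples.reduce((a, b) => a + b, 0) / n;
  const variance = samples.reduce((a, b) => a + (b - mean) ** 2, 0) / n;
  return { mean, std: Math.sqrt(variance) };
}

let model = { mean: 0, std: 1 }; // generation 0: the "human-made" distribution
const SAMPLES_PER_GEN = 20;      // finite data, like any real training set

for (let gen = 1; gen <= 100; gen++) {
  const data = Array.from({ length: SAMPLES_PER_GEN }, () =>
    gaussianSample(model.mean, model.std)
  );
  model = fit(data); // each generation learns only from the previous one's output
  if (gen % 20 === 0) {
    console.log(`gen ${gen}: mean=${model.mean.toFixed(3)} std=${model.std.toFixed(3)}`);
  }
}
// A typical run ends with std far below 1: each fitting step looks harmless,
// yet the compounded estimation error erases the diversity of generation 0.
```

Code is not a Gaussian, of course; the analogy is only that generated output under-represents rare patterns, and retraining on that output makes the under-representation permanent.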

May 29, 2025