The Artists Using Artificial Intelligence to Dream Up the Future of Music

Holly Herndon arrived in New York mid-April, just days after successfully defending her doctoral dissertation in composition at Stanford’s Center for Computer Research in Music and Acoustics. The avant-pop musician and newly minted PhD will spend her time in the city shuttling between various interviews about her new album PROTO, then head to her adopted home of Berlin to rehearse the PROTO material with a six-member vocal ensemble. After that, she’ll be in New York again, with the ensemble, to kick off a tour that will also take her back to the Bay Area, with a stop in Los Angeles in between. Next, a show in Chicago. Then, at some point, maybe, a day off.

“It’s fucking hard,” she says about the travel. And it’s not cheap to jet around the world with a large group of backing musicians. “That’s why everyone is a DJ now,” she continues. “It’s so much more cost-effective. It’s so expensive to get work visas for Europeans.”

In addition to the vocalists, Herndon is also touring with a new collaborator—one that, incidentally, doesn’t require a work visa, a plane ticket, or even a paycheck at the end of the night. It’s a piece of machine learning software, called Spawn, which made its debut on PROTO. With the help of Spawn, the album charts a striking departure from Herndon’s previous work, grafting uncanny stutters and ululations onto her throbbing bass-heavy palette. Unlike penny-pinching executives at Uber, Amazon, and Facebook, Herndon isn’t actually interested in replacing the ensemble’s human performers—or herself—with a computer. “We didn’t want to focus on this notion of the automated composer,” she says. “To me, that’s like the least interesting thing you could do with a neural net.”

She’s referring to artificial neural networks: pieces of software, like Spawn, which are modeled loosely on the structure of the human brain. Neural nets are capable of “learning” patterns from large datasets, which they absorb as “training,” then generating new material based on what they’ve learned. Depending on the training data, that new material could be an audio file, a digital image, a melody written out on sheet music, or an autocorrect suggestion for an email or text message. Use of neural networks is increasingly common in cutting-edge computer-based music and art, but also in corporate applications ranging from the banal to the sinister.
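The train-then-generate loop described above can be illustrated with a deliberately tiny sketch. This is not a neural network, and certainly not Spawn's code; it is the same basic idea, absorbing patterns from a dataset and then emitting new material based on them, reduced to a few lines. All names here are illustrative.

```python
import random
from collections import defaultdict

def train(corpus):
    """'Learn' which character tends to follow which in the training data."""
    model = defaultdict(list)
    for prev, nxt in zip(corpus, corpus[1:]):
        model[prev].append(nxt)
    return model

def generate(model, seed, length, rng=random.Random(0)):
    """Emit new material by repeatedly sampling a plausible next character."""
    out = [seed]
    for _ in range(length):
        followers = model.get(out[-1])
        if not followers:
            break
        out.append(rng.choice(followers))
    return "".join(out)

# Train on a toy 'dataset', then generate something new in its style.
model = train("la la la loo la la loo la")
print(generate(model, "l", 12))
```

A real neural net replaces the lookup table with millions of learned weights, but the shape of the process, training data in, novel material out, is the same.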

Herndon is among the most prominent in a growing field of artists who use neural networks with the aim of accessing new aesthetic paradigms beyond the limits of human expression. Like the personal computer and even the synthesizer before it, artificial intelligence encourages playful experimentation with a near-endless variety of new forms. At the same time, the technology also invites artists to confront political anxieties related to its more troubling uses, in areas like facial recognition, data surveillance, and labor automation. As machine learning is integrated into larger and larger swaths of our lives—from smartphone photo filters to celebrity lookalike internet porn to the text of news articles like this one—these artists are probing at some of the most pressing questions facing contemporary society.

For example: what happens when the A.I. becomes smart enough to do the work of the artists themselves? Debates about what Herndon refers to as “the automated composer” stretch back at least as far as the 1980s and ’90s, when the composer David Cope began using artificial intelligence techniques, primitive by today’s standards, to generate facsimiles of pieces by masters like Bach and Mozart that were convincing enough for audiences to believe they were hearing the real thing. It would be easy for musicians to revolt against this new technology that threatens to make them irrelevant, or to resign themselves to a future in which human creativity no longer holds a meaningful role. For Herndon, both of these reactions are missing the point. “This is happening, people are moving this full-speed-ahead, it’s going to be a really big part of music moving forward,” she says. “And that’s why we’re trying to create a counter-narrative to it.”

A.I.-powered music in the model of Cope still requires input data, in the form of music written by people (or by an artificial intelligence), and is ultimately only capable of producing variations on that music, as surprising or lifelike as those variations may be. “We can either go down that route and have endless Muzak from the past, or we can think about it in different ways and wonder how we can learn from a different kind of intelligence,” Herndon continues. “I’m more interested in not how we can write ourselves out of the creative process, but rather expand the creative process by augmenting what we’re doing with this technology.” Can an A.I. and a human, working together as peers, create something genuinely new?

Joined by technical collaborators Mat Dryhurst and Jules LaPlace, Herndon has spent the last few weeks working out the few remaining kinks in Spawn for its live debut in New York. The software has a much longer history than that, taking shape over the course of the last two years in a loft space in Berlin, where the group has trained their model to imitate sounds present in vast datasets of recorded audio. “I started working with Spawn, just training her on my voice to try to figure out which neural nets we wanted to use,” Herndon says, describing the “birth” of this “A.I. baby” she’s been raising. With the groggy affection of a new mother, she takes me through the many stages of the youngster’s development, tracing her growth from a “DIY souped-up gaming PC” to a legitimate collaborator on their new album PROTO.

Herndon’s vision for PROTO involved integrating Spawn holistically into her process, treating the software as a member of the ensemble rather than as its omniscient director. Spawn doesn’t dictate the shapes of the compositions, but it does make choices about how to play them—choices that human musicians may not have made on their own. “We wanted to see Spawn more as a performer rather than as a composer,” Herndon says. “So that’s why we weren’t dealing with form necessarily, we’re dealing more with performance. She’s kind of improvising in a way.”

For PROTO’s first single “Godmother,” Herndon trained Spawn on a dataset of her own speaking and singing voice, grafting the results onto the stems of a track she made collaboratively with the experimental footwork producer Jlin. Spawn’s contributions to the song sound like an alien beatboxer, stripping down and recreating Herndon’s phonemes in a guttural collage that follows the rhythms and textures of Jlin’s music.

Training data for much of the rest of the album came from audio recorded during a public performance at Berlin’s ISM Hexadome, where Herndon led the crowd through various chants and gestures, intended as a conceptual rebuke to what Herndon sees as societal misconceptions about artificial intelligence. “One of our performers, Evelyn Saylor, was playing the role of people who are putting so much faith into A.I., as if it’s almost a religion that could save us from ourselves and remove culpability from our own behavior,” she says.

Herndon and her ensemble had the audience clap their hands and snap their fingers, generating massive, room-filling waves of sound that echoed throughout the space. “It was this very eerie mass clicking,” Herndon’s collaborator Jules LaPlace says about the training session. “This literal cloud of sound that came out of these people all snapping their fingers and making these group sounds which would be impossible for a single person to make.” Spawn absorbed this audio and came up with sounds that LaPlace describes as having a “placeless quality,” which resembled the original recordings when analyzed mathematically, but not at the level of human perception.

PROTO encourages listeners to think both like a human and a machine: to hear sounds like Spawn’s strange approximation of finger-snapping for their purely musical qualities, but also to attempt to rationalize them from Spawn’s point of view. What did it hear in those original recordings that caused it to respond like this? The album’s opening track, “Birth,” begins with the rolling texture of multiple voices falling in and out of phase, until a triumphant organ and a single glitching, stuttering lead voice come to dominate the mix. “You can hear it trying to guess what the next sample could be, and that’s why it gets stuck on a vowel: because we hold vowels longer,” Herndon says about the piece. “Obviously, it’s going to go like, ‘Uhhhh,’ and then it finally figures out to go on to the next thing. That kind of thing is really beautiful, because you can hear the logic behind it.”
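Herndon’s “stuck on a vowel” observation follows directly from how autoregressive sample prediction works: if the model has mostly seen a sound followed by more of itself, as with a held vowel, the most probable next guess is simply more of the same. A toy greedy predictor, illustrative only and nothing like Spawn’s actual architecture, shows the behavior:

```python
from collections import Counter, defaultdict

def greedy_predictor(sequence):
    """Count followers for each symbol, then always predict the most common one."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(sequence, sequence[1:]):
        counts[prev][nxt] += 1
    return {sym: c.most_common(1)[0][0] for sym, c in counts.items()}

# A 'recording' where the vowel is held: 'a' is usually followed by itself.
training = list("baaaabaaaab")
predict = greedy_predictor(training)

# Generate: once the model lands on the held vowel, it stays there.
out, cur = [], "b"
for _ in range(8):
    cur = predict[cur]
    out.append(cur)
print("".join(out))  # prints "aaaaaaaa"
```

The real system samples probabilistically rather than always taking the top guess, which is why Spawn eventually “figures out to go on to the next thing” instead of droning forever.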

Herndon and her collaborators aren’t working in a vacuum. Many of the tools they’ve relied on for Spawn have also been taken up by other artists and musicians working in this space. DADABOTS, a Boston-based computer music duo, use a modified version of the same SampleRNN process that powers Spawn to wildly different effect: generating black metal and math rock tracks from scratch, without any actual instruments or recording involved. Training their system on the catalogs of bands like Krallice, Meshuggah, and Dillinger Escape Plan, the pair emulate the racing drums and soaring solos of these harsher genres with bizarre accuracy.

“We trained hundreds of nets—different genres, different architectures, different parameters,” Zack Zukowski of DADABOTS explains via email. “Eventually, we discovered what sounded best to us: neural death metal, neural math rock, neural skatepunk, neural free jazz, and neural beatbox.”

On some level, the project seems emblematic of the “automated composer” approach to A.I. music, fundamentally dedicated to mimicking styles of the past. But true to their name, DADABOTS also have a healthy sense of the absurd. They are as fascinated by the failures of A.I. to convincingly replicate existing music as they are by its successes. On their first album, the duo trained a machine learning model on the musical catalog of the Beatles, an unsurprising favorite of machine learning enthusiasts of all types. The album, entitled Deep the Beatles, makes an explicit connection between the neural network’s training data and its musical output—a connection listeners wouldn’t necessarily make without being told. At times, the music sounds like Beatles tapes chopped up and rearranged; other times, it sounds more like a John Cage symphony of radio static.

“Most generative machine learning is about maximizing the likelihood of the training data, a.k.a. imitation,” Zukowski says. “Once we can get A.I. systems to optimize this kind of loss function, it’ll be a breakthrough for creativity. For now, our aim is to make art that is closer to the essence of a band than their own music, fall terribly short of this, and laugh at the result.”

Herndon’s sleek electronic chorales couldn’t sound more different from DADABOTS’ deranged pastiche. But throughout our conversation, she also emphasizes the underlying relationship between training data and musical output. For PROTO, she only used audio recorded during the album’s compositional process as training data for Spawn, rather than exposing the A.I. to the work of other artists. This decision was both aesthetic and ethical-economic. She believes that concerns about data collection and usage raised by A.I. applications like facial recognition and targeted social media advertising will also be important to consider in the world of A.I. music. The ability to quickly generate copies of other artists’ music could raise new legal questions for an industry that has long struggled to find equitable royalty solutions for sampling and songwriting interpolation.

“You could imagine being able to make a voice model of Aretha Franklin or Tupac, and then being able to make new works in an entirely new context that they never would have approved of,” Herndon says. “As soon as you record something, as soon as something’s machine readable, it’s essentially automatically entering into a training corpus that you have then no say in—past, present, and future.”

For the Iranian-born, London-based composer Ash Koosha, questions of authorship, identity, and performance are nearly as important as the music itself. Koosha has spent the last few years working on an A.I.-powered virtual singer named Yona, which he believes is capable of filling a role already carved out by some of today’s biggest contemporary pop musicians. “Yona has a name, she has a personality, and she has an appearance now,” he says. “She performs the music, and as she improves, her unique voice is on the path to becoming a full electronic pop musician.”

On the 2018 release C, Yona sounds like a mathematical approximation of high-concept pop acts like SOPHIE, Grimes, and Kim Petras, artists who sometimes deliberately blur the recognizably “human” aspects of their own music in favor of a heightened digital artificiality. Here, instead of a pop singer trying to sound like a machine, it’s a machine trying to sound like a pop singer. Tracks like “Oblivious” and “Sin” challenge preconceptions of expressive vocal possibility, inhabiting the strange space between human and computer. And with a CGI avatar and artfully curated Instagram, Yona asks whether having a corporeal body matters at all for a celebrity when so much of fans’ consumption is mediated through a smartphone screen. (In the last aspect especially, she’s not so different from the CGI-generated social media influencer character Lil Miquela, who has also dabbled in pop music.)

Yona is an early prototype for Koosha’s company Auxuman, which is attempting to bring a “new breed of virtual musicians, performers and cultural icons” to the masses. “We’re aiming to build a fleet of musicians in the next two years,” Koosha tells me on a call from his home in London. “We want to focus on having ten types of different artists that are all gonna go big in their own way. We’re focused on how we can maximize their abilities artificially to see what comes out of it.”

Koosha sees Auxuman both as an art project and as a functional company, one that he hopes will offer a solution to problems facing the music industry at a time when art and commerce are more entwined than ever. Tracing the evolution of the pop star through a history of record contracts, endorsement deals, and Instagram #sponsored posts, he describes a proliferation of images and corporate associations that are necessary for artists to succeed at a large scale. He believes that these and other cultural-economic pressures may fuel creative burnout in young artists after only a few years of viable career-building potential. “I’m personally excited to see how I can reduce that in a sense,” he says. “To be able to limit how companies are inclined to find these young artists who push the product. Maybe these virtual characters can be their billboards now.”

Yona made her live debut at this year’s Rewire Festival in the Netherlands, where Koosha brought the pop star’s image to life as a 3D projection. In an impressive video of the performance later uploaded to Instagram, a Yona avatar dances barefoot, her rendering somewhat reminiscent of the 2014 film Ex Machina. Koosha calls the performance a success, and says that it helped him confirm the project’s viability as a legitimate touring act. “The only thing that she needs to do to perform is to send out a hard disk and set of instructions for production on-site,” he says. “What that means is that you basically cut a lot of the costs traditionally associated with touring.”

Koosha doesn’t shy away from the language of automation, and responds in the affirmative when asked if the Yona project is an attempt to automate the role of the human pop star. Throughout our conversation, he routinely invokes the liberating potential that his digital avatars can provide to touring musicians, many of which he believes would rather be working on new music than jetting from stage to stage as branded identities.

Koosha’s company obviously stands to benefit from the success of this idea. Whether human musicians would see the same upside is a more complicated question. With album sales in perpetual freefall and streaming services paying out fractions of a cent in royalties for each song played, the tours and endorsement deals that Auxuman seeks to automate are among the only reliable revenue streams left for a musician. It’s hard to imagine how Auxuman could survive as a company without taking a cut of those earnings. In April, the company raised $200,000 in funding from venture capital firms to help along its vision.

Koosha is reluctant to discuss in detail the specific ways in which artificial intelligence contributes to Yona’s music and persona, citing concerns about protecting Auxuman’s intellectual property. He describes a neural network that generates lyrics based on texts that might be meaningful to the character of Yona, and another that synthesizes the sound of her voice. Discussing the songs on C, he says, “a lot of those, the initial composition is made by computational intelligence, and a lot of that has been put together by me, as only a producer or engineer.”

His reticence can create the impression that for Auxuman and Yona, the involvement of A.I. may be as useful from a marketing perspective as it is from a creative one. Generative pop lyrics and melodies are obvious places to integrate machine learning into the music-making process, but for now, people are indisputably better at writing them. Aspects of the project, like its avatars and live performances, seem clearly human-engineered, even as they play up the technology’s appealing science-fiction associations. Whatever the A.I. is actually doing, on Yona’s album C, its contributions mostly serve to emulate the sounds of contemporary pop. Unlike Herndon’s music, or the glitchy experimental albums that Koosha releases under his own name, Yona and Auxuman are premised on taking current musical models and extrapolating them into a future that closely resembles the present.

One week before Herndon’s first performance with Spawn in New York, the writer and musician Claire L. Evans took the stage at Google’s annual developer conference in Silicon Valley. In her lecture, Evans explained how she used tools from Magenta—a Google division described by the company as “an open source research project exploring the role of machine learning as a tool in the creative process”—in the creation of a forthcoming album by her indie-dance project YACHT.

YACHT’s model of A.I. music is closer to Koosha’s than Herndon’s: using it to generate lyrics and melodies rather than sounds, treating it as a composer rather than a performer. And they’re generally more willing than Koosha to explain how the process works. It involves feeding their own music into a neural network, seeing what would happen if they asked the A.I. to emulate the sensibilities of their previous work. “For us, we decided that every single song that we were going to create with this process had to be interpolated from existing melodies from our back catalog,” Evans said in the talk. “We hoped that this would result in songs that had that indefinable YACHT feeling, which we don’t know how to quantify and I don’t think the model can, either.”

Dressed in a bright baseball cap and sneakers, Evans described the ways that machine learning has encouraged the band to rethink their songwriting craft. Converting their entire 82-song catalog into short MIDI segments—files that function as a sort of digital sheet music, readable by synthesizers and computers—the trio started with Magenta’s MusicVAE model to generate new melodies based on patterns found in the input data. Evans and her YACHT bandmates sculpted the raw output into proper songs, aiming to flesh out and refine the neural network’s idiosyncratic output without masking its involvement entirely.
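The representation Evans describes is easy to picture: MIDI reduces a melody to note numbers a computer can manipulate (60 is middle C). YACHT’s actual pipeline ran those sequences through Magenta’s MusicVAE, which interpolates in a learned latent space; the sketch below is a far cruder stand-in for the same “interpolated from existing melodies” idea, with made-up example melodies, included only to make the mechanics concrete.

```python
# Each melody is a list of MIDI pitches (60 = middle C), one per beat.
melody_a = [60, 62, 64, 65, 67, 65, 64, 62]  # hypothetical catalog melody
melody_b = [72, 71, 69, 67, 65, 64, 62, 60]  # hypothetical catalog melody

def interpolate(a, b, t):
    """Blend two equal-length melodies; t=0 gives a, t=1 gives b.
    MusicVAE interpolates between encoded representations instead of
    raw pitches, which produces musically smoother in-betweens than
    this naive weighted average."""
    return [round((1 - t) * pa + t * pb) for pa, pb in zip(a, b)]

halfway = interpolate(melody_a, melody_b, 0.5)
print(halfway)
```

Averaging pitches directly tends to flatten both melodies into mush, which is precisely why a model that has learned musical structure is worth the trouble.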

“It was about really trying to create a sort of structure around this process that would keep us from being overwhelmed, and would also allow us to enact our will and our taste on top of all that,” Evans tells me after the conference. “There’s a lot of tools that can help you generate fragmentary things like melodies or text, but we’re not quite at a point now with machine learning where it can help us make structured pop songs.”

On the album’s lead single “(Downtown) Dancing,” which will be released this month, Evans strings together bizarre lyrics about “punch” and “party boots” over a stomping kick and popping funk bassline. The words were also generated by a neural net, which was trained, like Koosha’s, on a large body of texts that were somehow meaningful to the project. “(Downtown) Dancing” sounds like the network’s attempt to write a disco hit. It also sounds unmistakably like YACHT. The music is interesting on a conceptual level, but aside from the loopy lyrics, it’s hard to hear how the A.I. informed its composition, either positively or negatively. 

About two minutes in, there’s a brief appearance from Magenta’s NSynth, a small, open-source hardware device that uses machine learning to blend together sounds from different synth patches—like a sitar and a bass guitar, as demonstrated in a 2018 Magenta promotional video. Evans admits that she and her bandmate Jona Bechtolt were initially unimpressed by the instrument, and had trouble finding a place for it on their album. “We wanted to be like, ‘Okay, if we’re going to do this AI record, let’s try to use an AI instrument.’ But we didn’t really like the way that it sounded,” she says. “We thought that the way that the audio sample rate worked, it just sounds really reedy and lo-fi. We didn’t understand why this multimillion-dollar experiment can make, like, a loopy, lo-fi sound.”

Speaking in the days following the Google conference, Douglas Eck, one of the founding engineers behind Magenta, describes the NSynth project as the team’s first real attempt to use neural networks to generate sound. “I think it can only be called a proof of concept,” he tells me. “If you listen really closely to the NSynth samples, they’re pretty noisy, they’re 16k, they’re mono. You don’t go to NSynth and say, ‘Oh lets just throw away all of our sample-based synthesis and all of our FM-based synthesis,’” he continues, referring to non-AI technologies that have powered synthesizers for decades.

Watching the 2018 demo video, it’s hard to hear what NSynth is doing that couldn’t be accomplished with one of those time-honored non-AI synthesis techniques, or others like convolution or physical modeling. YouTube viewers seem as underwhelmed by the NSynth as Evans was: “ai is cool and all but honestly this sounds kinda bad,” reads one representative comment. It’s a reminder that, despite recent advances, the full capabilities of A.I. as a creative tool often still feel like a far-off promise—even when Google research money is involved.

YACHT is an acronym for “Young Americans Challenging High Technology,” and the band’s releases have often had a conceptual bent aimed at criticizing big tech. (The approach landed them in a minor controversy in 2016, when they were roundly criticized for a stunt PR campaign involving a “leaked” sex tape that didn’t actually exist.) Asked how their relationship to technology has changed now that they’re integrating tools from and giving presentations to one of the world’s biggest tech companies, Evans says that the band got to a point where it became difficult to “[say] anything meaningful about social media while trapped inside of it.”

“On the last record, we were trying to do commentary about our modern platform capitalism while being inside of it. We were doing a lot of social media-based projects and culturejammy things, which, as you know, obviously have gone south left and right,” she says, referring to the sex tape controversy. “Yes, we’re kind of in bed with Google, but we’ve also been working with lots of other creative technologists who are not with Google. The last thing we want is to be like a Google ad.”

Her bandmate and partner, Jona Bechtolt, quickly disputes her characterization: “Claire said ‘We’re in bed with Google’ and I don’t think that’s the case.” He adds that YACHT is not being paid by Google to use or promote the Magenta tools, and that, as an open-source technology, the tools are available to anyone who wants to try them. (In fairness, whether it’s an Apple commercial sync or a contract with a streaming platform, plenty of YACHT’s contemporaries are “in bed” with these sorts of companies, to some degree. The monopolizing effects of big tech make it hard for any musician—or any person at all—not to be.)

The Magenta project is just one small facet of Google’s A.I. enterprise. Douglas Eck, the Magenta engineer, is self-effacing about his role compared to other departments: “They’re trying to cure cancer, and I’m trying to get machines to write three-minute songs.” Google has also used A.I. to train facial recognition models and provide governments with internet censorship options. Last year, the company experienced an internal controversy when it won a contract with the U.S. Department of Defense to provide A.I. technology for analyzing drone footage. And in April, the company’s A.I. ethics board dissolved after internal outcry about its inclusion of figures like Kay Coles James, president of the Heritage Foundation, a right-wing think tank that traffics in transphobia and climate change denial. Projects like Magenta, useful though they may be for artists, can also feel like Google’s attempts to build goodwill for a technology that it also uses toward questionable ends. Jules LaPlace, Herndon’s collaborator, sums up this dynamic, echoing Evans: “Google has its own reasons for pushing this kind of research. It becomes like an advertisement for Google.

“It’s interesting to be working with this stuff in a musical context, but you really see when looking at the research that this is not what it’s for,” LaPlace continues. After spending the last few years working on Herndon’s album and other A.I. projects with visual artists, he began to see a darker side of machine learning as it’s used in fields like speech synthesis and facial recognition. “It’s kinda cute that you can apply it in some way and get a usable sonic result. But you do see that a lot of research is biometric in nature.”

But there’s nothing inherently evil about machine learning itself. And as Herndon says, the technology is here to stay, whether we like it or not. She remains optimistic about its potential, regardless of what companies like Google are doing with it. “I think that maybe the meta-conversation is about agency,” she says. In conversations about A.I. and more generally, she believes that people must recognize their own ability to influence the trajectory of the technologies that permeate our lives, and to nudge them in more humane directions. “We as a community need to make the decisions in order to go where we want to go,” she continues. “In order to make those decisions, we have to have a vision for it. We can’t just be like ‘No’ all the time. We should be critical, of course, but also we have to come up with what our vision or ideal is. We have to have a fantasy.” It’s up to artists like her to imagine a compelling counter-narrative for artificial intelligence, one that isn’t driven by cynicism and profit, but by beauty and possibility.

CORRECTION (6/6/19): A previous version of this story mischaracterized the nature of a Google contract with the U.S. Department of Defense. 
