Compensating the authors whose work feeds AI is entirely possible, despite claims by tech companies that it is acceptable, and where necessary reasonable, for generative AI models to disregard copyright. To hold these organizations accountable, regulators now need to step in, because the industry is failing to do so itself.
SEBASTOPOL, CALIFORNIA – Generative artificial intelligence stretches current copyright law in unforeseen and uncomfortable ways. The US Copyright Office recently issued guidance stating that the output of image-generating AI is not copyrightable unless human creativity went into the prompts that generated it. But that leaves many questions open: How much creativity is needed, and is it the same kind of creativity that a calligrapher exercises with a pen?
Another group of cases concerns text (typically novels and novelists), where some argue that training a model on copyrighted material is itself copyright infringement, even if the model never reproduces that text in its output. But reading texts has been part of how humans learn for as long as written language has existed. While we pay to buy books, we do not pay to learn from them.
How do we make sense of this? What should copyright law mean in the age of AI? Technologist Jaron Lanier offers one answer with his idea of data dignity, which implicitly distinguishes between training a model and using a model to generate output. Lanier regards the former as a protected activity, whereas the output may well infringe copyright.
This distinction is attractive for several reasons. First, current copyright law protects “transformative uses … that add something new,” and it is quite clear that this is what AI models are doing. Moreover, large language models (LLMs) such as ChatGPT do not contain the full text of any specific novel from which they are directly copying and pasting.
Rather, the model is an enormous set of parameters, based on all the content ingested during training, that represent the probability that one word will follow another. When these probability engines emit a Shakespearean sonnet that Shakespeare never wrote, that is transformative, however bad the sonnet may be.
Lanier sees the creation of a better model as something that serves everyone, even the authors whose writing is used to train it. That makes it transformative and worthy of protection. But there is a problem with this concept (one that Lanier fully acknowledges): it is impossible to distinguish meaningfully between training current AI models and generating output.
AI developers train models by giving them small pieces of input and asking them to predict the next word, billions of times over, adjusting the parameters slightly along the way to improve the predictions. But the same process is then used to generate output, and that is where the problem arises from a copyright standpoint.
A model asked to write like Shakespeare might start with the word “To,” which makes it slightly more likely that the next word will be “be,” which makes it slightly more likely that the next word will be “or,” and so on. Even so, it remains impossible to connect that output back to the training data.
Where did the word “or” come from? It happens to be the next word in Hamlet's famous soliloquy, but the model did not copy Hamlet. It simply picked “or” from the hundreds of thousands of words it could have chosen, all on the basis of statistics. This is not what we would recognize as human creativity. The model is simply maximizing the probability that we will find its output intelligible.
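To make the "To," "be," "or" chain concrete, here is a minimal sketch of next-word prediction from statistics alone. It is an illustration under loud assumptions: a toy bigram counter stands in for a real LLM (which uses a neural network with billions of parameters, not a lookup of counts), and the corpus, function names, and greedy selection rule are all hypothetical.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each word, how often every other word follows it."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def next_word_probabilities(counts, word):
    """Turn the raw follow-counts for `word` into probabilities."""
    followers = counts[word.lower()]
    total = sum(followers.values())
    return {w: c / total for w, c in followers.items()} if total else {}

def generate(counts, start, length):
    """Greedy generation: always append the most probable next word."""
    output = [start.lower()]
    for _ in range(length - 1):
        probs = next_word_probabilities(counts, output[-1])
        if not probs:
            break  # no word was ever seen after this one
        output.append(max(probs, key=probs.get))
    return " ".join(output)

# Hypothetical toy corpus; a real model ingests vastly more text.
corpus = "to be or not to be that is the question to be or not to dream"
model = train_bigram(corpus)

print(next_word_probabilities(model, "to"))  # "be" is the most likely follower
print(generate(model, "to", 4))
```

Note that the generated phrase comes out of aggregated counts: nothing in `model` records which sentence contributed which count, which is the provenance gap the text describes.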
But how, then, can authors receive a fair share for their work when appropriate? It may not be possible to trace provenance with the current generative AI chatbots, but that is not the end of the story. In the year or so since ChatGPT's release, developers have been building applications on top of the existing foundation models. Some use retrieval-augmented generation (RAG) to let an AI “know about” content that was not in its training data. If you need to generate text for a product catalog, you can upload your company's data and then give the AI model the instruction: “Only use the data included with this prompt in the response.”
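The RAG pattern described above can be sketched in a few lines. This is a simplified illustration, not a production design: retrieval here is plain word overlap rather than the embedding search real systems typically use, the catalog entries and identifiers are invented, and no model is actually called; the code only assembles the prompt that would be sent.

```python
import re

def tokenize(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, documents, top_k=2):
    """Rank documents by how many words they share with the query."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(doc["text"])), doc) for doc in documents]
    scored = [(score, doc) for score, doc in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]

def build_prompt(query, retrieved):
    """Combine the instruction, the retrieved data, and the request."""
    context = "\n".join(f"[{doc['id']}] {doc['text']}" for doc in retrieved)
    return (
        "Only use the data included with this prompt in the response.\n\n"
        f"Data:\n{context}\n\n"
        f"Request: {query}"
    )

# Hypothetical company catalog; ids and text are made up for illustration.
catalog = [
    {"id": "sku-101", "text": "Trail backpack, 30 litre capacity, waterproof nylon."},
    {"id": "sku-102", "text": "Steel water bottle, 750 ml, keeps drinks cold."},
    {"id": "sku-103", "text": "Wool hiking socks, cushioned heel, three pairs."},
]

query = "Write a catalog blurb for the waterproof backpack"
sources = retrieve(query, catalog)
prompt = build_prompt(query, sources)
source_ids = [doc["id"] for doc in sources]

print(prompt)
print("Documents supplied to the model:", source_ids)
```

One thing the sketch makes visible is that the application itself knows exactly which documents were handed to the model for a given request (`source_ids`), unlike content absorbed during training.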
Although RAG was a way to use proprietary information without training