Don’t use our content to train AI systems

Although Google wants all online content available for AI training, the New York Times clearly wants to opt out.

The Times has made numerous changes to its terms of service – all aimed at preventing AI companies from using the media organization’s content to train their systems.

Why we care. Many large language models are trained using website content (see: Search the 15.7 million websites in Google’s C4 dataset). While Google is exploring alternative or supplemental ways of controlling crawling and indexing beyond robots.txt, many brands (e.g., Reddit) are making it clear right now that they don’t want their content used to improve the products and increase the profits of Google, Microsoft and OpenAI, at least not without compensation. You may want to consider adding similar AI-related messaging to your website’s terms page.

What has changed. The New York Times updated its terms of service page Aug. 3. It includes AI-specific additions that apply to its content (which it defines as “including, but not limited to text, photographs, images, illustrations, designs, audio clips, video clips, ‘look and feel,’ metadata, data, or compilations”).

In the “Prohibited use of the services” section, the list of prohibited activities now includes:

  • (3) use the Content for the development of any software program, including, but not limited to, training a machine learning or artificial intelligence (AI) system.

Will AI companies compensate publishers? OpenAI and the Associated Press signed a deal last month. OpenAI licensed the AP’s news article archive dating back to 1985 for training.

Google and the New York Times Co. already have a lucrative “commercial agreement” in place, but that deal is about working together on “tools for content distribution and subscriptions.”

Microsoft is also promising publishers some sort of revenue sharing. However, most of the benefits will apparently go to members of its Start program.
