The Texas Reporter
© The Texas Reporter. All Rights Reserved.

TikTok’s parent launched a web scraper that’s gobbling up the world’s online data 25 times faster than OpenAI

Editorial Board Published October 4, 2024

ByteDance seems eager to make up for lost time when it comes to scraping the web for the data needed to train its generative AI models.

The China-based parent company of video app TikTok launched its own web crawler, or scraper bot, dubbed Bytespider, sometime in April, according to research from Kasada, a company that specializes in bot management for companies with online data. The existence of the bot was also confirmed by Dark Visitors, which monitors scraper bots.

ByteDance’s bot has quickly become one of the most aggressive scrapers on the internet, if not the single most aggressive, the research shows. It is scraping data at a rate that is many multiples of other major companies, such as Google, Meta, Amazon, OpenAI, and Anthropic, which use their own scraper bots to help create and improve their large language or multimodal models, known as LLMs or LMMs.

Sam Crowther, the CEO of Kasada, said that since Bytespider showed up, it has been scraping data at about 25 times the rate of GPTbot, for instance, which scrapes data for OpenAI’s ChatGPT platform and underlying models. Bytespider has been scraping at 3,000 times the rate of ClaudeBot, from Anthropic, which operates the Claude platform.

As the months have gone by, Bytespider has become even more aggressive, according to Kasada. Data shows huge spikes in scraping activity from Bytespider over each of the last six weeks.

Representatives of TikTok and ByteDance did not respond to emails seeking comment.

ByteDance’s aggressive scraping comes despite the possibility of TikTok being banned in the U.S. in the coming months. President Joe Biden has signed legislation that requires ByteDance to sell TikTok, due to national security concerns, or shut it down.

The Bytespider bot, much like those of OpenAI and Anthropic, does not respect robots.txt, the research shows. Robots.txt is a plain-text file that publishers can place on a website that, while not legally binding in any way, is supposed to signal to scraper bots that they cannot take that website’s data.
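To make the robots.txt mechanism concrete, here is a minimal sketch of how a well-behaved crawler might consult the file before fetching a page, using Python's standard `urllib.robotparser` module. The robots.txt contents, the `is_allowed` helper, and the `example.com` URL are hypothetical and chosen only for illustration; the user-agent tokens are assumed to match the bot names mentioned in this article.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve at https://example.com/robots.txt.
# It asks the bot identifying as "Bytespider" to stay away and allows everyone else.
ROBOTS_TXT = """\
User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/1"  # placeholder URL
    for agent in ("Bytespider", "GPTBot", "SomeOtherBot"):
        # A compliant crawler would skip the URL whenever this prints False.
        print(agent, is_allowed(ROBOTS_TXT, agent, url))
```

The check runs entirely on the crawler's side. Nothing in the file stops a bot that never performs it, which is why the research can describe a bot as not respecting robots.txt.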

Web scraping goes back decades, used mainly by search engines to gather links to web pages. But the rise of generative AI tools has added a new dimension and made the practice a top source of lawsuits and controversy. People and organizations whose work has been scraped argue that their copyright is being infringed in the process. All of the models that underlie generative AI tools were trained on massive amounts of online data, effectively everything available on the web, particularly written information. Tech companies use scraper bots to essentially copy all of it for free and put it into their datasets.

“It’s like they’re trying desperately to catch up,” Crowther said of the aggressive scraping being done by Bytespider. Just last year, ByteDance was reportedly so far behind in the generative AI race that it was using OpenAI to help build ByteDance’s own LLM, which is against OpenAI’s terms of service. Earlier this year, ByteDance released a chat-based LLM called Doubao, but work on that model would have been completed prior to the accumulation of newer training data scraped by Bytespider.

It’s “clear” that ByteDance is at work on a new LLM, according to one person familiar with the company. As for what ByteDance plans to do with a new LLM, a person familiar with the company’s ambitions said one goal has to do with the search function for TikTok.

Last week, TikTok released an update to its existing search function focused on keywords for ads, basically allowing advertisers to search in real time for terms that are trending on TikTok. It allows marketers to build an ad with relevant keywords that will ostensibly help the ad show up on the screens of more users.

A new AI model with data on newer internet trends and topics could expand and improve TikTok’s search environment further, according to the person familiar with the company’s ambitions.

“Given the audience and the amount of use, TikTok with a search environment that is a completely biddable space with keywords and topics, that would be very interesting to a lot of people spending a ton of money with Google right now,” the person said.

Are you a TikTok or ByteDance employee or someone with insight or a tip to share? Contact Kali Hays securely via Signal at +1-949-280-0267 or at kali.hays@fortune.com.

