pup_atlas

joined 1 year ago
[–] [email protected] 2 points 1 month ago

Indexing and lookups on datasets as big as the ones companies like Google and Amazon run also take trillions of operations to complete, especially once you account for the constant reindexing that has to happen. In some cases, encoding data into a neural network is actually cheaper than storing the data itself. You can see this in practice with Gaussian splatting point-cloud capture, where networks are trained to guide the points in the cloud at runtime, rather than storing the positions of trillions of points over time.
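
As a rough illustration of the trade-off (a toy sketch only, not an actual splatting pipeline; all the sizes and names below are made up), instead of storing every (point, frame) position, you fit a small network that reproduces them and ship the weights:

```python
# Toy sketch: compress per-point trajectories into a small network instead
# of storing every (point, frame) position. Sizes are illustrative; real
# 4D Gaussian-splatting pipelines are far more involved.
import torch
import torch.nn as nn

n_points, n_frames = 10_000, 120
# The raw data we'd otherwise store: n_points * n_frames * 3 floats.
trajectories = torch.randn(n_points, n_frames, 3).cumsum(dim=1) * 0.01

# A learned 16-dim code per point, plus a shared MLP conditioned on time.
codes = nn.Embedding(n_points, 16)
mlp = nn.Sequential(nn.Linear(16 + 1, 64), nn.ReLU(),
                    nn.Linear(64, 64), nn.ReLU(),
                    nn.Linear(64, 3))

opt = torch.optim.Adam([*codes.parameters(), *mlp.parameters()], lr=1e-3)
t = torch.linspace(0, 1, n_frames)

for step in range(1_000):
    pts = torch.randint(0, n_points, (4096,))
    frames = torch.randint(0, n_frames, (4096,))
    pred = mlp(torch.cat([codes(pts), t[frames, None]], dim=-1))
    loss = (pred - trajectories[pts, frames]).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# What gets stored is the codes + MLP weights (~165k params here), versus
# 3.6M raw floats; the gap widens as points and frames grow.
```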

[–] [email protected] 2 points 1 month ago

I firmly believe it will slow down significantly. My prediction is that the focus will shift to a few “base” models that get tweaked slightly for different roles, rather than the “from the ground up” retraining we see now. The industry is already starting to move in that direction.

[–] [email protected] 12 points 1 month ago* (last edited 1 month ago) (5 children)

While I agree in principle, one thing I’d like to clarify is that TRAINING is super energy-intensive; once the network is trained, it’s more or less static. Actually using the network doesn’t take dramatically more energy than any other indexed database lookup.
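
Some back-of-envelope numbers to make that concrete (the rule-of-thumb FLOP estimates, roughly 6·N·D for training and 2·N per generated token for inference, and all the sizes here are illustrative assumptions, not measurements):

```python
# Back-of-envelope only; both rules of thumb and all numbers are assumptions.
params = 70e9          # hypothetical 70B-parameter model
train_tokens = 2e12    # hypothetical training corpus size

train_flops = 6 * params * train_tokens   # ~8.4e23 FLOPs, paid once
infer_flops = 2 * params * 500            # one ~500-token response

print(f"training / one response: {train_flops / infer_flops:.1e}x")
# -> roughly 1e10x: the one-time training cost dwarfs any single use.
```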

[–] [email protected] 5 points 1 month ago (1 children)

Actually, Windows does allow you to use an alternate “compositor”, a feature that’s used quite frequently in the industrial/embedded space. Windows calls them “custom shells”. The default is Explorer, but it can be set to any executable.

https://learn.microsoft.com/en-us/windows/iot/iot-enterprise/customize/shell-launcher
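
For example, you can inspect the classic per-machine shell setting from the Winlogon registry value (note: the Shell Launcher feature in the doc above is configured through a WMI provider, not this key; this just shows the older mechanism):

```python
# Read the classic per-machine shell setting on Windows (stdlib winreg).
import winreg

with winreg.OpenKey(
    winreg.HKEY_LOCAL_MACHINE,
    r"SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon",
) as key:
    shell, _ = winreg.QueryValueEx(key, "Shell")

print(shell)  # "explorer.exe" by default; can be set to any executable
```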

[–] [email protected] 8 points 2 months ago

When I read the headline I just assumed this had to be an Onion article. That is not good.

[–] [email protected] 2 points 2 months ago

It’s really not though. It’s actually pretty simple under the hood.

[–] [email protected] 0 points 10 months ago (1 children)

I’m aware the model doesn’t literally contain the training data, but for many models and applications, the training data is by nature small enough, and the application restrictive enough, that it’s trivial to get snippets of near-verbatim training data back out.

One of the primary models I work on involves code generation, and in those applications we’ve actually observed the model outputting verbatim code from the training data, even though there’s a fair amount of data it was trained on. This has spurred concerns about license violations for the open-source code it was trained on.
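
A hypothetical sketch of the kind of check this motivates (the whitespace tokenization and the 20-token threshold here are stand-ins, not our actual pipeline): flag any generated snippet that shares a long verbatim token run with a file in the training corpus.

```python
# Flag generated code that shares long verbatim token runs with the corpus.
def ngrams(tokens: list[str], n: int = 20) -> set[tuple[str, ...]]:
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def verbatim_overlap(generated: str, corpus_files: list[str], n: int = 20):
    gen = ngrams(generated.split(), n)
    hits = []
    for path in corpus_files:
        with open(path) as f:
            if gen & ngrams(f.read().split(), n):
                hits.append(path)
    return hits  # any hit = a 20-token run copied verbatim from training data
```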

There’s also the concept of less-verbatim but still “copied” style. Sure, making a movie in the style of Wes Anderson is legitimate artistic expression, but what about a graphic designer making a logo in the “style of McDonald’s”? The law is intentionally pretty murky in this department, with even some colors being trademarked for certain categories in the States. There’s no clear line here, and LLMs are well positioned to challenge what we already have on the books. IMO this is not an AI problem, it’s a legal one that AI just happens to exacerbate.

[–] [email protected] 0 points 10 months ago (3 children)

That’s not what’s happening though: they are using that data to train their AI models, which pretty irreparably embeds identifiable aspects of it into the model. The only way to remove that data from the model would be an incredibly costly retrain. It’s not literally embedded verbatim anywhere, but it’s almost as if you took a photo of a book. The data is stored completely differently, but if you read it (i.e. make the right prompts, or enough of them), there’s the potential to get parts of the original back.

[–] [email protected] 7 points 10 months ago

That has been a feature in all of their competitors for 10+ years.

[–] [email protected] 1 points 10 months ago

It would definitely stop pretty much any counterfeit if they also added some rudimentary depth data to the image format, within the signed contents. That way simply taking a picture of a monitor would be obviously detectable, and the image wouldn’t be alterable without breaking the signature. It wouldn’t have to be a high-resolution depth map at all, either.
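
A toy illustration of the binding (real content-credential systems like C2PA are far more elaborate; this just signs the image bytes and a low-res depth map together, using the third-party cryptography package):

```python
# Toy scheme: one signature covers both image and depth, so re-photographing
# a screen (flat depth) can't reuse the credential, and neither part can be
# altered without breaking verification. Not a real C2PA implementation.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def _payload(image_bytes: bytes, depth_map: bytes) -> bytes:
    # Length prefix removes ambiguity at the image/depth boundary.
    return len(image_bytes).to_bytes(8, "big") + image_bytes + depth_map

def sign_capture(image_bytes, depth_map, key: Ed25519PrivateKey) -> bytes:
    return key.sign(_payload(image_bytes, depth_map))  # stored with the file

def verify_capture(image_bytes, depth_map, signature, public_key):
    # Raises InvalidSignature if either the pixels or the depth were changed.
    public_key.verify(signature, _payload(image_bytes, depth_map))

key = Ed25519PrivateKey.generate()
sig = sign_capture(b"...pixels...", b"...16x16 depth...", key)
verify_capture(b"...pixels...", b"...16x16 depth...", sig, key.public_key())
```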

[–] [email protected] 5 points 10 months ago

The key to safe AI use is to treat the AI the same as the user: let it automate tasks on the user’s behalf (after confirmation), within the user’s scope. That way, no matter how much the model is manipulated, it can only ever perform the same tasks the user already could.
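
A minimal sketch of that principle (all the names here are invented for illustration): every action is checked against the user’s own permissions and confirmed before it runs, so the agent never has privileges of its own.

```python
# The agent runs with the user's identity and permissions, never its own.
from dataclasses import dataclass, field

@dataclass
class User:
    name: str
    permissions: set[str] = field(default_factory=set)

def run_agent_action(user: User, action: str, execute):
    if action not in user.permissions:
        raise PermissionError(f"{user.name} can't {action}, so neither can the AI")
    if input(f"Allow AI to run '{action}'? [y/N] ").lower() != "y":
        return None  # user declined; nothing happens
    return execute()  # executes in the user's scope

alice = User("alice", {"read_calendar", "send_email"})
run_agent_action(alice, "send_email", lambda: print("email sent as alice"))
```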

[–] [email protected] 1 points 10 months ago

For the record, I am an American and I use it religiously, especially for version numbers: Major.minor.year.month.day.hour.minute-commit. It sorts easily, is specific and intuitive, and makes it clear which version you’re using or working on.
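
Generating that scheme takes just a few lines (the major/minor numbers below are placeholders, and this assumes you’re running inside a git checkout):

```python
# Build a Major.minor.year.month.day.hour.minute-commit version string.
# Zero-padded date fields keep versions lexicographically sortable.
import subprocess
from datetime import datetime, timezone

def build_version(major: int = 1, minor: int = 4) -> str:
    now = datetime.now(timezone.utc)
    commit = subprocess.check_output(
        ["git", "rev-parse", "--short", "HEAD"], text=True).strip()
    return f"{major}.{minor}.{now:%Y.%m.%d.%H.%M}-{commit}"

print(build_version())  # e.g. 1.4.2025.06.11.14.32-a1b2c3d
```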
