theregister.com

vrighter, to opensource in Open Source Initiative tries to define Open Source AI

It is only open source if I can build it myself, which I can’t if you just give me the weights.

The weights are the “compiled” version of the dataset. It’s the dataset that’s the source, not the weights.

chebra,

@vrighter @ylai
That is a really bad analogy. If the "compilation" takes 6 months on a farm of 1000 GPUs and the results are random, then the dataset is basically worthless compared to the model. Datasets are easily available and always were, but whoever invests the effort in training usually doesn't want to let others use the model as open source. That is why we want open-source models, but not "openwashed" ones that are called "open" while being licensed for non-commercial use only, with no modifications and no redistribution.

vrighter,

“the results are random therefore the dataset is useless.”

Tell that to any FPGA toolchain.

delirious_owl,

So the cover art I made for a friend’s album isn’t open source, even though I released it as CC BY-SA… because you can’t make it yourself?

bitfucker,

I think technically, the source should be the native format of whatever image manipulation program you use. For vector graphics there is the SVG format, but the native editor’s format is still preferable. Otherwise, whoever gets the end copy cannot easily modify or reproduce it, only copy it. But of course it depends on the definition of “easy” and a lot of other factors. Licensing is hard, not least because I am not a lawyer.

sweng,

What counts as source and what doesn’t would depend on the format.

You can create a picture by hand, using no input data.

I challenge you to do the same for model weights. If you truly just sit down and type away numbers in a file, then yes, the model would have no further source. But that is not something that can be done in practice.

delirious_owl,

I challenge you to recreate the Mona Lisa.

My point is that these models are so complex that they’re closer to art than anything reproducible.

sweng, (edited )

I don’t see your point. What is the “source” for the Mona Lisa that I would use? For LLMs, I could reproduce them given the original inputs.

Creating those inputs may be an art, but so could any piece of code. No one claims that code being elegant disqualifies it from being open source.

delirious_owl,

Are you sure that you can reproduce the model, given the same inputs? Reproducibility is a difficult property to achieve. I wouldn’t think LLMs are reproducible.

sweng,

In theory, if you have the inputs, you have reproducible outputs, modulo perhaps some small deviations due to non-deterministic parallelism. But if those effects are large enough to make your model perform differently you already have big issues, no different than if a piece of software performs differently each time it is compiled.
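
A minimal sketch of where such small deviations can come from, assuming the only source of nondeterminism is the order in which floating-point partial sums get combined. The values and the helper name are made up for illustration and do not come from any real training run:

```python
def accumulate(values):
    """Add floats strictly left to right, i.e. in one fixed reduction order."""
    total = 0.0
    for v in values:
        total += v
    return total

# Hypothetical contributions to a single sum. A parallel reduction may
# combine the same numbers in a different order from run to run.
contributions = [1e16, 1.0, -1e16, 1.0]
reordered = [1e16, -1e16, 1.0, 1.0]

run_a = accumulate(contributions)
run_b = accumulate(reordered)

# Same inputs, different order: floating-point addition is not associative,
# so the two runs do not agree exactly.
print(run_a, run_b, run_a == run_b)
```

Whether differences of this kind stay negligible or add up over a long training run is exactly what is in dispute here; the sketch only shows that bit-for-bit identical output is not automatic.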

delirious_owl,

That’s the theory for some paradigms that were specifically designed to have the property of determinism.

Most things in the world, even computers, are non-deterministic

Nondeterminism isn’t necessarily a bad thing for systems like AI.

leopold,

I would consider the “source code” for artwork to be the project file, with all of the layers intact and whatnot. The Photoshop PSD, the GIMP XCF or the Krita KRA. The “compiled” version would be the exported PNG/JPG.

You can license a compiled binary under CC BY if you want. That would allow users to freely decompile/disassemble it or to bundle the binary for their purposes, but it’s different from releasing source code. It’s closed source, but under a free license.

vrighter,

You released it under a non-open-source license. So very clearly: no, it is not.

leopold,

CC BY-SA is considered open source. CC BY-NC is not.

delirious_owl,

Wut. That license is literally compatible with the GPL

ylai,

The situation is somewhat different and more nuanced. With weights there are tools for fine-tuning, LoRA/LoHa, PEFT, etc., which presents a different situation than with binaries for programs. You can see that despite e.g. LLaMA being “compiled”, others can build on it substantially to make models that surpass the previous iteration (see e.g. the recent WizardLM 2 in relation to LLaMA 2). Weights are also architecturally independent to a much larger degree than binaries (you can usually train and run inference across GPU, Google TPU, Cerebras WSE, etc. with the same weights).

sweng,

How is that different than e.g. patching a closed-source binary? There are plenty of community patches for old games, e.g. to make them work on newer hardware. Architectural independence seems irrelevant; it’s no different than e.g. Java bytecode.

ylai, (edited )

This is a very shallow analogy. Fine-tuning is rather the standard technical approach to reduce compute, even if you have access to the code and all training data. Hence there has always been a rich and established ecosystem for fine-tuning, regardless of “source.” Patching closed-source binaries is not the standard approach, since compilation is far less computationally intensive than today’s large-scale training.

Java bytecode is a far-fetched example. The JVM does assume a specific architecture that is particular to the CPU-dominant world in which it was developed, and Java bytecode cannot be trivially executed (efficiently) on a GPU or FPGA, for instance.

And by the way, the issue of weight portability is far more relevant than the forced comparison to (simple) code suggests. Today’s large-scale training code is usually highly specific to a particular cluster (or TPU, WSE), as opposed to the resulting weights. Even if you got hold of somebody’s training code, you often have to reinvent the wheel to scale it to your own particular compute hardware, interconnect, I/O pipeline, etc. This is not commodity open source on your home PC or workstation.

sweng,

The analogy works perfectly well. It does not matter how common it is. Patching binaries is very hard compared to e.g. LoRA, but it is still essentially the same thing: making a derivative work by modifying parts of the original.

ylai, (edited )

How does this analogy work at all? LoRA is chosen by the modifier to be low-rank to accommodate some desktop/workstation memory constraint, not because the other weights are “very hard” to modify if you happen to have the necessary compute and I/O. The development of LoRA has also been largely driven by storage reduction (hence not too many layers modified) and preservation of generalizability (since training generalizable models is hard). The Kronecker product versions, in particular, were first developed in the context of federated learning, not for desktop/workstation fine-tuning (also, LoRA is fully capable of modifying all weights; it is rather a technique to do so in a correlated fashion to reduce the size of the gradient update). And much of the development of LoRA happened in the context of otherwise fully open datasets (e.g. LAION), which are just not manageable in desktop/workstation settings.
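
To make the “correlated fashion” point concrete, here is a minimal sketch in plain Python, assuming the usual low-rank formulation in which the update to a weight matrix is the product of two small factors. The matrix sizes, the rank, and the helper names are illustrative assumptions, not taken from any particular LoRA implementation:

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def lora_update(weight, b, a):
    """Return weight + b @ a, where b is (d x r) and a is (r x k)."""
    delta = matmul(b, a)
    return [[w + d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]

def trainable_params(d, k, r):
    """Parameter counts for a full update versus a rank-r update."""
    return d * k, r * (d + k)

# Toy 4x4 layer with a rank-1 update: the two factors hold 8 numbers in total.
weight = [[0] * 4 for _ in range(4)]
b = [[1], [2], [3], [4]]   # d x r
a = [[1, 2, 3, 4]]         # r x k
updated = lora_update(weight, b, a)

# Every entry of the 4x4 matrix moved, even though only 8 numbers were trained.
changed = sum(1 for row in updated for value in row if value != 0)

# Hypothetical layer size and rank, only to show how the counts scale.
full, low_rank = trainable_params(4096, 4096, 8)
print(changed, full, low_rank)
```

The update touches all entries of the matrix, but those entries are tied together through the two factors, which is what keeps the update small enough to train and store.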

This narrow perspective of “source” overlooks the actual usefulness of compute/training here. Datasets from e.g. LAION to Common Crawl have been available for some time, along with training code (sometimes independently reproduced) for the Imagen diffusion model or GPT. It was only when e.g. GPT-J came along, and somebody invested in the compute (including how to scale it to their specific cluster), that the result became useful.

applepie, to technology in Apple limits third-party browser engine work to EU devices

You gonna use Tim creeps browser boy and you will enjoy it.

Moonrise2473, to technology in Apple limits third-party browser engine work to EU devices

As usual, they are doing malicious compliance, like when they pretended that iOS and iPadOS were two completely separate operating systems, so iPadOS shouldn’t need to support third-party app stores because the EU said “iOS”.

notfromhere,

Honestly, why didn’t the EU include all mobile device operating systems, or just all operating systems with more than some number of users?

Moonrise2473,

Probably because there aren’t any, they can’t specifically say “iOS”.

I’m not aware of any other operating system (except the ones in game consoles or dedicated hw) that doesn’t allow the user to install other software not approved by the manufacturer

TheMonkeyLord,

Would not mind at all if consoles got lumped in and forced to allow alternative app stores

taxet_,

Actually the more I think about it the more it seems like the only, legally fair decision. Either all of them are demanded to allow alternative app stores or none of them are. Why should the consoles be any different in this regard? 🤔

Sternhammer,

Indeed. Apple always gets criticised for the 30% ‘Apple Tax’ but the console manufacturers get a free pass for the same thing. Bizarre.

thingsiplay, to technology in Apple limits third-party browser engine work to EU devices

Same for sideloading apps. If governments in the rest of the world do not care, then Apple won’t care in the rest of the world either.

KingThrillgore, to privacy in Telegram CEO calls out rival Signal, claiming it has ties to US government

Blaming the Americans is a signature “Russia has fucked with this company” trademark.

YeetPics, to privacy in Telegram CEO calls out rival Signal, claiming it has ties to US government

I wonder if it’s legit or just another attempt at manipulating markets

xilona, to privacy in Telegram CEO calls out rival Signal, claiming it has ties to US government

If one is to compare apples to apples, imho the choice between Signal, WhatsApp, Telegram, and other “messengers” is obvious and clear.

Signal is fully open source! You can run it on-premises, if you know your business!

Why are we not talking about it?

I hope my comment will not be discarded/removed as not being in sync with the narrative… 😉

mox,

“Signal is fully open source! You can run it on-premises, if you know your business!

Why are we not talking about it?”

Unless something has drastically changed recently, the official Signal service won’t interoperate with anyone else’s instance. That makes its source code practically useless for general-purpose messaging, which might explain why few are talking about it.

xilona,

My point is that you have all the open source software components needed to run secure communications, on your own premises, for your own users/community in case you are not trusting Signal’s infrastructure.

If you know any other similar alternative with strong, open-source encryption protocols, please let me know! I love learning new things every day!

Cheers!

mox, (edited )

“on your own premises, for your own users/community in case you are not trusting Signal’s infrastructure.”

Yes, that’s an example of data (and infrastructure) sovereignty. It’s good for self-contained groups, but is not general-purpose messaging, since it doesn’t allow communication with anyone outside your group.

“If you know any other similar alternative with strong, open-source encryption protocols, please let me know! I love learning new things every day!”

Matrix can do this. It also has support for communicating across different server instances worldwide (both public and private), and actively supports interoperability with other messaging networks, both in the short term through bridges and in the long term through the IETF’s More Instant Messaging Interoperability (MIMI) working group.

XMPP can do on-premise encrypted messaging, too. Technically, it can also support global encrypted messaging with fairly modern features, with the help of carefully selected extensions and server software and clients, although this quickly becomes impractical for general-purpose messaging, mainly because of availability and usability: Managed free servers with the right components are in short supply and often don’t last for long, and the general public doesn’t have the tech skills to do it themselves. (Availability was not a problem when Google and Facebook supported it, but that support ended years ago.) It’s still useful for relatively small groups, though, if you have a skilled admin to maintain the servers and help the users.

xilona,

Thank you very much for the info!

h6d2n,

simplex ;)

yogthos, (edited ) to privacy in Telegram CEO calls out rival Signal, claiming it has ties to US government

I’m always amazed how people come out of the woodwork to defend Signal any time any criticism of it comes up. It’s become a sacred cow that cannot be questioned. Whatever you may think of Telegram should bear zero weight on your views of Signal.

The reality is that the developers of Signal have close ties to US security agencies. It’s a centralized app hosted in the US and subject to US laws. It’s been forcing people to use their phone numbers to register, and this creates a graph of the real-world contacts people have. This alone is terrible from a security/privacy perspective. It doesn’t have reproducible builds on iOS, which means you have no guarantee regarding what you’re actually running. These are just a handful of things that are publicly known.

And then we know stuff like this happens. NSA suggested using specific numbers for encryption that it knew how to factor quickly. The algorithm itself was secure, but the specific configuration of how the algorithm was implemented allowed for the exploit: thehackernews.com/…/nsa-crack-encryption.html

These kinds of backdoors are very difficult to audit for because if you don’t know what to look for then you won’t have any reason to suspect a particular configuration to be malicious. Given the relationship between people working on Signal and US government, this is a real concern.

The same kind of scrutiny people apply to Telegram and other messaging apps should absolutely be applied to Signal as well.

devraza,

I’d just like to add that you can use a temporary phone number service to sign up to Signal as you only need a phone number to register, not to actually use Signal.

The_Dark_Knight, to privacy in Telegram CEO calls out rival Signal, claiming it has ties to US government

Idk how secure Telegram is, but c’mon, Signal is shady AF. They won’t let F-Droid have it because they want to sign with their own keys or some shit, but there is speculation that it’s because they can roll out a custom APK to targets that governments want, which is just not possible if it is hosted by someone like F-Droid. Even Telegram allows that, and they even allow third-party apps, which Signal won’t.

SimpleX and Briar are the best options if you’re actually worried about privacy.

This comment is copy pasted from another thread where I had the same opinion

TheAnonymouseJoker,

Signal stans do not have an answer to this. OMEMO is verifiable, rest of the stuff around it is not. Signal even had a time when they did not update the backend open source code for over 6 months.

TheAnonymouseJoker, to privacy in Telegram CEO calls out rival Signal, claiming it has ties to US government

Signal and Telegram are not rivals, though? Signal aims to be an E2EE chat platform, while Telegram works like a public forum in a real-time chat format. Signal/WhatsApp are different from Telegram/Discord. They are not the same type of platform.

Durov is comparing apples and oranges, and anyone falling for this whining, calling Telegram bad is an idiot.

possiblylinux127, to privacy in Telegram CEO calls out rival Signal, claiming it has ties to US government
Gutless2615, to privacy in Telegram CEO calls out rival Signal, claiming it has ties to US government

I think Telegram has always been a honeypot

rottingleaf,

An FSB (or AP, don’t know which, the main thing is it’s Russian) honeypot at that.

extant,

There’s no oversight for any of these agencies, and they have the means and incentive to backdoor cryptography. What would stop them from doing this, morality? There’s no possible way that they both aren’t compromised, and all we’re seeing now is them firing pot shots at each other, trying to convince the reader to join their honeypot because it’s sweeter.

tastysnacks,

Not sure if you mean government agencies, but if you do, there’s definitely oversight. Don’t think that your Congress peoples aren’t in on it too.

electricprism, to privacy in Telegram CEO calls out rival Signal, claiming it has ties to US government

Pot trying to call out Kettle.

F. Doubt.

sunstoned, to privacy in Telegram CEO calls out rival Signal, claiming it has ties to US government

Ma-trix! Ma-trix!

tuckerm, to privacy in Telegram CEO calls out rival Signal, claiming it has ties to US government

I know that Telegram has a lot of users, so I'm not describing all of them here. But I've noticed that it seems especially popular among people who kind of like to "play pretend" as underground hackers. You know, the kind of person who likes to imagine that the government would be after them.

This mudslinging feels like more of a marketing campaign than anything else. An info op that will work well on the Telegram users who like to imagine that they have outmaneuvered all the info ops.

rottingleaf,

Yes. And those pretenders are always people who can’t install Synapse and “delete” their messages thinking that’s very smart.

autonomoususer,

Because we keep saying Signal and Telegram instead of Anti-Libre Software, Service as a Software Substitute, and Centralised.

We should reach them in their spaces: modding, hacking, piracy, and beginner programming channels.
