10 interesting stories served every morning and every evening.




1 611 shares, 38 trendiness, words and minutes reading time

TSMC eyes Germany as possible location for first Europe chip plant

HSINCHU, Taiwan — Taiwan Semiconductor Manufacturing Co., the world’s largest contract chipmaker, said on Monday it is considering building its first European semiconductor plant in Germany as the global race to onshore chip production heats up.

Chairman Mark Liu said TSMC is engaging in talks with “multiple clients” about the feasibility of building a chip wafer plant in the country.

“We’re in the preliminary stage of reviewing whether to go to Germany,” Liu told shareholders at the company’s annual general meeting. “It’s still very early, but we are seriously evaluating it, and [a decision] will depend on our customers’ needs.”

The comments are the latest sign that the world’s most valuable chip company is shifting away from its decades-long strategy of concentrating the majority of its chip production in Taiwan. The company is already building a $12 billion chip facility in Arizona and is considering constructing its first-ever wafer plant in Japan.

For the latter project, Liu said, the company is discussing with Japanese clients ways to lower operating costs.

“The cost to build and operate a chip plant in Japan is much higher than doing so in Taiwan. … We are directly discussing with our clients ways to narrow the cost gap there,” Liu said. “Once we go through the due diligence process, our goal is to at least break even on costs.”

TSMC’s global expansion comes as major economies around the world call for more semiconductor production to be brought onshore. Chips are the heart and soul of electronics, from smartphones and data centers to satellites and military equipment, and governments are linking their supply directly to national security.

The advanced chip production plant in Arizona will be TSMC’s first chip facility in the U.S. in two decades. Production there is set to start in early 2024.

Liu said the plant will mainly address the demand for infrastructure- and national security-related chips, as requested by clients, rather than for consumer electronics chips.

In Washington’s latest supply chain review report, the White House specifically pointed out that the concentration of advanced chip production in Taiwan creates a vulnerability for global semiconductor supply chains.

TSMC supplies chips to almost all the key global chip developers, from Apple, Qualcomm and Advanced Micro Devices to Intel, Infineon and Sony. U.S. clients account for 70% of TSMC’s revenue, while those from Japan account for 4.72% and Europe 5.24%.

The company’s founder, former Chairman Morris Chang, recently warned that rushing to bring semiconductor production onshore will entail massive costs without providing the self-sufficiency in chips that major economies are after.

...

Read the original on asia.nikkei.com »

2 462 shares, 49 trendiness, words and minutes reading time

Toyota is quietly pushing Congress to slow the shift to electric vehicles

The US is slowly moving toward adopting policies that would put more electric vehicles on the road, but for Toyota, it’s not slow enough. The Japanese automaker, which is the largest car company in the world, has been quietly lobbying policymakers in Washington, DC to resist the urge to transition to an all-electric future — partly because Toyota is lagging behind the rest of the industry in making that transition itself.

According to The New York Times, a top Toyota executive has met with congressional leaders behind closed doors in recent weeks to advocate against the Biden administration’s plans to spend billions of dollars to incentivize the shift to EVs. The executive, Chris Reynolds, has argued that hybrids, like the Toyota Prius, as well as hydrogen-powered fuel cell vehicles should also be in the mix.

Toyota is also pushing back against EV-friendly policy through the auto industry’s main DC-based lobbying group, the Alliance for Automotive Innovation. The group, which represents the major car companies and their suppliers and is chaired by Reynolds, has been arguing against the Biden administration’s plan to adopt the so-called California compromise as its official position, the Times reports.

Last year, a group of car companies made a deal on tailpipe emissions with California, which had been seeking to set tougher rules than the US as a whole. Under President Donald Trump, the Environmental Protection Agency had sought to strip California of its power to set its own emissions standards. But under Biden, that rule was reversed, allowing California and other states to impose tougher standards.

Toyota, which sided with the Trump administration in its battle with California, was not part of the original compromise. And the company has argued against EV-friendly policies in India and in its native country, Japan, as well.

Toyota’s behind-the-scenes efforts to slow the momentum behind EV-friendly policies are surprising, given its status as an early adopter of battery-powered transportation. With the release of the Toyota Prius in 1997, the company helped pave the way for Tesla and others by proving that vehicles with alternative powertrains could be immensely popular. And more recently, the automaker has revealed plans to release 70 new models by 2025, including battery-electric, hydrogen fuel cell, and gas-electric hybrids.

But that doesn’t hide the fact that Toyota has fallen far behind its competitors, appearing content to rest on its laurels while the rest of the industry has lapped it several times. Companies like Nissan, General Motors, and Volkswagen have been selling pure battery-electric vehicles for years, while also revealing their plans to phase out gas cars completely. And Toyota’s reluctance to embrace EVs is nothing new; The New York Times noted as much in an article from 2009.

Toyota’s top executives, including billionaire CEO Akio Toyoda, have been on the record calling the trend toward electric vehicles “overhyped,” in part because of emissions associated with power plants — a favorite talking point of the oil and gas industry.

The company came under fire recently after it was revealed that it was the largest corporate donor to Republican lawmakers who objected to the certification of the 2020 presidential election. A majority of those politicians also dispute the scientific consensus on climate change. Toyota initially defended the contributions, but later said it would halt them. You know things are bad for the company when a Toyota spokesperson has to confirm to the Times that the automaker does indeed believe that climate change is real.

Toyota’s argument that hybrids and fuel-cell vehicles should also be included in the conversation is not a bad one. Hybrid vehicles in particular are an important stepping stone to the wider adoption of EVs, especially as the charging infrastructure is still in its infancy.

But that argument might carry more weight if the automaker’s track record on fuel economy were actually, well, good. According to the EPA, Toyota has slipped in its ranking in fuel efficiency across its entire fleet, going from an industry leader to near the bottom with GM and Ford. This comes as the company has pushed the sale of huge gas-guzzling trucks and SUVs, which tend to command a larger profit than smaller sedans and hatchbacks.

...

Read the original on www.theverge.com »

3 435 shares, 24 trendiness, words and minutes reading time

Understanding Rust futures by going way too deep

So! Rust futures! Easy peasy lemon squeezy. Until it’s not. So let’s do the easy thing, and then instead of waiting for the hard thing to sneak up on us, we’ll go for it intentionally.

Choo choo here comes the easy part 🚂💨

Shell session
$ cargo new waytoodeep
     Created binary (application) `waytoodeep` package

We install cargo-edit in case we don’t have it yet, so we can just cargo add later:

Shell session
$ cargo install cargo-edit
    Updating crates.io index
  Downloaded cargo-edit v0.7.0
  Downloaded 1 crate (57.6 KB) in 0.47s
     Ignored package `cargo-edit v0.7.0` is already installed, use --force to override

Then we pick an async runtime, because those futures won’t poll themselves… and we’ll pick tokio for no reason other than: that’s what I’ve been using a bunch these past few months.

Shell session
$ cargo add tokio@1.9.0 --features full
    Updating https://github.com/rust-lang/crates.io-index index
      Adding tokio v1.9.0 to dependencies with features: ["full"]

Then we change up our main so it uses a default tokio executor (cargo new generated one for us, but it’s not adequate here):

Rust code
// in `src/main.rs`

#[tokio::main]
async fn main() {
    println!("Hello from a (so far completely unnecessary) async runtime");
}

Shell session
$ cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.01s
     Running `target/debug/waytoodeep`
Hello from a (so far completely unnecessary) async runtime

But let’s add some other nice things I just like to have in my projects.

First, for error handling - we’re writing an app, we’re going to get a bunch of different types from different libraries, and it’d be neat if we could have one type to unify them all. eyre gives us that (just like anyhow)! And since I like pretty colors I’ll use color-eyre:

Shell session
$ cargo add color-eyre@0.5.11
    Updating https://github.com/rust-lang/crates.io-index index
      Adding color-eyre v0.5.11 to dependencies

Now we need to install color-eyre as the default panic handler, and I snuck in some environment variable modification so we get backtraces by default.

Rust code
use color_eyre::Report;

#[tokio::main]
async fn main() -> Result<(), Report> {
    // ...
}

Shell session
$ cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.02s
     Running `target/debug/waytoodeep`
Hello from a (so far completely unnecessary) async runtime

Okay good! Now if we have an error from somewhere, we’ll see the full stack trace, like so:

And finally, because I like my logs to be structured, let’s add tracing, and to print them with nice colors in the terminal, let’s add tracing-subscriber.

Shell session
$ cargo add tracing@0.1.26 tracing-subscriber@0.2.19
    Updating https://github.com/rust-lang/crates.io-index index
      Adding tracing v0.1.26 to dependencies
      Adding tracing-subscriber v0.2.19 to dependencies

We already have a setup function, so we’ll just install tracing-subscriber in there… and we’ll change that println! to an info!. Also, again, some environment variable manipulation so that if nothing is set, we default to the info log level for all crates.

Rust code
use color_eyre::Report;
use tracing::info;
use tracing_subscriber::EnvFilter;

#[tokio::main]
async fn main() -> Result<(), Report> {
    // ...
}

Shell session
$ cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.02s
     Running `target/debug/waytoodeep`
Jul 25 17:03:46.993  INFO waytoodeep: Hello from a comfy nest we've made for ourselves
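
(The bodies of setup and main were lost in this excerpt; based on the description above, here is a plausible sketch. The env-var names and exact builder calls are assumptions, not the author’s verbatim code:)

Rust code
use color_eyre::Report;
use tracing::info;
use tracing_subscriber::EnvFilter;

fn setup() -> Result<(), Report> {
    // Assumption: default to backtraces unless the user already set this.
    if std::env::var("RUST_BACKTRACE").is_err() {
        std::env::set_var("RUST_BACKTRACE", "full")
    }
    // Install color-eyre as the default panic/error report handler.
    color_eyre::install()?;
    // Assumption: default to the `info` log level for all crates.
    if std::env::var("RUST_LOG").is_err() {
        std::env::set_var("RUST_LOG", "info")
    }
    // Install a tracing subscriber that reads its filter from RUST_LOG.
    tracing_subscriber::fmt::fmt()
        .with_env_filter(EnvFilter::from_default_env())
        .init();
    Ok(())
}

#[tokio::main]
async fn main() -> Result<(), Report> {
    setup()?;
    info!("Hello from a comfy nest we've made for ourselves");
    Ok(())
}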

Alright, we’re ready to do something useful!

When deciding which article to read during their coffee break, people usually open several websites at the exact same moment, and read whichever article loads first.

And that’s a fact. You can quote me on that because, well, who’s going to go and verify that? That sounds like a lot of work. Just trust me on this.

So let’s write a program that does exactly that.

You guessed it! Let’s bring in reqwest - although I don’t love its API, it’ll work nicely with the rest of our stack here. Also we’ll tell reqwest to use rustls because screw OpenSSL, that’s why.

Shell session
$ cargo add reqwest@0.11.4 --no-default-features --features rustls-tls
    Updating https://github.com/rust-lang/crates.io-index index
      Adding reqwest v0.11.4 to dependencies with features: ["rustls-tls"]

Rust code
#[tokio::main]
async fn main() -> Result<(), Report> {
    // ...
}
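
(The body is missing from this excerpt; given the log output below, it plausibly did something like the following sketch. The exact calls are my assumption:)

Rust code
#[tokio::main]
async fn main() -> Result<(), Report> {
    setup()?;

    let url = "https://fasterthanli.me";
    // Hypothetical body: fetch one page and log its content type.
    let res = reqwest::Client::new().get(url).send().await?;
    info!(%url, content_type = ?res.headers().get("content-type"), "Got a response!");

    Ok(())
}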

And off we go!

Shell session
$ cargo run
   Compiling waytoodeep v0.1.0 (/home/amos/ftl/waytoodeep)
    Finished dev [unoptimized + debuginfo] target(s) in 3.05s
     Running `target/debug/waytoodeep`
Jul 25 17:12:32.276  INFO waytoodeep: Hello from a comfy nest we've made for ourselves
Jul 25 17:12:32.409  INFO waytoodeep: Got a response! url=https://fasterthanli.me content_type=Some("text/html; charset=utf-8")

And this is what I mean by structured logging. Well, part of it anyway. In that line here:

Rust code
info!(%url, content_type = ?res.headers().get("content-type"), "Got a response!");

We have a message, “Got a response!”, then a tag named url whose value is the Display-formatting of the binding named url, and a tag named content_type, whose value is the Debug-formatting of the expression res.headers().get("content-type").

Easy peasy! name = %value for Display, name = ?value for Debug, and if both name and value have the same… name, we can use the short forms %value and ?value.
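
For instance (a tiny made-up snippet, not from the article):

Rust code
let url = "https://fasterthanli.me";
info!(%url, "connecting");   // records url=https://fasterthanli.me (Display)
info!(?url, "connecting");   // records url="https://fasterthanli.me" (Debug)
info!(target_url = %url, "connecting");   // long form, with an explicit tag name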

Of course there’s also spans, which are great, and to me the whole point of this is you can then send them to APM platforms like Datadog or Honeycomb or whoever, but this isn’t an article about tracing.

Just to illustrate though, if we install a JSON tracing subscriber instead, this is what we get:

Shell session
$ cargo run
   Compiling waytoodeep v0.1.0 (/home/amos/ftl/waytoodeep)
    Finished dev [unoptimized + debuginfo] target(s) in 3.09s
     Running `target/debug/waytoodeep`
{"timestamp":"Jul 25 17:17:21.531","level":"INFO","fields":{"message":"Hello from a comfy nest we've made for ourselves"},"target":"waytoodeep"}
{"timestamp":"Jul 25 17:17:21.709","level":"INFO","fields":{"message":"Got a response!","url":"https://fasterthanli.me","content_type":"Some(\"text/html; charset=utf-8\")"},"target":"waytoodeep"}
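
(Presumably that’s just a change to the subscriber setup, something like this sketch, assuming tracing-subscriber’s json feature is enabled:)

Rust code
tracing_subscriber::fmt::fmt()
    .json()
    .with_env_filter(EnvFilter::from_default_env())
    .init();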

Which should be enough to pique your interest.

Okay, now let’s fetch two things!

Rust code
pub const URL_1: &str = "https://fasterthanli.me/articles/whats-in-the-box";
pub const URL_2: &str = "https://fasterthanli.me/series/advent-of-code-2020/part-13";

…so that it’s a fair comparison. Both these articles are hosted on my own website, and it’s definitely not a marketing scheme, instead it’s so that the fetch time is comparable and there’s a chance one will finish fetching before the other (and that will change randomly over time).

Rust code
async fn fetch_thing(client: &Client, url: &str) -> Result<(), Report> {
    // ...
}
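
(Its body didn’t survive extraction; a plausible sketch based on the logging line from earlier. The error_for_status call is my assumption:)

Rust code
use reqwest::Client;

async fn fetch_thing(client: &Client, url: &str) -> Result<(), Report> {
    // Fetch the page, turn HTTP error statuses into errors, log the content type.
    let res = client.get(url).send().await?.error_for_status()?;
    info!(%url, content_type = ?res.headers().get("content-type"), "Got a response!");
    Ok(())
}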

And use it:

Rust code
#[tokio::main]
async fn main() -> Result<(), Report> {
    // ...
}
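
(Again elided; a minimal sketch of how main might call fetch_thing on both URLs, awaiting them one after the other. Racing them concurrently is presumably where the article heads next:)

Rust code
#[tokio::main]
async fn main() -> Result<(), Report> {
    setup()?;

    // Assumes `use reqwest::Client;` and the constants and setup from above.
    let client = Client::new();
    fetch_thing(&client, URL_1).await?;
    fetch_thing(&client, URL_2).await?;

    Ok(())
}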

...

Read the original on fasterthanli.me »

4 404 shares, 41 trendiness, words and minutes reading time

Police Are Telling ShotSpotter to Alter Evidence From Gunshot-Detecting AI

Prosecutors in Chicago are being forced to withdraw evidence generated by the technology, which led to the police killing of 13-year-old Adam Toledo earlier this year.

On May 31 last year, 25-year-old Safarain Herring was shot in the head and dropped off at St. Bernard Hospital in Chicago by a man named Michael Williams. He died two days later.

Chicago police eventually arrested the 64-year-old Williams and charged him with murder (Williams maintains that Herring was hit in a drive-by shooting). A key piece of evidence in the case is video surveillance footage showing Williams’ car stopped on the 6300 block of South Stony Island Avenue at 11:46 p.m.—the time and location where police say they know Herring was shot.

How did they know that’s where the shooting happened? Police said ShotSpotter, a surveillance system that uses hidden microphone sensors to detect the sound and location of gunshots, generated an alert for that time and place.

Except that’s not entirely true, according to recent court filings. That night, 19 ShotSpotter sensors detected a percussive sound at 11:46 p.m. and determined the location to be 5700 South Lake Shore Drive—a mile away from the site where prosecutors say Williams committed the murder, according to a motion filed by Williams’ public defender. The company’s algorithms initially classified the sound as a firework. That weekend had seen widespread protests in Chicago in response to George Floyd’s murder, and some of those protesting lit fireworks.

But after the 11:46 p.m. alert came in, a ShotSpotter analyst manually overrode the algorithms and “reclassified” the sound as a gunshot. Then, months later and after “post-processing,” another ShotSpotter analyst changed the alert’s coordinates to a location on South Stony Island Drive near where Williams’ car was seen on camera.

[Image: A screenshot of the ShotSpotter alert from 11:46 PM, May 31, 2020 showing that the sound was manually reclassified from a firecracker to a gunshot.]

“Through this human-involved method, the ShotSpotter output in this case was dramatically transformed from data that did not support criminal charges of any kind to data that now forms the centerpiece of the prosecution’s murder case against Mr. Williams,” the public defender wrote in the motion.

The document is what’s known as a Frye motion—a request for a judge to examine and rule on whether a particular forensic method is scientifically valid enough to be entered as evidence. Rather than defend ShotSpotter’s technology and its employees’ actions in a Frye hearing, the prosecutors withdrew all ShotSpotter evidence against Williams.

The case isn’t an anomaly, and the pattern it represents could have huge ramifications for ShotSpotter in Chicago, where the technology generates an average of 21,000 alerts each year. The technology is also currently in use in more than 100 cities.

Motherboard’s review of court documents from the Williams case and other trials in Chicago and New York State, including testimony from ShotSpotter’s favored expert witness, suggests that the company’s analysts frequently modify alerts at the request of police departments—some of which appear to be grasping for evidence that supports their narrative of events.

Had the Cook County State’s Attorney’s office not withdrawn the evidence in the Williams case, it would likely have become the first time an Illinois court formally examined the science and source code behind ShotSpotter, Jonathan Manes, an attorney at the MacArthur Justice Center, told Motherboard.

“Rather than defend the evidence, [prosecutors] just ran away from it,” he said. “Right now, nobody outside of ShotSpotter has ever been able to look under the hood and audit this technology. We wouldn’t let forensic crime labs use a DNA test that hadn’t been vetted and audited.”

Sam Klepper, senior vice president for marketing and product strategy at ShotSpotter, told Motherboard in an email that the company has no reason to believe the prosecutor’s decision reflects a lack of faith in its technology.

ShotSpotter evidence and employee testimony has been admitted in 190 court cases, he wrote. “Whether ShotSpotter evidence is relevant to a case is a matter left to the discretion of a prosecutor and counsel for a defendant … ShotSpotter has no reason to believe that these decisions are based on a judgment about the ShotSpotter technology,” he said.

The Chicago Police Department, Cook County State’s Attorney’s Office, Mayor Lori Lightfoot’s office, and Alderman Chris Taliaferro, who chairs the city council’s public safety committee, did not respond to interview requests or questions.

In 2016, Rochester, New York, police looking for a suspicious vehicle stopped the wrong car and shot the passenger, Silvon Simmons, in the back three times. They charged him with firing first at officers.

The only evidence against Simmons came from ShotSpotter. Initially, the company’s sensors didn’t detect any gunshots, and the algorithms ruled that the sounds came from helicopter rotors. After Rochester police contacted ShotSpotter, an analyst ruled that there had been four gunshots—the number of times police fired at Simmons, missing once.

Paul Greene, ShotSpotter’s expert witness and an employee of the company, testified at Simmons’ trial that “subsequently he was asked by the Rochester Police Department to essentially search and see if there were more shots fired than ShotSpotter picked up,” according to a civil lawsuit Simmons has filed against the city and the company. Greene found a fifth shot, despite there being no physical evidence at the scene that Simmons had fired. Rochester police had also refused his multiple requests for them to test his hands and clothing for gunshot residue.

Curiously, the ShotSpotter audio files that were the only evidence of the phantom fifth shot have disappeared. Both the company and the Rochester Police Department “lost, deleted and/or destroyed the spool and/or other information containing sounds pertaining to the officer-involved shooting,” according to Simmons’ civil suit.

Greene acknowledged at plaintiff’s criminal trial that “employees of ShotSpotter and law enforcement customers with an audio editor can alter any audio file that’s not been locked or encrypted.” A jury ultimately acquitted Simmons of attempted murder and a judge overturned his conviction for possession of a gun, citing ShotSpotter’s unreliability.

[Image: Excerpt from Silvon Simmons’ civil lawsuit against ShotSpotter and the Rochester Police Department.]

Greene—who has testified as a government witness in dozens of criminal trials—was involved in another altered report in Chicago, in 2018, when Ernesto Godinez, then 27, was charged with shooting a federal agent in the city. The evidence against him included a report from ShotSpotter stating that seven shots had been fired at the scene, including five from the vicinity of a doorway where video surveillance showed Godinez to be standing and near where shell casings were later found. The video surveillance did not show any muzzle flashes from the doorway, and the shell casings could not be matched to the bullets that hit the agent, according to court records.

During the trial, Greene testified under cross-examination that the initial ShotSpotter alert only indicated two gunshots (those fired by an officer in response to the original shooting). But after Chicago police contacted ShotSpotter, Greene re-analyzed the audio files.

“An hour or so after the incident occurred, we were contacted by Chicago PD and asked to search for—essentially, search for additional audio clips. And this does happen on a semi-regular basis with all of our customers,” Greene told the court, according to a transcript of the trial. He later ruled that there were five additional gunshots that the company’s algorithms did not pick up.

Greene also acknowledged at trial that “we freely admit that anything and everything in the environment can affect location and detection accuracy.”

[Image: Excerpt from the transcript of Paul Greene’s expert witness testimony during the trial of Ernesto Godinez.]

“ShotSpotter analysts agree with the machine classification over 90% of the time,” Klepper, from ShotSpotter, wrote to Motherboard. “In a tiny number of cases, our customers request us to perform a location analysis to validate the accuracy of the location. If we find an error, we provide a more accurate location to the customer to assist the investigation.”

Prior to the trial, the judge ruled that Godinez could not contest ShotSpotter’s accuracy or Greene’s qualifications as an expert witness. Godinez has appealed the conviction, in large part due to that ruling.

“The reliability of their technology has never been challenged in court and nobody is doing anything about it,” Gal Pissetzky, Godinez’s attorney, told Motherboard. “Chicago is paying millions of dollars for their technology and then, in a way, preventing anybody from challenging it.”

At the core of the opposition to ShotSpotter is the lack of empirical evidence that it works—in terms of both its sensor accuracy and the system’s overall effect on gun crime. The company has not allowed any independent testing of its algorithms, and there’s evidence that the claims it makes in marketing materials about accuracy may not be entirely scientific.

Over the years, ShotSpotter’s claims about its accuracy have increased, from 80 percent accurate to 90 percent accurate to 97 percent accurate. According to Greene, those numbers aren’t actually calculated by engineers, though.

“Our guarantee was put together by our sales and marketing department, not our engineers,” Greene told a San Francisco court in 2017. “We need to give them [customers] a number … We have to tell them something. … It’s not perfect. The dot on the map is simply a starting point.”

In May, the MacArthur Justice Center analyzed ShotSpotter data and found that over a 21-month period 89 percent of the alerts the technology generated in Chicago led to no evidence of a gun crime, and 86 percent of the alerts led to no evidence a crime had been committed at all.

Klepper disputed those findings to Motherboard, saying that the data source used to draw the conclusions, on its own, “results in an incomplete picture of an incident” because a gun may have been fired even if there is no documented police evidence that it was.

He also said that Greene’s testimony in the San Francisco trial “had nothing to do with the determination of our actual historical accuracy rate. While marketing and sales have appropriate input on our service level guarantees for our contracts, actual accuracy rates are based on detections that we record.”

Meanwhile, a growing body of research suggests that ShotSpotter has not led to any decrease in gun crime in cities where it’s deployed, and several customers have dropped the company, citing too many false alarms and the lack of return on investment.

One recent study of ShotSpotter in St. Louis found that “ShotSpotter has little deterrent impact on gun-related violent crime in St. Louis. [Automated gun detection systems] also do not provide consistent reductions in police response time, nor aid substantially in producing actionable results.”

Klepper contested those and other research findings, saying that the studies’ conclusions “do not reflect what we see.” He pointed to a 2021 study by New York University School of Law’s Policing Project that determined that assaults (which include some gun crime) decreased by 30 percent in some districts in St. Louis County after ShotSpotter was installed. The study authors disclosed that ShotSpotter has been providing the Policing Project unrestricted funding since 2018, that ShotSpotter’s CEO sits on the Policing Project’s advisory board, and that ShotSpotter has previously compensated Policing Project researchers.

Chicago is one of the most important cities in ShotSpotter’s portfolio and is increasingly becoming a battleground over its use.

If a court ever agrees to examine the forensic viability of ShotSpotter, or if prosecutors continue to drop the evidence when challenged, it could have massive ramifications. From January 2017 through June 2021, ShotSpotter reported 94,313 gunfire incidents in the city, an average of 20,958 per year, according to data obtained by Motherboard through a public records request. Chicago is ShotSpotter’s second biggest client, after New York City, accounting for 13 percent of the company’s revenue during the first quarter of 2021. But Chicago’s $33 million contract with the company is coming to an end, and city officials must decide this August whether or not to renew it.

Meanwhile, the city is grappling with new research, a rise in shootings, cases like the Williams and Godinez trials, and tragedies that have prompted renewed criticism of the technology. It was a ShotSpotter alert in the early-morning hours of March 29 that dispatched police to a street in Little Village where they eventually shot and killed 13-year-old Adam Toledo, who was unarmed at the time.

That and other recent events have sparked a new campaign by community and civil rights groups in Chicago calling on city officials to drop ShotSpotter.

“These tools are sending more police into Black and Latinx neighborhoods,” Alyx Goodwin, a Chicago organizer with the Action Center on Race and the Economy, one of the groups leading the campaign, told Motherboard. “Every ShotSpotter alert is putting Black and Latinx people at risk of interactions with police. That’s what happened to Adam Toledo.”

Motherboard recently obtained data demonstrating the stark racial disparity in how Chicago has deployed ShotSpotter. The sensors have been placed almost exclusively in predominantly Black and brown communities, while the white enclaves in the north and northwest of the city have no sensors at all, despite Chicago police data that shows gun crime is spread throughout the city.

Community members say they’ve seen little benefit from the technology in the form of less gun violence—the number of shootings in 2021 is on pace to be the highest in four years—or better interactions with police officers.

“If you had relationships with any of the people on the block, you wouldn’t need the technology, cause we could tell you,” Asiaha Butler, president of the Resident Association of Greater Englewood, told Motherboard. Instead, the technology seems to have given police another excuse not to build relationships with residents. When shots ring out in the neighborhood, police may respond faster, but “it’s an over-militarized police presence. You see a lot of them. It’s not a friendly interaction,” she said.

...

Read the original on www.vice.com »

5 394 shares, 54 trendiness, words and minutes reading time

About the security content of iOS 14.7.1 and iPadOS 14.7.1


This document describes the security content of iOS 14.7.1 and iPadOS 14.7.1.

For our customers’ protection, Apple doesn’t disclose, discuss, or confirm security issues until an investigation has occurred and patches or releases are available. Recent releases are listed on the Apple security updates page.

For more information about security, see the Apple Product Security page.

Available for: iPhone 6s and later, iPad Pro (all models), iPad Air 2 and later, iPad 5th generation and later, iPad mini 4 and later, and iPod touch (7th generation)

Impact: An application may be able to execute arbitrary code with kernel privileges. Apple is aware of a report that this issue may have been actively exploited.

Information about products not manufactured by Apple, or independent websites not controlled or tested by Apple, is provided without recommendation or endorsement. Apple assumes no responsibility with regard to the selection, performance, or use of third-party websites or products. Apple makes no representations regarding third-party website accuracy or reliability. Contact the vendor for additional information.


...

Read the original on support.apple.com »

6 371 shares, 31 trendiness, words and minutes reading time

Are you a robot?


...

Read the original on www.bloomberg.com »

7 309 shares, 20 trendiness, words and minutes reading time

PUA and/or Trojan detection · Issue #14489 · qbittorrent/qBittorrent


...

Read the original on github.com »

8 284 shares, 12 trendiness, words and minutes reading time

What I Wish I Knew About CSS When Starting Out As A Frontender

CSS can be hard to grasp when you’re starting out. It can seem like magic wizardry and you can very easily find yourself playing whack-a-mole, adjusting one property only to have something else break. It is frustrating, and that was my experience for quite a long time before things suddenly seemed to “click”.

Reflecting back on this time, I think there are a few key concepts that were vital to things finally all making sense and fitting together. These were: the box model, how layout works, and positioning. There are also some useful concepts to keep in mind when building reusable and composable components.

A key way I think about building visual UI components is that basically everything can be broken down into a bunch of rectangles on a page. It can be overwhelming to consider all aspects of a page at once, so break things down mentally into a series of rectangular components and ignore everything that doesn’t matter for the piece that you’re working on at that point.

When you consider the size of an element (i.e. its height and width), there are two different models by which to measure this, and you can adjust this with the box-sizing property.

content-box: The size of an element only includes its content, and not its padding or border.

border-box: The size of an element is inclusive of its padding and border. When you set width: 100% with content-box, the content will be 100% of the width of the parent element, but any borders and padding will make the element even wider.

border-box makes a lot more sense, and I find it much easier to reason about. To apply border-box to all elements on the page, make sure you have something like this snippet in your stylesheet (a lot of CSS resets will include this anyway):

CSS code
*, ::before, ::after {
  box-sizing: border-box;
}

Going forward, I’m only going to be considering border-box, as it’s our preferred box model at Kablamo.

margin values sometimes collapse with an adjacent element’s margin, taking the maximum of the margins between them rather than the combination of both. The rules around this can be somewhat complex, and MDN has a document describing them: Mastering margin collapsing. Note that margin is not applicable to table cells.

Elements are usually laid out in the document in the order that they appear in the markup. The display property controls how an element and/or its children are laid out. You can read about display in more detail on MDN.

inline allows content to flow kind of like text and to fit with other inline content, sort of like tetris pieces.

block means the element effectively behaves like a rectangle containing all of its children that grows in height to fit content (width is 100% of the parent content box by default). Effectively, line breaks are inserted before and after the element.

inline-block is like a mixture of both block and inline. Its contents will be contained within a rectangle, but that rectangle can be laid out as part of inline content.

flex and grid are more advanced layout algorithms for arranging children according to certain rules. These are the bread and butter of building flexible, responsive layouts and are well worth learning about in more depth. Learning these has been gamified in Flexbox Froggy and Grid Garden.

The position property affects how elements are positioned with respect to the flow of the document, in combination with positioning properties (top, left, right, bottom, inset).

Elements are static by default, which means that positioning properties have no effect on the position of the element and the element is laid out normally. Statically positioned elements also do not have their own stacking context, which means that setting z-index will also have no effect.

relative is like position: static, except that top, left, and the other positioning properties act like an offset of where the element should be visually laid out (although sibling and ancestor elements will behave as though it is still in the original position).

absolute takes the element out of the flow of the document, and ancestors/siblings will be laid out as if the element were not present. Positioning properties specify offsets of where the element should be visually laid out relative to the nearest non-statically-positioned parent. You can add position: relative to an ancestor to use it as the anchor point for an element with position: absolute.

fixed along with positioning properties means that the element is removed from the normal flow of the document and instead laid out relative to the viewport. If positioning properties are specified, they will ensure the element is laid out always with that offset to the viewport (i.e. it appears fixed in place and doesn’t move, even when scrolling).

sticky is a bit more complex, but you can think about it as a hybrid of relative and fixed. You can read more about the exact mechanism on MDN.

The visual aspect of reusable and composable components should ideally be self-contained. Outer margins are generally a bad idea. Margin collapse on outer margins means component layout is more complicated to reason about, and the effective boundary is not the same as the visual boundary (such as a border). This can mean that components behave less predictably and you can no longer think of them as just a rectangle.

Essentially, you want to make your components easy to just include in an application without worrying if they break outside of their simple rectangle.

Overall, I think these are most of the guiding principles that I wish I understood a lot sooner than I actually did. I spent a lot of time fumbling around, tweaking CSS back and forth, and not really understanding why things weren’t behaving as expected. Eventually it did “click” for me and I felt like I suddenly understood things.

...

Read the original on engineering.kablamo.com.au »

9 274 shares, 22 trendiness, words and minutes reading time

What's bad about Julia?

Julia is my favorite programming language. More than that actually, perhaps I’m a bit of a fanboy. Sometimes, though, the ceaseless celebration of Julia by fans like me can be a bit too much. It papers over legitimate problems in the language, hindering progress. And from an outsider perspective, it’s not only insufferable (I would guess), but also obfuscates the true pros and cons of the language. Learning why you may not want to choose to use a tool is just as important as learning why you may.

This post is about all the major disadvantages of Julia. Some of it will just be rants about things I particularly don’t like - hopefully they will be informative, too. A post like this is necessarily subjective. For example, some people believe Julia’s lack of a Java-esque OOP is a design mistake. I don’t, so the post won’t go into that.

Julia can’t easily integrate into other languages
The type system works poorly
You can’t extend existing types with data
The iterator protocol is too hard to use
Functional programming primitives are not well designed

The very first thing you learn about Julia is that it’s unresponsive. You open your favorite IDE, launch a Julia REPL, start typing… and see a noticeable lag before any text appears. As far as first impressions go, that isn’t exactly great, especially for a language touted for its speed.

What’s happening is that Julia is compiling the code needed for its REPL and its integration with your editor. This “runtime” compilation causes the lag we call compile time latency. Hence, the effect is even larger if we pull in new code from external packages: A small script that uses the packages BioSequences and FASTX may have a 2 second latency, even if the computation itself takes microseconds.

And it can get worse, still. Among Julians, latency is often referred to as TTFP: Time To First Plot. Graphical plotting became the poster boy for this problem because plotting involves a large amount of code that does relatively little work. Importing Plots and plotting the simplest line plot takes 8 seconds. However, being the poster boy for latency, Plots has gotten a lot of attention and engineering effort to reduce its latency, so it’s hardly the worst package. Packages like Turing or ApproxFun may add half a minute to latency - Turing took 40 seconds to start up on my laptop. I’ve heard of organizations whose codebase is in Julia where it takes 5 minutes to start a Julia process and load their packages.

So: How bad is this, really?

Well, it depends on what you use Julia for. Remember, the latency is a one-time cost every time you start a Julia process. If you’re a data scientist who works for hours on end in a Jupyter notebook, ten or even 40 seconds of startup time is merely a small annoyance. I’m in that category, broadly. When I start Julia, it rarely takes less than a few minutes before I shut down - and the Julia programs I run from the command line take minutes to complete, too. But some tasks and use cases rely on running lots of short Julia processes. These simply become impossible. For example, the latency makes Julia a complete non-starter for:

Simple Unix command-line tools such as cd, ripgrep or ls
Settings where responsiveness is key, say software in a self-driving car or airplane
Small composable scripts, e.g. as used in Snakemake workflows

The latency also forces specific workflows for Julia users and developers. When using Python or Rust, you may be used to running some tests from the command line, modifying a source file in the editor, then re-running the tests until they work. This workflow is not feasible in Julia - instead, you are essentially forced into REPL-driven development, where you have a single Julia session you keep open while modifying your code and observing the results.

Julia’s latency is improving, and there are hoops you can jump through to mitigate this problem somewhat. But the problem is fundamentally unsolvable, because it’s built into Julia on a basic design level. So, before learning Julia, ask yourself if this is a dealbreaker for you.

This one’s pretty easy to demonstrate:

Yep, ~150 MB memory consumption for a hello-world script. Julia’s runtime is enormous - these megabytes are not just used by Julia’s compiler, it apparently pre-allocates BLAS buffers, just in case the user wants to multiply matrices in their hello-world script, you know. Forget the latency, a background consumption of 150 MB completely excludes using Julia for anything but application-level programs running on a PC or a compute cluster. For anything else, be it mobile, embedded, daemon processes, etc., you’ll need to use something else.

In fact, even for desktop-level applications, consuming 150 MB on the Julia runtime is pushing it. Think of all the hate Electron gets for wasting resources. Every Julia program is in the same ballpark as Electron in this regard. A command-line calculator written in Julia consumes more memory than the 2003 video game Command & Conquer: Generals.

Julia can’t easily integrate into other languages

Another consequence of Julia’s massive runtime is that it makes it annoying to call into Julia from other languages. If your Python script needs to rely on Julia, you’ll need to pay up front: Both the latency, and the 150-ish megabytes.

Compare this to a static language like C, where you can compile a C lib to a binary that other programs simply call into. Julians are usually very proud of the large amount of code sharing and code reuse in the Julia community, but it’s worth noting that this sharing stops abruptly at the language barrier: We might be able to use a Rust library in Julia with little friction, but no-one would use a Julia library if they could avoid it. So if you want to code up some universally used library, you better go with a static language.

This is one point where I’ve changed perspective after having tried coding Rust. Before learning Rust, when I only knew Python and Julia, I would have said something like:

Sure, static analysis is useful. But to ensure program correctness, you need tests anyway, and these tests will catch the vast majority of what would be compile-time errors. The small safety you lose in a dynamic language is more than made up for by the time saved, which you can use to write better tests.

How silly, past me, if only you knew! See, I taught myself Rust by doing the Advent of Code 2020 in Rust. Being a neophyte, I was so bad at Rust that I had more than one compiler error per line of code on average. Everything was hard. And yet, for about two-thirds of the challenges, the first time the program compiled, it gave the correct answer.

That was astounding to me. Working with Python or Julia, I expected the program to crash. Programs always crash at first, right? Well, they do in Julia until you’ve found the bugs by hitting them, and fixed them one by one. In fact, for me it was part of the development workflow: iteratively write the solution, run it, watch where it crashes, fix it, repeat. The idea that you could just write the right program on the first try was wild. The experience was not that my program became more safe in the sense that I could ship it without sweat on my brow. No, it was that it just worked, and I could completely skip the entire debugging process that is core to the development experience of Julia, because I had gotten all the errors at compile time.

And this was for small scripts. I can only imagine the productivity boost that static analysis gives you for larger projects, when you can safely refactor because you know immediately if you do something wrong.

Back to Julia: It lies somewhere in between Python and Rust in terms of static analysis and safety. You can add type annotations to your functions, but the errors still only appear at runtime, and it’s generally considered un-idiomatic to use too many type annotations, with good reason. Linting and static analysis for Julia are slowly appearing and improving, but compared to Rust they catch just a small fraction of errors. When writing generic package code where types are mostly indeterminate until runtime, they can’t do much type analysis.

Another issue with static analysis in Julia is that, because writing un-inferrable code is a completely valid (if inefficient) coding style, there is a lot of code that simply can’t be statically analysed. Similarly, you can have a Julia package whose dynamic style causes tonnes of “issues” according to the static analyzer, which nonetheless works fine. If your package depends on such a package, your static analysis will be flooded with false positives originating from the third-party code.

I’m a big fan of these tools, but honestly, in their current state, you can rely on the linter to catch typos or wrong type signatures, and on the static analyzer to analyze specific function calls you ask it to… but that’s about it.

Is it unfair to criticise a dynamic language for not having static analysis? Isn’t that implicit? Perhaps. But this post is about the weaknesses of Julia, and no matter how you justify it, poor static analysis is most definitely a weakness.

Julia released 1.0 in 2018, and has been committed to no breakage since then. So how can I say the language is unstable?

Instability isn’t just about breaking changes. It’s also about bugs and incorrect documentation. And here, Julia is pretty bad. Having used Julia since just before 1.0, I run into bugs in the core language regularly. Not often, but perhaps once every couple of months. I can’t recall ever having run into a bug in Python.

If you doubt it, take a look at the open issues marked as bugs. Some of these are transient bugs on master, but there are many, many old bugs you can still go in and trigger from the REPL on the stable Julia release. Here’s one I reported about a year ago, and which still hasn’t been fixed:

I don’t think it’s because the Julia devs are careless, or Julia isn’t well tested. It’s just a matter of bugs continuously being discovered because Julia is relatively young software. As it matures and stabilizes post-1.0, the number of bugs has gone down and will continue to do so in the future. But until it does, don’t expect mature, stable software when using Julia.

There is, however, also the issue of unstable performance, where Julia is in a uniquely awkward situation. Other dynamic languages are slow, and people using them write code expecting them to be slow. Static languages are fast, because the compiler has full type information during the compilation process. If the compiler can’t infer the type of something, the program won’t compile. Importantly, because an inference failure in static languages causes the compilation to fail, the compiler’s inference is part of the API, and must remain stable. Not so in Julia.

In Julia, what the compiler knows about your code and the optimizations it does is a pure implementation detail - as long as it produces the correct result. Even in situations where nothing can be inferred about the types, Julia will run and produce the correct result, just hundreds of times slower. That means that a compiler change that causes a failure of inference and a 100x performance regression is not a breaking change. So, these happen.

I mean, don’t get me wrong, they don’t happen often, and they usually only affect part of your program, so the regression is rarely that dramatic. The Julia team really tries to avoid regressions like that, and they’re usually picked up and fixed on the master branch of Julia before they make it to any release. Still, if you’ve maintained a few Julia packages, I bet it has happened to you more than once.

A more important consequence of Julia being a young, immature language is that the package ecosystem is similarly immature. Compared to the core language, which has a huge number of users and more developers, the ecosystem settles more slowly. This has several consequences for Julia:

First, compared to established languages, lots of packages are missing. Especially if you work in a niche subject, as most scientists do, you are much more likely to find a Python or R package to fit your needs than a Julia package. This situation will obviously improve over time, but right now, Julia is still quite far behind.

You’re also much more likely to find outdated or unmaintained packages in Julia. This is not because Julia packages tend to fall into disrepair more quickly than in other languages, I think, but rather because packages which have already existed for 20 years are more likely to last another five years than packages that have existed for two. It’s only been three years since Julia 1.0 came out, so if you find a blog post from 2015, any posted Julia code is unlikely to work, and the packages have probably released several breaking changes since then. In comparison, the Python package Numpy has been around five times longer than Julia 1.0!

In software ecosystems, it also takes a while for effort to consolidate into well-known packages. In Python, everybody knows, for example, to use pandas when working with dataframes. It has become the de-facto standard. And if it is to be dethroned, any contender must compare favorably against pandas, which means it must itself be a solid, well-used package.

Perhaps most critically, the developer tooling surrounding Julia is also immature, with lots of basic functionality missing. This is also a consequence of the ecosystem simply not being mature enough, with too little development effort behind it (notably, no large companies have made large contributions to Julia, unlike every other language I know of). Here are a few examples, haphazardly chosen:

Julia’s built-in Test package is barebones, and does not offer setup and teardown of tests, nor the functionality to only run a subset of the full test suite.

The editor experience is not great with Julia. It’s getting better, but with the foremost Julia IDE developed by a few people in their spare time, it has all the crashes, slowness and instability you would expect.

Static analysis is brand new, and feels like it hasn’t yet settled into its final form. It also has no IDE integration.

There is no common framework for benchmarking and profiling Julia code. In a single session, you may analyze the same function with BenchmarkTools, @allocated, Profile, JET, JETTest, @code_native and Cthulhu, each of which has to be loaded and launched individually. This issue is particularly notable when a new user faces performance issues, asks a Julia forum “what should I do”, and gets 10 different answers, each concerning one specific sub-analysis that may cast light on one particular cause of performance problems. This is a huge time sink, and not a great user experience. It should be possible to gather several of these tools in a single analysis package, but it has not yet been done.

This is the most controversial of my problems with Julia. People who don’t know Julia have no idea what I mean when I say the subtyping system is bad, and people who do know Julia are unlikely to agree with me. I’ll give a brief recap of how the system works for anyone not familiar:

In Julia, types can be either abstract or concrete. Abstract types are considered “incomplete”. They can have subtypes, but they cannot hold any data fields or be instantiated - they are incomplete, after all. Concrete types can be instantiated and may have data, but cannot be subtyped since they are final. Here is an imaginary example:
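
(The example itself was lost in extraction; judging from the code below, it presumably looked something like this reconstruction:)

Julia code
# An abstract type: can be subtyped, but holds no data and can't be instantiated
abstract type NucleotideSequence end

# A concrete subtype: can hold data and be instantiated, but is final
struct DNASequence <: NucleotideSequence
    x::Vector{UInt8}
end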

You can define methods for abstract types, which are inherited by all their subtypes (that is, behaviour can be inherited, but not data). But if a concrete type defines the same method, that will overwrite the abstract one:

Julia code
# Generic function, is slow
function print(io::IO, seq::NucleotideSequence)
    for i in seq
        print(io, i)
    end
end

# Specialized function, overwrites generic
function print(io::IO, seq::DNASequence)
    write(io, seq.x) # optimized write implementation
end

So you can create type hierarchies, implement generic fallback methods, and overwrite them whenever you want. Neat! What’s not to like? Well…

You can’t ex­tend ex­ist­ing types with data

Say you im­ple­ment some use­ful MyType. Another pack­age thinks it’s re­ally neat and wants to ex­tend the type. Too bad, that’s just not pos­si­ble - MyType is fi­nal and can’t be ex­tended. If the orig­i­nal au­thor did­n’t add an ab­stract su­per­type for MyType you’re out of luck. And in all prob­a­bil­ity, the au­thor did­n’t. After all, good coders usu­ally fol­low the YAGNI prin­ci­ple: Don’t pre-emp­tively im­ple­ment what you don’t need.

In e.g. Python, you are not go­ing to run into types you want to sub­class, but can’t. You can sub­class what­ever you damn well please. In Rust, the prob­lem is not even rec­og­niz­able: Any type you write can freely de­rive traits and is not at all con­strained by where it is placed in the type hi­er­ar­chy, be­cause there is no type hi­er­ar­chy.

Suppose, on the other hand, you find out the au­thor did ac­tu­ally add AbstractMyType. Then you can sub­type it:
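struct MyOtherType <: AbstractMyType # hypothetical subtype, mirroring the text
    # ... your fields ...
end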

… and now what? What do you need to im­ple­ment? What does the ab­stract type re­quire? What does it guar­an­tee? Julia of­fers ab­solutely no way of find­ing out what the ab­stract in­ter­face is, or how you con­form to it. In fact, even in Base Julia, fun­da­men­tal types like AbstractSet, AbstractChannel, Number and AbstractFloat are just not doc­u­mented. What ac­tu­ally is a Number, in Julia? I mean, we know what a num­ber is con­cep­tu­ally, but what are you opt­ing in to when you sub­type Number? What do you promise? Who knows? Do even the core de­vel­op­ers know? I doubt it.

A few abstract types in Julia are well documented, most notably AbstractArray and its abstract subtypes, and it’s probably no coincidence that Julia’s array ecosystem is so good. But this is a singular good example, not the general pattern. Ironically, this exception is often held up as an example of why the Julia type system works well.

Here is a fun challenge for anyone who thinks “it can’t be that bad”: Try to implement a TwoWayDict, an AbstractDict where if d[a] = b, then d[b] = a. In Python, which has inheritance, this is trivial. You simply subclass dict, overwrite a handful of its methods, and everything else works. In Julia, you have to define its data layout first - quite a drag, since dictionaries have a complicated structure (remember, you can’t inherit data!). The data layout can be solved by creating a type that simply wraps a Dict, but the real pain of the implementation comes when you must somehow figure out everything AbstractDict promises (good luck!) and implement that.
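To make the scale of the task concrete, here is roughly where the wrapper approach starts - a sketch under my own assumptions (keys and values of the same type), not a complete implementation:

struct TwoWayDict{K} <: AbstractDict{K,K}
    inner::Dict{K,K}
end
TwoWayDict{K}() where {K} = TwoWayDict{K}(Dict{K,K}())

function Base.setindex!(d::TwoWayDict, v, k)
    d.inner[k] = v # store both directions
    d.inner[v] = k
    return d
end

# And now the guesswork begins: which of getindex, get, haskey, length,
# iterate, delete!, ... does AbstractDict actually require? It isn't
# documented, so you add methods until things stop erroring:
Base.getindex(d::TwoWayDict, k) = d.inner[k]
Base.length(d::TwoWayDict) = length(d.inner)
Base.iterate(d::TwoWayDict, args...) = iterate(d.inner, args...)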

Another problem with relying on subtyping for behaviour is that each type can only have one supertype, from which it inherits all methods. Often, that turns out to not be what you want: New types often have properties of several interfaces: Perhaps they are set-like, iterable, callable, printable, etc. But no, says Julia, pick one thing. To be fair, “iterable”, “callable” and “printable” are so generic and broadly useful that they are not implemented using subtyping in Julia - but doesn’t that say something?

In Rust, these properties are implemented through traits instead. Because each trait is defined independently, each type faces a smorgasbord of possibilities. It can choose exactly what it can support, and nothing more. It also leads to more code reuse, as you can e.g. simply derive Copy and get it without having to implement it. It also means there is an incentive to create “smaller” traits. In Julia, if you subtype AbstractFoo, you opt in to a potentially huge number of methods. In contrast, it’s no problem to create very specific traits that concern only a few - or one - method.

Julia does have traits, but they’re half-baked, not sup­ported on a lan­guage level, and hap­haz­ardly used. They are usu­ally im­ple­mented through mul­ti­ple dis­patch, which is also an­noy­ing since it can make it dif­fi­cult to un­der­stand what is ac­tu­ally be­ing called. Julia’s broad­cast­ing mech­a­nism, for ex­am­ple, is con­trolled pri­mar­ily through traits, and just find­ing the method ul­ti­mately be­ing called is a pain.
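For readers unfamiliar with what “traits via multiple dispatch” looks like, here is a generic illustration of the usual pattern (often called “Holy traits”, after Tim Holy) - the names are mine, not any package’s API:

# Trait values are ordinary (empty) types...
abstract type IterSize end
struct HasLength <: IterSize end
struct SizeUnknown <: IterSize end

# ...and a type opts in to a trait by adding a method:
itersize(::Type{<:AbstractArray}) = HasLength()
itersize(::Type) = SizeUnknown() # fallback for everything else

# Behaviour then dispatches on the trait value, not on the type itself:
describe(x) = describe(itersize(typeof(x)), x)
describe(::HasLength, x) = "length $(length(x))"
describe(::SizeUnknown, x) = "unknown length"

describe([1, 2, 3]) # "length 3"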

Also, since so much of Julia’s be­hav­iour is con­trolled through the type of vari­ables in­stead of traits, peo­ple are tempted to use wrap­per types if they want type A to be able to be­have like type B. But those are a ter­ri­ble idea, since it only moves the prob­lem and in fact makes it worse: You now have a new wrap­per type you need to im­ple­ment every­thing for, and even if you do, the wrap­per type is now of type B, and does­n’t have ac­cess to the meth­ods of A!

A good example of the subtyping system not working is Julia’s standard library LinearAlgebra. This package uses both wrapper types and traits to try to overcome the limitations of the type system, and suffers from both workarounds. But an even clearer example of the failure of the type system is its use of big unions, that is, functions whose type signature has arguments of the type “A or B or C or D or E or …”. And these unions of types get out of control: If you have Julia at hand, try to type in LinearAlgebra.StridedVecOrMat and watch the horror. The use of such an abomination is a symptom of an unsolved underlying problem with the type system.

The consensus on idiomatic Julia seems to be slowly drifting away from leaning on its type system to specify constraints, and towards ducktyping and traits. I essentially see this as the community implicitly beginning to acknowledge the problems of the type system and trying to avoid it where possible. All the individual gripes in this post about the system are well known, even if few people would grant that the system as a whole is poor. It has, however, been remarkably hard to provide good alternatives or solve the individual pain points. As Julia is maturing, there is less and less room to re-invent or enhance something as core as the type system.

I expect that in the future, Julians will move even further towards Python-esque ducktyping. I predict that packages will arise that try to address some of these issues, but that they will disagree about what to do, remain niche without good core language support, and therefore not really solve the problem.

The it­er­a­tor pro­to­col is too hard to use

By “the iterator protocol”, I mean: How does a for loop work? The three languages I’m familiar with - Python, Rust and Julia - all handle this slightly differently. In Julia, the following code:
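for i in x
    # loop body
end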

low­ers into some­thing equiv­a­lent to:
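it = iterate(x)
while it !== nothing
    i, state = it
    # loop body
    it = iterate(x, state)
end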

This means that, to implement an iterator, you need to implement iterate(x) and iterate(x, state). These should return nothing when the iteration is done, and (i, next_state) when it still has elements. By the way, you also need to implement a few traits, which Julia does not warn you about if you forget them, or implement them wrongly. But I gripe about that elsewhere.

So: Why is it like that? One of the reasons it was designed like this is that it makes the iterate function and the iterator itself stateless, since the state is stored in a local variable passed as an argument to the iterate function. It means you can’t have bugs like this Python bug:

>>> iter = (i+1 for i in range(3))
>>> length = sum(1 for i in iter)
>>> list(iter) # oops!

First, you ab­solutely can have the same bug as in Python, be­cause some it­er­a­tors are state­ful! For ex­am­ple, if you read a file:
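# a minimal demonstration (assumes some "file.txt" exists): the iteration
# state lives in the file handle itself, not in the `state` argument
io = open("file.txt")
itr = eachline(io)
first(itr) # reads line 1
first(itr) # reads line 2, not line 1 - iterating mutated the handle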

And since there is no way of knowing programmatically (and certainly not statically) whether an iterator is stateful, you had better adopt a coding style that assumes all iterators are stateful anyway.

To be clear, the problem isn’t that Julia has stateless iterators. Stateless iterators have advantages; they may in fact be superior and preferable where possible. The real problem is that iteration is never stateless - in a loop, there must always be state. When using stateless iterators, the problem of keeping track of the state is not solved, but simply moved elsewhere. Julia’s iterators are “stateless” in the worst possible sense of the word: The compiler and the language don’t know about state, and therefore offload the job of keeping track of it to the programmer. Reasoning about state across time is a famously hard problem in programming, and with Julia’s iterators, you get to feel 100% of that pain. Making the compiler’s job easier by offloading work to the programmer is not how high-level languages are supposed to work!

For example, suppose you create an iterator that you need to process in two stages: First, you do some initialization with the first elements of the iterator. Perhaps it’s an iterator of lines and you need to skip the header. After that, you iterate over the remaining elements. You implement this as the functions parse_header and parse_rest. In Julia, you need to explicitly pass state between the functions - not to mention all the boilerplate code it introduces, because you now can’t iterate over the iterator in a for loop, since that would “restart” the iterator. Well, maybe it would - who knows if it’s stateless!
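A sketch of what that looks like, using the function names from above (the details are assumed for illustration):

function parse_header(itr)
    it = iterate(itr)
    it === nothing && error("empty input")
    header, state = it
    # ... process the header line ...
    return header, state # the state must be handed back to the caller
end

function parse_rest(itr, state)
    # no for loop here: it might "restart" a stateless iterator
    it = iterate(itr, state)
    while it !== nothing
        item, state = it
        # ... process item ...
        it = iterate(itr, state)
    end
end

header, state = parse_header(lines) # `lines` is some line iterator (assumed)
parse_rest(lines, state)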

If you’re a Julian reading this with scepticism, try implementing an interleaving iterator: It should take any number of iterators x1, x2, … xn and produce a stream of their interleaved values: x1_1, x2_1, … xn_1, x1_2 … xn_m. Easy peasy in Python, a headache in Julia, because you have to juggle N states manually in the function. Or try re-implementing zip or a roundrobin iterator.
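To see the manual state-juggling for yourself, here is a minimal zip for exactly two iterators (Zip2 is a made-up name; Base’s real zip generalizes this to N iterators):

struct Zip2{A,B}
    a::A
    b::B
end

function Base.iterate(z::Zip2)
    ia = iterate(z.a); ia === nothing && return nothing
    ib = iterate(z.b); ib === nothing && return nothing
    (va, sa), (vb, sb) = ia, ib
    return (va, vb), (sa, sb)
end

function Base.iterate(z::Zip2, state)
    sa, sb = state # one state per wrapped iterator, threaded by hand
    ia = iterate(z.a, sa); ia === nothing && return nothing
    ib = iterate(z.b, sb); ib === nothing && return nothing
    (va, sa2), (vb, sb2) = ia, ib
    return (va, vb), (sa2, sb2)
end

# ...plus the traits Julia silently expects, or collect() will error:
Base.IteratorSize(::Type{<:Zip2}) = Base.SizeUnknown()

for (va, vb) in Zip2(1:3, "abc")
    println(va, " => ", vb) # 1 => a, 2 => b, 3 => c
end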

Functional pro­gram­ming prim­i­tives are not well de­signed

I didn’t really notice this until I tried Rust, and Julia’s Transducers package, both of which implement the foundations of functional programming (by this I mean map, filter etc.) way better than Julia itself does. This issue is not one single design problem, but rather a series of smaller issues about how Julia’s iterators are just… generally not that well designed.

* map, filter and split are eager, returning Array. There is literally no reason for this - it only makes the code slower and less generic. I can’t think of a single upside - perhaps other than that it saves you typing collect once in a while. Newer versions of Julia introduced Iterators.map and Iterators.filter, which are lazy, but using them means breaking backwards compatibility, and also, you have to use the ugly identifier Iterators. And for split, there is no such escape hatch - you just have to accept that it’s slow and unnecessarily allocating.

* Functional programming functions like map and filter can’t be partially applied. That is, I cannot call map(f) and get a “mapper” function. I usually “solve” this by defining imap(f) = x -> Iterators.map(f, x) at the beginning of my files (see the sketch after this list), but honestly, Julia’s iterators should work like this by default.

* What do you think the method eachline(::String) does? Does it iterate over each line of a string? Haha, no, silly you. It interprets the string as a filename, tries to open the file, and returns an iterator over its lines. What? So, how do you actually iterate over the lines in a string? Well, you have to wrap the string in an IO object first.

* Yeah, that’s another gripe: there is no such type as a Path in Julia - it just uses strings. Why not? I honestly don’t know, other than perhaps that the Julia devs wanted to get 1.0 out and didn’t have time to implement one.
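For reference, the workaround mentioned in the list above, in context (ifilter is my own analogous definition):

# curried, lazy versions - arguably how map and filter should behave by default
imap(f) = x -> Iterators.map(f, x)
ifilter(f) = x -> Iterators.filter(f, x)

# lazy composition - no intermediate arrays are allocated
sum(1:1_000_000 |> ifilter(iseven) |> imap(x -> x^2))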

But Jakob, you say, don’t you know about Takafumi Arakaki’s amazing JuliaFolds ecosystem, which reimagines Julia’s iterator protocol and functional programming and gives you everything you ask for? Yes I do, and it’s the best thing since sliced bread, BUT this basic functionality simply can’t be a package. It needs to be in Base Julia. For example, if I use Arakaki’s packages to create an “iterator”, I can’t iterate over it with a normal Julia for loop, because Julia’s for loops lower to calls to Base.iterate. Also, because JuliaFolds is not Julia’s default iterator implementation, and therefore sees less usage and development than Julia’s built-in iterators, the package suffers from some compiler inference issues and obscure errors.

...

Read the original on viralinstruction.com »

10 271 shares, 24 trendiness, words and minutes reading time

amirgamil/apollo: A Unix-style personal search engine and web crawler for your digital footprint.

Background

Thesis

Design

Architecture

Data Schema

Workflows

Document Storage

Shut up, how can I use it?

Notes

Future

Inspirations

Apollo is a dif­fer­ent type of search en­gine. Traditional search en­gines (like Google) are great for dis­cov­ery when you’re try­ing to find the an­swer to a ques­tion, but you don’t know what you’re look­ing for.

However, they’re very poor at recall and synthesis when you’ve seen something before on the internet somewhere but can’t remember where. Trying to find it becomes a nightmare - how can you synthesize the great material on the internet when you forgot where it even was? I’ve wasted many an hour combing through Google and my search history to look up a good article, blog post, or just something I’ve seen before.

Even with built in sys­tems to store some of my fa­vorite ar­ti­cles, pod­casts, and other stuff, I for­get things all the time.

Screw finding a needle in the haystack. Let’s create a new type of search to choose which gem you’re looking for.

Apollo is a search engine and web crawler to digest your digital footprint. What this means is that you choose what to put in it. When you come across something that looks interesting, be it an article, blog post, website, whatever, you manually add it (with built-in systems to make doing so easy). If you always want to pull in data from a certain data source, like your notes or something else, you can do that too. This tackles one of the biggest problems of recall in search engines - returning a lot of irrelevant information - because with Apollo, the signal to noise ratio is very high: you’ve chosen exactly what to put in it.

Apollo is not necessarily built for raw discovery (although it certainly supports rediscovery); it’s built for knowledge compression and transformation - that is, looking up things that you’ve previously deemed to be cool.

The first thing you might no­tice is that the de­sign is rem­i­nis­cent of the old dig­i­tal com­puter age, back in the Unix days. This is in­ten­tional for many rea­sons. In ad­di­tion to pay­ing homage to the greats of the past, this de­sign makes me feel like I’m search­ing through some­thing that is au­then­ti­cally my own. When I search for stuff, I gen­uinely feel like I’m trav­el­ling through the past.

Apollo’s client side is writ­ten in Poseidon. The client side in­ter­acts with the back­end via a REST-like API which pro­vides end­points for search­ing data and adding a new en­try.

The backend is written in Go and is composed of a couple of important components:

* The web server which serves the endpoints

* A tokenizer and stemmer used during search queries and when building the inverted index on the data

* The actual search engine, which takes a query, tokenizes and stems it, finds the relevant results from the inverted index using those stemmed tokens, then ranks results with TF-IDF (see the sketch after this list)

* A package which pulls in data from a couple of different sources - if you want to pull data from a custom data source, this is where you should add it.
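To make the ranking step concrete, here is a minimal sketch of TF-IDF scoring against the Record schema shown below - the exact weighting Apollo uses is my assumption, not taken from its source:

import "math"

// score ranks one record against the stemmed query tokens.
// df maps each token to the number of records containing it;
// nDocs is the total number of records in the index.
func score(r Record, query []string, df map[string]int, nDocs int) float64 {
    s := 0.0
    for _, tok := range query {
        tf := float64(r.TokenFrequency[tok]) // term frequency in this record
        if tf == 0 {
            continue // token absent from this record
        }
        idf := math.Log(float64(nDocs) / float64(1+df[tok])) // rarer tokens weigh more
        s += tf * idf
    }
    return s
}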

We use two schemas: one to first parse the data into an encoded intermediate format. This does not get stored; it’s purely an intermediate before we transform it into a record for our inverted index. Why is this important?

* Because any data gets parsed into this standardized format, you can link any data source you want: if you build your own tool, or if you store a lot of data in some existing one, you don’t have to manually add everything. You can pull in data from any data source, provided you give the API data in this format.

type Data struct {
    title   string   // a title of the record, self-explanatory
    link    string   // links to the source of a record, e.g. a blog post, website, podcast etc.
    content string   // actual content of the record, must be text data
    tags    []string // list of potential high-level document tags you want to add that will be
                     // indexed in addition to the raw data contained
}

// smallest unit of data that we store in the database
// this will store each "item" in our search engine with all of the necessary information
// for the inverted index
type Record struct {
    // unique identifier
    ID string `json:"id"`
    // title
    Title string `json:"title"`
    // potential link to the source if applicable
    Link string `json:"link"`
    // text content to display on results page
    Content string `json:"content"`
    // map of tokens to their frequency
    TokenFrequency map[string]int `json:"tokenFrequency"`
}

Data comes in many forms, and the more varied those forms are, the harder it is to write reliable software to deal with them. If everything I wanted to index was just stuff I wrote, life would be easy. All of my notes would probably live in one place, so I would just have to grab the data from that data source and chill.

The prob­lem is I don’t take a lot of notes and not every­thing I want to in­dex is some­thing I’d take notes of.

So what to do?

Apollo can’t han­dle all types of data, it’s not de­signed to. However in build­ing a search en­gine to in­dex stuff, there are a cou­ple of things I fo­cused on:

Any data that comes from a specific platform can be integrated. If you want to index all your Twitter data, for example, this is possible, since all of the data can be absorbed in a constant format, converted into the compatible apollo format, and sent off. So data sources can be easily integrated; this is by design, in case I want to pull in data from personal tools.

The harder thing is what about just, what I will call, “writing on the internet.” I read a lot of stuff on the Internet, much of which I’d like to be able to index, without necessarily having to take notes on everything I read, because I’m lazy. The dream would be to just be able to drop a link and have Apollo intelligently try to fetch the content, so I can index it without having to go to the post and copy the content, which would be painful and too slow.

This was a large mo­ti­va­tion for the web crawler com­po­nent of the pro­ject

* If it’s writ­ing on the Internet, should be able to post link and aut­ofill pwd

* If it’s a pod­cast episode or any YouTube video, down­load text tran­scrip­tion e.g. this

* If you want to pull data from a cus­tom data source, add it as a file in the pkg/​apollo/​sources folder, fol­low­ing the same rules as some of the ex­am­ples and make sure to add it in the GetData() method of the source.go file in this pack­age

Local records and data from data sources are stored in sep­a­rate JSON files. This is for con­ve­nience.

I also personally store my Kindle highlights as a JSON file - I use read.amazon.com and a readwise extension to download the exported highlights for a book. I put any new book JSON files in a kindle folder in the outer directory, and every time the inverted index is recomputed, the kindle file takes any new book highlights, integrates them into the main kindle.json file stored in the data folder, then deletes the old file.

Although I built Apollo first and foremost for myself, I also wanted other people to be able to use it if they found it valuable. To use Apollo locally:

Make sure you have Go installed, as well as youtube-dl, which is how we download the subtitles of a video. You can use this to install it.

Navigate to the root di­rec­tory of the pro­ject: cd apollo .

Note since Apollo syncs from some personal data sources, you’ll want to remove them, add your own, or build stuff on top of them. Otherwise the terminal will complain if you attempt to run it, so:

Navigate to the pkg/apollo/sources folder in your preferred editor and replace the body of the GetData function with return []schema.Data{}

Create a .env file and add PASSWORD=<password>, where <password> is whatever password you want. This is necessary for adding or scraping the data: you’ll want to prove you’re “Amir”, i.e. authenticate yourself, and then you won’t need to do this in the future. If this is not making sense, try adding some data on apollo.amirbolous.com/add and see what happens.

Go back to the outer directory (meaning you should see the files the way GitHub is displaying them right now) and run go run cmd/apollo.go in the terminal.

Navigate to 127.0.0.1:8993 on your browser

It should be work­ing! You can add data and in­dex data from the data­base

If you run into prob­lems, open an is­sue or DM me on Twitter

As a side note, although I want others to be able to use Apollo, this is not a “commercial product”, so feel free to open a feature request if you’d like one, but it’s unlikely I will get to it unless it becomes something I personally want to use.

* The in­verted in­dex is re-gen­er­ated once every n num­ber of days (currently for n = 3)

* Since this is not a commercial product, I will not be running your version of this (if you find it useful) on my server. However, although I designed this first and foremost for myself, I want other people to be able to use it if it’s something that’s useful; refer to “How can I use this”.

* I had the choice be­tween us­ing Go’s gob pack­age for the data­base/​in­verted in­dex and JSON. The gob pack­age is def­i­nitely faster how­ever it’s only na­tive in Go so I de­cided to go with JSON to make the data avail­able in the fu­ture for po­ten­tially any non-Go in­te­gra­tions and be able to switch the in­fra­struc­ture com­pletely if I want to etc.

* I use a ported version of the Go snowball algorithm for my stemmer. Although I would have liked to build my own stemmer, implementing a robust one (which is what I wanted) was not the focus of the project. Since the algorithm for a stemmer does not need to be maintained like other types of software, I decided to use one out of the box. If I write my own in the future, I’ll swap it out.

* Improve the search al­go­rithm, more like Elasticsearch when data grows a lot?

* Improve the web crawler - make more ro­bust like mer­cury parser, maybe write my own

...

Read the original on github.com »
