Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What does "Linking Dependencies" during npm / yarn install really do?

For large web apps npm install resp. yarn install does take a lot of time, mostly in a step called Linking Dependencies. What is happening here? Is it fetching the dependencies of the dependencies? Or something completely different? Which files are created during this step?

like image 710
Zardoz Avatar asked Jun 04 '18 14:06

Zardoz


People also ask

Does yarn automatically install dependencies?

If you have just checked out a package from version control, you will need to install those dependencies. If you are adding dependencies for your project, then those dependencies are automatically installed during that process.

What happens when you run yarn install?

yarn install is used to install all dependencies for a project. This is most commonly used when you have just checked out code for a project, or when another developer on the project has added a new dependency that you need to pick up.

Is it OK to use npm and yarn together?

Yarn can consume the same package. json format as npm, and can install any package from the npm registry. This will lay out your node_modules folder using Yarn's resolution algorithm that is compatible with the node.

What is the difference between npm install and yarn install?

Speed and Performance As mentioned above, while NPM installs dependency packages sequentially, Yarn installs in-parallel. Because of this, Yarn performs faster than NPM when installing larger files. Both tools also offer the option of saving dependency files in the offline cache.


1 Answers

When you call yarn install, the following things happen in order:

  1. Resolution: Yarn starts resolving dependencies by making requests to the registry and recursively looking up each dependency.

  2. Downloading/Fetching: Next, Yarn looks in a global cache directory to see if the package needed has already been downloaded. If it hasn't, Yarn fetches the tarball for the package and places it in the global cache so it can work offline and won't need to download dependencies more than once. Dependencies can also be placed in source control as tarballs for full offline installs.

  3. Linking: Finally, Yarn links everything together by copying all the files needed from the global cache into the local node_modules directory after identifying what's already there and what's not there.


yarn install does take a lot of time, mostly in a step called Linking Dependencies

You should notice that Step 3: Linking is taking more time than Step 1: Resolution and Step 2: Fetching where the actual download happens. During by this step we already have things that we need ready and downloaded, then why is it taking long, did we miss anything?

Yes, COPY to local project into node_modules folder...! The reason for this is that this copy is not equivalent to copying one large 4.7GB ISO file. Instead it's multiple super small files (Don't take it light when I say multiple, it can be 15k+ files :P ), hence take a lot of time to copy. (Also, it is important to note that when you download the packages, you download one large tar file per package, whose contents should then be extracted into the cache which also takes time)

It is slower due to

  • Anti-virus: Your antivirus is sitting in the middle and doing a quick inspect (in addition to our yarn checking if it already exists) on every single file yarn is trying to copy cutting its speed by so much. If you are on Windows, try adding your project's parent folder as exception to Windows Defender.
  • Storage medium's transfer rate: SSDs can improve this speed hugely (Sorry, SSHDs and FireCudas will not help either, this is gonna be one time).

But is this efficient? Can I have it taken from the global node_modules (after creating one)?

Nope for both questions. Because of the way node works each package finds its dependencies only relative to its own location. Also because each project may want to use different versions of the same package to ensure its working properly and not broken by package updates.

Ideally, the project folder should be lean. An efficient way of doing this would be to have a global node_modules folder. Any and all requested packages are downloaded if not already present AND used from this location. Actually Ruby does it this way. Here's my global Ruby's equivalent of node_modules folder. Notice the presence of different versions of the same package for use in different projects.

enter image description here

But keep in mind that it would reduce project portability. It's a trade-off that any manager (be it rubygems or node modules) has to make. I can just copy the node project folder (which in fact may take hours because you will be copying the (local) node_modules folder as well, but I can expect it to work if I have just that project folder, as opposed to copying a ruby project would only some seconds to few minutes, as there is no local packages (or gems as they call them) folder, but running the project on different system would require those packages to be present on the global gems folder.

like image 101
Saravanabalagi Ramachandran Avatar answered Oct 14 '22 08:10

Saravanabalagi Ramachandran