Creating a movie db and I dont like the idea of giving each actor/actress and also each tag its own row as if there are 10 million moives total, each has a cast of atleast 20-30 people we will have 200-300 million rows in the table.
And it gets more complex with tags which can be unlimited per movie. So how to best store these 3 items? Ideally these can be modeled as a Many to Many but still it will have hundreds of millions of rows. Any better suggestions on storing these? I am using MySQL.
I would dump it all in a textfile but I need to link actors between movies and do some analytics also and allow users to rate actors find movies by tag, etc so need to use a DB.
10 million movies seems pretty ambitious. IMDb's current statistics show that they have less than 1.8M titles and around 3.9M people.
Having said that, I see no problem with creating a table of titles, a table of actors, and a junction table to resolve the many-to-many relationship between the two. The same holds true for tags.

It sounds perhaps a bit of premature optimization here. You could denormalize all actors into a TEXT column of some kind onto the Movie table, but your performance + search would suffer, as well as losing all benefits of relational data.
Suggest to keep the normalized schema, as you were originally thinking:
Movie (ID)
Actor (ID)
Tag (ID) --horror, comedy, etc.
MovieActor (MovieID, ActorID)
MovieTag (MovieID, TagID)
MovieActor and MovieTag.Regardless of the number of movies, or whether the data is DNA sequences: implement a design, test it, judge its performance based on your requirements (user acceptance, SLA, etc)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With