19

I am about to launch something really big. I need to prepare my server and database.

I would like to group each set of 100,000 users into a separate user table, but I do not know how to associate a user trying to log in with the appropriate user table.

For instance, how would I know that user jay@mail.com is related to the user table #36?

Would it be the same to have 10 million users in one user table, or 100 tables of 100,000 each?

How does Facebook do it? I can't believe they would have one global user table with 950 million entries.

JNK
  • `I can't believe they would have one global user table with 950 million entries.` I can; it's not *that* big. I have worked with larger tables. It's pretty common. The other option I would consider if you have lots of other data is a [NoSQL](http://en.wikipedia.org/wiki/NoSQL) database. – NimChimpsky Jul 31 '12 at 09:29
  • If you are planning to have a large number of users and a large amount of data, you need to hire a database specialist to design it. I wouldn't look at anyone who doesn't have at least ten years of database experience and at least five years of large database design experience. This is a complex subject that requires extensive knowledge. – HLGEM Jul 31 '12 at 15:26

5 Answers

32

You're not going to have a billion users tomorrow and MySQL can handle several million rows without any problem. I have 5 million users in my user table and trust me, it isn't even on my radar of things to worry about.

Don't worry about sharding until you need to do it. You are attempting to optimize prematurely for a problem that may or may not ever exist and in the process, you will severely cripple the rate at which you can innovate. Be fast to launch and find the problems as they come. You cannot predict in advance what your scaling challenges will be.

When and if you ever reach this scale, you will then have quite a bit of money and resources to throw at this kind of problem.

Aaron Brown
  • `Be fast to launch and find the problems as they come` — this part is excellent. That's true: if we find problems as they come, there won't be any serious problems later on. +1 – Alireza Aug 01 '12 at 13:02
15

I'm not sure that external consultants are the best support for your company if you are going to handle really large datasets and need to build from the ground up. Please don't get me wrong, but if someone screws up a project with that many customers, it will have a PR impact on your company.

Regarding 10M tuples in one table: if you have good indexing, it will be fine. We need to store several hundred million tuples in one table here (sold items), which works fine on a large Oracle 11g instance.
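To make "good indexing" concrete for the original login question: a unique index on the login column turns the lookup into an index seek, regardless of table size. This is only a sketch — the table and column names are illustrative, not from the original post:

```sql
-- Hypothetical single users table sized for millions of rows.
CREATE TABLE users (
    id        BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    email     VARCHAR(255)    NOT NULL,
    pass_hash CHAR(60)        NOT NULL,
    PRIMARY KEY (id),
    UNIQUE KEY idx_email (email)  -- makes the login lookup an index seek
) ENGINE=InnoDB;

-- A login lookup walks the idx_email B-tree instead of scanning rows,
-- so it stays fast whether the table has 100,000 rows or 10 million:
SELECT id, pass_hash FROM users WHERE email = 'jay@mail.com';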

Here is a posting from 2010 with a map of Facebook's database design: Facebook database design

You may want to read the MySQL documentation about partition types: MySQL documentation: Partitioning

MySQL supports these types:

RANGE partitioning. This type of partitioning assigns rows to partitions based on column values falling within a given range. See Section 18.2.1, “RANGE Partitioning”.

LIST partitioning. Similar to partitioning by RANGE, except that the partition is selected based on columns matching one of a set of discrete values. See Section 18.2.2, “LIST Partitioning”.

HASH partitioning. With this type of partitioning, a partition is selected based on the value returned by a user-defined expression that operates on column values in rows to be inserted into the table. The function may consist of any expression valid in MySQL that yields a nonnegative integer value. An extension to this type, LINEAR HASH, is also available. See Section 18.2.3, “HASH Partitioning”.

KEY partitioning. This type of partitioning is similar to partitioning by HASH, except that only one or more columns to be evaluated are supplied, and the MySQL server provides its own hashing function. These columns can contain other than integer values, since the hashing function supplied by MySQL guarantees an integer result regardless of the column data type. An extension to this type, LINEAR KEY, is also available. See Section 18.2.4, “KEY Partitioning”.
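Partitioning is also the closest built-in answer to the "100 tables of 100,000 users" idea: MySQL splits one logical table into physical pieces and routes rows itself, so the application never has to work out which table a user belongs to. A minimal sketch, assuming KEY partitioning on an illustrative schema (with `PARTITION BY KEY`, every unique key must include the partitioning column, which is why only the primary key is declared here):

```sql
-- Hypothetical sketch: one logical users table spread across 16
-- partitions by hashing the primary key. Names are illustrative.
CREATE TABLE users (
    id    BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    email VARCHAR(255)    NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB
PARTITION BY KEY (id)
PARTITIONS 16;

-- Queries are unchanged; the server picks the partition itself:
SELECT email FROM users WHERE id = 12345;
```

The key point is that, unlike hand-rolled per-table sharding, queries and application code stay exactly the same.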

Goose
7

First of all, don't separate users into separate tables. It'll make things complex and pointless. Databases like MySQL can work with millions of records in the same table without any problem, provided the right PRIMARY KEYs are set up. Give each user an AUTO_INCREMENT PRIMARY KEY field in the main user table, so every record has a unique ID (UID). Then reference that unique ID from the other tables, and make sure every table has its own PRIMARY KEY; it'll speed up processing in the database server.

You can learn from the Drupal CMS how it stores user information. It has been tested for more than 10 years by millions of users and very large companies (large media companies, governments, even the biggest banks in the world). On www.drupal.org you'll find more than 1.6 million pages (nodes) stored in the same table, and with more than a million unique visitors per month the website works without a glitch. Everything is about proper optimization and configuration.
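The single-UID scheme described above can be sketched as follows — a minimal illustration with hypothetical table and column names, showing one users table whose AUTO_INCREMENT key is referenced from every other table:

```sql
-- One main user table; the AUTO_INCREMENT primary key is the UID.
CREATE TABLE users (
    uid  INT UNSIGNED NOT NULL AUTO_INCREMENT,
    name VARCHAR(60)  NOT NULL,
    PRIMARY KEY (uid)
) ENGINE=InnoDB;

-- Every other table just stores the UID and joins back to users.
CREATE TABLE orders (
    order_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    uid      INT UNSIGNED NOT NULL,           -- points back at users.uid
    PRIMARY KEY (order_id),
    KEY idx_uid (uid),                        -- speeds up joins on the UID
    FOREIGN KEY (uid) REFERENCES users (uid)
) ENGINE=InnoDB;
```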

If after 10 million records you're not happy with the performance (after proper optimization and DB config changes), then you can decide whether you really want to separate users into different tables. You could extend the functionality by adding a new table that records where each user's data is kept: UID and table_name. Any request from the other tables would first consult this table to find the right one. But I really advise you to have one big user table unless you have more than 10-100 million records; splitting won't improve performance much (databases are designed to deal with huge data), and it's better to keep things simple. Usually companies just add another database server (master and slaves), and another, and run them together with load balancing. If you have those 10 million users, you can pay for another DB server, right?
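If you did go down that road, the lookup table described above might look like this — a hypothetical sketch, with illustrative names, answering the original "which table holds jay@mail.com?" question via an extra indirection:

```sql
-- Maps each UID to the shard table that holds the user's full record.
CREATE TABLE user_shard (
    uid        BIGINT UNSIGNED NOT NULL,
    table_name VARCHAR(64)     NOT NULL,  -- e.g. 'users_36'
    PRIMARY KEY (uid)
) ENGINE=InnoDB;

-- Step 1: the application asks where the user lives.
SELECT table_name FROM user_shard WHERE uid = 123456;
-- Step 2: it then queries that table. Note the table name must be
-- spliced in by application code; plain SQL cannot parameterize
-- a table name, which is part of why this scheme adds complexity.
```

Every read now costs an extra round trip (or a cached lookup), which is exactly the kind of complexity the answer recommends avoiding until the numbers force it.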

See the example of user table schema in user.install file.

kenorb
3

As the other answers suggest, it is not a good idea to split the users into multiple tables. Most databases can handle millions of rows with an index on the user ID, although the latency per query may increase with the total number of entries in the index. As long as the dataset is small, you can manage with a single table in an ordinary relational database.

I will throw in a different idea for your future consideration, should you grow much beyond a million records or so. With such a large number of customers you don't want any downtime, so there are a bunch of NoSQL databases you may want to look at. They will do the sharding for you, instead of you managing it yourself from the application, and they also give you data redundancy and hence more uptime. Facebook and others make heavy use of memcached for their cache, but I am not sure what they use for their permanent store.

One important thing you should note is that you cannot do joins etc. with NoSQL databases. So plan for your use case and decide. If joins and multi-record transactions are a necessity for you, then NoSQL databases are not for you.

sunil
-3

Why not divide based on alphabetical range? If you will have millions of users, create a separate table for each letter or for each pair of letters (table 'a' for users whose username starts with 'a'). It will be a lot of overhead at first, but since you are expecting a big database and want to be able to tell which table to use for a particular user, I guess alphabetical order is the obvious and easiest choice.

mnmnc
  • This is a super bad idea. For instance, your software will have to automatically migrate rows if a user changes their last name... unless you stop caring about consistency. This strategy invites those types of contingencies. – randomx Jul 31 '12 at 15:34