Questions tagged [large-data]

24 questions
96
votes
11 answers

Transfer 10 TB of files from USA to UK datacenter

I am migrating my server from a data center in the USA to one in the UK. My host said I should be able to achieve 11 megabytes per second. The operating system is Windows Server 2008 at both ends. My average file size is around 100 MB and…
Paul Hinett
  • 1,205
  • 3
  • 11
  • 19
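With Windows at both ends, a restartable mirror with robocopy is one reasonable sketch (the share name below is hypothetical; the /MT multithreading switch needs the Server 2008 R2 / Windows 7 build of robocopy, so on plain 2008 you would run several robocopy jobs in parallel instead):

    :: mirror the tree, resume interrupted files, retry transient failures
    robocopy D:\data \\uk-server\data /E /Z /R:3 /W:10 /LOG:C:\transfer.log

With ~100 MB files and high transatlantic latency, a few parallel streams usually get closer to the advertised 11 MB/s than a single stream will.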
78
votes
3 answers

In Bash, are wildcard expansions guaranteed to be in order?

Is the expansion of a wildcard in Bash guaranteed to be in alphabetical order? I am forced to split a large file into 10 MB pieces so that they can be accepted by my Mercurial repository. So I was thinking I could use: split -b 10485760 Big.file…
Sled
  • 927
  • 1
  • 7
  • 11
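For context: POSIX does guarantee that pathname expansion results are sorted (according to the current locale's collation), so a split/cat round-trip is safe; pinning the locale makes the ordering unambiguous. A minimal sketch:

    split -b 10485760 Big.file part.     # produces part.aa, part.ab, ...
    LC_ALL=C cat part.* > Big.rejoined   # glob expands in sorted order
    cmp Big.file Big.rejoined            # verify the round-trip byte-for-byte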
9
votes
3 answers

How do large companies back up their data?

How do companies that handle large amounts of data, for example Google or Facebook, back up everything? According to this Google platform article on Wikipedia, Google has an estimated 450,000+ servers, each with an 80+ GB hard disk. That's a lot of…
Olivier Lalonde
  • 753
  • 3
  • 13
  • 20
8
votes
8 answers

24TB RAID 6 configuration

I am in charge of a new website in a niche industry that stores lots of data (10+ TB per client, growing to 2 or 3 clients soon). We are considering ordering about $5000 worth of 3TB drives (10 in a RAID 6 configuration and 10 for backup), which…
Phil
  • 1,013
  • 2
  • 12
  • 16
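If the build ends up as Linux software RAID, a minimal mdadm sketch for the 10-drive array looks like this (device names are hypothetical; 10 × 3 TB in RAID 6 yields 24 TB usable and survives any two drive failures):

    mdadm --create /dev/md0 --level=6 --raid-devices=10 /dev/sd[b-k]
    mkfs.xfs /dev/md0      # XFS copes well with volumes this large
    cat /proc/mdstat       # watch the (long) initial sync on 3 TB drives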
8
votes
2 answers

How to overwrite a very large hard drive (18TB) with random data using shell commands in Linux

I would like to overwrite a very large hard drive (18TB) with random bytes, to then check SMART data for reallocated sectors or other errors. Since badblocks has some limitations on the number of blocks it will work with in a single run, I have tried…
Ján Lalinský
  • 282
  • 1
  • 11
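Two common shell routes, sketched under the assumption that /dev/sdX is the target disk (double-check the device name; both are destructive). shred does a single random pass; the openssl pipeline generates an AES keystream, which is typically much faster than reading /dev/urandom directly (dd's status=progress needs coreutils 8.24+):

    shred -v -n 1 /dev/sdX     # simplest: one pass of pseudorandom data
    # faster alternative: AES-CTR keystream piped into dd
    openssl enc -aes-128-ctr -nosalt \
        -pass pass:"$(head -c 32 /dev/urandom | base64)" </dev/zero \
      | dd of=/dev/sdX bs=1M oflag=direct status=progress
    smartctl -a /dev/sdX       # then inspect reallocated-sector counts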
7
votes
2 answers

Import a 260 GB CSV file into MySQL

I have a really big CSV, ~260 GB, and I want to import it into MySQL. I use the following MySQL script on macOS: DROP TABLE IF EXISTS tmp_catpath_5; CREATE TABLE tmp_catpath_5 (a1 BIGINT(20), a2 BIGINT(20), a3 BIGINT(20), a4 BIGINT(20), a5 BIGINT(20), …
jimkont
  • 173
  • 1
  • 6
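For a load of this size, LOAD DATA INFILE is typically orders of magnitude faster than individual INSERT statements, and splitting the file first means a failure doesn't restart 260 GB from zero. A sketch, assuming a database named mydb (hypothetical) and a server with local_infile enabled:

    split -l 50000000 big.csv chunk.     # chunk the CSV by line count
    for f in chunk.*; do
      mysql --local-infile=1 mydb -e "LOAD DATA LOCAL INFILE '$f'
        INTO TABLE tmp_catpath_5
        FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';"
    done

Dropping secondary indexes before the load and recreating them afterwards usually helps as well.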
5
votes
2 answers

Is rsync a good candidate for failover implementation (very large dataset)?

I have a large set of data (100+ GB) which can be stored in files. Most of the files would be in the 5k-50k range (80%), then 50k-500k (15%), and >500k (5%). The maximum expected size of a file is 50 MB. If necessary, large files can be split…
Jérôme Verstrynge
  • 4,787
  • 7
  • 24
  • 35
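A minimal mirror-to-standby sketch (host and paths hypothetical); with this many small files, the per-file metadata scan rather than bandwidth tends to dominate rsync's run time:

    rsync -a --delete --partial /srv/data/ standby:/srv/data/

If the scan becomes the bottleneck, splitting the tree across several parallel rsync runs, or a block-level replicator such as DRBD, are the usual alternatives.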
4
votes
3 answers

Can I validate a large file download piecemeal over HTTP?

I'm downloading a large file (1.2 TB) over HTTP via wget. The download takes about a week and has been corrupted twice now (a failed md5 check, which takes days to run by itself). Is there a good way to validate the file piecemeal over HTTP using…
davidparks21
  • 928
  • 1
  • 12
  • 27
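If the source publishes (or can be asked for) per-chunk checksums, HTTP Range requests let you hash and repair the file one slice at a time instead of re-downloading 1.2 TB. A sketch with 1 GiB slices (URL and slice index are hypothetical; the server must honor Range headers):

    i=3                                   # slice to check
    dd if=big.img bs=1M skip=$((i*1024)) count=1024 2>/dev/null | sha256sum
    # if that hash is wrong, re-fetch and splice in just that byte range:
    curl -s -r $((i*1073741824))-$(( (i+1)*1073741824 - 1 )) \
         -o slice.$i https://example.com/big.img
    dd if=slice.$i of=big.img bs=1M seek=$((i*1024)) conv=notrunc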
4
votes
2 answers

4TB HGST SATA drive only shows 1.62 TB in Windows Server 2012

I'm using a Supermicro X9SRE-3F motherboard with the latest BIOS and 2x 4TB drives connected to the on-board SATA controller. If I set the BIOS to RAID and create a RAID 1 array, the array shows up in the BIOS as 3.6TB. However, when I boot Windows…
user136085
  • 41
  • 1
  • 2
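1.62 TB is suspiciously close to the drive's 3.64 TiB capacity minus the 2 TiB that a 32-bit LBA / MBR partition table can address, so the usual suspects are an MBR-initialized disk or an outdated storage driver. If it is the partition table, initializing the data disk as GPT fixes it; a diskpart sketch (disk number hypothetical, and note that clean wipes the disk):

    diskpart
    DISKPART> select disk 1
    DISKPART> clean
    DISKPART> convert gpt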
4
votes
2 answers

Tomcat Denial of Service due to large packets

I had asked this question on IT Security, but I felt this question is better placed here. On a recent assessment, I found that sending large (>5 MB) requests to a Tomcat server causes 100% CPU usage on the server. The simplest fix that came to mind…
sudhacker
  • 143
  • 6
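The usual mitigation is to cap request sizes on the connector in server.xml; a sketch with example values (the attribute names are real Tomcat connector settings, but maxSwallowSize only exists from Tomcat 7.0.55 onward):

    <!-- reject oversized bodies instead of parsing them at 100% CPU -->
    <Connector port="8080" protocol="HTTP/1.1"
               maxPostSize="2097152"
               maxSwallowSize="2097152"
               maxHttpHeaderSize="8192" />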
2
votes
1 answer

Reading 65k rows hangs PHP/MySQL

I am developing a PHP application that processes IP addresses. I am working with MySQL tables containing up to 4 billion rows. I have a script that currently needs to fetch 65536 addresses from this table, and the mysql<->php interface fails to give a…
BlackPage
  • 21
  • 2
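A useful first step is to rule the server out by streaming the same result set from the CLI; the --quick flag disables client-side buffering, which is the same knob PHP exposes as mysqli's MYSQLI_USE_RESULT mode (or PDO's MYSQL_ATTR_USE_BUFFERED_QUERY => false). Table and column names below are hypothetical:

    mysql --quick mydb -e "SELECT addr FROM ips LIMIT 65536" > /dev/null

If this streams instantly, the hang is PHP buffering 65k rows into memory, not MySQL.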
2
votes
2 answers

Increasing the size available to, or changing location of, /tmp

I am currently in charge of a server running Red Hat and a bioinformatics webapp that deals with huge files, some of which are over 100GB when uncompressed. The act of decompressing these files is done by several different programs, all of which use…
DeeDee
  • 333
  • 2
  • 7
  • 16
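Two low-risk sketches, assuming /scratch is the larger filesystem (hypothetical path): point well-behaved programs at a different directory via TMPDIR, or transparently back /tmp with the big disk using a bind mount (add it to /etc/fstab to persist across reboots):

    export TMPDIR=/scratch/tmp            # honored by most Unix tools
    mkdir -p /scratch/tmp && chmod 1777 /scratch/tmp
    mount --bind /scratch/tmp /tmp        # /tmp now lives on /scratch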
1
vote
2 answers

Moving a very large file (~100 GB) from one server to the other

We're moving servers, and I need to transfer all the data from Server A to Server B. I have a tar.gz of about 100 GB that contains all the Server A files. I'd really like to avoid downloading the file locally on my computer, and uploading it to…
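Pushing straight from Server A to Server B avoids the round-trip through your own machine entirely; a sketch with hypothetical paths and hostname (-P makes the 100 GB transfer resumable if the connection drops):

    rsync -aP /backups/serverA.tar.gz user@serverB:/restore/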
1
vote
2 answers

Is it better to use rsync over SSHFS or over CIFS as remote repository, having no option for rsyncd?

I have a NAS that is only capable of CIFS, AFS, and SSH (no rsyncd capabilities, no NFS). I have to back up very large files (VM images), and I usually set up an rsync server on the backup device, then do a rsync --inplace to transfer only the block-level…
penzoiders
  • 63
  • 2
  • 6
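A sketch of the SSHFS route (NAS path hypothetical). One caveat worth knowing: when the destination is a local-looking mount, rsync assumes local disks and defaults to whole-file copies; forcing delta transfer with --no-whole-file makes it read the entire destination file back over the network to compute checksums, which can cost more than it saves:

    sshfs nas:/volume1/backup /mnt/nas -o reconnect
    rsync -a --inplace --partial /vmimages/ /mnt/nas/vmimages/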
1
vote
1 answer

mysqld: multiple tmpdir & balancing

Our 1 TB tmpdir can sometimes be completely filled by mysqld, resulting in disk-full and query errors. This can be due to lots of mid-size queries, or a couple of very big queries. We have a 5 TB RAID volume I could use to expand this tmpdir. The manual…
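mysqld accepts a colon-separated list for tmpdir and assigns successive temporary tables to the paths in round-robin fashion, which spreads the load but does not stop one giant query from filling a single path. A my.cnf sketch (paths hypothetical):

    [mysqld]
    # successive temp tables alternate between these two locations
    tmpdir = /tmp/mysql:/mnt/raid5tb/mysqltmp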