Questions tagged [compiler-optimization]

Compiler optimization means tuning a compiler's output to reduce run time, object size, or both. This can be done with compiler arguments (e.g. CFLAGS, LDFLAGS), compiler plugins (Dehydra, for instance), or direct modifications to the compiler's source code.

3117 questions
2428
votes
10 answers

Why are elementwise additions much faster in separate loops than in a combined loop?

Suppose a1, b1, c1, and d1 point to heap memory, and my numerical code has the following core loop. const int n = 100000; for (int j = 0; j < n; j++) { a1[j] += b1[j]; c1[j] += d1[j]; } This loop is executed 10,000 times via another outer…
Johannes Gerer
  • 25,508
  • 5
  • 29
  • 35
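
For readers skimming the list, the two loop shapes that the question above compares can be sketched as follows. This is only an illustration: the element type `double` and the function names `add_combined` and `add_split` are assumptions, since the excerpt does not show them. Both variants compute the same result; the question concerns their relative speed.

```c
#include <stddef.h>

/* Combined loop, as in the excerpt: both updates happen in one pass. */
void add_combined(double *a1, const double *b1,
                  double *c1, const double *d1, size_t n) {
    for (size_t j = 0; j < n; j++) {
        a1[j] += b1[j];
        c1[j] += d1[j];
    }
}

/* Split variant: each update gets its own pass over the data. */
void add_split(double *a1, const double *b1,
               double *c1, const double *d1, size_t n) {
    for (size_t j = 0; j < n; j++) {
        a1[j] += b1[j];
    }
    for (size_t j = 0; j < n; j++) {
        c1[j] += d1[j];
    }
}
```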
2296
votes
12 answers

Why doesn't GCC optimize a*a*a*a*a*a to (a*a*a)*(a*a*a)?

I am doing some numerical optimization on a scientific application. One thing I noticed is that GCC will optimize the call pow(a,2) by compiling it into a*a, but the call pow(a,6) is not optimized and will actually call the library function pow,…
xis
  • 24,330
  • 9
  • 43
  • 59
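
A minimal sketch of the two forms named in the title above, with hypothetical function names `pow6_chain` and `pow6_grouped`. Floating-point multiplication is not associative, so the two groupings can round differently for some inputs. As far as I know, that is why a compiler generally regroups like this only under relaxed floating-point options such as GCC's `-ffast-math`.

```c
/* Left-to-right chain: five multiplications, evaluated in source order. */
double pow6_chain(double a) {
    return a * a * a * a * a * a;
}

/* Regrouped form: three multiplications, but a different rounding sequence. */
double pow6_grouped(double a) {
    double a3 = a * a * a;
    return a3 * a3;
}
```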
1619
votes
11 answers

Replacing a 32-bit loop counter with 64-bit introduces crazy performance deviations with _mm_popcnt_u64 on Intel CPUs

I was looking for the fastest way to popcount large arrays of data. I encountered a very weird effect: Changing the loop variable from unsigned to uint64_t made the performance drop by 50% on my PC. The Benchmark #include #include…
gexicide
  • 38,535
  • 21
  • 92
  • 152
983
votes
9 answers

Swift Beta performance: sorting arrays

I was implementing an algorithm in Swift Beta and noticed that the performance was very poor. After digging deeper I realized that one of the bottlenecks was something as simple as sorting arrays. The relevant part is here: let n = 1000000 var x = …
Jukka Suomela
  • 12,070
  • 6
  • 40
  • 46
511
votes
2 answers

Why do we use the volatile keyword?

Possible Duplicate: Why does volatile exist? I have never used it, but I wonder why people use it. What exactly does it do? I searched the forum but found only C# and Java topics.
Nawaz
  • 353,942
  • 115
  • 666
  • 851
509
votes
6 answers

Why does GCC generate 15-20% faster code if I optimize for size instead of speed?

I first noticed in 2009 that GCC (at least on my projects and on my machines) has a tendency to generate noticeably faster code if I optimize for size (-Os) instead of speed (-O2 or -O3), and I have been wondering ever since why. I have managed…
Ali
  • 56,466
  • 29
  • 168
  • 265
403
votes
1 answer

Why does the Rust compiler not optimize code assuming that two mutable references cannot alias?

As far as I know, reference/pointer aliasing can hinder the compiler's ability to generate optimized code, since it must ensure the generated binary behaves correctly in the case where the two references/pointers do alias. For instance, in the…
Zhiyao
  • 4,152
  • 2
  • 12
  • 21
317
votes
12 answers

How to compile Tensorflow with SSE4.2 and AVX instructions?

This is the message received from running a script to check if Tensorflow is working: I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcublas.so.8.0 locally I tensorflow/stream_executor/dso_loader.cc:125]…
GabrielChu
  • 6,026
  • 10
  • 27
  • 42
198
votes
5 answers

How to see which flags -march=native will activate?

I'm compiling my C++ app using GCC 4.3. Instead of manually selecting the optimization flags I'm using -march=native, which in theory should add all optimization flags applicable to the hardware I'm compiling on. But how can I check which flags is…
vartec
  • 131,205
  • 36
  • 218
  • 244
198
votes
2 answers

What is &&& operation in C

#include <stdio.h> volatile int i; int main() { int c; for (i = 0; i < 3; i++) { c = i &&& i; printf("%d\n", c); } return 0; } The output of the above program compiled using gcc is 0 1 1 With the -Wall or…
manav m-n
  • 11,136
  • 23
  • 74
  • 97
195
votes
3 answers

Why can lambdas be better optimized by the compiler than plain functions?

In his book The C++ Standard Library (Second Edition) Nicolai Josuttis states that lambdas can be better optimized by the compiler than plain functions. In addition, C++ compilers optimize lambdas better than they do ordinary functions. (Page…
Stephan Dollberg
  • 32,985
  • 16
  • 81
  • 107
190
votes
3 answers

Why does GCC generate such radically different assembly for nearly the same C code?

While writing an optimized ftol function I found some very odd behaviour in GCC 4.6.1. Let me show you the code first (for clarity I marked the differences): fast_trunc_one, C: int fast_trunc_one(int i) { int mantissa, exponent, sign, r; …
orlp
  • 112,504
  • 36
  • 218
  • 315
183
votes
4 answers

Can I hint the optimizer by giving the range of an integer?

I am using an int type to store a value. By the semantics of the program, the value always varies in a very small range (0 - 36), and int (not a char) is used only because of the CPU efficiency. It seems like many special arithmetical optimizations…
rolevax
  • 1,670
  • 1
  • 14
  • 21
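
One way such a range hint is sometimes written, sketched here under the assumption of GCC or Clang, which provide the `__builtin_unreachable()` extension. The helper names `assume_small` and `wrap_index` are hypothetical. Whether the optimizer actually exploits the hint depends on the compiler and flags, and passing a value outside 0..36 is undefined behavior.

```c
/* Hypothetical helper: tells the optimizer that x lies in [0, 36].
   Relies on the GCC/Clang builtin; out-of-range input is undefined behavior. */
static inline int assume_small(int x) {
    if (x < 0 || x > 36) {
        __builtin_unreachable();
    }
    return x;
}

/* With the hint, the compiler may reduce the modulo to a plain copy of x. */
int wrap_index(int x) {
    return assume_small(x) % 37;
}
```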
151
votes
2 answers

Limits of Nat type in Shapeless

In shapeless, the Nat type represents a way to encode natural numbers at a type level. This is used for example for fixed size lists. You can even do calculations on type level, e.g. append a list of N elements to a list of K elements and get back a…
Rüdiger Klaehn
  • 12,445
  • 3
  • 41
  • 57
150
votes
11 answers

How do I make an infinite empty loop that won’t be optimized away?

The C11 standard appears to imply that iteration statements with constant controlling expressions should not be optimized out. I'm taking my advice from this answer, which specifically quotes section 6.8.5 from the draft standard: An iteration…
nneonneo
  • 171,345
  • 36
  • 312
  • 383