git.openfabrics.org - ~shefty/rdma-dev.git/log

Ingo Molnar [Fri, 6 May 2011 19:07:33 +0000 (21:07 +0200)]
Merge branch 'perf/stat' into perf/core

Merge reason: the perf stat improvements are tested and ready now.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

Lin Ming [Fri, 6 May 2011 07:14:02 +0000 (07:14 +0000)]
perf events, x86: Add SandyBridge stalled-cycles-frontend/backend events

Extend the Intel SandyBridge PMU driver with definitions
for generic front-end and back-end stall events.

( As commit 3011203 "perf events, x86: Add Westmere stalled-cycles-frontend/backend
  events" says, these are only approximations. )

Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1304666042-17577-1-git-send-email-ming.m.lin@intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Ingo Molnar [Wed, 4 May 2011 06:42:29 +0000 (08:42 +0200)]
perf events: Clean up definitions and initializers, update copyrights

Fix a few inconsistent style bits that were added over the past few
months.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-yv4hwf9yhnzoada8pcpb3a97@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Borislav Petkov [Tue, 3 May 2011 13:26:43 +0000 (15:26 +0200)]
hw breakpoints: Move to kernel/events/

As part of the events subsystem unification, relocate hw_breakpoint.c
into its new destination.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>

Borislav Petkov [Tue, 26 Oct 2010 18:24:03 +0000 (20:24 +0200)]
perf: Start the restructuring

mv kernel/perf_event.c -> kernel/events/core.c. All further sensible
splitting can happen from there. Since perf_event.c has become quite
large, and with the integration with ftrace under way, splitting the
functionality into its logical parts should help speed up the
unification and manage the complexity of the subsystem.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>

Ingo Molnar [Sun, 1 May 2011 17:11:42 +0000 (19:11 +0200)]
Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core

Ingo Molnar [Sun, 1 May 2011 17:09:39 +0000 (19:09 +0200)]
Merge branch 'tip/perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core

Steven Rostedt [Fri, 29 Apr 2011 00:32:08 +0000 (20:32 -0400)]
ftrace: Consolidate the function match routines for normal and mods

The code used for matching functions is almost identical between normal
selecting of functions and using the :mod: feature of set_ftrace_notrace.

Consolidate the two users into one function.
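A minimal userspace sketch of what a single matcher shared by both callers might look like. The name `match_func_record()` and its parameters are hypothetical, and POSIX `fnmatch()` stands in for ftrace's own glob handling: a NULL `mod_filter` selects by function name alone, while a non-NULL one additionally requires the record to belong to that module.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical consolidated matcher: one routine serves both plain
 * function filters (mod_filter == NULL) and ":mod:" filters.
 * fnmatch() is a stand-in for ftrace's own glob matching.
 */
static bool match_func_record(const char *func, const char *func_mod,
                              const char *pattern, const char *mod_filter)
{
    /* A module filter was given: the record must belong to that module. */
    if (mod_filter != NULL) {
        if (func_mod == NULL || strcmp(func_mod, mod_filter) != 0)
            return false;
    }
    /* Both callers share the same function-name match. */
    return fnmatch(pattern, func, 0) == 0;
}
```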

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Steven Rostedt [Thu, 28 Apr 2011 01:43:36 +0000 (21:43 -0400)]
ftrace: Consolidate updating of ftrace_trace_function

There are three locations that perform almost identical functions in order
to update the ftrace_trace_function (the ftrace function variable that gets
called by mcount).

Consolidate these into a single function called update_ftrace_function().
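A simplified sketch of the consolidated update, with hypothetical names throughout (`trace_function` plays the role of `ftrace_trace_function`): one routine recomputes the pointer that the mcount path calls through, choosing a no-op stub when nothing is registered, the single callback directly when exactly one is registered, and a list-walking dispatcher otherwise. The real function may weigh more conditions.

```c
#include <stddef.h>

typedef void (*trace_fn)(unsigned long ip, unsigned long parent_ip);

/* Hypothetical stand-in for a registered ftrace_ops. */
struct trace_ops {
    trace_fn func;
    struct trace_ops *next;
};

static struct trace_ops *ops_list;   /* registered callbacks */

static void trace_stub(unsigned long ip, unsigned long parent_ip)
{
    (void)ip;
    (void)parent_ip;                  /* nothing registered: do nothing */
}

static void trace_list_func(unsigned long ip, unsigned long parent_ip)
{
    /* Several callbacks: call each one in turn. */
    for (struct trace_ops *op = ops_list; op != NULL; op = op->next)
        op->func(ip, parent_ip);
}

/* The variable the mcount trampoline would call through. */
static trace_fn trace_function = trace_stub;

/* The single place that decides what trace_function points to. */
static void update_trace_function(void)
{
    if (ops_list == NULL)
        trace_function = trace_stub;
    else if (ops_list->next == NULL)
        trace_function = ops_list->func;   /* call the lone op directly */
    else
        trace_function = trace_list_func;
}
```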

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Steven Rostedt [Tue, 26 Apr 2011 20:11:03 +0000 (16:11 -0400)]
ftrace: Move record update for normal and modules into a separate function

The updating of a function record is moved to a single function. This will allow
us to add specific changes in one location for both modules and kernel
functions.

Later patches will determine if the function record itself needs to be updated
(which enables the mcount caller), or just the ftrace_ops needs the update.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Steven Rostedt [Mon, 25 Apr 2011 18:32:42 +0000 (14:32 -0400)]
ftrace: Remove FTRACE_FL_CONVERTED flag

Since we disable all function tracer processing if we detect
that a modification of an instruction had failed, we do not need
to track that the record has failed. No more ftrace processing
is allowed, and the FTRACE_FL_CONVERTED flag is pointless.

The FTRACE_FL_CONVERTED flag was used to denote records that were
successfully converted from mcount calls into nops. But if a single
record fails, all of ftrace is disabled.
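The argument can be illustrated with a small userspace sketch: if the first failed patch disables all further processing through one global flag, no per-record "converted" or "failed" bit is ever consulted again. `struct rec`, `patch_site()` and `tracing_disabled` are hypothetical stand-ins, and patch failure is simulated.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical call-site record: note there are no CONVERTED/FAILED flags. */
struct rec {
    unsigned long ip;
    bool patch_ok;          /* simulates whether patching this site succeeds */
};

static bool tracing_disabled;

/* Simulated instruction patch: returns 0 on success, -1 on failure. */
static int patch_site(const struct rec *r)
{
    return r->patch_ok ? 0 : -1;
}

/* Convert every call site; the first failure disables tracing globally. */
static size_t update_all(const struct rec *recs, size_t n)
{
    size_t done = 0;

    for (size_t i = 0; i < n; i++) {
        if (tracing_disabled)
            break;                     /* already shut down: touch nothing */
        if (patch_site(&recs[i]) != 0) {
            tracing_disabled = true;   /* no further processing at all */
            break;
        }
        done++;
    }
    return done;
}
```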

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Steven Rostedt [Fri, 22 Apr 2011 03:16:46 +0000 (23:16 -0400)]
ftrace: Remove FTRACE_FL_FAILED flag

Since we disable all function tracer processing if we detect
that a modification of an instruction had failed, we do not need
to track that the record has failed. No more ftrace processing
is allowed, and the FTRACE_FL_FAILED flag is pointless.

Removing this flag simplifies some of the code, but some ftrace_disabled
checks needed to be added or moved around a little.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Steven Rostedt [Fri, 22 Apr 2011 02:59:12 +0000 (22:59 -0400)]
ftrace: Remove failures file

The failures file in the debugfs tracing directory would list the
functions that failed to convert when the old dead ftrace daemon
tried to update code but failed. Since this code is now dead along
with the daemon, the failures file is useless. Remove it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Steven Rostedt [Fri, 22 Apr 2011 02:41:35 +0000 (22:41 -0400)]
ftrace: Remove unnecessary disabling of irqs

The disabling of interrupts around ftrace_update_code() was used
to protect against the evil ftrace daemon from years past. But that
daemon has long been killed. It is safe to keep interrupts enabled
while updating the initial mcount into nops.

The ftrace_mutex is also held, which keeps other users at bay.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Steven Rostedt [Fri, 29 Apr 2011 14:36:31 +0000 (10:36 -0400)]
ftrace: Make FTRACE_WARN_ON() work in if condition

Let FTRACE_WARN_ON() be used as a standalone statement or
inside a conditional: if (FTRACE_WARN_ON(x))
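A common way to let a warning macro double as a condition is a GNU C statement expression that evaluates the condition once, acts on it, and yields it as the value. A hedged userspace sketch, not the kernel's definition: `MY_WARN_ON()` and `tracing_killed` are hypothetical, and GCC or Clang is required.

```c
#include <stdbool.h>
#include <stdio.h>

static bool tracing_killed;   /* hypothetical stand-in for shutting ftrace down */

/*
 * Evaluate cond exactly once; if it is true, warn and shut tracing
 * down. The statement expression yields the result, so the macro works
 * both as a standalone statement and as an if () condition.
 * Statement expressions are a GNU C extension (GCC and Clang).
 */
#define MY_WARN_ON(cond)                                        \
    ({                                                          \
        int wo_ret_ = !!(cond);                                 \
        if (wo_ret_) {                                          \
            fprintf(stderr, "WARNING: %s at %s:%d\n",           \
                    #cond, __FILE__, __LINE__);                 \
            tracing_killed = true;                              \
        }                                                       \
        wo_ret_;                                                \
    })
```

It can then be written either as `MY_WARN_ON(err);` or as `if (MY_WARN_ON(err)) return -1;`.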

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Steven Rostedt [Sat, 30 Apr 2011 02:35:33 +0000 (22:35 -0400)]
ftrace: Only update the function code on write to filter files

If function tracing is enabled, a read of the filter files will
cause the call to stop_machine to update the function trace sites.
It should only call stop_machine on write.
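A userspace analogue of the rule, assuming the update is triggered when the filter file is released: only a file opened for writing can have changed the filter, so read-only opens skip the expensive update. `filter_release()` and `run_code_update()` are hypothetical, and `O_ACCMODE` stands in for whatever file-mode check the kernel uses.

```c
#include <fcntl.h>
#include <stdbool.h>

static int code_updates;   /* counts simulated stop_machine() calls */

/* Hypothetical stand-in for the expensive stop_machine() code patching. */
static void run_code_update(void)
{
    code_updates++;
}

/*
 * Called when a filter file is released. open_flags are the flags the
 * file was opened with; only writers can have changed the filter, so
 * read-only opens never trigger the update.
 */
static void filter_release(int open_flags, bool tracing_enabled)
{
    bool opened_for_write = (open_flags & O_ACCMODE) != O_RDONLY;

    if (opened_for_write && tracing_enabled)
        run_code_update();
}
```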

Cc: stable@kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

David Ahern [Fri, 29 Apr 2011 22:04:15 +0000 (16:04 -0600)]
perf stat: Tell user about unsupported events in the list

As perf record already does, tell the user in verbose mode about
unsupported events that will not be counted.

e.g.,

 $ perf stat -e dTLB-prefetch-misses -v -- sleep 1
 dTLB-prefetch-misses event is not supported by the kernel.
 dTLB-prefetch-misses: 0 0 0

 Performance counter stats for 'sleep 1':

     <not counted> dTLB-prefetch-misses

        1.001884783  seconds time elapsed

Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1304114655-10600-1-git-send-email-dsahern@gmail.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Ingo Molnar [Fri, 29 Apr 2011 20:52:42 +0000 (22:52 +0200)]
perf list: Fix max event string size

Recent stalled-cycles event names were longer than the 40-character
column used by perf list.

Extend that, make it robust for future extensions and also adjust alignments
in the face of wider event names.
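A small sketch of sizing the name column from the widest event name instead of a fixed 40-character field, using `printf`'s `*` field width. The helpers and the bracketed event-kind suffix are illustrative, not perf's code.

```c
#include <stdio.h>
#include <string.h>

/* Width of the widest event name, so no fixed column size is needed. */
static int max_name_width(const char *const names[], size_t n)
{
    size_t max = 0;

    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(names[i]);
        if (len > max)
            max = len;
    }
    return (int)max;
}

/* Print one row per event, padding every name to the common width. */
static void print_event_list(const char *const names[],
                             const char *const kinds[], size_t n)
{
    int width = max_name_width(names, n);

    for (size_t i = 0; i < n; i++)
        printf("  %-*s [%s]\n", width, names[i], kinds[i]);
}
```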

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-7y40wib8n009io7hjpn1dsrm@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Ingo Molnar [Sat, 30 Apr 2011 07:14:54 +0000 (09:14 +0200)]
perf events, x86: Add Westmere stalled-cycles-frontend/backend events

Extend the Intel Westmere PMU driver with definitions for generic front-end and
back-end stall events.

( These are only approximations. )

Reported-by: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-7y40wib8n008io7hjpn1dsrm@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Ingo Molnar [Fri, 29 Apr 2011 14:11:03 +0000 (16:11 +0200)]
perf stat: Fail softly on unsupported events

David Ahern reported this perf stat failure:

> # /tmp/build-perf/perf stat -- sleep 1
>   Error: stalled-cycles-frontend event is not supported.
>   Fatal: Not all events could be opened.
>
> This is a Dell R410 with an E5620 processor.

Fail in a softer fashion on unknown/unsupported events.

Reported-by: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-7y40wib8n006io7hjpn1dsrm@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Ingo Molnar [Sat, 30 Apr 2011 07:03:15 +0000 (09:03 +0200)]
perf stat: Leave more room for percentages

Triple digit percentages do not fit otherwise.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-7y40wib8n005io7hjpn1dsrm@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Ingo Molnar [Fri, 29 Apr 2011 12:16:18 +0000 (14:16 +0200)]
perf stat: Adjust stall cycles warning percentages

Adjust the color thresholds to better match the percentages seen in
real workloads. Both are now a bit more sensitive.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-7y40wib8n004io7hjpn1dsrm@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Ingo Molnar [Fri, 29 Apr 2011 11:49:08 +0000 (13:49 +0200)]
perf stat: Analyze front-end and back-end stall counts

Sample output:

 Performance counter stats for './loop_1b':

        873.691065 task-clock               #    1.000 CPUs utilized
                 1 context-switches         #    0.000 M/sec
                 1 CPU-migrations           #    0.000 M/sec
                96 page-faults              #    0.000 M/sec
     2,012,637,222 cycles                   #    2.304 GHz                      (66.58%)
     1,001,397,911 stalled-cycles-frontend  #   49.76% frontend cycles idle     (66.58%)
         7,523,398 stalled-cycles-backend   #    0.37%  backend cycles idle     (66.76%)
     2,004,551,046 instructions             #    1.00  insns per cycle
                                            #    0.50  stalled cycles per insn  (66.80%)
     1,001,304,992 branches                 # 1146.063 M/sec                    (66.76%)
            39,453 branch-misses            #    0.00% of all branches          (66.64%)

        0.874046121  seconds time elapsed

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-7y40wib8n003io7hjpn1dsrm@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Ingo Molnar [Fri, 29 Apr 2011 12:41:28 +0000 (14:41 +0200)]
perf tools: Add front-end and back-end stalled cycles support

Update perf tooling to deal with front-end and back-end stalled cycles events.

Add both to the default 'perf stat' output.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-7y40wib8n002io7hjpn1dsrm@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Ingo Molnar [Fri, 29 Apr 2011 12:17:19 +0000 (14:17 +0200)]
perf, x86: Add new stalled cycles events for Intel and AMD CPUs

Extend the Intel and AMD event definitions with generic front-end and
back-end stall events.

( These are only approximations - suggestions are welcome for better events. )

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-7y40wib8n001io7hjpn1dsrm@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Ingo Molnar [Fri, 29 Apr 2011 11:19:47 +0000 (13:19 +0200)]
perf events: Add generic front-end and back-end stalled cycle event definitions

Add two generic hardware events: front-end and back-end stalled cycles.

These events measure conditions when the CPU is executing code but its
capabilities are not fully utilized. Understanding such situations and
analyzing them is an important sub-task of code optimization workflows.

Both events limit performance: most front-end stalls tend to be caused
by branch mispredictions or instruction fetch cache misses, while back-end
stalls can be caused by various resource shortages or inefficient
instruction scheduling.

Front-end stalls are the more important ones: code cannot run fast
if the instruction stream is not kept flowing.

An over-utilized back-end can cause front-end stalls and thus
has to be monitored as well.

The exact composition depends heavily on the program logic and
instruction mix.

We use the terms 'stall', 'front-end' and 'back-end' loosely and
try to use the best available events from specific CPUs that
approximate these concepts.
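From user space, the new generic events are requested like any other hardware event: `perf_event_open(2)` with `PERF_TYPE_HARDWARE` and `PERF_COUNT_HW_STALLED_CYCLES_FRONTEND` or `PERF_COUNT_HW_STALLED_CYCLES_BACKEND`. A minimal sketch for counting the calling thread, assuming Linux headers that define these constants; the helper names are hypothetical, and the open is expected to fail on CPUs with no mapping for the event or when `perf_event_paranoid` forbids it.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fill in an attr for one of the two generic stall events. */
static void stall_attr(struct perf_event_attr *attr, int frontend)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_HARDWARE;
    attr->config = frontend ? PERF_COUNT_HW_STALLED_CYCLES_FRONTEND
                            : PERF_COUNT_HW_STALLED_CYCLES_BACKEND;
    attr->disabled = 1;          /* enable explicitly around the workload */
    attr->exclude_kernel = 1;    /* user-space cycles only */
    attr->exclude_hv = 1;
}

/*
 * Count stall cycles of the calling thread while work() runs.
 * Returns 0 and stores the count, or -1 if the event cannot be opened
 * or read (unsupported CPU, older kernel, permission restrictions).
 */
static int count_stalls(int frontend, void (*work)(void), uint64_t *count)
{
    struct perf_event_attr attr;
    stall_attr(&attr, frontend);

    /* pid 0 = this thread, cpu -1 = any CPU, no group, no flags */
    int fd = (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
    if (fd < 0)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    work();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    ssize_t n = read(fd, count, sizeof(*count));
    close(fd);
    return n == (ssize_t)sizeof(*count) ? 0 : -1;
}
```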

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-7y40wib8n000io7hjpn1dsrm@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Ingo Molnar [Thu, 28 Apr 2011 06:48:42 +0000 (08:48 +0200)]
perf stat: Fix compatibility behavior

When a new perf stat is run on an older kernel, it currently fails on
the unknown event:

  $ ./perf stat true
  Error: open_counter returned with 22 (Invalid argument). /bin/dmesg
  may provide additional information.

  Fatal: Not all events could be opened.

Just ignore EINVAL and ENOSYS, and print the affected events as not counted:

 Performance counter stats for 'true':

          0.239483 task-clock               #    0.493 CPUs utilized
                 0 context-switches         #    0.000 M/sec
                 0 CPU-migrations           #    0.000 M/sec
                86 page-faults              #    0.359 M/sec
           704,766 cycles                   #    2.943 GHz
     <not counted> stalled-cycles
           381,961 instructions             #    0.54  insns per cycle
            69,626 branches                 #  290.735 M/sec
             4,594 branch-misses            #    6.60% of all branches

        0.000485883  seconds time elapsed
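The compatibility rule amounts to classifying the result of each counter open: the two errno values above mean the kernel does not know the event, so it is reported as not counted, while any other error stays fatal. A hedged sketch with hypothetical names; perf's own handling may differ in detail.

```c
#include <errno.h>

enum open_result { COUNTER_OK, COUNTER_NOT_COUNTED, COUNTER_FATAL };

/* Map the result of opening one counter to how perf stat should react. */
static enum open_result classify_open_error(int fd, int err)
{
    if (fd >= 0)
        return COUNTER_OK;
    if (err == EINVAL || err == ENOSYS)
        return COUNTER_NOT_COUNTED;   /* older kernel: print <not counted> */
    return COUNTER_FATAL;             /* still "Not all events could be opened" */
}
```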

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-7y40wib8n1eqio5hjpn3dsrm@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Ingo Molnar [Thu, 28 Apr 2011 16:17:11 +0000 (18:17 +0200)]
perf stat: Add --sync/-S option

--sync will tell perf stat to run sync() before starting a command.

This allows IO-heavy tests to be used with --repeat, without one
iteration impacting the other.

For example, elapsed time stabilizes:

  before:        3.971525714  seconds time elapsed  ( +-  8.56% )
  after:         3.211098537  seconds time elapsed  ( +-  1.52% )

So measurements will be more accurate.
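A sketch of one repetition with `--sync`: call `sync()` before starting the timed child, so writeback left over from the previous iteration is not charged to the next one. `run_once()` is a hypothetical helper that times with `CLOCK_MONOTONIC`; it is not perf's implementation.

```c
#define _XOPEN_SOURCE 700
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Run argv once and return the elapsed wall-clock time in seconds,
 * or -1.0 on failure. With do_sync, sync() runs before the clock starts.
 */
static double run_once(char *const argv[], bool do_sync)
{
    struct timespec start, end;
    int status;

    if (do_sync)
        sync();                       /* flush dirty data first */

    clock_gettime(CLOCK_MONOTONIC, &start);
    pid_t pid = fork();
    if (pid < 0)
        return -1.0;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                   /* exec failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1.0;
    clock_gettime(CLOCK_MONOTONIC, &end);

    if (!WIFEXITED(status) || WEXITSTATUS(status) == 127)
        return -1.0;
    return (double)(end.tv_sec - start.tv_sec) +
           (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```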

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-7y40wib8n1eqio7hjpn1dsrm@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Ingo Molnar [Thu, 28 Apr 2011 09:16:44 +0000 (11:16 +0200)]
perf event, x86: Use better stalled cycles metric

Use the UOPS_EXECUTED.*,c=1,i=1 event on Intel CPUs - it is a rather
good indicator of CPU execution stalls, more sensitive and more inclusive
than the 0xa2 resource stalls event (which does not count nearly as many
stall types).
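The `c=1,i=1` suffix refers to the counter-mask (CMASK) and invert (INV) fields of Intel's event-select register: with CMASK=1 and INV set, the counter increments on cycles in which the event fired zero times, i.e. cycles in which no uops executed. A sketch of packing the fields using the architectural IA32_PERFEVTSELx layout; the 0xb1/0x3f event/umask pair used in the check is an assumption for illustration.

```c
#include <stdint.h>

/*
 * Pack an Intel event-select value:
 *   bits 0-7   event select
 *   bits 8-15  unit mask
 *   bit  23    INV (invert the CMASK comparison)
 *   bits 24-31 CMASK (count cycles with >= CMASK occurrences)
 * With cmask = 1 and inv = 1 the counter increments on cycles in which
 * the event occurred zero times.
 */
static uint64_t intel_raw_config(uint8_t event, uint8_t umask,
                                 unsigned inv, uint8_t cmask)
{
    return (uint64_t)event |
           ((uint64_t)umask << 8) |
           ((uint64_t)(inv & 1u) << 23) |
           ((uint64_t)cmask << 24);
}
```

Such a value can be tried directly as a raw event, e.g. `perf stat -e r1803fb1 -- ./workload`, on a CPU where that encoding is valid.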

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-7y40wib8n1eqio7hjpn2dsrm@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Don Zickus [Wed, 27 Apr 2011 10:32:33 +0000 (06:32 -0400)]
perf, x86, nmi: Move LVT un-masking into irq handlers

It was noticed that P4 machines were generating double NMIs for
each perf event.  These extra NMIs lead to 'Dazed and confused'
messages on the screen.

I tracked this down to a P4 quirk that said the overflow bit had
to be cleared before re-enabling the apic LVT mask.  My first
attempt was to move the un-masking inside the perf nmi handler
from before the chipset NMI handler to after.

This broke Nehalem boxes that seem to like the unmasking before
the counters themselves are re-enabled.

To keep this change simple for 2.6.39, I decided to simply move
the apic LVT un-masking to the beginning of all the chipset NMI
handlers, with the exception of Pentium4's, to fix the double NMI
issue.

Later on we can move the un-masking to later in the handlers to
save a number of 'extra' NMIs on those particular chipsets.

I tested this change on a P4 machine, an AMD machine, a Nehalem
box, and a core2quad box.  'perf top' worked correctly along
with various other small 'perf record' runs.  Any high-stress
workload breaks all the machines, but that is a different problem.

Thanks to various people for testing different versions of this
patch.

Reported-and-tested-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Link: http://lkml.kernel.org/r/1303900353-10242-1-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
CC: Cyrill Gorcunov <gorcunov@gmail.com>

Ingo Molnar [Thu, 28 Apr 2011 00:57:53 +0000 (02:57 +0200)]
perf stat: Fix printout vertical alignment

Before:

 |
 | Performance counter stats for '/home/mingo/hackbench 20' (5 runs):
 |
 |        71,321,607 instructions:u           #    0.42  insns per cycle  ( +-  0.00% )
 |       168,040,009 cycles:u                 #    0.000 GHz                      ( +-  0.81% )
 |
 |        1.468002368  seconds time elapsed  ( +-  1.33% )
 |

After:

 |
 | Performance counter stats for '/home/mingo/hackbench 20' (5 runs):
 |
 |        71,321,607 instructions:u           #    0.42  insns per cycle          ( +-  0.00% )
 |       168,040,009 cycles:u                 #    0.000 GHz                      ( +-  0.81% )
 |
 |        1.468002368  seconds time elapsed  ( +-  1.33% )
 |

The last column (stddev noise) is properly aligned, vertically.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-7y40wib8n1eqio7hjpn0dsrm@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Ingo Molnar [Wed, 27 Apr 2011 08:38:30 +0000 (10:38 +0200)]
Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core

Conflicts:
include/linux/perf_event.h

Merge reason: pick up the latest jump-label enhancements; they are fully cooked and ready.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

Ingo Molnar [Wed, 27 Apr 2011 08:31:29 +0000 (10:31 +0200)]
Merge branch 'tip/perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/urgent

Ingo Molnar [Wed, 27 Apr 2011 11:50:47 +0000 (13:50 +0200)]
perf stat: Add -d/--detailed flag to run with a lot of events

Add the new -d/--detailed flag, which generates a pretty detailed event list:

 Performance counter stats for './hackbench 10' (10 runs):

       1514.287888 task-clock               #   10.897 CPUs utilized            ( +-  3.05% )
            39,698 context-switches         #    0.026 M/sec                    ( +- 12.19% )
             8,147 CPU-migrations           #    0.005 M/sec                    ( +- 16.55% )
            17,918 page-faults              #    0.012 M/sec                    ( +-  0.37% )
     2,944,504,050 cycles                   #    1.944 GHz                      ( +-  3.89% )  (32.60%)
     1,043,971,283 stalled-cycles           #   35.45% of all cycles are idle   ( +-  5.22% )  (44.48%)
     1,655,906,768 instructions             #    0.56  insns per cycle
                                            #    0.63  stalled cycles per insn  ( +-  1.95% )  (55.09%)
       338,832,373 branches                 #  223.757 M/sec                    ( +-  1.96% )  (64.47%)
         3,892,416 branch-misses            #    1.15% of all branches          ( +-  5.49% )  (73.12%)
       606,410,482 L1-dcache-loads          #  400.459 M/sec                    ( +-  1.29% )  (71.21%)
        31,204,395 L1-dcache-load-misses    #    5.15% of all L1-dcache hits    ( +-  3.04% )  (60.43%)
         3,922,751 LLC-loads                #    2.590 M/sec                    ( +-  6.80% )  (46.87%)
         5,037,288 LLC-load-misses          #    3.327 M/sec                    ( +-  3.56% )  (13.00%)

        0.138966828  seconds time elapsed  ( +-  4.11% )

This can be used "at a glance" for narrower analysis.

-d can also be used in addition to other -e events, to further expand an event list.

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-cxs98quixs3qyvdqx3goojc4@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Ingo Molnar [Wed, 27 Apr 2011 11:25:24 +0000 (13:25 +0200)]
perf stat: Print out miss/hit ratio for L1 data-cache events

Print out this kind of l1-dcache-misses percentage:

 Performance counter stats for './bw_tcp localhost':

    29,956,262,201 cycles                   #    3.002 GHz                      (scaled from 85.14%)
     8,255,209,558 stalled-cycles           #   27.56% of all cycles are idle   (scaled from 86.56%)
     1,206,130,308 l1-dcache-misses         #   40.49% of all L1-dcache hits    (scaled from 86.30%)
     2,978,756,779 l1-dcache-refs           #  298.512 M/sec                    (scaled from 70.02%)
     8,861,956,159 instructions             #    0.30  insns per cycle
                                            #    0.93  stalled cycles per insn  (scaled from 84.27%)
     1,644,306,068 branches                 #  164.782 M/sec                    (scaled from 86.43%)
        74,778,443 branch-misses            #    4.55% of all branches          (scaled from 70.69%)
       9978.695711 task-clock               #    0.693 CPUs utilized

       14.404347983  seconds time elapsed

And color the result depending on the severity of cache-thrashing.

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-54gmz0zymaid84zcs7joq02p@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Ingo Molnar [Wed, 27 Apr 2011 10:16:10 +0000 (12:16 +0200)]
perf stat: Print branch misses warning colors

Print the missed-branches percentage with different warning level ASCII colors,
as the percentage passes the 5%/10%/20% thresholds.

These thresholds are set to relatively low levels, because on most CPUs even a
moderate percentage of branch-misses already shows up as a slowdown.
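A sketch of threshold-based coloring for the branch-miss column, escalating past 5%, 10% and 20% and using standard ANSI escape sequences. Which color goes with which level is an assumption, and the helpers are hypothetical; the same scheme with 25%/50%/75% cut-offs would cover the stalled-cycles column.

```c
#include <stdio.h>

/* Standard ANSI terminal escape sequences. */
#define COLOR_RESET   "\033[m"
#define COLOR_YELLOW  "\033[33m"
#define COLOR_MAGENTA "\033[35m"
#define COLOR_RED     "\033[31m"

enum warn_level { WARN_NONE, WARN_LOW, WARN_MID, WARN_HIGH };

/* Escalate once the ratio passes 5%, 10% and 20%. */
static enum warn_level branch_miss_level(double pct)
{
    if (pct > 20.0)
        return WARN_HIGH;
    if (pct > 10.0)
        return WARN_MID;
    if (pct > 5.0)
        return WARN_LOW;
    return WARN_NONE;
}

static const char *const level_color[] = {
    [WARN_NONE] = "",
    [WARN_LOW]  = COLOR_YELLOW,
    [WARN_MID]  = COLOR_MAGENTA,
    [WARN_HIGH] = COLOR_RED,
};

/* Print the "% of all branches" column, colored by severity. */
static void print_branch_misses(double misses, double branches)
{
    double pct = branches > 0.0 ? 100.0 * misses / branches : 0.0;
    enum warn_level lvl = branch_miss_level(pct);

    printf("#  %s%6.2f%%%s of all branches\n", level_color[lvl], pct,
           lvl == WARN_NONE ? "" : COLOR_RESET);
}
```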

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-ybqukg7p86leiup7gl03ecgk@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Ingo Molnar [Wed, 27 Apr 2011 03:39:24 +0000 (05:39 +0200)]
perf stat: Print stalled cycles warning colors

Print the stalled-cycles percentage with different warning level ASCII colors,
as the percentage passes the 25%/50%/75% thresholds.

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-e25zz44rcms7mu9az4fu5zp0@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Ingo Molnar [Wed, 27 Apr 2011 03:35:39 +0000 (05:35 +0200)]
perf stat: Fix -nan% output in perf stat noise printouts

Before:

                 0 CPU-migrations           #    0.000 M/sec                    ( +-  -nan% )

After:

                 0 CPU-migrations           #    0.000 M/sec                    ( +-  0.00% )

Also factor out the noise printing function.
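The `-nan%` comes from a 0/0 division when every run counted zero. A sketch of a noise helper that guards that case; `noise_pct()` and `print_noise()` are hypothetical names, and computing the mean and standard deviation is left out.

```c
#include <stdio.h>

/*
 * Relative noise in percent: stddev / |mean| * 100. When every run
 * counted zero (e.g. 0 CPU-migrations) the naive form is 0/0, a NaN
 * that printf renders as "nan" or "-nan", so report 0% instead.
 */
static double noise_pct(double mean, double stddev)
{
    double abs_mean = mean < 0.0 ? -mean : mean;

    if (abs_mean == 0.0)
        return 0.0;
    return 100.0 * stddev / abs_mean;
}

/* Factored-out printer for the "( +- x.xx% )" column. */
static void print_noise(double mean, double stddev)
{
    printf("  ( +- %5.2f%% )\n", noise_pct(mean, stddev));
}
```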

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-z89h2v1bk1mikcbsf7e6v34q@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Ingo Molnar [Wed, 27 Apr 2011 03:20:22 +0000 (05:20 +0200)]
perf stat: Add stalled cycles to the default output

The new default output looks like this:

 Performance counter stats for './loop_1b_instructions':

        236.010686 task-clock               #    0.996 CPUs utilized
                 0 context-switches         #    0.000 M/sec
                 0 CPU-migrations           #    0.000 M/sec
                99 page-faults              #    0.000 M/sec
       756,487,646 cycles                   #    3.205 GHz
       354,938,996 stalled-cycles           #   46.92% of all cycles are idle
     1,001,403,797 instructions             #    1.32  insns per cycle
                                            #    0.35  stalled cycles per insn
       100,279,773 branches                 #  424.895 M/sec
            12,646 branch-misses            #    0.013 % of all branches

        0.236902540  seconds time elapsed

We dropped cache-refs and cache-misses and added stalled-cycles - this is a
more generic "how well utilized is the CPU" metric.

If the stalled-cycles ratio is too high then more specific measurements can be
taken to figure out the source of the inefficiency.

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-pbpl2l4mn797s69bclfpwkwn@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Ingo Molnar [Wed, 27 Apr 2011 02:34:16 +0000 (04:34 +0200)]
perf stat: Add stalled cycles accounting, prettify the resulting output

Add stalled cycles accounting and use it to print the "cycles stalled per
instruction" value.

Also change the unit of the cycles output from M/sec to GHz - this is more
intuitive.

Prettify the output to:

 Performance counter stats for './loop_1b_instructions':

        239.775036 task-clock               #    0.997 CPUs utilized
       761,903,912 cycles                   #    3.178 GHz
       356,620,620 stalled-cycles           #   46.81% of all cycles are idle
     1,001,578,351 instructions             #    1.31  insns per cycle
                                            #    0.36  stalled cycles per insn
            14,782 cache-references         #    0.062 M/sec
             5,694 cache-misses             #   38.520 % of all cache refs

        0.240493656  seconds time elapsed
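The derived columns are plain ratios of raw counts. This sketch recomputes them from the sample above: GHz as cycles per nanosecond of task-clock, insns per cycle as instructions/cycles, stalled cycles per insn as stalled-cycles/instructions, and the idle percentage as stalled-cycles/cycles. The struct and function names are illustrative.

```c
#include <stdint.h>

struct stat_counts {
    double task_clock_ns;   /* task-clock, in nanoseconds */
    uint64_t cycles;
    uint64_t stalled_cycles;
    uint64_t instructions;
};

struct stat_ratios {
    double ghz;             /* cycles per nanosecond of task-clock */
    double ipc;             /* instructions per cycle */
    double stalls_per_insn; /* stalled cycles per instruction */
    double idle_pct;        /* stalled cycles as % of all cycles */
};

/* Compute the ratio columns, leaving a ratio at 0 if its divisor is 0. */
static struct stat_ratios derive_ratios(const struct stat_counts *c)
{
    struct stat_ratios r = {0};

    if (c->task_clock_ns > 0)
        r.ghz = (double)c->cycles / c->task_clock_ns;
    if (c->cycles > 0) {
        r.ipc = (double)c->instructions / (double)c->cycles;
        r.idle_pct = 100.0 * (double)c->stalled_cycles / (double)c->cycles;
    }
    if (c->instructions > 0)
        r.stalls_per_insn = (double)c->stalled_cycles / (double)c->instructions;
    return r;
}
```

With the counts shown above this gives roughly 3.178 GHz, 1.31 insns per cycle, 0.36 stalled cycles per insn and 46.81% idle, matching the printout.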

Also adjust the --repeat output to make the percentages align vertically:

 Performance counter stats for './loop_1b_instructions' (10 runs):

        236.096793 task-clock               #    0.997 CPUs utilized             ( +-   0.011% )
       756,553,086 cycles                   #    3.204 GHz                       ( +-   0.002% )
       354,942,692 stalled-cycles           #   46.92% of all cycles are idle    ( +-   0.008% )
     1,001,389,700 instructions             #    1.32  insns per cycle
                                            #    0.35  stalled cycles per insn   ( +-   0.000% )
            10,166 cache-references         #    0.043 M/sec                     ( +-   0.742% )
               468 cache-misses             #    4.608 % of all cache refs       ( +-  13.385% )

        0.236874136  seconds time elapsed   ( +- 0.01% )

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-uapziqny39601apdmmhoz7hk@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Ingo Molnar [Wed, 27 Apr 2011 02:36:37 +0000 (04:36 +0200)]
perf stat: Factor out shadow stats

Create update_shadow_stats() which is then used in both read_counter_aggr()
and read_counter().

This not only simplifies the code but also fixes a bug: HW_CACHE_REFERENCES
was not updated in read_counter().
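The shape of the refactoring can be sketched as one helper that both read paths call, so a counter kind cannot be accounted in one path and forgotten in the other, which is exactly how HW_CACHE_REFERENCES went missing. The types and names are simplified, hypothetical stand-ins; in particular, the per-CPU path here just accumulates.

```c
#include <stdint.h>

enum counter_kind {
    KIND_TASK_CLOCK,
    KIND_CYCLES,
    KIND_STALLED_CYCLES,
    KIND_CACHE_REFS,
    KIND_OTHER,
};

/* Running totals used later for the ratio printouts. */
struct shadow_stats {
    uint64_t task_clock, cycles, stalled_cycles, cache_refs;
};

/* The one place that knows which counts feed the shadow stats. */
static void update_shadow_stats(struct shadow_stats *s,
                                enum counter_kind kind, uint64_t count)
{
    switch (kind) {
    case KIND_TASK_CLOCK:     s->task_clock += count;     break;
    case KIND_CYCLES:         s->cycles += count;         break;
    case KIND_STALLED_CYCLES: s->stalled_cycles += count; break;
    case KIND_CACHE_REFS:     s->cache_refs += count;     break;
    default:                  break;
    }
}

/* Aggregated read path: one combined count per event. */
static void read_counter_aggr(struct shadow_stats *s,
                              enum counter_kind kind, uint64_t total)
{
    update_shadow_stats(s, kind, total);
}

/* Per-CPU read path: the same helper, once per CPU's count. */
static void read_counter(struct shadow_stats *s, enum counter_kind kind,
                         const uint64_t *per_cpu, int ncpus)
{
    for (int cpu = 0; cpu < ncpus; cpu++)
        update_shadow_stats(s, kind, per_cpu[cpu]);
}
```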

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-9uc55z3g88r47exde7zxjm6p@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Ingo Molnar [Wed, 27 Apr 2011 02:24:57 +0000 (04:24 +0200)]
perf stat: Make all displayed event names parseable as well

Right now we display this by default:

          0.202204 task-clock-msecs         #      0.282 CPUs
                 0 context-switches         #      0.000 M/sec
                 0 CPU-migrations           #      0.000 M/sec
                85 page-faults              #      0.420 M/sec

The task-clock-msecs name cannot actually be passed back as an
event name: the name we recognize is 'task-clock'.

So change the output of the cpu-clock and task-clock events
to be idempotent, i.e. every printed event name can be fed back
to -e unchanged.

( Units should be printed out in the right-side column, if needed. )

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-lexrnbzy09asscgd4f7oac4i@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
13 years agoperf stat: Fail more clearly when an invalid modifier is specified
Ingo Molnar [Wed, 27 Apr 2011 02:06:33 +0000 (04:06 +0200)]
perf stat: Fail more clearly when an invalid modifier is specified

Currently we fail without printing any error message on "perf stat -e task-clock-msecs".

The reason is that the task-clock event is matched and the "-msecs" suffix
is assumed to be an event modifier, which is then not recognized.

This patch changes the code to be more informative:

 $ perf stat -e task-clock-msecs true
 invalid event modifier: '-msecs'
 Run 'perf list' for a list of valid events and modifiers

And restructures the return value of parse_event_modifier() to allow
the printing of all variants of invalid event modifiers.
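The sketch below takes whatever follows the matched event name, for example "-msecs" after "task-clock". It accepts only the letters u, k and h after a one-character separator, and it prints the two-line message shown above for anything else. The separator handling, the accepted letters and the struct are assumptions for illustration. This is not perf's parse_event_modifier().

```c
#include <stdio.h>

/* Hypothetical modifier result: which privilege levels to exclude. */
struct event_mods {
	int exclude_user, exclude_kernel, exclude_hv;
};

/* Returns 0 on success (including "no modifier"), -1 if invalid. */
int parse_event_modifier(const char *suffix, struct event_mods *m)
{
	int u = 0, k = 0, h = 0;
	const char *p;

	m->exclude_user = m->exclude_kernel = m->exclude_hv = 0;
	if (*suffix == '\0')
		return 0;                      /* no modifier: count everything */

	for (p = suffix + 1; *p; p++) {   /* skip the separator character */
		if (*p == 'u')
			u = 1;
		else if (*p == 'k')
			k = 1;
		else if (*p == 'h')
			h = 1;
		else
			goto invalid;          /* e.g. the 'm' of "-msecs" */
	}
	if (!(u || k || h))
		goto invalid;                  /* separator with nothing after it */

	/* Naming some privilege levels excludes the others. */
	m->exclude_user = !u;
	m->exclude_kernel = !k;
	m->exclude_hv = !h;
	return 0;

invalid:
	fprintf(stderr, "invalid event modifier: '%s'\n", suffix);
	fprintf(stderr, "Run 'perf list' for a list of valid events and modifiers\n");
	return -1;
}
```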

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-wlaw3dvz1ly6wple8l52cfca@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
13 years agoperf tools: Accept case-insensitive symbolic event variants
Ingo Molnar [Wed, 27 Apr 2011 01:55:40 +0000 (03:55 +0200)]
perf tools: Accept case-insensitive symbolic event variants

We currently fail on something like '-e CPU-migrations', with:

  invalid or unsupported event: 'CPU-migrations'

This happens even though 'CPU-migrations' is exactly how we print the
event in the default perf stat output:

 Performance counter stats for 'true':

          0.202204 task-clock-msecs         #      0.282 CPUs
                 0 context-switches         #      0.000 M/sec
                 0 CPU-migrations           #      0.000 M/sec

So change the matching to be case-insensitive.
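Case-insensitive matching can be done with POSIX strcasecmp(). Here is a minimal sketch over a hypothetical four-entry table; perf's real table also carries aliases and event types:

```c
#include <stddef.h>
#include <strings.h>   /* strcasecmp() is POSIX and declared here */

/* Hypothetical subset of the symbolic event names. */
static const char *const event_names[] = {
	"task-clock", "context-switches", "CPU-migrations", "page-faults",
};

/* Return the table index of name, ignoring case, or -1 if unknown. */
int lookup_event(const char *name)
{
	for (size_t i = 0; i < sizeof(event_names) / sizeof(event_names[0]); i++)
		if (strcasecmp(name, event_names[i]) == 0)
			return (int)i;
	return -1;
}
```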

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-omcm3edjjtx83a4kh2e244se@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
13 years agoperf stat: Print cache misses as percentage
Ingo Molnar [Wed, 27 Apr 2011 01:42:18 +0000 (03:42 +0200)]
perf stat: Print cache misses as percentage

Before:

       113,393,041 cache-references         #     83.636 M/sec
         7,052,454 cache-misses             #      5.202 M/sec

After:

       112,589,441 cache-references         #     87.925 M/sec
         6,556,354 cache-misses             #      5.823 %

misses/hits percentages are more expressive than absolute numbers
or rates.

(Also prettify the CPUs printout line so that it has no trailing whitespace.)
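The percentage is simply misses divided by references. The following minimal sketch uses hypothetical function names and reproduces the "After" figure:

```c
#include <stdio.h>

double cache_miss_pct(double misses, double refs)
{
	/* A run with no cache references has no meaningful miss ratio. */
	return refs > 0.0 ? 100.0 * misses / refs : 0.0;
}

/* Format like the comment column, e.g. "5.823 %"; returns snprintf()'s
 * result (the number of characters that would have been written). */
int format_miss_pct(char *buf, size_t len, double misses, double refs)
{
	return snprintf(buf, len, "%.3f %%", cache_miss_pct(misses, refs));
}
```

For example, cache_miss_pct(6556354, 112589441) is about 5.823.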

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-axm28f43x439bl41zkvfzd63@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
13 years agoperf stat: Print stalled cycles percentage
Ingo Molnar [Sun, 24 Apr 2011 13:05:10 +0000 (15:05 +0200)]
perf stat: Print stalled cycles percentage

Print:

           611,527 cycles
           400,553 instructions             # (  0.71 instructions per cycle )
            77,809 stalled-cycles           # ( 12.71% of all cycles )

        0.000610987  seconds time elapsed

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Link: http://lkml.kernel.org/n/tip-fd6x8r1cpyb6zhlrc4ix8m45@git.kernel.org
13 years agoperf events, x86: Mark constraint tables read mostly
Ingo Molnar [Wed, 27 Apr 2011 10:02:04 +0000 (12:02 +0200)]
perf events, x86: Mark constraint tables read mostly

Various constraint tables were not marked read-mostly.

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-wpqwwvmhxucy5e718wnamjiv@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
13 years agoperf events: Add stalled cycles generic event - PERF_COUNT_HW_STALLED_CYCLES
Ingo Molnar [Sun, 24 Apr 2011 06:18:31 +0000 (08:18 +0200)]
perf events: Add stalled cycles generic event - PERF_COUNT_HW_STALLED_CYCLES

The new PERF_COUNT_HW_STALLED_CYCLES event tries to approximate the
cycles in which the CPU does nothing useful because it is stalled on a
cache miss or some other condition.

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-fue11vymwqsoo5to72jxxjyl@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
13 years agoMerge branch 'perf/urgent' into perf/stat
Ingo Molnar [Tue, 26 Apr 2011 17:36:14 +0000 (19:36 +0200)]
Merge branch 'perf/urgent' into perf/stat

Merge reason: We want to queue up dependent changes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
13 years agoperf events, x86: Work around the Nehalem AAJ80 erratum
Ingo Molnar [Wed, 27 Apr 2011 09:51:41 +0000 (11:51 +0200)]
perf events, x86: Work around the Nehalem AAJ80 erratum

On Nehalem CPUs the retired branch-misses event can be completely bogus
when no branch misses are occurring. When there are a lot of branch
misses the count is pretty accurate. Still, this leaves us with an
event that over-counts a lot.

Detect this erratum and work around it by using BR_MISP_EXEC.ANY events.
These also count speculated branches, but in practice they are still a
lot more precise than the architectural event.
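Here is a sketch of the selection logic under explicit assumptions. First, the erratum is assumed to be signalled through the CPUID leaf 0xA EBX bit for "branch mispredicts retired" (bit 6). Second, 0x00c5 and 0x7f89 are assumed to be the raw encodings of the architectural event and of BR_MISP_EXEC.ANY. Check both against the Intel SDM and the Nehalem specification update before relying on them.

```c
#include <stdint.h>

/* ASSUMPTION: bit 6 of CPUID.0AH:EBX set = architectural
 * "branch mispredicts retired" event not available. */
#define ARCH_BRANCH_MISSES_UNAVAILABLE (1u << 6)
/* ASSUMPTION: raw encodings (umask << 8 | event). */
#define EVT_BR_MISP_RETIRED            0x00c5u
#define EVT_BR_MISP_EXEC_ANY           0x7f89u   /* event 0x89, umask 0x7f */

/* Pick the raw event used for generic branch-misses. */
uint32_t branch_miss_event(uint32_t cpuid_0a_ebx)
{
	if (cpuid_0a_ebx & ARCH_BRANCH_MISSES_UNAVAILABLE)
		/* Erratum flagged: use the executed-branch event, which
		 * over-counts speculated branches but is far less bogus. */
		return EVT_BR_MISP_EXEC_ANY;
	return EVT_BR_MISP_RETIRED;
}
```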

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-yyfg0bxo9jsqxd6a0ovfny27@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
13 years agoperf, x86: Fix BTS condition
Peter Zijlstra [Tue, 26 Apr 2011 11:24:33 +0000 (13:24 +0200)]
perf, x86: Fix BTS condition

Currently the x86 backend incorrectly assumes that any BRANCH_INSN
with sample_period==1 is a BTS request. This is not true when we do
frequency-driven profiling such as 'perf record -e branches'.

This fixes the following error:

  $ perf record -e branches ./array
  Error: sys_perf_event_open() syscall returned with 95 (Operation not supported).
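Here is a minimal model of the condition, using a hypothetical mirror of the relevant perf_event_attr fields. The assumption behind the bug, stated loosely, is that a frequency-driven event also starts out with a sample period of 1 before the kernel adapts it. Checking the period alone therefore misclassifies such an event as BTS.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the fields involved. In the real ABI,
 * sample_period/sample_freq share a union and freq is a bit-field. */
struct attr_sketch {
	uint64_t config;         /* generic hardware event id */
	bool     freq;           /* true: sample by frequency, not period */
	uint64_t sample_period;  /* current period as tracked by the kernel */
};

#define HW_BRANCH_INSTRUCTIONS 4  /* value of PERF_COUNT_HW_BRANCH_INSTRUCTIONS */

/* Old test: any branch-instructions event with period 1 meant BTS. */
bool is_bts_old(const struct attr_sketch *a)
{
	return a->config == HW_BRANCH_INSTRUCTIONS && a->sample_period == 1;
}

/* Fixed test: only an explicit (non-frequency) period of 1 means BTS. */
bool is_bts_fixed(const struct attr_sketch *a)
{
	return a->config == HW_BRANCH_INSTRUCTIONS && !a->freq &&
	       a->sample_period == 1;
}
```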

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reported-by: Ingo Molnar <mingo@elte.hu>
Cc: "Metzger, Markus T" <markus.t.metzger@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-rd2y4ct71hjawzz6fpvsy9hg@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
13 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs...
Linus Torvalds [Tue, 26 Apr 2011 02:01:12 +0000 (19:01 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6:
  eCryptfs: Flush dirty pages in setattr
  eCryptfs: Handle failed metadata read in lookup
  eCryptfs: Add reference counting to lower files
  eCryptfs: dput dentries returned from dget_parent
  eCryptfs: Remove extra d_delete in ecryptfs_rmdir

13 years agoMerge branch 'for-torvalds' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw...
Linus Torvalds [Tue, 26 Apr 2011 02:00:55 +0000 (19:00 -0700)]
Merge branch 'for-torvalds' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-stericsson

* 'for-torvalds' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-stericsson:
  rtc: fix coh901331 startup crash
  mach-ux500: fix i2c0 device setup regression

13 years agoSELINUX: Make selinux cache VFS RCU walks safe
Eric Paris [Mon, 25 Apr 2011 20:26:29 +0000 (16:26 -0400)]
SELINUX: Make selinux cache VFS RCU walks safe

Now that the security modules can decide whether they support the
dcache RCU walk or not, it's possible to make SELinux a bit more
RCU-friendly.  The SELinux AVC and security server access decision
code is RCU-safe.  A specific piece of the LSM audit code may not
be RCU-safe.

This patch makes the VFS RCU walk retry if it would hit the non-RCU-safe
chunk of code.  It will normally just work under RCU.  This is
done simply by passing the VFS RCU state as a flag down into the
avc_audit() code and returning -ECHILD there if it would have an issue.
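The general pattern is an RCU-walk attempt first, then a ref-walk retry on -ECHILD. It can be sketched in userspace C. Everything here is hypothetical (WALK_RCU, audit_step(), permission_check()). The sketch models the control flow described above, not the avc_audit() signature.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag: "in RCU-walk mode, must not block". */
#define WALK_RCU 0x1u

/* Stand-in for the audit step. The access decision itself is RCU-safe,
 * but part of the audit code may not be, so in RCU mode bail out with
 * -ECHILD instead of running it. */
static int audit_step(bool needs_audit, unsigned flags)
{
	if (needs_audit && (flags & WALK_RCU))
		return -ECHILD;
	return 0;  /* audit done, or not needed */
}

/* Try RCU-walk first; redo the check in ref-walk mode (no WALK_RCU)
 * on -ECHILD. Returns 1 if the slow path was needed, 0 otherwise. */
int permission_check(bool needs_audit, int *ret)
{
	*ret = audit_step(needs_audit, WALK_RCU);
	if (*ret != -ECHILD)
		return 0;
	*ret = audit_step(needs_audit, 0);
	return 1;
}
```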

Based-on-patch-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoadd hlist_bl_lock/unlock helpers
Christoph Hellwig [Mon, 25 Apr 2011 18:01:36 +0000 (14:01 -0400)]
add hlist_bl_lock/unlock helpers

Now that the whole dcache_hash_bucket crap is gone, go all the way and
also remove the weird locking layering violations for locking the hash
buckets.  Add hlist_bl_lock/unlock helpers to move the locking into the
list abstraction instead of requiring each caller to open code it.
After all, allowing for the bit lock is the whole point of the hlist_bl
variant over the plain hlist.
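The idea behind hlist_bl is that bit 0 of the head pointer doubles as a spinlock, which works because list nodes are at least 2-byte aligned. Here is a userspace sketch with C11 atomics. It uses hypothetical names (bl_head, bl_lock(), bl_unlock()) and ignores the kernel's preemption handling.

```c
#include <stdatomic.h>
#include <stdint.h>

struct node { struct node *next; };

/* Head pointer with the lock stored in its lowest bit. */
struct bl_head { _Atomic uintptr_t first; };

#define BL_LOCKMASK ((uintptr_t)1)

void bl_lock(struct bl_head *h)
{
	/* Spin until we are the one who set bit 0. */
	while (atomic_fetch_or_explicit(&h->first, BL_LOCKMASK,
					memory_order_acquire) & BL_LOCKMASK)
		;
}

void bl_unlock(struct bl_head *h)
{
	atomic_fetch_and_explicit(&h->first, ~BL_LOCKMASK, memory_order_release);
}

/* The real first-node pointer, with the lock bit masked off. */
struct node *bl_first(struct bl_head *h)
{
	return (struct node *)(atomic_load(&h->first) & ~BL_LOCKMASK);
}

/* Insert at the head; caller holds the lock, so keep bit 0 set. */
void bl_add_head_locked(struct bl_head *h, struct node *n)
{
	n->next = bl_first(h);
	atomic_store(&h->first, (uintptr_t)n | BL_LOCKMASK);
}
```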

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agobit_spinlock: don't play preemption games inside the busy loop
Linus Torvalds [Tue, 26 Apr 2011 01:10:58 +0000 (18:10 -0700)]
bit_spinlock: don't play preemption games inside the busy loop

When we are waiting for the bit-lock to be released and are looping
over 'cpu_relax()', we should not be doing anything else; otherwise we
miss the point of doing the 'cpu_relax()' at all.

Do the preemption enable/disable around the loop, rather than inside of
it.

Noticed when I was looking at the code generation for the dcache
__d_drop usage, and the code just looked very odd.
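Here is the loop shape described above, sketched in userspace with C11 atomics where the kernel uses its own bitops. preempt_disable(), preempt_enable() and cpu_relax() are stubs standing in for the kernel primitives, and the function names are hypothetical:

```c
#include <stdatomic.h>

/* Stubs for the kernel primitives. */
static int preempt_count;
static void preempt_disable(void) { preempt_count++; }
static void preempt_enable(void)  { preempt_count--; }
static void cpu_relax(void)       { /* e.g. PAUSE on x86 */ }

void bit_spin_lock_sketch(atomic_ulong *word, unsigned bit)
{
	unsigned long mask = 1UL << bit;

	preempt_disable();
	while (atomic_fetch_or(word, mask) & mask) {
		/* Contended: re-enable preemption once, then spin with
		 * nothing but cpu_relax() inside the busy loop. */
		preempt_enable();
		do {
			cpu_relax();
		} while (atomic_load(word) & mask);
		preempt_disable();
	}
}

void bit_spin_unlock_sketch(atomic_ulong *word, unsigned bit)
{
	atomic_fetch_and(word, ~(1UL << bit));
	preempt_enable();
}

int preempt_depth(void) { return preempt_count; }
```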

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoeCryptfs: Flush dirty pages in setattr
Tyler Hicks [Fri, 22 Apr 2011 18:08:00 +0000 (13:08 -0500)]
eCryptfs: Flush dirty pages in setattr

After commit 57db4e8d73ef2b5e94a3f412108dff2576670a8a changed eCryptfs to
write-back caching, eCryptfs page writeback updates the lower inode
times due to the use of vfs_write() on the lower file.

To preserve inode metadata changes, such as 'cp -p' does with
utimensat(), we need to flush all dirty pages early in
ecryptfs_setattr() so that the user-updated lower inode metadata isn't
clobbered later in writeback.
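From userspace, the sequence that must keep working is the one 'cp -p' performs: write the data, then set the timestamps. Here is a minimal POSIX sketch. The helper name and the temporary path are made up for the example. The fsync() after utimensat() forces the writeback that, on an affected eCryptfs mount, could overwrite the times.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>      /* AT_FDCWD */
#include <stdlib.h>     /* mkstemp() */
#include <sys/stat.h>   /* utimensat(), fstat() */
#include <time.h>       /* struct timespec */
#include <unistd.h>     /* write(), fsync(), close(), unlink() */

/* Returns 1 if the mtime set by utimensat() survives a flush,
 * 0 if it was changed, -1 on error. */
int times_survive_writeback(void)
{
	char path[] = "/tmp/utimes-sketchXXXXXX";
	struct stat st;
	int ok = -1;
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	if (write(fd, "data", 4) == 4) {
		/* 10^9 seconds after the epoch, for both atime and mtime */
		struct timespec ts[2] = { { 1000000000, 0 }, { 1000000000, 0 } };

		if (utimensat(AT_FDCWD, path, ts, 0) == 0 &&
		    fsync(fd) == 0 && fstat(fd, &st) == 0)
			ok = (st.st_mtime == 1000000000);
	}
	close(fd);
	unlink(path);
	return ok;
}
```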

https://bugzilla.kernel.org/show_bug.cgi?id=33372

Reported-by: Rocko <rockorequin@hotmail.com>
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
13 years agoeCryptfs: Handle failed metadata read in lookup
Tyler Hicks [Tue, 15 Mar 2011 19:54:00 +0000 (14:54 -0500)]
eCryptfs: Handle failed metadata read in lookup

When failing to read the lower file's crypto metadata during a lookup,
eCryptfs must continue on without throwing an error. For example, there
may be a plaintext file in the lower mount point that the user wants to
delete through the eCryptfs mount.

If an error is encountered while reading the metadata in lookup(), the
eCryptfs inode's size could be incorrect. We must be sure to reread the
plaintext inode size from the metadata when performing an open() or
setattr(). The metadata is already being read in those paths, so this
adds minimal performance overhead.

This patch introduces a flag which will track whether or not the
plaintext inode size has been read so that an incorrect i_size can be
fixed in the open() or setattr() paths.
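Here is a minimal model of the flag-based fix. The flag, struct and function names are hypothetical, and the real eCryptfs code differs. The point is that lookup() tolerates a failed metadata read, and open()/setattr() re-read the size only when the flag says it was never read successfully:

```c
#include <stdbool.h>
#include <stdint.h>

#define SIZE_INITIALIZED 0x1u   /* hypothetical flag bit */

struct ecfs_inode {
	unsigned flags;
	int64_t  i_size;            /* possibly wrong until initialized */
	int64_t  metadata_size;     /* stand-in for the size in the header */
	bool     metadata_readable; /* false e.g. for a plaintext lower file */
};

static int read_size_from_metadata(struct ecfs_inode *ei)
{
	if (!ei->metadata_readable)
		return -1;
	ei->i_size = ei->metadata_size;
	ei->flags |= SIZE_INITIALIZED;
	return 0;
}

/* lookup(): a failed metadata read is not an error here. */
void lookup_sketch(struct ecfs_inode *ei)
{
	(void)read_size_from_metadata(ei);
}

/* open()/setattr(): fix up i_size only if lookup did not manage to. */
int open_sketch(struct ecfs_inode *ei)
{
	if (ei->flags & SIZE_INITIALIZED)
		return 0;
	return read_size_from_metadata(ei);
}
```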

https://bugs.launchpad.net/bugs/509180

Cc: <stable@kernel.org>
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
13 years agoeCryptfs: Add reference counting to lower files
Tyler Hicks [Thu, 14 Apr 2011 20:35:11 +0000 (15:35 -0500)]
eCryptfs: Add reference counting to lower files

For any given lower inode, eCryptfs keeps only one lower file open and
multiplexes all eCryptfs file operations through that lower file. The
lower file was considered "persistent" and stayed open from the first
lookup through the lifetime of the inode.

This patch keeps the notion of a single, per-inode lower file, but adds
reference counting around the lower file so that it is closed when not
currently in use. If the reference count is at 0 when an operation (such
as open, create, etc.) needs to use the lower file, a new lower file is
opened. Since the file is no longer persistent, all references to the
term persistent file are changed to lower file.

Locking is added around the sections of code that open the lower file
and assign the pointer in the inode info, as well as around the code that
fputs the lower file when all eCryptfs users are done with it.

This patch is needed to fix issues when eCryptfs is mounted on top of the
NFSv3 client, where the lower file is left silly-renamed until the eCryptfs
inode is destroyed.
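Here is a minimal model of a per-inode, reference-counted lower file. The struct and helpers are hypothetical. open_lower() and close_lower() only count calls, standing in for opening and fput()ing the real lower file, and a plain mutex-protected counter keeps the sketch simple:

```c
#include <pthread.h>

struct lower_file_ref {
	pthread_mutex_t lock;
	int count;          /* current users of the lower file */
	int is_open;
	int opens, closes;  /* call counters for the example */
};

static void open_lower(struct lower_file_ref *r)  { r->is_open = 1; r->opens++; }
static void close_lower(struct lower_file_ref *r) { r->is_open = 0; r->closes++; }

void get_lower_file(struct lower_file_ref *r)
{
	pthread_mutex_lock(&r->lock);
	if (r->count++ == 0)
		open_lower(r);      /* first user opens a fresh lower file */
	pthread_mutex_unlock(&r->lock);
}

void put_lower_file(struct lower_file_ref *r)
{
	pthread_mutex_lock(&r->lock);
	if (--r->count == 0)
		close_lower(r);     /* last user closes it */
	pthread_mutex_unlock(&r->lock);
}
```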

Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
13 years agoeCryptfs: dput dentries returned from dget_parent
Tyler Hicks [Tue, 12 Apr 2011 16:23:09 +0000 (11:23 -0500)]
eCryptfs: dput dentries returned from dget_parent

Call dput on the dentries previously returned by dget_parent() in
ecryptfs_rename(). This is needed to support eCryptfs mounts on top
of the NFSv3 client.

Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
13 years agoeCryptfs: Remove extra d_delete in ecryptfs_rmdir
Tyler Hicks [Tue, 12 Apr 2011 16:21:36 +0000 (11:21 -0500)]
eCryptfs: Remove extra d_delete in ecryptfs_rmdir

vfs_rmdir() already calls d_delete() on the lower dentry. That was being
duplicated in ecryptfs_rmdir() and caused a NULL pointer dereference
when NFSv3 was the lower filesystem.

Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
13 years agoMerge branch 'dcache-cleanup'
Linus Torvalds [Sun, 24 Apr 2011 15:51:15 +0000 (08:51 -0700)]
Merge branch 'dcache-cleanup'

* dcache-cleanup:
  vfs: get rid of insane dentry hashing rules

13 years agoMerge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzi...
Linus Torvalds [Sun, 24 Apr 2011 15:45:37 +0000 (08:45 -0700)]
Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
  libata: ahci_start_engine compliant to AHCI spec
  ata: pata_at91.c bugfix for initial_timing initialisation
  ata: pata_at91.c bugfix for high master clock
  ahci: AHCI-mode SATA patch for Intel Panther Point DeviceIDs
  ata_piix: IDE-mode SATA patch for Intel Panther Point DeviceIDs
  libata: Pioneer DVR-216D can't do SETXFER
  ahci: don't enable port irq before handler is registered
  libata: Implement ATA_FLAG_NO_DIPM and apply it to mcp65
  libata: Kill unused ATA_DFLAG_{H|D}IPM flags
  ahci: EM supported message type sysfs attribute

13 years agoMerge branch 'for-linus' of git://git.infradead.org/ubifs-2.6
Linus Torvalds [Sun, 24 Apr 2011 15:42:15 +0000 (08:42 -0700)]
Merge branch 'for-linus' of git://git.infradead.org/ubifs-2.6

* 'for-linus' of git://git.infradead.org/ubifs-2.6:
  UBIFS: fix master node recovery
  UBIFS: fix false assertion warning in case of I/O failures
  UBIFS: fix false space checking failure

13 years agolibata: ahci_start_engine compliant to AHCI spec
Jian Peng [Sat, 23 Apr 2011 06:58:10 +0000 (23:58 -0700)]
libata: ahci_start_engine compliant to AHCI spec

At the end of section 10.1 of the AHCI spec (rev 1.3), it states:

Software shall not set PxCMD.ST to 1 until it is determined that
a functional device is present on the port as determined by
PxTFD.STS.BSY=0, PxTFD.STS.DRQ=0 and PxSSTS.DET=3h

Even though most AHCI host controllers work without this check,
some controllers fail if it is not done.
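Here is the quoted condition, written as a pure function over raw register values. The function name is hypothetical. The bit positions follow the standard ATA status register (BSY = bit 7, DRQ = bit 3, in the low byte of PxTFD), and PxSSTS.DET occupies bits 3:0:

```c
#include <stdbool.h>
#include <stdint.h>

#define ATA_STATUS_BSY 0x80u   /* PxTFD.STS.BSY */
#define ATA_STATUS_DRQ 0x08u   /* PxTFD.STS.DRQ */
#define SSTS_DET_MASK  0x0fu   /* PxSSTS.DET */
#define SSTS_DET_READY 0x3u    /* device present, PHY communication up */

/* May software set PxCMD.ST to 1, per AHCI 1.3 section 10.1? */
bool ahci_may_set_st(uint32_t pxtfd, uint32_t pxssts)
{
	if (pxtfd & (ATA_STATUS_BSY | ATA_STATUS_DRQ))
		return false;    /* device busy or data request pending */
	return (pxssts & SSTS_DET_MASK) == SSTS_DET_READY;
}
```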

Signed-off-by: Jian Peng <jipeng2005@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
13 years agoata: pata_at91.c bugfix for initial_timing initialisation
Igor Plyatov [Mon, 28 Mar 2011 12:56:14 +0000 (16:56 +0400)]
ata: pata_at91.c bugfix for initial_timing initialisation

The "struct ata_timing" must contain 10 members, but the ".dmack_hold" member
was left out of the "initial_timing" initialisation. This patch fixes that.
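If the initializer is positional, as the "10 members" wording suggests, leaving out a member in the middle shifts every later value. The illustration below uses a struct modeled on struct ata_timing. The member order is assumed and the values are illustrative:

```c
/* ASSUMED member order, modeled on libata's struct ata_timing. */
struct timing_sketch {
	unsigned short mode, setup, act8b, rec8b, cyc8b;
	unsigned short active, recover, dmack_hold, cycle, udma;
};

/* Nine positional values for ten members: each value after 'recover'
 * lands one member too early. The intended cycle time ends up in
 * dmack_hold, and cycle gets the value meant for udma. */
const struct timing_sketch buggy = { 1, 70, 290, 240, 600, 165, 150, 600, 0 };

/* Designated initializers cannot shift: every value goes where named. */
const struct timing_sketch fixed = {
	.mode = 1, .setup = 70, .act8b = 290, .rec8b = 240, .cyc8b = 600,
	.active = 165, .recover = 150, .dmack_hold = 0, .cycle = 600, .udma = 0,
};
```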

Signed-off-by: Igor Plyatov <plyatov@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
13 years agoata: pata_at91.c bugfix for high master clock
Igor Plyatov [Mon, 28 Mar 2011 12:56:15 +0000 (16:56 +0400)]
ata: pata_at91.c bugfix for high master clock

On AT91SAM9 microcontrollers with a master clock higher than 105 MHz,
PIO0 timing makes the NCS_RD_PULSE value overflow in its MSB. This makes
the "NCS_RD_PULSE" pulse longer than the "NRD_CYCLE" pulse, and the
driver does not detect the ATA device.

Signed-off-by: Igor Plyatov <plyatov@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
13 years agoahci: AHCI-mode SATA patch for Intel Panther Point DeviceIDs
Seth Heasley [Wed, 20 Apr 2011 15:45:20 +0000 (08:45 -0700)]
ahci: AHCI-mode SATA patch for Intel Panther Point DeviceIDs

The previously submitted patch was word-wrapped.

This patch adds the AHCI-mode SATA DeviceIDs for the Intel Panther Point PCH.

Signed-off-by: Seth Heasley <seth.heasley@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
13 years agoata_piix: IDE-mode SATA patch for Intel Panther Point DeviceIDs
Seth Heasley [Wed, 20 Apr 2011 15:43:37 +0000 (08:43 -0700)]
ata_piix: IDE-mode SATA patch for Intel Panther Point DeviceIDs

The previously submitted patch was word-wrapped.

This patch adds the IDE-mode SATA DeviceIDs for the Intel Panther
Point PCH.

Signed-off-by: Seth Heasley <seth.heasley@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
13 years agolibata: Pioneer DVR-216D can't do SETXFER
Jeff Mahoney [Tue, 19 Apr 2011 15:13:32 +0000 (11:13 -0400)]
libata: Pioneer DVR-216D can't do SETXFER

 Commit 4a5610a04d415ed94af75bb1159d2621d62c8328 fixed an issue with
 the Pioneer DVR-212D not handling SETXFER correctly. An openSUSE user
 reported a similar issue with his DVR-216D that the NOSETXFER horkage
 worked around for him as well.

 This patch adds the DVR-216D (1.08) to the horkage list for NOSETXFER.

 The issue was reported at:
 https://bugzilla.novell.com/show_bug.cgi?id=679143

Reported-by: Volodymyr Kyrychenko <vladimir.kirichenko@gmail.com>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
13 years agoahci: don't enable port irq before handler is registered
Maxime Bizon [Wed, 16 Mar 2011 13:58:32 +0000 (14:58 +0100)]
ahci: don't enable port irq before handler is registered

ahci_pmp_attach() and ahci_pmp_detach() unmask port irqs, but they
are also called during port initialization, before the ahci host irq
handler is registered. On the ce4100 platform, this sometimes triggers
an "irq 4: nobody cared" message when loading the driver.

Fix this by not touching the register if the port is in the frozen
state, and mark all uninitialized ports as frozen.
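Here is a minimal model of the fix, with hypothetical names. Ports start frozen, and the PMP attach path leaves the interrupt mask alone while the port is frozen:

```c
#include <stdbool.h>
#include <stdint.h>

struct port_sketch {
	bool frozen;
	uint32_t irq_mask;   /* stand-in for the port's IRQ enable register */
};

#define PMP_IRQ_BITS 0x1u   /* hypothetical bits unmasked for PMP */

void port_init(struct port_sketch *p)
{
	p->frozen = true;        /* every uninitialized port starts frozen */
	p->irq_mask = 0;
}

void pmp_attach(struct port_sketch *p)
{
	/* Only touch the mask on a thawed port; before the host IRQ
	 * handler is registered the port is still frozen. */
	if (!p->frozen)
		p->irq_mask |= PMP_IRQ_BITS;
}
```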

Signed-off-by: Maxime Bizon <mbizon@freebox.fr>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
13 years agolibata: Implement ATA_FLAG_NO_DIPM and apply it to mcp65
Tejun Heo [Wed, 16 Mar 2011 10:14:55 +0000 (11:14 +0100)]
libata: Implement ATA_FLAG_NO_DIPM and apply it to mcp65

NVIDIA mcp65 family of controllers cause command timeouts when DIPM
is used.  Implement ATA_FLAG_NO_DIPM and apply it.

This problem was reported by Stefan Bader in the following thread.

 http://thread.gmane.org/gmane.linux.ide/48841

stable: applicable to 2.6.37 and 38.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Stefan Bader <stefan.bader@canonical.com>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
13 years agolibata: Kill unused ATA_DFLAG_{H|D}IPM flags
Tejun Heo [Wed, 16 Mar 2011 10:14:25 +0000 (11:14 +0100)]
libata: Kill unused ATA_DFLAG_{H|D}IPM flags

ATA_DFLAG_{H|D}IPM flags are no longer used.  Kill them.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
13 years agoahci: EM supported message type sysfs attribute
Hannes Reinecke [Fri, 4 Mar 2011 08:54:52 +0000 (09:54 +0100)]
ahci: EM supported message type sysfs attribute

This patch adds a sysfs attribute 'em_message_supported' to the
ahci host device which prints out the supported enclosure management
message types.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
13 years agokconfig: Avoid buffer underrun in choice input
Ben Hutchings [Sat, 23 Apr 2011 17:42:56 +0000 (18:42 +0100)]
kconfig: Avoid buffer underrun in choice input

Commit 40aee729b350 ('kconfig: fix default value for choice input')
fixed some cases where kconfig would select the wrong option from a
choice with a single valid option and thus enter an infinite loop.

However, this broke the test for user input of the form 'N?', because
when kconfig selects the single valid option the input is zero-length
and the test will read the byte before the input buffer.  If this
happens to contain '?' (as it will in a mips build on Debian unstable
today) then kconfig again enters an infinite loop.
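Here are the underrun and its guard as a standalone helper. The name is hypothetical, and kconfig's own code is structured differently:

```c
#include <stdbool.h>
#include <string.h>

/* Does the user's answer end in '?' (a request for help)? Indexing
 * line[strlen(line) - 1] unconditionally reads the byte before the
 * buffer when the line is empty, so check the length first. */
bool wants_help(const char *line)
{
	size_t len = strlen(line);

	return len > 0 && line[len - 1] == '?';
}
```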

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: stable@kernel.org [2.6.17+]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agovfs: get rid of insane dentry hashing rules
Linus Torvalds [Sun, 24 Apr 2011 14:58:46 +0000 (07:58 -0700)]
vfs: get rid of insane dentry hashing rules

The dentry hashing rules have been really quite complicated for a long
while, in odd ways.  That made functions like __d_drop() very fragile
and non-obvious.

In particular, whether a dentry was hashed or not was indicated with an
explicit DCACHE_UNHASHED bit.  That's despite the fact that the hash
abstraction that the dentries use actually has an 'is this entry hashed
or not' model (which is a simple test of the 'pprev' pointer).

The reason that was done is because we used the normal 'is this entry
unhashed' model to mark whether the dentry had _ever_ been hashed in the
dentry hash tables, and that logic goes back many years (commit
b3423415fbc2: "dcache: avoid RCU for never-hashed dentries").

That, in turn, meant that __d_drop had totally different unhashing logic
for the dentry hash table case and for the anonymous dcache case,
because in order to use the "is this dentry hashed" logic as a flag for
whether it had ever been on the RCU hash table, we had to unhash such a
dentry differently so that we'd never think that it wasn't 'unhashed'
and wouldn't be freed correctly.

That's just insane.  It made the logic really hard to follow, when there
were two different kinds of "unhashed" states, and one of them (the one
that used "hlist_bl_unhashed()") really had nothing at all to do with
being unhashed per se, but with a very subtle lifetime rule instead.

So turn all of it around, and make it logical.

Instead of having a DCACHE_UNHASHED bit in d_flags to indicate whether
the dentry is on the hash chains or not, use the hash chain unhashed
logic for that.  Suddenly "d_unhashed()" just uses "hlist_bl_unhashed()",
and everything makes sense.

And for the lifetime rule, just use an explicit DCACHE_RCUACCESS bit.
If we ever insert the dentry into the dentry hash table so that it is
visible to RCU lookup, we mark it DCACHE_RCUACCESS to show that it now
needs the RCU lifetime rules.  Now suddenly that test at dentry free
time makes sense too.

And because unhashing now is sane and doesn't depend on where the dentry
got unhashed from (because the dentry hash chain details don't have
some subtle side effects), we can re-unify the __d_drop() logic and use
common code for the unhashing.

Also fix one more open-coded hash chain bit_spin_lock() that I missed in
the previous chain locking cleanup commit.
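The two ideas can be modeled in userspace with hypothetical names. First, "hashed" just means "pprev is non-NULL", as with hlist_bl nodes. Second, a separate flag records that the entry was ever visible to RCU lookup. This is a sketch, not the dcache code: it has no locking and no RCU.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_RCUACCESS 0x1u   /* hypothetical "ever RCU-visible" flag */

struct hnode { struct hnode *next, **pprev; };

struct dentry_sketch {
	unsigned flags;
	struct hnode hash;
};

bool d_unhashed_sketch(const struct dentry_sketch *d)
{
	return d->hash.pprev == NULL;
}

/* Add at the head of a chain; the entry is now reachable by lookup. */
void d_hash_sketch(struct dentry_sketch *d, struct hnode **head)
{
	d->hash.next = *head;
	if (*head)
		(*head)->pprev = &d->hash.next;
	*head = &d->hash;
	d->hash.pprev = head;
	d->flags |= SKETCH_RCUACCESS;
}

/* One unhash path for every case. */
void d_drop_sketch(struct dentry_sketch *d)
{
	if (d_unhashed_sketch(d))
		return;
	*d->hash.pprev = d->hash.next;
	if (d->hash.next)
		d->hash.next->pprev = d->hash.pprev;
	d->hash.pprev = NULL;
}

/* At free time, only ever-RCU-visible entries need a grace period. */
bool needs_rcu_free(const struct dentry_sketch *d)
{
	return d->flags & SKETCH_RCUACCESS;
}
```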

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoperf events, x86, P4: Fix typo in comment
Justin P. Mattock [Fri, 22 Apr 2011 17:08:52 +0000 (10:08 -0700)]
perf events, x86, P4: Fix typo in comment

Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Acked-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: trivial@kernel.org
Link: http://lkml.kernel.org/r/1303492132-3004-1-git-send-email-justinmattock@gmail.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
13 years agoMerge branch 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspe...
Linus Torvalds [Sun, 24 Apr 2011 05:35:16 +0000 (22:35 -0700)]
Merge branch 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6

* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
  PM: Add missing syscore_suspend() and syscore_resume() calls
  PM: Fix error code paths executed after failing syscore_suspend()

13 years agovfs: get rid of 'struct dcache_hash_bucket' abstraction
Linus Torvalds [Sun, 24 Apr 2011 05:32:03 +0000 (22:32 -0700)]
vfs: get rid of 'struct dcache_hash_bucket' abstraction

It's a useless abstraction for 'hlist_bl_head', and it doesn't actually
help anything - quite the reverse.  All the users end up having to know
about the hlist_bl_head details anyway, using 'struct hlist_bl_node *'
etc. So it just makes the code look confusing.

And the cost of it is extra '&b->head' syntactic noise, but more
importantly it spuriously makes the hash table dentry list look
different from the per-superblock DCACHE_DISCONNECTED dentry list.

As a result, the code ended up using ad-hoc locking for one case and
special helper functions for what is really another totally identical
case in the very same function.

Make it all look and work the same.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoMerge branch 'tty-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh...
Linus Torvalds [Fri, 22 Apr 2011 23:19:19 +0000 (16:19 -0700)]
Merge branch 'tty-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6

* 'tty-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6:
  tty/n_gsm: fix bug in CRC calculation for gsm1 mode
  serial/imx: read cts state only after acking cts change irq
  parport_pc.c: correctly release the requested region for the IT887x

13 years agoSECURITY: Move exec_permission RCU checks into security modules
Andi Kleen [Fri, 22 Apr 2011 00:23:19 +0000 (17:23 -0700)]
SECURITY: Move exec_permission RCU checks into security modules

Right now all RCU walks fall back to reference walk when CONFIG_SECURITY
is enabled, even when only the standard capability module is active.
This is because security_inode_exec_permission unconditionally fails
RCU walks.

Move this decision to the low-level security module. This requires
passing the RCU flags down the security hook. This way at least
the capability module and a few easy cases in selinux/smack work
with RCU walks with CONFIG_SECURITY=y.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
Linus Torvalds [Fri, 22 Apr 2011 21:59:07 +0000 (14:59 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ALSA: hda - Fix unused warnings when !SND_HDA_NEEDS_RESUME
  ALSA: hda - Add a fix-up for Acer dmic with ALC271x codec
  ASoC: add a module alias to the FSI driver
  ALSA: emu10k1 - Fix "Music" controls to "Synth" controls in documents
  ARM: s3c2440: gta02; Register dfbmcs320 device for BT audio interface
  ASoC: codecs: JZ4740: Fix OOPS
  ASoC: Fix output PGA enabling in wm_hubs CODECs
  ASoC: sn95031: decorate function with __devexit_p()
  ASoC: SAMSUNG: Fix the inverted clocks handling for pcm driver
  ASoC: sst_platform: Fix lock acquring
  ASoC: fsi: driver safely remove for against irq
  ASoC: fsi: modify vague PM control on probe
  ASoC: fsi: take care in failing case of dai register
  MAINTAINERS: Update Samsung ASoC maintainer's id
  ASoC: WM8903: HP and Line out PGA/mixer DAPM fixes
  ASoC: Set left channel volume update bits for WM8994
  ASoC: fix config error path
  ASoC: check channel mismatch between cpu_dai and codec_dai
  ASoC: Tegra: Suspend/resume support

13 years agoMerge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 22 Apr 2011 18:31:27 +0000 (11:31 -0700)]
Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf, x86: Update/fix Intel Nehalem cache events
  perf, x86: P4 PMU - Don't forget to clear cpuc->active_mask on overflow
  x86, perf event: Turn off unstructured raw event access to offcore registers
  perf: Support Xeon E7's via the Westmere PMU driver

13 years agoMerge branch 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 22 Apr 2011 18:31:21 +0000 (11:31 -0700)]
Merge branch 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  xtensa: Fixup irq conversion fallout and nmi_count

13 years agoperf, x86: Update/fix Intel Nehalem cache events
Peter Zijlstra [Fri, 22 Apr 2011 11:39:56 +0000 (13:39 +0200)]
perf, x86: Update/fix Intel Nehalem cache events

Change the Nehalem cache events to use retired memory instruction counters
(similar to Westmere); this greatly improves the provided stats.

Using:

/* x86-64, GCC inline asm: 1e9 iterations of one load plus one store */
int main(void)
{
        int i;

        for (i = 0; i < 1000000000; i++) {
                asm("mov (%%rsp), %%rbx;"
                    "mov %%rbx, (%%rsp);" : : : "rbx");
        }
        return 0;
}

We find:

 $ perf stat --repeat 10 -e instructions:u -e l1-dcache-loads:u -e l1-dcache-stores:u ./loop_1b_loads+stores
  Performance counter stats for './loop_1b_loads+stores' (10 runs):
      4,000,081,056 instructions:u           #      0.000 IPC ( +-   0.000% )
      4,999,502,846 l1-dcache-loads:u          ( +-   0.008% )
      1,000,034,832 l1-dcache-stores:u         ( +-   0.000% )
         1.565184942  seconds time elapsed   ( +-   0.005% )

The 5 billion loads are surprising; we'd expect 1 billion:

 $ perf stat --repeat 10 -e instructions:u -e r10b:u -e l1-dcache-stores:u ./loop_1b_loads+stores
  Performance counter stats for './loop_1b_loads+stores' (10 runs):
      4,000,081,054 instructions:u           #      0.000 IPC ( +-   0.000% )
      1,000,021,961 r10b:u                     ( +-   0.000% )
      1,000,030,951 l1-dcache-stores:u         ( +-   0.000% )
         1.565055422  seconds time elapsed   ( +-   0.003% )

This patch fixes that.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Lin Ming <ming.m.lin@intel.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Link: http://lkml.kernel.org/n/tip-q9rtru7b7840tws75xzboapv@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
13 years agoperf, x86: P4 PMU - Don't forget to clear cpuc->active_mask on overflow
Cyrill Gorcunov [Thu, 21 Apr 2011 15:03:21 +0000 (11:03 -0400)]
perf, x86: P4 PMU - Don't forget to clear cpuc->active_mask on overflow

Simply disabling the event on overflow is not enough: cpuc->active_mask
must be cleared as well. Otherwise the counter may remain marked "active"
even though it has already been disabled, which can prevent the counter
from being used again later.

Don pointed out that:

 " I also noticed this patch fixed some unknown NMIs
   on a P4 when I stressed the box".

Tested-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Link: http://lkml.kernel.org/r/1303398203-2918-3-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
13 years agoperf, x86: P4 PMU -- Use perf_sample_data_init helper
Cyrill Gorcunov [Thu, 21 Apr 2011 15:03:20 +0000 (11:03 -0400)]
perf, x86: P4 PMU -- Use perf_sample_data_init helper

Use the perf_sample_data_init() helper instead of open-coded
assignments.

Tested-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Link: http://lkml.kernel.org/r/1303398203-2918-2-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
13 years agoMerge branch 'linus' into perf/core
Ingo Molnar [Fri, 22 Apr 2011 08:19:26 +0000 (10:19 +0200)]
Merge branch 'linus' into perf/core

Merge reason: Pick up upstream fixes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
13 years agox86, perf event: Turn off unstructured raw event access to offcore registers
Ingo Molnar [Fri, 22 Apr 2011 06:44:38 +0000 (08:44 +0200)]
x86, perf event: Turn off unstructured raw event access to offcore registers

Andi Kleen pointed out that the Intel offcore support patches were merged
without user-space tool support for the functionality:

 |
 | The offcore_msr perf kernel code was merged into 2.6.39-rc*, but the
 | user space bits were not. This made it impossible to set the extra mask
 | and actually do the OFFCORE profiling
 |

Andi submitted a preliminary patch for user-space support, as an
extension to perf's raw event syntax:

 |
 | Some raw events -- like the Intel OFFCORE events -- support additional
 | parameters. These can be appended after a ':'.
 |
 | For example on a multi socket Intel Nehalem:
 |
 |    perf stat -e r1b7:20ff -a sleep 1
 |
 | Profile the OFFCORE_RESPONSE.ANY_REQUEST with event mask REMOTE_DRAM_0
 | that measures any access to DRAM on another socket.
 |

But this kind of usability is absolutely unacceptable - users should not
be expected to type in magic, CPU and model specific incantations to get
access to useful hardware functionality.

The proper solution is to expose useful offcore functionality via
generalized events - that way users do not have to care which specific
CPU model they are using; they can use the conceptual event and not some
model-specific quirky hex number.

We already have such generalization in place for CPU cache events,
and it's all very extensible.

"Offcore" events measure general DRAM access patterns along various
parameters. They are particularly useful in NUMA systems.

We want to support them via generalized DRAM events: either as the
fourth level of cache (after the last-level cache), or as a separate
generalization category.

That way user-space support would be very obvious, memory access
profiling could be done via self-explanatory commands like:

  perf record -e dram ./myapp
  perf record -e dram-remote ./myapp

... to measure DRAM accesses or more expensive cross-node NUMA DRAM
accesses.

These generalized events would work on all CPUs and architectures that
have comparable PMU features.

( Note, these are just examples: an actual implementation could have more
  sophistication and more parameters, as long as they center around
  similarly simple use cases. )

Now we do not want to revert *all* of the current offcore bits, as they
are still somewhat useful for generic last-level-cache events, implemented
in this commit:

  e994d7d23a0b: perf: Fix LLC-* events on Intel Nehalem/Westmere

But we definitely do not yet want to expose the unstructured raw events
to user-space, until better generalization and usability is implemented
for these hardware event features.

( Note: after generalization has been implemented, raw offcore events can be
  supported as well: there can always be an odd event that is marginally
  useful but not useful enough to generalize. DRAM profiling is definitely
  *not* such a category, so generalization must be done first. )

Furthermore, PERF_TYPE_RAW access to these registers was not intended
to go upstream without proper support - it was a side-effect of the above
e994d7d23a0b commit, not mentioned in the changelog.

As v2.6.39 is nearing release, we go for the simplest approach: disable
the PERF_TYPE_RAW offcore hack for now, before it escapes into a released
kernel and becomes an ABI.

Once proper structure is implemented for these hardware events and users
are offered usable solutions, we can revisit this issue.

Reported-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1302658203-4239-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
13 years agoperf: Support Xeon E7's via the Westmere PMU driver
Andi Kleen [Thu, 21 Apr 2011 23:48:35 +0000 (16:48 -0700)]
perf: Support Xeon E7's via the Westmere PMU driver

Intel has made a new model number, 47, public for the Xeon E7 (aka
Westmere-EX); handle it with the Westmere PMU driver.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: a.p.zijlstra@chello.nl
Link: http://lkml.kernel.org/r/1303429715-10202-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
13 years agoMerge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
Linus Torvalds [Thu, 21 Apr 2011 17:50:56 +0000 (10:50 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block

* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  ide: unexport DISK_EVENT_MEDIA_CHANGE for ide-gd and ide-cd
  block: don't propagate unlisted DISK_EVENTs to userland
  elevator: check for ELEVATOR_INSERT_SORT_MERGE in !elvpriv case too

13 years agoide: unexport DISK_EVENT_MEDIA_CHANGE for ide-gd and ide-cd
Tejun Heo [Thu, 21 Apr 2011 17:43:59 +0000 (19:43 +0200)]
ide: unexport DISK_EVENT_MEDIA_CHANGE for ide-gd and ide-cd

check_events() implementations in both ide-gd and ide-cd are
inadequate for in-kernel event polling.  Both generate media change
events continuously when certain conditions are met, causing an infinite
event loop between the driver and the userland event handler.

As disk events now support suppression of unlisted events, simply
de-listing DISK_EVENT_MEDIA_CHANGE from disk->events resolves the
problem.  Internal handling around media revalidation behaves the
same, while userland falls back to its own event polling after
detecting that the device doesn't support disk events.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jens Axboe <jaxboe@fusionio.com>
Acked-by: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agoblock: don't propagate unlisted DISK_EVENTs to userland
Tejun Heo [Thu, 21 Apr 2011 17:43:58 +0000 (19:43 +0200)]
block: don't propagate unlisted DISK_EVENTs to userland

DISK_EVENT_MEDIA_CHANGE is used both as a userland-visible event and as an
internal event for revalidation of removable devices.  Some legacy
drivers don't implement proper event detection and continuously
generate events under certain circumstances.  For example, ide-cd
continuously generates media change events if there's no media in the
drive, which can lead to an infinite loop of events bouncing back and
forth between the driver and the userland event handler.

This patch updates disk event infrastructure such that it never
propagates events not listed in disk->events to userland.  Those
events are processed the same for internal purposes but uevent
generation is suppressed.

This also ensures that userland only gets events which are advertised
in the @events sysfs node, lowering the risk of confusion.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agoelevator: check for ELEVATOR_INSERT_SORT_MERGE in !elvpriv case too
Jens Axboe [Thu, 21 Apr 2011 17:28:35 +0000 (19:28 +0200)]
elevator: check for ELEVATOR_INSERT_SORT_MERGE in !elvpriv case too

The sort insert is the one that goes to the IO scheduler. With
the SORT_MERGE addition, we could bypass IO scheduler setup
but still ask the IO scheduler to insert the request. This would
cause an oops when switching IO schedulers through the sysfs
interface, unless the disk just happened to be idle while the
switch occurred.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agoMerge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
Linus Torvalds [Thu, 21 Apr 2011 17:01:26 +0000 (10:01 -0700)]
Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs

* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: fix duplicate message output

13 years agoMerge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 21 Apr 2011 17:01:03 +0000 (10:01 -0700)]
Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, numa: Fix cpu nodemasks for NUMA emulation and CONFIG_DEBUG_PER_CPU_MAPS
  Revert "x86, NUMA: Fix fakenuma boot failure"

13 years agoraid5: fix build error, sector_t usage
Randy Dunlap [Thu, 21 Apr 2011 16:07:26 +0000 (09:07 -0700)]
raid5: fix build error, sector_t usage

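Change <sectors> from unsigned long long to sector_t so that it
matches the type of its source field. The unsigned long long version
led gcc, on 32-bit builds, to emit a call to the libgcc 64-bit
division helper __udivdi3, which the kernel does not provide:

  ERROR: "__udivdi3" [drivers/md/raid456.ko] undefined!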

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus
Linus Torvalds [Thu, 21 Apr 2011 16:58:42 +0000 (09:58 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus

* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
  virtio: console: Enable call to hvc_remove() on console port remove
  virtio_pci: Prevent double-free of pci regions after device hot-unplug
  virtio: Decrement avail idx on buffer detach

13 years agoMerge branch 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied...
Linus Torvalds [Thu, 21 Apr 2011 16:57:56 +0000 (09:57 -0700)]
Merge branch 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6

* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
  agp: fix arbitrary kernel memory writes
  agp: fix OOM and buffer overflow
  drm/radeon/kms: fix IH writeback on r6xx+ on big endian machines

13 years agoMerge branch 'drm-intel-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/keith...
Linus Torvalds [Thu, 21 Apr 2011 16:57:13 +0000 (09:57 -0700)]
Merge branch 'drm-intel-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/keithp/linux-2.6

* 'drm-intel-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/keithp/linux-2.6:
  drm/i915: Initialise g4x watermarks for disabled pipes
  drm/i915: Sanitize the output registers after resume
  drm/i915/tv: Fix modeset flickering introduced in 7f58aabc3
  drm/i915/tv: Only poll for TV connections
  drm/i915/tv: Remember the detected TV type