Home » Mailing lists » Devel » [PATCH 0/7] Profiling sleep times (v3)
[PATCH 0/7] Profiling sleep times (v3) [message #44255] |
Mon, 28 November 2011 09:03 |
Andrew Vagin
Messages: 28 Registered: November 2011
|
Junior Member |
|
|
Do you want to know where your code waits locks for a long time?
Yes! It's for you. This feature helps you to find bottlenecks.
It's not artificial task. Once one of my colleague was investigating a
scalability problem. He pressed sysrq-t some times and tried to merge
call-chains by hand. But perf can do that.
Problem:
The problem is that events sched_stat_* contain call-chains of
non-target tasks.
About month ago I sent series of patches:
[PATCH 0/3] trace: add ability to collect call chains of non current task.
Peter and Frederic explained me, that this solve isn't good and will be
better to make it in userspace.
Now it's in userspace. This series expands "perf inject" to be able to
merge sched_switch events and sched_stat* events. sched_switch events
contain correct call-chains and sched_stat contains a correct time
slices.
v2:
* Removed all known issues. Now it works completely.
* Improved usability of sched-stat scripts according with Arun's comments.
v3: fixed accoding to comments from David Ahem
Andrew Vagin (7):
perf: use event_name() to get an event name
perf: add ability to record event period
perf: add ability to change event according to sample (v2)
perf: teach "perf inject" to work with files
perf: teach perf inject to merge sched_stat_* and sched_switch events
perf: add scripts for profiling sleep times (v2)
event: add tracepoint for accounting block time
include/trace/events/sched.h | 7 ++
kernel/sched_fair.c | 2 +
tools/perf/builtin-inject.c | 120 ++++++++++++++++++++++-
tools/perf/builtin-record.c | 5 +
tools/perf/scripts/python/bin/sched-stat-record | 65 ++++++++++++
tools/perf/scripts/python/bin/sched-stat-report | 5 +
tools/perf/util/event.h | 2 +
tools/perf/util/evsel.c | 74 ++++++++++++++
tools/perf/util/header.c | 2 +-
tools/perf/util/session.h | 9 ++
10 files changed, 288 insertions(+), 3 deletions(-)
create mode 100644 tools/perf/scripts/python/bin/sched-stat-record
create mode 100644 tools/perf/scripts/python/bin/sched-stat-report
|
|
|
[PATCH 3/7] perf: add ability to change event according to sample (v2) [message #44256 is a reply to message #44255] |
Mon, 28 November 2011 09:03 |
Andrew Vagin
Messages: 28 Registered: November 2011
|
Junior Member |
|
|
It's opposition of perf_session__parse_sample.
v2: fixed mistakes which David Arhen found
Signed-off-by: Andrew Vagin <avagin@openvz.org>
---
tools/perf/util/event.h | 2 +
tools/perf/util/evsel.c | 74 +++++++++++++++++++++++++++++++++++++++++++++
tools/perf/util/session.h | 9 +++++
3 files changed, 85 insertions(+), 0 deletions(-)
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 357a85b..0493101 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -187,5 +187,7 @@ const char *perf_event__name(unsigned int id);
int perf_event__parse_sample(const union perf_event *event, u64 type,
int sample_size, bool sample_id_all,
struct perf_sample *sample, bool swapped);
+int perf_event__change_sample(union perf_event *event, u64 type,
+ const struct perf_sample *data, bool swapped);
#endif /* __PERF_RECORD_H */
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index e426264..d697568 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -494,3 +494,77 @@ int perf_event__parse_sample(const union perf_event *event, u64 type,
return 0;
}
+
+int perf_event__change_sample(union perf_event *event, u64 type,
+ const struct perf_sample *data, bool swapped)
+{
+ u64 *array;
+
+ /*
+ * used for cross-endian analysis. See git commit 65014ab3
+ * for why this goofiness is needed.
+ */
+ union {
+ u64 val64;
+ u32 val32[2];
+ } u;
+
+ array = event->sample.array;
+
+ if (type & PERF_SAMPLE_IP) {
+ event->ip.ip = data->ip;
+ array++;
+ }
+
+ if (type & PERF_SAMPLE_TID) {
+ u.val32[0] = data->pid;
+ u.val32[1] = data->tid;
+ if (swapped) {
+ /* undo swap of u64, then swap on individual u32s */
+ u.val32[0] = bswap_32(u.val32[0]);
+ u.val32[1] = bswap_32(u.val32[1]);
+ u.val64 = bswap_64(u.val64);
+ }
+
+ *array = u.val64;
+ array++;
+ }
+
+ if (type & PERF_SAMPLE_TIME) {
+ *array = data->time;
+ array++;
+ }
+
+ if (type & PERF_SAMPLE_ADDR) {
+ *array = data->addr;
+ array++;
+ }
+
+ if (type & PERF_SAMPLE_ID) {
+ *array = data->id;
+ array++;
+ }
+
+ if (type & PERF_SAMPLE_STREAM_ID) {
+ *array = data->stream_id;
+ array++;
+ }
+
+ if (type & PERF_SAMPLE_CPU) {
+ u.val32[0] = data->cpu;
+ if (swapped) {
+ /* undo swap of u64, then swap on individual u32s */
+ u.val32[0] = bswap_32(u.val32[0]);
+ u.val64 = bswap_64(u.val64);
+ }
+ *array = u.val64;
+ array++;
+ }
+
+ if (type & PERF_SAMPLE_PERIOD) {
+ *array = data->period;
+ array++;
+ }
+
+ return 0;
+}
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index 6e393c9..444f121 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -167,6 +167,15 @@ static inline int perf_session__parse_sample(struct perf_session *session,
session->header.needs_swap);
}
+static inline int perf_session__change_sample(struct perf_session *session,
+ union perf_event *event,
+ const struct perf_sample *sample)
+{
+ return perf_event__change_sample(event, session->sample_type,
+ sample,
+ session->header.needs_swap);
+}
+
struct perf_evsel *perf_session__find_first_evtype(struct perf_session *session,
unsigned int type);
--
1.7.1
|
|
|
[PATCH 2/7] perf: add ability to record event period [message #44257 is a reply to message #44255] |
Mon, 28 November 2011 09:03 |
Andrew Vagin
Messages: 28 Registered: November 2011
|
Junior Member |
|
|
Signed-off-by: Andrew Vagin <avagin@openvz.org>
---
tools/perf/builtin-record.c | 5 +++++
1 files changed, 5 insertions(+), 0 deletions(-)
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 6ab58cc..e3b7fc4 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -62,6 +62,7 @@ static bool no_samples = false;
static bool sample_address = false;
static bool sample_time = false;
static bool no_buildid = false;
+static bool period = false;
static bool no_buildid_cache = false;
static struct perf_evlist *evsel_list;
@@ -185,6 +186,9 @@ static void config_attr(struct perf_evsel *evsel, struct perf_evlist *evlist)
if (evlist->nr_entries > 1)
attr->sample_type |= PERF_SAMPLE_ID;
+ if (period)
+ attr->sample_type |= PERF_SAMPLE_PERIOD;
+
/*
* We default some events to a 1 default interval. But keep
* it a weak assumption overridable by the user.
@@ -803,6 +807,7 @@ const struct option record_options[] = {
OPT_BOOLEAN('d', "data", &sample_address,
"Sample addresses"),
OPT_BOOLEAN('T', "timestamp", &sample_time, "Sample timestamps"),
+ OPT_BOOLEAN('P', "period", &period, "Sample period"),
OPT_BOOLEAN('n', "no-samples", &no_samples,
"don't sample"),
OPT_BOOLEAN('N', "no-buildid-cache", &no_buildid_cache,
--
1.7.1
|
|
|
[PATCH 5/7] perf: teach perf inject to merge sched_stat_* and sched_switch events [message #44258 is a reply to message #44255] |
Mon, 28 November 2011 09:03 |
Andrew Vagin
Messages: 28 Registered: November 2011
|
Junior Member |
|
|
You may want to know where and how long a task is sleeping. A callchain
may be found in sched_switch and a time slice in stat_iowait, so I add
handler in perf inject for merging this events.
My code saves sched_switch event for each process and when it meets
stat_iowait, it reports the sched_switch event, because this event
contains a correct callchain. By another words it replaces all
stat_iowait events on proper sched_switch events.
Signed-off-by: Andrew Vagin <avagin@openvz.org>
---
tools/perf/builtin-inject.c | 87 +++++++++++++++++++++++++++++++++++++++++++
1 files changed, 87 insertions(+), 0 deletions(-)
diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index 8df8b71..d0af19d 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -12,6 +12,8 @@
#include "util/debug.h"
#include "util/parse-options.h"
+#include "util/trace-event.h"
+
static char const *input_name = "-";
static const char *output_name = "-";
@@ -20,6 +22,7 @@ static int output;
static u64 bytes_written = 0;
static bool inject_build_ids;
+static bool inject_sched_stat;
static int perf_event__repipe_synth(union perf_event *event,
struct perf_session *session __used)
@@ -179,6 +182,85 @@ repipe:
return 0;
}
+struct event_entry
+{
+ struct list_head list;
+ u32 pid;
+ union perf_event event[0];
+};
+
+static LIST_HEAD(samples);
+
+static int perf_event__sched_stat(union perf_event *event,
+ struct perf_sample *sample,
+ struct perf_evsel *evsel __used,
+ struct perf_session *session)
+{
+ int type;
+ struct event *e;
+ const char *evname = NULL;
+ uint32_t size;
+ struct event_entry *ent;
+ union perf_event *event_sw = NULL;
+ struct perf_sample sample_sw;
+ int sched_process_exit;
+
+ size = event->header.size;
+
+ type = trace_parse_common_type(sample->raw_data);
+ e = trace_find_event(type);
+ if (e)
+ evname = e->name;
+
+ sched_process_exit = !strcmp(evname, "sched_process_exit");
+
+ if (!strcmp(evname, "sched_switch") || sched_process_exit) {
+ list_for_each_entry(ent, &samples, list)
+ if (sample->pid == ent->pid)
+ break;
+
+ if (&ent->list != &samples) {
+ list_del(&ent->list);
+ free(ent);
+ }
+
+ if (sched_process_exit)
+ return 0;
+
+ ent = malloc(size + sizeof(struct event_entry));
+ ent->pid = sample->pid;
+ memcpy(&ent->event, event, size);
+ list_add(&ent->list, &samples);
+ return 0;
+
+ } else if (!strncmp(evname, "sched_stat_", 11)) {
+ u32 pid;
+
+ pid = raw_field_value(e, "pid", sample->raw_data);
+
+ list_for_each_entry(ent, &samples, list) {
+ if (pid == ent->pid)
+ break;
+ }
+
+ if (&ent->list == &samples) {
+ pr_debug("Could not find sched_switch for pid %u\n", pid);
+ return 0;
+ }
+
+ event_sw = &ent->event[0];
+ perf_session__parse_sample(session, event_sw, &sample_sw);
+ sample_sw.period = sample->period;
+ sample_sw.time = sample->time;
+ perf_session__change_sample(session, event_sw, &sample_sw);
+ perf_event__repipe(event_sw, &sample_sw, session);
+ return 0;
+ }
+
+ perf_event__repipe(event, sample, session);
+
+ return 0;
+}
struct perf_event_ops inject_ops = {
.sample = perf_event__repipe_sample,
.mmap = perf_event__repipe,
@@ -214,6 +296,9 @@ static int __cmd_inject(void)
inject_ops.mmap = perf_event__repipe_mmap;
inject_ops.fork = perf_event__repipe_task;
inject_ops.tracing_data = perf_event__repipe_tracing_data;
+ } else if (inject_sched_stat) {
+ inject_ops.sample = perf_event__sched_stat;
+ inject_ops.ordered_samples = true;
}
session = perf_session__new(input_name, O_RDONLY, false, true, &inject_ops);
@@ -241,6 +326,8 @@ static const char * const report_usage[] = {
static const struct option options[] = {
OPT_BOOLEAN('b', "build-ids", &inject_build_ids,
"Inject build-ids into the output stream"),
+ OPT_BOOLEAN('s', "sched-stat", &inject_sched_stat,
+ "correct call-chains for shed-stat-*"),
OPT_STRING('i', "input", &input_name, "file",
"input file name"),
OPT_STRING('o', "output", &output_name, "file",
--
1.7.1
|
|
|
|
[PATCH 4/7] perf: teach "perf inject" to work with files [message #44260 is a reply to message #44255] |
Mon, 28 November 2011 09:03 |
Andrew Vagin
Messages: 28 Registered: November 2011
|
Junior Member |
|
|
Before this patch "perf inject" can only handle data from pipe.
I want to use "perf inject" for reworking events. Look at my following patch.
Signed-off-by: Andrew Vagin <avagin@openvz.org>
---
tools/perf/builtin-inject.c | 33 +++++++++++++++++++++++++++++++--
1 files changed, 31 insertions(+), 2 deletions(-)
diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index 8dfc12b..8df8b71 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -13,7 +13,12 @@
#include "util/parse-options.h"
-static char const *input_name = "-";
+static char const *input_name = "-";
+static const char *output_name = "-";
+static int pipe_output = 0;
+static int output;
+static u64 bytes_written = 0;
+
static bool inject_build_ids;
static int perf_event__repipe_synth(union perf_event *event,
@@ -25,12 +30,14 @@ static int perf_event__repipe_synth(union perf_event *event,
size = event->header.size;
while (size) {
- int ret = write(STDOUT_FILENO, buf, size);
+ int ret = write(output, buf, size);
if (ret < 0)
return -errno;
size -= ret;
buf += ret;
+
+ bytes_written += ret;
}
return 0;
@@ -213,8 +220,14 @@ static int __cmd_inject(void)
if (session == NULL)
return -ENOMEM;
+ if (!pipe_output)
+ lseek(output, session->header.data_offset, SEEK_SET);
ret = perf_session__process_events(session, &inject_ops);
+ if (!pipe_output) {
+ session->header.data_size += bytes_written;
+ perf_session__write_header(session, session->evlist, output, true);
+ }
perf_session__delete(session);
return ret;
@@ -228,6 +241,10 @@ static const char * const report_usage[] = {
static const struct option options[] = {
OPT_BOOLEAN('b', "build-ids", &inject_build_ids,
"Inject build-ids into the output stream"),
+ OPT_STRING('i', "input", &input_name, "file",
+ "input file name"),
+ OPT_STRING('o', "output", &output_name, "file",
+ "output file name"),
OPT_INCR('v', "verbose", &verbose,
"be more verbose (show build ids, etc)"),
OPT_END()
@@ -243,6 +260,18 @@ int cmd_inject(int argc, const char **argv, const char *prefix __used)
if (argc)
usage_with_options(report_usage, options);
+ if (!strcmp(output_name, "-")) {
+ pipe_output = 1;
+ output = STDOUT_FILENO;
+ } else {
+ output = open(output_name, O_CREAT| O_WRONLY | O_TRUNC,
+ S_IRUSR | S_IWUSR);
+ if (output < 0) {
+ perror("failed to create output file");
+ exit(-1);
+ }
+ }
+
if (symbol__init() < 0)
return -1;
--
1.7.1
|
|
|
[PATCH 7/7] event: add tracepoint for accounting block time [message #44261 is a reply to message #44255] |
Mon, 28 November 2011 09:03 |
Andrew Vagin
Messages: 28 Registered: November 2011
|
Junior Member |
|
|
This tracepoint shows how long a task is sleeping in uninterruptible state.
E.g.
It may show how long and where a mutex is waited.
Signed-off-by: Andrew Vagin <avagin@openvz.org>
---
include/trace/events/sched.h | 7 +++++++
kernel/sched_fair.c | 2 ++
2 files changed, 9 insertions(+), 0 deletions(-)
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 959ff18..be077cf 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -331,6 +331,13 @@ DEFINE_EVENT(sched_stat_template, sched_stat_iowait,
TP_ARGS(tsk, delay));
/*
+ * Tracepoint for accounting block time (time the task is in uninterruptible).
+ */
+DEFINE_EVENT(sched_stat_template, sched_stat_block,
+ TP_PROTO(struct task_struct *tsk, u64 delay),
+ TP_ARGS(tsk, delay));
+
+/*
* Tracepoint for accounting runtime (time the task is executing
* on a CPU).
*/
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 5c9e679..0d7b156 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -907,6 +907,8 @@ static void enqueue_sleeper(struct cfs_rq *cfs_rq, struct sched_entity *se)
trace_sched_stat_iowait(tsk, delta);
}
+ trace_sched_stat_block(tsk, delta);
+
/*
* Blocking time is in units of nanosecs, so shift by
* 20 to get a milliseconds-range estimation of the
--
1.7.1
|
|
|
Re: [PATCH 7/7] event: add tracepoint for accounting block time [message #44264 is a reply to message #44261] |
Mon, 28 November 2011 11:42 |
Peter Zijlstra
Messages: 61 Registered: September 2006
|
Member |
|
|
On Mon, 2011-11-28 at 12:03 +0300, Andrew Vagin wrote:
> This tracepoint shows how long a task is sleeping in uninterruptible state.
>
> E.g.
> It may show how long and where a mutex is waited.
Fair enough, makes one wonder how much it would take to make
account_scheduler_latency() go away..
> Signed-off-by: Andrew Vagin <avagin@openvz.org>
> ---
> include/trace/events/sched.h | 7 +++++++
> kernel/sched_fair.c | 2 ++
> 2 files changed, 9 insertions(+), 0 deletions(-)
>
> diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
> index 959ff18..be077cf 100644
> --- a/include/trace/events/sched.h
> +++ b/include/trace/events/sched.h
> @@ -331,6 +331,13 @@ DEFINE_EVENT(sched_stat_template, sched_stat_iowait,
> TP_ARGS(tsk, delay));
>
> /*
> + * Tracepoint for accounting block time (time the task is in uninterruptible).
> + */
> +DEFINE_EVENT(sched_stat_template, sched_stat_block,
> + TP_PROTO(struct task_struct *tsk, u64 delay),
> + TP_ARGS(tsk, delay));
> +
> +/*
> * Tracepoint for accounting runtime (time the task is executing
> * on a CPU).
> */
> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index 5c9e679..0d7b156 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -907,6 +907,8 @@ static void enqueue_sleeper(struct cfs_rq *cfs_rq, struct sched_entity *se)
> trace_sched_stat_iowait(tsk, delta);
> }
>
> + trace_sched_stat_block(tsk, delta);
> +
> /*
> * Blocking time is in units of nanosecs, so shift by
> * 20 to get a milliseconds-range estimation of the
|
|
|
|
|
|
|
Re: [PATCH 7/7] event: add tracepoint for accounting block time [message #44594 is a reply to message #44272] |
Mon, 28 November 2011 14:31 |
Arnaldo Carvalho de M[2]
Messages: 17 Registered: November 2011
|
Junior Member |
|
|
Em Mon, Nov 28, 2011 at 06:02:27AM -0800, Arjan van de Ven escreveu:
> On 11/28/2011 3:42 AM, Peter Zijlstra wrote:
> > On Mon, 2011-11-28 at 12:03 +0300, Andrew Vagin wrote:
> >> This tracepoint shows how long a task is sleeping in uninterruptible state.
> >> E.g.
> >> It may show how long and where a mutex is waited.
> > Fair enough, makes one wonder how much it would take to make
> > account_scheduler_latency() go away..
> I would *love* to switch latencytop to using trace points / perf events.
> But as long as this just means I get yelled at more for using "internal
> ABIs" and the like at various occasions, I'm rather hesitant to turn
> more tools into using perf.
Have you read the the discussion with Robert Richter about that?
https://lkml.org/lkml/2011/11/24/237
perf_evlist is what you call perf_bundle and perf_evsel is what you call
perf_event in powertop.
That part of the API should be ok for wider use and is in fact exported
in the python binding.
I'm rearranging my tmp.perf/trace4 branch into perf/core to ask for
merging, that is a step in the direction of having the 'perf tool' class
added to what will become libperf.so.
I almost embarked into an attempt to make powertop use it, but there are
other stuff to do first, like making the tracepoint based tools already
in perf stop using long if-strcmp(evname, "foo")-call-process_foo_event
(we have IDs, a hash, better use that, etc).
The 'strace' will be Ingo & tglx's 'trace' tool using these changes,
should be done in a way that shows how to use the resulting abstractions
in libperf.so, that together with the other tools already in the kernel
(kmem, lock, etc).
> (and we all know what the next steps to resolve this are, they just have
> not happened yet; not all hope is lost)
Sure its not :-)
- Arnaldo
|
|
|
Re: [PATCH 7/7] event: add tracepoint for accounting block time [message #44595 is a reply to message #44273] |
Mon, 28 November 2011 14:59 |
Arnaldo Carvalho de M[2]
Messages: 17 Registered: November 2011
|
Junior Member |
|
|
Em Mon, Nov 28, 2011 at 06:43:35AM -0800, Arjan van de Ven escreveu:
> > perf_evlist is what you call perf_bundle and perf_evsel is what you call
> > perf_event in powertop.
> > That part of the API should be ok for wider use and is in fact exported
> > in the python binding.
>
> I don't care about the snake language.
I haven't suggested that you use any other language than C.
What I tried to say was that the way the python binding was written
provides an initial libperf.so, just that it is not exposed yet.
That is, the subset of C files there is (or should be) untangled from
all the rest of tools/perf.
> frankly all that's missing is a "safe" accessor library as Steve has
> promised will appear.
> that library really needs to be a proper shared library and not come
> from/with the kernel package, so that distributions can independently
> package it properly.
I can probably have something like:
[acme@emilia linux]$ make help | grep perf
perf-tar-src-pkg - Build perf-3.2.0-rc1.tar source tarball
perf-targz-src-pkg - Build perf-3.2.0-rc1.tar.gz source tarball
perf-tarbz2-src-pkg - Build perf-3.2.0-rc1.tar.bz2 source tarball
perf-tarxz-src-pkg - Build perf-3.2.0-rc1.tar.xz source tarball
[acme@emilia linux]$
i.e.:
$ make libperf-tar-src.pkg
> (and this obviously needs to at least look at the things that the
> systemd guys pointed us at at the kernel summit)
Yes, I'll take that into account.
> I'm not interested if the code for a library is somewhere deep in the
> kernel source code, not installed by default in distros (or tied in with
> loads of other mess) and/or uses the kernel makefiles/etc like perf does.
Right, I'm working on that, don't know when it will get to the point
you'd consider using it, but I'll try to address your concerns in the
process.
- Arnaldo
|
|
|
|
Re: [PATCH 3/7] perf: add ability to change event according to sample (v2) [message #44598 is a reply to message #44256] |
Tue, 06 December 2011 14:19 |
Arnaldo Carvalho de M[2]
Messages: 17 Registered: November 2011
|
Junior Member |
|
|
Em Mon, Nov 28, 2011 at 12:03:31PM +0300, Andrew Vagin escreveu:
> It's opposition of perf_session__parse_sample.
>
> v2: fixed mistakes which David Arhen found
Ok, I'm taking this one, David, can I added an 'Acked-by: you"? Or even
"reviewed-by:" ?
I'm just changing 'data' to 'sample', data is way to vague, I kept it
for a while in the past just to reduce patch size, but this is something
completely new, so better use 'sample'.
- Arnaldo
> Signed-off-by: Andrew Vagin <avagin@openvz.org>
> ---
> tools/perf/util/event.h | 2 +
> tools/perf/util/evsel.c | 74 +++++++++++++++++++++++++++++++++++++++++++++
> tools/perf/util/session.h | 9 +++++
> 3 files changed, 85 insertions(+), 0 deletions(-)
>
> diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
> index 357a85b..0493101 100644
> --- a/tools/perf/util/event.h
> +++ b/tools/perf/util/event.h
> @@ -187,5 +187,7 @@ const char *perf_event__name(unsigned int id);
> int perf_event__parse_sample(const union perf_event *event, u64 type,
> int sample_size, bool sample_id_all,
> struct perf_sample *sample, bool swapped);
> +int perf_event__change_sample(union perf_event *event, u64 type,
> + const struct perf_sample *data, bool swapped);
>
> #endif /* __PERF_RECORD_H */
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index e426264..d697568 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -494,3 +494,77 @@ int perf_event__parse_sample(const union perf_event *event, u64 type,
>
> return 0;
> }
> +
> +int perf_event__change_sample(union perf_event *event, u64 type,
> + const struct perf_sample *data, bool swapped)
> +{
> + u64 *array;
> +
> + /*
> + * used for cross-endian analysis. See git commit 65014ab3
> + * for why this goofiness is needed.
> + */
> + union {
> + u64 val64;
> + u32 val32[2];
> + } u;
> +
> + array = event->sample.array;
> +
> + if (type & PERF_SAMPLE_IP) {
> + event->ip.ip = data->ip;
> + array++;
> + }
> +
> + if (type & PERF_SAMPLE_TID) {
> + u.val32[0] = data->pid;
> + u.val32[1] = data->tid;
> + if (swapped) {
> + /* undo swap of u64, then swap on individual u32s */
> + u.val32[0] = bswap_32(u.val32[0]);
> + u.val32[1] = bswap_32(u.val32[1]);
> + u.val64 = bswap_64(u.val64);
> + }
> +
> + *array = u.val64;
> + array++;
> + }
> +
> + if (type & PERF_SAMPLE_TIME) {
> + *array = data->time;
> + array++;
> + }
> +
> + if (type & PERF_SAMPLE_ADDR) {
> + *array = data->addr;
> + array++;
> + }
> +
> + if (type & PERF_SAMPLE_ID) {
> + *array = data->id;
> + array++;
> + }
> +
> + if (type & PERF_SAMPLE_STREAM_ID) {
> + *array = data->stream_id;
> + array++;
> + }
> +
> + if (type & PERF_SAMPLE_CPU) {
> + u.val32[0] = data->cpu;
> + if (swapped) {
> + /* undo swap of u64, then swap on individual u32s */
> + u.val32[0] = bswap_32(u.val32[0]);
> + u.val64 = bswap_64(u.val64);
> + }
> + *array = u.val64;
> + array++;
> + }
> +
> + if (type & PERF_SAMPLE_PERIOD) {
> + *array = data->period;
> + array++;
> + }
> +
> + return 0;
> +}
> diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
> index 6e393c9..444f121 100644
> --- a/tools/perf/util/session.h
> +++ b/tools/perf/util/session.h
> @@ -167,6 +167,15 @@ static inline int perf_session__parse_sample(struct perf_session *session,
> session->header.needs_swap);
> }
>
> +static inline int perf_session__change_sample(struct perf_session *session,
> + union perf_event *event,
> + const struct perf_sample *sample)
> +{
> + return perf_event__change_sample(event, session->sample_type,
> + sample,
> + session->header.needs_swap);
> +}
> +
> struct perf_evsel *perf_session__find_first_evtype(struct perf_session *session,
> unsigned int type);
>
> --
> 1.7.1
|
|
|
Re: [PATCH 3/7] perf: add ability to change event according to sample (v2) [message #44599 is a reply to message #44598] |
Tue, 06 December 2011 14:24 |
Arnaldo Carvalho de M[2]
Messages: 17 Registered: November 2011
|
Junior Member |
|
|
Em Tue, Dec 06, 2011 at 12:19:42PM -0200, Arnaldo Carvalho de Melo escreveu:
> Em Mon, Nov 28, 2011 at 12:03:31PM +0300, Andrew Vagin escreveu:
> > It's opposition of perf_session__parse_sample.
> >
> > v2: fixed mistakes which David Arhen found
>
> Ok, I'm taking this one, David, can I added an 'Acked-by: you"? Or even
> "reviewed-by:" ?
>
> I'm just changing 'data' to 'sample', data is way to vague, I kept it
> for a while in the past just to reduce patch size, but this is something
> completely new, so better use 'sample'.
Also I'm changing it from perf_event__change_sample() to
perf_event__synthesize_sample() as that is what is happening, you're not
taking some perf_sample and changing it just a bit, but synthesizing it
completely.
Since synthesize is used elsewhere when we synthesize PERF_RECORD_MMAP,
etc for existing threads, I'll use it here too.
> - Arnaldo
>
> > Signed-off-by: Andrew Vagin <avagin@openvz.org>
> > ---
> > tools/perf/util/event.h | 2 +
> > tools/perf/util/evsel.c | 74 +++++++++++++++++++++++++++++++++++++++++++++
> > tools/perf/util/session.h | 9 +++++
> > 3 files changed, 85 insertions(+), 0 deletions(-)
> >
> > diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
> > index 357a85b..0493101 100644
> > --- a/tools/perf/util/event.h
> > +++ b/tools/perf/util/event.h
> > @@ -187,5 +187,7 @@ const char *perf_event__name(unsigned int id);
> > int perf_event__parse_sample(const union perf_event *event, u64 type,
> > int sample_size, bool sample_id_all,
> > struct perf_sample *sample, bool swapped);
> > +int perf_event__change_sample(union perf_event *event, u64 type,
> > + const struct perf_sample *data, bool swapped);
> >
> > #endif /* __PERF_RECORD_H */
> > diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> > index e426264..d697568 100644
> > --- a/tools/perf/util/evsel.c
> > +++ b/tools/perf/util/evsel.c
> > @@ -494,3 +494,77 @@ int perf_event__parse_sample(const union perf_event *event, u64 type,
> >
> > return 0;
> > }
> > +
> > +int perf_event__change_sample(union perf_event *event, u64 type,
> > + const struct perf_sample *data, bool swapped)
> > +{
> > + u64 *array;
> > +
> > + /*
> > + * used for cross-endian analysis. See git commit 65014ab3
> > + * for why this goofiness is needed.
> > + */
> > + union {
> > + u64 val64;
> > + u32 val32[2];
> > + } u;
> > +
> > + array = event->sample.array;
> > +
> > + if (type & PERF_SAMPLE_IP) {
> > + event->ip.ip = data->ip;
> > + array++;
> > + }
> > +
> > + if (type & PERF_SAMPLE_TID) {
> > + u.val32[0] = data->pid;
> > + u.val32[1] = data->tid;
> > + if (swapped) {
> > + /* undo swap of u64, then swap on individual u32s */
> > + u.val32[0] = bswap_32(u.val32[0]);
> > + u.val32[1] = bswap_32(u.val32[1]);
> > + u.val64 = bswap_64(u.val64);
> > + }
> > +
> > + *array = u.val64;
> > + array++;
> > + }
> > +
> > + if (type & PERF_SAMPLE_TIME) {
> > + *array = data->time;
> > + array++;
> > + }
> > +
> > + if (type & PERF_SAMPLE_ADDR) {
> > + *array = data->addr;
> > + array++;
> > + }
> > +
> > + if (type & PERF_SAMPLE_ID) {
> > + *array = data->id;
> > + array++;
> > + }
> > +
> > + if (type & PERF_SAMPLE_STREAM_ID) {
> > + *array = data->stream_id;
> > + array++;
> > + }
> > +
> > + if (type & PERF_SAMPLE_CPU) {
> > + u.val32[0] = data->cpu;
> > + if (swapped) {
> > + /* undo swap of u64, then swap on individual u32s */
> > + u.val32[0] = bswap_32(u.val32[0]);
> > + u.val64 = bswap_64(u.val64);
> > + }
> > + *array = u.val64;
> > + array++;
> > + }
> > +
> > + if (type & PERF_SAMPLE_PERIOD) {
> > + *array = data->period;
> > + array++;
> > + }
> > +
> > + return 0;
> > +}
> > diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
> > index 6e393c9..444f121 100644
> > --- a/tools/perf/util/session.h
> > +++ b/tools/perf/util/session.h
> > @@ -167,6 +167,15 @@ static inline int perf_session__parse_sample(struct perf_session *session,
> > session->header.needs_swap);
> > }
> >
> > +static inline int perf_session__change_sample(struct perf_session *session,
> > + union perf_event *event,
> > + const struct perf_sample *sample)
> > +{
> > + return perf_event__change_sample(event, session->sample_type,
> > + sample,
> > + session->header.needs_swap);
> > +}
> > +
> > struct perf_evsel *perf_session__find_first_evtype(struct perf_session *session,
> > unsigned int type);
> >
> > --
> > 1.7.1
|
|
|
Re: [PATCH 2/7] perf: add ability to record event period [message #44602 is a reply to message #44257] |
Tue, 06 December 2011 14:02 |
Arnaldo Carvalho de M[2]
Messages: 17 Registered: November 2011
|
Junior Member |
|
|
Em Mon, Nov 28, 2011 at 12:03:30PM +0300, Andrew Vagin escreveu:
> Signed-off-by: Andrew Vagin <avagin@openvz.org>
> ---
> tools/perf/builtin-record.c | 5 +++++
> 1 files changed, 5 insertions(+), 0 deletions(-)
This patch doesn't apply in current tip/perf/core as this logic was
moved to perf_evsel/perf_evlist, so that it can be used by other tools.
Please add the 'period' variable to struct perf_record_opts and the
sample_type or'ing to perf_evsel__config().
I suggest you use https://github.com/acmel/linux/commits/perf/core for
that.
- Arnaldo
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 6ab58cc..e3b7fc4 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -62,6 +62,7 @@ static bool no_samples = false;
> static bool sample_address = false;
> static bool sample_time = false;
> static bool no_buildid = false;
> +static bool period = false;
> static bool no_buildid_cache = false;
> static struct perf_evlist *evsel_list;
>
> @@ -185,6 +186,9 @@ static void config_attr(struct perf_evsel *evsel, struct perf_evlist *evlist)
> if (evlist->nr_entries > 1)
> attr->sample_type |= PERF_SAMPLE_ID;
>
> + if (period)
> + attr->sample_type |= PERF_SAMPLE_PERIOD;
> +
> /*
> * We default some events to a 1 default interval. But keep
> * it a weak assumption overridable by the user.
> @@ -803,6 +807,7 @@ const struct option record_options[] = {
> OPT_BOOLEAN('d', "data", &sample_address,
> "Sample addresses"),
> OPT_BOOLEAN('T', "timestamp", &sample_time, "Sample timestamps"),
> + OPT_BOOLEAN('P', "period", &period, "Sample period"),
> OPT_BOOLEAN('n', "no-samples", &no_samples,
> "don't sample"),
> OPT_BOOLEAN('N', "no-buildid-cache", &no_buildid_cache,
> --
> 1.7.1
|
|
|
Re: [PATCH 0/7] Profiling sleep times (v3) [message #44608 is a reply to message #44405] |
Tue, 06 December 2011 13:45 |
Arnaldo Carvalho de M[2]
Messages: 17 Registered: November 2011
|
Junior Member |
|
|
Em Tue, Dec 06, 2011 at 11:15:07AM +0400, Andrey Vagin escreveu:
> Hello Arnaldo,
>
> I found, that you commit patches for perf. Could you review this part?
>
> Arun Sharma said, that the second versions of patches works ok for him.
> (Arun is the first user of this functionality after me.)
I'm going to check if the patches still apply and then review,
- Arnaldo
> Thanks.
>
> On 11/28/2011 01:03 PM, Andrew Vagin wrote:
> >Do you want to know where your code waits locks for a long time?
> >Yes! It's for you. This feature helps you to find bottlenecks.
> >
> >It's not artificial task. Once one of my colleague was investigating a
> >scalability problem. He pressed sysrq-t some times and tried to merge
> >call-chains by hand. But perf can do that.
> >
> >Problem:
> >The problem is that events sched_stat_* contain call-chains of
> >non-target tasks.
> >About month ago I sent series of patches:
> >
> >[PATCH 0/3] trace: add ability to collect call chains of non current task.
> >
> >Peter and Frederic explained me, that this solve isn't good and will be
> >better to make it in userspace.
> >
> >Now it's in userspace. This series expands "perf inject" to be able to
> >merge sched_switch events and sched_stat* events. sched_switch events
> >contain correct call-chains and sched_stat contains a correct time
> >slices.
> >
> >v2:
> >* Removed all known issues. Now it works completely.
> >* Improved usability of sched-stat scripts according with Arun's comments.
> >v3: fixed accoding to comments from David Ahem
> >
> >Andrew Vagin (7):
> > perf: use event_name() to get an event name
> > perf: add ability to record event period
> > perf: add ability to change event according to sample (v2)
> > perf: teach "perf inject" to work with files
> > perf: teach perf inject to merge sched_stat_* and sched_switch events
> > perf: add scripts for profiling sleep times (v2)
> > event: add tracepoint for accounting block time
> >
> > include/trace/events/sched.h | 7 ++
> > kernel/sched_fair.c | 2 +
> > tools/perf/builtin-inject.c | 120 ++++++++++++++++++++++-
> > tools/perf/builtin-record.c | 5 +
> > tools/perf/scripts/python/bin/sched-stat-record | 65 ++++++++++++
> > tools/perf/scripts/python/bin/sched-stat-report | 5 +
> > tools/perf/util/event.h | 2 +
> > tools/perf/util/evsel.c | 74 ++++++++++++++
> > tools/perf/util/header.c | 2 +-
> > tools/perf/util/session.h | 9 ++
> > 10 files changed, 288 insertions(+), 3 deletions(-)
> > create mode 100644 tools/perf/scripts/python/bin/sched-stat-record
> > create mode 100644 tools/perf/scripts/python/bin/sched-stat-report
> >
|
|
|
Re: [PATCH 0/7] Profiling sleep times (v3) [message #44612 is a reply to message #44405] |
Wed, 07 December 2011 20:33 |
Arun Sharma
Messages: 2 Registered: December 2011
|
Junior Member |
|
|
On 12/5/11 11:15 PM, Andrey Vagin wrote:
> Arun Sharma said, that the second versions of patches works ok for him.
> (Arun is the first user of this functionality after me.)
Yes - Andrey's patches (v2) have been functional for me when used via:
perf record -agP -e sched:sched_switch --filter "prev_state == 1 ||
prev_state == 2" -e sched:sched_stat_sleep,sched:sched_stat_iowait --
sleep 3
mv perf.data{,.old}; perf inject -s -i perf.data.old -o perf.data
perf report --stdio -g graph --sort pid -C command-name
There are two major issues though:
* The above command lines cause us to collect way too much data and perf
can't keep up on a busy server
Eg:
Warning:
Processed 55182 events and lost 13 chunks!
Check IO/CPU overload!
* Requires root access
I suspect there is a way to stash the delay information available at
enqueue_sleeper() into the task_struct (or some place else) and make it
available at the sched:sched_switch tracepoint. This would solve both of
the problems above, but I don't have a patch yet.
-Arun
|
|
|
Re: [PATCH 3/7] perf: add ability to change event according to sample (v2) [message #44620 is a reply to message #44598] |
Tue, 06 December 2011 14:57 |
David Ahern
Messages: 2 Registered: December 2011
|
Junior Member |
|
|
On 12/06/2011 07:19 AM, Arnaldo Carvalho de Melo wrote:
> Em Mon, Nov 28, 2011 at 12:03:31PM +0300, Andrew Vagin escreveu:
>> It's opposition of perf_session__parse_sample.
>>
>> v2: fixed mistakes which David Arhen found
>
> Ok, I'm taking this one, David, can I added an 'Acked-by: you"? Or even
> "reviewed-by:" ?
>
> I'm just changing 'data' to 'sample', data is way to vague, I kept it
> for a while in the past just to reduce patch size, but this is something
> completely new, so better use 'sample'.
>
> - Arnaldo
>
>> Signed-off-by: Andrew Vagin <avagin@openvz.org>
>> ---
>> tools/perf/util/event.h | 2 +
>> tools/perf/util/evsel.c | 74 +++++++++++++++++++++++++++++++++++++++++++++
>> tools/perf/util/session.h | 9 +++++
>> 3 files changed, 85 insertions(+), 0 deletions(-)
>>
>> diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
>> index 357a85b..0493101 100644
>> --- a/tools/perf/util/event.h
>> +++ b/tools/perf/util/event.h
>> @@ -187,5 +187,7 @@ const char *perf_event__name(unsigned int id);
>> int perf_event__parse_sample(const union perf_event *event, u64 type,
>> int sample_size, bool sample_id_all,
>> struct perf_sample *sample, bool swapped);
>> +int perf_event__change_sample(union perf_event *event, u64 type,
>> + const struct perf_sample *data, bool swapped);
>>
>> #endif /* __PERF_RECORD_H */
>> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
>> index e426264..d697568 100644
>> --- a/tools/perf/util/evsel.c
>> +++ b/tools/perf/util/evsel.c
>> @@ -494,3 +494,77 @@ int perf_event__parse_sample(const union perf_event *event, u64 type,
>>
>> return 0;
>> }
>> +
>> +int perf_event__change_sample(union perf_event *event, u64 type,
>> + const struct perf_sample *data, bool swapped)
>> +{
>> + u64 *array;
>> +
>> + /*
>> + * used for cross-endian analysis. See git commit 65014ab3
>> + * for why this goofiness is needed.
>> + */
>> + union {
>> + u64 val64;
>> + u32 val32[2];
>> + } u;
>> +
>> + array = event->sample.array;
>> +
>> + if (type & PERF_SAMPLE_IP) {
>> + event->ip.ip = data->ip;
>> + array++;
>> + }
>> +
>> + if (type & PERF_SAMPLE_TID) {
>> + u.val32[0] = data->pid;
>> + u.val32[1] = data->tid;
>> + if (swapped) {
>> + /* undo swap of u64, then swap on individual u32s */
Comment is from the parse sample code; this is the inverse of that code
so the comment needs to be updated here.
>> + u.val32[0] = bswap_32(u.val32[0]);
>> + u.val32[1] = bswap_32(u.val32[1]);
>> + u.val64 = bswap_64(u.val64);
>> + }
>> +
>> + *array = u.val64;
>> + array++;
>> + }
>> +
>> + if (type & PERF_SAMPLE_TIME) {
>> + *array = data->time;
>> + array++;
>> + }
>> +
>> + if (type & PERF_SAMPLE_ADDR) {
>> + *array = data->addr;
>> + array++;
>> + }
>> +
>> + if (type & PERF_SAMPLE_ID) {
>> + *array = data->id;
>> + array++;
>> + }
>> +
>> + if (type & PERF_SAMPLE_STREAM_ID) {
>> + *array = data->stream_id;
>> + array++;
>> + }
>> +
>> + if (type & PERF_SAMPLE_CPU) {
>> + u.val32[0] = data->cpu;
>> + if (swapped) {
>> + /* undo swap of u64, then swap on individual u32s */
Ditto on that comment.
I did not test it, but the logic is in the inverse of parse_sample so it
should be correct.
Arnaldo: you could use remove the 2 comment lines on the commit since
there are no logic impacts.
Reviewed-by: David Ahern <dsahern@gmail.com>
David
|
|
|
Goto Forum:
Current Time: Fri Oct 18 09:40:45 GMT 2024
Total time taken to generate the page: 0.09365 seconds
|