===================================================
Adding reference counters (krefs) to kernel objects
===================================================

:Author: Corey Minyard <minyard@acm.org>
:Author: Thomas Hellstrom <thellstrom@vmware.com>

A lot of this was lifted from Greg Kroah-Hartman's 2004 OLS paper and
presentation on krefs, which can be found at:

  - http://www.kroah.com/linux/talks/ols_2004_kref_paper/Reprint-Kroah-Hartman-OLS2004.pdf
  - http://www.kroah.com/linux/talks/ols_2004_kref_talk/

Introduction
============

krefs allow you to add reference counters to your objects.  If you
have objects that are used in multiple places and passed around, and
you don't have refcounts, your code is almost certainly broken.  If
you want refcounts, krefs are the way to go.

To use a kref, add one to your data structures like::

    struct my_data
    {
	.
	.
	struct kref refcount;
	.
	.
    };

The kref can occur anywhere within the data structure.
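
The position does not matter because a release callback recovers the
enclosing structure from the kref's address with container_of().  The
following is a minimal userspace sketch of that idea, not kernel code:
struct fake_kref, my_container_of() and owner_of() are hypothetical
stand-ins invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct kref (illustration only). */
struct fake_kref {
	int count;
};

/* Simplified container_of(): subtract the member's offset from its address. */
#define my_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct my_data {
	int id;
	struct fake_kref refcount;	/* deliberately not the first member */
	char name[16];
};

/* Recover the enclosing object from a pointer to its embedded counter. */
static struct my_data *owner_of(struct fake_kref *ref)
{
	assert(ref != NULL);
	return my_container_of(ref, struct my_data, refcount);
}
```

Moving refcount to the start or the end of struct my_data changes only
the offset that is subtracted; owner_of() keeps working unchanged.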

Initialization
==============

You must initialize the kref after you allocate it.  To do this, call
kref_init() as follows::

     struct my_data *data;

     data = kmalloc(sizeof(*data), GFP_KERNEL);
     if (!data)
            return -ENOMEM;
     kref_init(&data->refcount);

This sets the refcount in the kref to 1.
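
To make the counting concrete, here is a minimal userspace model built
on C11 atomics.  It is only a sketch of the semantics described in this
document: struct ukref and the ukref_*() helpers are hypothetical names,
and the kernel's real implementation differs.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace model of a kref; not the kernel's definition. */
struct ukref {
	atomic_int count;
};

/* Models kref_init(): a freshly created object starts with one reference. */
static void ukref_init(struct ukref *k)
{
	atomic_init(&k->count, 1);
}

/* Models kref_get(): the caller must already hold a valid reference. */
static void ukref_get(struct ukref *k)
{
	int old = atomic_fetch_add(&k->count, 1);

	/* A count of zero here would mean the object is already dead. */
	assert(old > 0);
	(void)old;
}

/* Returns the current count; for illustration only. */
static int ukref_read(struct ukref *k)
{
	return atomic_load(&k->count);
}
```

In this model the count is 1 immediately after ukref_init(), which is
the reference owned by the code that created the object.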

Kref rules
==========

Once you have an initialized kref, you must follow these rules:

1) If you make a non-temporary copy of a pointer, especially if
   it can be passed to another thread of execution, you must
   increment the refcount with kref_get() before passing it off::

       kref_get(&data->refcount);

   If you already have a valid pointer to a kref-ed structure (the
   refcount cannot go to zero) you may do this without a lock.

2) When you are done with a pointer, you must call kref_put()::

       kref_put(&data->refcount, data_release);

   If this is the last reference to the pointer, the release
   routine will be called.  If the code never tries to get
   a valid pointer to a kref-ed structure without already
   holding a valid pointer, it is safe to do this without
   a lock.

3) If the code attempts to gain a reference to a kref-ed structure
   without already holding a valid pointer, it must serialize access
   so that a kref_put() cannot occur during the kref_get(), and the
   structure must remain valid during the kref_get().

For example, if you allocate some data and then pass it to another
thread to process::

    void data_release(struct kref *ref)
    {
	struct my_data *data = container_of(ref, struct my_data, refcount);
	kfree(data);
    }

    int more_data_handling(void *cb_data)
    {
	struct my_data *data = cb_data;
	.
	. do stuff with data here
	.
	kref_put(&data->refcount, data_release);
	return 0;
    }

    int my_data_handler(void)
    {
	int rv = 0;
	struct my_data *data;
	struct task_struct *task;
	data = kmalloc(sizeof(*data), GFP_KERNEL);
	if (!data)
		return -ENOMEM;
	kref_init(&data->refcount);

	/* Take the new thread's reference before handing data off. */
	kref_get(&data->refcount);
	task = kthread_run(more_data_handling, data, "more_data_handling");
	if (IS_ERR(task)) {
		rv = PTR_ERR(task);
		/* The thread never ran, so drop its reference here. */
		kref_put(&data->refcount, data_release);
		goto out;
	}

	.
	. do stuff with data here
	.
    out:
	kref_put(&data->refcount, data_release);
	return rv;
    }

This way, it doesn't matter in what order the two threads handle the
data; kref_put() takes care of knowing when the data is no longer
referenced and releasing it.  The kref_get() does not require a lock,
since we already have a valid pointer that we own a refcount for.  The
put needs no lock because nothing tries to get the data without
already holding a pointer.
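
The ordering argument can be checked with a small userspace model.  This
is a sketch under stated assumptions, not kernel code: struct ukref,
the ukref_*() helpers, the releases counter and run_handoff() are
hypothetical, and the two holders are simulated sequentially rather than
with real threads.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model of a kref; not the kernel's definition. */
struct ukref {
	atomic_int count;
};

struct my_data {
	int payload;
	struct ukref refcount;
};

static int releases;	/* how many times data_release() has run */

static void ukref_init(struct ukref *k)
{
	atomic_init(&k->count, 1);
}

static void ukref_get(struct ukref *k)
{
	atomic_fetch_add(&k->count, 1);
}

/* Returns 1 if this call dropped the last reference, 0 otherwise. */
static int ukref_put(struct ukref *k, void (*release)(struct ukref *))
{
	/* atomic_fetch_sub() returns the value held before the decrement. */
	if (atomic_fetch_sub(&k->count, 1) == 1) {
		release(k);
		return 1;
	}
	return 0;
}

static void data_release(struct ukref *ref)
{
	struct my_data *data =
		(struct my_data *)((char *)ref - offsetof(struct my_data, refcount));

	releases++;
	free(data);
}

/* Returns the number of release calls once both holders are done. */
static int run_handoff(void)
{
	struct my_data *data = malloc(sizeof(*data));

	assert(data != NULL);
	releases = 0;
	ukref_init(&data->refcount);	/* creator's reference */
	ukref_get(&data->refcount);	/* reference handed to the worker */

	/* One holder finishes first; the object must survive. */
	ukref_put(&data->refcount, data_release);
	assert(releases == 0);

	/* The other holder finishes; only now is the object released. */
	ukref_put(&data->refcount, data_release);
	return releases;
}
```

Both puts are identical calls, so it makes no difference which holder
issues which one: the release routine runs exactly once, on whichever
put happens last.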

Note that the "before" in rule 1 is very important.  You should never
do something like::

	task = kthread_run(more_data_handling, data, "more_data_handling");
	if (IS_ERR(task)) {
		rv = PTR_ERR(task);
		goto out;
	} else
		/* BAD BAD BAD - get is after the handoff */
		kref_get(&data->refcount);

Don't assume you know what you are doing and use the above construct.
First of all, you may not know what you are doing.  Second, you may
know what you are doing (there are some situations where locking is
involved where the above may be legal) but someone else who doesn't
know what they are doing may change the code or copy the code.  It's
bad style.  Don't do it.

There are some situations where you can optimize the gets and puts.
For instance, if you are done with an object and enqueuing it for
something else or passing it off to something else, there is no reason
to do a get then a put::

	/* Silly extra get and put */
	kref_get(&obj->ref);
	enqueue(obj);
	kref_put(&obj->ref, obj_cleanup);

Just do the enqueue.  A comment about this is always welcome::

	enqueue(obj);
	/* We are done with obj, so we pass our refcount off
	   to the queue.  DON'T TOUCH obj AFTER HERE! */

The last rule (rule 3) is the nastiest one to handle.  Say, for
instance, you have a list of items that are each kref-ed, and you wish
to get the first one.  You can't just pull the first item off the list
and kref_get() it.  That violates rule 3 because you are not already
holding a valid pointer.  You must add a mutex (or some other lock).
For instance::

	static DEFINE_MUTEX(mutex);
	static LIST_HEAD(q);
	struct my_data
	{
		struct kref      refcount;
		struct list_head link;
	};

	static struct my_data *get_entry(void)
	{
		struct my_data *entry = NULL;
		mutex_lock(&mutex);
		if (!list_empty(&q)) {
			entry = container_of(q.next, struct my_data, link);
			kref_get(&entry->refcount);
		}
		mutex_unlock(&mutex);
		return entry;
	}

	static void release_entry(struct kref *ref)
	{
		struct my_data *entry = container_of(ref, struct my_data, refcount);

		list_del(&entry->link);
		kfree(entry);
	}

	static void put_entry(struct my_data *entry)
	{
		mutex_lock(&mutex);
		kref_put(&entry->refcount, release_entry);
		mutex_unlock(&mutex);
	}

The kref_put() return value is useful if you do not want to hold the
lock during the whole release operation.  Say you didn't want to call
kfree() with the lock held in the example above (since it is kind of
pointless to do so).  You could use kref_put() as follows::

	static void release_entry(struct kref *ref)
	{
		/* All work is done after the return from kref_put(). */
	}

	static void put_entry(struct my_data *entry)
	{
		mutex_lock(&mutex);
		if (kref_put(&entry->refcount, release_entry)) {
			list_del(&entry->link);
			mutex_unlock(&mutex);
			kfree(entry);
		} else
			mutex_unlock(&mutex);
	}

This is really more useful if you have to call other routines as part
of the free operations that could take a long time or might claim the
same lock.  Note that doing everything in the release routine is still
preferred as it is a little neater.

The above example could also be optimized using kref_get_unless_zero() in
the following way::

	static struct my_data *get_entry(void)
	{
		struct my_data *entry = NULL;
		mutex_lock(&mutex);
		if (!list_empty(&q)) {
			entry = container_of(q.next, struct my_data, link);
			if (!kref_get_unless_zero(&entry->refcount))
				entry = NULL;
		}
		mutex_unlock(&mutex);
		return entry;
	}

	static void release_entry(struct kref *ref)
	{
		struct my_data *entry = container_of(ref, struct my_data, refcount);

		mutex_lock(&mutex);
		list_del(&entry->link);
		mutex_unlock(&mutex);
		kfree(entry);
	}

	static void put_entry(struct my_data *entry)
	{
		kref_put(&entry->refcount, release_entry);
	}

This removes the need to hold the mutex around kref_put() in put_entry(),
but it is important that kref_get_unless_zero() is enclosed in the same
critical section that finds the entry in the lookup table; otherwise
kref_get_unless_zero() may reference already freed memory.
Note that it is illegal to use kref_get_unless_zero() without checking its
return value.  If you are sure (by already having a valid pointer) that
kref_get_unless_zero() will return true, then use kref_get() instead.

Krefs and RCU
=============

The function kref_get_unless_zero() also makes it possible to use RCU
locking for lookups in the above example::

	struct my_data
	{
		struct rcu_head rhead;
		.
		struct kref refcount;
		.
		.
	};

	static struct my_data *get_entry_rcu(void)
	{
		struct my_data *entry = NULL;
		rcu_read_lock();
		if (!list_empty(&q)) {
			entry = container_of(q.next, struct my_data, link);
			if (!kref_get_unless_zero(&entry->refcount))
				entry = NULL;
		}
		rcu_read_unlock();
		return entry;
	}

	static void release_entry_rcu(struct kref *ref)
	{
		struct my_data *entry = container_of(ref, struct my_data, refcount);

		mutex_lock(&mutex);
		list_del_rcu(&entry->link);
		mutex_unlock(&mutex);
		kfree_rcu(entry, rhead);
	}

	static void put_entry(struct my_data *entry)
	{
		kref_put(&entry->refcount, release_entry_rcu);
	}

But note that the struct kref member needs to remain in valid memory for an
RCU grace period after release_entry_rcu() is called.  That can be accomplished
by using kfree_rcu(entry, rhead) as done above, or by calling synchronize_rcu()
before using kfree(), but note that synchronize_rcu() may sleep for a
substantial amount of time.